Cross-lingual alignment and completion of Wikipedia templates

by Gosse Bouma, Sergio Duarte Torres and Zahurul Islam. 

For many languages, Wikipedia is an order of magnitude smaller than the English Wikipedia. We present a method for cross-lingual alignment of template and infobox attributes in Wikipedia. The alignment is used to add and complete templates and infoboxes in one language with information derived from Wikipedia in another language. We show that alignment between English and Dutch Wikipedia is accurate and that the result can be used to expand the number of template attribute-value pairs in Dutch Wikipedia by 50%. Furthermore, the alignment provides valuable information for normalization of template and attribute names and can be used to detect potential inconsistencies.
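
The abstract does not spell out how the alignment is computed. As an illustration only, and not the authors' method, the following minimal sketch assumes one plausible approach: attribute pairs are aligned when their values frequently match on pages connected by an interlanguage link, and the resulting mapping is then used to propose missing attribute-value pairs. The function names, the `min_support` threshold, and the sample attribute names are all hypothetical.

```python
from collections import Counter


def normalize(value):
    # Crude value normalization (an assumption): trim whitespace, lowercase.
    return value.strip().lower()


def align_attributes(page_pairs, min_support=2):
    """Map English attribute names to Dutch ones by value co-occurrence.

    page_pairs: iterable of (en_infobox, nl_infobox) dicts taken from
    pages assumed to be connected by an interlanguage link.
    """
    pair_counts = Counter()
    for en_box, nl_box in page_pairs:
        for en_attr, en_val in en_box.items():
            for nl_attr, nl_val in nl_box.items():
                if normalize(en_val) == normalize(nl_val):
                    pair_counts[(en_attr, nl_attr)] += 1

    # Keep, for each English attribute, the best-supported Dutch partner.
    best = {}
    for (en_attr, nl_attr), count in pair_counts.items():
        if count >= min_support and count > best.get(en_attr, (None, 0))[1]:
            best[en_attr] = (nl_attr, count)
    return {en_attr: nl_attr for en_attr, (nl_attr, _) in best.items()}


def complete(en_box, nl_box, alignment):
    """Propose Dutch attribute-value pairs that are missing from nl_box."""
    proposed = {}
    for en_attr, value in en_box.items():
        nl_attr = alignment.get(en_attr)
        if nl_attr is not None and nl_attr not in nl_box:
            proposed[nl_attr] = value
    return proposed


# Hypothetical sample data for illustration.
pairs = [
    ({"birth_date": "1853-03-30", "name": "Vincent van Gogh"},
     {"geboortedatum": "1853-03-30", "naam": "Vincent van Gogh"}),
    ({"birth_date": "1606-07-15", "name": "Rembrandt"},
     {"geboortedatum": "1606-07-15", "naam": "Rembrandt"}),
]
alignment = align_attributes(pairs)
additions = complete(
    {"birth_date": "1632-10-31", "name": "Johannes Vermeer"},
    {"naam": "Johannes Vermeer"},
    alignment,
)
```

A real system would need more careful value matching (dates, numbers, translated entity names) than the exact string comparison assumed here.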

Wikipedia entity retrieval for Dutch and Spanish

by Gosse Bouma and Sergio Duarte Torres.

We developed two systems (for Dutch and Spanish) for the GikiCLEF task, in which Wikipedia pages have to be found that match a description in natural language. We concentrated on linguistic analysis of the query, both to map the question onto the most relevant Wikipedia categories and to extract additional constraints that matching pages have to satisfy. In addition, for Spanish we experimented with query expansion for improved recall of the IR process. In both the Dutch and Spanish systems we tried to incorporate additional knowledge sources (WordNet, YAGO, DBpedia) for better question analysis and retrieval results. The Dutch system obtained a GikiCLEF score of 2.5 (7th overall and 7th for Dutch). The Spanish system was still under development at the time of the official evaluation, and performed poorly. We show that the completed system would have performed well at the 2009 task.