Building and using comparable corpora for domain-specific bilingual lexicon extraction (CROSBI ID 581482)
Conference paper in proceedings | original scientific paper | international peer review
Responsibility details
Fišer, Darja ; Ljubešić, Nikola ; Vintar, Špela ; Pollak, Senja
English
Building and using comparable corpora for domain-specific bilingual lexicon extraction
This paper presents a series of experiments aimed at inducing and evaluating domain-specific bilingual lexica from comparable corpora. First, a small English-Slovene comparable corpus from health magazines was manually constructed and then used to compile a large comparable corpus on health-related topics from web corpora. Next, a bilingual lexicon for the domain was extracted from the corpus by comparing context vectors in the two languages. Evaluation of the results shows that a 2-way translation of context vectors significantly improves the precision of the extracted translation equivalents. We also show that it is sufficient to increase the corpus size for one language in order to obtain higher recall, and that the number of new words grows linearly with the size of the corpus. Finally, we demonstrate that when the frequency threshold for context vectors is lowered, precision drops much more slowly than recall increases.
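The context-vector comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state which association measure, similarity function, or seed dictionary the paper uses, so raw co-occurrence counts, cosine similarity, the window size, and all function names and toy data below are assumptions made for illustration. The sketch translates vectors in one direction only and does not reproduce the 2-way translation the abstract refers to.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Count co-occurring words within +/- `window` tokens for every word."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def translate_vector(vector, seed_dict):
    """Map context words into the target language; drop words missing from the seed dictionary."""
    translated = Counter()
    for word, count in vector.items():
        if word in seed_dict:
            translated[seed_dict[word]] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(source_word, src_vectors, tgt_vectors, seed_dict, top_n=3):
    """Rank target-language words by similarity to the translated source vector."""
    query = translate_vector(src_vectors[source_word], seed_dict)
    scored = [(cosine(query, vec), word) for word, vec in tgt_vectors.items()]
    return [word for score, word in sorted(scored, reverse=True)[:top_n] if score > 0]


# Hypothetical lemmatised toy corpora and seed dictionary (not from the paper).
english = [["flu", "cause", "fever"], ["flu", "cause", "cough"]]
slovene = [
    ["gripa", "povzročati", "vročina"],
    ["gripa", "povzročati", "kašelj"],
    ["zdravnik", "predpisati", "zdravilo"],
]
seed = {"cause": "povzročati", "fever": "vročina", "cough": "kašelj"}

candidates = rank_candidates(
    "flu", context_vectors(english), context_vectors(slovene), seed
)
print(candidates)
```

In this toy setting the word "flu" is absent from the seed dictionary, yet its translated context vector can still be compared against every Slovene word, which is the basic idea behind extracting new lexicon entries from non-parallel text.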
comparable corpora; bilingual lexicon extraction; domain lexicons
Contribution details
19-26.
2011.
published
Host publication details
Portland (OR): Association for Computational Linguistics (ACL)
Conference details
lecture
24.07.2011-24.07.2011
Portland (OR), United States of America