Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages (CROSBI ID 45034)
Book chapter | original scientific paper
Responsibility details
Ljubešić, Nikola ; Fišer, Darja
English
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages
In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from words that are identical in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we were able to improve the baseline precision from 0.597 to 0.731, measured as the mean reciprocal rank over the ten top-ranking translation candidates, with 50.4% recall on a gold standard of 500 entries.
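The two ideas named in the abstract, a seed lexicon of identical words and evaluation by mean reciprocal rank over the top ten candidates, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_freq` threshold, the toy token lists, and the choice to average over every evaluated source word are all assumptions made for illustration.

```python
from collections import Counter


def identical_word_seed(tokens_a, tokens_b, min_freq=2):
    """Build a seed lexicon that maps each shared word form to itself.

    Assumption: a word qualifies if it occurs at least `min_freq` times
    in both corpora; the paper's actual filtering criteria are not given
    in the abstract.
    """
    freq_a = Counter(tokens_a)
    freq_b = Counter(tokens_b)
    return {
        word: word
        for word, count in freq_a.items()
        if count >= min_freq and freq_b.get(word, 0) >= min_freq
    }


def mean_reciprocal_rank(ranked, gold, top_n=10):
    """Mean reciprocal rank of the first correct candidate within top_n.

    ranked: source word -> list of candidates, best first
    gold:   source word -> set of correct translations
    A word with no correct candidate in the top_n contributes 0.
    Assumption: the mean is taken over all words in `ranked`.
    """
    if not ranked:
        return 0.0
    total = 0.0
    for word, candidates in ranked.items():
        correct = gold.get(word, set())
        for rank, candidate in enumerate(candidates[:top_n], start=1):
            if candidate in correct:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Toy data (hypothetical, for illustration only).
hr_tokens = "dan grad voda kuća dan grad voda".split()
sl_tokens = "dan grad voda hiša dan grad voda".split()
seed = identical_word_seed(hr_tokens, sl_tokens)

ranked = {"kuća": ["dom", "hiša"], "pas": ["pes"], "ptica": ["riba"]}
gold = {"kuća": {"hiša"}, "pas": {"pes"}, "ptica": {"ptica"}}
mrr = mean_reciprocal_rank(ranked, gold)
```

In this toy run the seed contains only the word forms shared by both token lists, and the three evaluated words contribute reciprocal ranks of 1/2, 1, and 0 respectively.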
comparable corpora, bilingual lexicon extraction, bootstrapping
Chapter details
91-98.
published
Book details
Text, Speech and Dialogue
Habernal, Ivan ; Matoušek, Václav
Berlin : Heidelberg: Springer
2011.
978-3-642-23537-5