Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus (CROSBI ID 573587)
Conference paper in proceedings | original scientific paper | international peer review
Authors
Brkić, Marija ; Matetić, Maja ; Seljan, Sanja
Language: English
This paper presents the acquisition of a parallel bilingual corpus and all the steps involved in unsupervised sentence alignment, including preprocessing steps such as tokenization and lowercasing. Sentence alignment is not trivial because translators do not necessarily translate one source-language sentence into exactly one target-language sentence. Three different unsupervised, language-independent approaches to sentence alignment are presented, and their implementations in three freely available tools are tested. A gold standard for evaluating automatic English-Croatian sentence alignment is created. Finally, a detailed analysis of the acquired corpus is given.
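The record does not name the three approaches or tools, but the keywords mention a sentence-length approach. As an illustration only, here is a minimal sketch of a Gale-Church-style length-based aligner, followed by a simple precision/recall comparison against a gold standard. It is not the authors' method or any tested tool's code. The bead priors, the constants c = 1.0 and s² = 6.8 (values reported by Gale & Church for their data, not tuned for English-Croatian), the exact-match evaluation metric, and all function names (`bead_cost`, `align`, `evaluate`) are assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

# A bead pairs a group of source sentence indices with a group of target ones.
Bead = Tuple[Tuple[int, ...], Tuple[int, ...]]

# Assumed prior probabilities per bead type (n source : m target sentences),
# similar to values commonly used in Gale-Church implementations.
BEAD_PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
MEAN_RATIO = 1.0  # c: assumed target/source character-length ratio
VARIANCE = 6.8    # s^2: assumed variance per character


def _norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bead_cost(src_len: int, tgt_len: int, bead: Tuple[int, int]) -> float:
    """Negative log of prior(bead) * P(|delta|) for the given character lengths."""
    if src_len == 0 and tgt_len == 0:
        delta = 0.0
    else:
        mean = (src_len + tgt_len / MEAN_RATIO) / 2.0
        delta = (src_len * MEAN_RATIO - tgt_len) / math.sqrt(mean * VARIANCE)
    p_delta = 2.0 * (1.0 - _norm_cdf(abs(delta)))
    return -math.log(max(p_delta * BEAD_PRIORS[bead], 1e-300))  # avoid log(0)


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Minimum-cost bead sequence via dynamic programming over (i, j) prefixes."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Extend the best path ending at (i, j) by every allowed bead type.
            for di, dj in BEAD_PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                l1 = sum(len(s) for s in src[i:ni])
                l2 = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(l1, l2, (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk back from (n, m) to (0, 0) to recover the chosen beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((tuple(range(pi, i)), tuple(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


def evaluate(predicted: List[Bead], gold: List[Bead]) -> Tuple[float, float]:
    """Precision and recall of predicted beads that exactly match gold beads."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    en = ["The cat sat on the mat.", "It was happy.", "Then it slept."]
    hr = ["Mačka je sjedila na prostirci i bila je sretna.", "Zatim je zaspala."]
    print(align(en, hr))
```

On this toy pair the first two English sentences should be merged into one 2:1 bead against the first Croatian sentence, which is exactly the one-to-many case that makes alignment non-trivial. A real pipeline would tokenize and lowercase first and tune the priors and length statistics on the language pair.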
Sentence alignment ; alignment tools ; sentence alignment evaluation ; parallel corpus ; sentence-length ; word-correspondence
Contribution details
1068-1070.
2011.
Published
Parent publication details
Conference details
Lecture
10.06.2011-12.06.2011
Sichuan, China