Cross-language information retrieval by reduced k-means (CROSBI ID 256412)
Journal contribution | original scientific paper | international peer review
Responsibility statement
Dobša, Jasminka ; Mladenić, Dunja ; Rupnik, Jan ; Radošević, Danijel ; Magdalenić, Ivan
English
Cross-language information retrieval by reduced k-means
Cross-language information retrieval aims at retrieving relevant documents in one language for a query posed in another language. Here we propose a new approach to cross-language information retrieval based on factorization of a term-document matrix by the iterative method of Reduced k-means clustering. The Reduced k-means method aims at simultaneous reduction of objects (documents) and variables (index terms). The proposed method is compared to standard machine learning techniques for cross-language information retrieval, namely latent semantic indexing and canonical correlation analysis. The motivation for using Reduced k-means for cross-language information retrieval comes from the observation that documents in the semantic space obtained by latent semantic indexing are clustered primarily by language rather than by topic. Since Reduced k-means aims at preserving the clustering structure of the data, the idea is that the proposed method could address this problem.
cross-language information retrieval, dimensionality reduction, latent semantic indexing, canonical correlation analysis, Reduced k-means
Publication data
10 (1)
2018.
314-322
published
2150-7988