Abstract
This paper describes approaches to the vocabulary normalization and cross-language identity resolution problems that arise when Linked Open Data (LOD) datasets are used to populate scholarly knowledge bases. We propose several new heuristics that use additional information extracted from full-text data sources. The first heuristic uses a person's full track record, the second uses self-citation networks, and the third uses textual analysis of documents. The dataset of the Open Archive of the Russian Academy of Sciences and several bibliographic datasets are used as test examples.
DOI
10.31144/bncc.cs.2542-1972.2014.n37.p41-54
File
sross.pdf (641.98 KB)
Pages
41-54