Cross-language identity resolution and approaches to its solution

Author(s):

Zinaida Apanovich
Dmitry Cherepanov
Alexander Marchuk

Abstract:

This paper describes approaches to the vocabulary normalization and cross-language identity resolution problems that arise when Linked Open Data (LOD) datasets are used to populate scholarly knowledge bases. We propose several new heuristics that use additional information extracted from full-text data sources. The first heuristic uses a person's full track record, the second uses self-citation networks, and the third uses textual analysis of documents. The dataset of the Open Archive of the Russian Academy of Sciences and several bibliographic datasets are used as test examples.

Keywords:

Linked Open Data
SPARQL
ontology alignment
cross-language identity resolution
self-citation network
tf-idf
LDA

DOI:

10.31144/bncc.cs.2542-1972.2014.n37.p41-54

Issue:

Computer Science. 2014. No. 37.

Pages:

41-54

File:

sross.pdf (641.98 KB)