Link key extraction and relational concept analysis

PhD position / Thesis topic

The goal of the semantic web is to take advantage of formalised knowledge at the scale of the worldwide web. This has led to the release of a vast quantity of data expressed in semantic web formalisms (RDF) [Heath 2011a]. Part of the added value of linked data lies in the links identifying the same entity in different data sets, because such links make it possible to draw inferences across data sets. For instance, they may identify the same books and articles in different bibliographical data sources. Finding the manifestations of the same entity across several data sets is therefore an important task for linked data.

One way of identifying entities is to use link keys, which generalise the keys usually found in databases to several data sets. A link key [Atencia 2014b] is a statement of the form:

( {⟨p1, q1⟩,... ⟨pn, qn⟩} link key ⟨c, d⟩ )
stating that whenever an instance of the class c has the same values for properties p1,... pn as an instance of class d has for properties q1,... qn, then these two instances are the same entity. For example, it may be that an instance of the class Livre is equivalent to an instance of the class Novel as soon as their properties auteur and titre on the one side and creator and title on the other side have the same values.
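The definition above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: instances are represented as dictionaries mapping property names to sets of values, "the same values" is read as equality of non-empty value sets, and the function names and toy data are hypothetical rather than taken from the cited work.

```python
from itertools import product


def values(instance, prop):
    """Return the set of values of `prop` for `instance` (empty if absent)."""
    return frozenset(instance.get(prop, ()))


def generate_links(link_key, instances_c, instances_d):
    """Return the (id_c, id_d) pairs that the link key identifies.

    link_key: list of (p, q) property pairs.
    instances_c, instances_d: dict mapping an identifier to a
    {property: set of values} description (assumed representation).
    """
    links = set()
    for (id_c, a), (id_d, b) in product(instances_c.items(), instances_d.items()):
        # Assumption: every property pair must carry the same, non-empty value set.
        if all(
            len(values(a, p)) > 0 and values(a, p) == values(b, q)
            for p, q in link_key
        ):
            links.add((id_c, id_d))
    return links


# Hypothetical toy data for the Livre / Novel example.
livres = {
    "l1": {"auteur": {"Victor Hugo"}, "titre": {"Les Misérables"}},
    "l2": {"auteur": {"Albert Camus"}, "titre": {"La Peste"}},
}
novels = {
    "n1": {"creator": {"Victor Hugo"}, "title": {"Les Misérables"}},
    "n2": {"creator": {"Albert Camus"}, "title": {"L'Étranger"}},
}

link_key = [("auteur", "creator"), ("titre", "title")]
links = generate_links(link_key, livres, novels)
print(links)
```

With this toy data only l1 and n1 agree on both property pairs, so they are the only pair linked; l2 and n2 share an author but not a title.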

Formal concept analysis (FCA) is a technique to extract concepts between two interdependent ordered sets [Ganter 1999a]. It has been used for inferring database keys by providing the dependencies between maximal sets of attributes and the partitions of the data that they generate. We provided the generalisation needed for database link keys [Atencia 2014d]. For RDF link keys there are several issues:

These features require further investigation of the approach. We have already considered relational concept analysis [Hacene 2013a], which can extract dependent concept descriptions from data.
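To make the formal concept analysis vocabulary used above more concrete, here is a small Python sketch that naively enumerates the formal concepts of a toy context. The encoding is an assumption made for illustration: objects are pairs of instances and attributes are pairs of properties on which the two instances agree, so that each concept intent can be read as a candidate link key. The enumeration is exponential in the number of objects and is not the algorithm of the cited papers.

```python
from itertools import combinations


def intent(objects, context, all_attributes):
    """Attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return frozenset(obj for obj, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Naively enumerate the (extent, intent) pairs of a small formal context."""
    all_attributes = frozenset().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            closed_intent = intent(subset, context, all_attributes)
            concepts.add((extent(closed_intent, context), closed_intent))
    return concepts


# Hypothetical context: which property pairs agree for which instance pairs.
context = {
    ("l1", "n1"): {("auteur", "creator"), ("titre", "title")},
    ("l1", "n2"): {("auteur", "creator")},
    ("l2", "n3"): {("titre", "title")},
}

concepts = formal_concepts(context)
for ext, itt in sorted(concepts, key=lambda c: len(c[1])):
    print(sorted(itt), "->", sorted(ext))
```

Each printed line pairs a set of property pairs with the instance pairs that agree on all of them; in this encoding, the former plays the role of a candidate link key and the latter the links it would generate.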

The goal of the thesis is to fully develop relational concept analysis and associated techniques to extract link keys. Relational concept analysis has proved effective; however, several directions remain open for further investigation, such as:


[Atencia 2014b] Manuel Atencia, Jérôme David, Jérôme Euzenat, Data interlinking through robust link key extraction, Proc. 21st ECAI, Prague (CZ), pp15-20, 2014
[Atencia 2014c] Manuel Atencia, Michel Chein, Madalina Croitoru, Jérôme David, Michel Leclère, Nathalie Pernelle, Fatiha Saïs, François Scharffe, Danai Symeonidou, Defining key semantics for the RDF datasets: experiments and evaluations, Proc. 21st ICCS, Iasi (RO), pp65-78, 2014
[Atencia 2014d] Manuel Atencia, Jérôme David, Jérôme Euzenat, What can FCA do for database link key extraction?, Proc. ECAI workshop on "what can FCA do for AI?", Prague (CZ), 2014
[Ganter 1999a] Bernhard Ganter, Rudolf Wille, Formal concept analysis: mathematical foundations, Springer, Berlin (DE), 1999
[Heath 2011a] Tom Heath, Christian Bizer, Linked Data: Evolving the Web into a Global Data Space, Morgan & Claypool, 2011
[Hacene 2013a] Mohamed Rouane Hacene, Marianne Huchard, Amedeo Napoli, Petko Valtchev, Relational concept analysis: mining concept lattices from multi-relational data, Annals of Mathematics and Artificial Intelligence


Qualification: Master or equivalent in computer science.

Expected skills:

Doctoral school: IAEM, Université de Lorraine

Advisors: Amedeo Napoli (Amedeo:Napoli#loria:fr) and Jérôme David (Jerome:David#inria:fr).

Group: The work will be carried out within the ANR Elker project, which aims at developing link keys. It will be conducted jointly by the Orpailleur (LORIA) and mOeX (LIG) teams.

Place of work: The position is located at INRIA Nancy Grand-Est, a major computer science research lab near Nancy, in a stimulating research environment.

Hiring date: Fourth quarter 2017 (October 1st in principle).

Duration: 36 months

Salary: From 1600 EUR/month (benefits included, net before income tax), i.e., 2000 EUR/month gross.

Contact: the advisors.

Procedure: Contact us.

File: Provide a curriculum vitæ, a motivation letter and references. A Master's report is a strong plus. We will also ask for your Master's marks, so please attach them if you already have them.