MOSIG Master 2ND YEAR Research
YEAR 2020—2021

Relational concept analysis through constraint satisfaction

Master topic / Sujet de master recherche

Formal concept analysis, extracting concepts found in a data set, can be expressed as a constraint satisfaction problem. Relational concept analysis deals simultaneously with several interrelated data sets. We aim at answering if it, or which part of it, can be expressed with constraints.

Formal concept analysis (FCA, [Ganter 1999]) is a data mining approach that identifies concepts within a data set. More precisely, from a set of data items, e.g. Researchers, identified by attributes, e.g. topic, age, it finds out concepts which are characterized by the set of objects sharing a set of attributes, e.g. Researchers in artificial intelligence of less than 25 years old.

Relational concept analysis (RCA, [Rouane-Hacene 2013]) enables to find interdependent concepts within related data sets, i.e. that Researchers are also characterised by the Papers they wrote and the Labs they belong to which, in turns, depend on other Researchers. For that purpose, it splits the data in related datasets (here Researchers, Papers, Labs) which are processed individually by formal concept analysis. The resulting concepts are then injected back to the initial FCA problems with the help of 'scaling operators' and formal concept analysis is performed again on these problems, for instance new attributes would be: has-some-members-as-Reserachers-in-AI-of-less-than-25-years-old, or wrote-all-her-papers-with-AI-Researchers. This process is iterated until it reaches a fixed point, i.e. until no more concepts are added.

We use such techniques, in the ANR Elker project, to extract sameAs links in RDF data sets [Atencia 2014, 2019]

A recent trend is to express data mining algorithms as constraint satisfaction problems (CSP) [Bessiere 2017, Guns 2017]. Such problems can then be solved automatically through techniques known as constraint programming. In that context, each solution of the constraint satisfaction problem corresponds to a concept for the corresponding data mining problem. This has been achieved for formal concept analysis whose principles may be expressed in a few constraints.

The goal of this work is to find out if it is possible to express relational concept analysis as a constraint satisfaction problem. For datasets without cycles, the answer should be positive as it simply consists in performing several FCA steps in a row. It however remains to express the mapping between both problems.

More interestingly, relational concept analysis is designed to work when there are cyclic dependencies between data sets and concepts, that a Researcher belongs to a Lab whose members are Researchers. This is what requires fixed point computation. This is not an obstacle to using constraint satisfaction since FCA is already the computation of a fixed point. However, this requires to be cleanly defined.

Hence, the contribution of this work would be to answer positively or negatively the raised question: is it possible or not to express relational concept analysis to constraint satisfaction. There is an option to answer this question partially, e.g. YES for acyclic RCA and NO for cyclic RCA, or to develop specific techniques to precise cases of RCA. The question could be investigated theoretically, by designing a constraint satisfaction problem equivalent to RCA, or practically, by finding encodings for such problems to CSP.

Answering this question, beside providing the opportunity to use constraint programming to perform relational concept analysis, would lead to new questions such as 'Can relational concept analysis be then reduced to formal concept analysis?' and especially provide an encoding for that.

References:

[Atencia 2014] Manuel Atencia, Jérôme David, Jérôme Euzenat, What can FCA do for database link key extraction?, Proc. ECAI workshop on "what can FCA do for AI?", Prague (CK), 2014
[Atencia 2019] Manuel Atencia, Jérôme David, Jérôme Euzenat, Amedeo Napoli, Jérémy Vizzini, Link key candidate extraction with relational concept analysis, Discrete applied mathematics, 2019
[Bessiere 2017] Christian Bessiere, Luc De Raedt, Tias Guns, Lars Kotthoff, Mirco Nanni, Siegfried Nijssen, Barry O'Sullivan, Anastasia Paparrizou, Dino Pedreschi, Helmut Simonis, The Inductive Constraint Programming Loop, IEEE Intelligent systems 32(5):44-52, 2017
[Ganter 1999] Bernhard Ganter, Rudolf Wille, Formal concept analysis: mathematical foundations, Springer, Berlin (DE), 1999
[Guns 2017] Tias Guns, Anton Dries, Siegfried Nijssen, Guido Tack, Luc De Raedt, MiningZinc: A declarative framework for constraint-based mining, Artificial intelligence 244:6-29, 2017
[Hacene 2013a] Mohamed Rouane Hacene, Marianne Huchard, Amedeo Napoli, Petko Valtchev, Relational concept analysis: mining concept lattices from multi-relational data, Annals of Mathematics and Artificial Intelligence

Links:

mOeX web site: https://moex.inria.fr
Elker web site: https://project.inria.fr/elker/

Reference number: Proposal n°2751

Master profile: M2R MOSIG, M2R MSIAM, M2R Informatics.

Advisor: Jérôme Euzenat (Jerome:Euzenat#inria:fr) and Jérôme David (Jerome:David#inria:fr).

Team: The work will be carried out in the mOeX team common to INRIA & Université Grenoble Alpes. mOeX is dedicated to study knowledge evolution through adaptation. It gather permanent researchers from the Exmo team which has taken an active part these past 15 years in the development of the semantic web and more specifically ontology matching.

Laboratory: LIG.

Procedure: Contact us and provide vitæ and possibly motivation letter and references.