# Research

## Motivation

Our societies produce knowledge and data at an ever-increasing pace. This knowledge and data are generated independently by autonomous individuals or companies. They are heterogeneous, and their joint exploitation requires connecting them.

However, data and knowledge have to evolve in the face of changes in what they represent, in the context in which they are used, and in connections to new data and knowledge sources. These sources are currently mostly maintained by hand. As they grow and become more interconnected, this becomes less sustainable. But if knowledge does not evolve, it will freeze, leading to certain obsolescence [Euzenat 2020a].

Beyond the production of knowledge on the semantic web and linked data, this problem applies to any domain in which knowledge is produced in a form usable by computers. For instance, smart cities or the internet of things produce a wealth of changing data. The knowledge about this data has to evolve continuously to remain up-to-date as new data sources are encountered and conditions change. Knowledge must evolve organically with the life of its users.

This problem lies in the lack of autonomous evolution of heterogeneous knowledge. No one waits for knowledge to be perfect before using it, and agents and societies cannot be interrupted to upgrade their knowledge. Hence, knowledge has to be situated, i.e. considered with respect to its use (called its situation), and to evolve continuously, i.e. without interruption.

## Objectives

mOeX addresses the seamless evolution of knowledge representations in individuals and populations. The question at the core of our proposal is to understand how to make knowledge representation continuously evolve in the presence of environment changes and new knowledge sources. Currently, no satisfactory solution to this problem exists.

To tackle this problem, we start from two specific hypotheses:

1. Knowledge is shaped by both experience and communication,
2. which act as selective pressure.

More precisely, knowledge is subject to internal constraints, imposed by logical coherence, and external constraints, imposed by the environment and communication with others.

*Figure: Abstract context.*

Based on these hypotheses, we study populations of agents sharing knowledge through interaction. Interactions may be carried out through precisely specified modalities (which may involve direct knowledge exchange, talking, acting together, or mere co-presence). When agents discover, after interacting, that constraints have changed, they will not relearn knowledge from scratch. Instead, adaptation operators, taking into account the current knowledge and other constraints, will adapt it to the new constraints. We study how knowledge evolves when these populations:

1. experience changes in the environment in which they operate, or
2. encounter other populations with which they have to communicate.

Our goal is to establish the properties satisfied by the resulting knowledge at the scale of a population of agents. Hence, mOeX aims at establishing the global properties achieved by local operators.

*Figure: Pressure on knowledge and evolution in the face of disruptive events.*

The truly difficult problem is not to devise procedures allowing such agents to converge towards a common state of knowledge, but to characterise this state through the properties satisfied by the resulting knowledge. Such properties may, for instance, be:

• Agents converge to a common knowledge representation (a convergence property).
• Agents converge towards different but compatible (logically consistent) knowledge (a logical epistemic property), or towards closer knowledge (a metric epistemic property).
• Under the threat of a changing environment, agents whose operators preserve diverse knowledge recover faster from changes than those whose operators converge towards a single representation (a differential property under environment change).

Moreover, this has to be guaranteed in the long term, involving both environment changes and population encounters. Hence, it is critical that convergence towards a particular state does not turn into a handicap when the environment changes. This means that a delicate balance has to be found between adapting optimally to the current situation and interlocutors, and remaining evolvable in the long run.

What is radically new here is that these problems are approached from the standpoint of the resulting knowledge representations. mOeX's work will contribute to answering the following questions:

• How can populations with different knowledge (representation) communicate?
• How is their representation influenced by their environment and communication with others?
• How may knowledge diversity be preserved, and is it useful?

In their whole generality, such questions apply to human beings, and possibly animals, as well as to software agents. We propose to study them chiefly in a well-controlled computer science setting.

Our ambition is to spark a new approach to knowledge evolution that we call cultural knowledge evolution. It designs, studies, and experiments with mechanisms for making knowledge representations serendipitously evolve through their use. This should enable developing and sharing complex knowledge in a more robust way.

Now is the right time to start such a research programme: on the one hand, developments on the semantic web provide us with proven knowledge representation formalisms and tools which have been designed for sharing knowledge; on the other hand, work on experimental cultural evolution provides a solid methodology for carrying out this type of research. This approach has not been applied yet to knowledge representation directly. Both fields are mature enough to be associated.

## Approach

To investigate the foundations of situated knowledge evolution we need an approach that:

• is general enough so that results can encompass various cases;
• is flexible enough so that many specific settings may be established;
• provides an explicit representation of knowledge in order to communicate it to others;
• allows for continuous local adaptation depending on the situation;
• allows for both theoretical and experimental work.

No single approach in the state of the art offers all these features.

Thus, mOeX will develop a unique combination of knowledge representation and experimental cultural evolution methods. Knowledge representation provides formal models of knowledge; experimental cultural evolution provides a well-defined framework for studying situated evolution. We do not intend to replace symbolic representation, but to complement it.

The reasons why these approaches are well adapted are the following:

• Agents usually cannot wait for slow processes to terminate: they will apply adaptation operators allowing them to communicate;
• Agents need an explicit representation of knowledge in order to communicate it to others;
• Agents do not need all of their knowledge to be correct before acting;
• Agents do not have a global view of other agents' knowledge: they need a distributed solution with partial knowledge.

Our methodology involves the following three tasks in constant feedback with one another:

• Designing games and knowledge adaptation operators;
• Performing experiments for testing potential properties of such operators;
• Analytically proving such properties or their conditions.

Thus, mOeX adopts a dual theoretical and experimental strategy: determining which adaptation operators provide the expected knowledge properties, and designing actual mechanisms so that results can be assessed experimentally. Both approaches cross-fertilise: theory suggests adequate operators and properties to test; experiments provide results to explain.

*Figure: Methodological approach.*

Finally, in order to ensure the repeatability and reusability of experiments we aim at developing a software platform to support this approach.

## Results

Our cultural knowledge evolution work first applied to alignment evolution before being extended to ontology evolution. Experiments have revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies or even create new alignments and ontologies.

### Cultural alignment repair through evolution experiments

Alignments between ontologies may be established by agents holding such ontologies attempting to communicate and taking appropriate actions when communication fails. We have tested this approach on alignment repair, i.e. the improvement of incorrect alignments. For that purpose, we performed a series of experiments in which agents react to mistakes in alignments. Agents may use ontology alignments to communicate when they represent knowledge with different ontologies: alignments help reclassify objects from one ontology to the other. Such alignments may be provided by dedicated algorithms [Da Silva 2020a], but their accuracy is far from satisfactory. Yet agents have to proceed. Agents only know about their own ontologies and their alignments with others, and they act in a fully decentralised way. They can take advantage of their experience in order to evolve alignments: upon communication failure, they adapt the alignments to avoid reproducing the same mistake.

*Figure: The alignment repair game setting.*

Such repair experiments have been performed [Euzenat 2014c] and revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies.
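As an illustration, one round of such a game can be sketched as follows. This is a minimal, hypothetical setting, not the actual experimental code: ontologies are reduced to classification functions, the alignment to a set of correspondence pairs, and only the simple delete operator is shown; all names are invented.

```python
import random

# Hypothetical sketch of one alignment repair game round. An alignment is
# a set of (source_class, target_class) correspondences; classifiers stand
# in for the agents' ontologies. All names are illustrative.

def play_round(classify_a, classify_b, alignment, instance):
    """Agent A classifies `instance`, translates its class through the
    alignment and asks B; on disagreement the faulty correspondence is
    removed (the 'delete' adaptation operator)."""
    c = classify_a(instance)
    matches = [(s, t) for (s, t) in alignment if s == c]
    if not matches:
        return alignment, None          # nothing to say: no game played
    s, t = random.choice(matches)
    if classify_b(instance) == t:
        return alignment, True          # success: alignment untouched
    return alignment - {(s, t)}, False  # failure: delete the correspondence

# toy run: B classifies 'rex' as 'cat', so (animal, dog) is revealed wrong
align = {("animal", "dog"), ("plant", "tree")}
align, ok = play_round(lambda i: "animal", lambda i: "cat", align, "rex")
```

Iterating such rounds over random instances is what drives the networks of ontologies towards a repaired state in the experiments.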

*Figure: Comparison of different adaptation operators on F-measure (dashed) and success rate (solid).*

#### Expansion and relaxation modalities for cultural alignment repair

We repeated these experiments and, using new measures, showed that the quality of previous results had been underestimated. We introduced new adaptation operators (refine, addjoin and refadd) that improve on those previously considered (delete, replace and add). We also allowed agents to go beyond the initial operators in two ways [Euzenat 2017a]: they can generate new correspondences when they discard incorrect ones, and they can provide less precise answers. The combination of these modalities satisfies the following properties:

1. agents still converge to a state in which no mistake occurs,
2. they achieve results far closer to the correct alignments than previously found,
3. they again reach 100% precision and coherent alignments.
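The relaxation modality, providing less precise answers, can be sketched as follows (a hypothetical fragment: the class hierarchy is reduced to a parent map and all class names are invented):

```python
# Hypothetical sketch of the 'relaxation' modality: when an agent has no
# correspondence for its class, it answers with a superclass instead of
# failing. `parents` encodes a toy subsumption hierarchy.

parents = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}

def relax(cls, alignment):
    """Climb the source hierarchy until some correspondence applies."""
    while cls is not None:
        for (s, t) in alignment:
            if s == cls:
                return (s, t)
        cls = parents.get(cls)
    return None

align = {("mammal", "Säugetier")}
```

Here an agent asked about a poodle, lacking any correspondence for `poodle` or `dog`, would fall back on the less precise `mammal` correspondence rather than fail.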

#### Strengthening modality for cultural alignment repair

The results above show 100% precision for all adaptation operators, i.e. all the correspondences in the alignments were correct; however, some correspondences were still missing, so the agents did not achieve 100% recall. We had conjectured that this was due to a phenomenon called reverse shadowing [Euzenat 2017a], which prevents specific correspondences from being found.

We introduced a new adaptation modality, strengthening, to test this hypothesis. The strengthening modality replaces a successful correspondence by one of its subsumed correspondences covering the current instance. This modality differs from those developed so far because it leads agents to adapt their alignment when the game played has been a success (previously, adaptation always followed a failure). We defined three alternative versions of this modality, depending on whether the agent chooses the most general, the most specific, or a random such correspondence.
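The three versions can be sketched as follows (a hypothetical fragment: the hierarchy, class extensions and names are invented, and the hierarchy is assumed to be a tree so a single branch covers the instance):

```python
import random

# Hypothetical sketch of the 'strengthening' modality: after a *successful*
# game on `instance`, a correspondence (s, t) is replaced by a subsumed one
# (s', t) whose source class s' still covers the instance.

subclasses = {"animal": ["mammal", "bird"], "mammal": ["dog"]}
extension  = {"animal": {"rex", "tweety"}, "mammal": {"rex"},
              "bird": {"tweety"}, "dog": {"rex"}}

def strengthen(corr, instance, variant="most_specific"):
    s, t = corr
    chain, cur = [], s
    while True:  # collect the subsumed classes still covering the instance
        subs = [c for c in subclasses.get(cur, []) if instance in extension[c]]
        if not subs:
            break
        cur = subs[0]            # tree-shaped hierarchy: one branch applies
        chain.append(cur)
    if not chain:
        return corr              # nothing more specific: keep as is
    if variant == "most_specific":
        return (chain[-1], t)
    if variant == "most_general":
        return (chain[0], t)
    return (random.choice(chain), t)
```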

We experimentally showed that it does not interfere with the other modalities as long as the add operator is used. This means that all properties of the previous adaptation operators are preserved. Moreover, as expected, recall was greatly increased, to the point that some operators achieve 99% F-measure. However, the agents still do not reach 100% recall.

#### Starting with empty alignments in cultural alignment repair

The work on expansion suggests that, with the expansion modality, agents could develop alignments from scratch. We explored the use of expanding repair operators for that purpose. When starting from empty alignments, agents fail to create them, as they have nothing to repair. Hence, we introduced the capability for agents to risk adding new correspondences when no existing one is useful [Euzenat 2017b]. We compared and discussed the results provided by this modality and showed that, thanks to this generative capability, agents reach better results in terms of the accuracy of their alignments than without it. When starting with empty alignments, alignments reach the same quality level as when starting with random alignments, thus providing a reliable way for agents to build alignments from scratch through communication. The evolution curves of both approaches (random and empty alignments), past an initial phase whose figures reflect the respective starting conditions, superimpose almost exactly. This confirms, a posteriori, the experiments with random initialisation.

#### Populations

To adopt a population standpoint on experimental cultural evolution, we introduced the concept of population within the experiments. So far, a population is characterised as a set of agents sharing the same ontology. Such agents play the same alignment repair games as before with agents of other populations.

*Figure: Alignment synchronisation by a population.*

The notion of population makes it possible to experiment with different transmission mechanisms: social transmission, in which culture spreads among agents of the same population, and wide transmission, in which it spreads across populations. We implemented explicit social transmission through a synchronisation procedure: at given intervals, agents of the same population exchange their knowledge, i.e. their alignments. Each population builds a consensus, and agents integrate the consensus into their local knowledge. The consensus, which merges the alignments, may be obtained by vote or by preserving the most specific or most general correspondences.
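The vote-based consensus can be sketched as follows (a minimal, illustrative version: correspondences are opaque pairs and the majority rule is one plausible reading of the vote):

```python
from collections import Counter

# Hypothetical sketch of the synchronisation step: each agent of a
# population contributes its alignment, and the 'vote' consensus keeps
# the correspondences held by a majority of agents.

def consensus_by_vote(alignments):
    votes = Counter(c for a in alignments for c in a)
    majority = len(alignments) / 2
    return {c for c, n in votes.items() if n > majority}

# three agents of one population; only ("a", "x") is shared by a majority
agents = [{("a", "x"), ("b", "y")}, {("a", "x")}, {("a", "x"), ("c", "z")}]
```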

Initially, we hypothesised that such knowledge transmission could help agents achieve faster convergence, but the results suggest otherwise. Agents regularly replace their (repaired) local alignment with the (conservative) population consensus, which slows down convergence significantly.

We have proposed three different ways in which agents can be critical towards the population consensus: they adopt the consensus either based on a probability law, on the distance between the consensus and their local alignment, or on the memory that agents keep of discarded correspondences. We tested all three approaches and found that agents can achieve faster convergence while producing alignments of comparable quality.

#### Evolving learned ontologies

So far our experiments in cultural knowledge evolution dealt with adapting alignments. However, agent knowledge is primarily represented in their ontologies which may also be adapted. In order to study ontology evolution, we designed a two-stage experiment in which:

1. agents learn ontologies based on examples of decision making, and
2. then they interactively compare their decisions on different objects and adapt their ontologies when they disagree.

This framework may be instantiated in many ways. We used decision tree induction as the learning method and an approximation of ontology accuracy to guide adaptation.
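The second, interactive stage can be sketched as follows (a deliberately simplified setting: the learned classifiers are stubbed as lookup tables, and a reference decision table stands in for the accuracy approximation; all data is invented):

```python
# Hypothetical sketch of the interactive stage: two agents compare their
# decisions on objects and, on disagreement, the less accurate one adapts
# by adopting the other's decision. `truth` stands in for the accuracy
# approximation guiding adaptation.

def interact(agent_a, agent_b, objects, truth):
    def accuracy(agent):
        return sum(agent[o] == truth[o] for o in objects) / len(objects)
    for o in objects:
        if agent_a[o] != agent_b[o]:
            if accuracy(agent_a) < accuracy(agent_b):
                agent_a[o] = agent_b[o]   # A adapts its decision
            else:
                agent_b[o] = agent_a[o]   # B adapts its decision

objs = ["o1", "o2", "o3"]
truth = {"o1": "+", "o2": "-", "o3": "+"}
a = {"o1": "+", "o2": "-", "o3": "-"}     # 2/3 accurate
b = {"o1": "+", "o2": "-", "o3": "+"}     # fully accurate
interact(a, b, objs, truth)
```

In the actual framework, adaptation modifies the learned decision trees (the ontologies) rather than individual decisions; this sketch only shows the disagreement-driven loop.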

#### Knowledge transmission across agent generations

In cultural evolution, transmission is achieved in different ways [Cavalli-Sforza 1981a]: vertical transmission, in which culture spreads, like genes, from parents to children; oblique transmission, in which it spreads, as in education, from agents of the parents' generation to agents of the next generation; and horizontal transmission, in which it spreads among all members of a population.

Knowledge transmission between agents of the same generation, as described above, is considered horizontal transmission. Other work has shown that variation generated through vertical, or inter-generation, transmission allows agents to exceed the level reached by horizontal transmission alone [Acerbi 2006a]. Such results were obtained under drastic selection of the agents allowed to transmit their knowledge (fewer than 20%) or by introducing artificial noise during transmission.

In order to study the impact of such measures on the quality of transmitted knowledge, we combined the settings of these two previous works and relaxed their assumptions (no strong selection of teachers, no fully correct seed, no artificial noise). In this setting, we confirmed that vertical transmission improves on horizontal transmission even without drastic selection and oriented learning. We also showed that horizontal transmission can compensate for the lack of parent selection if it is maintained for long enough [Bourahla 2022a].

#### Pluripotent agents

The work on cultural knowledge evolution reported above concentrated on agents performing a single task. This is not a natural condition, so we are developing agents able to carry out several tasks and to adapt their knowledge with the same protocol. We studied the impact of having different tasks on the knowledge elaborated by the agents performing them. We showed that agents tackling several tasks are more successful than their single-task counterparts: the higher the number of tasks, the higher the average accuracy achieved by a population of agents.

#### Modelling the alignment repair game in dynamic epistemic logic

The Alignment Repair Game (ARG) has been proposed for agents to simultaneously communicate and repair their alignments through adaptation operators when communication failures occur [Euzenat 2017a]. ARG was evaluated experimentally and the experiments showed that agents converge towards successful communication and improve their alignments. However, the logical properties of such operators, i.e. whether they are formally correct, complete or redundant, could not be established by experiments. We introduced Dynamic Epistemic Ontology Logic (DEOL) to answer these questions. It allows us [van den Berg 2020a, 2021a] (1) to express the ontologies and alignments used, (2) to model the ARG adaptation operators through announcements and conservative upgrades and (3) to formally establish the correctness, partial redundancy and incompleteness of the adaptation operators in ARG.

These results raise interesting issues about how closely a multi-agent process should be modelled logically (one may bring the agents closer to the logic, or the logical model closer to the agents). They also open the perspective of modelling cultural (knowledge) evolution as a whole with dynamic epistemic logics [van den Berg 2021b].

#### Signature awareness in dynamic epistemic logic

In the DEOL modelling, agents are aware of the vocabulary that the other agents may use (we call this Public signature awareness). However, assuming that agents are fully aware of each other's signatures prevents them from adapting their vocabularies to newly gained information, whether obtained from the environment or learned through agent communication. This is therefore not realistic for open multi-agent systems. We have proposed a novel way to model awareness with partial valuations [van den Berg 2020b]. Partial Dynamic Epistemic Logic allows agents to use their own vocabularies to reason and talk about the world. These vocabularies may be extended through a new modality for raising awareness. We gave a first account of the dynamics of raising awareness in this framework. We also started investigating an associated forgetting operator [van den Berg 2020d].

#### Cultural knowledge evolution as replicators and interactors

Cultural evolution may be studied at a 'macro' level, inspired by population dynamics, or at a 'micro' level, inspired by genetics. The replicator-interactor model generalises the genotype-phenotype distinction of genetic evolution. We considered how it can be applied to cultural knowledge evolution experiments [Euzenat 2019a]. More specifically, we consider knowledge as the replicator and the behaviour it induces as the interactor. We showed that this requires addressing problems concerning transmission. We discussed the introduction of horizontal transmission within the replicator-interactor model and/or of differential reproduction within cultural evolution experiments.

*Figure: Comparison of the replicator-interactor scheme (left) and the architecture of adaptive agents (right).*

We are pursuing our work on link keys for data interlinking [Atencia 2021a] in two specific directions:

• Link key extraction in relation with formal concept analysis.

*Figure: Link key processing workflow involving extraction and reasoning.*

#### Tableau method for $\mathcal{ALC}$+LK reasoning

Link keys can also be thought of as axioms in a description logic. As such, they can contribute to inferring ABox axioms, such as links, terminological axioms, or other link keys. This has important practical applications, such as link key inference, link key consistency and link key redundancy checking. Yet, no reasoning support existed for link keys.
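For illustration, a link key axiom may be written as follows (notation approximated from the link key literature, not quoted verbatim from our papers):

```latex
% a link key over property pairs (p_i, q_i) and classes C, D
\{\langle p_1, q_1\rangle, \ldots, \langle p_k, q_k\rangle\}~\mathtt{linkkey}~\langle C, D\rangle
% intended semantics: instances of C and D that share values on every
% paired property denote the same individual
\forall x \in C,\ \forall y \in D:\
  \Big(\bigwedge_{i=1}^{k} p_i(x) \cap q_i(y) \neq \emptyset\Big) \rightarrow x = y
```

Read as an axiom, such a statement interacts with TBox and ABox reasoning: it can entail sameAs links, and conversely be shown inconsistent or redundant with respect to the ontologies.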

We previously extended the tableau method designed for the $\mathcal{ALC}$ description logic to support reasoning with link keys in $\mathcal{ALC}$ [Gmati 2016a]. We showed how this extension enables combining link keys with classical terminological reasoning, with and without ABox and TBox, and generating non-trivial link keys. We further extended the method and proved that the extended method terminates, is sound and complete, and that its complexity is 2ExpTime [Atencia 2019d].

We have also designed a worst-case optimal tableau algorithm, based on compressed tableaux and directed by rule application, for reasoning in the description logic $\mathcal{ALC}$ with individuals and link keys. This algorithm is sound and complete, and has exponential complexity. A proof of concept of this algorithm has been implemented and evaluated. These evaluations show the importance of reasoning for the task of data interlinking. In particular, the reasoner can be used during link key extraction to discard link keys introducing inconsistencies.

#### Link key extraction with relational concept analysis

A first method was designed to extract and select link keys between two classes; it deals with multiple values but not with object values [Atencia 2014b]. Moreover, the extraction step was rephrased in formal concept analysis (FCA), allowing link keys to be generated across relational tables [Atencia 2014d].
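The FCA step can be sketched on a toy example (candidate links and property pairs are invented, and the naive enumeration below is exponential; it is only meant to show the closure at work):

```python
from itertools import combinations

# Minimal sketch of the FCA view of link key extraction: objects of the
# formal context are candidate links, attributes are the property pairs
# on which the linked resources agree; concepts pair maximal sets of
# links with the maximal sets of property pairs they share.

context = {                       # candidate link -> agreeing property pairs
    "l1": {("name", "nom"), ("birth", "naissance")},
    "l2": {("name", "nom"), ("birth", "naissance")},
    "l3": {("name", "nom")},
}

def concepts(ctx):
    all_attrs = set.union(*ctx.values())
    found = set()
    for r in range(len(ctx) + 1):
        for group in combinations(ctx, r):
            # property pairs shared by all links of the group
            intent = set.intersection(*(ctx[o] for o in group)) if group else all_attrs
            # all links having that intent, then re-close the intent
            extent = {o for o in ctx if intent <= ctx[o]}
            closed = set.intersection(*(ctx[o] for o in extent)) if extent else all_attrs
            found.add((frozenset(extent), frozenset(closed)))
    return found
```

Each concept corresponds to a link key candidate: its intent is the set of property pairs, and its extent the set of links the candidate generates.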

*Figure: Two concept lattices corresponding to circularly dependent link keys (generated by Linkky).*

We have extended this latter work so that it can deal with multiple object values when the data set is cycle-free. However, this encoding does not necessarily generate the optimal link keys. Hence, we use relational concept analysis (RCA), an extension of FCA taking relations between concepts into account [Atencia 2019z]. We showed that a new expression of this problem extracts the optimal link keys even in the presence of cyclic dependencies. Moreover, the proposed process does not require information about ontology alignments to determine from which pairs of classes to extract link keys.

We implemented these methods and evaluated them by reproducing the experiments of previous studies [Vizzini 2017a]. This shows that the method extracts the expected link keys, but also exhibits the (equally expected) scalability issues.

#### Fixed-point semantics for a reduced version of relational concept analysis

We have used relational concept analysis (RCA) to extract link keys. This led us to notice that, when there exist circular dependencies between objects, RCA extracts a unique stable concept lattice family grounded in the initial formal contexts. However, other stable families may exist whose structure depends on the same relational context. These may be useful in applications that need to extract a richer structure than the minimal grounded one.

We first considered this issue in a reduced version of RCA that only retains the relational structure. We redefined the semantics of RCA in terms of concept lattice families closed under a fixed-point operation induced by this relational structure. We showed that these families admit a least and a greatest fixed point, and that the well-grounded RCA semantics is characterised by the least fixed point [Euzenat 2021a]. We then characterised the interesting lattices as the self-supported fixed points. We provided an algorithm to compute the greatest fixed point (dual to the RCA algorithm) and discussed strategies to extract all self-supported fixed points [Atencia 2021b].
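The least/greatest fixed-point distinction can be illustrated with a generic Kleene iteration (the operator `f` below is a toy monotone function on a powerset lattice, not the RCA operation itself):

```python
# Illustrative Kleene iteration: on a finite lattice, iterating a monotone
# function from the bottom element reaches the least fixed point, and
# iterating from the top element reaches the greatest one. The toy `f`
# stands in for the closure induced by the relational structure.

def fixed_point(f, start):
    cur = start
    while f(cur) != cur:
        cur = f(cur)
    return cur

universe = frozenset(range(5))
# monotone toy operator: always includes 0 and propagates successors below 3
f = lambda s: frozenset(s | {0} | {x + 1 for x in s if x + 1 < 3})

least = fixed_point(f, frozenset())      # iterate up from the bottom
greatest = fixed_point(f, universe)      # iterate down from the top
```

The two results differ ({0, 1, 2} versus the whole universe), mirroring the fact that the grounded RCA family is only the least of several stable families.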

#### Link key extraction with pattern structures

We also used pattern structures, an extension of formal concept analysis (FCA) to ordered structures, to reformulate the simple link key extraction problem [Abbas 2019a]. The strategies used to automatically discover link keys cannot select the pair of classes to which a link key candidate applies; indeed, a link key may be relevant for one pair of classes and irrelevant for another. Moreover, discovering link keys for one pair of classes at a time may be computationally expensive if all pairs have to be considered. To overcome these limitations, we introduced a specific and original pattern structure in which link keys can be discovered in one pass while specifying the pair of classes associated with each link key, focusing the discovery process and allowing more flexibility [Abbas 2020a]. This approach has been implemented in the linkex tool.

#### Strategies for identifying high-quality link keys

We also introduced the notion of non-redundant link key candidate [Abbas 2021b]. A link key is extensionally redundant with another when the closure of the sameAs relation over the links it induces is the same as for the other.
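Extensional redundancy can be sketched as follows (a hypothetical check: two link sets are compared through the equivalence classes induced by their sameAs closure; the links are invented):

```python
# Hypothetical sketch of extensional redundancy: two link key candidates
# are redundant when the equivalence closure (sameAs is reflexive,
# symmetric and transitive) of the links they induce coincides.

def closure(links):
    """Equivalence classes induced by a set of sameAs links (union-find)."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for a, b in links:
        parent[find(a)] = find(b)           # union the two classes
    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(c) for c in classes.values())

def redundant(k1_links, k2_links):
    return closure(k1_links) == closure(k2_links)
```

For instance, links {a=b, b=c} and {a=c, b=c} differ as sets but induce the same equivalence class {a, b, c}, so the corresponding candidates would be extensionally redundant.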

#### Link key extraction under ontological constraints

We investigated the use of link keys taking advantage of ontologies. This can be carried out in two different directions: exploiting the ontologies under which data sets are published, and extracting link keys using ontology constructors for combining attribute and class names.

Following the first approach, we extended our existing algorithms to extract link keys involving inverse (⁻¹), union (⊔), intersection (⊓) and paths (·) of properties. This helps provide link keys when it is not possible otherwise (without inverse, for instance, there is no possible correspondence if one data set uses parents and the other uses children). We showed how paths can be normalised to reduce the search space. Extracting link keys under these conditions required introducing better indexing techniques to avoid unnecessary link key generation and even looping.

For certain data sets, it may be necessary to use several link keys, even on the same pair of classes, to retrieve a more complete link set. We introduced operators to combine link keys over the same pair of classes, investigated their relations, and extended quality measures to evaluate them.

We specifically proposed strategies to extract disjunctions from RDF data and applied existing quality measures to evaluate them. We experimented with these strategies, showing their benefits [Atencia 2019c].