Our societies produce knowledge and data at an ever-increasing pace. This knowledge and these data are generated independently by autonomous individuals or companies. They are heterogeneous, and their joint exploitation requires connecting them.
However, data and knowledge have to evolve in the face of changes in what they represent, changes in the context in which they are used, and connections to new data and knowledge sources. These sources are currently mostly maintained by hand. As they grow and become more interconnected, this becomes less sustainable. But if knowledge does not evolve, it will freeze, leading to certain obsolescence.
Beyond the production of knowledge on the semantic web and linked data, this problem applies to any domain in which knowledge is produced in a way usable by computers. For instance, smart cities or the internet of things produce a wealth of changing data. The knowledge about this data has to evolve continuously to remain up-to-date as new data sources are encountered and conditions are changing. Knowledge must evolve organically with the life of its users.
The problem lies in the lack of autonomous evolution of heterogeneous knowledge. No one waits for knowledge to be perfect before using it, and agents and societies cannot be interrupted for upgrading their knowledge. Hence, knowledge has to be situated, i.e. considered with respect to its use (called situation), and evolve continuously, i.e. without interruption.
mOeX addresses the seamless evolution of knowledge representations in individuals and populations. The question at the core of our proposal is to understand how to make knowledge representations evolve continuously in the presence of environment changes and new knowledge sources. Currently, no satisfactory solution to this problem exists.
To tackle this problem, we start from two specific hypotheses:
Based on these hypotheses, we study populations of agents sharing knowledge through interaction. The interactions may be carried out through precisely specified modalities (which may involve direct knowledge exchange, talking, acting together or in presence). After interacting, when they discover that constraints have changed, agents will not relearn knowledge from scratch. Instead, adaptation operators, taking into account the current knowledge and other constraints, will adapt it to the new constraints. We study how knowledge evolves when these populations:
The highly difficult problem is not to have procedures allowing such agents to converge towards a common state of knowledge, but to characterise this state by the properties satisfied by the resulting knowledge. Such properties may, for instance, be:
What is radically new here is that these problems are approached from the standpoint of the resulting knowledge representations. The work of mOeX will contribute to answering the following questions:
Our ambition is to spark a new approach to knowledge evolution that we call cultural knowledge evolution. It designs, studies, and experiments with mechanisms for making knowledge representations serendipitously evolve through their use. This should enable developing and sharing complex knowledge in a more robust way.
Now is the right time to start such a research programme: on the one hand, developments on the semantic web provide us with proven knowledge representation formalisms and tools which have been designed for sharing knowledge; on the other hand, work on experimental cultural evolution provides a solid methodology for carrying out this type of research. This approach has not been applied yet to knowledge representation directly. Both fields are mature enough to be associated.
To investigate the foundations of situated knowledge evolution we need an approach that:
Thus, mOeX will develop a unique combination of knowledge representation and experimental cultural evolution methods. Knowledge representation provides formal models of knowledge; experimental cultural evolution provides a well-defined framework for studying situated evolution. We do not intend to replace symbolic representation, but to complement it.
The reasons why these approaches are well adapted are the following:
Our methodology involves the following three tasks interacting together in a constant feedback:
Finally, in order to ensure the repeatability and reusability of experiments, we aim to develop a software platform to support this approach.
Our cultural knowledge evolution work currently focusses on alignment evolution. Repair experiments have revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies or even create new alignments.
Alignments between ontologies may be established by agents holding such ontologies, as they attempt to communicate and take appropriate actions when communication fails. We have tested this approach on alignment repair, i.e. the improvement of incorrect alignments. For that purpose, we performed a series of experiments in which agents react to mistakes in alignments. Agents may use ontology alignments to communicate when they represent knowledge with different ontologies: alignments help reclassify objects from one ontology to the other. Such alignments may be provided by dedicated algorithms [Da Silva 2017a], but their accuracy is far from satisfactory. Yet agents have to proceed. Agents only know about their ontologies and alignments with others, and they act in a fully decentralised way. They can take advantage of their experience in order to evolve alignments: upon communication failure, they will adapt the alignments to avoid reproducing the same mistake.
Such repair experiments have been performed [Euzenat 2014c] and revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies.
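The interaction game described above can be illustrated with a toy sketch. It is not the experimental platform used in the cited work: the two ontologies, the object space (integers 0 to 7), the function names and the reading of a correspondence as "every instance of the first class is an instance of the second" are all assumptions made for illustration, and only the delete operator is modelled.

```python
import random

# Hypothetical toy ontologies: class name -> set of objects (integers 0..7).
ONTO_A = {"small": {0, 1, 2, 3}, "large": {4, 5, 6, 7}}
ONTO_B = {
    "even": {0, 2, 4, 6},
    "odd": {1, 3, 5, 7},
    "low": {0, 1, 2, 3},
    "high": {4, 5, 6, 7},
}


def classify(ontology, obj):
    """Return the names of the classes of `ontology` that contain `obj`."""
    return {name for name, members in ontology.items() if obj in members}


def play_game(alignment, obj, rng):
    """Play one game on `obj` and adapt `alignment` in place on failure.

    `alignment` is a set of (class_of_A, class_of_B) pairs, read here as
    "every instance of class_of_A is an instance of class_of_B".
    Returns True on success, False on failure (the correspondence used is
    then deleted), and None when no correspondence applies to `obj`.
    """
    classes_a = classify(ONTO_A, obj)
    applicable = sorted(c for c in alignment if c[0] in classes_a)
    if not applicable:
        return None
    used = rng.choice(applicable)
    if obj in ONTO_B[used[1]]:
        return True
    alignment.discard(used)  # "delete" operator: drop the faulty correspondence
    return False


def repair(alignment, rounds=500, seed=0):
    """Play `rounds` games on randomly drawn objects and return the alignment."""
    rng = random.Random(seed)
    for _ in range(rounds):
        play_game(alignment, rng.randrange(8), rng)
    return alignment
```

Starting from an alignment that mixes correct correspondences such as ("small", "low") with incorrect ones such as ("small", "even"), repeated games only ever delete correspondences that have caused a failure, so correct ones are never removed in this toy setting.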
We repeated these experiments and, using new measures, showed that the quality of previous results was underestimated. We introduced new adaptation operators (refine, addjoin and refadd) that improve those previously considered (delete, replace and add). We also allowed agents to go beyond the initial operators in two ways [Euzenat 2017a]: they can generate new correspondences when they discard incorrect ones, and they can provide less precise answers. The combination of these modalities satisfies the following properties:
The results above show 100% precision for all adaptation operators, i.e. all the correspondences in the alignments were correct. However, the alignments were still missing some correspondences and thus did not achieve 100% recall. We had conjectured that this was due to a phenomenon called reverse shadowing [Euzenat 2017a], which prevents agents from finding specific correspondences.
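To make the distinction between precision and recall concrete, here is a minimal sketch of the plain set-based measures, assuming alignments and the reference are given as sets of correspondences. The function name and the convention for empty sets are assumptions, and the measures actually used in the experiments may differ (for instance by taking entailed correspondences into account).

```python
def evaluate(alignment, reference):
    """Return (precision, recall, f_measure) of `alignment` against `reference`.

    Both arguments are collections of hashable correspondences.
    Convention assumed here: an empty alignment has precision 1.0,
    and an empty reference gives recall 1.0.
    """
    alignment, reference = set(alignment), set(reference)
    correct = alignment & reference
    precision = len(correct) / len(alignment) if alignment else 1.0
    recall = len(correct) / len(reference) if reference else 1.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, an alignment containing three of the four correspondences of the reference and nothing else has precision 1.0 but recall 0.75, which is the kind of situation described above.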
We introduced a new adaptation modality, strengthening, to test this hypothesis. The strengthening modality replaces a successful correspondence by one of its subsumed correspondences covering the current instance. This modality is different from those developed so far, because it leads agents to adapt their alignment when the game played has been a success (previously, adaptation always occurred after a failure). We defined three variants of this modality, depending on whether the agent chooses the most general, the most specific or a random such correspondence.
We experimentally showed that it does not interfere with the other modalities, provided that the add operator is used. This means that all properties of the previous adaptation operators are preserved. Moreover, as expected, recall was greatly increased, to the point that some operators achieve 99% F-measure. However, the agents still do not reach 100% recall.
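The three variants of strengthening can be sketched as follows. This is an illustrative reading, not the implementation used in the experiments: it assumes a tree-shaped target ontology given as a hypothetical child-to-parent map, and it interprets a "subsumed correspondence" as one whose target class is a strict subclass of the original target and still contains the current instance.

```python
import random

# Hypothetical target ontology: class -> parent class (None for the root).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}


def depth(cls):
    """Number of edges between `cls` and the root of the hierarchy."""
    d = 0
    while PARENT[cls] is not None:
        cls = PARENT[cls]
        d += 1
    return d


def is_strict_subclass(cls, ancestor):
    """True if `ancestor` is a proper ancestor of `cls` in PARENT."""
    cls = PARENT[cls]
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def strengthen(alignment, used, instance_classes, policy="most_specific", rng=None):
    """After a successful game, replace `used` by a subsumed correspondence.

    `used` is the (source, target) pair that succeeded; `instance_classes`
    is the set of target-ontology classes containing the current instance.
    The alignment is modified in place and the correspondence kept is returned.
    """
    source, target = used
    # Candidate targets: strict subclasses of `target` covering the instance,
    # ordered from most general (shallowest) to most specific (deepest).
    candidates = sorted(
        (c for c in instance_classes if is_strict_subclass(c, target)),
        key=depth,
    )
    if not candidates:
        return used  # nothing more specific covers the instance
    if policy == "most_general":
        chosen = candidates[0]
    elif policy == "most_specific":
        chosen = candidates[-1]
    elif policy == "random":
        chosen = (rng or random).choice(candidates)
    else:
        raise ValueError(f"unknown policy: {policy}")
    alignment.discard(used)
    alignment.add((source, chosen))
    return (source, chosen)
```

With an instance classified as a dog, a successful correspondence ("pet", "animal") would become ("pet", "mammal") under the most general policy and ("pet", "dog") under the most specific one.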
The work on expansion suggests that, with the expansion modality, agents could develop alignments from scratch. We explored the use of expanding repair operators for that purpose. When starting from empty alignments, agents fail to create them as they have nothing to repair. Hence, we introduced the capability for agents to risk adding new correspondences when no existing one is useful [Euzenat 2017b]. We compared and discussed the results provided by this modality and showed that, thanks to this generative capability, agents reach better results than without it in terms of the accuracy of their alignments. When starting with empty alignments, alignments reach the same quality level as when starting with random alignments, thus providing a reliable way for agents to build alignments from scratch through communication. Past an initial phase in which the figures reflect the initial conditions, the evolution curves of both approaches (random and empty alignments) superimpose nearly exactly. This supports a posteriori the experiments with random initialisation. The benchmarks, results and software are available at http://lazylav.gforge.inria.fr.
At this stage, we have developed a workable methodology and tools to investigate cultural knowledge evolution and specifically alignment repair. We have gathered a wide range of adaptation operators and modalities. We can work towards experimenting with how operators and modalities can be selected by agents. We can also consider how they may be linked to 'cultural values' and what consequences this entails.
|Publications on cultural knowledge evolution|
We are continuing our work on link keys for data interlinking in two specific directions:
Link keys can also be thought of as axioms in a description logic. As such, they can contribute to inferring ABox axioms, such as links, or terminological axioms and other link keys. We have extended the tableau method designed for the ALC description logic to support reasoning with link keys in ALC [Gmati 2016a]. We have proven that this extended method is sound and complete, and that it always terminates.
A first method has been designed to extract and select link keys from two classes; it deals with multiple values but not with object values [Atencia 2014b]. Moreover, the extraction step has been rephrased in formal concept analysis (FCA), which makes it possible to generate link keys across relational tables [Atencia 2014d].
We have extended this latter work so that it can deal with multiple object values when the data set is cycle free. This encoding does not necessarily generate the optimal link keys. Hence, we use relational concept analysis (RCA), an extension of FCA taking relations between concepts into account. We show that a new expression of this problem is able to extract the optimal link keys even in the presence of cyclic dependencies. Moreover, the proposed process does not require information about the alignments of the ontologies to find out from which pairs of classes to extract link keys.
We implemented these methods and evaluated them by reproducing the experiments made in previous studies [Vizzini 2017a]. This shows that the method produces the expected results, and that it also exhibits the (equally expected) scalability issues.
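The basic idea of extracting link key candidates from data-valued properties can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the data layout, the function names and the sample data are hypothetical, two property values "agree" when they share at least one value, and neither the selection step nor the concept lattice built by FCA or RCA is reproduced.

```python
from itertools import product

# Hypothetical sample data: instance -> property -> set of values.
DATA_A = {
    "a1": {"name": {"Ada Lovelace"}, "born": {"1815"}},
    "a2": {"name": {"Alan Turing"}, "born": {"1912"}},
}
DATA_B = {
    "b1": {"label": {"Ada Lovelace", "A. Lovelace"}, "year": {"1815"}},
    "b2": {"label": {"Alan Turing"}, "year": {"1912"}},
    "b3": {"label": {"Alan Kay"}, "year": {"1912"}},
}


def agreeing_pairs(desc_a, desc_b):
    """Property pairs (p, q) whose value sets share at least one value."""
    return frozenset(
        (p, q) for p, q in product(desc_a, desc_b) if desc_a[p] & desc_b[q]
    )


def candidate_link_keys(data_a, data_b):
    """Map each candidate (a set of property pairs) to the links it generates.

    A candidate is the full set of property pairs on which some pair of
    instances agrees; it generates a link (a, b) whenever a and b agree on
    every property pair of the candidate.
    """
    per_pair = {
        (a, b): agreeing_pairs(desc_a, desc_b)
        for (a, desc_a), (b, desc_b) in product(data_a.items(), data_b.items())
    }
    candidates = {pairs for pairs in per_pair.values() if pairs}
    return {
        key: {link for link, pairs in per_pair.items() if key <= pairs}
        for key in candidates
    }
```

On the sample data, the candidate made of ("born", "year") alone links a2 to both b2 and b3, whereas the candidate combining ("name", "label") and ("born", "year") links only a1 to b1 and a2 to b2, which illustrates why candidates have to be compared and selected.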
We investigated the use of link keys taking advantage of ontologies. This can be carried out in two different directions: exploiting the ontologies under which data sets are published, and extracting link keys using ontology constructors for combining attribute and class names.
Following the first approach, we extended our existing algorithms to extract link keys involving inverse (
|Publications on data interlinking|
Initial project proposal (2016): pdf
Activity report 2017: pdf, html;
Publications: our paper section (from which references are taken)
mOeX is building on top of the results of the Exmo project whose pages may provide some background information on previous work.