Ontologies for research processes

PhD position / Thesis topic

The description of experimental and simulation settings is key to the interpretation and reproducibility of scientific results. However, these settings are not currently described in a way that makes them automatically exploitable. We aim to define representations of scientific processes that enable their querying, analysis, comparison and reproduction.

Research reliability relies partly on data recording and communication. Although data is important, it is no less important to record and publish the processes that led to the production of this data. The data may be collected through confirmatory experiments, simulations or evaluations. To be useful, process descriptions must refer to many facets of the process, such as hypotheses, code and model, parameters, and the measures collected.

Recording such processes in a relatively formal way brings many opportunities:

Reproducibility: automatic process rerun and data re-analysis;
Repurposability: production of new processes by modifying the description [Werner et al., 2024];
Presentation: automatic generation of process reports;
Collection: aggregating experiment descriptions for retrieving, querying and comparing them [Euzenat, 2022]. Ideally, it will be possible to generate a meta-analysis on a specific topic from a set of descriptions.
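
As a minimal sketch of what such a relatively formal record could offer, the Python fragment below stores the facets named earlier (hypotheses, code, model, parameters, measures collected) in a plain data structure and derives a variant, a report and a selection from it. The class `ProcessDescription`, its field names, the helper functions and the example values are illustrative assumptions, not a format defined by this proposal; the thesis targets ontology-based descriptions rather than ad hoc classes.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ProcessDescription:
    """Hypothetical record of one research process (all names are assumptions)."""
    identifier: str
    hypotheses: tuple
    code: str                      # reference to the code that was run
    model: str                     # name of the simulated or learned model
    parameters: dict = field(default_factory=dict)
    measures: tuple = ()           # names of the measures collected


def repurpose(description, identifier, **changed_parameters):
    """Repurposability: derive a new description by modifying parameters."""
    merged = {**description.parameters, **changed_parameters}
    return replace(description, identifier=identifier, parameters=merged)


def report(description):
    """Presentation: generate a short plain-text report from a description."""
    lines = [f"Process {description.identifier} (model: {description.model})"]
    lines += [f"  hypothesis: {h}" for h in description.hypotheses]
    lines += [f"  parameter {k} = {v}"
              for k, v in sorted(description.parameters.items())]
    lines += [f"  measure: {m}" for m in description.measures]
    return "\n".join(lines)


def select(descriptions, measure):
    """Collection: retrieve the descriptions that collect a given measure."""
    return [d for d in descriptions if measure in d.measures]


# Made-up example values, for illustration only.
base = ProcessDescription(
    identifier="exp-1",
    hypotheses=("more agents increase congestion",),
    code="simulate.py",
    model="toy-traffic",
    parameters={"agents": 100, "steps": 50},
    measures=("mean_travel_time",),
)
variant = repurpose(base, "exp-2", agents=200)
print(report(variant))
print([d.identifier for d in select([base, variant], "mean_travel_time")])
```

The original description is left unchanged when a variant is derived, so both can be kept in a collection and compared later.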

This contributes to the objective of making research data Findable, Accessible, Interoperable and Reusable, i.e. FAIR [Wilkinson et al., 2016].

We aim to develop formal descriptions of research processes that enable this. The goal of this thesis proposal is to design, develop and evaluate descriptions expressed with relevant 'semantic' technologies.

Defining ontologies for research process description, building on existing generic ontologies and on field ontologies, should help answer queries over these descriptions.

For that purpose, it will be necessary to identify generic experiment life cycles, based on design, execution and analysis, each corresponding to a stage of an experiment, and to leverage semantic technologies (RDF, OWL, SPARQL). Semantic models (ontologies) that match a general notion of experiment will have to be designed. They can take inspiration from, and build on, existing generic ontologies:

frbr: for documenting works of thought;
prov: for recording the provenance of resources;
researchobjects: for describing research artefacts;
i-adopt: for describing scientific variables;
etc.

and existing efforts:

ontologies for representing experiment protocols [Giraldo et al., 2017];
ODD for describing agent-based simulations [Grimm et al., 2020];
the COMSES computational model library [Rollins et al., 2014];
etc.
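
To make the design, execution and analysis stages concrete, here is a minimal sketch that records one experiment life cycle as RDF-like triples and queries it. It uses only the Python standard library as a stand-in: an actual system would rely on an RDF store and SPARQL. The terms `prov:Activity`, `prov:used` and `prov:wasGeneratedBy` come from the W3C PROV-O vocabulary; the `http://example.org/` resources and the `match` helper are hypothetical and exist only for this illustration.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"   # hypothetical namespace for this sketch

# Each triple is (subject, predicate, object), as in RDF.
triples = {
    (EX + "design-1", RDF_TYPE, PROV + "Activity"),
    (EX + "run-1", RDF_TYPE, PROV + "Activity"),
    (EX + "analysis-1", RDF_TYPE, PROV + "Activity"),
    (EX + "protocol-1", PROV + "wasGeneratedBy", EX + "design-1"),
    (EX + "run-1", PROV + "used", EX + "protocol-1"),
    (EX + "results-1", PROV + "wasGeneratedBy", EX + "run-1"),
    (EX + "analysis-1", PROV + "used", EX + "results-1"),
    (EX + "report-1", PROV + "wasGeneratedBy", EX + "analysis-1"),
}


def match(pattern, graph, binding=None):
    """Yield variable bindings for which all triple patterns hold in graph.

    Variables are strings starting with '?', loosely mimicking a SPARQL
    basic graph pattern; this is not a SPARQL implementation.
    """
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, graph, candidate)


# Which artefacts were generated by an activity that used results-1?
pattern = [
    ("?activity", PROV + "used", EX + "results-1"),
    ("?output", PROV + "wasGeneratedBy", "?activity"),
]
outputs = sorted(b["?output"] for b in match(pattern, triples))
print(outputs)
```

Chaining `used` and `wasGeneratedBy` in this way is what allows a result to be traced back through the stages that produced it.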

Simulation of various (human) processes in a territory digital twin will constitute a central use case for the project. Digital twins may be used to measure or to simulate phenomena occurring in the actual artifact; both are called experiments here. Such experiments must be properly indexed to guarantee their highest usability, i.e. so that they can be retrieved on various criteria. In the specific case of digital twins, these descriptions can also be used to compare the effects of simulations to what happens in the actual process.
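
The two needs stated above, retrieval on various criteria and comparison between simulation and the actual process, can be sketched as follows. This is an illustration under stated assumptions: the criteria names (`kind`, `phenomenon`, `zone`), the experiment identifiers and the numeric series are made up, and mean absolute error is only one possible way of comparing a simulated series with measurements.

```python
from collections import defaultdict


def build_index(experiments):
    """Map each (criterion, value) pair to the ids of matching experiments."""
    index = defaultdict(set)
    for exp_id, criteria in experiments.items():
        for criterion, value in criteria.items():
            index[(criterion, value)].add(exp_id)
    return index


def retrieve(index, **criteria):
    """Return the ids of experiments satisfying every given criterion."""
    sets = [index.get(item, set()) for item in criteria.items()]
    return set.intersection(*sets) if sets else set()


def mean_absolute_error(simulated, observed):
    """Compare a simulated series with measurements of the actual process."""
    if not simulated or len(simulated) != len(observed):
        raise ValueError("series must be non-empty and of the same length")
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(simulated)


# Hypothetical experiment descriptions, reduced to a few criteria.
experiments = {
    "sim-1": {"kind": "simulation", "phenomenon": "traffic", "zone": "north"},
    "sim-2": {"kind": "simulation", "phenomenon": "traffic", "zone": "south"},
    "obs-1": {"kind": "measurement", "phenomenon": "traffic", "zone": "north"},
}
index = build_index(experiments)
print(sorted(retrieve(index, phenomenon="traffic", zone="north")))
print(mean_absolute_error([10.0, 12.0, 15.0], [11.0, 12.0, 13.0]))
```

Retrieving a simulation and a measurement of the same phenomenon in the same zone is the step that makes the comparison possible in the first place.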

However, the technologies developed should be sufficiently general to apply to other use cases. We plan to consider other fields, such as machine learning for weather forecasting and experiments in cultural evolution [Bourahla et al., 2021].

The proposal relies on ontological modelling but it would benefit from epistemological thinking about the nature and role of scientific processes.

References:

[Bourahla et al., 2021] Yasser Bourahla, Manuel Atencia, Jérôme Euzenat, Knowledge improvement and diversity under interaction-driven adaptation of learned ontologies, Proc. 20th AAMAS, London (UK), pp. 242-250, 2021 https://moex.inria.fr/files/papers/bourahla2021a.pdf
[Euzenat, 2022] Jérôme Euzenat, Beyond reproduction, experiments want to be understood, in: Proc. 2nd workshop on Scientific knowledge: representation, discovery, and assessment (SciK), Lyon (FR), pp. 774-778, 2022 https://moex.inria.fr/files/papers/euzenat2022a.pdf
[Giraldo et al., 2017] Olga Giraldo, Alexander Garcia, Federico Lopez, Oscar Corcho, Using semantics for representing experimental protocols, Journal of biomedical semantics 8:52, 2017 https://doi.org/10.1186/s13326-017-0160-y
[Grimm et al., 2020] Volker Grimm, Steven Railsback, Christian Vincenot, Uta Berger, Cara Gallagher, Donald DeAngelis, Bruce Edmonds, Jiaqi Ge, Jarl Giske, Jürgen Groeneveld, Alice Johnston, Alexander Milles, Jacob Nabe-Nielsen, J. Gareth Polhill, Viktoriia Radchuk, Marie-Sophie Rohwäder, Richard A. Stillman, Jan C. Thiele, Daniel Ayllón, The ODD Protocol for Describing Agent-Based and Other Simulation Models: A Second Update to Improve Clarity, Replication, and Structural Realism, Journal of artificial societies and social simulation 23(2):7, 2020 https://www.jasss.org/23/2/7.html
[Rollins et al., 2014] Nathan Rollins, Michael Barton, Sean Bergin, Marco Janssen, Allen Lee, A Computational Model Library for publishing model documentation and code, Environmental modelling and software 61.4:59–64, 2014 https://doi.org/10.1016/j.envsoft.2014.06.022
[Werner et al., 2024] Luisa Werner, Pierre Genevès, Nabil Layaïda, Jérôme Euzenat, Damien Graux, Reproduce, replicate, reevaluate: the long but safe way to extend machine learning methods, in: Proc. 38th AAAI Conference on Artificial Intelligence (AAAI), Vancouver (CA), pp. 15850-15858, 2024 https://moex.inria.fr/files/papers/werner2024a.pdf
[Wilkinson et al., 2016] Mark Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip Bourne, Jildau Bouwman, Anthony Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair Gray, Paul Groth, Carole Goble, Jeffrey Grethe, Jaap Heringa, Peter A.C. ’t Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott Lusher, Maryann Martone, Albert Mons, Abel Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, Barend Mons, The FAIR Guiding Principles for scientific data management and stewardship, Scientific data 3:160018, 2016 https://doi.org/10.1038/sdata.2016.18

Links:

mOeX web site: https://moex.inria.fr
Experiment repository: https://sake.re

Qualification: Master or equivalent in computer science.

Expected skills:

Curiosity and openness.
Ability to interact with other researchers.
Autonomy in research.
Interest in epistemology or the methodology of science.
Inventiveness.

Doctoral school: MSTII, Université Grenoble Alpes.

Advisors: The thesis will be supervised by Cássia Trojahn dos Santos (Cassia:Trojahn-dos-Santos#univ-grenoble-alpes.fr) and Maxime Collomb (maxime:colomb#ign:fr, LASTIG).

Group: The work will be carried out in the mOeX team, common to INRIA and LIG. mOeX is dedicated to studying knowledge evolution through adaptation. It gathers researchers who have taken an active part, over the past 15 years, in the development of the semantic web, and more specifically of ontology matching and data interlinking.

Place of work: The position is located at INRIA Montbonnot (near Grenoble), a major computer science research lab, in a stimulating research environment.

Hiring date: October 2026.

Funding and employer: The position is funded by the JUNN project. The employer will be INRIA; the candidate will be subject to ZRR clearance.

Duration: 36 months

Deadline: as soon as possible.

Contact: For further information, contact us.

Procedure: Contact us and apply to offer 2026-10110. See also here.

File: Provide a curriculum vitæ, a motivation letter and references. A Master's report is very welcome if you can provide one. We will ask for your Master's marks, so please attach them if you already have them.