There are many ways to assess diversity in a population. The aim is to implement a normalised entropy-based measure that takes into account the similarity between categories.

The diversity of many phenomena is worth measuring because diversity is often related to fairness and sometimes to resilience. It is also useful to measure how our actions increase or decrease diversity. We are interested in knowledge diversity: the fact that different people may hold different standpoints [1]. However, the considerations below apply to many other diversity measures.

Measuring diversity amounts to associating a number with the knowledge diversity of a population of agents. We only consider agent populations of the same size whose individuals are distributed among different categories. Diversity may be considered along three different dimensions:

- Variety: how many categories of individuals are represented? (insects, molluscs, vertebrates vs. insects, molluscs)
- Balance: how many representatives of each category are there? (1 vertebrate, 99 insects vs. 50 vertebrates, 50 insects)
- Disparity: how different are these categories? (33 ants, 33 bugs, 33 grasshoppers vs. 33 rabbits, 33 ants, 33 snakes)

We implemented families of such measures in Python [3], and they provide the expected results.
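To make the three dimensions concrete, here is a minimal sketch of one such family, the similarity-sensitive diversity of order *q* studied in [2]. It is not the code of the notebook [3]: the function name `diversity`, its signature, and the similarity values below are assumptions chosen for illustration only.

```python
import numpy as np

def diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (illustrative sketch).

    p: relative abundances of the categories (non-negative, summing to 1).
    Z: similarity matrix with Z[i, j] in [0, 1] and Z[i, i] = 1.
    q: order of the measure (0 <= q <= infinity).
    """
    p = np.asarray(p, dtype=float)
    Z = np.asarray(Z, dtype=float)
    support = p > 0                 # absent categories do not contribute
    zp = (Z @ p)[support]           # "ordinariness" of each present category
    ps = p[support]
    if np.isclose(q, 1.0):          # limit case q -> 1 (Shannon-like)
        return float(np.exp(-np.sum(ps * np.log(zp))))
    if np.isinf(q):                 # limit case q -> infinity
        return float(1.0 / np.max(zp))
    return float(np.sum(ps * zp ** (q - 1)) ** (1.0 / (1.0 - q)))

# Balance: with the identity matrix (no similarity), 50/50 beats 1/99.
balanced = diversity([0.5, 0.5], np.eye(2), q=1)
unbalanced = diversity([0.01, 0.99], np.eye(2), q=1)

# Disparity: same abundances, but similar categories count for less.
# The similarity values are made up for the example.
uniform = [1 / 3, 1 / 3, 1 / 3]
Z_similar = np.array([[1.0, 0.8, 0.8], [0.8, 1.0, 0.8], [0.8, 0.8, 1.0]])
Z_distinct = np.array([[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]])
print(balanced, unbalanced)
print(diversity(uniform, Z_similar, q=2), diversity(uniform, Z_distinct, q=2))
```

With the identity matrix, order 0 simply counts the categories present (variety), while higher orders give more weight to balance.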

In principle, it is useful to normalise such measures, i.e. to scale them to the [0, 1] interval.
This can be easily achieved by dividing all diversity values by their maximum value.
The problem is to determine this maximum value.
With measures that do not take similarity into account, this is easy: maximal diversity is obtained when categories are equally represented, i.e. the distribution is equiprobable.
However, as can be observed in the notebook [3], with similarity-based entropy measures the equiprobable distribution does not necessarily yield the maximum.
Leinster [2, Section 6.3] discusses this and presents a useful theorem: this maximum is nevertheless independent of *q* (so it has to be computed only once).
A sketch of an algorithm is also provided.
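As a starting point, the following is a naive sketch of how such a computation might look. It relies on our reading of the result in [2]: for a symmetric similarity matrix *Z*, the maximum is the largest value of the sum of *w* over the non-empty subsets *B* of categories for which the restricted system *Z_B w = 1* has a non-negative solution *w*. The name `max_diversity`, the tolerance, and the example matrix are assumptions. Skipping singular submatrices is a simplification, and enumerating all subsets takes exponential time, so this only suits a small number of categories.

```python
from itertools import combinations

import numpy as np

def max_diversity(Z, tol=1e-12):
    """Brute-force search for the maximum diversity (illustrative sketch).

    Z: symmetric similarity matrix with positive diagonal.
    Returns (maximum value, one maximising distribution).
    """
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    best_value, best_p = 0.0, None
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            idx = list(subset)
            Z_B = Z[np.ix_(idx, idx)]          # restriction of Z to B
            try:
                w = np.linalg.solve(Z_B, np.ones(size))
            except np.linalg.LinAlgError:
                continue                        # singular: skipped (simplification)
            if np.any(w < -tol):
                continue                        # weighting is not non-negative
            total = w.sum()
            if total > best_value:
                p = np.zeros(n)
                p[idx] = w / total              # normalise weights to a distribution
                best_value, best_p = total, p
    return best_value, best_p

# Made-up example: two very similar categories and one distinct category.
Z_example = np.array([[1.0, 0.9, 0.1],
                      [0.9, 1.0, 0.1],
                      [0.1, 0.1, 1.0]])
value, p_max = max_diversity(Z_example)
print(value, p_max)
```

On this example the maximising distribution is not equiprobable: it gives more weight to the distinct category (17/35) than to each of the two similar ones (9/35), for a maximum of 175/94, roughly 1.862. A normalised measure would then divide any diversity value computed with `Z_example` by this maximum.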

The goal of the project is to implement the proposed algorithm in order to compute normalised measures.

This requires some knowledge of Python programming and linear algebra.

The work could be expected to unfold as follows:

- Studying Leinster's algorithm to understand what is needed.
- Designing a possible implementation.
- Implementing it.
- Testing it.

**References:**

[1] Yasser Bourahla, Jérôme David, Jérôme Euzenat, Meryem Naciri, Measuring and controlling knowledge diversity, Proc. 1st JOWO workshop on formal models of knowledge diversity (FMKD), Jönköping (SE), 2022 https://moex.inria.fr/files/papers/bourahla2022c.pdf

[2] Tom Leinster, Entropy and diversity: the axiomatic approach, Cambridge University Press, Cambridge (UK), 2021 https://arxiv.org/pdf/2012.02113.pdf

**Links:**

[3] Knowledge diversity notebook: https://moex.inria.fr/software/kdiv/index.html

[4] mOeX web site: https://moex.inria.fr

**Advisor:**
Jérôme Euzenat (Jerome:Euzenat#inria:fr).

**Team:** The work will be carried out in the mOeX team common to
INRIA & Université Grenoble Alpes.
mOeX is dedicated to studying knowledge evolution through adaptation.

**Laboratory:**
LIG.

**Place of work:** The position is located
at INRIA Grenoble Rhône-Alpes, Montbonnot
(near Grenoble, France), a major computer science research lab, in a
stimulating research environment.

**Procedure:**
Contact us and provide a vitæ and possibly a motivation letter and references.