Jérôme Euzenat, Yasser Bourahla, 08/2022
This notebook contains code and results for the paper 'Measuring and controlling diversity'.
If you see this notebook as a simple HTML page, then it has been generated by the notebook found in this archive.
This is not a maintained software package, just a notebook. If you want to use it, feel free to do so under the MIT License.
Here are the 7 distributions of the paper (a,b,c,d,e,f,g) of 5 ontologies (A, B, C, D, E) among 10 agents. They are encoded as arrays.
We provide three extra distributions (h, i, j) for experimentation.
The distances between the 5 ontologies are coded into arrays.
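As an illustration of this kind of encoding only, a distribution can be written as a list of agent counts per ontology, and the distances as a square matrix. The names `ONTOLOGIES`, `uniform`, `unstructured` and `check_inputs` are hypothetical, and the values are placeholders rather than the distributions or distances of the paper (only `[2,2,2,2,2]` appears later in this notebook).

```python
# Hypothetical encoding sketch: names and values are illustrative assumptions,
# not the data of the paper.
ONTOLOGIES = ["A", "B", "C", "D", "E"]

# A distribution gives, for each ontology, the number of agents using it.
uniform = [2, 2, 2, 2, 2]  # 10 agents spread evenly over the 5 ontologies

# Unstructured distance: 0 between an ontology and itself, 1 otherwise.
unstructured = [[0.0 if i == j else 1.0 for j in range(5)] for i in range(5)]


def check_inputs(distrib, dissimilarity, agents=10):
    """Return True if the distribution and the distance matrix fit together."""
    n = len(distrib)
    if sum(distrib) != agents or any(c < 0 for c in distrib):
        return False
    if len(dissimilarity) != n or any(len(row) != n for row in dissimilarity):
        return False
    # A distance matrix is symmetric and has zeros on its diagonal.
    return all(
        dissimilarity[i][i] == 0 and dissimilarity[i][j] == dissimilarity[j][i]
        for i in range(n)
        for j in range(n)
    )
```

Calling `check_inputs(uniform, unstructured)` is a cheap sanity check before feeding both arrays to a diversity measure.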
So there is no programmatic connection between knowledge distance and diversity: the distances are given as data, not computed from the ontologies.
Such measures may be found in:
The code for computing various diversity measures is provided here. It could be made a separate Python library if needed, but this is not yet the case.
They implement a signature: diversity( distrib, dissimilarity ): float
These are:
- `structdist`: computes the average distance between the categories of the distribution;
- `calcdiam`: computes the diameter of the distribution;
- `median`: computes the median of the distribution.

The entropy-based diversity measures are provided in two flavours:

- `entropy` (additional parameter `q`): computes the generalised entropy-based diversity measure. This is the initial naïve version;
- `diversity` (additional parameter `q`): a better-implemented version of the entropy-based diversity measure, which also includes the limit case $q=1$.

The normalised versions are included but must be used with care, as they are only correct if the maximal value is reached by equi-distributed distributions (which is not necessarily the case).
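As a rough sketch of what an entropy-based measure with this signature may look like, the following assumes the similarity-sensitive diversity of Leinster and Cobbold, with the similarity taken as $1 - d$ for distances $d \in [0,1]$. The function name `similarity_diversity` and the choice of similarity are assumptions made for illustration; this is not necessarily how the notebook's `diversity` is written.

```python
import math


def similarity_diversity(distrib, dissimilarity, q=2.0):
    """Similarity-sensitive diversity of order q (illustrative sketch).

    Assumes distances in [0, 1] with a zero diagonal, and similarity = 1 - distance.
    """
    total = sum(distrib)
    p = [count / total for count in distrib]
    n = len(p)
    # "Ordinariness" of category i: expected similarity between i and the
    # category of an agent drawn at random.
    zp = [
        sum((1.0 - dissimilarity[i][j]) * p[j] for j in range(n))
        for i in range(n)
    ]
    # Only categories actually used by some agent contribute.
    support = [i for i in range(n) if p[i] > 0]
    if math.isclose(q, 1.0):
        # Limit case q = 1, where the general formula is undefined.
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in support))
    return sum(p[i] * zp[i] ** (q - 1.0) for i in support) ** (1.0 / (1.0 - q))
```

With the unstructured distance (0 on the diagonal, 1 elsewhere), this reduces to the usual Hill numbers, so an even distribution over 5 ontologies has diversity 5 for every order, and a distribution concentrated on a single ontology has diversity 1.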
Finally the results to be found in Table 2 of the paper are gathered here.
These results include, in addition to those submitted:
Here is an attempt to induce a partial order from the orders given by the diversity measures.
The algorithm is quite simple:
Note: Tom Leinster mentions that he restricts this to $q\geq 0$ (for reasons he does not explain, but which are discussed on page 121 of his book).
The result is as follows:
We start with a distribution and generate distributions with lower diversity. Ideally, it should be possible to start with a high diversity distribution. Then we want to achieve some levels of diversity. This is always with respect to a specific diversity measure.
For that purpose, the algorithm modifies the distribution one agent at a time. It does so in such a way that the diversity decreases minimally at each step (the choice is local, hence greedy).
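The local step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's own implementation: `inverse_simpson`, `greedy_chain` and `pick_evenly` are hypothetical helpers, and the measure used is the order-2 diversity under the unstructured distance, which equals $1/\sum_i p_i^2$.

```python
def inverse_simpson(distrib):
    """Order-2 diversity under the unstructured distance: 1 / sum(p_i ** 2)."""
    total = sum(distrib)
    # Integer arithmetic before the single division keeps ties exact.
    return total * total / sum(c * c for c in distrib)


def greedy_chain(start, measure):
    """Move one agent at a time, always taking the smallest strict decrease."""
    current = list(start)
    chain = [(current, measure(current))]
    while True:
        level = chain[-1][1]
        best = None
        for i, count in enumerate(current):
            if count == 0:
                continue  # no agent to move away from ontology i
            for j in range(len(current)):
                if j == i:
                    continue
                candidate = list(current)
                candidate[i] -= 1
                candidate[j] += 1
                value = measure(candidate)
                # Keep the candidate whose diversity is lower, but the least so.
                if value < level and (best is None or value > best[1]):
                    best = (candidate, value)
        if best is None:
            return chain  # no single move lowers the diversity any further
        current = best[0]
        chain.append(best)


def pick_evenly(chain, k):
    """Pick k (k >= 2) chain entries closest to evenly spaced diversity levels."""
    top, bottom = chain[0][1], chain[-1][1]
    targets = [top - t * (top - bottom) / (k - 1) for t in range(k)]
    return [min(chain, key=lambda item: abs(item[1] - target)) for target in targets]
```

For instance, `pick_evenly(greedy_chain([2, 2, 2, 2, 2], inverse_simpson), 4)` returns four `(distribution, diversity)` pairs, starting from the even distribution. Because each step is chosen locally, the chain is not guaranteed to pass near every target level.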
The notebook's function can be called as follows:
selectdistribs( [2,2,2,2,2], unstructdist, 4 )
which will provide a sequence of 4 distributions evenly spread (from the standpoint of the diversity based on the unstructured distance and $q=2$), starting from the `[2,2,2,2,2]` distribution.
It returns the distributions and their (non normalised) diversity level.
The result is:
Something interesting in these modest results: