<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Azimuth]]></provider_name><provider_url><![CDATA[https://johncarlosbaez.wordpress.com]]></provider_url><author_name><![CDATA[John Baez]]></author_name><author_url><![CDATA[https://johncarlosbaez.wordpress.com/author/johncarlosbaez/]]></author_url><title><![CDATA[Measuring Biodiversity]]></title><type><![CDATA[link]]></type><html><![CDATA[<p><i>guest post by <b><a href="http://www.maths.ed.ac.uk/~tl/">Tom Leinster</a></b></i></p>
<p>Even if there weren&#8217;t a <a href="http://www.azimuthproject.org/azimuth/show/Biodiversity">global biodiversity crisis</a>, we&#8217;d  want to know how to put a number on biodiversity. As Lord Kelvin famously put it:</p>
<blockquote>
<p>When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of <i>science</i>.</p>
</blockquote>
<p>In this post, I&#8217;ll talk about what happens when you take a mass of biological data and try to turn it into a single <i>number</i>, intended to measure biodiversity.</p>
<p>There have been more than 50 years of debate about how to measure diversity.  While the idea of putting a number on biological diversity goes back to the 1940s at least, the debate really seems to have got going in the wake of  pioneering work by the great ecologist <a href="http://en.wikipedia.org/wiki/Robert_Whittaker">Robert Whittaker</a> in the 1960s.</p>
<p>There followed several decades in which progress was made&#8230; but there was a lot of talking at cross-purposes. In fact, there was so much confusion that some people gave up on the diversity concept altogether. The mood is summed up by the title of an excellent and much-cited paper of <a href="http://www.bio.sdsu.edu/pub/stuart/stuart.html">Stuart Hurlbert</a>:</p>
<p>&bull; S. H. Hurlbert, The nonconcept of species diversity: A critique and alternative parameters. <i>Ecology</i> <b>52</b>:577&ndash;586, 1971.</p>
<p>So why all the confusion?</p>
<p>One reason is that the word &#8220;diversity&#8221; is used by different people in many different ways.  We all know that diversity is important: so if you found a quantity that seemed to measure biological variation in a sensible way, you might be tempted to call it &#8220;diversity&#8221; and publish a paper promoting your quantity over all other quantities that have ever been given that name.  There are literally dozens of measures of diversity in the literature.  Here are two simple ones:</p>
<ul>
<li>
<i>Species richness</i> is simply the number of species in the community concerned.
</li>
<li>
The <i>Shannon entropy</i> is <img src='https://s0.wp.com/latex.php?latex=-%5Csum_%7Bi+%3D+1%7D%5ES+p_i+%5Clog%28p_i%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='-&#92;sum_{i = 1}^S p_i &#92;log(p_i)' title='-&#92;sum_{i = 1}^S p_i &#92;log(p_i)' class='latex' />, where our community consists of <img src='https://s0.wp.com/latex.php?latex=S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S' title='S' class='latex' /> species in proportions <img src='https://s0.wp.com/latex.php?latex=p_1%2C+%5Cldots%2C+p_S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_1, &#92;ldots, p_S' title='p_1, &#92;ldots, p_S' class='latex' />.
</li>
</ul>
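<p>To make the two definitions concrete, here&#8217;s a minimal Python sketch. The function names and both abundance vectors are made up for illustration, and the logarithm is assumed to be the natural one (a different base only rescales the entropy).</p>
<pre>
```python
import math

def species_richness(p):
    """Number of species actually present (proportion greater than zero)."""
    return sum(1 for x in p if x > 0)

def shannon_entropy(p):
    """-sum_i p_i log(p_i), natural log; absent species contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical proportions, not taken from any survey.
uneven = [0.85, 0.05, 0.05, 0.05]   # four species, one of them dominant
even = [1 / 3, 1 / 3, 1 / 3]        # three species, evenly balanced

for name, p in [("uneven", uneven), ("even", even)]:
    print(name, species_richness(p), round(shannon_entropy(p), 3))
```
</pre>
<p>With these made-up numbers, the first community has more species but lower Shannon entropy than the second, so the two measures rank them in opposite orders.</p>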
<p>Which quantity should we call &#8220;diversity&#8221;?  Do all these quantities really measure the same kind of thing?  If community A has greater species richness than community B, but lower Shannon entropy, what does it <i>mean</i>?</p>
<p>Another cause for confusion is a blurring between the questions</p>
<blockquote>
<p><i>Which quantities deserve to be called diversity?</i></p>
</blockquote>
<p>and</p>
<blockquote>
<p><i>Which quantities are we capable of measuring experimentally?</i></p>
</blockquote>
<p>For example, we might all agree that species richness is an important quantity, but that doesn&#8217;t mean that species richness is easy to measure in practice.  (In fact, it&#8217;s not; more on that below.)  My own view is that the two questions should be kept separate:</p>
<blockquote>
<p>The statistical problem of designing appropriate estimators becomes relevant only after the measure to be estimated is accepted to be meaningful.</p>
</blockquote>
<p>(Hans-Rolf Gregorius, Elizabeth M. Gillet, Generalized Simpson-diversity, <i>Ecological Modelling</i> <b>211</b>:90&ndash;96, 2008.)</p>
<p>The problems involved in quantifying diversity are of three types: <b>practical</b>, <b>statistical</b> and <b>conceptual</b>. I&#8217;ll say a little about the first two, and rather more about the third.</p>
<p><b>Practical</b>&nbsp; Suppose that you&#8217;re doing a survey of the vertebrates in a forest.  Perhaps one important species is brightly coloured and noisy, while another is silent, shy, and well-camouflaged.  How do you prevent the first from being recorded disproportionately?</p>
<p>Or suppose that you&#8217;re carrying out a survey, with multiple people doing the fieldwork.  Different people have a tendency to spot different things: for example, one person might be short-sighted and another long-sighted.  How do you ensure that this doesn&#8217;t affect your results?</p>
<p><b>Statistical</b>&nbsp; Imagine that you want to know how many distinct species of insect live in a particular area &mdash; the &#8220;species richness&#8221;, in the terminology introduced above.  You go out collecting, and you come back with 100 specimens representing 10 species.</p>
<p>But your survey might have missed some species altogether, so you go out and get a bigger sample.  This time, you get 200 specimens representing 15 species.  Does this help you discover how many species there <i>really</i> are?  </p>
<p>Logically, not at all.  The only certainty is that there are at least 15 species.  Maybe there are thousands of species, but almost all of them are extremely rare.  Or maybe there are really only 15.  Unless you collect <i>all</i> the insects, you&#8217;ll never know for sure exactly how many species there are.</p>
<p>However, it may be that you can make reasonable assumptions about the frequency distribution of the species.  People sometimes do exactly this, to try to overcome the difficulty of estimating species richness.</p>
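<p>A small simulation shows why the observed count is only a lower bound. Everything in it is hypothetical: the community of 10 common and 200 very rare species, the weights, and the sample sizes are invented, and Python&#8217;s <code>random.choices</code> stands in for the fieldwork.</p>
<pre>
```python
import random

def observed_richness(sample):
    """Number of distinct species seen in a sample."""
    return len(set(sample))

# Hypothetical community: 10 common species and 200 very rare ones.
species = list(range(210))
weights = [100.0] * 10 + [0.1] * 200

rng = random.Random(0)   # fixed seed so the run is repeatable
sample = rng.choices(species, weights=weights, k=2000)

# Each larger sample contains the smaller ones, as if we kept collecting.
for n in (100, 200, 1000, 2000):
    print(n, observed_richness(sample[:n]))
```
</pre>
<p>The count can only go up as the sample grows, and in a community like this one it stays well short of the true figure unless the sample is enormous. Turning such counts into an estimate of the true richness needs an assumed frequency distribution, which is exactly the extra ingredient mentioned above.</p>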
<p><b>Conceptual</b>&nbsp; This is what I really want to talk about.</p>
<p>I mentioned earlier that different people mean different things by &#8220;diversity&#8221;.  Here&#8217;s an example. </p>
<p>Consider two bird communities.  The first looks like this:</p>
<p><img src="https://i0.wp.com/www.maths.ed.ac.uk/~tl/birds_top.png" alt="" /></p>
<p>It contains four species, one of which is responsible for most of the population, and three of which are quite rare.  The second looks like this:</p>
<p><img src="https://i0.wp.com/www.maths.ed.ac.uk/~tl/birds_bottom.png" alt="" /></p>
<p>It has only three species, but they&#8217;re evenly balanced.</p>
<p>Which community is the more diverse?  It&#8217;s a matter of opinion. In most of the press, and in many scholarly articles too, &#8220;biodiversity&#8221; is used as a synonym for &#8220;species richness&#8221;.  On this count, the first community is more diverse.  But if you&#8217;re more concerned with the healthy functioning of the whole community, the presence of rare species might not be particularly important: it&#8217;s <i>balance</i> that matters, and the second community has more of that.</p>
<p>Different people using the word &#8220;diversity&#8221; attach different amounts of significance to rare species.  There&#8217;s a spectrum of points of view, ranging from those who give rare species the same weight as common ones (as in the definition of species richness) to those who are only interested in the most common species of all.  Every point on this spectrum of viewpoints is reasonable.  None should have a monopoly on the word &#8220;diversity&#8221;. </p>
<p>At least, that&#8217;s what <a href="http://www.maths.gla.ac.uk/~cc">Christina Cobbold</a> and I argue in our new paper:</p>
<p>&bull; Tom Leinster, Christina A. Cobbold, <a href="http://www.maths.ed.ac.uk/~tl/mdiss.pdf">Measuring diversity: the importance of species similarity</a>, <i>Ecology</i>, in press (<a href="http://www.esajournals.org/doi/abs/10.1890/10-2402.1">doi:10.1890/10-2402.1</a>).</p>
<p>But that&#8217;s not actually our main point.  As the title suggests, the real purpose of our paper is to show how to measure diversity in a way that reflects <i>the varying differences between species</i>.  I&#8217;ll explain. </p>
<p>Most of the existing approaches to measuring biodiversity go like this.</p>
<p>We have a &#8220;community&#8221; of organisms &mdash; the fish in a lake, the fungi in a forest, or the bacteria on your skin.  This community is divided into <img src='https://s0.wp.com/latex.php?latex=S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S' title='S' class='latex' /> groups, conventionally called <b>species</b>, though they needn&#8217;t be species in the ordinary sense.</p>
<p>We assume that we know the <b>relative abundances</b>, or relative frequencies, of the species.  Write them as <img src='https://s0.wp.com/latex.php?latex=p_1%2C+%5Cldots%2C+p_S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_1, &#92;ldots, p_S' title='p_1, &#92;ldots, p_S' class='latex' />.  Thus, <img src='https://s0.wp.com/latex.php?latex=p_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_i' title='p_i' class='latex' /> is the proportion of the total population that belongs to the <img src='https://s0.wp.com/latex.php?latex=i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='i' title='i' class='latex' />th species, where &#8220;proportion&#8221; is measured in any way you think sensible (number of individuals, total mass, etc).</p>
<p>We only care about <i>relative</i> abundances here, not <i>absolute</i> abundances: so <img src='https://s0.wp.com/latex.php?latex=p_1+%2B+%5Ccdots+%2B+p_S+%3D+1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_1 + &#92;cdots + p_S = 1' title='p_1 + &#92;cdots + p_S = 1' class='latex' />. If half of a forest is destroyed, it might be a catastrophe, but on the (unrealistic) assumption that all the flora and fauna in the forest were distributed homogeneously, it won&#8217;t actually change the biodiversity.  (That&#8217;s not a statement about what&#8217;s important in life; it&#8217;s only a statement about the usage of a word.)  </p>
<p>This model is common but crude.  It can&#8217;t detect the difference between a community of six dramatically different species and a community consisting of six species of barnacle.</p>
<p>So, Christina and I use a refined model, as follows.  We assume that we also have a measure of the <b>similarity</b> between each pair of species.  This is a real number between 0 and 1, with 0 indicating that the species are as dissimilar as could be, and 1 indicating that they&#8217;re identical.  Writing the similarity between the <img src='https://s0.wp.com/latex.php?latex=i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='i' title='i' class='latex' />th and <img src='https://s0.wp.com/latex.php?latex=j&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='j' title='j' class='latex' />th species as <img src='https://s0.wp.com/latex.php?latex=Z_%7Bij%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Z_{ij}' title='Z_{ij}' class='latex' />, this gives an <img src='https://s0.wp.com/latex.php?latex=S+%5Ctimes+S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S &#92;times S' title='S &#92;times S' class='latex' /> matrix <img src='https://s0.wp.com/latex.php?latex=%5Cmathbf%7BZ%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathbf{Z}' title='&#92;mathbf{Z}' class='latex' />.  Our only assumption on <img src='https://s0.wp.com/latex.php?latex=%5Cmathbf%7BZ%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathbf{Z}' title='&#92;mathbf{Z}' class='latex' /> is that its diagonal entries are all 1: every species is identical to itself.</p>
<p>There are many ways of measuring inter-species similarity. Probably the most familiar approach is genetic, as in &#8220;you share 98% of your DNA with a chimpanzee&#8221;.  But there are many other possibilities: functional, phylogenetic, morphological, taxonomic, &#8230;.  Diversity is a measure of the variety of life; having to choose a measure of similarity forces you to get clear exactly what you mean by &#8220;variety&#8221;.</p>
<p>Christina and I are by no means the first people to incorporate species similarity into the model of an ecological community.  The main new thing in our paper is this measure of the community&#8217;s diversity:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%7B%7D%5Eq+D%5E%7B%5Cmathbf%7BZ%7D%7D%28%5Cmathbf%7Bp%7D%29+%3D+%28+%5Csum_i+p_i+%28%5Cmathbf%7BZ%7D%5Cmathbf%7Bp%7D%29_i%5E%7Bq+-+1%7D+%29%5E%7B1%2F%281+-+q%29%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='{}^q D^{&#92;mathbf{Z}}(&#92;mathbf{p}) = ( &#92;sum_i p_i (&#92;mathbf{Z}&#92;mathbf{p})_i^{q - 1} )^{1/(1 - q)}.' title='{}^q D^{&#92;mathbf{Z}}(&#92;mathbf{p}) = ( &#92;sum_i p_i (&#92;mathbf{Z}&#92;mathbf{p})_i^{q - 1} )^{1/(1 - q)}.' class='latex' /></p>
<p>What does this mean?</p>
<ul>
<li>
<img src='https://s0.wp.com/latex.php?latex=%7B%7D%5Eq+D%5E%7B%5Cmathbf%7BZ%7D%7D%28%5Cmathbf%7Bp%7D%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='{}^q D^{&#92;mathbf{Z}}(&#92;mathbf{p})' title='{}^q D^{&#92;mathbf{Z}}(&#92;mathbf{p})' class='latex' /> is what we call the <b>diversity of order <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /></b> of the community.  Here <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> is a parameter between <img src='https://s0.wp.com/latex.php?latex=0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='0' title='0' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=%5Cinfty&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;infty' title='&#92;infty' class='latex' />, which you get to choose.  Different values of <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> represent different points on the spectrum of viewpoints described above.  Small values of <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> give high importance to rare species; large values of <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> give high importance to common species.
</li>
<li>
<img src='https://s0.wp.com/latex.php?latex=%5Cmathbf%7Bp%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathbf{p}' title='&#92;mathbf{p}' class='latex' /> is shorthand for the relative abundances <img src='https://s0.wp.com/latex.php?latex=p_1%2C+%5Cldots%2C+p_S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_1, &#92;ldots, p_S' title='p_1, &#92;ldots, p_S' class='latex' />, and <img src='https://s0.wp.com/latex.php?latex=%5Cmathbf%7BZ%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathbf{Z}' title='&#92;mathbf{Z}' class='latex' /> is the matrix of similarities.
</li>
<li>
<img src='https://s0.wp.com/latex.php?latex=%28%5Cmathbf%7BZ%7D%5Cmathbf%7Bp%7D%29_i&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(&#92;mathbf{Z}&#92;mathbf{p})_i' title='(&#92;mathbf{Z}&#92;mathbf{p})_i' class='latex' /> means <img src='https://s0.wp.com/latex.php?latex=%5Csum_j+Z_%7Bij%7D+p_j&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;sum_j Z_{ij} p_j' title='&#92;sum_j Z_{ij} p_j' class='latex' />.
</li>
</ul>
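<p>Read literally, the formula translates into a few lines of Python. This is only a sketch for finite <i>q</i> other than 1; the function name <code>diversity</code>, the nested-list representation of the matrix, and the two-species examples are assumptions made for illustration, not code from the paper.</p>
<pre>
```python
def diversity(q, Z, p):
    """Diversity of order q (finite, q != 1) for similarities Z and abundances p."""
    S = len(p)
    # (Zp)_i = sum_j Z_ij p_j
    Zp = [sum(Z[i][j] * p[j] for j in range(S)) for i in range(S)]
    # Species with p_i = 0 are absent from the community and are skipped.
    total = sum(p[i] * Zp[i] ** (q - 1) for i in range(S) if p[i] > 0)
    return total ** (1 / (1 - q))

# Hypothetical community: two equally common species.
p = [0.5, 0.5]
dissimilar = [[1.0, 0.0], [0.0, 1.0]]   # nothing in common
related = [[1.0, 0.5], [0.5, 1.0]]      # similarity 0.5
identical = [[1.0, 1.0], [1.0, 1.0]]    # indistinguishable

for name, Z in [("dissimilar", dissimilar), ("related", related), ("identical", identical)]:
    print(name, round(diversity(2, Z, p), 3))
```
</pre>
<p>Two equally common species count as 2 if they&#8217;re totally dissimilar, as 1 if they&#8217;re identical, and as about 1.33 if their similarity is 0.5. So the number behaves like an effective number of species.</p>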
<p>The expression doesn&#8217;t make sense if <img src='https://s0.wp.com/latex.php?latex=q+%3D+1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q = 1' title='q = 1' class='latex' /> or <img src='https://s0.wp.com/latex.php?latex=q+%3D+%5Cinfty&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q = &#92;infty' title='q = &#92;infty' class='latex' />, but can be made sense of by taking limits.  For <img src='https://s0.wp.com/latex.php?latex=q+%3D+1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q = 1' title='q = 1' class='latex' />, this gives</p>
<p><img src='https://s0.wp.com/latex.php?latex=%7B%7D%5E1+D%5E%7B%5Cmathbf%7BZ%7D%7D%28%5Cmathbf%7Bp%7D%29+%3D+1%2F%28%5Cmathbf%7BZ+p%7D%29_1%5E%7Bp_1%7D+%28%5Cmathbf%7BZ+p%7D%29_2%5E%7Bp_2%7D+%5Ccdots+%28%5Cmathbf%7BZ+p%7D%29_S%5E%7Bp_S%7D+%3D+%5Cexp%28-%5Csum_i+p_i+%5Clog%28%5Cmathbf%7BZ+p%7D%29_i%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='{}^1 D^{&#92;mathbf{Z}}(&#92;mathbf{p}) = 1/(&#92;mathbf{Z p})_1^{p_1} (&#92;mathbf{Z p})_2^{p_2} &#92;cdots (&#92;mathbf{Z p})_S^{p_S} = &#92;exp(-&#92;sum_i p_i &#92;log(&#92;mathbf{Z p})_i)' title='{}^1 D^{&#92;mathbf{Z}}(&#92;mathbf{p}) = 1/(&#92;mathbf{Z p})_1^{p_1} (&#92;mathbf{Z p})_2^{p_2} &#92;cdots (&#92;mathbf{Z p})_S^{p_S} = &#92;exp(-&#92;sum_i p_i &#92;log(&#92;mathbf{Z p})_i)' class='latex' /></p>
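<p>One way to convince yourself of the limit is to check it numerically. The sketch below compares the general formula at values of <i>q</i> close to 1 with the exponential expression; the three-species community and its similarity matrix are hypothetical, and the function names are chosen here for illustration.</p>
<pre>
```python
import math

def zp(Z, p):
    """The vector Zp, with (Zp)_i = sum_j Z_ij p_j."""
    n = len(p)
    return [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]

def diversity(q, Z, p):
    """General formula, valid for finite q other than 1."""
    total = sum(pi * zi ** (q - 1) for pi, zi in zip(p, zp(Z, p)) if pi > 0)
    return total ** (1 / (1 - q))

def diversity_order_1(Z, p):
    """Limiting value at q = 1: exp(-sum_i p_i log (Zp)_i)."""
    return math.exp(-sum(pi * math.log(zi) for pi, zi in zip(p, zp(Z, p)) if pi > 0))

# Hypothetical three-species community.
p = [0.6, 0.3, 0.1]
Z = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

limit = diversity_order_1(Z, p)
for q in (0.9, 0.99, 0.999, 1.001, 1.01, 1.1):
    print(q, diversity(q, Z, p), limit)
```
</pre>
<p>As <i>q</i> approaches 1 from either side, the first printed column closes in on the second.</p>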
<p>If you want to know the value at <img src='https://s0.wp.com/latex.php?latex=q+%3D+%5Cinfty&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q = &#92;infty' title='q = &#92;infty' class='latex' />, or any of the other mathematical details, you can read <a href="http://golem.ph.utexas.edu/category/2011/10/measuring_diversity.html">this post at the <i>n</i>-Category Caf&eacute;</a>, or of course our paper.  In both places, you&#8217;ll also find an explanation of what motivates this formula.  What&#8217;s more, you&#8217;ll see that many existing measures of diversity are special cases of ours, obtained by taking particular values for <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> and/or <img src='https://s0.wp.com/latex.php?latex=%5Cmathbf%7BZ%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathbf{Z}' title='&#92;mathbf{Z}' class='latex' />.</p>
<p>But I won&#8217;t talk about any of that here.  Instead, I&#8217;ll tell you  how taking species similarity into account can radically alter the assessment of diversity.</p>
<p>I&#8217;ll do this using an example: butterflies of subfamily Charaxinae at a site in an Ecuadorian rainforest. The data is from here:</p>
<p>&bull; P. J. DeVries, D. Murray, R. Lande, Species diversity in vertical, horizontal and temporal dimensions of a fruit-feeding butterfly community in an Ecuadorian rainforest.  <i>Biological Journal of the Linnean Society</i> <b>62</b>:343&ndash;364, 1997.</p>
<p>They measured the butterfly abundances in both the canopy (top level) and understorey (lower level) at this site, with the following results:</p>
<table cellpadding="1" cellspacing="0">
<tr>
<td><b>Species</b></td>
<td><b>Canopy&nbsp;&nbsp;</b></td>
<td><b>Understorey</b></td>
</tr>
<tr>
<td><i>Prepona laertes</i></td>
<td>15</td>
<td>0</td>
</tr>
<tr>
<td><i>Archaeoprepona demophon&nbsp;&nbsp;</i></td>
<td>14</td>
<td>37</td>
</tr>
<tr>
<td><i>Zaretis itys</i></td>
<td>25</td>
<td>11</td>
</tr>
<tr>
<td><i>Memphis arachne</i></td>
<td>89</td>
<td>23</td>
</tr>
<tr>
<td><i>Memphis offa</i></td>
<td>21</td>
<td>3</td>
</tr>
<tr>
<td><i>Memphis xenocles</i></td>
<td>32</td>
<td>8</td>
</tr>
</table>
<p>Which is more diverse: canopy or understorey?</p>
<p>We&#8217;ve already seen that the answer is going to depend on what exactly we mean by &#8220;diverse&#8221;.</p>
<p>First let&#8217;s answer the question under the (crude!) assumption that different species have nothing whatsoever in common.  This means taking our similarity matrix <img src='https://s0.wp.com/latex.php?latex=%5Cmathbf%7BZ%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathbf{Z}' title='&#92;mathbf{Z}' class='latex' /> to be the identity matrix: if <img src='https://s0.wp.com/latex.php?latex=i+%5Cneq+j&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='i &#92;neq j' title='i &#92;neq j' class='latex' /> then <img src='https://s0.wp.com/latex.php?latex=Z_%7Bij%7D+%3D+0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Z_{ij} = 0' title='Z_{ij} = 0' class='latex' /> (totally dissimilar), and if <img src='https://s0.wp.com/latex.php?latex=i+%3D+j&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='i = j' title='i = j' class='latex' /> then <img src='https://s0.wp.com/latex.php?latex=Z_%7Bii%7D+%3D+1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Z_{ii} = 1' title='Z_{ii} = 1' class='latex' /> (totally identical).</p>
<p>Now, remember that there&#8217;s a spectrum of viewpoints on how much importance to give to rare species when measuring diversity. Rather than choosing a particular viewpoint, we&#8217;ll calculate the diversity from <i>all</i> viewpoints, and display it on a graph. In other words, we&#8217;ll draw the graph of <img src='https://s0.wp.com/latex.php?latex=%7B%7D%5Eq+D%5E%7B%5Cmathbf%7BZ%7D%7D%28%5Cmathbf%7Bp%7D%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='{}^q D^{&#92;mathbf{Z}}(&#92;mathbf{p})' title='{}^q D^{&#92;mathbf{Z}}(&#92;mathbf{p})' class='latex' /> (the diversity of order <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' />) against <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> (the viewpoint).  Here&#8217;s what we get:</p>
<p><img width="450" src="https://i0.wp.com/www.maths.ed.ac.uk/~tl/naive_bflies.jpg" alt="" /></p>
<p>(The horizontal axis should be labelled <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' />.)</p>
<p>Conclusion: from <i>all</i> viewpoints, the butterfly population in the canopy is at least as diverse as that in the understorey.</p>
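<p>A few points on these curves can be recomputed from the table. When the similarity matrix is the identity, (<b>Zp</b>)<sub><i>i</i></sub> is just <i>p<sub>i</sub></i>, so the diversity of order <i>q</i> reduces to the sum of the <i>p<sub>i</sub><sup>q</sup></i> raised to the power 1/(1 &minus; <i>q</i>). The sketch below assumes that simplification; the function name and the handful of orders tried are choices made here, not taken from the paper.</p>
<pre>
```python
import math

# Counts from the table above, in row order (Prepona laertes first).
canopy = [15, 14, 25, 89, 21, 32]
understorey = [0, 37, 11, 23, 3, 8]

def naive_diversity(q, counts):
    """Diversity of order q when Z is the identity matrix."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]   # drop absent species
    if q == 1:
        # Limiting case: the exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))

for q in (0, 0.5, 1, 2, 5):
    print(q, round(naive_diversity(q, canopy), 2),
          round(naive_diversity(q, understorey), 2))
```
</pre>
<p>At <i>q</i> = 0 this returns the species richness (6 in the canopy, 5 in the understorey), and at <i>q</i> = 2 it gives roughly 3.7 and 3.2. At each of the orders tried here the canopy comes out ahead.</p>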
<p>Now let&#8217;s do it again, but this time taking account of the varying <i>similarities</i> between species of butterflies.  We don&#8217;t have much to go on: how do we know whether <i>Prepona laertes</i> is very similar to, or very different from, <i>Archaeoprepona demophon</i>?   With only the data above, we don&#8217;t.  So what can we do?</p>
<p>All we have to go on is the taxonomy.  Remember your high school biology: for the butterfly <i>Prepona laertes</i>, the genus is <i>Prepona</i> and the species is <i>laertes</i>.  We&#8217;d expect species in the same genus to have more in common than species in different genera.  So let&#8217;s define the similarity between two species as follows:</p>
<ul>
<li>
the similarity is 1 if the species are the same
</li>
<li>
the similarity is 0.5 if the species are different but in the same genus
</li>
<li>
the similarity is 0 if they are not even in the same genus.
</li>
</ul>
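<p>These three rules are enough to build the whole matrix from the species names. In this sketch the genus is taken to be the first word of each binomial name; the function name and the list-of-lists layout are assumptions for illustration.</p>
<pre>
```python
species = [
    "Prepona laertes",
    "Archaeoprepona demophon",
    "Zaretis itys",
    "Memphis arachne",
    "Memphis offa",
    "Memphis xenocles",
]

def taxonomic_similarity(a, b):
    """1 for the same species, 0.5 for the same genus, 0 otherwise."""
    if a == b:
        return 1.0
    # The genus is the first word of the binomial name.
    if a.split()[0] == b.split()[0]:
        return 0.5
    return 0.0

Z = [[taxonomic_similarity(a, b) for b in species] for a in species]

for row in Z:
    print(row)
```
</pre>
<p>The result is the identity matrix except for a block of 0.5s linking the three <i>Memphis</i> species.</p>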
<p>This is still crude, but in the absence of further information, it&#8217;s about the best we can do.  And it&#8217;s better than the first approach, where we ignored the taxonomy entirely.  Throwing away biologically relevant information is unlikely to lead to a better assessment of diversity.</p>
<p>Using this taxonomic matrix <img src='https://s0.wp.com/latex.php?latex=%5Cmathbf%7BZ%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathbf{Z}' title='&#92;mathbf{Z}' class='latex' />, and the same abundances, the diversity graphs become:</p>
<p><img width="450" src="https://i1.wp.com/www.maths.ed.ac.uk/~tl/taxo_bflies.jpg" alt="" /></p>
<p>This is more interesting!  For <img src='https://s0.wp.com/latex.php?latex=q+%3E+1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q &gt; 1' title='q &gt; 1' class='latex' />, the understorey looks <i>more</i> diverse than the canopy &mdash; the opposite conclusion to our first approach.</p>
<p>It&#8217;s not hard to see why.  Look again at the table of abundances, this time paying attention to the <i>genera</i> of the butterflies.  In the canopy, nearly three-quarters of the butterflies are of genus <i>Memphis</i>.  So when we take into account the fact that species in the same genus tend to be somewhat similar, the canopy looks much less diverse than it did before.  In the understorey, however, the species are spread more evenly between genera, so taking similarity into account leaves its diversity relatively unchanged.</p>
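<p>Both the genus shares and the reversal can be checked from the table. The sketch below uses the taxonomic similarities defined above (1, 0.5 or 0) and the general formula, with the <i>q</i> = 1 case handled by its limit; the variable and function names are made up for illustration.</p>
<pre>
```python
import math

# Genus of each species, in the row order of the table, with the counts.
genera = ["Prepona", "Archaeoprepona", "Zaretis", "Memphis", "Memphis", "Memphis"]
canopy = [15, 14, 25, 89, 21, 32]
understorey = [0, 37, 11, 23, 3, 8]

S = len(genera)
# Taxonomic similarity: 1 on the diagonal, 0.5 within a genus, 0 otherwise.
Z = [[1.0 if i == j else 0.5 if genera[i] == genera[j] else 0.0
      for j in range(S)] for i in range(S)]

def diversity(q, Z, counts):
    """Similarity-sensitive diversity of order q, computed from raw counts."""
    total = sum(counts)
    p = [c / total for c in counts]
    Zp = [sum(Z[i][j] * p[j] for j in range(S)) for i in range(S)]
    present = [(p[i], Zp[i]) for i in range(S) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(pi * math.log(zi) for pi, zi in present))
    return sum(pi * zi ** (q - 1) for pi, zi in present) ** (1 / (1 - q))

def genus_share(counts, genus):
    """Fraction of individuals belonging to the given genus."""
    return sum(c for c, g in zip(counts, genera) if g == genus) / sum(counts)

print("Memphis share:", round(genus_share(canopy, "Memphis"), 3),
      round(genus_share(understorey, "Memphis"), 3))
for q in (0, 1, 2, 5):
    print(q, round(diversity(q, Z, canopy), 2),
          round(diversity(q, Z, understorey), 2))
```
</pre>
<p><i>Memphis</i> makes up about 72% of the canopy sample but only about 41% of the understorey sample. At <i>q</i> = 2 the taxonomic diversities come out at roughly 2.4 for the canopy and 2.8 for the understorey, whereas at <i>q</i> = 0 the canopy is still ahead.</p>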
<p>Taking account of species similarity opens up a world of uncertainty.  How should we measure similarity?  There are as many possibilities as there are quantifiable characteristics of living organisms.  It&#8217;s much more reassuring to stay in the black-and-white world where distinct species are always assigned a similarity of 0, no matter how similar they might actually be.  (This is, effectively, what most existing measures do.)  But that&#8217;s just hiding from reality. </p>
<p>Maybe you disagree!  If so, try the Discussion section of our <a href="http://www.maths.ed.ac.uk/~tl/mdiss.pdf">paper</a>, where we lay out our arguments in more detail.  Or let me know by leaving a comment.</p>
]]></html><thumbnail_url><![CDATA[https://i0.wp.com/www.maths.ed.ac.uk/~tl/birds_top.png?fit=440%2C330]]></thumbnail_url><thumbnail_height><![CDATA[304]]></thumbnail_height><thumbnail_width><![CDATA[294]]></thumbnail_width></oembed>