# Information Geometry (Part 10)

*John Baez, [Azimuth](https://johncarlosbaez.wordpress.com)*

[Last time](https://johncarlosbaez.wordpress.com/2012/06/01/information-geometry-part-9/) I began explaining the tight relation between three concepts:
• entropy,

• information, or more precisely, lack of information,

and

• biodiversity.
The idea is to consider $n$ different species of 'replicators'. A replicator is any entity that can reproduce itself, like an organism, a gene, or a meme. A replicator can come in different kinds, and a 'species' is just our name for one of these kinds. If $P_i$ is the population of the $i$th species, we can interpret the fraction

$$ p_i = \frac{P_i}{\sum_j P_j} $$

as a probability: the probability that a randomly chosen replicator belongs to the $i$th species. This suggests that we define [entropy](http://en.wikipedia.org/wiki/Entropy_%28statistical_thermodynamics%29) just as we do in statistical mechanics:

$$ S = -\sum_i p_i \ln(p_i) $$

In the study of statistical inference, entropy is a measure of uncertainty, or lack of [information](http://en.wikipedia.org/wiki/Entropy_%28information_theory%29). But now we can interpret it as a measure of [biodiversity](http://www.loujost.com/Statistics%20and%20Physics/Diversity%20and%20Similarity/JostEntropy%20AndDiversity.pdf): it's zero when just one species is present, small when a few species have much larger populations than all the rest, and big otherwise.
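To make this concrete, here is a minimal sketch, in Python (not part of the original post), of how one might compute the species fractions $p_i$ and the entropy $S$ from raw population counts; the example populations are made up:

```python
import numpy as np

def entropy(populations):
    """Shannon entropy S = -sum_i p_i ln(p_i) of the species fractions."""
    P = np.asarray(populations, dtype=float)
    p = P / P.sum()              # p_i = P_i / sum_j P_j
    p = p[p > 0]                 # convention: 0 ln 0 = 0
    return -np.sum(p * np.log(p))

print(entropy([100, 0, 0]))      # one species only: S = 0
print(entropy([98, 1, 1]))       # strongly dominated: S is small
print(entropy([34, 33, 33]))     # evenly spread: S is close to ln(3)
```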
Our goal here is to play these viewpoints off against each other. In short, we want to think of natural selection, and even biological evolution, as a process of statistical inference, or in simple terms, *learning*.
To do this, let's think about how entropy changes with time. Last time we introduced a simple model called the [replicator equation](http://en.wikipedia.org/wiki/Replicator_equation):

$$ \frac{dP_i}{dt} = f_i(P_1, \dots, P_n) \, P_i $$
where each population grows at a per-capita rate given by its 'fitness function' $f_i$. We can get some intuition by looking at the pathetically simple case where these functions are actually *constants*, so
$$ \frac{dP_i}{dt} = f_i \, P_i $$

The equation then becomes trivial to solve:

$$ P_i(t) = e^{t f_i} P_i(0) $$

Last time I showed that in this case, the entropy will eventually decrease. It will go to zero as $t \to +\infty$ whenever one species is fitter than all the rest and starts out with a nonzero population, since then this species will eventually take over.
But remember, the entropy of a probability distribution is its *lack* of information. So the decrease in entropy signals an increase in information. And last time I argued that this makes perfect sense. As the fittest species takes over and biodiversity drops, *the population is acquiring information about its environment*.

However, I never said the entropy is *always* decreasing, because that's false! Even in this pathetically simple case, entropy can increase.

Suppose we start with many replicators belonging to one very unfit species, and a few belonging to various more fit species. The probability distribution $p_i$ will start out sharply peaked, so the entropy will start out low:
![The initial distribution, sharply peaked on the unfit species](https://i2.wp.com/math.ucr.edu/home/baez/mathematical/biodiversity_0.png)
Now think about what happens as time passes. At first the unfit species will rapidly die off, while the populations of the other species slowly grow:
![The distribution a little later, less sharply peaked](https://i1.wp.com/math.ucr.edu/home/baez/mathematical/biodiversity_1.png)

![The distribution still later, flatter yet](https://i1.wp.com/math.ucr.edu/home/baez/mathematical/biodiversity_2.png)
So the probability distribution will, for a while, become less sharply peaked. Thus, *for a while*, the entropy will increase!
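Here is a small numerical illustration of this behaviour (a sketch with made-up fitnesses and initial populations, not from the post): one very unfit but abundant species plus a few fitter, rare ones, evolved with the closed-form solution $P_i(t) = e^{t f_i} P_i(0)$. The printed entropies rise for a while and then fall toward zero.

```python
import numpy as np

f  = np.array([0.0, 1.0, 1.2, 1.5])      # constant fitnesses; species 0 is very unfit
P0 = np.array([1000.0, 1.0, 1.0, 1.0])   # species 0 starts out dominant

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

for t in [0.0, 2.0, 4.0, 5.0, 6.0, 10.0, 20.0]:
    P = np.exp(f * t) * P0               # P_i(t) = e^{t f_i} P_i(0)
    p = P / P.sum()
    print(f"t = {t:5.1f}   S = {entropy(p):.3f}")
```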
This seems to conflict with our idea that the population's entropy should decrease as it acquires information about its environment. But in fact this phenomenon is familiar in the study of statistical inference. If you start out with strongly held *false* beliefs about a situation, the first effect of learning more is to become *less* certain about what's going on!

Get it? Say you start out by assigning a high probability to some wrong guess about a situation. The entropy of your probability distribution is low: you're quite certain about what's going on. But you're wrong. When you first start suspecting you're wrong, you become more uncertain about what's going on. Your probability distribution flattens out, and the entropy goes up.

So, sometimes learning involves a decrease in information: *false* information. There's nothing about the mathematical concept of information that says this information is *true*.

Given this, it's good to work out a formula for the rate of change of entropy, which will let us see more clearly when it goes down and when it goes up. To do this, first let's derive a completely general formula for the time derivative of the entropy of a probability distribution. Following Sir Isaac Newton, we'll use a dot to stand for a time derivative:
$$ \dot{S} = -\frac{d}{dt} \sum_i p_i \ln(p_i) = -\sum_i \left( \dot{p}_i \ln(p_i) + \dot{p}_i \right) $$

In the last term we took the derivative of the logarithm and got a factor of $1/p_i$, which cancelled the factor of $p_i$. But since

$$ \sum_i p_i = 1 $$

we know

$$ \sum_i \dot{p}_i = 0 $$

so this last term vanishes:

$$ \dot{S} = -\sum_i \dot{p}_i \ln(p_i) $$
Nice! To go further, we need a formula for $\dot{p}_i$. For this we might as well return to the general replicator equation, dropping the pathetically special assumption that the fitness functions are actually constants. Then we saw last time that

$$ \dot{p}_i = \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i $$

where we used the abbreviation

$$ f_i(P) = f_i(P_1, \dots, P_n) $$

for the fitness of the $i$th species, and defined the **mean fitness** to be

$$ \langle f(P) \rangle = \sum_i f_i(P) \, p_i $$

Using this cute formula for $\dot{p}_i$, we get the final result:

$$ \dot{S} = -\sum_i \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i \ln(p_i) $$
This is strikingly similar to the formula for entropy itself. But now each term in the sum includes a factor saying how much fitter, or less fit, than average that species is. The quantity $-p_i \ln(p_i)$ is always nonnegative, since the graph of $-x \ln(x)$ looks like this:
![Graph of -x ln(x), which is nonnegative for 0 ≤ x ≤ 1](https://i2.wp.com/math.ucr.edu/home/baez/mathematical/-xlnx.png)
So, the $i$th term contributes positively to the change in entropy if the $i$th species is fitter than average, but negatively if it's less fit than average.
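Here's a quick numerical sanity check of this sign rule and of the formula for $\dot{S}$ (a sketch, not from the post, reusing the made-up fitnesses above): with constant fitnesses we can evolve the populations exactly, compare the formula with a finite-difference derivative of $S$, and inspect each species' contribution.

```python
import numpy as np

f  = np.array([0.0, 1.0, 1.2, 1.5])        # constant fitnesses (same toy numbers as above)
P0 = np.array([1000.0, 1.0, 1.0, 1.0])

def fractions(t):
    P = np.exp(f * t) * P0                 # P_i(t) = e^{t f_i} P_i(0)
    return P / P.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

t, dt = 2.0, 1e-6
p = fractions(t)
mean_f = np.sum(f * p)                     # <f> = sum_i f_i p_i
terms = -(f - mean_f) * p * np.log(p)      # per-species contributions to dS/dt
print("dS/dt (formula)     :", terms.sum())
print("dS/dt (finite diff) :", (entropy(fractions(t + dt)) - entropy(fractions(t - dt))) / (2 * dt))
print("per-species terms   :", terms)
```

The two printed rates should agree, and the per-species terms come out positive for the fitter-than-average species and negative for the less fit ones, matching the sign rule just described.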
This may seem counterintuitive!
**Puzzle 1.** How can we reconcile this fact with our earlier observations about the case when the fitness of each species is population-independent? Namely: a) if initially most of the replicators belong to one very unfit species, the entropy will rise at first, but b) in the long run, when the fittest species present take over, the entropy drops?
If this seems too tricky, look at some examples! The first illustrates observation a); the second illustrates observation b):

**Puzzle 2.** Suppose we have two species, one with fitness equal to 1 initially constituting 90% of the population, the other with fitness equal to 10 initially constituting just 10% of the population:

$$ \begin{array}{ll} f_1 = 1, & p_1(0) = 0.9 \\ f_2 = 10, & p_2(0) = 0.1 \end{array} $$

At what rate does the entropy change at $t = 0$? Which species is responsible for most of this change?

**Puzzle 3.** Suppose we have two species, one with fitness equal to 10 initially constituting 90% of the population, and the other with fitness equal to 1 initially constituting just 10% of the population:

$$ \begin{array}{ll} f_1 = 10, & p_1(0) = 0.9 \\ f_2 = 1, & p_2(0) = 0.1 \end{array} $$

At what rate does the entropy change at $t = 0$? Which species is responsible for most of this change?
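If you'd like to check your answers numerically, here's a tiny sketch (not from the post; the helper name `entropy_rate` is just for illustration) that plugs the data of Puzzles 2 and 3 into the formula for $\dot{S}$ derived above and prints each species' contribution:

```python
import numpy as np

def entropy_rate(f, p):
    """Per-species terms of dS/dt = -sum_i (f_i - <f>) p_i ln(p_i)."""
    f, p = np.asarray(f, float), np.asarray(p, float)
    mean_f = np.sum(f * p)
    return -(f - mean_f) * p * np.log(p)

for label, f, p in [("Puzzle 2", [1.0, 10.0], [0.9, 0.1]),
                    ("Puzzle 3", [10.0, 1.0], [0.9, 0.1])]:
    terms = entropy_rate(f, p)
    print(label, " dS/dt =", terms.sum(), " per-species terms:", terms)
```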
I had to work through these examples to understand what's going on. Now I do, and it all makes sense.

### Next time
Still, it would be nice if there were some quantity that *always goes down* with the passage of time, reflecting our naive idea that the population gains information from its environment, and thus loses entropy, as time goes by.

Often there *is* such a quantity. But it's not the naive entropy: it's the *relative* entropy. I'll talk about that next time. In the meantime, if you want to prepare, please reread [Part 6](http://math.ucr.edu/home/baez/information/information_geometry_6.html) of this series, where I explained this concept. Back then, I argued that *whenever you're tempted to talk about entropy, you should talk about relative entropy*. So, we should try that here.

There's a big idea lurking here: *information is relative*. How much information a signal gives you depends on your prior assumptions about what that signal is likely to be. If this is true, perhaps biodiversity is relative too.