<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Azimuth]]></provider_name><provider_url><![CDATA[https://johncarlosbaez.wordpress.com]]></provider_url><author_name><![CDATA[John Baez]]></author_name><author_url><![CDATA[https://johncarlosbaez.wordpress.com/author/johncarlosbaez/]]></author_url><title><![CDATA[Relative Entropy (Part&nbsp;3)]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>Holidays are great. There&#8217;s nothing I need to do!  Everybody is celebrating!  So, I can finally get some real work done.  </p>
<p>In the last couple of days I&#8217;ve finished a paper with Jamie Vicary on wormholes and entanglement&#8230; subject to his approval and corrections.  More on that later.  And now I&#8217;ve returned to working on a paper with Tobias Fritz where we give a Bayesian characterization of the concept of &#8216;relative entropy&#8217;.  This summer I wrote two blog articles about this paper:</p>
<p>&bull; <a href="https://johncarlosbaez.wordpress.com/2013/06/20/relative-entropy-part-1/">Relative Entropy (Part 1)</a>: how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers. </p>
<p>&bull; <a href="https://johncarlosbaez.wordpress.com/2013/07/02/relative-entropy-part-2/">Relative Entropy (Part 2)</a>: a category related to statistical inference, <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat},' title='&#92;mathrm{FinStat},' class='latex' /> and how relative entropy defines a functor on this category.</p>
<p>But then Tobias Fritz noticed a big problem.  Our characterization of relative entropy was inspired by this paper:</p>
<p>&bull; D. Petz, <a href="http://www.renyi.hu/~petz/pdf/52.pdf">Characterization of the relative entropy of states of matrix algebras</a>, <i>Acta Math. Hungar. </i> <b>59</b> (1992), 449&#8211;455.  </p>
<p>Here Petz sought to characterize relative entropy both in the &#8216;classical&#8217; case we are concerned with and in the more general &#8216;quantum&#8217; setting.  Our original goal was merely to express his results in a more category-theoretic framework!  Unfortunately Petz&#8217;s proof contained a significant flaw.  Tobias noticed this and spent a lot of time fixing it, with no help from me.  </p>
<p>Our paper is now self-contained, and considerably longer.   My job now is to polish it up and make it pretty.  What follows is the introduction, which should explain the basic ideas.  </p>
<h3> A Bayesian characterization of relative entropy </h3>
<p>This paper gives a new characterization of the concept of relative entropy, also known as &#8216;relative information&#8217;, &#8216;information gain&#8217; or &#8216;Kullback-Leibler divergence&#8217;.   Whenever we have two probability distributions <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> on the same finite set <img src='https://s0.wp.com/latex.php?latex=X%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X,' title='X,' class='latex' /> we can define the entropy of <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> relative to <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' />:</p>
<p><img src='https://s0.wp.com/latex.php?latex=S%28q%2Cp%29+%3D+%5Csum_%7Bx%5Cin+X%7D+q_x+%5Cln%5Cleft%28+%5Cfrac%7Bq_x%7D%7Bp_x%7D+%5Cright%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S(q,p) = &#92;sum_{x&#92;in X} q_x &#92;ln&#92;left( &#92;frac{q_x}{p_x} &#92;right)' title='S(q,p) = &#92;sum_{x&#92;in X} q_x &#92;ln&#92;left( &#92;frac{q_x}{p_x} &#92;right)' class='latex' /></p>
<p>Here we set </p>
<p><img src='https://s0.wp.com/latex.php?latex=q_x+%5Cln%5Cleft%28+%5Cfrac%7Bq_x%7D%7Bp_x%7D+%5Cright%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_x &#92;ln&#92;left( &#92;frac{q_x}{p_x} &#92;right)' title='q_x &#92;ln&#92;left( &#92;frac{q_x}{p_x} &#92;right)' class='latex' /></p>
<p>equal to <img src='https://s0.wp.com/latex.php?latex=%5Cinfty&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;infty' title='&#92;infty' class='latex' /> when <img src='https://s0.wp.com/latex.php?latex=p_x+%3D+0%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_x = 0,' title='p_x = 0,' class='latex' /> unless <img src='https://s0.wp.com/latex.php?latex=q_x&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_x' title='q_x' class='latex' /> is also zero, in which case we set it equal to 0.  Relative entropy thus takes values in <img src='https://s0.wp.com/latex.php?latex=%5B0%2C%5Cinfty%5D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='[0,&#92;infty].' title='[0,&#92;infty].' class='latex' />  </p>
<p>Intuitively speaking, <img src='https://s0.wp.com/latex.php?latex=S%28q%2Cp%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S(q,p)' title='S(q,p)' class='latex' /> is the expected amount of information gained when we discover the probability distribution is really <img src='https://s0.wp.com/latex.php?latex=q%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q,' title='q,' class='latex' /> when we had thought it was <img src='https://s0.wp.com/latex.php?latex=p.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p.' title='p.' class='latex' />   We should think of <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> as a &#8216;prior&#8217; and <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> as a &#8216;posterior&#8217;.  When we take <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> to be the uniform distribution on <img src='https://s0.wp.com/latex.php?latex=X%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X,' title='X,' class='latex' /> relative entropy reduces to the ordinary Shannon entropy, up to an additive constant.  The advantage of relative entropy is that it makes the role of the prior explicit.</p>
<p>Since Bayesian probability theory emphasizes the role of the prior, relative entropy naturally lends itself to a Bayesian interpretation: it measures how much information we gain <i>given a certain prior</i>.  Our goal here is to make this precise in a mathematical characterization of relative entropy.  We do this using a category <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> where:</p>
<p>&bull; an object <img src='https://s0.wp.com/latex.php?latex=%28X%2Cq%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(X,q)' title='(X,q)' class='latex' /> consists of a finite set <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> and a probability distribution <img src='https://s0.wp.com/latex.php?latex=x+%5Cmapsto+q_x&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x &#92;mapsto q_x' title='x &#92;mapsto q_x' class='latex' /> on that set;</p>
<p>&bull; a morphism $latex (f,s) : (X,q) \to (Y,r)$ consists of a measure-preserving function $latex f$ from $latex X$ to $latex Y,$ together with a probability distribution $latex x \mapsto s_{x y}$ on $latex X$ for each element <img src='https://s0.wp.com/latex.php?latex=y+%5Cin+Y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y &#92;in Y,' title='y &#92;in Y,' class='latex' /> with the property that <img src='https://s0.wp.com/latex.php?latex=s_%7Bxy%7D+%3D+0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{xy} = 0' title='s_{xy} = 0' class='latex' /> unless <img src='https://s0.wp.com/latex.php?latex=f%28x%29+%3D+y.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f(x) = y.' title='f(x) = y.' class='latex' /></p>
<p>We can think of an object of <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> as a system with some finite set of <b>states</b> together with a probability distribution on its states.   A morphism </p>
<p><img src='https://s0.wp.com/latex.php?latex=%28f%2Cs%29+%3A+%28X%2Cq%29+%5Cto+%28Y%2Cr%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(f,s) : (X,q) &#92;to (Y,r)' title='(f,s) : (X,q) &#92;to (Y,r)' class='latex' />  </p>
<p>then consists of two parts.  First, there is a deterministic <b>measurement process</b> <img src='https://s0.wp.com/latex.php?latex=f+%3A+X+%5Cto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f : X &#92;to Y' title='f : X &#92;to Y' class='latex' /> mapping states of the system being measured, <img src='https://s0.wp.com/latex.php?latex=X%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X,' title='X,' class='latex' /> to states of the measurement apparatus, <img src='https://s0.wp.com/latex.php?latex=Y.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y.' title='Y.' class='latex' />  The condition that <img src='https://s0.wp.com/latex.php?latex=f&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f' title='f' class='latex' /> be measure-preserving says that, after the measurement, the probability that the apparatus is in any given state <img src='https://s0.wp.com/latex.php?latex=y+%5Cin+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y &#92;in Y' title='y &#92;in Y' class='latex' /> is the sum of the probabilities of all states of <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> leading to that outcome:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B++r_y+%3D+%5Csum_%7Bx+%5Cin+f%5E%7B-1%7D%28y%29%7D+q_x+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{  r_y = &#92;sum_{x &#92;in f^{-1}(y)} q_x } ' title='&#92;displaystyle{  r_y = &#92;sum_{x &#92;in f^{-1}(y)} q_x } ' class='latex' /></p>
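<p>A morphism of FinStat is thus easy to encode concretely.  Here is a toy example in Python (all names and numbers are hypothetical), checking both the support condition on the hypothesis and the measure-preserving condition on the measurement:</p>

```python
X = ['a', 'b', 'c']
Y = ['u', 'v']
f = {'a': 'u', 'b': 'u', 'c': 'v'}           # deterministic measurement f : X -> Y
q = {'a': 0.2, 'b': 0.3, 'c': 0.5}           # probability distribution on X
r = {'u': 0.5, 'v': 0.5}                     # probability distribution on Y
s = {'u': {'a': 0.4, 'b': 0.6, 'c': 0.0},    # hypothesis: s[y][x] is a distribution
     'v': {'a': 0.0, 'b': 0.0, 'c': 1.0}}    # on X for each y, supported on f^{-1}(y)

# s_{xy} = 0 unless f(x) = y:
assert all(s[y][x] == 0.0 for y in Y for x in X if f[x] != y)

# f is measure-preserving: r_y is the sum of q_x over the fibre f^{-1}(y):
for y in Y:
    assert abs(r[y] - sum(q[x] for x in X if f[x] == y)) < 1e-12
```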
<p>Second, there is a <b>hypothesis</b> <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' />: an assumption about the probability <img src='https://s0.wp.com/latex.php?latex=s_%7Bxy%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{xy}' title='s_{xy}' class='latex' /> that the system being measured is in the state <img src='https://s0.wp.com/latex.php?latex=x+%5Cin+X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x &#92;in X' title='x &#92;in X' class='latex' /> given any measurement outcome <img src='https://s0.wp.com/latex.php?latex=y+%5Cin+Y.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y &#92;in Y.' title='y &#92;in Y.' class='latex' /></p>
<p>Suppose we have any morphism </p>
<p><img src='https://s0.wp.com/latex.php?latex=%28f%2Cs%29+%3A+%28X%2Cq%29+%5Cto+%28Y%2Cr%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(f,s) : (X,q) &#92;to (Y,r)' title='(f,s) : (X,q) &#92;to (Y,r)' class='latex' />  </p>
<p>in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}.' title='&#92;mathrm{FinStat}.' class='latex' />   From this we obtain two probability distributions on the states of the system being measured.   First, we have the probability distribution <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> given by</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+++p_x+%3D+%5Csum_%7By+%5Cin+Y%7D+s_%7Bxy%7D+r_y+%7D+%5Cqquad+%5Cqquad+%281%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{   p_x = &#92;sum_{y &#92;in Y} s_{xy} r_y } &#92;qquad &#92;qquad (1) ' title='&#92;displaystyle{   p_x = &#92;sum_{y &#92;in Y} s_{xy} r_y } &#92;qquad &#92;qquad (1) ' class='latex' /></p>
<p>This is our <b>prior</b>, given our hypothesis and the probability distribution of measurement results.  Second, we have the &#8216;true&#8217; probability distribution <img src='https://s0.wp.com/latex.php?latex=q%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q,' title='q,' class='latex' /> which would be the <b>posterior</b> if we updated our prior using complete direct knowledge of the system being measured.  </p>
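<p>Concretely, equation (1) turns a hypothesis and a distribution of measurement results into a prior on the states of the system.  A small Python sketch (the data are hypothetical, in the style of the toy examples above):</p>

```python
r = {'u': 0.5, 'v': 0.5}                     # distribution of measurement outcomes
s = {'u': {'a': 0.5, 'b': 0.5, 'c': 0.0},    # hypothesis: s[y][x]
     'v': {'a': 0.0, 'b': 0.0, 'c': 1.0}}

def prior(s, r, X):
    """Equation (1): p_x = sum over y of s_{xy} r_y."""
    return {x: sum(s[y][x] * r[y] for y in r) for x in X}

p = prior(s, r, ['a', 'b', 'c'])
# p is again a probability distribution on X:
assert abs(sum(p.values()) - 1.0) < 1e-12
```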
<p>It follows that any morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> has a relative entropy <img src='https://s0.wp.com/latex.php?latex=S%28q%2Cp%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S(q,p)' title='S(q,p)' class='latex' /> associated to it.   This is the expected amount of information we gain when we update our prior <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> to the posterior <img src='https://s0.wp.com/latex.php?latex=q.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q.' title='q.' class='latex' /></p>
<p>In fact, this way of assigning relative entropies to morphisms defines a functor</p>
<p><img src='https://s0.wp.com/latex.php?latex=F_0+%3A+%5Cmathrm%7BFinStat%7D+%5Cto+%5B0%2C%5Cinfty%5D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0 : &#92;mathrm{FinStat} &#92;to [0,&#92;infty] ' title='F_0 : &#92;mathrm{FinStat} &#92;to [0,&#92;infty] ' class='latex' /> </p>
<p>where we use <img src='https://s0.wp.com/latex.php?latex=%5B0%2C%5Cinfty%5D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='[0,&#92;infty]' title='[0,&#92;infty]' class='latex' /> to denote the category with one object, the numbers <img src='https://s0.wp.com/latex.php?latex=0+%5Cle+x+%5Cle+%5Cinfty&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='0 &#92;le x &#92;le &#92;infty' title='0 &#92;le x &#92;le &#92;infty' class='latex' /> as morphisms, and addition as composition.   More precisely, if </p>
<p><img src='https://s0.wp.com/latex.php?latex=%28f%2Cs%29+%3A+%28X%2Cq%29+%5Cto+%28Y%2Cr%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(f,s) : (X,q) &#92;to (Y,r)' title='(f,s) : (X,q) &#92;to (Y,r)' class='latex' /> </p>
<p>is any morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat},' title='&#92;mathrm{FinStat},' class='latex' /> we define</p>
<p><img src='https://s0.wp.com/latex.php?latex=F_0%28f%2Cs%29+%3D+S%28q%2Cp%29++&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0(f,s) = S(q,p)  ' title='F_0(f,s) = S(q,p)  ' class='latex' /></p>
<p>where the prior <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> is defined as in Equation (1).  </p>
<p>The fact that <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> is a functor is nontrivial and rather interesting.   It says that given any composable pair of measurement processes:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%28X%2Cq%29+%5Cstackrel%7B%28f%2Cs%29%7D%7B%5Clongrightarrow%7D+%28Y%2Cr%29+%5Cstackrel%7B%28g%2Ct%29%7D%7B%5Clongrightarrow%7D+%28Z%2Cu%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(X,q) &#92;stackrel{(f,s)}{&#92;longrightarrow} (Y,r) &#92;stackrel{(g,t)}{&#92;longrightarrow} (Z,u) ' title='(X,q) &#92;stackrel{(f,s)}{&#92;longrightarrow} (Y,r) &#92;stackrel{(g,t)}{&#92;longrightarrow} (Z,u) ' class='latex' /></p>
<p>the relative entropy of their composite is the sum of the relative entropies of the two parts:</p>
<p><img src='https://s0.wp.com/latex.php?latex=F_0%28%28g%2Ct%29+%5Ccirc+%28f%2Cs%29%29+%3D+F_0%28g%2Ct%29+%2B+F_0%28f%2Cs%29+.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0((g,t) &#92;circ (f,s)) = F_0(g,t) + F_0(f,s) .' title='F_0((g,t) &#92;circ (f,s)) = F_0(g,t) + F_0(f,s) .' class='latex' /></p>
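<p>This additivity can be checked numerically on a toy composable pair.  In the Python sketch below (all data hypothetical), we compose the hypotheses the way one composes stochastic matrices, so the composite hypothesis is (st)<sub>xz</sub> = &#8721;<sub>y</sub> s<sub>xy</sub> t<sub>yz</sub>:</p>

```python
import math

def rel_entropy(q, p):
    # S(q, p); assumes p_x > 0 wherever q_x > 0, as in this example
    return sum(qx * math.log(qx / p[x]) for x, qx in q.items() if qx > 0)

# (X, q) --(f, s)--> (Y, r) --(g, t)--> (Z, u), with Z a one-point set
q = {'a': 0.2, 'b': 0.3, 'c': 0.5}
r = {'u': 0.5, 'v': 0.5}
u = {'w': 1.0}
s = {'u': {'a': 0.5, 'b': 0.5, 'c': 0.0}, 'v': {'a': 0.0, 'b': 0.0, 'c': 1.0}}
t = {'w': {'u': 0.3, 'v': 0.7}}

p_fs = {x: sum(s[y][x] * r[y] for y in r) for x in q}   # prior (1) for (f, s)
p_gt = {y: sum(t[z][y] * u[z] for z in u) for y in r}   # prior (1) for (g, t)

# composite hypothesis: (st)_{xz} = sum over y of s_{xy} t_{yz}
st = {z: {x: sum(s[y][x] * t[z][y] for y in r) for x in q} for z in u}
p_comp = {x: sum(st[z][x] * u[z] for z in u) for x in q}

lhs = rel_entropy(q, p_comp)                        # F0 of the composite
rhs = rel_entropy(r, p_gt) + rel_entropy(q, p_fs)   # F0(g,t) + F0(f,s)
assert abs(lhs - rhs) < 1e-12
```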
<p>We prove that <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> is a functor.  However, we go much further: we <i>characterize</i> relative entropy by saying that up to a constant multiple, <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> is the <i>unique</i> functor from <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> to <img src='https://s0.wp.com/latex.php?latex=%5B0%2C%5Cinfty%5D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='[0,&#92;infty]' title='[0,&#92;infty]' class='latex' /> obeying three reasonable conditions.</p>
<p>The first condition is that <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> vanishes on morphisms  <img src='https://s0.wp.com/latex.php?latex=%28f%2Cs%29+%3A+%28X%2Cq%29+%5Cto+%28Y%2Cr%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(f,s) : (X,q) &#92;to (Y,r)' title='(f,s) : (X,q) &#92;to (Y,r)' class='latex' />   where the hypothesis <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> is <b>optimal</b>.  By this, we mean that Equation (1) gives a prior <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> equal to the &#8216;true&#8217; probability distribution <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> on the states of the system being measured.  </p>
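<p>For instance, the hypothesis is optimal when s<sub>xy</sub> is the exact conditional probability q<sub>x</sub>/r<sub>y</sub> for each x in the fibre over y.  A quick Python check (toy data, continuing the illustrations above):</p>

```python
import math

q = {'a': 0.2, 'b': 0.3, 'c': 0.5}
f = {'a': 'u', 'b': 'u', 'c': 'v'}
r = {'u': 0.5, 'v': 0.5}                     # pushforward of q along f

# optimal hypothesis: the conditional distribution of x given the outcome y
s = {y: {x: (q[x] / r[y] if f[x] == y else 0.0) for x in q} for y in r}

p = {x: sum(s[y][x] * r[y] for y in r) for x in q}   # prior from equation (1)
assert all(abs(p[x] - q[x]) < 1e-12 for x in q)      # prior = 'true' distribution q

S = sum(qx * math.log(qx / p[x]) for x, qx in q.items() if qx > 0)
assert abs(S) < 1e-12                                # so the relative entropy vanishes
```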
<p>The second condition is that <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> is lower semicontinuous. The set <img src='https://s0.wp.com/latex.php?latex=P%28X%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='P(X)' title='P(X)' class='latex' /> of probability distributions on a finite set <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> naturally has the topology of an <img src='https://s0.wp.com/latex.php?latex=%28n-1%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(n-1)' title='(n-1)' class='latex' />-simplex when <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> has <img src='https://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> elements.   The set <img src='https://s0.wp.com/latex.php?latex=%5B0%2C%5Cinfty%5D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='[0,&#92;infty]' title='[0,&#92;infty]' class='latex' /> has an obvious topology where it&#8217;s homeomorphic to a closed interval.   However, with these topologies, the relative entropy does not define a continuous function</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cbegin%7Barray%7D%7Brcl%7D+++++++++S+%3A+P%28X%29+%5Ctimes+P%28X%29+%26%5Cto%26+%5B0%2C%5Cinfty%5D++%5C%5C++++++++++++++++++++++++++++++++++++++++++++%28q%2Cp%29+%26%5Cmapsto+%26+S%28q%2Cp%29+.++%5Cend%7Barray%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;begin{array}{rcl}         S : P(X) &#92;times P(X) &amp;&#92;to&amp; [0,&#92;infty]  &#92;&#92;                                            (q,p) &amp;&#92;mapsto &amp; S(q,p) .  &#92;end{array}' title='&#92;begin{array}{rcl}         S : P(X) &#92;times P(X) &amp;&#92;to&amp; [0,&#92;infty]  &#92;&#92;                                            (q,p) &amp;&#92;mapsto &amp; S(q,p) .  &#92;end{array}' class='latex' /></p>
<p>The problem is that </p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+S%28q%2Cp%29+%3D+%5Csum_%7Bx%5Cin+X%7D+q_x+%5Cln%5Cleft%28+%5Cfrac%7Bq_x%7D%7Bp_x%7D+%5Cright%29+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ S(q,p) = &#92;sum_{x&#92;in X} q_x &#92;ln&#92;left( &#92;frac{q_x}{p_x} &#92;right) } ' title='&#92;displaystyle{ S(q,p) = &#92;sum_{x&#92;in X} q_x &#92;ln&#92;left( &#92;frac{q_x}{p_x} &#92;right) } ' class='latex' /></p>
<p>and we define <img src='https://s0.wp.com/latex.php?latex=q_x+%5Cln%28q_x%2Fp_x%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_x &#92;ln(q_x/p_x)' title='q_x &#92;ln(q_x/p_x)' class='latex' /> to be <img src='https://s0.wp.com/latex.php?latex=%5Cinfty&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;infty' title='&#92;infty' class='latex' /> when <img src='https://s0.wp.com/latex.php?latex=p_x+%3D+0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_x = 0' title='p_x = 0' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=q_x+%3E+0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_x &gt; 0' title='q_x &gt; 0' class='latex' /> but <img src='https://s0.wp.com/latex.php?latex=0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='0' title='0' class='latex' /> when <img src='https://s0.wp.com/latex.php?latex=p_x+%3D+q_x+%3D+0.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_x = q_x = 0.' title='p_x = q_x = 0.' class='latex' />  So, it turns out that <img src='https://s0.wp.com/latex.php?latex=S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S' title='S' class='latex' /> is only <b>lower semicontinuous</b>, meaning that if <img src='https://s0.wp.com/latex.php?latex=p%5Ei+%2C+q%5Ei&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p^i , q^i' title='p^i , q^i' class='latex' /> are sequences of probability distributions on <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> with <img src='https://s0.wp.com/latex.php?latex=p%5Ei+%5Cto+p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p^i &#92;to p' title='p^i &#92;to p' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=q%5Ei+%5Cto+q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q^i &#92;to q' title='q^i &#92;to q' class='latex' /> then</p>
<p><img src='https://s0.wp.com/latex.php?latex=S%28q%2Cp%29+%5Cle+%5Climinf_%7Bi+%5Cto+%5Cinfty%7D+S%28q%5Ei%2C+p%5Ei%29++&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S(q,p) &#92;le &#92;liminf_{i &#92;to &#92;infty} S(q^i, p^i)  ' title='S(q,p) &#92;le &#92;liminf_{i &#92;to &#92;infty} S(q^i, p^i)  ' class='latex' /></p>
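<p>A two-point example shows that the inequality can be strict, so relative entropy really is only lower semicontinuous.  Here is a sketch in Python (our own toy sequence):</p>

```python
import math

def rel_entropy(q, p):
    total = 0.0
    for qx, px in zip(q, p):
        if qx == 0:
            continue
        if px == 0:
            return math.inf
        total += qx * math.log(qx / px)
    return total

# Fix p = (0, 1) and let q^i = (1/i, 1 - 1/i) -> q = (0, 1).
p = (0.0, 1.0)
values = [rel_entropy((1 / i, 1 - 1 / i), p) for i in (10, 100, 1000)]
assert all(v == math.inf for v in values)    # S(q^i, p) is infinite for every i...
assert rel_entropy((0.0, 1.0), p) == 0.0     # ...but S(q, p) = 0 at the limit
```

<p>Here the limit value 0 is strictly below the liminf, which is infinite: exactly what lower semicontinuity allows and continuity forbids.</p>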
<p>We give the set of morphisms in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' />  its most obvious topology, and show that with this topology, <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> maps morphisms to morphisms in a lower semicontinuous way.  </p>
<p>The third condition is that <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> is convex linear.  We describe how to take convex linear combinations of morphisms in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat},' title='&#92;mathrm{FinStat},' class='latex' /> and then the functor <img src='https://s0.wp.com/latex.php?latex=F_0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F_0' title='F_0' class='latex' /> is convex linear in the sense that it maps any convex linear combination of morphisms in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> to the corresponding convex linear combination of numbers in <img src='https://s0.wp.com/latex.php?latex=%5B0%2C%5Cinfty%5D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='[0,&#92;infty].' title='[0,&#92;infty].' class='latex' />  Intuitively, this means that if we take a coin with probability <img src='https://s0.wp.com/latex.php?latex=P&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='P' title='P' class='latex' /> of landing heads up, and flip it to decide whether to perform one measurement process or another, the expected information gained is <img src='https://s0.wp.com/latex.php?latex=P&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='P' title='P' class='latex' /> times the expected information gain of the first process plus <img src='https://s0.wp.com/latex.php?latex=1-P&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='1-P' title='1-P' class='latex' /> times the expected information gain of the second process.</p>
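<p>The identity behind this condition is easy to verify for relative entropy itself: a convex combination of two processes lives on the disjoint union of their state spaces, with each distribution scaled by the coin&#8217;s bias.  A Python check (the data, and the encoding of the disjoint union as list concatenation, are ours):</p>

```python
import math

def rel_entropy(q, p):
    return sum(qx * math.log(qx / px) for qx, px in zip(q, p) if qx > 0)

lam = 0.3                          # probability of heads
q1, p1 = [0.2, 0.8], [0.5, 0.5]    # posterior and prior for the first process
q2, p2 = [0.6, 0.4], [0.4, 0.6]    # posterior and prior for the second process

# distributions on the disjoint union, weighted by the coin flip
q = [lam * x for x in q1] + [(1 - lam) * x for x in q2]
p = [lam * x for x in p1] + [(1 - lam) * x for x in p2]

lhs = rel_entropy(q, p)
rhs = lam * rel_entropy(q1, p1) + (1 - lam) * rel_entropy(q2, p2)
assert abs(lhs - rhs) < 1e-12
```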
<p>Here, then, is our main theorem:</p>
<p><b>Theorem.</b>  Any lower semicontinuous, convex-linear functor</p>
<p><img src='https://s0.wp.com/latex.php?latex=F+%3A+%5Cmathrm%7BFinStat%7D+%5Cto+%5B0%2C%5Cinfty%5D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F : &#92;mathrm{FinStat} &#92;to [0,&#92;infty] ' title='F : &#92;mathrm{FinStat} &#92;to [0,&#92;infty] ' class='latex' /></p>
<p>that vanishes on every morphism with an optimal hypothesis must equal some constant times the relative entropy.  In other words, there exists some constant <img src='https://s0.wp.com/latex.php?latex=c+%5Cin+%5B0%2C%5Cinfty%5D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='c &#92;in [0,&#92;infty]' title='c &#92;in [0,&#92;infty]' class='latex' /> such that </p>
<p><img src='https://s0.wp.com/latex.php?latex=F%28f%2Cs%29+%3D+c+F_0%28f%2Cs%29++&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F(f,s) = c F_0(f,s)  ' title='F(f,s) = c F_0(f,s)  ' class='latex' /></p>
<p>for any morphism <img src='https://s0.wp.com/latex.php?latex=%28f%2Cs%29+%3A+%28X%2Cq%29+%5Cto+%28Y%2Cr%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='(f,s) : (X,q) &#92;to (Y,r)' title='(f,s) : (X,q) &#92;to (Y,r)' class='latex' /> in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}.' title='&#92;mathrm{FinStat}.' class='latex' /></p>
<h3> Remarks </h3>
<p>If you&#8217;re a maniacally thorough reader of this blog, with a photographic memory, you&#8217;ll recall that our theorem now says &#8216;lower semicontinuous&#8217;, where in <a href="https://johncarlosbaez.wordpress.com/2013/07/02/relative-entropy-part-2/">Part 2</a> of this series I&#8217;d originally said &#8216;continuous&#8217;.  </p>
<p>I&#8217;ve fixed that blog article now&#8230; but it was Tobias who noticed this mistake.  In the process of fixing our proof to address this issue, he eventually noticed that the proof of Petz&#8217;s theorem, which we&#8217;d been planning to use in our work, was also flawed.</p>
<p>Now I just need to finish polishing the rest of the paper!  </p>
]]></html></oembed>