<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Azimuth]]></provider_name><provider_url><![CDATA[https://johncarlosbaez.wordpress.com]]></provider_url><author_name><![CDATA[John Baez]]></author_name><author_url><![CDATA[https://johncarlosbaez.wordpress.com/author/johncarlosbaez/]]></author_url><title><![CDATA[Relative Entropy (Part&nbsp;2)]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>In the <a href="https://johncarlosbaez.wordpress.com/2013/06/20/relative-entropy-part-1/">first part</a> of this mini-series, I describe how various ideas important in probability theory arise naturally when you start doing linear algebra using only the nonnegative real numbers.  </p>
<p>But after writing it, I got an email from a rather famous physicist saying he got &#8220;lost at line two&#8221;.  So, you&#8217;ll be happy to hear that the first part is <i>not a prerequisite</i> for the remaining parts!  I wrote it just to intimidate that guy.</p>
<p>Tobias Fritz and I have proved a theorem characterizing the concept of <a href="http://math.ucr.edu/home/baez/information/information_geometry_6.html">relative entropy</a>, which is also known as &#8216;relative information&#8217;, &#8216;information gain&#8217; or&#8212;most terrifying and least helpful of all&#8212;&#8216;Kullback-Leibler divergence&#8217;.   In this second part I&#8217;ll introduce two key players in this theorem.  The first, <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat},' title='&#92;mathrm{FinStat},' class='latex' /> is a category where:</p>
<p>&bull; an object consists of a system with finitely many states, and a probability distribution on those states</p>
<p>and</p>
<p>&bull; a morphism consists of a deterministic &#8216;measurement process&#8217; mapping states of one system to states of another, together with a &#8216;hypothesis&#8217; that lets the observer guess a probability distribution of states of the system being measured, based on what they observe.</p>
<p>The second, <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP},' title='&#92;mathrm{FP},' class='latex' /> is a subcategory of <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}.' title='&#92;mathrm{FinStat}.' class='latex' />  It has all the same objects, but only morphisms where the hypothesis is &#8216;optimal&#8217;.  This means that if the observer measures the system many times, and uses the probability distribution of their observations together with their hypothesis to guess the probability distribution of states of the system, they <i>get the correct answer</i> (in the limit of many measurements).</p>
<p>In this part all I will really do is explain precisely what <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' /> are.  But to whet your appetite, let me explain how we can use them to give a new characterization of relative entropy!</p>
<p>Suppose we have any morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}.' title='&#92;mathrm{FinStat}.' class='latex' />  In other words: suppose we have a deterministic measurement process, together with a hypothesis that lets the observer guess a probability distribution of states of the system being measured, based on what they observe.  </p>
<p>Then we have <i>two</i> probability distributions on the states of the system being measured!  First, the &#8216;true&#8217; probability distribution.  Second, the probability that the observer will guess based on their observations.</p>
<p>Whenever we have two probability distributions on the same set, we can compute the entropy of the first <i>relative to</i> the second.  This describes how surprised you&#8217;ll be if you discover the probability distribution is really the first, when you thought it was the second.</p>
<p>So: any morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> will have a relative entropy.  It will describe how surprised the observer will be when they discover the true probability distribution, given what they had guessed.  </p>
<p>But this amount of surprise will be <i>zero</i> if their hypothesis was &#8216;optimal&#8217; in the sense I described.   So, the relative entropy will vanish on morphisms in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}.' title='&#92;mathrm{FP}.' class='latex' /></p>
<p>Our theorem says this fact almost characterizes the concept of relative entropy!  More precisely, it says that any convex-linear lower semicontinuous functor</p>
<p><img src='https://s0.wp.com/latex.php?latex=F+%3A+%5Cmathrm%7BFinStat%7D+%5Cto+%5B0%2C%5Cinfty%5D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='F : &#92;mathrm{FinStat} &#92;to [0,&#92;infty] ' title='F : &#92;mathrm{FinStat} &#92;to [0,&#92;infty] ' class='latex' /></p>
<p>that vanishes on the subcategory <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' /> must equal some constant times the relative entropy.  </p>
<p>Don&#8217;t be scared!  This should not make sense to you yet, since I haven&#8217;t said how I&#8217;m thinking of <img src='https://s0.wp.com/latex.php?latex=%5B0%2C%2B%5Cinfty%5D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='[0,+&#92;infty] ' title='[0,+&#92;infty] ' class='latex' /> as a category, nor what a &#8216;convex-linear lower semicontinuous functor&#8217; is, nor how relative entropy gives one.  I will explain all that later.  I just want you to get a vague idea of where I&#8217;m going.  </p>
<p>Now let me explain the categories <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}.' title='&#92;mathrm{FP}.' class='latex' />  We need to warm up a bit first.</p>
<h3> FinStoch </h3>
<p>A stochastic map <img src='https://s0.wp.com/latex.php?latex=f+%3A+X+%5Cleadsto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f : X &#92;leadsto Y' title='f : X &#92;leadsto Y' class='latex' /> is different from an ordinary function, because instead of assigning a unique element of <img src='https://s0.wp.com/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y' title='Y' class='latex' /> to each element of <img src='https://s0.wp.com/latex.php?latex=X%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X,' title='X,' class='latex' /> it assigns a <i>probability distribution on</i> <img src='https://s0.wp.com/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y' title='Y' class='latex' /> to each element of <img src='https://s0.wp.com/latex.php?latex=X.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X.' title='X.' class='latex' />  So you should imagine it as being like a function &#8216;with random noise added&#8217;, so that <img src='https://s0.wp.com/latex.php?latex=f%28x%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f(x)' title='f(x)' class='latex' /> is not a specific element of <img src='https://s0.wp.com/latex.php?latex=Y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y,' title='Y,' class='latex' /> but instead has a probability of taking on different values.  This is why I&#8217;m using a weird wiggly arrow to denote a stochastic map.</p>
<p>More formally:</p>
<p><b>Definition.</b>  Given finite sets <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=Y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y,' title='Y,' class='latex' /> a <b>stochastic map</b> <img src='https://s0.wp.com/latex.php?latex=f+%3A+X+%5Cleadsto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f : X &#92;leadsto Y' title='f : X &#92;leadsto Y' class='latex' /> assigns a real number <img src='https://s0.wp.com/latex.php?latex=f_%7Byx%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f_{yx}' title='f_{yx}' class='latex' /> to each pair <img src='https://s0.wp.com/latex.php?latex=x+%5Cin+X%2C+y+%5Cin+Y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x &#92;in X, y &#92;in Y,' title='x &#92;in X, y &#92;in Y,' class='latex' /> such that fixing any element <img src='https://s0.wp.com/latex.php?latex=x%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x,' title='x,' class='latex' /> the numbers <img src='https://s0.wp.com/latex.php?latex=f_%7Byx%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f_{yx}' title='f_{yx}' class='latex' /> form a probability distribution on <img src='https://s0.wp.com/latex.php?latex=Y.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y.' title='Y.' class='latex' />  We call <img src='https://s0.wp.com/latex.php?latex=f_%7Byx%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f_{yx}' title='f_{yx}' class='latex' /> <b>the probability of <img src='https://s0.wp.com/latex.php?latex=y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y' title='y' class='latex' /> given <img src='https://s0.wp.com/latex.php?latex=x.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x.' title='x.' class='latex' /></b></p>
<p>In more detail:</p>
<p>&bull; <img src='https://s0.wp.com/latex.php?latex=f_%7Byx%7D+%5Cge+0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f_{yx} &#92;ge 0' title='f_{yx} &#92;ge 0' class='latex' /> for all <img src='https://s0.wp.com/latex.php?latex=x+%5Cin+X%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x &#92;in X,' title='x &#92;in X,' class='latex' /> <img src='https://s0.wp.com/latex.php?latex=y+%5Cin+Y.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y &#92;in Y.' title='y &#92;in Y.' class='latex' /></p>
<p>and</p>
<p>&bull; <img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Csum_%7By+%5Cin+Y%7D+f_%7Byx%7D+%3D+1%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;sum_{y &#92;in Y} f_{yx} = 1}' title='&#92;displaystyle{ &#92;sum_{y &#92;in Y} f_{yx} = 1}' class='latex' /> for all <img src='https://s0.wp.com/latex.php?latex=x+%5Cin+X.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x &#92;in X.' title='x &#92;in X.' class='latex' /></p>
<p>Note that we can think of <img src='https://s0.wp.com/latex.php?latex=f+%3A+X+%5Cleadsto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f : X &#92;leadsto Y' title='f : X &#92;leadsto Y' class='latex' /> as a <img src='https://s0.wp.com/latex.php?latex=Y+%5Ctimes+X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y &#92;times X' title='Y &#92;times X' class='latex' />-shaped matrix of numbers.  A matrix obeying the two properties above is called <b>stochastic</b>.  This viewpoint is nice because it reduces the problem of composing stochastic maps to matrix multiplication.  It&#8217;s easy to check that multiplying two stochastic matrices gives a stochastic matrix.  So, composing stochastic maps gives a stochastic map.  </p>
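<p>Here is a small numpy sketch of that matrix viewpoint (the matrices are made-up examples): a stochastic map is a column-stochastic matrix, and composing stochastic maps is just matrix multiplication, which automatically preserves stochasticity.</p>

```python
import numpy as np

# A stochastic map f : X ~> Y as a |Y| x |X| matrix: entry f[y, x] is the
# probability of y given x, so each column sums to 1.
f = np.array([[0.7, 0.2],
              [0.3, 0.8]])          # |X| = 2, |Y| = 2

g = np.array([[0.5, 0.1],
              [0.4, 0.6],
              [0.1, 0.3]])          # g : Y ~> Z with |Z| = 3

def is_stochastic(m):
    """Nonnegative entries, and every column sums to 1."""
    return bool(np.all(m >= 0)) and np.allclose(m.sum(axis=0), 1.0)

assert is_stochastic(f) and is_stochastic(g)

# Composition of stochastic maps is matrix multiplication, and the
# product of stochastic matrices is again stochastic.
gf = g @ f
assert is_stochastic(gf)
```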
<p>We thus get a category:</p>
<p><b>Definition.</b>  Let <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStoch%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStoch}' title='&#92;mathrm{FinStoch}' class='latex' /> be the category of finite sets and stochastic maps between them.  </p>
<p>In case you&#8217;re wondering why I&#8217;m restricting attention to <i>finite</i> sets, it&#8217;s merely because I want to keep things simple.  I don&#8217;t want to worry about whether sums or integrals converge.</p>
<h3> FinProb </h3>
<p>Now take your favorite 1-element set and call it <img src='https://s0.wp.com/latex.php?latex=1.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='1.' title='1.' class='latex' />  A function <img src='https://s0.wp.com/latex.php?latex=p+%3A+1+%5Cto+X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p : 1 &#92;to X' title='p : 1 &#92;to X' class='latex' /> is just a point of <img src='https://s0.wp.com/latex.php?latex=X.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X.' title='X.' class='latex' />  But a stochastic map <img src='https://s0.wp.com/latex.php?latex=p+%3A+1+%5Cleadsto+X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p : 1 &#92;leadsto X' title='p : 1 &#92;leadsto X' class='latex' /> is something more interesting: it&#8217;s a probability distribution on <img src='https://s0.wp.com/latex.php?latex=X.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X.' title='X.' class='latex' />  </p>
<p>Why?  Because it gives a probability distribution on <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> for each element of <img src='https://s0.wp.com/latex.php?latex=1%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='1,' title='1,' class='latex' /> but that set has just one element.</p>
<p>Last time I introduced the rather long-winded phrase <b>finite probability measure space</b> to mean a finite set with a probability distribution on it.  But now we&#8217;ve seen a very quick way to describe such a thing within <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStoch%7D%3A&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStoch}:' title='&#92;mathrm{FinStoch}:' class='latex' /></p>
<div align="center">
<img height="150" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_object.jpg" />
</div>
<p>And this gives a quick way to think about a measure-preserving function between finite probability measure spaces!  It&#8217;s just a commutative triangle like this:</p>
<div align="center">
<img height="200" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_morphism.jpg" />
</div>
<p>Note that the horizontal arrow <img src='https://s0.wp.com/latex.php?latex=f%3A++X+%5Cto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f:  X &#92;to Y' title='f:  X &#92;to Y' class='latex' /> is not wiggly. The straight arrow means it&#8217;s an honest function, not a stochastic map.  But a function is a special case of a stochastic map!  So it makes sense to compose a straight arrow with a wiggly arrow&#8212;and the result is, in general, a wiggly arrow.  So, it makes sense to demand that this triangle commutes, and this says that the function <img src='https://s0.wp.com/latex.php?latex=f%3A+X+%5Cto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f: X &#92;to Y' title='f: X &#92;to Y' class='latex' /> is measure-preserving.  </p>
<p>Let me work through the details, in case they&#8217;re not clear.</p>
<p>First: how is a function a special case of a stochastic map?  Here&#8217;s how.  If we start with a function <img src='https://s0.wp.com/latex.php?latex=f%3A+X+%5Cto+Y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f: X &#92;to Y,' title='f: X &#92;to Y,' class='latex' /> we get a matrix of numbers</p>
<p><img src='https://s0.wp.com/latex.php?latex=f_%7Byx%7D+%3D+%5Cdelta_%7By%2Cf%28x%29%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f_{yx} = &#92;delta_{y,f(x)} ' title='f_{yx} = &#92;delta_{y,f(x)} ' class='latex' /></p>
<p>where <img src='https://s0.wp.com/latex.php?latex=%5Cdelta&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;delta' title='&#92;delta' class='latex' /> is the Kronecker delta.  So, each element <img src='https://s0.wp.com/latex.php?latex=x+%5Cin+X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x &#92;in X' title='x &#92;in X' class='latex' /> gives a probability distribution that&#8217;s zero except at <img src='https://s0.wp.com/latex.php?latex=f%28x%29.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f(x).' title='f(x).' class='latex' /></p>
<p>Given this, we can work out what this commuting triangle really says:</p>
<div align="center">
<img height="200" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_morphism.jpg" />
</div>
<p>If we use <img src='https://s0.wp.com/latex.php?latex=p_x&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_x' title='p_x' class='latex' /> to stand for the probability distribution that <img src='https://s0.wp.com/latex.php?latex=p%3A+1+%5Cleadsto+X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p: 1 &#92;leadsto X' title='p: 1 &#92;leadsto X' class='latex' /> puts on <img src='https://s0.wp.com/latex.php?latex=X%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X,' title='X,' class='latex' /> and similarly for <img src='https://s0.wp.com/latex.php?latex=q_y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_y,' title='q_y,' class='latex' /> the commuting triangle says</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+q_y+%3D+%5Csum_%7Bx+%5Cin+X%7D+%5Cdelta_%7By%2Cf%28x%29%7D+p_x%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in X} &#92;delta_{y,f(x)} p_x} ' title='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in X} &#92;delta_{y,f(x)} p_x} ' class='latex' /></p>
<p>or in other words:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+q_y+%3D+%5Csum_%7Bx+%5Cin+X+%3A+f%28x%29+%3D+y%7D+p_x+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in X : f(x) = y} p_x } ' title='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in X : f(x) = y} p_x } ' class='latex' /></p>
<p>or if you like:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+q_y+%3D+%5Csum_%7Bx+%5Cin+f%5E%7B-1%7D%28y%29%7D+p_x+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in f^{-1}(y)} p_x } ' title='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in f^{-1}(y)} p_x } ' class='latex' /></p>
<p>In this situation people say <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> is <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> <b>pushed forward along <img src='https://s0.wp.com/latex.php?latex=f&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f' title='f' class='latex' /></b>, and they say <img src='https://s0.wp.com/latex.php?latex=f&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f' title='f' class='latex' /> is a <b>measure-preserving function</b>.</p>
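<p>In code, pushing a distribution forward along a function is just summing probabilities over preimages. A tiny plain-Python sketch, with made-up sets and numbers:</p>

```python
X = ['a', 'b', 'c']
Y = ['u', 'v']
f = {'a': 'u', 'b': 'u', 'c': 'v'}    # an ordinary function f : X -> Y
p = {'a': 0.5, 'b': 0.25, 'c': 0.25}  # a probability distribution on X

# Pushforward: q_y is the sum of p_x over the preimage f^{-1}(y).
q = {y: sum(p[x] for x in X if f[x] == y) for y in Y}
assert q == {'u': 0.75, 'v': 0.25}
```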
<p>So, we&#8217;ve used <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStoch%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStoch}' title='&#92;mathrm{FinStoch}' class='latex' /> to describe another important category:</p>
<p><b>Definition.</b>  Let <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinProb%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinProb}' title='&#92;mathrm{FinProb}' class='latex' /> be the category of finite probability measure spaces and measure-preserving functions between them.</p>
<p>I can&#8217;t resist mentioning another variation:</p>
<div align="center">
<img height="200" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/measure-preserving_stochastic_map.jpg" />
</div>
<p>A commuting triangle like this is a <b>measure-preserving stochastic map</b>.  In other words, <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> gives a probability measure on <img src='https://s0.wp.com/latex.php?latex=X%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X,' title='X,' class='latex' /> <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> gives a probability measure on <img src='https://s0.wp.com/latex.php?latex=Y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y,' title='Y,' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=f%3A+X+%5Cleadsto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f: X &#92;leadsto Y' title='f: X &#92;leadsto Y' class='latex' /> is a stochastic map with </p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+q_y+%3D+%5Csum_%7Bx+%5Cin+X%7D+f_%7Byx%7D+p_x+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in X} f_{yx} p_x } ' title='&#92;displaystyle{ q_y = &#92;sum_{x &#92;in X} f_{yx} p_x } ' class='latex' /></p>
<h3> FinStat </h3>
<p>The category we really need for relative entropy is a bit more subtle.  An object is a finite probability measure space:</p>
<div align="center">
<img height="150" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_object.jpg" />
</div>
<p>but a morphism looks like this:</p>
<div align="center">
<img height="250" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinStat_morphism.jpg" />
</div>
<p>The whole diagram doesn&#8217;t commute, but the two equations I wrote down hold.  The first equation says that <img src='https://s0.wp.com/latex.php?latex=f%3A+X+%5Cto+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f: X &#92;to Y' title='f: X &#92;to Y' class='latex' /> is a measure-preserving function.  In other words, this triangle, which we&#8217;ve seen before, commutes:</p>
<div align="center">
<img height="200" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_morphism.jpg" />
</div>
<p>The second equation says that <img src='https://s0.wp.com/latex.php?latex=f+%5Ccirc+s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f &#92;circ s' title='f &#92;circ s' class='latex' /> is the identity, or in math jargon, <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> is a <a href="https://en.wikipedia.org/wiki/Section_%28category_theory%29"><b>section</b></a> for <img src='https://s0.wp.com/latex.php?latex=f.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f.' title='f.' class='latex' /></p>
<p>But what does that <i>really mean?</i></p>
<p>The idea is that <img src='https://s0.wp.com/latex.php?latex=X&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X' title='X' class='latex' /> is the set of &#8216;states&#8217; of some system, while <img src='https://s0.wp.com/latex.php?latex=Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='Y' title='Y' class='latex' /> is a set of possible &#8216;observations&#8217; you might make.  The function <img src='https://s0.wp.com/latex.php?latex=f&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f' title='f' class='latex' /> is a &#8216;measurement process&#8217;. You &#8216;measure&#8217; the system using <img src='https://s0.wp.com/latex.php?latex=f%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f,' title='f,' class='latex' /> and if the system is in the state <img src='https://s0.wp.com/latex.php?latex=x&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x' title='x' class='latex' /> you get the observation <img src='https://s0.wp.com/latex.php?latex=f%28x%29.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f(x).' title='f(x).' class='latex' />  The probability distribution <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> says the probability that the system is in any given state, while <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> says the probability that you get any given observation when you do your measurement.</p>
<p>Note: we are assuming for now that there&#8217;s no random noise in the observation process!  That&#8217;s why <img src='https://s0.wp.com/latex.php?latex=f&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f' title='f' class='latex' /> is a function instead of a stochastic map. </p>
<p>But what about <img src='https://s0.wp.com/latex.php?latex=s%3F&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s?' title='s?' class='latex' />  That&#8217;s the fun part: <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> describes your &#8216;hypothesis&#8217; about the system&#8217;s state given a particular measurement!  If you measure the system and get a result <img src='https://s0.wp.com/latex.php?latex=y+%5Cin+Y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y &#92;in Y,' title='y &#92;in Y,' class='latex' /> you guess it&#8217;s in the state <img src='https://s0.wp.com/latex.php?latex=x&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x' title='x' class='latex' /> with probability <img src='https://s0.wp.com/latex.php?latex=s_%7Bxy%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{xy}.' title='s_{xy}.' class='latex' />  </p>
<p>And we don&#8217;t want this hypothesis to be really dumb: that&#8217;s what</p>
<p><img src='https://s0.wp.com/latex.php?latex=f+%5Ccirc+s+%3D+1_Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f &#92;circ s = 1_Y' title='f &#92;circ s = 1_Y' class='latex' /></p>
<p>says.  You see, this equation says that</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Csum_%7Bx+%5Cin+X%7D+%5Cdelta_%7By%27%2C+f%28x%29%7D+s_%7Bxy%7D+%3D+%5Cdelta_%7By%27+y%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;sum_{x &#92;in X} &#92;delta_{y&#039;, f(x)} s_{xy} = &#92;delta_{y&#039; y} ' title='&#92;sum_{x &#92;in X} &#92;delta_{y&#039;, f(x)} s_{xy} = &#92;delta_{y&#039; y} ' class='latex' /></p>
<p>or in other words:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Csum_%7Bx+%5Cin+f%5E%7B-1%7D%28y%27%29%7D+s_%7Bxy%7D+%3D+%5Cdelta_%7By%27+y%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;sum_{x &#92;in f^{-1}(y&#039;)} s_{xy} = &#92;delta_{y&#039; y} ' title='&#92;sum_{x &#92;in f^{-1}(y&#039;)} s_{xy} = &#92;delta_{y&#039; y} ' class='latex' /></p>
<p>If you think about it, this implies <img src='https://s0.wp.com/latex.php?latex=s_%7Bxy%7D+%3D+0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{xy} = 0' title='s_{xy} = 0' class='latex' /> unless <img src='https://s0.wp.com/latex.php?latex=f%28x%29+%3D+y.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f(x) = y.' title='f(x) = y.' class='latex' />   </p>
<p>So, if you make an observation <img src='https://s0.wp.com/latex.php?latex=y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y,' title='y,' class='latex' /> you will guess the system is in state <img src='https://s0.wp.com/latex.php?latex=x&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x' title='x' class='latex' /> with <i>probability zero</i> unless <img src='https://s0.wp.com/latex.php?latex=f%28x%29+%3D+y.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f(x) = y.' title='f(x) = y.' class='latex' />  In short, you won&#8217;t make a really dumb guess about the system&#8217;s state.</p>
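<p>The section equation is easy to check numerically. A minimal numpy sketch, with a made-up measurement and hypothesis: the function becomes a 0/1 Kronecker-delta matrix, the hypothesis only puts probability on states in the right fiber, and their composite is the identity.</p>

```python
import numpy as np

# A deterministic measurement f : X -> Y with X = {0,1,2}, Y = {0,1},
# where f(0) = f(1) = 0 and f(2) = 1, written as the matrix
# F[y, x] = delta_{y, f(x)}.
f_vals = [0, 0, 1]
F = np.zeros((2, 3))
for x, y in enumerate(f_vals):
    F[y, x] = 1.0

# A hypothesis s : Y ~> X (a 3 x 2 column-stochastic matrix) that only
# guesses states in the right fiber: S[x, y] = 0 unless f(x) = y.
S = np.array([[0.6, 0.0],
              [0.4, 0.0],
              [0.0, 1.0]])

# f o s = 1_Y, i.e. s is a section of f.
assert np.allclose(F @ S, np.eye(2))
```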
<p>Here&#8217;s how we compose morphisms:</p>
<div align="center">
<img height="250" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinStat_composition.jpg" />
</div>
<p>We get a measure-preserving function <img src='https://s0.wp.com/latex.php?latex=g+%5Ccirc+f+%3A+X+%5Cto+Z&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='g &#92;circ f : X &#92;to Z' title='g &#92;circ f : X &#92;to Z' class='latex' /> and a stochastic map going back, <img src='https://s0.wp.com/latex.php?latex=s+%5Ccirc+t+%3A+Z+%5Cleadsto+X.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s &#92;circ t : Z &#92;leadsto X.' title='s &#92;circ t : Z &#92;leadsto X.' class='latex' /> You can check that these obey the required equations:</p>
<p><img src='https://s0.wp.com/latex.php?latex=g+%5Ccirc+f+%5Ccirc+p+%3D+r&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='g &#92;circ f &#92;circ p = r' title='g &#92;circ f &#92;circ p = r' class='latex' /></p>
<p><img src='https://s0.wp.com/latex.php?latex=g+%5Ccirc+f+%5Ccirc+s+%5Ccirc+t+%3D+1_Z&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='g &#92;circ f &#92;circ s &#92;circ t = 1_Z' title='g &#92;circ f &#92;circ s &#92;circ t = 1_Z' class='latex' /></p>
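<p>Here is a numpy sketch of that composition check, with all matrices made up for illustration: two section equations for the pieces imply the section equation for the composite.</p>

```python
import numpy as np

# (f, s) : X -> Y, then (g, t) : Y -> Z, with |X| = 3, |Y| = 2, |Z| = 1.
F = np.array([[1.0, 1.0, 0.0],        # f : X -> Y as a delta matrix
              [0.0, 0.0, 1.0]])
S = np.array([[0.6, 0.0],             # hypothesis s : Y ~> X, a section of f
              [0.4, 0.0],
              [0.0, 1.0]])
G = np.array([[1.0, 1.0]])            # g : Y -> Z, Z a one-element set
T = np.array([[0.3],                  # hypothesis t : Z ~> Y, a section of g
              [0.7]])

assert np.allclose(F @ S, np.eye(2))  # f o s = 1_Y
assert np.allclose(G @ T, np.eye(1))  # g o t = 1_Z

# The composite morphism has measurement G @ F and hypothesis S @ T,
# and the section equation g o f o s o t = 1_Z follows:
assert np.allclose((G @ F) @ (S @ T), np.eye(1))
```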
<p>So, we get a category:</p>
<p><b>Definition.</b>  Let <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> be the category where an object is a finite probability measure space:</p>
<div align="center">
<img height="150" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_object.jpg" />
</div>
<p>a morphism is a diagram obeying these equations:</p>
<div align="center">
<img height="250" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinStat_morphism.jpg" />
</div>
<p>and composition is defined as above.</p>
<h3> FP  </h3>
<p>As we&#8217;ve just seen, a morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> consists of a &#8216;measurement process&#8217; <img src='https://s0.wp.com/latex.php?latex=f&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f' title='f' class='latex' /> and a &#8216;hypothesis&#8217; <img src='https://s0.wp.com/latex.php?latex=s%3A&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s:' title='s:' class='latex' /></p>
<div align="center">
<img height="250" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinStat_morphism.jpg" />
</div>
<p>But sometimes we&#8217;re lucky and our hypothesis is optimal, in the sense that </p>
<p><img src='https://s0.wp.com/latex.php?latex=s+%5Ccirc+q+%3D+p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s &#92;circ q = p' title='s &#92;circ q = p' class='latex' /></p>
<p>Conceptually, this says that if you take the probability distribution <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> on our observations and use it to guess a probability distribution for the system&#8217;s state using our hypothesis <img src='https://s0.wp.com/latex.php?latex=s%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s,' title='s,' class='latex' /> you <i>get the correct answer</i>: <img src='https://s0.wp.com/latex.php?latex=p.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p.' title='p.' class='latex' />   </p>
<p>Mathematically, it says that this diagram commutes:</p>
<div align="center">
<img height="200" src="https://i0.wp.com/math.ucr.edu/home/baez/mathematical/measure-preserving_stochastic_map_2.jpg" />
</div>
<p>In other words, <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> is a measure-preserving stochastic map.</p>
<p>There&#8217;s a subcategory of <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> with all the same objects, but only these &#8216;optimal&#8217; morphisms.  It&#8217;s important, but the name we have for it is not very exciting:</p>
<p><b>Definition.</b>  Let <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' /> be the subcategory of <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStat%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStat}' title='&#92;mathrm{FinStat}' class='latex' /> where an object is a finite probability measure space</p>
<div align="center">
<img height="150" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_object.jpg" />
</div>
<p>and a morphism is a diagram obeying these equations:</p>
<div align="center">
<img height="260" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FP_morphism.jpg" />
</div>
<p>Why do we call this category <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' />?  Because it&#8217;s a close relative of <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinProb%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinProb},' title='&#92;mathrm{FinProb},' class='latex' /> where a morphism, you&#8217;ll remember, looks like this:</p>
<div align="center">
<img height="200" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_morphism.jpg" />
</div>
<p>The point is that for a morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP},' title='&#92;mathrm{FP},' class='latex' /> the conditions on <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> are so strong that they completely determine it <i>unless there are observations that happen with probability zero</i>&#8212;that is, unless there are <img src='https://s0.wp.com/latex.php?latex=y+%5Cin+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y &#92;in Y' title='y &#92;in Y' class='latex' /> with <img src='https://s0.wp.com/latex.php?latex=q_y+%3D+0.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_y = 0.' title='q_y = 0.' class='latex' />  To see this, note that </p>
<p><img src='https://s0.wp.com/latex.php?latex=s+%5Ccirc+q+%3D+p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s &#92;circ q = p' title='s &#92;circ q = p' class='latex' /></p>
<p>actually says</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Csum_%7By+%5Cin+Y%7D+s_%7Bxy%7D+q_y+%3D+p_x+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;sum_{y &#92;in Y} s_{xy} q_y = p_x ' title='&#92;sum_{y &#92;in Y} s_{xy} q_y = p_x ' class='latex' /></p>
<p>for any choice of <img src='https://s0.wp.com/latex.php?latex=x+%5Cin+X.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='x &#92;in X.' title='x &#92;in X.' class='latex' />   But we&#8217;ve already seen <img src='https://s0.wp.com/latex.php?latex=s_%7Bxy%7D+%3D+0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{xy} = 0' title='s_{xy} = 0' class='latex' /> unless <img src='https://s0.wp.com/latex.php?latex=f%28x%29+%3D+y%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f(x) = y,' title='f(x) = y,' class='latex' /> so the sum has just one term, and the equation says</p>
<p><img src='https://s0.wp.com/latex.php?latex=s_%7Bx%2Cf%28x%29%7D+q_%7Bf%28x%29%7D+%3D+p_x+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{x,f(x)} q_{f(x)} = p_x ' title='s_{x,f(x)} q_{f(x)} = p_x ' class='latex' />   </p>
<p>We can solve this for <img src='https://s0.wp.com/latex.php?latex=s_%7Bx%2Cf%28x%29%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{x,f(x)},' title='s_{x,f(x)},' class='latex' /> so <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> is completely determined&#8230; <i>unless <img src='https://s0.wp.com/latex.php?latex=q_%7Bf%28x%29%7D+%3D+0.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_{f(x)} = 0.' title='q_{f(x)} = 0.' class='latex' /></i>  </p>
<p>This covers the case when <img src='https://s0.wp.com/latex.php?latex=y+%3D+f%28x%29.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y = f(x).' title='y = f(x).' class='latex' />   We also can&#8217;t figure out <img src='https://s0.wp.com/latex.php?latex=s_%7Bx%2Cy%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s_{x,y}' title='s_{x,y}' class='latex' /> if <img src='https://s0.wp.com/latex.php?latex=y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y' title='y' class='latex' /> isn&#8217;t in the image of <img src='https://s0.wp.com/latex.php?latex=f.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f.' title='f.' class='latex' /></p>
<p>So, to be utterly precise, <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> is determined by <img src='https://s0.wp.com/latex.php?latex=p%2C+q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p, q' title='p, q' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=f&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='f' title='f' class='latex' /> unless there&#8217;s an element <img src='https://s0.wp.com/latex.php?latex=y+%5Cin+Y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y &#92;in Y' title='y &#92;in Y' class='latex' /> that has <img src='https://s0.wp.com/latex.php?latex=q_y+%3D+0.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q_y = 0.' title='q_y = 0.' class='latex' />  Except for this special case, a morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' /> is just a morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinProb%7D.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinProb}.' title='&#92;mathrm{FinProb}.' class='latex' />  But in this special case, a morphism in <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' /> has a little extra information: an arbitrary probability distribution on the inverse image of each point <img src='https://s0.wp.com/latex.php?latex=y&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='y' title='y' class='latex' /> with this property.</p>
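<p>The argument above can be packaged as a small procedure (a sketch, with hypothetical names): solve <i>s</i><sub><i>x</i>,<i>f</i>(<i>x</i>)</sub> = <i>p</i><sub><i>x</i></sub> / <i>q</i><sub><i>f</i>(<i>x</i>)</sub> wherever <i>q</i><sub><i>f</i>(<i>x</i>)</sub> &gt; 0, and note that on fibers over zero-probability observations, nothing is forced, so we must supply a choice; here the sketch fills in a uniform one.</p>

```python
# Sketch with hypothetical names: reconstruct the optimal hypothesis s
# from p, q and f.  Entries over observations y with q_y = 0 are the
# 'extra data' a morphism in FP carries; here we pick a uniform choice.
def optimal_hypothesis(p, q, f):
    nX, nY = len(p), len(q)
    s = [[0.0] * nY for _ in range(nX)]
    for y in range(nY):
        fiber = [x for x in range(nX) if f[x] == y]   # the preimage of y under f
        if q[y] > 0:
            for x in fiber:
                s[x][y] = p[x] / q[y]     # forced by s_{x,f(x)} q_{f(x)} = p_x
        elif fiber:
            for x in fiber:
                s[x][y] = 1.0 / len(fiber)  # arbitrary: uniform on the fiber
    return s

p = [0.5, 0.5, 0.0]
f = [0, 0, 1]
q = [1.0, 0.0]    # the observation y = 1 happens with probability zero
s = optimal_hypothesis(p, q, f)
print(s)          # [[0.5, 0.0], [0.5, 0.0], [0.0, 1.0]]
```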
<p>In short, <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' /> is the same as <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinProb%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinProb}' title='&#92;mathrm{FinProb}' class='latex' /> except that our observer&#8217;s &#8216;optimal hypothesis&#8217; must provide a guess about the state of the system given an observation, <i>even in cases of observations that occur with probability zero.</i></p>
<p>I&#8217;m going into these nitpicky details for two reasons.  First, we&#8217;ll need <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFP%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FP}' title='&#92;mathrm{FP}' class='latex' /> for our characterization of relative entropy.  But second, Tom Leinster <i>already ran into this category</i> in his work on entropy and category theory!  He discussed it here:</p>
<p>&bull; Tom Leinster, <a href="http://golem.ph.utexas.edu/category/2011/05/an_operadic_introduction_to_en.html">An operadic introduction to entropy</a>.</p>
<p>Despite the common theme of entropy, he arrived at it from a very different starting point.</p>
<h3> Conclusion </h3>
<p>So, I hope that next time I can show you something like this:</p>
<div align="center">
<img height="150" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_object.jpg" />
</div>
<p>and you&#8217;ll say &#8220;Oh, that&#8217;s a probability distribution on the states of some system!&#8221;  Intuitively, you should think of the wiggly arrow <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> as picking out a &#8216;random element&#8217; of the set <img src='https://s0.wp.com/latex.php?latex=X.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='X.' title='X.' class='latex' /></p>
<p>I hope I can show you this:</p>
<div align="center">
<img height="200" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_morphism.jpg" />
</div>
<p>and you&#8217;ll say &#8220;Oh, that&#8217;s a deterministic measurement process, sending a probability distribution on the states of the measured system to a probability distribution on observations!&#8221;</p>
<p>I hope I can show you this:</p>
<div align="center">
<img height="250" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FinStat_morphism.jpg" />
</div>
<p>and you&#8217;ll say &#8220;Oh, that&#8217;s a deterministic measurement process, together with a hypothesis about the system&#8217;s state, given what is observed!&#8221;</p>
<p>And I hope I can show you this:</p>
<div align="center">
<img height="260" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/FP_morphism.jpg" />
</div>
<p>and you&#8217;ll say &#8220;Oh, that&#8217;s a deterministic measurement process, together with an <i>optimal</i> hypothesis about the system&#8217;s state, given what is observed!&#8221;</p>
<p>I don&#8217;t count on it&#8230; but I can hope.</p>
<h3> Postscript </h3>
<p>And speaking of unrealistic hopes, if I were <i>really</i> optimistic I would hope you noticed that <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinStoch%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinStoch}' title='&#92;mathrm{FinStoch}' class='latex' /> and  <img src='https://s0.wp.com/latex.php?latex=%5Cmathrm%7BFinProb%7D%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;mathrm{FinProb},' title='&#92;mathrm{FinProb},' class='latex' /> which underlie the more fancy categories I&#8217;ve discussed today, were themselves constructed starting from linear algebra over the nonnegative numbers <img src='https://s0.wp.com/latex.php?latex=%5B0%2C%5Cinfty%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='[0,&#92;infty)' title='[0,&#92;infty)' class='latex' /> in <a href="https://johncarlosbaez.wordpress.com/2013/06/20/relative-entropy-part-1/">Part 1</a>.   That &#8216;foundational&#8217; work is not really needed for what we&#8217;re doing now.  However, I like the fact that we&#8217;re ultimately getting the concept of relative entropy starting from very little: just linear algebra, using only nonnegative numbers!</p>
]]></html><thumbnail_url><![CDATA[https://i2.wp.com/math.ucr.edu/home/baez/mathematical/FinProb_object.jpg?fit=440%2C330]]></thumbnail_url><thumbnail_height><![CDATA[219]]></thumbnail_height><thumbnail_width><![CDATA[87]]></thumbnail_width></oembed>