<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Azimuth]]></provider_name><provider_url><![CDATA[https://johncarlosbaez.wordpress.com]]></provider_url><author_name><![CDATA[John Baez]]></author_name><author_url><![CDATA[https://johncarlosbaez.wordpress.com/author/johncarlosbaez/]]></author_url><title><![CDATA[Sensing and Acting Under Information&nbsp;Constraints]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>I&#8217;m having a great time at a workshop on <a href="https://johncarlosbaez.wordpress.com/2014/01/31/bio-inspired-information-theory/">Biological and Bio-Inspired Information Theory</a> in Banff, Canada.  You can see videos of the talks online.  There have been lots of good talks so far, but this one really blew my mind:</p>
<p>&bull; Naftali Tishby, <a href="http://www.birs.ca/events/2014/5-day-workshops/14w5170/videos/watch/201410281032-Tishby.mp4">Sensing and acting under information constraints&#8212;a principled approach to biology and intelligence</a>, 28 October 2014.</p>
<div align="center">
<img width="200" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/naftali_tishby.jpg" />
</div>
<p>Tishby&#8217;s talk wasn&#8217;t easy for me to follow&#8212;he assumed you already knew rate-distortion theory and the Bellman equation, and I didn&#8217;t&#8212;but it was <i>great!</i></p>
<p>It was about the &#8216;action-perception loop&#8217;:</p>
<div align="center">
<a href="http://www.uni-bielefeld.de/biologie/cns/"><br />
<img width="450" src="https://i0.wp.com/math.ucr.edu/home/baez/mathematical/action-perception_loop.jpg" /></a>
</div>
<p>This is the feedback loop in which living organisms&#8212;like us&#8212;<i>take actions</i> depending on our goals and what we perceive, and <i>perceive things</i> depending on the actions we take and the state of the world.</p>
<p>How do we do this so well?  Among other things, we need to balance the <i>cost</i> of storing information about the <i>past</i> against the <i>payoff</i> of achieving our desired goals in the <i>future</i>.</p>
<p>Tishby presented a detailed yet highly general mathematical model of this!  And he ended by testing the model on experiments with cats listening to music and rats swimming to land.</p>
<p>It&#8217;s beautiful stuff.  I want to learn it.  I hope to blog about it as I understand more.  But for now, let me just dive in and say some basic stuff.  I&#8217;ll start with the two buzzwords I dropped on you.  I hate it when people use terminology without ever explaining it.</p>
<h3> Rate-distortion theory </h3>
<p><b><a href="https://en.wikipedia.org/wiki/Rate-distortion_theory">Rate-distortion theory</a></b> is a branch of information theory which seeks to find the minimum rate at which bits must be communicated over a noisy channel so that the signal can be approximately reconstructed at the other end without exceeding a given distortion.  Shannon&#8217;s first big result in this theory, the &#8216;rate-distortion theorem&#8217;, gives a formula for this minimum rate.  Needless to say, it still requires a lot of extra work to determine and achieve this minimum rate in practice.</p>
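<p>If you want to play with this, the standard way to compute points on the rate-distortion curve numerically is the Blahut&#8211;Arimoto algorithm. Here is a minimal sketch of it in Python&#8212;my own illustration, not anything from the talk, and the function name is made up. For a fair coin with Hamming distortion, Shannon&#8217;s formula gives R(D) = 1 - H(D), and the iteration reproduces that:</p>

```python
import math

def blahut_arimoto(p, d, beta, iters=200):
    """Compute one point on the rate-distortion curve for source
    distribution p and distortion matrix d, at Lagrange slope beta."""
    n, m = len(p), len(d[0])
    q = [1.0 / m] * m                      # marginal over reproductions
    for _ in range(iters):
        # Optimal test channel for the current marginal:
        # Q(xhat|x) is proportional to q(xhat) * exp(-beta * d(x, xhat))
        Q = []
        for x in range(n):
            row = [q[xh] * math.exp(-beta * d[x][xh]) for xh in range(m)]
            s = sum(row)
            Q.append([r / s for r in row])
        # Update the marginal: q(xhat) = sum_x p(x) Q(xhat|x)
        q = [sum(p[x] * Q[x][xh] for x in range(n)) for xh in range(m)]
    rate = sum(p[x] * Q[x][xh] * math.log2(Q[x][xh] / q[xh])
               for x in range(n) for xh in range(m) if Q[x][xh] > 0)
    dist = sum(p[x] * Q[x][xh] * d[x][xh]
               for x in range(n) for xh in range(m))
    return rate, dist

# Fair coin source with Hamming distortion; here R(D) = 1 - H(D).
p = [0.5, 0.5]
d = [[0.0, 1.0], [1.0, 0.0]]
R, D = blahut_arimoto(p, d, beta=2.0)
H = -D * math.log2(D) - (1 - D) * math.log2(1 - D)
print(round(R, 4), round(R + H, 4))   # R + H(D) comes out to 1
```

<p>The parameter beta is (minus) the slope of the rate-distortion curve, so sweeping it traces out the whole curve one (rate, distortion) pair at a time.</p>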
<p>For the basic definitions and a statement of the theorem, try this:</p>
<p>&bull; Natasha Devroye, <a href="https://www.cs.uic.edu/pub/ECE534/WebHome/ch10.pdf">Rate-distortion theory</a>, course notes, University of Illinois at Chicago, Fall 2009.</p>
<p>One of the people organizing this conference is a big expert on rate-distortion theory, and he wrote a book about it.</p>
<p>&bull; Toby Berger, <i>Rate Distortion Theory: A Mathematical Basis for Data Compression</i>, Prentice&#8211;Hall, 1971.</p>
<p>Unfortunately it&#8217;s out of print and selling for $259 used on Amazon!  An easier option might be this:</p>
<p>&bull; Thomas M. Cover and Joy A. Thomas, <i>Elements of Information Theory</i>, Chapter 10: Rate Distortion Theory, Wiley, New York, 2006.</p>
<h3> The Bellman equation </h3>
<p>The <b><a href="https://en.wikipedia.org/wiki/Bellman_equation">Bellman equation</a></b> reduces the task of finding an optimal course of action to choosing what to do at each step.  So, you&#8217;re trying to maximize the &#8216;total reward&#8217;&#8212;the sum of rewards at each time step&#8212;and Bellman&#8217;s equation says what to do at each time step.</p>
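<p>In the discrete-time setting this is just backward induction: start at the final time, where no reward remains, and work backward, at each step choosing the action that maximizes immediate reward plus reward-to-go. Here is a toy example of my own invention&#8212;a walker on five positions with a prize at the far end&#8212;just to show the mechanics:</p>

```python
def backward_induction(reward, T, n):
    """Solve V_t(s) = max_a [ r(s') + V_{t+1}(s') ] by working backward
    from the final time, where s' = clamp(s + a) and a is -1, 0 or +1."""
    V = [[0.0] * n for _ in range(T + 1)]    # V at the final time is zero
    policy = [[0] * n for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for s in range(n):
            best, best_a = float("-inf"), 0
            for a in (-1, 0, 1):
                s2 = min(max(s + a, 0), n - 1)
                val = reward[s2] + V[t + 1][s2]
                if val > best:
                    best, best_a = val, a
            V[t][s] = best
            policy[t][s] = best_a
    return V, policy

# Walk right toward the reward at the far end, then sit on it.
reward = [0, 0, 0, 0, 10]
V, policy = backward_induction(reward, T=6, n=5)

# Follow the policy forward from position 0 and tally the reward.
s, total = 0, 0
for t in range(6):
    s = min(max(s + policy[t][s], 0), 4)
    total += reward[s]
print(V[0][0], total)   # the forward total matches V_0(0)
```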
<p>If you&#8217;ve studied physics, this should remind you of how starting from the principle of least action, we can get a differential equation describing the motion of a particle: the <a href="https://en.wikipedia.org/wiki/Euler%E2%80%93Lagrange_equation"><b>Euler&#8211;Lagrange equation</b></a>.</p>
<p>And in fact they&#8217;re deeply related.  The relation is obscured by two little things.  First, Bellman&#8217;s equation is usually formulated in a context where time passes in discrete steps, while the Euler&#8211;Lagrange equation is usually formulated in continuous time.  Second, Bellman&#8217;s equation is really the discrete-time version not of the Euler&#8211;Lagrange equation but of a more or less equivalent thing: the &#8216;Hamilton&#8211;Jacobi equation&#8217;.</p>
<p>Ah, another buzzword to demystify!   I was scared of the Hamilton&#8211;Jacobi equation for years, until I taught <a href="http://math.ucr.edu/home/baez/classical/#lagrangian">a course on classical mechanics</a> that covered it.  Now I think it&#8217;s the greatest thing in the world!</p>
<p>Briefly: the Hamilton&#8211;Jacobi equation concerns the least possible action to get from a fixed starting point to a point <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' /> in space at time <img src='https://s0.wp.com/latex.php?latex=t.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t.' title='t.' class='latex' />  If we call this least possible action <img src='https://s0.wp.com/latex.php?latex=W%28t%2Cq%29%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='W(t,q),' title='W(t,q),' class='latex' /> the <a href="https://en.wikipedia.org/wiki/Hamilton%E2%80%93Jacobi_equation"><b>Hamilton&#8211;Jacobi equation</b></a> says</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Cfrac%7B%5Cpartial+W%28t%2Cq%29%7D%7B%5Cpartial+q_i%7D+%3D+p_i++%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;frac{&#92;partial W(t,q)}{&#92;partial q_i} = p_i  }' title='&#92;displaystyle{ &#92;frac{&#92;partial W(t,q)}{&#92;partial q_i} = p_i  }' class='latex' /></p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Cfrac%7B%5Cpartial+W%28t%2Cq%29%7D%7B%5Cpartial+t%7D+%3D+-E++%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;frac{&#92;partial W(t,q)}{&#92;partial t} = -E  }' title='&#92;displaystyle{ &#92;frac{&#92;partial W(t,q)}{&#92;partial t} = -E  }' class='latex' /></p>
<p>where <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> is the particle&#8217;s momentum at the endpoint of its path, and <img src='https://s0.wp.com/latex.php?latex=E&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='E' title='E' class='latex' /> is its energy there.</p>
<p>If we replace derivatives by differences, and talk about <i>maximizing total reward</i> instead of <i>minimizing action</i>, we get Bellman&#8217;s equation:</p>
<p>&bull; <a href="https://en.wikipedia.org/wiki/Bellman_equation">Bellman equation</a>, Wikipedia.</p>
<h3> Markov decision processes </h3>
<p>Bellman&#8217;s equation can be useful whenever you&#8217;re trying to figure out an optimal course of action.  An important example is a &#8216;Markov decision process&#8217;.  To prepare you for Tishby&#8217;s talk, I should say what this is.</p>
<p>In a Markov process, something randomly hops around from state to state with fixed probabilities.  In the simplest case there&#8217;s a finite set <img src='https://s0.wp.com/latex.php?latex=S&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S' title='S' class='latex' /> of <b>states</b>, and time proceeds in discrete steps.  At each time step, the probability to hop from state <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> to state <img src='https://s0.wp.com/latex.php?latex=s%27&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s&#039;' title='s&#039;' class='latex' /> is some fixed number <img src='https://s0.wp.com/latex.php?latex=P%28s%2Cs%27%29.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='P(s,s&#039;).' title='P(s,s&#039;).' class='latex' /></p>
<p>This sort of thing is called a <a href="https://en.wikipedia.org/wiki/Markov_chain"><b>Markov chain</b></a>, or if you feel the need to be more insistent, a <b>discrete-time Markov chain</b>.</p>
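<p>In code, a discrete-time Markov chain is just a stochastic matrix acting on probability distributions. Here is a hypothetical two-state example of my own, showing how repeated hopping settles into the chain&#8217;s stationary distribution:</p>

```python
def step_distribution(dist, P):
    """Push a probability distribution one time step through the chain:
    new[s2] = sum over s of dist[s] * P(s, s2)."""
    n = len(P)
    return [sum(dist[s] * P[s][s2] for s in range(n)) for s2 in range(n)]

# A two-state chain: state 0 is sticky, state 1 leaves half the time.
P = [[0.9, 0.1],
     [0.5, 0.5]]

dist = [1.0, 0.0]                 # start surely in state 0
for _ in range(100):              # iterate until (numerically) stationary
    dist = step_distribution(dist, P)
print([round(x, 4) for x in dist])   # converges to [5/6, 1/6]
```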
<p>A <b>Markov decision process</b> is a generalization where an outside agent gets to change these probabilities!  The agent gets to choose <b>actions</b> from some set <img src='https://s0.wp.com/latex.php?latex=A.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='A.' title='A.' class='latex' />   If at a given time he chooses the action <img src='https://s0.wp.com/latex.php?latex=%5Calpha+%5Cin+A%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;alpha &#92;in A,' title='&#92;alpha &#92;in A,' class='latex' /> the probability of the system hopping from state <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> to state <img src='https://s0.wp.com/latex.php?latex=s%27&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s&#039;' title='s&#039;' class='latex' /> is <img src='https://s0.wp.com/latex.php?latex=P_%5Calpha%28s%2Cs%27%29.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='P_&#92;alpha(s,s&#039;).' title='P_&#92;alpha(s,s&#039;).' class='latex' />  Needless to say, these probabilities have to sum to one for any fixed <img src='https://s0.wp.com/latex.php?latex=s.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s.' title='s.' class='latex' /></p>
<p>That would already be interesting, but the real fun is that there&#8217;s also a <b>reward</b> <img src='https://s0.wp.com/latex.php?latex=R_%5Calpha%28s%2Cs%27%29.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='R_&#92;alpha(s,s&#039;).' title='R_&#92;alpha(s,s&#039;).' class='latex' />   This is a real number saying how much joy or misery the agent experiences if he does action <img src='https://s0.wp.com/latex.php?latex=%5Calpha&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;alpha' title='&#92;alpha' class='latex' /> and the system hops from <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> to <img src='https://s0.wp.com/latex.php?latex=s%27.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s&#039;.' title='s&#039;.' class='latex' /></p>
<p>The problem is to choose a <b>policy</b>&#8212;a function from states to actions&#8212;that maximizes the total expected reward over some period of time.  This is precisely the kind of thing Bellman&#8217;s equation is good for!</p>
<p>If you&#8217;re an economist you might also want to &#8216;discount&#8217; future rewards, saying that a reward <img src='https://s0.wp.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='n' title='n' class='latex' /> time steps in the future gets multiplied by <img src='https://s0.wp.com/latex.php?latex=%5Cgamma%5En%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;gamma^n,' title='&#92;gamma^n,' class='latex' /> where <img src='https://s0.wp.com/latex.php?latex=0+%3C+%5Cgamma+%5Cle+1&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='0 &lt; &#92;gamma &#92;le 1' title='0 &lt; &#92;gamma &#92;le 1' class='latex' /> is some <b>discount factor</b>.  This extra tweak is easily handled, and you can see it all here:</p>
<p>&bull; <a href="https://en.wikipedia.org/wiki/Markov_decision_process">Markov decision process</a>, Wikipedia.</p>
<h3> Partially observable Markov decision processes </h3>
<p>There&#8217;s a further generalization where the agent can&#8217;t see all the details of the system!  Instead, when he takes an action <img src='https://s0.wp.com/latex.php?latex=%5Calpha+%5Cin+A&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;alpha &#92;in A' title='&#92;alpha &#92;in A' class='latex' /> and the system hops from state <img src='https://s0.wp.com/latex.php?latex=s&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s' title='s' class='latex' /> to state <img src='https://s0.wp.com/latex.php?latex=s%27%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s&#039;,' title='s&#039;,' class='latex' /> he sees something: a point in some set <img src='https://s0.wp.com/latex.php?latex=O&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='O' title='O' class='latex' /> of <b>observations</b>.  He makes the observation <img src='https://s0.wp.com/latex.php?latex=o+%5Cin+O&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='o &#92;in O' title='o &#92;in O' class='latex' /> with probability <img src='https://s0.wp.com/latex.php?latex=%5COmega_%5Calpha%28o%2Cs%27%29.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;Omega_&#92;alpha(o,s&#039;).' title='&#92;Omega_&#92;alpha(o,s&#039;).' class='latex' /></p>
<p>(I don&#8217;t know why this probability depends on <img src='https://s0.wp.com/latex.php?latex=s%27&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s&#039;' title='s&#039;' class='latex' /> but not <img src='https://s0.wp.com/latex.php?latex=s.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='s.' title='s.' class='latex' />  Maybe it ultimately doesn&#8217;t matter much.)</p>
<p>Again, the goal is to choose a <b>policy</b> that maximizes the expected total reward.  But a policy is a bit different now.  The action at any time can only depend on all the observations made thus far.</p>
<p>Partially observable Markov decision processes are also called <b>POMDP</b>s.  If you want to learn about them, try these:</p>
<p>&bull; <a href="https://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process">Partially observable Markov decision process</a>, Wikipedia.</p>
<p>&bull; Tony Cassandra, <a href="http://www.pomdp.org/index.shtml">Partially observable Markov decision processes</a>.</p>
<p>The latter includes an introduction <i>without any formulas</i> to POMDPs and how to choose optimal policies.  I&#8217;m not sure who would study this subject and not want to see formulas, but it&#8217;s certainly a good exercise to explain things using just words&#8212;and it makes certain things easier to understand (though not others, in a way that depends on who is trying to learn the stuff).</p>
<h3> The action-perception loop </h3>
<p>I already explained the action-perception loop, with the help of this picture from the University of Bielefeld&#8217;s Department of Cognitive Neuroscience:</p>
<div align="center">
<a href="http://www.uni-bielefeld.de/biologie/cns/"><br />
<img width="450" src="https://i0.wp.com/math.ucr.edu/home/baez/mathematical/action-perception_loop.jpg" /></a>
</div>
<p>Naftali Tishby has a nice picture of it that&#8217;s more abstract:</p>
<div align="center">
<img width="450" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/action-perception_loop_tishby.jpg" />
</div>
<p>We&#8217;re assuming time comes in discrete steps, just to keep things simple.</p>
<p>At each time <img src='https://s0.wp.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='t' title='t' class='latex' /></p>
<p>&bull; the <b>world</b> has some state <img src='https://s0.wp.com/latex.php?latex=W_t%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='W_t,' title='W_t,' class='latex' /> and<br />
&bull; the <b>agent</b> has some state <img src='https://s0.wp.com/latex.php?latex=M_t.&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M_t.' title='M_t.' class='latex' /></p>
<p>Why the letter <img src='https://s0.wp.com/latex.php?latex=M&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='M' title='M' class='latex' />?  This stands for <b>memory</b>: it can be the state of the agent&#8217;s memory, but I prefer to think of it as the state of the agent.</p>
<p>At each time</p>
<p>&bull; the agent takes an <b>action</b> <img src='https://s0.wp.com/latex.php?latex=A_t%2C&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='A_t,' title='A_t,' class='latex' /> which affects the world&#8217;s next state, and</p>
<p>&bull; the world provides a <b>sensation</b> <img src='https://s0.wp.com/latex.php?latex=S_t&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='S_t' title='S_t' class='latex' /> to the agent, which affects the agent&#8217;s next state.</p>
<p>This is simplified but very nice.  Note that there&#8217;s a symmetry interchanging the world and the agent!</p>
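<p>Just to fix ideas, here is a tiny simulation of the loop&#8212;entirely my own toy, not Tishby&#8217;s model. The world is a sticky two-state chain, the sensation is a noisy reading of it, the agent&#8217;s memory simply stores the last sensation, and the action is whatever the memory says. Even this crude one-bit memory beats acting at random:</p>

```python
import random

random.seed(0)   # fixed seed so the run is reproducible

def run_loop(T=2000, flip=0.1, noise=0.2):
    """Simulate the loop: the world W is a sticky two-state chain, the
    sensation S is a noisy reading of W, the agent's memory M stores
    the last sensation, and the action A tries to match W."""
    W, M = 0, 0
    matched = 0
    for _ in range(T):
        S = W if random.random() > noise else 1 - W   # noisy sensation
        M = S                                         # memory update
        A = M                                         # policy: act on memory
        matched += (A == W)                           # reward: did we match?
        if random.random() < flip:                    # the world moves on
            W = 1 - W
    return matched / T

score = run_loop()
print(round(score, 3))   # close to 0.8, the sensor's accuracy
```

<p>A richer memory&#8212;say, a belief distribution rather than one remembered bit&#8212;would do better, and storing it would cost more: that is exactly the tradeoff Tishby studies.</p>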
<p>We could make it fancier by having lots of agents who all interact, but there are a lot of questions already.  The big question Tishby focuses on is optimizing how much the agent should remember about the past if they</p>
<p>&bull; get a reward depending on the action taken and the resulting state of the world</p>
<p>but</p>
<p>&bull; pay a price for the information stored from sensations.</p>
<p>Tishby formulates this optimization question as something like a partially observed Markov decision process, uses rate-distortion theory to analyze how much information needs to be stored to achieve a given reward, and uses Bellman&#8217;s equation to solve the optimization problem!  <img src="https://i0.wp.com/math.ucr.edu/home/baez/emoticons/surprised.gif" /></p>
<p>So, everything I sketched fits together somehow!</p>
<p>I hope what I&#8217;m saying now is roughly right: it will take me more time to get the details straight.    If you&#8217;re having trouble absorbing all the information I just threw at you, don&#8217;t feel bad: so am I.  But the math feels really natural and good to me.  It involves a lot of my favorite ideas (like generalizations of the principle of least action, and relative entropy), and it seems ripe to be combined with network theory ideas.</p>
<p>For details, I highly recommend this paper:</p>
<p>&bull; Naftali Tishby and Daniel Polani, <a href="http://www.cs.huji.ac.il/labs/learning/Papers/IT-PAC.pdf">Information theory of decisions and actions</a>, in <i>Perception-Action Cycle: Models, Architectures and Hardware</i>, eds. Vassilis Cutsuridis, Amir Hussain and John G. Taylor, Springer, Berlin, 2011.</p>
<p>I&#8217;m going to print this out, put it by my bed, and read it every night until I&#8217;ve absorbed it.</p>
]]></html><thumbnail_url><![CDATA[https://i1.wp.com/math.ucr.edu/home/baez/mathematical/naftali_tishby.jpg?fit=440%2C330]]></thumbnail_url><thumbnail_height><![CDATA[330]]></thumbnail_height><thumbnail_width><![CDATA[251]]></thumbnail_width></oembed>