<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Azimuth]]></provider_name><provider_url><![CDATA[https://johncarlosbaez.wordpress.com]]></provider_url><author_name><![CDATA[John Baez]]></author_name><author_url><![CDATA[https://johncarlosbaez.wordpress.com/author/johncarlosbaez/]]></author_url><title><![CDATA[A Second Law for Open Markov&nbsp;Processes]]></title><type><![CDATA[link]]></type><html><![CDATA[<p><i>guest post by <b><a href="http://www.azimuthproject.org/azimuth/show/Blake+Pollard">Blake Pollard</a></b></i></p>
<p>What comes to mind when you hear the term &#8216;random process&#8217;? Do you think of Brownian motion? Do you think of particles hopping around? Do you think of a drunkard staggering home?</p>
<p>Today I’m going to tell you about a version of the drunkard’s walk with a few modifications. Firstly, we don’t have just one drunkard: we can have any positive real number of drunkards. Secondly, our drunkards have no memory; where they go next doesn’t depend on where they’ve been. Thirdly, there are special places, such as entrances to bars, where drunkards magically appear and disappear.</p>
<p>The second condition says that our drunkards satisfy the Markov property, making their random walk into a <b>Markov process</b>. The third condition is really what I want to tell you about, because it makes our Markov process into a more general ‘open Markov process’.</p>
<p>There are a collection of places the drunkards can be, for example:</p>
<p><img src='https://s0.wp.com/latex.php?latex=V%3D+%5C%7B+%5Ctext%7Bbar%7D%2C%5Ctext%7Bsidewalk%7D%2C+%5Ctext%7Bstreet%7D%2C+%5Ctext%7Btaco+truck%7D%2C+%5Ctext%7Bhome%7D+%5C%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='V= &#92;{ &#92;text{bar},&#92;text{sidewalk}, &#92;text{street}, &#92;text{taco truck}, &#92;text{home} &#92;} ' title='V= &#92;{ &#92;text{bar},&#92;text{sidewalk}, &#92;text{street}, &#92;text{taco truck}, &#92;text{home} &#92;} ' class='latex' /></p>
<p>We call this set <img src='https://s0.wp.com/latex.php?latex=V&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='V' title='V' class='latex' /> the set of <b>states</b>. There are certain probabilities associated with traveling between these places. We call these <b>transition rates</b>. For example it is more likely for a drunkard to go from the bar to the taco truck than to go from the bar to home so the transition rate between the bar and the taco truck should be greater than the transition rate from the bar to home. Sometimes you can’t get from one place to another without passing through intermediate places. In reality the drunkard can’t go directly from the bar to the taco truck: he or she has to go from the bar to sidewalk to the taco truck.</p>
<p>This information can all be summarized by drawing a directed graph where the positive numbers labelling the edges are the transition rates:</p>
<div align="center"><a href="http://math.ucr.edu/home/baez/mathematical/pollard_markov/drunkwalk.png"><img width="450" src="https://i1.wp.com/math.ucr.edu/home/baez/mathematical/pollard_markov/drunkwalk.png" /></a></div>
<p>For simplicity we draw only three states: home, bar, taco truck. Drunkards go from home to the bar and back, but they never go straight from home to the taco truck.</p>
<p>We can keep track of where all of our drunkards are using a vector with 3 entries:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+p%28t%29+%3D+%5Cleft%28+%5Cbegin%7Barray%7D%7Bc%7D+p_h%28t%29+%5C%5C+p_b%28t%29+%5C%5C+p_%7Btt%7D%28t%29+%5Cend%7Barray%7D+%5Cright%29+%5Cin+%5Cmathbb%7BR%7D%5E3+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ p(t) = &#92;left( &#92;begin{array}{c} p_h(t) &#92;&#92; p_b(t) &#92;&#92; p_{tt}(t) &#92;end{array} &#92;right) &#92;in &#92;mathbb{R}^3 } ' title='&#92;displaystyle{ p(t) = &#92;left( &#92;begin{array}{c} p_h(t) &#92;&#92; p_b(t) &#92;&#92; p_{tt}(t) &#92;end{array} &#92;right) &#92;in &#92;mathbb{R}^3 } ' class='latex' /></p>
<p>We call this our <b>population distribution</b>. The first entry <img src='https://s0.wp.com/latex.php?latex=p_h&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_h' title='p_h' class='latex' /> is the number of drunkards that are at home, the second <img src='https://s0.wp.com/latex.php?latex=p_b&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_b' title='p_b' class='latex' /> is how many are at the bar, and the third <img src='https://s0.wp.com/latex.php?latex=p_%7Btt%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p_{tt}' title='p_{tt}' class='latex' /> is how many are at the taco truck.</p>
<p>There is a set of coupled, linear, first-order differential equations we can write down using the information in our graph that tells us how the number of drunkards in each place change with time.  This is called the <b>master equation</b>:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Cfrac%7Bd+p%7D%7Bd+t%7D+%3D+H+p+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;frac{d p}{d t} = H p } ' title='&#92;displaystyle{ &#92;frac{d p}{d t} = H p } ' class='latex' /></p>
<p>where <img src='https://s0.wp.com/latex.php?latex=H&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='H' title='H' class='latex' /> is a 3&times;3 matrix which we call the <b>Hamiltonian</b>. The off-diagonal entries are nonnegative:</p>
<p><img src='https://s0.wp.com/latex.php?latex=H_%7Bij%7D+%5Cgeq+0%2C+i+%5Cneq+j&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='H_{ij} &#92;geq 0, i &#92;neq j' title='H_{ij} &#92;geq 0, i &#92;neq j' class='latex' /></p>
<p>and the columns sum to zero:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Csum_i+H_%7Bij%7D%3D0&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;sum_i H_{ij}=0' title='&#92;sum_i H_{ij}=0' class='latex' /></p>
<p>We call a matrix satisfying these conditions <b>infinitesimal stochastic</b>. <b>Stochastic</b> matrices have columns that sum to one. If we take the exponential of an infinitesimal stochastic matrix we get one whose columns sum to one, hence the label &#8216;infinitesimal&#8217;.</p>
<p>The Hamiltonian for the graph above is</p>
<p><img src='https://s0.wp.com/latex.php?latex=H+%3D+%5Cleft%28+%5Cbegin%7Barray%7D%7Bccc%7D+-2+%26+5+%26+10+%5C%5C+2+%26+-12+%26+0+%5C%5C+0+%26+7+%26+-10+%5Cend%7Barray%7D+%5Cright%29+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='H = &#92;left( &#92;begin{array}{ccc} -2 &amp; 5 &amp; 10 &#92;&#92; 2 &amp; -12 &amp; 0 &#92;&#92; 0 &amp; 7 &amp; -10 &#92;end{array} &#92;right) ' title='H = &#92;left( &#92;begin{array}{ccc} -2 &amp; 5 &amp; 10 &#92;&#92; 2 &amp; -12 &amp; 0 &#92;&#92; 0 &amp; 7 &amp; -10 &#92;end{array} &#92;right) ' class='latex' /></p>
<p>John has written a lot about Markov processes and infinitesimal stochastic Hamiltonians in previous posts.</p>
<p>Given two vectors <img src='https://s0.wp.com/latex.php?latex=p%2Cq+%5Cin+%5Cmathbb%7BR%7D%5E3&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p,q &#92;in &#92;mathbb{R}^3' title='p,q &#92;in &#92;mathbb{R}^3' class='latex' /> describing the populations of drunkards which obey the same master equation, we can calculate the <b>relative entropy</b> of <img src='https://s0.wp.com/latex.php?latex=p&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p' title='p' class='latex' /> relative to <img src='https://s0.wp.com/latex.php?latex=q&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q' title='q' class='latex' />:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+S%28p%2Cq%29+%3D+%5Csum_%7B+i+%5Cin+V%7D+p_i+%5Cln+%5Cleft%28+%5Cfrac%7Bp_i%7D%7Bq_i%7D+%5Cright%29+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ S(p,q) = &#92;sum_{ i &#92;in V} p_i &#92;ln &#92;left( &#92;frac{p_i}{q_i} &#92;right) } ' title='&#92;displaystyle{ S(p,q) = &#92;sum_{ i &#92;in V} p_i &#92;ln &#92;left( &#92;frac{p_i}{q_i} &#92;right) } ' class='latex' /></p>
<p>This is an example of a &#8216;divergence&#8217;.  In statistics, a <a href="https://en.wikipedia.org/wiki/Divergence_%28statistics%29">divergence</a> a way of measuring the distance between probability distributions, which may not be symmetrical and may even not obey the triangle inequality.</p>
<p>The relative entropy is important because it decreases monotonically with time, making it a <a href="https://en.wikipedia.org/wiki/Lyapunov_function">Lyapunov function</a> for Markov processes. Indeed, it is a well known fact that</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Cfrac%7BdS%28p%28t%29%2Cq%28t%29+%29+%7D+%7Bdt%7D+%5Cleq+0+%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;frac{dS(p(t),q(t) ) } {dt} &#92;leq 0 }' title='&#92;displaystyle{ &#92;frac{dS(p(t),q(t) ) } {dt} &#92;leq 0 }' class='latex' /></p>
<p>This is true for any two population distributions which evolve according to the same master equation, though you have to allow infinity as a possible value for the relative entropy and negative infinity for its time derivative.</p>
<p>Why is entropy <i>decreasing?</i>  Doesn&#8217;t the Second Law of Thermodynamics say entropy <i>increases?</i></p>
<p>Don&#8217;t worry: the reason is that I have not put a minus sign in my definition of relative entropy.  Put one in if you like, and then it will increase.  Sometimes without the minus sign it&#8217;s called the <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback&#8211;Leibler divergence</a>.  This <i>decreases</i> with the passage of time, saying that any two population distributions <img src='https://s0.wp.com/latex.php?latex=p%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='p(t)' title='p(t)' class='latex' /> and <img src='https://s0.wp.com/latex.php?latex=q%28t%29&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='q(t)' title='q(t)' class='latex' /> get &#8216;closer together&#8217; as they get randomized with the passage of time.</p>
<p>That itself is a nice result, but I want to tell you what happens when you allow drunkards to appear and disappear at certain states. Drunkards appear at the bar once they’ve had enough to drink and once they are home for long enough they can disappear. The set of places where drunkards can appear or disappear <img src='https://s0.wp.com/latex.php?latex=B&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B' title='B' class='latex' /> is called the set of <b>boundary states</b>.  So for the above process</p>
<p><img src='https://s0.wp.com/latex.php?latex=B+%3D+%5C%7B+%5Ctext%7Bhome%7D%2C%5Ctext%7Bbar%7D+%5C%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B = &#92;{ &#92;text{home},&#92;text{bar} &#92;}' title='B = &#92;{ &#92;text{home},&#92;text{bar} &#92;}' class='latex' /></p>
<p>is the set of boundary states. This changes the way in which the population of drunkards changes with time!</p>
<p>The drunkards at the taco truck obey the master equation. For them,</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Cfrac%7Bdp_%7Btt%7D%7D%7Bdt%7D+%3D+7p_b+-10+p_%7Btt%7D+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;frac{dp_{tt}}{dt} = 7p_b -10 p_{tt} } ' title='&#92;displaystyle{ &#92;frac{dp_{tt}}{dt} = 7p_b -10 p_{tt} } ' class='latex' /></p>
<p>still holds.  But because the populations can appear or disappear at the boundary states the master equation no longer holds at those states! Instead it is useful to define the flow of drunkards into the <img src='https://s0.wp.com/latex.php?latex=i%5E%7Bth%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='i^{th}' title='i^{th}' class='latex' /> state by</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Cfrac%7BDp_i%7D%7BDt%7D+%3D+%5Cfrac%7Bdp_i%7D%7Bdt%7D-%5Csum_j+H_%7Bij%7D+p_j%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;frac{Dp_i}{Dt} = &#92;frac{dp_i}{dt}-&#92;sum_j H_{ij} p_j} ' title='&#92;displaystyle{ &#92;frac{Dp_i}{Dt} = &#92;frac{dp_i}{dt}-&#92;sum_j H_{ij} p_j} ' class='latex' /></p>
<p>This quantity describes by how much the rate of change of the populations at the boundary states differ from that given by the master equation.</p>
<p>The reason why we are interested in open Markov processes is because you can take two open Markov processes and glue them together along some subset of their boundary states to get a new open Markov process! This allows us to build up or break down complicated Markov processes using open Markov processes as the building blocks.</p>
<p>For example we can draw the graph corresponding to the drunkards’ walk again, only now we will distinguish boundary states from internal states by coloring internal states blue and having boundary states be white:</p>
<div align="center"><a href="http://math.ucr.edu/home/baez/mathematical/pollard_markov/drunkopen.png"><img width="450" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/pollard_markov/drunkopen.png" /></a></div>
<p>Consider another open Markov process with states</p>
<p><img src='https://s0.wp.com/latex.php?latex=V%3D%5C%7B+%5Ctext%7Bhome%7D%2C%5Ctext%7Bwork%7D%2C%5Ctext%7Bbar%7D+%5C%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='V=&#92;{ &#92;text{home},&#92;text{work},&#92;text{bar} &#92;}' title='V=&#92;{ &#92;text{home},&#92;text{work},&#92;text{bar} &#92;}' class='latex' /></p>
<p>where</p>
<p><img src='https://s0.wp.com/latex.php?latex=B%3D%5C%7B+%5Ctext%7Bhome%7D%2C+%5Ctext%7Bbar%7D%5C%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='B=&#92;{ &#92;text{home}, &#92;text{bar}&#92;} ' title='B=&#92;{ &#92;text{home}, &#92;text{bar}&#92;} ' class='latex' /></p>
<p>are the boundary states, leaving</p>
<p><img src='https://s0.wp.com/latex.php?latex=I%3D%5C%7B%5Ctext%7Bwork%7D%5C%7D&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='I=&#92;{&#92;text{work}&#92;}' title='I=&#92;{&#92;text{work}&#92;}' class='latex' /></p>
<p>as an internal state:</p>
<div align="center"><a href="http://math.ucr.edu/home/baez/mathematical/pollard_markov/drunkopen2.png"><img width="450" src="https://i2.wp.com/math.ucr.edu/home/baez/mathematical/pollard_markov/drunkopen2.png" /></a></div>
<p>Since the boundary states of this process overlap with the boundary states of the first process we can compose the two to form a new Markov process:</p>
<div align="center"><a href="http://math.ucr.edu/home/baez/mathematical/pollard_markov/drunkcomposite.png"><img width="450" src="https://i0.wp.com/math.ucr.edu/home/baez/mathematical/pollard_markov/drunkcomposite.png" /></a></div>
<p>Notice the boundary states are now internal states. I hope any Markov process that could approximately model your behavior has more interesting nodes! There is a nice way to figure out the Hamiltonian of the composite from the Hamiltonians of the pieces, but we will leave that for another time.</p>
<p>We can ask ourselves, how does relative entropy change with time in open Markov processes?  You can read my paper for the details, but here is the punchline:</p>
<p><img src='https://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Cfrac%7BdS%28p%28t%29%2Cq%28t%29+%29+%7D%7Bdt%7D+%5Cleq+%5Csum_%7Bi+%5Cin+B%7D+%5Cfrac%7BDp_i%7D%7BDt%7D%5Cfrac%7B%5Cpartial+S%7D%7B%5Cpartial+p_i%7D+%2B+%5Cfrac%7BDq_i%7D%7BDt%7D%5Cfrac%7B%5Cpartial+S%7D%7B%5Cpartial+q_i%7D+%7D+&#038;bg=ffffff&#038;fg=000&#038;s=0' alt='&#92;displaystyle{ &#92;frac{dS(p(t),q(t) ) }{dt} &#92;leq &#92;sum_{i &#92;in B} &#92;frac{Dp_i}{Dt}&#92;frac{&#92;partial S}{&#92;partial p_i} + &#92;frac{Dq_i}{Dt}&#92;frac{&#92;partial S}{&#92;partial q_i} } ' title='&#92;displaystyle{ &#92;frac{dS(p(t),q(t) ) }{dt} &#92;leq &#92;sum_{i &#92;in B} &#92;frac{Dp_i}{Dt}&#92;frac{&#92;partial S}{&#92;partial p_i} + &#92;frac{Dq_i}{Dt}&#92;frac{&#92;partial S}{&#92;partial q_i} } ' class='latex' /></p>
<p>This is a version of the Second Law of Thermodynamics for open Markov processes.</p>
<p>It is important to notice that the sum is only over the boundary states! This inequality tells us that relative entropy still decreases inside our process, but depending on the flow of populations through the boundary states the relative entropy of the whole process could either increase or decrease! This inequality will be important when we study how the relative entropy changes in different parts of a bigger more complicated process.</p>
<p>That is all for now, but I leave it as an exercise for you to imagine a Markov process that describes your life. How many states does it have? What are the relative transition rates? Are there states you would like to spend more or less time in? Are there states somewhere you would like to visit?</p>
<p>Here is my paper, which proves the above inequality:</p>
<p>&bull; Blake Pollard, <a href="http://arxiv.org/abs/1410.6531">A Second Law for open Markov processes</a>.</p>
<p>If you have comments or corrections, let me know!</p>
]]></html><thumbnail_url><![CDATA[https://i1.wp.com/math.ucr.edu/home/baez/mathematical/pollard_markov/drunkwalk.png?fit=440%2C330]]></thumbnail_url><thumbnail_height><![CDATA[294]]></thumbnail_height><thumbnail_width><![CDATA[440]]></thumbnail_width></oembed>