<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Azimuth]]></provider_name><provider_url><![CDATA[https://johncarlosbaez.wordpress.com]]></provider_url><author_name><![CDATA[John Baez]]></author_name><author_url><![CDATA[https://johncarlosbaez.wordpress.com/author/johncarlosbaez/]]></author_url><title><![CDATA[More Second Laws of&nbsp;Thermodynamics]]></title><type><![CDATA[link]]></type><html><![CDATA[<p><a href="http://www.quantumlah.org/people/Dahlsten">Oscar Dahlsten</a> is visiting the Centre for Quantum Technologies, so we&#8217;re continuing some conversations about entropy that we started last year, back when the <a href="https://johncarlosbaez.wordpress.com/2011/02/10/rnyi-entropy-and-free-energy/">Entropy Club</a> was active.  But now <a href="http://www.cs.ox.ac.uk/people/jamie.vicary/">Jamie Vicary</a> and <a href="http://www.azimuthproject.org/azimuth/show/Brendan+Fong">Brendan Fong</a> are involved in the conversations.</p>
I was surprised when Oscar told me that for a large class of random processes, the usual second law of thermodynamics is just one of infinitely many laws saying that various kinds of disorder increase. I'm annoyed that nobody ever told me about this before! It's as if they told me about conservation of *energy* but not conservation of *schmenergy*, and *phlenergy*, and *zenergy*…

So I need to tell you about this. You may not understand it, but at least I can say I tried. I don't want you blaming *me* for concealing all these extra second laws of thermodynamics!

Here's the basic idea. Not all random processes are guaranteed to make entropy increase. But a bunch of them always make probability distributions flatter in a certain precise sense. This makes the entropy of the probability distribution increase. But when you make a probability distribution flatter in this sense, a bunch of other quantities increase too! For example, besides the usual entropy, there are infinitely many other kinds of entropy, called 'Rényi entropies', one for each number between 0 and ∞. And a doubly stochastic operator makes *all* the Rényi entropies increase! This fact is a special case of Theorem 10 here:

• Tim van Erven and Peter Harremoës, [Rényi divergence and majorization](http://arxiv.org/abs/1001.4448).

Let me state this fact precisely, and then say a word about how this is related to quantum theory and 'the collapse of the wavefunction'.
To keep things simple let's talk about probability distributions on a finite set, though van Erven and Harremoës generalize it all to a measure space.
How do we make precise the concept that one probability distribution is flatter than another? You know it when you see it, at least some of the time. For example, suppose I have some system in thermal equilibrium at some temperature, and the probabilities of it being in various states look like this:
<div align="center"><img width="250" src="https://i0.wp.com/math.ucr.edu/home/baez/biodiversity/probabilities_T=1.png" alt="" /></div>
<p>Then say I triple the temperature.   The probabilities flatten out:</p>
<div align="center"><img width="250" src="https://i0.wp.com/math.ucr.edu/home/baez/biodiversity/probabilities_T=3.png" alt="" /></div>
But how can we make this concept precise in a completely general way? We can do it using the concept of 'majorization'. If one probability distribution is *less* flat than another, people say it 'majorizes' that other one.
Here's the definition. Say we have two probability distributions $p$ and $q$ on the same set. For each one, list the probabilities in decreasing order:

$$p_1 \ge p_2 \ge \cdots \ge p_n$$

$$q_1 \ge q_2 \ge \cdots \ge q_n$$

Then we say $p$ **[majorizes](http://en.wikipedia.org/wiki/Majorization)** $q$ if

$$p_1 + \cdots + p_k \ge q_1 + \cdots + q_k$$

for all $1 \le k \le n.$ So, the idea is that the biggest probabilities in the distribution $p$ add up to more than the corresponding biggest ones in $q.$
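In code the definition is just a comparison of sorted partial sums. Here's a minimal sketch (the function name is mine, and the small tolerance guards against floating-point round-off):

```python
import numpy as np

def majorizes(p, q):
    """True if the distribution p majorizes q (same length, both summing to 1)."""
    p_sorted = np.sort(p)[::-1]          # probabilities in decreasing order
    q_sorted = np.sort(q)[::-1]
    return bool(np.all(np.cumsum(p_sorted) >= np.cumsum(q_sorted) - 1e-12))

print(majorizes([0.7, 0.2, 0.1], [0.4, 0.3, 0.3]))   # True
print(majorizes([0.4, 0.3, 0.3], [0.7, 0.2, 0.1]))   # False
```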
In 1960, Alfred Rényi defined a generalization of the usual Shannon entropy that depends on a parameter $\beta.$ If $p$ is a probability distribution on a finite set, its **[Rényi entropy](https://johncarlosbaez.wordpress.com/2011/02/10/rnyi-entropy-and-free-energy/)** of order $\beta$ is defined to be

$$H_\beta(p) = \frac{1}{1 - \beta} \ln \sum_i p_i^\beta$$

where $0 \le \beta < \infty.$ Well, to be honest: if $\beta$ is 0, 1, or $\infty$ we have to define this by taking a limit where we let $\beta$ creep up to that value. But the limit exists, and when $\beta = 1$ we get the usual Shannon entropy

$$H_1(p) = - \sum_i p_i \ln(p_i)$$
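Here's a sketch of the Rényi entropy in code, handling the limiting cases $\beta = 1$ (Shannon entropy) and $\beta = \infty$ (min-entropy) separately; at $\beta = 0$ the general formula already gives the log of the number of nonzero probabilities once we ignore zero entries:

```python
import numpy as np

def renyi_entropy(p, beta):
    """Rényi entropy H_beta(p) of a finite probability distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # convention: ignore zero probabilities
    if beta == 1.0:                       # limit beta -> 1: Shannon entropy
        return -np.sum(p * np.log(p))
    if beta == np.inf:                    # limit beta -> infinity: min-entropy
        return -np.log(np.max(p))
    return np.log(np.sum(p ** beta)) / (1.0 - beta)

p = [0.5, 0.25, 0.25]
for beta in [0, 0.5, 1.0, 2.0, np.inf]:
    print(beta, renyi_entropy(p, beta))
```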
As I explained a while ago, Rényi entropies are important ways of measuring [biodiversity](https://johncarlosbaez.wordpress.com/2012/07/02/the-mathematics-of-biodiversity-part-4/). But here's what I learned just now, from the paper by van Erven and Harremoës:
**Theorem 1.** If a probability distribution $p$ majorizes a probability distribution $q,$ its Rényi entropies are smaller:

$$H_\beta(p) \le H_\beta(q)$$

for all $0 \le \beta < \infty.$
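You can spot-check Theorem 1 on any pair where majorization holds, using the two little functions from the sketches above:

```python
import numpy as np

p = [0.7, 0.2, 0.1]
q = [0.4, 0.3, 0.3]                      # p majorizes q, as checked above
for beta in [0, 0.5, 1.0, 2.0, np.inf]:
    assert renyi_entropy(p, beta) <= renyi_entropy(q, beta)
```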
And here's what makes this fact so nice. If you do something to a classical system in a way that might involve some randomness, we can describe your action using a stochastic matrix. An $n \times n$ matrix $T$ is called **[stochastic](http://en.wikipedia.org/wiki/Stochastic_matrix)** if whenever $p \in \mathbb{R}^n$ is a probability distribution, so is $T p.$ This is equivalent to saying:

• the matrix entries of $T$ are all $\ge 0,$ and

• each column of $T$ sums to 1.
If $T$ is stochastic, it's not necessarily true that the entropy of $T p$ is greater than or equal to that of $p,$ not even for the Shannon entropy.

**Puzzle 1.** Find a counterexample.
However, entropy does increase if we use specially nice stochastic matrices called 'doubly stochastic' matrices. People say a matrix $T$ is **[doubly stochastic](http://en.wikipedia.org/wiki/Doubly_stochastic_matrix)** if it's stochastic and it maps the probability distribution

$$p_0 = (\frac{1}{n}, \dots, \frac{1}{n})$$

to itself. This is the most spread-out probability distribution of all: every other probability distribution majorizes this one.

Why do they call such matrices 'doubly' stochastic? Well, if you've got a stochastic matrix, each *column* sums to 1. But a stochastic matrix is doubly stochastic if and only if each *row* sums to 1 as well.
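Both conditions are easy to test in code. A small sketch (again, the function names are mine), treating probability distributions as column vectors so that $T$ acts by ordinary matrix multiplication:

```python
import numpy as np

def is_stochastic(T, tol=1e-12):
    """Nonnegative entries and each column sums to 1."""
    T = np.asarray(T, dtype=float)
    return bool(np.all(T >= -tol) and np.allclose(T.sum(axis=0), 1.0))

def is_doubly_stochastic(T, tol=1e-12):
    """Stochastic, and each row also sums to 1 (equivalently, T fixes the uniform distribution)."""
    return is_stochastic(T, tol) and bool(np.allclose(np.asarray(T, dtype=float).sum(axis=1), 1.0))

T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(is_stochastic(T), is_doubly_stochastic(T))   # True True
```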
Here's a really cool fact:

**Theorem 2.** If $T$ is doubly stochastic, $p$ majorizes $T p$ for any probability distribution $p \in \mathbb{R}^n.$ Conversely, if a probability distribution $p$ majorizes a probability distribution $q,$ then $q = T p$ for some doubly stochastic matrix $T$.

Taken together, Theorems 1 and 2 say that doubly stochastic transformations increase entropy… but not just Shannon entropy! They increase all the different Rényi entropies, as well. So if time evolution is described by a doubly stochastic matrix, we get *lots* of 'second laws of thermodynamics', saying that all these different kinds of entropy increase!
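Here's a numerical illustration of Theorems 1 and 2 together, reusing the little functions from the sketches above. By Birkhoff's theorem any convex combination of permutation matrices is doubly stochastic, so it's easy to cook one up and check that $p$ majorizes $T p$ and that every Rényi entropy goes up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A doubly stochastic matrix: a random convex combination of a few permutation matrices.
perms = [np.eye(n)[idx] for idx in ([0, 1, 2, 3], [1, 0, 3, 2], [3, 2, 1, 0])]
weights = rng.dirichlet(np.ones(len(perms)))
T = sum(w * P for w, P in zip(weights, perms))

p = np.array([0.6, 0.25, 0.1, 0.05])
q = T @ p

print(is_doubly_stochastic(T))     # True
print(majorizes(p, q))             # True: p majorizes T p (Theorem 2)
for beta in [0, 0.5, 1.0, 2.0, np.inf]:
    print(beta, renyi_entropy(p, beta) <= renyi_entropy(q, beta))   # all True (Theorem 1)
```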
Finally, what does all this have to do with quantum mechanics, and collapsing the wavefunction? There are different things to say, but this is the simplest:

**Theorem 3.** Given two probability distributions $p, q \in \mathbb{R}^n$, $p$ majorizes $q$ if and only if there exists a self-adjoint matrix $D$ with eigenvalues $p_i$ and diagonal entries $q_i.$

The matrix $D$ will be a **[density matrix](http://en.wikipedia.org/wiki/Density_matrix)**: a self-adjoint matrix with nonnegative eigenvalues and trace equal to 1. We use such matrices to describe mixed states in quantum mechanics.
Theorem 3 gives a precise sense in which preparing a quantum system in some state, letting time evolve, and then measuring it 'increases randomness'.

How? Well, suppose we have a quantum system whose Hilbert space is $\mathbb{C}^n.$ If we prepare the system in a mixture of the standard basis states with probabilities $p_i,$ we can describe it with a diagonal density matrix $D_0.$ Then suppose we wait a while and some unitary time evolution occurs. The system is now described by a new density matrix

$$D = U D_0 \, U^{-1}$$

where $U$ is some unitary operator. If we then do a measurement to see which of the standard basis states our system now lies in, we'll get the different possible results with probabilities $q_i,$ the diagonal entries of $D.$ But the eigenvalues of $D$ will still be the numbers $p_i.$ So, by the theorem, $p$ majorizes $q$!
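Here's the same argument as a numerical sketch (the distribution is made up, the unitary comes from the QR decomposition of a random complex matrix, and `majorizes` is the function from the majorization sketch above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

p = np.array([0.5, 0.3, 0.15, 0.05])     # probabilities of the standard basis states
D0 = np.diag(p)                          # initial (diagonal) density matrix

# A random unitary from the QR decomposition of a random complex matrix.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(Z)

D = U @ D0 @ U.conj().T                  # density matrix after unitary time evolution
q = np.real(np.diag(D))                  # probabilities of the measurement outcomes

print(np.allclose(np.sort(np.linalg.eigvalsh(D)), np.sort(p)))  # True: eigenvalues unchanged
print(majorizes(p, q))                                          # True: p majorizes q
```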
So, not only Shannon entropy but also all the Rényi entropies will increase!

Of course, there are some big physics questions lurking here. Like: *what about the real world?* In the real world, do lots of different kinds of entropy tend to increase, or just some?

Of course, there's a huge famous old problem about how reversible time evolution can be compatible with *any* sort of law saying that entropy must always increase! Still, there are some arguments, going back to Boltzmann's H-theorem, which show entropy increases under some extra conditions. So then we can ask if other kinds of entropy, like Rényi entropy, increase as well. This will be true whenever we can argue that time evolution is described by doubly stochastic matrices. Theorem 3 gives a partial answer, but there's probably much more to say.
I don't have much more to say right now, though. I'll just point out that while doubly stochastic matrices map the 'maximally smeared-out' probability distribution

$$p_0 = (\frac{1}{n}, \dots, \frac{1}{n})$$

to itself, a lot of this theory generalizes to stochastic matrices that map exactly one *other* probability distribution to itself. We need to work with [relative Rényi entropy](http://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#R.C3.A9nyi_divergence) instead of Rényi entropy, and so on, but I don't think these adjustments are really a big deal. And there are nice theorems that let you know when a stochastic matrix maps exactly one probability distribution to itself, based on the [Perron–Frobenius theorem](https://johncarlosbaez.wordpress.com/2012/08/06/network-theory-part-20/).
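For concreteness, here's a sketch of the relative Rényi entropy (Rényi divergence) $D_\beta(p \| q) = \frac{1}{\beta - 1} \ln \sum_i p_i^\beta q_i^{1-\beta}$ that takes the place of $H_\beta$ in this generalization. The corresponding second laws then say that a stochastic map fixing $q$ can only decrease $D_\beta(\,\cdot\, \| q)$, which, as I understand it, follows from the data-processing results in the van Erven–Harremoës paper.

```python
import numpy as np

def renyi_divergence(p, q, beta):
    """Rényi divergence D_beta(p || q), assuming q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    if beta == 1.0:                                    # limit: ordinary relative entropy
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))
    if beta == np.inf:                                 # limit: log of the largest ratio
        return np.log(np.max(p[mask] / q[mask]))
    return np.log(np.sum(p[mask] ** beta * q[mask] ** (1.0 - beta))) / (beta - 1.0)

p = [0.5, 0.3, 0.2]
q = [1/3, 1/3, 1/3]
for beta in [0, 0.5, 1.0, 2.0, np.inf]:
    print(beta, renyi_divergence(p, q, beta))
```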
### References
I already gave you a reference for Theorem 1, namely the paper by van Erven and Harremoës, though I don't think they were the first to prove this particular result: they generalize it quite a lot.
What about Theorem 2? It goes back at least to here:

• Barry C. Arnold, *Majorization and the Lorenz Order: A Brief Introduction*, Springer Lecture Notes in Statistics **43**, Springer, Berlin, 1987.
The partial order on probability distributions given by majorization is also called the 'Lorenz order', but mainly when we consider probability distributions on infinite sets. This name presumably comes from the [Lorenz curve](http://en.wikipedia.org/wiki/Lorenz_curve), a measure of income inequality. This curve shows, for the bottom x% of households, what percentage y% of the total income they have:

[![Lorenz curve](https://i1.wp.com/upload.wikimedia.org/wikipedia/commons/thumb/5/59/Economics_Gini_coefficient2.svg/400px-Economics_Gini_coefficient2.svg.png)](http://en.wikipedia.org/wiki/Lorenz_curve)

**Puzzle 2.** If you've got two different probability distributions of incomes, and one majorizes the other, how are their Lorenz curves related?
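If you'd like to experiment with Puzzle 2 numerically, here's a minimal sketch that computes the points of a Lorenz curve from a list of incomes (or probabilities); the income figures are made up:

```python
import numpy as np

def lorenz_curve(incomes):
    """Points (x, y): the bottom fraction x of households holds fraction y of total income."""
    w = np.sort(np.asarray(incomes, dtype=float))        # poorest first
    y = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    x = np.linspace(0.0, 1.0, len(w) + 1)
    return x, y

x, y = lorenz_curve([10, 20, 30, 40, 100])
print(list(zip(np.round(x, 2), np.round(y, 2))))
```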
When we generalize majorization by letting some other probability distribution take the place of

$$p_0 = (\frac{1}{n}, \dots, \frac{1}{n})$$

it seems people call it the 'Markov order'. Here's a really fascinating paper on that, which I'm just barely beginning to understand:

• A. N. Gorban, P. A. Gorban and G. Judge, [Entropy: the Markov ordering approach](http://arxiv.org/abs/1003.1377), *[Entropy](http://www.mdpi.com/1099-4300/12/5/1145)* **12** (2010), 1145–1193.
What about Theorem 3? Apparently it goes back to here:

• A. Uhlmann, *Wiss. Z. Karl-Marx-Univ. Leipzig* **20** (1971), 633.

though I only know this thanks to a more recent paper:

• Michael A. Nielsen, [Conditions for a class of entanglement transformations](http://arxiv.org/abs/quant-ph/9811053), *Phys. Rev. Lett.* **83** (1999), 436–439.
By the way, Nielsen's paper contains another very nice result about majorization! Suppose you have states $\psi$ and $\phi$ of a 2-part quantum system. You can trace out one part and get density matrices describing mixed states of the other part, say $D_\psi$ and $D_\phi$. Then Nielsen shows you can get from $\psi$ to $\phi$ using 'local operations and classical communication' if and only if $D_\phi$ majorizes $D_\psi$. Note that things are going backwards here compared to how they've been going in the rest of this post: if we can get from $\psi$ to $\phi$, then all forms of entropy go *down* when we go from $D_\psi$ to $D_\phi$! This 'anti-second-law' behavior is confusing at first, but [familiar to me by now](https://johncarlosbaez.wordpress.com/2011/06/02/a-characterization-of-entropy/).
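Nielsen's criterion is easy to check numerically. Here's a sketch for two made-up pure states of a pair of qutrits, each written as a matrix of amplitudes $\psi_{ij}$ in a product basis, so the reduced density matrix of the first part is $\psi \psi^\dagger$ and its spectrum is the squared singular values of $\psi$; `majorizes` is again the function from the earlier sketch:

```python
import numpy as np

def reduced_spectrum(psi):
    """Spectrum of the reduced density matrix of the first part of a bipartite pure state,
    given by its matrix of amplitudes psi[i, j]: the squared singular values of psi."""
    return np.linalg.svd(psi, compute_uv=False) ** 2

# Two made-up states of a pair of qutrits, written in Schmidt form for simplicity.
psi = np.diag(np.sqrt([0.5, 0.3, 0.2]))      # source state
phi = np.diag(np.sqrt([0.7, 0.2, 0.1]))      # target state

# Nielsen's criterion: psi -> phi by LOCC iff the spectrum of D_phi majorizes that of D_psi.
print(majorizes(reduced_spectrum(phi), reduced_spectrum(psi)))   # True: the conversion is possible
```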
When I first learned all this stuff, I naturally thought of the following question; maybe you did too, just now. If $p, q \in \mathbb{R}^n$ are probability distributions and

$$H_\beta(p) \le H_\beta(q)$$

for all $0 \le \beta < \infty$, is it true that $p$ majorizes $q$?
Apparently the answer must be *no*, because Klimesh has gone to quite a bit of work to obtain a weaker conclusion: not that $p$ majorizes $q$, but that $p \otimes r$ majorizes $q \otimes r$ for some probability distribution $r \in \mathbb{R}^m.$ He calls this **catalytic majorization**, with $r$ serving as a 'catalyst':
• Matthew Klimesh, [Inequalities that collectively completely characterize the catalytic majorization relation](http://arxiv.org/abs/0709.3680).
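Here's catalysis in action, using what I believe is the standard example from the entanglement-catalysis literature (and the `majorizes` function from the sketch near the top): $p$ and $q$ are incomparable under majorization, yet tensoring both with the catalyst $r$ makes $p \otimes r$ majorize $q \otimes r$. And since Rényi entropies are additive under $\otimes$, this already forces $H_\beta(p) \le H_\beta(q)$ for all $\beta$, even though $p$ does not majorize $q$.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25, 0.0])
q = np.array([0.4, 0.4, 0.1, 0.1])
r = np.array([0.6, 0.4])                        # the catalyst

print(majorizes(p, q))                          # False: neither distribution majorizes the other...
print(majorizes(np.kron(p, r), np.kron(q, r)))  # True: ...yet p tensor r majorizes q tensor r
```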
I thank [Vlatko Vedral](http://www.vlatkovedral.org/) here at the CQT for pointing this out!

Finally, here is a good general introduction to majorization, pointed out by Vasileios Anagnostopoulos:

• T. Ando, Majorization, doubly stochastic matrices, and comparison of eigenvalues, *[Linear Algebra and its Applications](http://www.sciencedirect.com/science/article/pii/0024379589905806#)* **118** (1989), 163–248.