<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Ordinary Ideas]]></provider_name><provider_url><![CDATA[https://ordinaryideas.wordpress.com]]></provider_url><author_name><![CDATA[paulfchristiano]]></author_name><author_url><![CDATA[https://ordinaryideas.wordpress.com/author/paulfchristiano/]]></author_url><title><![CDATA[&#8220;Proof&#8221; of Friendliness]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>The humans are about to launch their best effort at a friendly singularity. Of course, they are careful and wise and have exceeded all reasonable expectations for caution and rigor.</p>
<p>Before building FAI you built an oracle AI to help you. With its help, you found a mathematical definition of <strong>U</strong>, the utility of humanity&#8217;s extrapolated volition (or whatever). You were all pretty pleased with yourselves, but you didn&#8217;t stop there: you found a theory of everything, located humanity within it, and wrote down the predicate <strong>F</strong>(X) = &#8220;The humans run the program described by X.&#8221;</p>
<p>To top it off, with the help of your oracle AI you found the code for a &#8220;best possible AI&#8221;, call it <strong>FAI</strong>, and a proof of the theorem:</p>
<blockquote><p>There exists a constant <strong>Best</strong> such that <strong>U</strong> ≤ <strong>Best</strong>, but <strong>F</strong>(<strong>FAI</strong>) implies <strong>U</strong> = <strong>Best</strong>.</p></blockquote>
<p>Each of these steps you carried out with incredible care. You have proved beyond reasonable doubt that <strong>U</strong> and <strong>F</strong> represent what you want them to.</p>
<p>You present your argument to the people of the world. Some people object to your reasoning, but it is airtight: if they choose to stop you from running <strong>FAI</strong>, they will still receive <strong>U</strong> ≤ <strong>Best</strong>, while running <strong>FAI</strong> guarantees <strong>U</strong> = <strong>Best</strong> &#8212; so why bother?</p>
<p>Now satisfied, and with the scheduled moment arrived, you finally run <strong>FAI</strong>. Promptly the oracle AI destroys civilization and spends the rest of its days trying to become <em>as confident as possible</em> that Tic-Tac-Toe is really a draw (as you asked it to, once upon a time).</p>
<p>Just a lighthearted illustration that decision theory isn&#8217;t only hard for AI.</p>
<p>(Disclaimer: this narrative claims to represent reality only insofar as it is mathematically plausible.)</p>
<p>Edit: I think the moral was unclear. The point is: in fact <strong>F</strong>(<strong>FAI</strong>) holds, and so in fact <strong>U</strong> = <strong>Best</strong>, and hence <strong>U</strong> ≤ <strong>Best</strong>. Everything was as claimed and proven. But this doesn&#8217;t change the fact that you would rather not have used this procedure.</p>
]]></html></oembed>