<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Software is Crap]]></provider_name><provider_url><![CDATA[https://davmac.wordpress.com]]></provider_url><author_name><![CDATA[davmac]]></author_name><author_url><![CDATA[https://davmac.wordpress.com/author/davmac/]]></author_url><title><![CDATA[Mysql, and C99 aliasing rules. A detective&nbsp;story.]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>When running mythfrontend after upgrading Mysql (to 5.1.40) I got:</p>
<blockquote><p>Driver error was [2/1210]:<br />
QMYSQL3: Unable to execute statement<br />
Database error was:<br />
Incorrect arguments to mysql_stmt_execute</p></blockquote>
<p>After a bit of Google-ing I found a <a href="http://svn.mythtv.org/trac/ticket/6683">Myth bug ticket</a> which seems to refer to the same (or a very similar) problem. The problem is apparently due to a <a href="http://bugs.mysql.com/bug.php?id=19694">Mysql bug</a> and has something to do with an invalid cast and gcc optimizations, which makes it easy enough to work around &#8211; except of course that the bug was apparently fixed many versions ago.</p>
<p>However, conveniently, changing my configure options for Mysql (from -O3 to -O1 in my CFLAGS and CXXFLAGS) does make the problem go away. I suspected alias analysis to be the problematic optimization, so I checked whether &#8220;-O3 -fno-strict-aliasing&#8221; was enough &#8211; and it was. I resolved to track down the bug.</p>
<p>I started by compiling Mysql twice, in two different trees &#8211; one with my regular options and one with -fno-strict-aliasing. Then, I copied object files from the first tree to the second and repeated &#8220;make install&#8221; until the problem appeared (actually, I had to copy the .o file, as well as the stupidly hidden .libs/*.o file that libtool creates, and then touch the *.lo file; the build system is broken enough that just changing an .o file won&#8217;t cause the library to rebuilt). Fortunately I found the right file first try: it was libmysql.o (or rather, libmysql.c).</p>
<p>Unfortunately, libmysql.c is over  5000 lines long, and I didn&#8217;t know which function was causing the problem, so I had no idea where to look in the code. I proceeded by compiling the file both with and without &#8220;-fno-strict-aliasing&#8221; and with the &#8220;-save-temps&#8221; option so I could grab the generated assembly (i.e. libmysql.s) for each case. Then I did a manual binary search by hand-editing functions from one file into the other, assembling it and re-building the library until I was able to find which function had the problem. It turned out to be &#8220;cli_stmt_execute&#8221;. At that point I decided to look through the generated assembly and see whether I could spot where it didn&#8217;t seem to match up with what was expected.  First, though, I got the size of the function down by marking various other static function that it was calling with __attribute__((noinline)), each time re-testing to make sure that the problem still occurred.</p>
<p>Anyway, enough with the suspense. Here is the problematic line of code (2553 in libmysql.c):</p>
<pre>    for (param= stmt-&gt;params; param &lt; param_end; param++)
        store_param_type((char**) &amp;net-&gt;write_pos, param);</pre>
<p>The problem of course is the cast &#8220;(char **)&#8221; which casts to an incompatible type, thus the value net-&gt;write_pos is accessed as a &#8220;char *&#8221; when it is declared as a &#8220;uchar *&#8221; (&#8220;uchar&#8221; is a typedef for &#8220;unsigned char&#8221;). Now &#8220;char&#8221; and &#8220;unsigned char&#8221; are compatible types, but &#8220;char *&#8221; and &#8220;unsigned char *&#8221; are <em>not</em>. That means, if you access some value via a &#8220;char *&#8221; reference then you must always do so and never through any incompatible reference <em>including</em> &#8220;unsigned char *&#8221;. Doing so breaks the strict aliasing rules.</p>
<p>In this case, store_param_type is actually meant to modify net-&gt;writepos:</p>
<pre>    static void store_param_type(char **pos, MYSQL_BIND *param)
    {
        uint typecode= param-&gt;buffer_type | (param-&gt;is_unsigned ? 32768 : 0);
        int2store(*pos, typecode);
        *pos+= 2;
    }</pre>
<p>But, as gcc inlines the call, it effectively sees the &#8220;*pos+=2&#8221; as updating a location that must be different to the source location (net-&gt;write_pos) seeing as the types are incompatible, in other words, it effectively sees &#8220;*pos = *otherpos + 2&#8221; (where otherpos is an &#8220;unsigned char **&#8221;). Then, it sees that *otherpos must be constant for the duration of the loop, seeing as a value of a compatible type is not written. This means it can defer the statement and collapse its multiple instances (one per iteration of the loop) to a single instance. The result? kaboom.</p>
<p>The solution? Change the type of the &#8220;pos&#8221; parameter for store_param_type() from &#8220;char **&#8221; to &#8220;unsigned char **&#8221; (and remove the cast from the call in cli_stmt_execute()). The function isn&#8217;t used anywhere else so this makes sense even if it wasn&#8217;t causing a bug.</p>
<p>And the lesson to be learnt from this: be very careful with pointer casts. Avoid them whenever possible. And <em>learn</em>, I mean <em>really learn </em>the aliasing rules. There are way too many pieces of software out there written by people who do not understand them and the result is a buggy software.</p>
<p><a href="http://bugs.mysql.com/48284">Mysql bug</a> filed.</p>
]]></html></oembed>