<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[The ryg blog]]></provider_name><provider_url><![CDATA[https://fgiesen.wordpress.com]]></provider_url><author_name><![CDATA[fgiesen]]></author_name><author_url><![CDATA[https://fgiesen.wordpress.com/author/fgiesen/]]></author_url><title><![CDATA[Triangle rasterization in&nbsp;practice]]></title><type><![CDATA[link]]></type><html><![CDATA[<p><em>This post is part of a series &#8211; go <a href="https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/">here</a> for the index.</em></p>
<p>Welcome back! The <a href="https://fgiesen.wordpress.com/2013/02/06/the-barycentric-conspirac/">previous post</a> gave us a lot of theoretical groundwork on triangles. This time, let&#8217;s turn it into a working triangle rasterizer. Again, no profiling or optimization this time, but there will be code, and it should get us set up to talk actual rasterizer optimizations in the next post. But before we start optimizing, let&#8217;s first try to write the simplest rasterizer that we possibly can, using the primitives we saw in the last part.</p>
<h3>The basic rasterizer</h3>
<p>As we saw last time, we can calculate edge functions (which produce barycentric coordinates) as a 2&#215;2 determinant. And we also saw last time that we can check if a point is inside, on the edge or outside a triangle simply by looking at the signs of the three edge functions at that point. Our rasterizer is going to work in integer coordinates, so let&#8217;s assume for now that our triangle vertex positions and point coordinates are given as integers too. The orientation test that computes the 2&#215;2 determinant looks like this in code:</p>
<pre>
struct Point2D {
    int x, y;
};

int orient2d(const Point2D&amp; a, const Point2D&amp; b, const Point2D&amp; c)
{
    return (b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x-a.x);
}
</pre>
<p>Now, all we have to do to rasterize our triangle is to loop over candidate pixels and check whether they&#8217;re inside or not. We could do it brute-force and loop over all screen pixels, but let&#8217;s try to not be completely brain-dead about this: we do know that all pixels inside the triangle are also going to be inside an axis-aligned bounding box around the triangle. And axis-aligned bounding boxes are both easy to compute and trivial to traverse. This gives:</p>
<pre>
void drawTri(const Point2D&amp; v0, const Point2D&amp; v1, const Point2D&amp; v2)
{
    // Compute triangle bounding box
    int minX = min3(v0.x, v1.x, v2.x);
    int minY = min3(v0.y, v1.y, v2.y);
    int maxX = max3(v0.x, v1.x, v2.x);
    int maxY = max3(v0.y, v1.y, v2.y);

    // Clip against screen bounds
    minX = max(minX, 0);
    minY = max(minY, 0);
    maxX = min(maxX, screenWidth - 1);
    maxY = min(maxY, screenHeight - 1);

    // Rasterize
    Point2D p;
    for (p.y = minY; p.y &lt;= maxY; p.y++) {
        for (p.x = minX; p.x &lt;= maxX; p.x++) {
            // Determine barycentric coordinates
            int w0 = orient2d(v1, v2, p);
            int w1 = orient2d(v2, v0, p);
            int w2 = orient2d(v0, v1, p);

            // If p is on or inside all edges, render pixel.
            if (w0 &gt;= 0 &amp;&amp; w1 &gt;= 0 &amp;&amp; w2 &gt;= 0)
                renderPixel(p, w0, w1, w2);           
        }
    }
}
</pre>
<p>And that&#8217;s it. That&#8217;s a fully functional triangle rasterizer. In theory anyway &#8211; you need to write the <code>min</code> / <code>max</code> and <code>renderPixel</code> functions yourself, and I didn&#8217;t actually test this code, but you get the idea. It even does 2D clipping. Now, don&#8217;t get me wrong. I don&#8217;t recommend that you use this code as-is anywhere, for reasons I will explain below. But I wanted you to see this, because this is the actual heart of the algorithm. In any implementation of it that you&#8217;re ever going to see in practice, the wonderful underlying simplicity of it is going to be obscured by the various wrinkles introduced by various features and optimizations. That&#8217;s fine &#8211; all these additions are worth their price. But they are, in a sense, implementation details. Hell, even limiting the traversal to a bounding box is just an optimization, if a simple and important one. The point I&#8217;m trying to make here: This is not, at heart, a hard problem that requires a complex solution. It&#8217;s a fundamentally simple problem that can be solved much more efficiently if we apply some smarts &#8211; an important distinction.</p>
<h3>Issues with this approach</h3>
<p>All that said, let&#8217;s list some problems with this initial implementation:</p>
<ul>
<li><b>Integer overflows</b>. What if some of the computations overflow? This might be an actual problem or it might not, but at the very least we need to look into it.</li>
<li><b>Sub-pixel precision</b>. This code doesn&#8217;t have any.</li>
<li><b>Fill rules</b>. Graphics APIs specify a set of tie-breaking rules to make sure that when two non-overlapping triangles share an edge, every pixel (or sample) covered by these two triangles is lit once and only once. GPUs and software rasterizers need to strictly abide by these rules to avoid visual artifacts.</li>
<li><b>Speed</b>. While the code as given above sure is nice and short, it really isn&#8217;t particularly efficient. There&#8217;s a lot we can do make it faster, and we&#8217;ll get there in a bit, but of course this will make things more complicated.</li>
</ul>
<p>I&#8217;m going to address each of these in turn.</p>
<h3>Integer overflows</h3>
<p>Since all the computations happen in <code>orient2d</code>, that&#8217;s the only expression we actually have to look at:</p>
<p><code>(b.x-a.x)*(c.y-a.y) - (b.y-a.y)*(c.x-a.x)</code></p>
<p>Luckily, it&#8217;s pretty very symmetric, so there&#8217;s not many different sub-expressions we have to look at: Say we start with p-bit signed integer coordinates. That means the individual coordinates are in [-2<sup>p-1</sup>,2<sup>p-1</sup>-1]. By subtracting the upper bound from the lower bound (and vice versa), we can determine the bounds for the difference of the two coordinates:</p>
<p><img src="https://s0.wp.com/latex.php?latex=-%282%5Ep+-+1%29+%5Cle+b_x+-+a_x+%5Cle+2%5Ep+-+1+%5Cquad+%5CLeftrightarrow+%5Cquad+%7Cb_x+-+a_x%7C+%5Cle+2%5Ep+-+1&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=-%282%5Ep+-+1%29+%5Cle+b_x+-+a_x+%5Cle+2%5Ep+-+1+%5Cquad+%5CLeftrightarrow+%5Cquad+%7Cb_x+-+a_x%7C+%5Cle+2%5Ep+-+1&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=-%282%5Ep+-+1%29+%5Cle+b_x+-+a_x+%5Cle+2%5Ep+-+1+%5Cquad+%5CLeftrightarrow+%5Cquad+%7Cb_x+-+a_x%7C+%5Cle+2%5Ep+-+1&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="-(2^p - 1) &#92;le b_x - a_x &#92;le 2^p - 1 &#92;quad &#92;Leftrightarrow &#92;quad |b_x - a_x| &#92;le 2^p - 1" class="latex" /></p>
<p>And the same applies for the other three coordinate differences we compute. Next, we compute a product of two such values. Easy enough:</p>
<p><img src="https://s0.wp.com/latex.php?latex=%7C%28b_x+-+a_x%29+%28c_y+-+a_y%29%7C+%5Cle+%7Cb_x+-+a_x%7C+%7Cc_y+-+a_y%7C+%3D+%282%5Ep+-+1%29%5E2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%7C%28b_x+-+a_x%29+%28c_y+-+a_y%29%7C+%5Cle+%7Cb_x+-+a_x%7C+%7Cc_y+-+a_y%7C+%3D+%282%5Ep+-+1%29%5E2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%7C%28b_x+-+a_x%29+%28c_y+-+a_y%29%7C+%5Cle+%7Cb_x+-+a_x%7C+%7Cc_y+-+a_y%7C+%3D+%282%5Ep+-+1%29%5E2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="|(b_x - a_x) (c_y - a_y)| &#92;le |b_x - a_x| |c_y - a_y| = (2^p - 1)^2" class="latex" /></p>
<p>Again, the same applies to the other product. Finally, we compute the difference between the two products, which doubles our bound on the absolute value:</p>
<p><img src="https://s0.wp.com/latex.php?latex=%7C%5Cmathrm%7BOrient2D%7D%28a%2Cb%2Cc%29%7C+%5Cle+2+%282%5Ep+-+1%29%5E2+%3D+2%5E%7B2p+%2B+1%7D+-+2%5E%7Bp%2B2%7D+%2B+2+%5Cle+2%5E%7B2p+%2B+1%7D+-+2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=%7C%5Cmathrm%7BOrient2D%7D%28a%2Cb%2Cc%29%7C+%5Cle+2+%282%5Ep+-+1%29%5E2+%3D+2%5E%7B2p+%2B+1%7D+-+2%5E%7Bp%2B2%7D+%2B+2+%5Cle+2%5E%7B2p+%2B+1%7D+-+2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=%7C%5Cmathrm%7BOrient2D%7D%28a%2Cb%2Cc%29%7C+%5Cle+2+%282%5Ep+-+1%29%5E2+%3D+2%5E%7B2p+%2B+1%7D+-+2%5E%7Bp%2B2%7D+%2B+2+%5Cle+2%5E%7B2p+%2B+1%7D+-+2&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="|&#92;mathrm{Orient2D}(a,b,c)| &#92;le 2 (2^p - 1)^2 = 2^{2p + 1} - 2^{p+2} + 2 &#92;le 2^{2p + 1} - 2" class="latex" /></p>
<p>since p is always nonnegative. Accounting for the sign bit, that means the result of Orient2D fits inside a (2p+2)-bit signed integer. Since we want the results to fit inside a 32-bit integer, that means we need <img src="https://s0.wp.com/latex.php?latex=p+%5Cle+%2832+-+2%29+%2F+2+%3D+15&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=p+%5Cle+%2832+-+2%29+%2F+2+%3D+15&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=p+%5Cle+%2832+-+2%29+%2F+2+%3D+15&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="p &#92;le (32 - 2) / 2 = 15" class="latex" /> to make sure there are no overflows. In other words, we&#8217;re good as long as the input coordinates are all inside [-16384,16383]. Anything poking outside that area needs to be analytically clipped beforehand to make sure there&#8217;s no overflows during rasterization.</p>
<p>Incidentally, this is shows how a typical implementation <a href="https://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/">guard band clipping</a> works: the rasterizer performs computations using some set bit width, which determines the range of coordinates that the rasterizer accepts. X/Y-clipping only needs to be done when a triangle doesn&#8217;t fall entirely within that region, which is very rare with common viewport sizes. Note that there is no need for rasterizer coordinates to agree with render-target coordinates, and if you want to maximize the utility of your guard band region, your best bet is to translate the rasterizer coordinate system such that the center (instead of the top-left or bottom-right corner) of your viewport is near (0,0). Otherwise large viewports might have a much bigger guard band on the left side than they do on the right side (and similar in the vertical direction), which is undesirable.</p>
<p>Anyway. Integer overflows: Not a big deal, at least in our current setup with all-integer coordinates. We do need to check for (and possibly clip) huge triangles, but they&#8217;re rare in practice, so we still get away with no clipping most of the time.</p>
<h3>Sub-pixel precision</h3>
<p>For this point and the next, I&#8217;m only going to give a high-level overview, since we&#8217;re not actually going to use it for our target application.</p>
<p>Snapping vertex coordinates to pixels is actually quite crappy in terms of quality. It&#8217;s okay for a static view of a static scene, but if either the camera or one of the visible objects moves very slowly, it&#8217;s quite noticeable that the triangles only move in discrete steps once one of the vertices has moved from one pixel to the next after rounding the coordinates to integer. It looks as if the triangle is &#8220;wobbly&#8221;, especially so if there&#8217;s a texture on it.</p>
<p>Now, for the application we&#8217;re concerned with in this series, we&#8217;re only going to render a depth buffer, and the user is never gonna see it directly. So we can live with artifacts that are merely visually distracting, and needn&#8217;t bother with sub-pixel correction. This still means that the triangles we software-rasterize aren&#8217;t going to match up exactly with what the hardware rasterizer does, but in practice, if we mistakenly occlusion-cull an object even though some of its pixel are <em>just</em> about visible due to sub-pixel coordinate differences, it&#8217;s not a big deal. And neither is not culling an object because of a few pixels that are actually invisible. As one of my CS professors once pointed out, there are reasonable error bounds for <em>everything</em>, and for occlusion culling, &#8220;a handful of pixels give or take&#8221; is a reasonable error bound, at least if they&#8217;re not clustered together!</p>
<p>But suppose that you want to actually render something user-visible, in which case you absolutely do need sub-pixel precision. You want at least 4 extra bits in each coordinate (i.e. coordinates are specified in 1/16ths of a pixel), and at this point the standard in DX11-compliant GPUs in 8 bits of sub-pixel precision (coordinates in 1/256ths of a pixel). Let&#8217;s assume 8 bits of sub-pixel precision for now. The trivial way to get this is to multiply everything by 256: our (still integer) coordinates are now in 1/256ths of a pixel, but we still only perform one sample each pixel. Easy enough: (just sketching the updated main loop here)</p>
<pre>
    static const int subStep = 256;
    static const int subMask = subStep - 1;

    // Round start position up to next integer multiple
    // (we sample at integer pixel positions, so if our
    // min is not an integer coordinate, that pixel won't
    // be hit)
    minX = (minX + subMask) &amp; ~subMask;
    minY = (minY + subMask) &amp; ~subMask;

    for (p.y = minY; p.y &lt;= maxY; p.y += subStep) {
        for (p.x = minX; p.x &lt;= maxX; p.x += subStep) {
            // Determine barycentric coordinates
            int w0 = orient2d(v1, v2, p);
            int w1 = orient2d(v2, v0, p);
            int w2 = orient2d(v0, v1, p);

            // If p is on or inside all edges, render pixel.
            if (w0 &gt;= 0 &amp;&amp; w1 &gt;= 0 &amp;&amp; w2 &gt;= 0)
                renderPixel(p, w0, w1, w2);           
        }
    }
</pre>
<p>Simple enough, and it works just fine. Well, in theory it does, anyway &#8211; this code fragment is just as untested as the previous one, so be careful :). By the way, this seems like a good place to note that <em>if you&#8217;re writing a software rasterizer, this is likely not what you want</em>: This code samples triangle coverage at integer coordinates. This is simpler if you&#8217;re writing a rasterizer without sub-pixel correction (as we will do, which is why I set up coordinates this way), and it also happens to match with D3D9 rasterization conventions, but it disagrees with OpenGL and D3D10+ rasterization rules, which turn out to be saner in several important ways for a full-blown renderer. So consider yourselves warned.</p>
<p>Anyway, as said, this works, but it has a problem: doing the computation like this costs us a <em>lot</em> of bits. Our accepted coordinate range when working with 32-bit integers is still [-16384,16383], but now that&#8217;s in sub-pixel steps and boils down to approximately [-64,63.996] pixels. That&#8217;s tiny &#8211; even if we center the viewport perfectly, we can&#8217;t squeeze more than 128 pixels along each axis out of it this way. One way out is to decrease the level of sub-pixel precision: at 4 bits, we can just about fit a 2048&#215;2048 pixel render target inside our coordinate space, which isn&#8217;t exactly comfortable but workable.</p>
<p>But there&#8217;s a better way. I&#8217;m not gonna go into details here because we&#8217;re already on a tangent and the details, though not hard, are fairly subtle. I might turn it into a separate post at some point. But the key realization is that we&#8217;re still taking steps of one pixel at a time: all the p&#8217;s we pass into <code>orient2d</code> are an integral number of pixel samples apart. This, together with the incremental evaluation we&#8217;re gonna see soon, means that we only have to do a full-precision calculation once per triangle. All the pixel-stepping code always advances in units of integral pixels, which means the sub-pixel size enters the computation only once, not squared. Which in turn means we can actually cover the 2048&#215;2048 render target with 8 bits of subpixel accuracy, or 8192&#215;8192 pixels with 4 bits of subpixel resolution. You can squeeze that some more if you traverse the triangle in 2&#215;2 pixel blocks and not actual pixels, as our triangle rasterizer and any OpenGL/D3D-style rasterizer will do, but again, I digress.</p>
<h3>Fill rules</h3>
<p>The goal of fill rules, as briefly explained earlier, is to make sure that when two non-overlapping triangles share an edge and you render both of them, each pixel gets processed only once. Now, if you look at an <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/cc627092(v=vs.85).aspx#Triangle">actual description</a> (this one is for D3D10 and up), it might seem like they&#8217;re really tricky to implement and require comparing edges to other edges, but luckily it all turns out to be fairly simple to do, although I&#8217;ll need a bit of space to explain it.</p>
<p>Remember that our core rasterizer only deals with triangles in one winding order &#8211; let&#8217;s say counter-clockwise, as we&#8217;ve been using last time. Now let&#8217;s look at the rules from the article I just pointed you to:</p>
<blockquote><p>
A top edge, is an edge that is exactly horizontal and is above the other edges.<br />
A left edge, is an edge that is not exactly horizontal and is on the left side of the triangle.
</p></blockquote>
<p><a href="https://fgiesen.files.wordpress.com/2013/02/tri1.png"><img loading="lazy" data-attachment-id="1419" data-permalink="https://fgiesen.wordpress.com/2013/02/06/the-barycentric-conspirac/tri/" data-orig-file="https://fgiesen.files.wordpress.com/2013/02/tri1.png" data-orig-size="127,213" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}" data-image-title="A triangle." data-image-description="" data-image-caption="" data-medium-file="https://fgiesen.files.wordpress.com/2013/02/tri1.png?w=127" data-large-file="https://fgiesen.files.wordpress.com/2013/02/tri1.png?w=127" src="https://fgiesen.files.wordpress.com/2013/02/tri1.png?w=127&#038;h=213" alt="A triangle." width="127" height="213" class="alignnone size-full wp-image-1419" style="float:left;margin:1em 1em 1em 0;" /></a></p>
<p>The &#8220;exactly horizontal&#8221; part is easy enough to find out (just check if the y-coordinates are different), but the second half of these definitions looks troublesome. Luckily, it turns out to be fairly easy. Let&#8217;s do top first: What does &#8220;above the other edges&#8221; mean, really? An edge connects two points. The edge that&#8217;s &#8220;above the other edges&#8221; connects the two highest vertices; the third vertex is below them. In our example triangle, that edge is v<sub>1</sub>v<sub>2</sub> (ignore that it&#8217;s not horizontal for now, it&#8217;s still the edge that&#8217;s above the others).  Now I claim that edge <em>must</em> be one that is going towards the left. Suppose it was going to the right instead &#8211; then v<sub>0</sub> would be in its right (negative) half-space, meaning the triangle is wound clockwise, contradicting our initial assertion that it&#8217;s counter-clockwise! And by the same argument, any horizontal edge that goes to the right must be a bottom edge, or again we&#8217;d have a clockwise triangle. Which gives us our first updated rule:</p>
<p><em>In a counter-clockwise triangle, a top edge is an edge that is exactly horizontal and goes towards the left, i.e. its end point is left of its start point.</em></p>
<p>That&#8217;s really easy to figure out &#8211; just a sign test on the edge vectors. And again using the same kind of argument as before (consider the edge v<sub>2</sub>v<sub>0</sub>), we can see that any &#8220;left&#8221; edge must be one that&#8217;s going down, and that any edge that is going up is in fact a right edge. Which gives us the second updated rule:</p>
<p><em>In a counter-clockwise triangle, a left edge is an edge that goes down, i.e. its end point is strictly below its start point.</em></p>
<p>Note we can drop the &#8220;not horizontal&#8221; part entirely: any edge that goes down by our definition can&#8217;t be horizontal to begin with. So this is just one sign test, even easier than testing for a top edge!</p>
<p>And now that we know how to identify which edge is which, what do we do with that information? Again, quoting from the D3D10 rules:</p>
<blockquote><p>
Any pixel center which falls inside a triangle is drawn; a pixel is assumed to be inside if it passes the top-left rule. The top-left rule is that a pixel center is defined to lie inside of a triangle if it lies on the top edge or the left edge of a triangle.
</p></blockquote>
<p>To paraphrase: if our sample point actually falls inside the triangle (not on an edge), we draw it no matter what. It if happens to fall on an edge, we draw it if and only if that edge happens to be a top or a left edge.</p>
<p>Now, our current rasterizer code:</p>
<pre>
    int w0 = orient2d(v1, v2, p);
    int w1 = orient2d(v2, v0, p);
    int w2 = orient2d(v0, v1, p);

    // If p is on or inside all edges, render pixel.
    if (w0 &gt;= 0 &amp;&amp; w1 &gt;= 0 &amp;&amp; w2 &gt;= 0)
        renderPixel(p, w0, w1, w2);           
</pre>
<p>Draws <em>all</em> points that fall on edges, no matter which kind &#8211; all the tests are for greater-or-equals to zero. That&#8217;s okay for edge functions corresponding to top or left edges, but for the other edges we really want to be testing for a proper &#8220;greater than zero&#8221; instead. We could have multiple versions of the rasterizer, one for each possible combination of &#8220;edge 0/1/2 is (not) top-left&#8221;, but that&#8217;s too horrible to contemplate. Instead, we&#8217;re going to use the fact that for integers, <code>x &gt; 0</code> and <code>x &gt;= 1</code> mean the same thing. Which means we can leave the tests as they are by first computing a per-edge offset once:</p>
<pre>
  int bias0 = isTopLeft(v1, v2) ? 0 : -1;
  int bias1 = isTopLeft(v2, v0) ? 0 : -1;
  int bias2 = isTopLeft(v0, v1) ? 0 : -1;
</pre>
<p>and then changing our edge function computation slightly:</p>
<pre>
    int w0 = orient2d(v1, v2, p) + bias0;
    int w1 = orient2d(v2, v0, p) + bias1;
    int w2 = orient2d(v0, v1, p) + bias2;

    // If p is on or inside all edges, render pixel.
    if (w0 &gt;= 0 &amp;&amp; w1 &gt;= 0 &amp;&amp; w2 &gt;= 0)
        renderPixel(p, w0, w1, w2);           
</pre>
<p>Full disclosure: this changes the barycentric coordinates we pass to <code>renderPixel</code> slightly (as does the subpixel-precision squeezing we did earlier!). If you&#8217;re not using sub-pixel correction, this can be quite a big error, and you want to correct for it. With sub-pixel correction, you might decide that being off-by-1 on interpolated quantities is no big deal (remember that the edge functions are in area units, so &#8220;1&#8221; is a 1-subpixel-by-1-subpixel square, which is fairly small). Either way, the bias values are computed once per triangle, and you can usually do the correction once per triangle too, so it&#8217;s no extra per-pixel overhead. Right now, we pay some per-pixel cost to apply the biases too, but it turns out that will go away once we start optimizing it. And by the way, if you go back to the &#8220;integer overflow&#8221; section, you&#8217;ll notice we had a bit of slack on the precision requirements; the &#8220;bias&#8221; terms will not cause us to need any extra bits. So it really does all work out, and we can get proper fill rule handling in our rasterizer.</p>
<p>Which reminds me: This is the part where I tell you that the depth buffer rasterizer we&#8217;re going to look at doesn&#8217;t bother with implementing a consistent fill rule. It has the same &#8220;fill everything inside or on the edge&#8221; behavior as our initial code does. That might be an oversight, or it might be an intentional decision to make the rasterizer slightly conservative, which would make sense given the application. I&#8217;m not sure, and I decided not to mess with it. But I figured that since I was writing a post on rasterization, it would be a sin <em>not</em> to describe how to do this properly, especially since a coherent explanation of how exactly it&#8217;s done is quite hard to find on the net.</p>
<h3>All that&#8217;s fine and good, but now how do we make it fast?</h3>
<p>Well, that&#8217;s a big question, and &#8211; much as I hate to tell you &#8211; one that I will try to answer in the next post. We&#8217;ll also end this brief detour into software rasterization generalities and get back to the Software Occlusion Culling demo that started this series.</p>
<p>So what&#8217;s the point of this and the previous post? Well, first off, this is still my blog, and I just felt like writing about it. 🙂 And just as importantly, I&#8217;m going to spend at least two posts poking around in the guts of a rasterizer, and none of the changes I&#8217;m going to describe will make <em>any</em> sense to you without this background information. Low-hanging fruit are all nice and good, but sometimes you actually have to work for it, and this is one of those times. Besides, while optimizing code is fun, correctness isn&#8217;t optional. Fast code that doesn&#8217;t do what it&#8217;s supposed to is no good to anyone. So I&#8217;m trying to get it right before we make it fast. I can promise you it will be worth your while, though, and I&#8217;ll try to finish and upload the next post quickly. Until then, take care!</p>
]]></html><thumbnail_url><![CDATA[https://fgiesen.files.wordpress.com/2013/02/tri1.png?fit=440%2C330]]></thumbnail_url><thumbnail_width><![CDATA[]]></thumbnail_width><thumbnail_height><![CDATA[]]></thumbnail_height></oembed>