<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[The ryg blog]]></provider_name><provider_url><![CDATA[https://fgiesen.wordpress.com]]></provider_url><author_name><![CDATA[fgiesen]]></author_name><author_url><![CDATA[https://fgiesen.wordpress.com/author/fgiesen/]]></author_url><title><![CDATA[How to generate cellular textures&nbsp;2]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>After writing the previous article, I experimented with the problem a bit more and found out two things. First, there&#8217;s a bug in the distance lower bound computation I used for the &#8220;tiles&#8221; variant that caused it to perform significantly worse than it should. I actually copied it over from the Werkkzeug3 implementation, which has the same bug &#8211; shouldn&#8217;t have assumed it&#8217;s correct just because it&#8217;s worked for a couple of years. (The bug causes the lower bound to be much larger than necessary in some cases; i.e. it never causes any visual changes, it just makes the algorithm run slower). The fixed version of distanceBound is a lot simpler than the original implementation. The second thing I found out is that there&#8217;s another big speedup to be had without using point location structures, higher-order Voronoi diagrams or an &#8220;all nearest neighbor&#8221; algorithm. All it takes is a bit of thinking about the problem.</p>
<p><!--more--></p>
<p>The &#8220;tiled&#8221; algorithm is already quite good at reducing the number of iterations in the inner loop; for example, when rendering a 1024&#215;1024 image with 256 randomly placed cells in 32&#215;32 pixel tiles, around 3.5 distance calculations are performed per pixel on average. Considering that <em>any</em> algorithm that solves the problem needs to calculate at least 2 distances per pixel (to the nearest and second-nearest cell center for coloring), and that reducing the number of distance computations from 256/pixel to 3.5/pixel &#8220;only&#8221; gave us a speedup factor of about 28, there&#8217;s not that much to be gained from reducing the distance computations further. We&#8217;re gonna hit diminishing returns there.</p>
<p>However, we did achieve our speedup by shifting a lot of work to the outer loop, which now has quite a lot to do &#8211; traverse all tiles in the image, compute a distance lower bound to each cell center per-tile, then sort the list of cell centers by distance. Considering that most cell centers only &#8220;contribute&#8221; to a small number of tiles, this seems wasteful. Wouldn&#8217;t it be great if we could avoid dealing with distant cell centers altogether?</p>
<p>And that&#8217;s exactly the main idea for this improvement: get rid of unnecessary points early. But how do we determine which points to reject? Assume you&#8217;re rendering some subrectangle R of the current image, and that R contains at least one cell center C. Either this cell center is already the closest to every pixel P inside R, or there is some pixel P for which another cell center D outside of R is closer to P. In short, <img src="https://s0.wp.com/latex.php?latex=dist%28P%2CD%29+%5Cle+dist%28P%2CC%29+%5Cle+diam%28R%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=dist%28P%2CD%29+%5Cle+dist%28P%2CC%29+%5Cle+diam%28R%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=dist%28P%2CD%29+%5Cle+dist%28P%2CC%29+%5Cle+diam%28R%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="dist(P,D) &#92;le dist(P,C) &#92;le diam(R)" class="latex" /> because both P and C are in R. This means that all points that <em>might</em> be the nearest point to some pixel in R must be within a distance of <img src="https://s0.wp.com/latex.php?latex=diam%28R%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002" srcset="https://s0.wp.com/latex.php?latex=diam%28R%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002 1x, https://s0.wp.com/latex.php?latex=diam%28R%29&#038;bg=ffffff&#038;fg=000&#038;s=0&#038;c=20201002&#038;zoom=4.5 4x" alt="diam(R)" class="latex" /> from R. The same argument works for the closest 2 points if at least 2 points are contained in R, and it can also be done in 1D for the x and y axis separately. The net result is that if R contains at least 2 points, all points that are candidates for &#8220;closest or second-closest to some pixel in R&#8221; must be contained in a rectangle R&#8217; that is centered on R and has three times the diameter in both dimensions. That&#8217;s the trick.</p>
<p>Now all we need to do is make use of this. What I did was to subdivide the output image is subdivided recursively in a quadtree-like fashion. To render the pixels inside the subrectangle R, subdivide R into four rectangles, which are processed sequentially (note this is completely implicit; no tree is actually built and no data structures are queried). Say the current such rectangle is S. We then compute S&#8217; and divide points into two categories: within S&#8217; and outside S&#8217;. If there&#8217;s some points inside S&#8217; and at least two of these points inside S (without the apostrophe!), the process is repeated starting with the smaller rectangle S. If we can&#8217;t subdivide any further or if the subrectangles get too small, actually render the current rectangle, using any of the methods described in the previous article on the reduced point set.</p>
<p>So, how much does it help? Time for some hard data. Same setup as before, my notebook (2.2GHz Core2Duo), 1024&#215;1024 pixel, randomly placed points with minimum distance enforced. I&#8217;ve also changed the measuring code to retry everything 3 times and take the fastest run; it didn&#8217;t much matter with the huge differences between algorithms last time, but this time I wanted to do it a bit more properly. (I still always use the same point distribution, though)</p>
<pre>64 points.
         brute force:    710142 microseconds
                tree:   1745306 microseconds
           sort by y:    149844 microseconds
      sort by y, SSE:     54521 microseconds
               tiles:     99334 microseconds
          tiles, SSE:     22741 microseconds
        spatial subd:     21571 microseconds

128 points.
         brute force:   1534623 microseconds
                tree:   2191476 microseconds
           sort by y:    195516 microseconds
      sort by y, SSE:     74734 microseconds
               tiles:    109785 microseconds
          tiles, SSE:     29030 microseconds
        spatial subd:     23927 microseconds

256 points.
         brute force:   3206320 microseconds
                tree:   2777783 microseconds
           sort by y:    250306 microseconds
      sort by y, SSE:    104342 microseconds
               tiles:    123634 microseconds
          tiles, SSE:     39941 microseconds
        spatial subd:     28874 microseconds

512 points.
         brute force:   7312217 microseconds
                tree:   4044699 microseconds
           sort by y:    357257 microseconds
      sort by y, SSE:    144823 microseconds
               tiles:    160355 microseconds
          tiles, SSE:     66846 microseconds
        spatial subd:     34001 microseconds

1024 points.
         brute force:  15794828 microseconds
                tree:   5683397 microseconds
           sort by y:    490350 microseconds
      sort by y, SSE:    215715 microseconds
               tiles:    254225 microseconds
          tiles, SSE:    151504 microseconds
        spatial subd:     40939 microseconds</pre>
<p>Results are overall similar to last time for the first few methods. The tile-based algorithm has improved dramatically thanks to the distanceBound-bugfix. The other thing is that the new spatial subdivision algorithm really kills everything else across the whole range. This didn&#8217;t surprise me much with large numbers of points (after all, this is where rejecting points outright really helps), but I didn&#8217;t expect the algorithm to deal gracefully with a small number of points, due to the increased overhead. Turns out that&#8217;s not a problem &#8211; the new algorithm still is reliably a win for as little as 16 points, and only minimally smaller (less than 1%) for less than that.</p>
<p>So, what about implementation complexity? Here&#8217;s the number of lines of code (including blank lines and comments) for the different algorithms. &#8220;sort Y&#8221;, &#8220;tiles&#8221; and &#8220;spatial subd&#8221; use the KeyedPoint functions (24 lines), and &#8220;spatial subd&#8221; also uses most of the tile-based rendering code (61 lines). The result is:</p>
<pre>brute force:               31 lines
sort y:            42+24 = 66 lines
tiles:             70+24 = 94 lines
tree:                     185 lines
spatial subd:  53+61+24 = 138 lines</pre>
<p>You can download the new code <a href="http://www.farbrausch.de/~fg/code/cellular_new.cpp">here</a>. And my original point still stands: keep it simple, stupid! Look at the problem you&#8217;re trying to solve. And don&#8217;t just use trees for everything 🙂</p>
<p><strong>UPDATE:</strong> This one had a bug as well. Stupid error in the &#8220;spatial subd&#8221; variant, introduced while cleaning up the code. Found and fixed 🙂</p>
]]></html></oembed>