<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CodeItNow</title>
	<atom:link href="http://www.rorydriscoll.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rorydriscoll.com</link>
	<description></description>
	<lastBuildDate>Fri, 23 Mar 2012 06:03:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Derivative Map Artifacts</title>
		<link>http://www.rorydriscoll.com/2012/01/22/derivative-map-artifacts/</link>
		<comments>http://www.rorydriscoll.com/2012/01/22/derivative-map-artifacts/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 01:50:36 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=1070</guid>
		<description><![CDATA[I had been suffering from some strange artifacts on the edges of my objects when using derivative maps. After much time spent in GPU Perf Studio, I finally realised that my mipmap generation was not correct. It was introducing one extra column of garbage at every level. My use of FXAA and anisotropic filtering was [...]]]></description>
			<content:encoded><![CDATA[<p>I had been suffering from some strange artifacts on the edges of my objects when using derivative maps. After much time spent in GPU Perf Studio, I finally realised that my mipmap generation was not correct. It was introducing one extra column of garbage at every level.</p>
<p>My use of FXAA and anisotropic filtering was just making the problem more evident. I would recommend using regular trilinear filtering for derivative maps anyway.</p>
<p>So, mea culpa and all that. Let the name of Morten Mikkelsen and derivative maps be cleared!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2012/01/22/derivative-map-artifacts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Derivative Maps vs Normal Maps</title>
		<link>http://www.rorydriscoll.com/2012/01/15/derivative-maps-vs-normal-maps/</link>
		<comments>http://www.rorydriscoll.com/2012/01/15/derivative-maps-vs-normal-maps/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 03:41:52 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=1012</guid>
		<description><![CDATA[This post is a quick follow up to my previous post on derivative maps. This time I&#8217;m going to compare the quality and performance of derivative maps with the currently accepted technique, normal maps. I&#8217;ll be using the precomputed derivative maps for comparison since the ddx/ddy technique just isn&#8217;t acceptable in terms of quality. Quality [...]]]></description>
			<content:encoded><![CDATA[<p>This post is a quick follow up to my <a href="http://www.rorydriscoll.com/2012/01/11/derivative-maps/">previous post</a> on derivative maps. This time I&#8217;m going to compare the quality and performance of derivative maps with the currently accepted technique, normal maps.</p>
<p>I&#8217;ll be using the precomputed derivative maps for comparison since the ddx/ddy technique just isn&#8217;t acceptable in terms of quality.</p>
<h2>Quality</h2>
<p>Here&#8217;s the close-up shot of the sphere with the moon texture again. This shows the derivative map implementation; if you mouse over it, you&#8217;ll see the normal map version.</p>
<p><img class="mouseover aligncenter size-full wp-image-1018" title="Moon texture comparison" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapCompare.png" alt="" width="512" height="512" data-oversrc="http://www.rorydriscoll.com/wp-content/uploads/2012/01/NormalMapCompare.png" /></p>
<p>There are some slight differences because the height of the derivative map doesn&#8217;t quite match the heights used to precompute the normal map, but overall I would say that they are remarkably similar. It does look to me as though the normal map is preserving more of the detail, though.</p>
<p>Here&#8217;s a high-contrast checkerboard, again with the normal map shown if you mouse over.</p>
<p><img class="mouseover aligncenter size-full wp-image-1022" title="Checkerboard texture comparison" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapContrast.png" alt="" width="512" height="512" data-oversrc="http://www.rorydriscoll.com/wp-content/uploads/2012/01/NormalMapContrast.png" /></p>
<p>I&#8217;m no artist, but I would say the derivative map results are close enough to the normal map results to call the technique viable from a quality standpoint.</p>
<table border="0" bordercolor="#FFCC00" style="background-color:#CCCCCC" width="90%" cellpadding="3" cellspacing="3">
<tr>
<td>
<p><strong>EDIT:</strong> I had some issues with artifacts, which I posted here. It turns out they were (embarrassingly) caused by my mipmap generation, which was introducing a row of garbage at each level. Combined with FXAA and anisotropic filtering, this caused the weird vertical stripes I posted before.</p>
<p>I&#8217;ve removed the images since I don&#8217;t want to give the wrong impression of the quality of derivative maps.</p>
</td>
</tr>
</table>
<h2>Performance</h2>
<p>I ran these tests on my MacBook Pro, which has an AMD Radeon HD 6750M. The shader in question is a simple shader that fills out the G-buffer render targets. All shaders were compiled for Shader Model 5.0. I took the frame times from the Fraps frame counter; the other numbers came from GPU PerfStudio.</p>
<p>For comparison, I&#8217;ve included an implementation with no normal perturbation at all.</p>
<table class="gridtable">
<tbody>
<tr>
<th><strong>Perturbation</strong></th>
<th><strong>Frame time</strong></th>
<th><strong>Pixels</strong></th>
<th><strong>Tex instructions</strong></th>
<th><strong>Tex unit busy</strong></th>
<th><strong>ALU instructions</strong></th>
<th><strong>ALU busy</strong></th>
<th><strong>ALU:Tex ratio</strong></th>
</tr>
<tr>
<td>None</td>
<td>1.08 ms</td>
<td>262144</td>
<td>3</td>
<td>27.5 %</td>
<td>14</td>
<td>32.1 %</td>
<td>4.667</td>
</tr>
<tr>
<td>Normal map</td>
<td>1.37 ms</td>
<td>262144</td>
<td>4</td>
<td>36.5 %</td>
<td>23</td>
<td>52.4 %</td>
<td>5.75</td>
</tr>
<tr>
<td>Derivative map</td>
<td>1.36 ms</td>
<td>262144</td>
<td>9</td>
<td>82.0 %</td>
<td>28</td>
<td>63.8 %</td>
<td>3.11</td>
</tr>
</tbody>
</table>
<p>Despite the extra shader instructions, the derivative map method is basically as fast as normal maps on my hardware. As Mikkelsen predicted, it seems like having one fewer vertex attribute interpolator offsets the cost of the extra ALU instructions.</p>
<p>Note that the derivative map shader has nine texture instructions, compared to just four for the normal map shader. The extra five are the two sets of ddx/ddy instructions and the instruction that gets the texture dimensions. Since the pixel shader can issue one texture instruction and one ALU instruction in the same cycle, these are essentially free.</p>
<p>The only overhead with any real impact for derivative maps is the five extra ALU instructions.</p>
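<p>To make those instruction counts a little more concrete, here is a rough C++ sketch of the per-pixel math, following the surface gradient formulation in Mikkelsen&#8217;s paper. It is not the shader from the demo: C++ has no ddx/ddy, so the screen-space derivatives are passed in as arguments, and the vector helpers and parameter names are my own. I am also assuming the derivative map stores height derivatives per texel, which is presumably why the texture dimensions are needed to rescale them to UV units.</p>

```cpp
#include <cassert>
#include <cmath>

// Minimal vector helpers (hypothetical, for illustration only).
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 Normalize(Vec3 v) { return (1.0f / std::sqrt(Dot(v, v))) * v; }

// Perturbs the interpolated normal N with Mikkelsen's surface gradient.
//   dPdx, dPdy   : screen-space derivatives of position (ddx/ddy in a shader)
//   dUVdx, dUVdy : screen-space derivatives of the texcoords (ddx/ddy in a shader)
//   dHdUV        : height derivatives w.r.t. u and v, assumed to be the derivative
//                  map sample already scaled by the texture dimensions
Vec3 PerturbNormal(Vec3 N, Vec3 dPdx, Vec3 dPdy, Vec2 dUVdx, Vec2 dUVdy, Vec2 dHdUV)
{
    // Chain rule: screen-space height derivatives from the UV-space ones.
    float dHdx = dHdUV.x * dUVdx.x + dHdUV.y * dUVdx.y;
    float dHdy = dHdUV.x * dUVdy.x + dHdUV.y * dUVdy.y;

    Vec3 R1 = Cross(dPdy, N);
    Vec3 R2 = Cross(N, dPdx);
    float det = Dot(dPdx, R1);

    // The sign of the determinant handles mirrored screen-space mappings.
    float s = (det < 0.0f) ? -1.0f : 1.0f;
    Vec3 surfGrad = s * (dHdx * R1 + dHdy * R2);
    return Normalize(std::fabs(det) * N - surfGrad);
}
```

<p>On a flat surface facing +z, a height that rises along +x tilts the result towards -x, which is the expected behaviour of a bump.</p>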
<h2>Memory</h2>
<p>As I mentioned in my previous post, derivative maps also have the tremendous benefit of not requiring tangent vectors. In my case, with a simple vertex containing position, normal, tangent and one set of texcoords, the tangent takes up 27% of the mesh space.</p>
<p>Given that most games these days have tens of megabytes of mesh data, this would turn into some pretty decent memory savings. There&#8217;s also a minor benefit on the tool-side to not having to spend time generating face tangents and merging them into the vertices.</p>
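<p>The 27% figure is easy to reproduce. Assuming every attribute is stored as 32-bit floats and the tangent has three components (my assumption; the exact layout isn&#8217;t given here), the vertex is 44 bytes, 12 of which are tangent. A minimal sketch:</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical uncompressed vertex layout: position, normal, tangent and one
// set of texcoords, all 32-bit floats, with a 3-component tangent.
struct Vertex
{
    float position[3];
    float normal[3];
    float tangent[3];
    float texcoord[2];
};

// Fraction of each vertex taken up by the tangent.
double TangentFraction()
{
    return static_cast<double>(sizeof(Vertex::tangent)) / sizeof(Vertex);
}

// Bytes saved by dropping the tangent from a mesh with the given vertex count.
std::size_t TangentBytes(std::size_t vertexCount)
{
    return vertexCount * sizeof(Vertex::tangent);
}
```

<p>With this layout, dropping the tangent from a one-million-vertex mesh saves 12,000,000 bytes, a little over 11 MiB. Packed or compressed vertex formats would change both numbers.</p>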
<h2>Conclusion</h2>
<p>Well, for me it&#8217;s pretty clear. On my setup, derivative maps offer similar quality and the same performance while using less memory. That makes them a win in my book. Of course, these numbers will vary wildly depending on the API and hardware, so this can&#8217;t be taken as a blanket &#8216;derivative maps are better than normal maps&#8217; statement, but they look promising. Good job, Morten Mikkelsen!</p>
<p>I would love to see a similar comparison for the current generation of console hardware (hint, hint!).</p>
<p>If you have DirectX 11, then you should be able to run the demo <a href='http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMaps.zip'>here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2012/01/15/derivative-maps-vs-normal-maps/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Cubemap Texel Solid Angle</title>
		<link>http://www.rorydriscoll.com/2012/01/15/cubemap-texel-solid-angle/</link>
		<comments>http://www.rorydriscoll.com/2012/01/15/cubemap-texel-solid-angle/#comments</comments>
		<pubDate>Sun, 15 Jan 2012 21:42:18 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=759</guid>
		<description><![CDATA[Warning: This post is going to be pretty math heavy. If you suck at math, then go and read this first, then come back. If you still think that you suck, then I suggest going to Khan Academy and start watching videos. You won&#8217;t regret it! I was reading through the AMD CubeMapGen source last [...]]]></description>
			<content:encoded><![CDATA[<table width="90%" align="center" cellpadding="10" cellspacing="1" bgcolor="#f7f7f7" border="0">
<tr>
<td>
<b>Warning:</b> This post is going to be pretty math-heavy. If you suck at math, then go and read <a href="http://www.thebestpageintheuniverse.net/c.cgi?u=math">this</a> first, then come back. If you still think that you suck, then I suggest going to <a href="http://www.khanacademy.org/">Khan Academy</a> and watching some videos. You won&#8217;t regret it!
</td>
</tr>
</table>
<p>I was reading through the <a href="http://code.google.com/p/cubemapgen/">AMD CubeMapGen source</a> last week and came across the code for calculating the solid angle of a cube map texel. This code piqued my interest, since it seemed very terse for what I thought would be a horrific calculation.</p>
<pre class="brush: cpp; title: ; notranslate">
static float32 AreaElement( float32 x, float32 y )
{
	return atan2(x * y, sqrt(x * x + y * y + 1));
}

float32 TexelCoordSolidAngle(int32 a_FaceIdx, float32 a_U, float32 a_V, int32 a_Size)
{
	//scale up to [-1, 1] range (inclusive), offset by 0.5 to point to texel center.
	float32 U = (2.0f * ((float32)a_U + 0.5f) / (float32)a_Size ) - 1.0f;
	float32 V = (2.0f * ((float32)a_V + 0.5f) / (float32)a_Size ) - 1.0f;

	float32 InvResolution = 1.0f / a_Size;

	// U and V are the -1..1 texture coordinate on the current face.
	// Get projected area for this texel
	float32 x0 = U - InvResolution;
	float32 y0 = V - InvResolution;
	float32 x1 = U + InvResolution;
	float32 y1 = V + InvResolution;
	float32 SolidAngle = AreaElement(x0, y0) - AreaElement(x0, y1) - AreaElement(x1, y0) + AreaElement(x1, y1);

	return SolidAngle;
}
</pre>
<p>The source code for this particular part is well documented, and points you towards <a href="http://www.fizzmoll11.com/thesis/thesis.pdf">this thesis</a> by Manne &Ouml;hrstr&ouml;m (<a href="https://twitter.com/#!/manneohrstrom">@manneohrstrom</a>) where he gives a high level overview of the derivation. I was interested in finding out some more of the details, so I had a go myself, and this post is the result.</p>
<h2>Why is it Useful?</h2>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/FilteredEnvMap.png"><img src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/FilteredEnvMap.png" alt="" title="FilteredEnvMap" width="196" height="196" class="alignright size-full wp-image-979" /></a></p>
<p>When processing cube maps (for example, generating a diffuse irradiance map or a spherical harmonic approximation), you need to be able to integrate the texel values over the sphere.</p>
<p>One way of approximating this integral is to use a Monte Carlo estimator. This is a statistical technique that may oversample some texels and undersample others, which seems a bit wasteful considering that we have a finite number of input values. Ideally, we&#8217;d like to use each texel value exactly once.</p>
<p>A naive approach to analytical integration where each texel has the same weight would result in overly bright values in the corner areas. This is because the texels in the corners project to smaller and smaller areas on the sphere. The correct approach is to factor in the solid angle during the integral, and this is what CubeMapGen does.</p>
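<p>As a sketch of what factoring in the solid angle looks like, the code below reuses the CubeMapGen formula to weight every texel of a face. The helper names and the row-major value layout are my own, and a real irradiance or spherical harmonic projection would also multiply each value by the relevant basis function. A handy sanity check is that the weights over one face add up to one sixth of the sphere, 4&pi;/6 = 2&pi;/3 steradians, because the four-corner sums telescope.</p>

```cpp
#include <cassert>
#include <cmath>

// Same expression as AreaElement in the CubeMapGen snippet: the signed solid
// angle of the face region between (0, 0) and (x, y).
static float AreaElement(float x, float y)
{
    return std::atan2(x * y, std::sqrt(x * x + y * y + 1.0f));
}

// Solid angle of texel (u, v) on a face that is size x size texels.
float TexelSolidAngle(int u, int v, int size)
{
    assert(size > 0);
    float invRes = 1.0f / size;
    float U = 2.0f * (u + 0.5f) / size - 1.0f;  // texel centre in [-1, 1]
    float V = 2.0f * (v + 0.5f) / size - 1.0f;
    float x0 = U - invRes, y0 = V - invRes;
    float x1 = U + invRes, y1 = V + invRes;
    return AreaElement(x0, y0) - AreaElement(x0, y1) - AreaElement(x1, y0) + AreaElement(x1, y1);
}

// Solid-angle-weighted average of one face (hypothetical helper, values stored
// row-major). Equal weights would give the corner texels too much influence.
double WeightedFaceAverage(const float* values, int size)
{
    double weightedSum = 0.0, totalAngle = 0.0;
    for (int v = 0; v < size; ++v)
    {
        for (int u = 0; u < size; ++u)
        {
            double w = TexelSolidAngle(u, v, size);
            weightedSum += w * values[v * size + u];
            totalAngle += w;
        }
    }
    return weightedSum / totalAngle;
}
```

<p>Because a corner texel covers less solid angle than a centre texel, it contributes less to the weighted average, which is exactly the correction the equal-weight sum is missing.</p>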
<h2>The Plan</h2>
<p>Imagine a single cube map face placed at (0,0,1) and scaled such that the texel locations are all in [-1,1]. For any texel in this cube map, we want to project it onto a unit sphere sitting at the origin, then work out its area on the sphere. Because the sphere has unit radius, this area is equal to the texel&#8217;s solid angle in steradians.</p>
<p>We can repeat this same calculation for any of the other cube map faces by first transforming them into the same range. </p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/TexelProjection.png"><img src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/TexelProjection.png" alt="" title="TexelProjection" width="400" height="410" class="aligncenter size-full wp-image-870" /></a></p>
<p>This is the high-level game plan for calculating the solid angle:</p>
<ol>
<li>Determine a formula for projecting a position from texture-space onto the sphere.</li>
<li>Work out how this projected position changes as the texture-space coordinates change in x and y.</li>
<li>Treating these position change vectors as two sides of a microscopic quadrilateral, calculate the area of this quad using the magnitude of their cross product.</li>
<li>Integrate this differential area between the corner coordinates of a texel to get the texel&#8217;s area on the sphere, which is its solid angle.</li>
</ol>
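<p>The plan can also be followed numerically, which is a useful way to check the algebra that follows. This sketch (my own, not from CubeMapGen) projects points by normalization, estimates the two partial derivatives with central differences, and sums the cross-product magnitude over a midpoint grid covering the texel. The result should agree with the closed-form four-corner expression in the CubeMapGen snippet.</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 Scale(Vec3 v, double s) { return {v.x * s, v.y * s, v.z * s}; }
static Vec3 Cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Step 1: project the face point (x, y, 1) onto the unit sphere.
static Vec3 Project(double x, double y)
{
    double invLen = 1.0 / std::sqrt(x * x + y * y + 1.0);
    return {x * invLen, y * invLen, invLen};
}

// Steps 2-4, numerically, over the rectangle [x0, x1] x [y0, y1]: central
// differences for dp/dx and dp/dy, cross-product magnitude as the differential
// area, summed over an n x n midpoint grid.
double ProjectedAreaNumeric(double x0, double y0, double x1, double y1, int n)
{
    assert(n > 0);
    const double h = 1e-5;  // finite-difference step (arbitrary choice)
    const double dx = (x1 - x0) / n, dy = (y1 - y0) / n;
    double area = 0.0;
    for (int j = 0; j < n; ++j)
    {
        for (int i = 0; i < n; ++i)
        {
            const double x = x0 + (i + 0.5) * dx;
            const double y = y0 + (j + 0.5) * dy;
            Vec3 dpdx = Scale(Sub(Project(x + h, y), Project(x - h, y)), 0.5 / h);
            Vec3 dpdy = Scale(Sub(Project(x, y + h), Project(x, y - h)), 0.5 / h);
            area += Length(Cross(dpdx, dpdy)) * dx * dy;
        }
    }
    return area;
}

// The four-corner closed form from the CubeMapGen snippet, in double precision.
double ProjectedAreaClosedForm(double x0, double y0, double x1, double y1)
{
    auto A = [](double x, double y) { return std::atan2(x * y, std::sqrt(x * x + y * y + 1.0)); };
    return A(x0, y0) - A(x0, y1) - A(x1, y0) + A(x1, y1);
}
```

<p>For example, the corner texel of an 8&times;8 face spans [0.75, 1] in both x and y; the two functions agree on its solid angle to well within 0.01%.</p>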
<h2>The Details</h2>
<p>We start off with the formula for projecting a point from its location on the texture face (x, y, 1) onto the unit sphere. This is just a standard vector normalization.</p>
<p class="ql-center-displayed-equation" style="line-height: 50px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-daa796d4ca2f5ee61899317d00838d61_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#118;&#101;&#99;&#123;&#112;&#125;&#32;&#61;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#120;&#44;&#32;&#121;&#44;&#32;&#49;&#32;&#92;&#101;&#110;&#100;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
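<p>In code, the projection is a single normalization. A minimal sketch (the struct and function names are my own):</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Projects the point (x, y, 1) on the cube face at z = 1 onto the unit sphere
// by dividing it by its length, sqrt(x^2 + y^2 + 1).
Vec3 ProjectToSphere(double x, double y)
{
    const double invLen = 1.0 / std::sqrt(x * x + y * y + 1.0);
    return {x * invLen, y * invLen, invLen};
}
```

<p>The face centre maps to (0, 0, 1), and the corner (1, 1) maps to (1, 1, 1)/&radic;3, so its z component is about 0.577.</p>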
<table width="90%" align="center" cellpadding="10" cellspacing="1" bgcolor="#f7f7f7" border="0">
<tr>
<td>
<b>Note:</b> I&#8217;ll be switching back and forth between negative and fractional exponents as I see fit. This makes things easier. Remember, <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-c9beaf5c04427d6698cf1443a55c3147_l3.png" class="ql-img-inline-formula" alt="&#120;&#94;&#123;&#45;&#110;&#125;&#32;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#120;&#94;&#110;&#125;" title="Rendered by QuickLaTeX.com" style="vertical-align: -6px;"/>, and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-fb697885f9d30c8a2d88d78798d88e16_l3.png" class="ql-img-inline-formula" alt="&#120;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#125;&#125;&#32;&#61;&#32;&#92;&#115;&#113;&#114;&#116;&#123;&#120;&#125;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/>.
</td>
</tr>
</table>
<p>We want to calculate how this projected point changes as the texture-space x and y coordinates change. We can do this separately for each axis using partial derivatives. We&#8217;ll start by calculating how the projected z component changes along the texture-space x axis.</p>
<h3>Projected Z Change According to X</h3>
<p>The z-component of p is simply:</p>
<p class="ql-center-displayed-equation" style="line-height: 73px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-6281591c71b267a0c150f46b50d87b3d_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#112;&#95;&#122;&#32;&#38;&#61;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#125;&#125;&#92;&#92; &#38;&#61;&#32;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>We need to differentiate this equation with respect to x. Because the whole expression is raised to a power, we need to use the chain rule, which is a method for finding the derivative of the composition of two functions. First, we can reformulate the equation slightly to make the two functions clearer:</p>
<p class="ql-center-displayed-equation" style="line-height: 23px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-31f684ea7c54bd03992ea5baf43c43db_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#112;&#95;&#122;&#32;&#61;&#32;&#117;&#94;&#123;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#125;&#125;&#44;&#32;&#117;&#32;&#61;&#32;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>In our case our first function is a function of <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-43fe27dc3e528266a619764d90fce60b_l3.png" class="ql-img-inline-formula" alt="&#117;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and our second function is a function of <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png" class="ql-img-inline-formula" alt="&#120;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-0af556714940c351c933bba8cf840796_l3.png" class="ql-img-inline-formula" alt="&#121;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/>. Given this, the chain rule says:</p>
<p class="ql-center-displayed-equation" style="line-height: 39px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-6f170398dabb2f22172e26279dffd4b3_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#112;&#95;&#122;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#112;&#95;&#122;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#117;&#125;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#117;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>We can apply this rule very easily to our reformulated functions:</p>
<p class="ql-center-displayed-equation" style="line-height: 220px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ae0b2428e50b1171daec7bafca098554_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#112;&#95;&#122;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#117;&#125;&#32;&#38;&#61;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#117;&#94;&#123;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125;&#123;&#50;&#125;&#92;&#92; &#38;&#61;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125; &#92;&#92;&#91;&#49;&#48;&#93; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#117;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#38;&#61;&#50;&#120;&#92;&#92;&#91;&#49;&#48;&#93; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#112;&#95;&#122;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#38;&#61;&#32;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#125;&#123;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
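<p>A cheap way to gain confidence in this result is to compare it with a central finite difference. A minimal sketch; the function names and the step size h are my own choices:</p>

```cpp
#include <cassert>
#include <cmath>

// z component of the projected point: (x^2 + y^2 + 1)^(-1/2).
double Pz(double x, double y) { return 1.0 / std::sqrt(x * x + y * y + 1.0); }

// Analytic derivative from the chain rule: -x / (x^2 + y^2 + 1)^(3/2).
double DPzDx(double x, double y)
{
    const double u = x * x + y * y + 1.0;
    return -x / (u * std::sqrt(u));
}

// Central finite difference, used only to check the analytic result.
double DPzDxNumeric(double x, double y, double h = 1e-6)
{
    return (Pz(x + h, y) - Pz(x - h, y)) / (2.0 * h);
}
```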
<p>This equation tells us exactly how the z component of the projected point changes as the texture-space position moves along the x axis.</p>
<h3>Projected X Change According to X</h3>
<p>Now that we&#8217;ve found the derivative of the projected z component, finding the derivatives of the x and y components is a little easier. Why? Because we can express the x and y components in terms of the z component.</p>
<p class="ql-center-displayed-equation" style="line-height: 65px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-34d9aaef6cd544ab1461e03a84d2ebd5_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#112;&#95;&#120;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#125;&#125;&#92;&#92; &#38;&#61;&#32;&#120;&#32;&#112;&#95;&#122; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>We don&#8217;t have the same &#8216;composition of functions&#8217; setup that we did last time, so we can&#8217;t use the chain rule to differentiate this. Instead, we can use the product rule. The product rule in our case says:</p>
<p class="ql-center-displayed-equation" style="line-height: 40px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-d9af0578f8cbc7aea136da9efb618f6f_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#112;&#95;&#120;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#38;&#61;&#32;&#112;&#95;&#122;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#40;&#120;&#41;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#43;&#32;&#120;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#40;&#112;&#95;&#122;&#41;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>Applying this to the equation for the projected x component:</p>
<p class="ql-center-displayed-equation" style="line-height: 103px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-f9f74cdc622cf680b43663722f4df5f5_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#112;&#95;&#120;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#50;&#125;&#125;&#125;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#94;&#50;&#125;&#123;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125;&#92;&#92; &#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#121;&#94;&#50;&#43;&#49;&#125;&#123;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
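<p>As with the z component, a central finite difference gives a quick check on this product-rule result. Again a sketch, with names and step size of my own choosing:</p>

```cpp
#include <cassert>
#include <cmath>

// x component of the projected point: x (x^2 + y^2 + 1)^(-1/2).
double Px(double x, double y) { return x / std::sqrt(x * x + y * y + 1.0); }

// Product-rule result: (y^2 + 1) / (x^2 + y^2 + 1)^(3/2).
double DPxDx(double x, double y)
{
    const double u = x * x + y * y + 1.0;
    return (y * y + 1.0) / (u * std::sqrt(u));
}

// Central finite difference, used only to check the analytic result.
double DPxDxNumeric(double x, double y, double h = 1e-6)
{
    return (Px(x + h, y) - Px(x - h, y)) / (2.0 * h);
}
```

<p>At the face centre the derivative is exactly 1: near (0, 0, 1) the sphere is locally parallel to the face, so moving along x moves the projected point by the same amount.</p>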
<h3>Projected Y Change According to X</h3>
<p>We have a very similar derivation for the projected y derivative:</p>
<p class="ql-center-displayed-equation" style="line-height: 40px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-41dda9503b32046dad5add6b972d74ad_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#112;&#95;&#121;&#32;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#121;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#125;&#125;&#32;&#61;&#32;&#121;&#32;&#112;&#95;&#122; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>We can use the product rule again:</p>
<p class="ql-center-displayed-equation" style="line-height: 85px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-2c4f3ec120ad14b9203f1fe9cda5fa02_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#112;&#95;&#121;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#38;&#61;&#32;&#112;&#95;&#122;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#40;&#121;&#41;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#43;&#32;&#121;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#40;&#112;&#95;&#122;&#41;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#92;&#92; &#38;&#61;&#32;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#121;&#125;&#123;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
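<p>The same finite-difference check works for the projected y component (a sketch, names are my own):</p>

```cpp
#include <cassert>
#include <cmath>

// y component of the projected point: y (x^2 + y^2 + 1)^(-1/2).
double Py(double x, double y) { return y / std::sqrt(x * x + y * y + 1.0); }

// Product-rule result: -xy / (x^2 + y^2 + 1)^(3/2).
double DPyDx(double x, double y)
{
    const double u = x * x + y * y + 1.0;
    return -x * y / (u * std::sqrt(u));
}

// Central finite difference, used only to check the analytic result.
double DPyDxNumeric(double x, double y, double h = 1e-6)
{
    return (Py(x + h, y) - Py(x - h, y)) / (2.0 * h);
}
```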
<h3>Projected Position Change According to X and Y</h3>
<p>Putting this all together, we have our equation showing how the projected position changes as the texture-space position changes in the x direction. We can use the exact same process to work out how it moves in the y direction.</p>
<p class="ql-center-displayed-equation" style="line-height: 106px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ab827bfce1e25c4fb911aaea12128a65_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#92;&#118;&#101;&#99;&#123;&#112;&#125;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#121;&#94;&#50;&#43;&#49;&#44;&#45;&#120;&#121;&#44;&#32;&#45;&#120;&#92;&#101;&#110;&#100;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#125;&#123;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125;&#92;&#92; &#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#92;&#118;&#101;&#99;&#123;&#112;&#125;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#121;&#125;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#45;&#120;&#121;&#44;&#32;&#120;&#94;&#50;&#43;&#49;&#44;&#45;&#121;&#92;&#101;&#110;&#100;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#125;&#123;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
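<p>Because the projected point always has unit length, both partial derivative vectors must be tangent to the sphere, i.e. perpendicular to the projected point itself. That gives a check on the two vectors that doesn&#8217;t need finite differences. A sketch (names are my own):</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Projected point p = (x, y, 1) / sqrt(x^2 + y^2 + 1).
Vec3 Project(double x, double y)
{
    const double invLen = 1.0 / std::sqrt(x * x + y * y + 1.0);
    return {x * invLen, y * invLen, invLen};
}

// dp/dx = (y^2 + 1, -xy, -x) / (x^2 + y^2 + 1)^(3/2)
Vec3 DpDx(double x, double y)
{
    const double u = x * x + y * y + 1.0;
    const double s = 1.0 / (u * std::sqrt(u));
    return {(y * y + 1.0) * s, -x * y * s, -x * s};
}

// dp/dy = (-xy, x^2 + 1, -y) / (x^2 + y^2 + 1)^(3/2)
Vec3 DpDy(double x, double y)
{
    const double u = x * x + y * y + 1.0;
    const double s = 1.0 / (u * std::sqrt(u));
    return {-x * y * s, (x * x + 1.0) * s, -y * s};
}
```

<p>At the face centre the two vectors reduce to (1, 0, 0) and (0, 1, 0), the face&#8217;s own axes.</p>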
<h3>Differential Area</h3>
<p>The next step is to calculate the differential (microscopic) area swept out by the projected point, using the partial derivatives we just calculated. At a normal scale, we couldn&#8217;t take the cross product of two vectors on a sphere and expect its magnitude to be the area they define on the sphere. At this differential scale, however, we can treat the surface as if it were flat, so this works.</p>
<p>The first thing we need to do is to calculate the cross product of the partial derivatives.</p>
<p class="ql-center-displayed-equation" style="line-height: 96px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-827bec2eba8abd8e22b037b4205ed4ed_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#118;&#101;&#99;&#123;&#114;&#125;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#92;&#118;&#101;&#99;&#123;&#112;&#125;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#120;&#125;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#92;&#118;&#101;&#99;&#123;&#112;&#125;&#125;&#123;&#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#121;&#125;&#92;&#92; &#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#121;&#94;&#50;&#43;&#49;&#44;&#45;&#120;&#121;&#44;&#32;&#45;&#120;&#92;&#101;&#110;&#100;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#45;&#120;&#121;&#44;&#32;&#120;&#94;&#50;&#43;&#49;&#44;&#45;&#121;&#92;&#101;&#110;&#100;&#123;&#112;&#109;&#97;&#116;&#114;&#105;&#120;&#125;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>Calculating each component of the cross product is relatively straightforward.</p>
<p class="ql-center-displayed-equation" style="line-height: 144px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-786b1317cba9bc442a11a75f1ab4b54f_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#114;&#95;&#120;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#94;&#51;&#43;&#120;&#121;&#94;&#50;&#43;&#120;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#51;&#125;&#92;&#92; &#114;&#95;&#121;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#94;&#50;&#121;&#43;&#121;&#94;&#51;&#43;&#121;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#51;&#125;&#92;&#92; &#114;&#95;&#122;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#121;&#94;&#50;&#43;&#49;&#41;&#40;&#120;&#94;&#50;&#43;&#49;&#41;&#45;&#120;&#94;&#50;&#121;&#94;&#50;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#51;&#125;&#92;&#92; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>If you&#8217;re reading carefully, you&#8217;ll notice that each component has a factor of <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-8d9216d24c360332801b9b01ede93033_l3.png" class="ql-img-inline-formula" alt="&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> on the top and the bottom, so we can divide through. Combining all the components back together again, we arrive at the final equation for the perpendicular vector.</p>
<p class="ql-center-displayed-equation" style="line-height: 43px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-f822cb9bc3c3b3b3b6ae38d63f92d219_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#118;&#101;&#99;&#123;&#114;&#125;&#32;&#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#120;&#44;&#121;&#44;&#49;&#41;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#50;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>Now we simply need to take the length of the result of the cross product to find the differential area on the sphere.</p>
<p class="ql-center-displayed-equation" style="line-height: 151px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-d01ca9cd76b91c518393ab4a9ad617bf_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#112;&#97;&#114;&#116;&#105;&#97;&#108;&#32;&#65;&#32;&#38;&#61;&#32;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#118;&#101;&#99;&#123;&#114;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#118;&#101;&#99;&#123;&#114;&#125;&#125;&#92;&#92;&#91;&#53;&#93; &#38;&#61;&#32;&#92;&#115;&#113;&#114;&#116;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#52;&#125;&#125;&#92;&#92;&#91;&#53;&#93; &#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
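<p>You can also check the algebra above numerically with the small C++ sketch below. It is not from the original post, and all names in it are made up for illustration. As the partial derivatives above imply, it assumes that a face point (x, y) maps to (x, y, 1) normalized onto the unit sphere. It then compares the closed-form differential area against the length of the cross product of central-difference partial derivatives.</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 Cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

double Length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Assumed mapping: face point (x, y, 1) normalized onto the unit sphere.
Vec3 Project(double x, double y) {
    const double inv = 1.0 / std::sqrt(x * x + y * y + 1.0);
    return {x * inv, y * inv, inv};
}

// Closed-form differential area derived above: 1 / (x^2 + y^2 + 1)^(3/2).
double DifferentialArea(double x, double y) {
    return std::pow(x * x + y * y + 1.0, -1.5);
}

// Numerical estimate: central differences for dp/dx and dp/dy, then the
// length of their cross product.
double DifferentialAreaNumeric(double x, double y, double h = 1e-5) {
    assert(h > 0.0);
    const Vec3 px = Project(x + h, y), mx = Project(x - h, y);
    const Vec3 py = Project(x, y + h), my = Project(x, y - h);
    const Vec3 dpdx = {(px.x - mx.x) / (2.0 * h), (px.y - mx.y) / (2.0 * h), (px.z - mx.z) / (2.0 * h)};
    const Vec3 dpdy = {(py.x - my.x) / (2.0 * h), (py.y - my.y) / (2.0 * h), (py.z - my.z) / (2.0 * h)};
    return Length(Cross(dpdx, dpdy));
}
```

<p>At the center of the face both functions should give 1. Toward the corners the differential area falls off, which is why corner texels cover less of the sphere.</p>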
<h3>Solid Angle</h3>
<p>The final step is to integrate the differential area over our range of texture-space values to get the solid angle of the texel. We can start by calculating the integral between <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-9cf2000c782cfe94be6df5f499cd3e24_l3.png" class="ql-img-inline-formula" alt="&#40;&#48;&#44;&#48;&#41;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> and some point <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-29dc6bb5c0d77f139941a7a167e1b164_l3.png" class="ql-img-inline-formula" alt="&#40;&#115;&#44;&#116;&#41;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> on the cube map face.</p>
<p class="ql-center-displayed-equation" style="line-height: 95px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-de12d8d6b7a16ee514769d04570a8156_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#102;&#40;&#115;&#44;&#116;&#41;&#38;&#61;&#92;&#105;&#110;&#116;&#95;&#123;&#121;&#61;&#48;&#125;&#94;&#116;&#92;&#105;&#110;&#116;&#95;&#123;&#120;&#61;&#48;&#125;&#94;&#115;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#40;&#120;&#94;&#50;&#43;&#121;&#94;&#50;&#43;&#49;&#41;&#94;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#51;&#125;&#123;&#50;&#125;&#125;&#125;&#32;&#92;&#44;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#100;&#125;&#120;&#92;&#44;&#32;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#100;&#125;&#121;&#92;&#92; &#38;&#61;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#116;&#97;&#110;&#125;&#94;&#123;&#45;&#49;&#125;&#92;&#102;&#114;&#97;&#99;&#123;&#115;&#116;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#115;&#94;&#50;&#43;&#116;&#94;&#50;&#43;&#49;&#125;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>From this formula, we can calculate the solid angle of any texel in the cube map face by evaluating f at the texel&#8217;s four corners: add the values at the two right-diagonal corners, A and C, and subtract the values at the two left-diagonal corners, B and D.</p>
<p class="ql-center-displayed-equation" style="line-height: 18px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ab67ae51f09ddb951225dd4437469c3e_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#83;&#61;&#102;&#40;&#65;&#41;&#32;&#45;&#32;&#102;&#40;&#66;&#41;&#32;&#43;&#32;&#102;&#40;&#67;&#41;&#32;&#45;&#32;&#102;&#40;&#68;&#41; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>You can see on the image below that the added areas in green are canceled out by the subtracted areas in red.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/Area.png"><img src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/Area.png" alt="" title="Area" width="400" height="417" class="aligncenter size-full wp-image-893" /></a></p>
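<p>Below is a minimal C++ sketch of this corner-based calculation. It is my own illustration, not CubeMapGen&#8217;s actual code, and both the function names and the texel layout are assumptions. The layout assumes texture-space coordinates in [-1,1], with texel i spanning 2i/size - 1 to 2(i+1)/size - 1.</p>

```cpp
#include <cassert>
#include <cmath>

// f(s, t) from the derivation: solid angle of the face region between (0, 0)
// and (s, t). atan2 matches atan(st / sqrt(...)) here because the second
// argument is always positive.
double AreaElement(double s, double t) {
    return std::atan2(s * t, std::sqrt(s * s + t * t + 1.0));
}

// Solid angle of texel (i, j) on a face that is `size` texels wide.
// Assumed layout: texture-space coordinates in [-1, 1], with texel i covering
// [2i/size - 1, 2(i+1)/size - 1], and likewise for j.
double TexelSolidAngle(int i, int j, int size) {
    assert(size > 0 && i >= 0 && i < size && j >= 0 && j < size);
    const double x0 = 2.0 * i / size - 1.0, x1 = 2.0 * (i + 1) / size - 1.0;
    const double y0 = 2.0 * j / size - 1.0, y1 = 2.0 * (j + 1) / size - 1.0;
    // Inclusion-exclusion over the corners, as in S above:
    // add one diagonal pair, subtract the other.
    return AreaElement(x1, y1) - AreaElement(x0, y1) + AreaElement(x0, y0) - AreaElement(x1, y0);
}
```

<p>Neighbouring texels share corners, so summing every texel on a face telescopes back to f evaluated at the four face corners. That sum should come to 2π/3, one sixth of the sphere.</p>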
<p>That should look familiar, since that&#8217;s exactly what the CubeMapGen code does. If you look at the source code around TexelCoordSolidAngle, you&#8217;ll notice that it mentions another method for calculating the solid angle of a texel. This method is based on Girard&#8217;s theorem, which gives the area of a spherical triangle on the unit sphere as its spherical excess: the amount by which the sum of its interior angles exceeds π. This method was also suggested to me on Twitter by Ignacio Castaño (<a href="https://twitter.com/#!/castano">@castano</a>). I haven&#8217;t actually tried it, but it looks fascinating!</p>
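<p>For comparison, here is a rough C++ sketch of the Girard approach. It is not CubeMapGen&#8217;s version, and the names are hypothetical. A straight edge on the face plane projects to a great-circle arc, because the edge and the center of the sphere define a plane, so each texel becomes a spherical quadrilateral. Split that quadrilateral into two triangles and apply Girard&#8217;s theorem to each: the solid angle is the sum of the four interior angles minus 2π.</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 Scale(Vec3 v, double s) { return {v.x * s, v.y * s, v.z * s}; }

// Face point (x, y, 1) normalized onto the unit sphere.
Vec3 ToSphere(double x, double y) {
    const double inv = 1.0 / std::sqrt(x * x + y * y + 1.0);
    return {x * inv, y * inv, inv};
}

// Interior angle at unit vector b between the great-circle arcs towards a
// and c, measured between the tangent directions of those arcs at b.
double InteriorAngle(Vec3 a, Vec3 b, Vec3 c) {
    const Vec3 ta = Sub(a, Scale(b, Dot(a, b)));
    const Vec3 tc = Sub(c, Scale(b, Dot(c, b)));
    const double cosAngle = Dot(ta, tc) / std::sqrt(Dot(ta, ta) * Dot(tc, tc));
    return std::acos(std::fmax(-1.0, std::fmin(1.0, cosAngle)));
}

// Solid angle of the face rectangle [x0, x1] x [y0, y1] as the spherical
// excess of a quadrilateral: sum of interior angles minus 2 pi.
double RectSolidAngleGirard(double x0, double y0, double x1, double y1) {
    assert(x0 < x1 && y0 < y1);
    const double kPi = std::acos(-1.0);
    const Vec3 v[4] = {ToSphere(x0, y0), ToSphere(x1, y0), ToSphere(x1, y1), ToSphere(x0, y1)};
    double angleSum = 0.0;
    for (int k = 0; k < 4; ++k)
        angleSum += InteriorAngle(v[(k + 3) % 4], v[k], v[(k + 1) % 4]);
    return angleSum - 2.0 * kPi;
}
```

<p>For very small texels the angle sum is close to 2π, so the final subtraction loses floating-point precision. The closed-form corner evaluation does not have that problem.</p>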
<h2>Is it Correct?</h2>
<p>It&#8217;s always a bit daunting to get to the end of a derivation like this without knowing whether the answer is correct. In this case, though, it&#8217;s easy to verify the result.</p>
<p>Remember that the texture-space coordinates are in the range [-1,1]. If we set both of our <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-76f553ea5055b27082c28955d9ece578_l3.png" class="ql-img-inline-formula" alt="&#40;&#120;&#44;&#121;&#41;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> values to 1, the integral runs from the center of the face to its corner, which covers the top-right quarter of the cube map face. We know that there are <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-a64f86508ea52835b7fd42736282275d_l3.png" class="ql-img-inline-formula" alt="&#52;&#92;&#112;&#105;" title="Rendered by QuickLaTeX.com" style="vertical-align: -1px;"/> steradians in a sphere, shared equally between the six faces of the cube, so each face gets <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-0d5f0508005e85f68bfe0ed48e803e40_l3.png" class="ql-img-inline-formula" alt="&#92;&#102;&#114;&#97;&#99;&#123;&#50;&#92;&#112;&#105;&#125;&#123;&#51;&#125;" title="Rendered by QuickLaTeX.com" style="vertical-align: -6px;"/> steradians. Since we&#8217;re only calculating for a quarter of a face, we expect our result to be <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-38b9dcafd63e963896575ed3fca15de1_l3.png" class="ql-img-inline-formula" alt="&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#105;&#125;&#123;&#54;&#125;" title="Rendered by QuickLaTeX.com" style="vertical-align: -6px;"/>.</p>
<p class="ql-center-displayed-equation" style="line-height: 78px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-68598b2241da90f392546ab98484ea91_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#102;&#40;&#49;&#44;&#49;&#41;&#38;&#61;&#92;&#109;&#97;&#116;&#104;&#114;&#109;&#123;&#116;&#97;&#110;&#125;&#94;&#123;&#45;&#49;&#125;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#51;&#125;&#125;&#92;&#92; &#38;&#61;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#112;&#105;&#125;&#123;&#54;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>And it does. Thanks to the various people on twitter (<a href="https://twitter.com/#!/SebLagarde">@SebLagarde</a>, <a href="https://twitter.com/#!/mattpharr">@mattpharr</a>, <a href="https://twitter.com/#!/manneohrstrom">@manneohrstrom</a>, <a href="https://twitter.com/#!/castano">@castano</a> and <a href="https://twitter.com/#!/ChristerEricson">@ChristerEricson</a>) for engaging me in conversation over this.</p>
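<p>For an extra check that does not depend on the closed-form integral, you can integrate the differential area by brute force with the midpoint rule. Over [0,1] x [0,1], the result should approach π/6 as the grid is refined. The sketch below is a minimal C++ version, and its function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>

// Midpoint-rule estimate of the integral of 1 / (x^2 + y^2 + 1)^(3/2)
// over [0, s] x [0, t], using an n x n grid of cells.
double IntegrateDifferentialArea(double s, double t, int n) {
    assert(n > 0);
    const double dx = s / n, dy = t / n;
    double sum = 0.0;
    for (int j = 0; j < n; ++j) {
        const double y = (j + 0.5) * dy;
        for (int i = 0; i < n; ++i) {
            const double x = (i + 0.5) * dx;
            sum += std::pow(x * x + y * y + 1.0, -1.5);
        }
    }
    return sum * dx * dy;
}

// Closed form f(s, t) from the derivation.
double ClosedForm(double s, double t) {
    return std::atan(s * t / std::sqrt(s * s + t * t + 1.0));
}
```

<p>The same comparison works for any (s, t), not just the quarter-face case.</p>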
<p>Please let me know in the comments if you spot an error in this post, or if anything needs to be explained better or more easily.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2012/01/15/cubemap-texel-solid-angle/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Derivative Maps</title>
		<link>http://www.rorydriscoll.com/2012/01/11/derivative-maps/</link>
		<comments>http://www.rorydriscoll.com/2012/01/11/derivative-maps/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 16:17:26 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=609</guid>
		<description><![CDATA[I recently came across an interesting paper, Bump Mapping Unparametrized Surfaces on the GPU by Morten Mikkelsen of Naughty Dog. This paper describes an alternative method to normal mapping, closely related to bump mapping. The alluring prospect of this technique is that it doesn’t require that a tangent space be defined. Mikkelsen is apparently well-versed [...]]]></description>
			<content:encoded><![CDATA[<p>I recently came across an interesting paper, <a href="http://jbit.net/~sparky/sfgrad_bump/mm_sfgrad_bump.pdf">Bump Mapping Unparametrized Surfaces on the GPU</a> by <a href="http://mmikkelsen3d.blogspot.com/">Morten Mikkelsen</a> of Naughty Dog. This paper describes an alternative method to normal mapping, closely related to bump mapping. The alluring prospect of this technique is that it doesn’t require that a tangent space be defined.</p>
<p>Mikkelsen is apparently well-versed in academic obfuscation (tsk!), so the paper itself can be a little hard to read. If you&#8217;re interested in reading it, then I would recommend first reading Jim Blinn’s <a href="http://research.microsoft.com/pubs/73939/p286-blinn.pdf">original bump mapping paper</a> to understand some of the derivations.</p>
<h2>But Wait! What’s Wrong with Normal Maps?</h2>
<p>Nothing really. But if something comes along that can improve quality, performance or memory consumption, then it&#8217;s worth taking a look.</p>
<h2>A Quick Detour into Gradients</h2>
<p>Given a scalar height field (i.e. a two-dimensional array of scalar values), the gradient of that field is a 2D vector field where each vector points in the direction of greatest increase. The length of each vector is the rate of change in that direction.</p>
<p>The contour map below represents the scalar field generated from the function <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-1915089e5a5c29010638af5758da1d42_l3.png" class="ql-img-inline-formula" alt="&#102;&#40;&#120;&#44;&#121;&#41;&#32;&#61;&#32;&#49;&#32;&#45;&#32;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#41;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/>. The vector field shows the gradient of that scalar field. Note how each vector points towards the center, and how the vectors in the center are smaller due to the lower rate of change.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ScalarField.png"><img class="alignright width=300 height=300 wp-image-706" title="ScalarField" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ScalarField.png" alt="" /></a><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/GradientOfScalarField.png"><img class="alignright width=300 height=300 wp-image-705" title="GradientOfScalarField" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/GradientOfScalarField.png" alt="" /></a></p>
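<p>To make the definition concrete, the C++ sketch below samples f(x,y) = 1 - (x^2 + y^2) on a grid and estimates its gradient with central differences. Central differences are a common discrete approximation and my own choice here, not something the paper prescribes. The names are hypothetical. The analytic gradient is (-2x, -2y): it points towards the center and shrinks to zero there, matching the plot.</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A scalar height field stored as an n x n grid covering [-1, 1] x [-1, 1].
struct HeightField {
    int n;
    double spacing;          // distance between neighbouring samples
    std::vector<double> h;   // row-major: h[j * n + i]
    double X(int i) const { return -1.0 + i * spacing; }
    double Y(int j) const { return -1.0 + j * spacing; }
    double At(int i, int j) const { return h[j * n + i]; }
};

struct Vec2 { double x, y; };

// Samples f(x, y) = 1 - (x^2 + y^2), the field shown in the contour map.
HeightField SampleBowl(int n) {
    assert(n >= 3);
    HeightField f{n, 2.0 / (n - 1), std::vector<double>(n * n)};
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            f.h[j * n + i] = 1.0 - (f.X(i) * f.X(i) + f.Y(j) * f.Y(j));
    return f;
}

// Central-difference gradient at an interior sample.
Vec2 Gradient(const HeightField& f, int i, int j) {
    assert(i > 0 && j > 0 && i < f.n - 1 && j < f.n - 1);
    return {(f.At(i + 1, j) - f.At(i - 1, j)) / (2.0 * f.spacing),
            (f.At(i, j + 1) - f.At(i, j - 1)) / (2.0 * f.spacing)};
}
```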
<h2>Derivative Maps</h2>
<p>The main premise of the paper is that we can project the gradient of the height field onto an underlying surface and use it to skew the surface normal to approximate the normal of the height-map surface. We can do all of this without requiring tangent vectors.</p>
<p>As with the original bump-mapping technique, the result isn&#8217;t exact, because some terms with relatively small influence are dropped, but it&#8217;s close.</p>
<p>There are really only two important formulae to consider from the paper. The first shows how to perturb the surface normal using the <em>surface gradient</em>. Don&#8217;t confuse the surface gradient with the gradient of the height field mentioned above! As you&#8217;ll see shortly, they&#8217;re different.</p>
<p class="ql-center-displayed-equation" style="line-height: 20px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-da2c8c0e98f692b80aabe646b5371512_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#123;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#125;&#39;&#61;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#45;&#92;&#110;&#97;&#98;&#108;&#97;&#95;&#115;&#92;&#98;&#101;&#116;&#97; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>Here, <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-4819a3c333fc44a343764b1342001d8f_l3.png" class="ql-img-inline-formula" alt="&#123;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#125;&#39;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> represents the perturbed normal, <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-c7e9798e3ddea5e522bf0c1f7f53e13c_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#97;&#114;&#123;&#110;&#125;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> is the underlying surface normal, and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-3ddb240a7d1306cc2890a8e18e325135_l3.png" class="ql-img-inline-formula" alt="&#92;&#110;&#97;&#98;&#108;&#97;&#95;&#115;&#92;&#98;&#101;&#116;&#97;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> is the surface gradient. So basically, this says that the perturbed normal is the surface normal offset in the negative surface gradient direction.</p>
<p>So how do we calculate the surface gradient from the height field gradient? Well, there&#8217;s some fun math in there which I don&#8217;t want to repeat, but if you&#8217;re interested, I would recommend reading Blinn&#8217;s paper first, then Mikkelsen&#8217;s paper. You eventually arrive at:</p>
<p class="ql-center-displayed-equation" style="line-height: 43px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-c91a24911f75d5b28639848dc2f26253_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#110;&#97;&#98;&#108;&#97;&#95;&#115;&#92;&#98;&#101;&#116;&#97;&#61;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#40;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#116;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#32;&#41;&#92;&#98;&#101;&#116;&#97;&#95;&#115;&#32;&#43;&#32;&#40;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#115;&#41;&#92;&#98;&#101;&#116;&#97;&#95;&#116;&#125;&#123;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#40;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#115;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#116;&#41;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>In addition to the symbols defined previously, <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-7edd0cc0c18eef3f61257c1cc800b3c9_l3.png" class="ql-img-inline-formula" alt="&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: -3px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-3b8516ca4e18b5434a5638717a0c3f45_l3.png" class="ql-img-inline-formula" alt="&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: -3px;"/> are the partial derivatives of the surface position, and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-11be5102f7ef033283dc776a65d7767d_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#101;&#116;&#97;&#95;&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-87c3e16c80862454881cd3a4832b53fb_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#101;&#116;&#97;&#95;&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> are the partial derivatives of the height field. The derivative directions <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula" alt="&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b4e3cbf5d4c5c6d9b702dd139f14c147_l3.png" class="ql-img-inline-formula" alt="&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> are not explictly defined here.</p>
<p>It&#8217;s easiest to think of this as the projection of the 2D gradient onto the 3D surface along the normal. Intuitively, the height derivative along s pushes the surface gradient out along a vector orthogonal to the t/n plane, and the derivative along t pushes it out along a vector orthogonal to the s/n plane, each by however much the height gradient specifies. The denominator term scales the result up when <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula" alt="&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b4e3cbf5d4c5c6d9b702dd139f14c147_l3.png" class="ql-img-inline-formula" alt="&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> are not orthogonal, and corrects its sign when their orientation is flipped.</p>
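<p>A direct C++ transcription of the two formulae is handy for checking signs on the CPU. This is a sketch, and the vector helpers are my own. Take a flat surface in the xy-plane with height h = a x + b y. The perturbed normal should come out as the normalized (-a, -b, 1), whether s and t are orthogonal, non-orthogonal or flipped, as long as the height derivatives are measured along the same directions as the position derivatives.</p>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 Scale(Vec3 v, double s) { return {v.x * s, v.y * s, v.z * s}; }
double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 Normalize(Vec3 v) { return Scale(v, 1.0 / std::sqrt(Dot(v, v))); }

// Surface gradient: ((sigma_t x n) beta_s + (n x sigma_s) beta_t) / (n . (sigma_s x sigma_t))
Vec3 SurfaceGradient(Vec3 n, Vec3 sigmaS, Vec3 sigmaT, double betaS, double betaT) {
    const double det = Dot(n, Cross(sigmaS, sigmaT));
    assert(det != 0.0);  // s and t must not be parallel
    const Vec3 r1 = Cross(sigmaT, n);
    const Vec3 r2 = Cross(n, sigmaS);
    return Scale(Add(Scale(r1, betaS), Scale(r2, betaT)), 1.0 / det);
}

// Perturbed normal: n' = normalize(n - surface gradient).
Vec3 PerturbNormal(Vec3 n, Vec3 sigmaS, Vec3 sigmaT, double betaS, double betaT) {
    return Normalize(Sub(n, SurfaceGradient(n, sigmaS, sigmaT, betaS, betaT)));
}
```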
<h2>Implementation</h2>
<p>Implementing this technique is fairly straightforward once you realise the meaning of some of the variables. Since we&#8217;re free to choose the partial derivative directions <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula" alt="&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b4e3cbf5d4c5c6d9b702dd139f14c147_l3.png" class="ql-img-inline-formula" alt="&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/>, it&#8217;s convenient for the shader to use screen-space x and y. The value <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-1c9cc40f96a1492e298e7da85a2c1692_l3.png" class="ql-img-inline-formula" alt="&#92;&#115;&#105;&#103;&#109;&#97;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> is the position, and the value <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b6a7605b1bcca8f1b416eaf733f34e08_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#101;&#116;&#97;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> is the height field sample.</p>
<pre class="brush: cpp; title: ; notranslate">
// Project the screen-space height gradient (dhdx, dhdy) onto the surface (n, dpdx, dpdy)
float3 CalculateSurfaceGradient(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
{
	float3 r1 = cross(dpdy, n);
	float3 r2 = cross(n, dpdx);

	return (r1 * dhdx + r2 * dhdy) / dot(dpdx, r1);
}

// Move the normal away from the surface normal in the opposite surface gradient direction
float3 PerturbNormal(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
{
	return normalize(n - CalculateSurfaceGradient(n, dpdx, dpdy, dhdx, dhdy));
}
</pre>
<p>So far, so good. Next we need to work out how to calculate the partial derivatives. The reason why we chose screen-space x and y to be our partial derivative directions is so that we can use the ddx and ddy shader instructions to generate the partial derivatives of both the position and the height.</p>
<p>Given a position and normal in the same coordinate space, and a height map sample, calculating the final normal is straightforward:</p>
<pre class="brush: cpp; title: ; notranslate">
// Calculate the surface normal using screen-space partial derivatives of the height field
float3 CalculateSurfaceNormal(float3 position, float3 normal, float height)
{
	float3 dpdx = ddx(position);
	float3 dpdy = ddy(position);

	float dhdx = ddx(height);
	float dhdy = ddy(height);

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
</pre>
<p>Note that in shader model 5.0, you can use ddx_fine/ddy_fine instead of ddx/ddy to get high-precision partial derivatives.</p>
<p>So how does this look? At a medium distance, I would say that it looks pretty good:</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapFar.png"><img class="aligncenter size-full wp-image-700" title="HeightMapFar" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapFar.png" alt="" /></a></p>
<p>But what about up close?</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapNear.png"><img class="aligncenter size-full wp-image-701" title="HeightMapNear" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapNear.png" alt="" /></a></p>
<p>Uh oh! What’s happening here? Well, there are a couple of problems&#8230;</p>
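<p>The main problem is that the height texture is using bilinear filtering, so the height is interpolated linearly between neighbouring texels. Its derivative along each axis is therefore constant between them and changes abruptly at each texel boundary. Up close, this makes the texel grid show up as large, very obvious blocks. There are a couple of options for alleviating this somewhat.</p>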
<p>One option is to use bicubic filtering. I haven&#8217;t tried it, but I would expect this to make a good difference. The problem is that it will incur an extra cost. Another option, suggested in the paper, is to add a detail bump texture on top. This helps quite a lot, but again it adds more cost.</p>
<p>In the image below I&#8217;ve just tiled the same texture at 10x frequency over the top. It would be better to apply some kind of noise function as in the original paper. </p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapWithDetailNear.png"><img class="aligncenter size-full wp-image-702" title="HeightMapWithDetailNear" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapWithDetailNear.png" alt="" /></a></p>
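<p>A one-dimensional C++ sketch shows why the bilinear case produces blocks and why a cubic filter should help. Along a single row, bilinear filtering reduces to linear interpolation, whose derivative is constant across a texel and jumps at the texel boundary. A Catmull-Rom cubic has a continuous first derivative. It is one common choice for bicubic filtering, but that choice is mine here and other cubic kernels exist. The sketch assumes texel k sits at coordinate k with clamp-to-edge addressing, and the names are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Texel k of a single row sits at coordinate k (texel units); clamp-to-edge addressing.
double Texel(const std::vector<double>& h, int k) {
    assert(!h.empty());
    k = std::max(0, std::min(k, static_cast<int>(h.size()) - 1));
    return h[k];
}

// Derivative of the linearly interpolated height at x: constant on [k, k + 1),
// so it can jump whenever x crosses a texel boundary.
double LinearDerivative(const std::vector<double>& h, double x) {
    const int k = static_cast<int>(std::floor(x));
    return Texel(h, k + 1) - Texel(h, k);
}

// Derivative of a Catmull-Rom cubic through the texels around x; the cubic is
// C1, so this stays continuous across texel boundaries.
double CatmullRomDerivative(const std::vector<double>& h, double x) {
    const int k = static_cast<int>(std::floor(x));
    const double t = x - k;
    const double p0 = Texel(h, k - 1), p1 = Texel(h, k), p2 = Texel(h, k + 1), p3 = Texel(h, k + 2);
    return 0.5 * ((p2 - p0)
                + 2.0 * (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t
                + 3.0 * (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t);
}
```

<p>The cubic reads four texels instead of two per axis, which is where the extra cost mentioned above comes from.</p>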
<p>The second problem is more subtle. We&#8217;re getting some small block artifacts because of the way that the ddx and ddy shader instructions work. They take pairs of pixels in a pixel quad and subtract the relevant values to get the derivative. In the case of the height derivatives, we can alleviate this by performing the differencing ourselves with extra texture samples.</p>
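<p>A simplified 1D C++ model illustrates the difference. It is an illustration only: real hardware details vary, and the names are hypothetical. With pairwise differencing, both pixels of a pair receive the same derivative. Taking one extra height sample per pixel instead gives each pixel its own forward difference. A plain function stands in for the texture fetch.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Pairwise model: pixels are grouped in pairs (the two columns of a 2x2 quad)
// and both pixels of a pair receive the same difference.
std::vector<double> PairwiseDdx(const std::vector<double>& values) {
    assert(values.size() % 2 == 0);
    std::vector<double> d(values.size());
    for (std::size_t i = 0; i < values.size(); i += 2) {
        const double diff = values[i + 1] - values[i];
        d[i] = diff;
        d[i + 1] = diff;
    }
    return d;
}

// Per-pixel forward difference from one extra height evaluation, one pixel
// further along x, standing in for an extra texture sample.
std::vector<double> ExtraSampleDdx(const std::function<double(double)>& height, int width) {
    std::vector<double> d(width);
    for (int x = 0; x < width; ++x)
        d[x] = height(x + 1.0) - height(static_cast<double>(x));
    return d;
}
```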
<p>The first problem is pretty much a killer for me. I would rather not have to cover up a fundamental implementation issue with extra fudges and more cost.</p>
<h2>What Now?</h2>
<p>It&#8217;s unfortunate that this didn&#8217;t make it into the original paper, but Mikkelsen mentions in a <a href="http://mmikkelsen3d.blogspot.com/2011/07/derivative-maps.html">blog post</a> that you can increase the quality by using precomputed height derivatives. This method requires double the texture storage (or half the resolution) of the ddx/ddy method, but produces much better results.</p>
<p>You&#8217;re probably wondering how you can possibly precompute screen-space derivatives. We don&#8217;t actually have to. Instead we can use the chain rule to transform a partial derivative from one space to another. In our case we can transform our derivatives from uv-space to screen-space if we have the partial derivatives of the uvs in screen-space.</p>
<p>To calculate dhdx you need dhdu, dhdv, dudx and dvdx:</p>
<p class="ql-center-displayed-equation" style="line-height: 38px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-976eff4caf3ee8bc616750362015dbbb_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#120;&#125;&#32;&#61;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#120;&#125;&#32;&#43;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#120;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>To calculate dhdy you need dhdu, dhdv, dudy and dvdy:</p>
<p class="ql-center-displayed-equation" style="line-height: 42px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-0bb139f70074e81bd9b3deccbfdf55b3_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#121;&#125;&#32;&#61;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#121;&#125;&#32;&#43;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#121;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>The HLSL for this is very simple:</p>
<pre class="brush: cpp; title: ; notranslate">
float ApplyChainRule(float dhdu, float dhdv, float dud_, float dvd_)
{
	return dhdu * dud_ + dhdv * dvd_;
}
</pre>
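<p>You can check the chain rule numerically with the C++ sketch below. It uses a made-up height function h(u,v) with known partial derivatives and a made-up affine screen-to-uv mapping. It then compares the chain-rule result against a finite difference of the composed function. Everything except the chain rule itself is hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>

// Same chain rule as the HLSL above, in C++.
double ApplyChainRule(double dhdu, double dhdv, double dud_, double dvd_) {
    return dhdu * dud_ + dhdv * dvd_;
}

// Hypothetical height field h(u, v) and its analytic partial derivatives.
double Height(double u, double v) { return std::sin(3.0 * u) * std::cos(2.0 * v); }
double DhDu(double u, double v) { return 3.0 * std::cos(3.0 * u) * std::cos(2.0 * v); }
double DhDv(double u, double v) { return -2.0 * std::sin(3.0 * u) * std::sin(2.0 * v); }

// Hypothetical affine screen-to-uv mapping, so dudx = 0.02, dvdx = 0.005,
// dudy = -0.004 and dvdy = 0.03 everywhere.
double U(double x, double y) { return 0.1 + 0.02 * x - 0.004 * y; }
double V(double x, double y) { return 0.2 + 0.005 * x + 0.03 * y; }

// Central-difference estimate of dh/dx for the composed function h(u(x, y), v(x, y)).
double DhDxNumeric(double x, double y, double e = 1e-3) {
    assert(e > 0.0);
    return (Height(U(x + e, y), V(x + e, y)) - Height(U(x - e, y), V(x - e, y))) / (2.0 * e);
}
```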
<p>Assuming that we have a texture that stores the <em>texel-space</em> height derivatives, we can scale this up in the shader to uv-space by simply multiplying by the texture dimensions. We can then use the screen space uv derivatives and the chain rule to transform from dhdu/dhdv to dhdx/dhdy.</p>
<pre class="brush: cpp; title: ; notranslate">
// Calculate the surface normal using the uv-space gradient (dhdu, dhdv), i.e. the stored
// texel-space derivatives multiplied by the texture dimensions
float3 CalculateSurfaceNormal(float3 position, float3 normal, float2 uv, float2 gradient)
{
	float3 dpdx = ddx(position);
	float3 dpdy = ddy(position);

	float dhdx = ApplyChainRule(gradient.x, gradient.y, ddx(uv.x), ddx(uv.y));
	float dhdy = ApplyChainRule(gradient.x, gradient.y, ddy(uv.x), ddy(uv.y));

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
</pre>
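<p>The shader above assumes the texel-space derivatives are already stored in a texture. Below is a sketch of one way to bake them offline. The central differences and wrap-around addressing are my assumptions (Mikkelsen&#8217;s blog post may use a different filter), the names are hypothetical, and rows are assumed to increase with v. The second function applies the scaling by the texture dimensions described above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Derivative2 { double dhdu, dhdv; };

// Texel-space height derivatives (height change per texel) of a sizeX x sizeY
// height map, using central differences.
std::vector<Derivative2> BakeTexelDerivatives(const std::vector<double>& heights, int sizeX, int sizeY) {
    assert(sizeX > 0 && sizeY > 0 && heights.size() == static_cast<std::size_t>(sizeX) * sizeY);
    // Wrap-around addressing, suitable for tiling textures.
    auto at = [&](int i, int j) {
        i = (i % sizeX + sizeX) % sizeX;
        j = (j % sizeY + sizeY) % sizeY;
        return heights[j * sizeX + i];
    };
    std::vector<Derivative2> out(static_cast<std::size_t>(sizeX) * sizeY);
    for (int j = 0; j < sizeY; ++j)
        for (int i = 0; i < sizeX; ++i)
            out[j * sizeX + i] = {0.5 * (at(i + 1, j) - at(i - 1, j)),
                                  0.5 * (at(i, j + 1) - at(i, j - 1))};
    return out;
}

// Scale texel-space derivatives to uv space (uv in [0, 1]) by the texture dimensions.
Derivative2 TexelToUv(Derivative2 d, int sizeX, int sizeY) {
    return {d.dhdu * sizeX, d.dhdv * sizeY};
}
```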
<p>So how does this look? Well, it&#8217;s pretty much the same at medium distance.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapFar.png"><img class="aligncenter size-full wp-image-698" title="DerivativeMapFar" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapFar.png" alt="" /></a></p>
<p>But it&#8217;s way better up close, since we&#8217;re now interpolating the derivatives.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapNear.png"><img class="aligncenter size-full wp-image-699" title="DerivativeMapNear" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapNear.png" alt="" /></a></p>
<h2>Conclusions</h2>
<p>In order to really draw any conclusions about this technique, I&#8217;m going to need to compare the quality, performance and memory consumption to that of normal mapping. That&#8217;s a whole other blog post waiting to happen&#8230;</p>
<p>But in theory, the pros are:</p>
<ul>
<li><b>Less mesh memory:</b> We don&#8217;t need to store a tangent vector, so this should translate into some pretty significant mesh memory savings.</li>
<li><b>Fewer interpolators:</b> We don&#8217;t need to pass the tangent vector from the vertex shader to the pixel shader, so this should be a performance gain.</li>
<li><b>Possibly less texture memory:</b> At worst, this method requires two channels in a texture, and two channels is the best case for a normal map.</li>
<li><del datetime="2012-01-15T20:50:01+00:00"><b>Easy scaling:</b> It&#8217;s easy to change the height scale on the fly by scaling the height derivatives. This isn&#8217;t quite so easy to get right when using normal maps. See <a href="http://www.j3l7h.de/talks/2008-02-18_Care_and_Feeding_of_Normal_Vectors.pdf">here</a>.</del> As Stephen Hill points out in the comments below, this is a pretty weak argument, so I&#8217;m removing it.</li>
</ul>
<p>And the cons are:</p>
<ul>
<li><b>More ALU:</b> It&#8217;s going to be interesting to see the actual numbers, but this is probably the only thing that could put the nail in the coffin for derivative maps. The extra cost for ALU might be compensated partially by the fewer interpolators, but we&#8217;ll have to see.</li>
<li><b>Less flexible:</b> A normal map can represent any derivative map, but the reverse is not true. I&#8217;m not sure that this is a significant problem in practice though.</li>
<li><b>Worse quality?</b> I&#8217;m not sure about this one, but it&#8217;ll be interesting to see if the quality holds up.</li>
</ul>
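<p>To put the mesh-memory saving from the pros list in concrete terms, here is a back-of-the-envelope C++ sketch. The vertex layout (float3 position, float3 normal, float2 uv, plus a float4 tangent with a handedness sign in w) is an assumption for illustration only; real engines often pack these attributes more tightly, which would shrink both the total size and the saving.</p>

```cpp
#include <cstddef>

// Hypothetical, unpacked vertex layouts used only for this size comparison.
struct VertexWithTangent
{
	float position[3];
	float normal[3];
	float uv[2];
	float tangent[4]; // xyz tangent + w handedness sign (assumed layout)
};

struct VertexWithoutTangent
{
	float position[3];
	float normal[3];
	float uv[2];
};

// Bytes saved per vertex by dropping the tangent.
constexpr std::size_t kTangentSavings = sizeof(VertexWithTangent) - sizeof(VertexWithoutTangent);

// Bytes saved across a whole mesh with the given vertex count.
constexpr std::size_t MeshSavings(std::size_t vertexCount)
{
	return vertexCount * kTangentSavings;
}
```

<p>Under these assumptions (4-byte floats, no padding) the tangent is 16 of the 48 bytes in each vertex, so a 10,000-vertex mesh would save 160,000 bytes.</p>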
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2012/01/11/derivative-maps/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>UI Anti-Aliasing</title>
		<link>http://www.rorydriscoll.com/2012/01/08/ui-anti-aliasing/</link>
		<comments>http://www.rorydriscoll.com/2012/01/08/ui-anti-aliasing/#comments</comments>
		<pubDate>Sun, 08 Jan 2012 22:30:08 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=576</guid>
		<description><![CDATA[I&#8217;ve been working on making a really simple IMGUI implementation for my engine at home. I like to do a little bit of research when I&#8217;m approaching something new to me like this, so I went hunting around for publicly available implementations. While doing this, I came across Mikko Mononen&#8217;s implementation in Recast. I was [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working on making a really simple IMGUI implementation for my engine at home. I like to do a little bit of research when I&#8217;m approaching something new to me like this, so I went hunting around for publicly available implementations. While doing this, I came across <a href="http://digestingduck.blogspot.com/">Mikko Mononen&#8217;s</a> implementation in <a href="http://code.google.com/p/recastnavigation/">Recast</a>.</p>
<p>I was impressed by how smooth his UI looked when I ran the demo. It turns out that he&#8217;s using a little trick (which I&#8217;d never seen before, but I&#8217;m sure is old to many) to smooth off the edges of his UI elements.</p>
<p>Basically, the trick is to create a ring of extra vertices by extruding the edges of the polygon out by a certain amount. These extra vertices take the same color as the originals, but their alpha is set to zero. Mikko calls this &#8216;feathering&#8217;.</p>
<p>In my case, I found that I got good results by feathering just one pixel. Here&#8217;s a quick before/after comparison of my IMGUI check box at 800% zoom:</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxNoAA.png"><img class="aligncenter size-full wp-image-581" title="CheckBoxNoAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxNoAA.png" alt="" width="376" height="160" /></a></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxWithAA.png"><img class="aligncenter size-full wp-image-580" title="CheckBoxWithAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxWithAA.png" alt="" width="376" height="160" /></a></p>
<p>And here&#8217;s a 1-to-1 example showing rounded button corners:</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonNoAA.png"><img class="aligncenter size-full wp-image-584" title="ButtonNoAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonNoAA.png" alt="" width="160" height="60" /></a><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonWithAA.png"><img class="aligncenter size-full wp-image-583" title="ButtonWithAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonWithAA.png" alt="" width="160" height="60" /></a></p>
<p>It&#8217;s a pretty nice improvement for a very simple technique! If you&#8217;re interested in what the code looks like, then either take a look at <a href="http://code.google.com/p/recastnavigation/source/browse/trunk/RecastDemo/Source/imgui.cpp?r=213">Mikko&#8217;s IMGUI implementation</a>, or you can find the code I use to feather my convex polygons below. </p>
<p>My implementation is a little less efficient since I recalculate each edge normal twice, but I chose to keep it simple for readability.</p>
<div id="gist-1579850" class="gist">
<pre class="brush: cpp; title: ; notranslate">
// Computes the unit normal (dy, -dx) of the edge from (x0, y0) to (x1, y1).
// Whether it points outward depends on the polygon's winding order.
void CalculateEdgeNormal(float&amp; nx, float&amp; ny, float x0, float y0, float x1, float y1)
{
	const float x01 = x1 - x0;
	const float y01 = y1 - y0;

	const float length = Sqrt(x01 * x01 + y01 * y01);

	const float dx = x01 / length;
	const float dy = y01 / length;

	nx = dy;
	ny = -dx;
}

// Adds a ring of quads around a convex polygon, fading to zero alpha over 'amount'.
void FeatherConvexPolygon(Primitives&amp; primitives, const Vertex* vertices, int count, float amount, const Texture* texture)
{
	Vertex* extruded = Memory::Allocate&lt;Vertex&gt;(Memory::Temp, sizeof(Vertex) * count);

	for (int i = 0; i &lt; count; ++i)
	{
		const Vertex&amp; previous = vertices[(i + count - 1) % count];
		const Vertex&amp; current = vertices[i];
		const Vertex&amp; next = vertices[(i + 1) % count];

		float nx0, ny0, nx1, ny1;

		// Normals of the two edges that meet at this vertex
		CalculateEdgeNormal(nx0, ny0, previous.x, previous.y, current.x, current.y);
		CalculateEdgeNormal(nx1, ny1, current.x, current.y, next.x, next.y);

		// Average them (not renormalized) to get the extrusion direction
		float nx = (nx0 + nx1) * 0.5f;
		float ny = (ny0 + ny1) * 0.5f;

		// Same color as the original vertex, but fully transparent
		extruded[i] = Vertex(current.x + nx * amount, current.y + ny * amount, Color(current.r, current.g, current.b, 0.0f));
	}

	// One quad per edge: original i, extruded i, extruded j, original j
	for (int i = 0; i &lt; count; ++i)
	{
		const int j = (i + 1) % count;
		AddQuad(primitives, vertices[i], extruded[i], extruded[j], vertices[j], texture);
	}

	Memory::Free(extruded);
}
</pre>
</div>
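<p>The feathering code above relies on an <code>AddQuad</code> helper that isn&#8217;t shown. If you are building an indexed triangle list instead, the feather ring could be stitched together roughly like this. This is a hypothetical sketch: it assumes the n original vertices occupy indices [0, n) and the n extruded vertices occupy [n, 2n), and it keeps the same (original i, extruded i, extruded j, original j) order that the feathering code passes to <code>AddQuad</code>.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical index layout: [0, n) are the original polygon vertices and
// [n, 2n) are the extruded, zero-alpha vertices, with extruded[i] at index n + i.
std::vector<std::uint16_t> BuildFeatherIndices(std::uint16_t n)
{
	std::vector<std::uint16_t> indices;
	indices.reserve(static_cast<std::size_t>(n) * 6);

	for (std::uint16_t i = 0; i < n; ++i)
	{
		const std::uint16_t j = static_cast<std::uint16_t>((i + 1) % n);
		const std::uint16_t outerI = static_cast<std::uint16_t>(n + i);
		const std::uint16_t outerJ = static_cast<std::uint16_t>(n + j);

		// Quad (inner i, outer i, outer j, inner j) split into two triangles.
		const std::uint16_t quad[6] = { i, outerI, outerJ, i, outerJ, j };
		indices.insert(indices.end(), quad, quad + 6);
	}

	return indices;
}
```

<p>Sharing each extruded vertex between its two neighbouring quads avoids duplicating vertices; depending on your cull mode you may need to flip the triangle winding.</p>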

]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2012/01/08/ui-anti-aliasing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Visual Studio Addins Updated</title>
		<link>http://www.rorydriscoll.com/2011/04/11/visual-studio-addins-updated/</link>
		<comments>http://www.rorydriscoll.com/2011/04/11/visual-studio-addins-updated/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 02:43:40 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Addins]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=561</guid>
		<description><![CDATA[DoItNow &#38; RevisionItNow I&#8217;ve updated the Google Code repository for DoItNow with a newer version. I&#8217;ve removed all source control features from DoItNow and separated them into their own add-in. This should make it more compatible with other add-ins you may be using to handle source control. I&#8217;ve uploaded the Mercurial version of the add-in, [...]]]></description>
			<content:encoded><![CDATA[<h2>DoItNow &amp; RevisionItNow</h2>
<p>I&#8217;ve updated the <a href="http://doitnow.googlecode.com">Google Code</a> repository for DoItNow with a newer version. I&#8217;ve removed all source control features from DoItNow and separated them into their own add-in. This should make it more compatible with other add-ins you may be using to handle source control. I&#8217;ve uploaded the Mercurial version of the add-in, but the full source is available should you want to change it back to Perforce.</p>
<p>I&#8217;m only using Visual Studio 2010 at home now, so the project files are all in that format at the moment. I&#8217;ve also provided add-in files that work with Visual Studio 2008, though.</p>
<p>At the request of someone at work, the open in solution dialog now matches on multiple (space-separated) search terms.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2011/04/DoItNow.png"><img class="aligncenter size-full wp-image-562" title="DoItNow" src="http://www.rorydriscoll.com/wp-content/uploads/2011/04/DoItNow.png" alt="" width="631" height="118" /></a></p>
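<p>For anyone curious how multi-term matching can work, here is a minimal C++ sketch (not the add-in&#8217;s actual code): split the query on whitespace and keep a file only if every term appears somewhere in its name, ignoring case and term order. The function names are hypothetical.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Lower-case a copy of the string (ASCII only, which is enough for a sketch).
std::string ToLower(std::string s)
{
	std::transform(s.begin(), s.end(), s.begin(),
		[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
	return s;
}

// Split a query into whitespace-separated, lower-cased terms (repeated spaces are skipped).
std::vector<std::string> SplitTerms(const std::string& query)
{
	std::istringstream stream(query);
	std::vector<std::string> terms;
	std::string term;
	while (stream >> term)
		terms.push_back(ToLower(term));
	return terms;
}

// True if every term occurs somewhere in the file name (case-insensitive, any order).
bool MatchesAllTerms(const std::string& fileName, const std::vector<std::string>& terms)
{
	const std::string name = ToLower(fileName);
	return std::all_of(terms.begin(), terms.end(),
		[&name](const std::string& t) { return name.find(t) != std::string::npos; });
}
```

<p>With this approach, both "quat h" and "h quat" match Quaternion.h.</p>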
<h2>FindItNow</h2>
<p>I&#8217;ve been testing out an idea for a replacement for the standard Visual Studio find-in-files. This is the first pass at it (let&#8217;s call it an alpha), so download at your own risk! It actually sits side-by-side with the existing find-in-files, so it&#8217;s pretty safe to install.</p>
<p>Here&#8217;s where it&#8217;s (possibly) better:</p>
<ul>
<li>It can match multiple search terms and will rank results accordingly.</li>
<li>It remembers all settings from previous searches. Pressing up or down sets options such as &#8216;match case&#8217; based on the search history.</li>
<li>It populates the file types drop-down based on the files in the solution, and sorts them by frequency.</li>
</ul>
<p>Here&#8217;s where it&#8217;s (definitely) worse right now:</p>
<ul>
<li>It doesn&#8217;t remember the search paths you used in previous Visual Studio sessions.</li>
<li>Since it ranks results, it doesn&#8217;t present them incrementally. This means you might have to wait longer to get results.</li>
<li>You can&#8217;t cancel a search!</li>
</ul>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2011/04/FindItNow.png"><img class="aligncenter size-full wp-image-563" title="FindItNow" src="http://www.rorydriscoll.com/wp-content/uploads/2011/04/FindItNow.png" alt="" width="403" height="159" /></a></p>
<p>FindItNow ranks search results based on the number of hits on each line, as well as hits in the surrounding few lines. In order for a result to even show up, it must have all search terms present in a seven-line block.</p>
<p>The top matches (100% quality) have all search terms on the line in question. Lower-quality matches have progressively fewer of the search terms on the line itself.</p>
<p>For example, here are the results of a search I did for a quaternion conjugate function in some of my code:</p>
<pre class="brush: plain; title: ; notranslate">
Query: &quot;quat conjugate&quot;
Options: case=ignore, match=partial
Source: Entire solution
Finding... Complete

Match Quality: 100%
--------------------
c:\Development.old\Libraries\C++\Math\Quaternion.h(79): 		inline Quaternion Conjugate(const Quaternion&amp; quat)
c:\Development.old\Libraries\C++\Math\Quaternion.h(96): 			return Conjugate(quat) / Length(quat);
c:\Development.old\Libraries\C++\Math\Quaternion.h(101): 			const Quaternion result = quat * Quaternion(vec.x, vec.y, vec.z, 0) * Conjugate(quat);

Match Quality: 62%
-------------------
c:\Development.old\Libraries\C++\Math\Quaternion.h(76): 			return Quaternion(lhs.x / f, lhs.y / f, lhs.z / f, lhs.w / f);
c:\Development.old\Libraries\C++\Math\Quaternion.h(81): 			return Quaternion(-quat.x, -quat.y, -quat.z, quat.w);
c:\Development.old\Libraries\C++\Math\Quaternion.h(94): 		inline Quaternion Invert(const Quaternion&amp; quat)
c:\Development.old\Libraries\C++\Math\Quaternion.h(99): 		inline Vector3 Rotate(const Vector3&amp; vec, const Quaternion&amp; quat)

Total files searched: 407
Matching lines: 34
Find Time: 90 ms
Output Time: 9 ms
</pre>
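<p>The ranking rule described earlier can be approximated with a short C++ sketch. This is a simplified, hypothetical scoring scheme rather than FindItNow&#8217;s real formula: a line only qualifies if every term appears somewhere in the seven-line block centred on it (three lines either side), and its quality is the fraction of terms found on the line itself. The real add-in also weighs hits in the surrounding lines, which is presumably why the sample output above includes a 62% tier that this simple fraction could not produce for two terms.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Case-insensitive substring test (ASCII only), mirroring "case=ignore, match=partial".
bool ContainsIgnoreCase(const std::string& text, const std::string& term)
{
	auto it = std::search(text.begin(), text.end(), term.begin(), term.end(),
		[](unsigned char a, unsigned char b) { return std::tolower(a) == std::tolower(b); });
	return it != text.end();
}

// Hypothetical score for lines[index] (assumes index < lines.size()):
// -1 if some term is missing from the seven-line block around the line,
// otherwise the fraction of terms on the line itself (1.0 == "100%").
double ScoreLine(const std::vector<std::string>& lines, std::size_t index,
                 const std::vector<std::string>& terms)
{
	const std::size_t first = index >= 3 ? index - 3 : 0;
	const std::size_t last = std::min(lines.size() - 1, index + 3);

	std::size_t onLine = 0;
	for (const std::string& term : terms)
	{
		bool inBlock = false;
		for (std::size_t i = first; i <= last && !inBlock; ++i)
			inBlock = ContainsIgnoreCase(lines[i], term);
		if (!inBlock)
			return -1.0;
		if (ContainsIgnoreCase(lines[index], term))
			++onLine;
	}
	return terms.empty() ? -1.0 : static_cast<double>(onLine) / terms.size();
}
```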
<p>I&#8217;m finding it pretty useful when hunting for functions I think <em>should</em> exist in a large code-base, since you don&#8217;t have to get the exact string to match.</p>
<p>If you&#8217;re interested in either of these, you can grab the binaries <a href="http://doitnow.googlecode.com/files/DoItNow-2011-04-11.zip">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2011/04/11/visual-studio-addins-updated/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>MockItNow: Now with Win64 goodness!</title>
		<link>http://www.rorydriscoll.com/2011/02/23/mockitnow-now-with-win64-goodness/</link>
		<comments>http://www.rorydriscoll.com/2011/02/23/mockitnow-now-with-win64-goodness/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 04:32:55 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=557</guid>
		<description><![CDATA[A couple of fine chaps named Julian Adams and Clement Dagneau from my previous company Black Rock Studios (née Climax Racing) took it upon themselves to tackle the daunting task of  porting MockItNow to x64 (MSVC). They&#8217;ve kindly shared their efforts back with the main repository in google code. Feel free to profit from their [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of fine chaps named Julian Adams and Clement Dagneau from my previous company Black Rock Studios (née Climax Racing) took it upon themselves to tackle the daunting task of  porting MockItNow to x64 (MSVC).</p>
<p>They&#8217;ve kindly shared their efforts back with the main repository on <a href="http://code.google.com/p/mockitnow/">Google Code</a>. Feel free to profit from their hard work!</p>
<p>Thanks a lot Julian &amp; Clement!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2011/02/23/mockitnow-now-with-win64-goodness/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adventures with Arduino: From 41 Instructions to 1</title>
		<link>http://www.rorydriscoll.com/2011/02/01/adventures-with-arduino-from-41-instructions-to-1/</link>
		<comments>http://www.rorydriscoll.com/2011/02/01/adventures-with-arduino-from-41-instructions-to-1/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 21:46:57 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Arduino]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=463</guid>
		<description><![CDATA[A couple of weeks ago, I was looking for a fun project to do with my 8 year old son. By chance, I saw a recent issue of Make Magazine which had a feature about an electronics prototyping platform called Arduino. I thought it sounded pretty cool, and could be a good introduction for my [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago, I was looking for a fun project to do with my 8 year old son. By chance, I saw a recent issue of <a href="http://makezine.com/">Make Magazine</a> which had a feature about an electronics prototyping platform called <a href="http://www.arduino.cc">Arduino</a>. I thought it sounded pretty cool, and could be a good introduction for my son to some basic electronics as well as some simple programming.</p>
<p>The Arduino is designed to be very easy to use. It comes with a set of tools based on the <a href="http://wiring.org.co/">Wiring</a> programming environment and libraries. The board is controlled by an <a href="http://www.atmel.com/dyn/products/product_card.asp?category_id=163&amp;family_id=607&amp;subfamily_id=760&amp;part_id=4198">Atmel 8-bit microcontroller</a>. In comparison to what I spend my days working with, this is a very simple processor. No vector registers, no floating point registers. Even something as simple as adding two 32-bit numbers requires four assembly instructions (one add, followed by three adds with carry).</p>
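<p>To see why a 32-bit add takes several instructions on an 8-bit core, here is a host-side C++ sketch that adds two 32-bit values one byte at a time, propagating the carry the way an add followed by add-with-carry instructions does. It is only a model of the idea, not code you would run on the Arduino.</p>

```cpp
#include <cstdint>

// Adds two 32-bit values one byte at a time, least-significant byte first.
// The first step behaves like a plain add; the other three also add the
// carry from the previous byte, like add-with-carry.
std::uint32_t AddBytewise(std::uint32_t a, std::uint32_t b)
{
	std::uint32_t result = 0;
	unsigned carry = 0;

	for (int byte = 0; byte < 4; ++byte)
	{
		const unsigned shift = 8u * byte;
		const unsigned sum = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu) + carry;
		result |= static_cast<std::uint32_t>(sum & 0xFFu) << shift;
		carry = sum >> 8; // 0 or 1, like the processor's carry flag
	}

	// A carry out of the top byte is discarded, as in ordinary 32-bit arithmetic.
	return result;
}
```

<p>For example, 0x01FFFFFF + 1 ripples a carry through the three low bytes to give 0x02000000.</p>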
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2011/01/arduino.jpg"><img class="size-full wp-image-469" src="http://www.rorydriscoll.com/wp-content/uploads/2011/01/arduino.jpg" alt="" width="500" height="385" /></a><br />
<span id="more-463"></span><br />
While I was waiting for my Arduino to be delivered, I downloaded the IDE and compiled the simple example code to blink an LED once every second. The status window of the IDE dutifully printed out the compilation progress, and at the end, also printed out the executable size &#8211; 1018 bytes.</p>
<p>Since I&#8217;ve never worked on this platform before I wasn&#8217;t sure what the executable size should be, but that seemed a little large to me. So I started to dig around a little bit&#8230;</p>
<p>Warning: There&#8217;s a fair amount of code and assembly in this post, but I&#8217;ll do my best to explain it all!</p>
<h2>Start Simple</h2>
<p>The Blink program is really simple. It just sets a specific pin on the board to output mode, then continually turns that pin on and off every second.</p>
<pre class="brush: cpp; title: ; notranslate">
void setup()
{
	pinMode(13, OUTPUT);
}

void loop()
{
	digitalWrite(13, HIGH);   // set the LED on
	delay(1000);              // wait for a second
	digitalWrite(13, LOW);    // set the LED off
	delay(1000);              // wait for a second
}
</pre>
<p>I thought I may as well start at the top, so I took a look at the pinMode function. Fortunately, the Arduino codebase is open source and comes as part of the install package.</p>
<p>The pinMode function just sets the data direction (i.e. input or output) of a particular pin on the microcontroller. Most of the pins on the ATmega328p chip are wired into the numbered pins on the Arduino board, but the mapping isn&#8217;t one-to-one.</p>
<pre class="brush: cpp; title: ; notranslate">
void pinMode(uint8_t pin, uint8_t mode)
{
	uint8_t bit = digitalPinToBitMask(pin); // * Fetch
	uint8_t port = digitalPinToPort(pin); // * Fetch

	volatile uint8_t *reg;

	if (port == NOT_A_PIN) return;

	// JWS: can I let the optimizer do this?
	reg = portModeRegister(port); // * Fetch

	if (mode == INPUT) {
		uint8_t oldSREG = SREG;
		cli();
		*reg &amp;= ~bit;
		SREG = oldSREG;
	} else {
		uint8_t oldSREG = SREG;
		cli();
		*reg |= bit;
		SREG = oldSREG;
	}
}
</pre>
<h3>What Does It Do?</h3>
<p>Here&#8217;s a quick overview of what this code does by line number:</p>
<ul>
<li>3, 4 &#038; 11: Look up which bit of which particular data direction register (DDR) needs to be set or unset.</li>
<li>14 &#038; 15/19 &#038; 20: Perform the first part of the disable interrupts dance &#8211; storing off the existing status register and clearing the global interrupt bit (cli).</li>
<li>16: Clear the DDR bit if the mode is input</li>
<li>21: Set the DDR bit if the mode is output</li>
<li>17 &#038; 22: The rest of the interrupt dance &#8211; restore the status register to its previous state.</li>
</ul>
<h3>Some Thoughts</h3>
<ul>
<li>This code performs three fetches (marked) from program memory to convert the Arduino pin number into the correct data direction register and bit. Each fetch from program memory costs 3 cycles.</li>
<li>It needs to disable interrupts because setting or clearing the DDR bit requires a read/modify/write pattern. If an interrupt happened after the read but before the write, the handler might change that DDR, but pinMode would then overwrite the change with the stale value it read earlier.</li>
<li>It stores off the existing status register before disabling interrupts. I presume this is so that it can restore global interrupts to the previous state at the end of the function rather than arbitrarily turning them back on.</li>
<li>It&#8217;s interesting that the author (JWS) hints that there may be a faster way to do this in his comment&#8230;</li>
</ul>
<h3>What are we trying to do again?</h3>
<p>After looking at all of this code, it&#8217;s easy to forget what the point of this function actually is. All it really does is set or clear one bit in a specific data direction register &#8211; that&#8217;s it! All the other code is just there for support.</p>
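<p>Stripped of the lookups and interrupt handling, the core of pinMode is just the following read/modify/write on one register. This host-side C++ sketch uses a plain variable as a stand-in for the DDR (on the real chip it is a memory-mapped I/O register), and the helper names are hypothetical.</p>

```cpp
#include <cstdint>

// Mask for a single bit (0-7) of an 8-bit register.
constexpr std::uint8_t BitMask(unsigned bit)
{
	return static_cast<std::uint8_t>(1u << bit);
}

// Output mode sets the bit, input mode clears it; all other bits are preserved.
void SetPinDirection(volatile std::uint8_t& ddr, unsigned bit, bool output)
{
	if (output)
		ddr = static_cast<std::uint8_t>(ddr | BitMask(bit));
	else
		ddr = static_cast<std::uint8_t>(ddr & ~BitMask(bit));
}
```

<p>On ATmega328P-based boards, pin 13 is bit 5 of port B, so the Blink setup boils down to setting bit 5 of DDRB.</p>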
<h2>Digging Deeper</h2>
<p>It&#8217;s all very well looking at the source code, but this isn’t what the microcontroller sees. In order to see that, we need to look at the result of the compilation. We can do this really easily by taking the <a href="http://en.wikipedia.org/wiki/Executable_and_Linkable_Format">elf</a> that is produced and running it through the disassembler (avr-objdump) included with the toolchain. </p>
<pre style="padding-left: 30px;">avr-objdump -D Blink.elf &gt; Blink.asm</pre>
<p>You can also get objdump to interleave the original source code with the disassembly. I sometimes find this more confusing than the plain assembly, since it can place some of the source lines in misleading spots.</p>
<pre style="padding-left: 30px;">avr-objdump -S Blink.elf &gt; Blink.asm</pre>
<h3>Disassemble!</h3>
<p>This might look intimidating at first, but please bear with it. The assembly proceeds in the same order as the original source code, and I’ve added comments to explain what’s going on. You can really dig much further into it by looking at the assembly instruction reference in the <a href="http://www.atmel.com/dyn/resources/prod_documents/doc8271.pdf">Atmel Microcontroller Guide</a>, but it&#8217;s not necessary to understand the rest of this post.</p>
<pre class="brush: cpp; title: ; notranslate">
000002d8 &lt;pinMode&gt;:
// Fetch the port bit mask
2d8:	48 2f       	mov	r20, r24
2da:	50 e0       	ldi	r21, 0x00	; 0
2dc:	ca 01       	movw	r24, r20
2de:	86 56       	subi	r24, 0x66	; 102
2e0:	9f 4f       	sbci	r25, 0xFF	; 255
2e2:	fc 01       	movw	r30, r24
2e4:	24 91       	lpm	r18, Z+
// Fetch the processor port
2e6:	4a 57       	subi	r20, 0x7A	; 122
2e8:	5f 4f       	sbci	r21, 0xFF	; 255
2ea:	fa 01       	movw	r30, r20
2ec:	84 91       	lpm	r24, Z+
// Return if the port is invalid
2ee:	88 23       	and	r24, r24
2f0:	c1 f0       	breq	.+48     	; 0x322
// Fetch the DDR address
2f2:	e8 2f       	mov	r30, r24
2f4:	f0 e0       	ldi	r31, 0x00	; 0
2f6:	ee 0f       	add	r30, r30
2f8:	ff 1f       	adc	r31, r31
2fa:	e8 59       	subi	r30, 0x98	; 152
2fc:	ff 4f       	sbci	r31, 0xFF	; 255
2fe:	a5 91       	lpm	r26, Z+
300:	b4 91       	lpm	r27, Z+
// Check the mode and branch if necessary
302:	66 23       	and	r22, r22
304:	41 f4       	brne	.+16     	; 0x316
// Clear the DDR bit and return
306:	9f b7       	in	r25, 0x3f	; 63
308:	f8 94       	cli
30a:	8c 91       	ld	r24, X
30c:	20 95       	com	r18
30e:	82 23       	and	r24, r18
310:	8c 93       	st	X, r24
312:	9f bf       	out	0x3f, r25	; 63
314:	08 95       	ret
// Set the DDR bit and return
316:	9f b7       	in	r25, 0x3f	; 63
318:	f8 94       	cli
31a:	8c 91       	ld	r24, X
31c:	82 2b       	or	r24, r18
31e:	8c 93       	st	X, r24
320:	9f bf       	out	0x3f, r25	; 63
322:	08 95       	ret
000002ce &lt;setup&gt;:
// Copy the parameters into registers
2ce:	8d e0       	ldi	r24, 0x0D	; 13
2d0:	61 e0       	ldi	r22, 0x01	; 1
// Call pinMode
2d2:	0e 94 6c 01 	call	0x2d8	; 0x2d8 &lt;pinMode&gt;
2d6:	08 95       	ret
</pre>
<p>The pinMode function consists of 38 assembly instructions. The call to pinMode from Setup requires an additional 3 instructions: 2 to put the parameters into registers, and 1 to call the function. That&#8217;s a grand total of 41 instructions, which sounds like an awful lot of work just to set one bit.</p>
<p>Isn&#8217;t there an easier way to set or clear a bit in a register? Of course there is. If you look at the assembly instruction reference, there&#8217;s actually a single instruction (sbi) that sets a specific bit in an I/O register, and a corresponding instruction (cbi) that clears it. Both only work on the lower 32 I/O registers, but the data direction registers are among them.</p>
<h2>Improving pinMode</h2>
<p>Ok, so we know in theory it could be more efficient, but how do we get the compiler to do this so that we don&#8217;t have to look at assembly all of the time? Aren&#8217;t there other things we need to worry about? What about interrupts? What about the memory fetches?</p>
<p>I made a few different changes to the pinMode function to try to reduce the instruction count. I&#8217;m going to go over them in the order I attempted them.</p>
<h3>Removing One Fetch</h3>
<p>As the old adage goes, two memory fetches are better than three. Well, perhaps it&#8217;s not an adage, but it&#8217;s true nonetheless. </p>
<p>The original code looks up the port number using the pin index, then uses the port number to look up the DDR address. We can skip that first lookup by creating a table that goes directly from pin index to DDR address.</p>
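<p>To make the difference concrete, here is a host-side C++ sketch (not AVR code) that compares the two-step lookup with a flattened table. The port numbers and DDR addresses are assumptions that mirror the Arduino core for the ATmega328P (pins 0-7 on port D, 8-13 on port B, 14-19 on port C), and the table and function names are hypothetical.</p>

```cpp
#include <cstdint>

// Host-side sketch, not AVR code. Port numbers mirror the Arduino core
// (PB = 2, PC = 3, PD = 4); DDR values are data-space addresses on the ATmega328P.
enum Port : std::uint8_t { NOT_A_PORT = 0, PB = 2, PC = 3, PD = 4 };

// Original approach, step 1: pin -> port.
const Port kPinToPort[20] = {
    PD, PD, PD, PD, PD, PD, PD, PD,  // pins 0-7
    PB, PB, PB, PB, PB, PB,          // pins 8-13
    PC, PC, PC, PC, PC, PC           // pins 14-19
};

// Original approach, step 2: port -> DDR address (unused ports map to 0).
const std::uint8_t kPortToDdr[5] = {0x00, 0x00, 0x24 /* DDRB */, 0x27 /* DDRC */, 0x2A /* DDRD */};

// Two dependent lookups per call.
std::uint8_t ddrViaPort(std::uint8_t pin) { return kPortToDdr[kPinToPort[pin]]; }

// Flattened approach: one byte per pin, but only one lookup per call.
const std::uint8_t kPinToDdr[20] = {
    0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A,  // DDRD
    0x24, 0x24, 0x24, 0x24, 0x24, 0x24,              // DDRB
    0x27, 0x27, 0x27, 0x27, 0x27, 0x27               // DDRC
};

std::uint8_t ddrDirect(std::uint8_t pin) { return kPinToDdr[pin]; }
```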
<p>You&#8217;re probably thinking, &#8220;but that&#8217;ll cost more memory&#8221;, and you&#8217;d be right. We&#8217;ll address that shortly though. First, let&#8217;s see what that does to the assembly.</p>
<pre class="brush: cpp; title: ; notranslate">
000002e0 &lt;pinMode&gt;:
// Fetch the port bit mask
2e0:	a8 2f       	mov	r26, r24
2e2:	b0 e0       	ldi	r27, 0x00	; 0
2e4:	cd 01       	movw	r24, r26
2e6:	8e 53       	subi	r24, 0x3E	; 62
2e8:	9f 4f       	sbci	r25, 0xFF	; 255
2ea:	fc 01       	movw	r30, r24
2ec:	24 91       	lpm	r18, Z+
// Fetch the DDR address
2ee:	a4 58       	subi	r26, 0x84	; 132
2f0:	bf 4f       	sbci	r27, 0xFF	; 255
2f2:	8c 91       	ld	r24, X
2f4:	e8 2f       	mov	r30, r24
2f6:	f0 e0       	ldi	r31, 0x00	; 0
// Check the mode and branch if necessary
2f8:	66 23       	and	r22, r22
2fa:	31 f4       	brne	.+12     	; 0x308
// Clear the DDR bit
2fc:	9f b7       	in	r25, 0x3f	; 63
2fe:	f8 94       	cli
300:	80 81       	ld	r24, Z
302:	20 95       	com	r18
304:	28 23       	and	r18, r24
306:	04 c0       	rjmp	.+8      	; 0x310
// Set the DDR bit
308:	9f b7       	in	r25, 0x3f	; 63
30a:	f8 94       	cli
30c:	80 81       	ld	r24, Z
30e:	28 2b       	or	r18, r24
310:	20 83       	st	Z, r18
312:	9f bf       	out	0x3f, r25	; 63
// Return
314:	08 95       	ret
</pre>
<p>It&#8217;s definitely better &#8211; we&#8217;ve reduced the function size by 11 instructions (from 38 to 27), but we&#8217;ve also added another constant table of 20 bytes (not shown). Each of these instructions is 2 bytes, so the 22 bytes saved in code are almost cancelled out by the new table, for a net saving of only 2 bytes. Let&#8217;s do something about that.</p>
<h3>Inlining</h3>
<p>Inlining a function just means that the compiler puts the code directly into the call site rather than jumping to a separate function. No function call means that we don&#8217;t pay the cost (three instructions in this case) of calling the function. It also allows the compiler to be more aggressive about optimizations.</p>
<p>Often, the compiler will inline functions in the same file automatically when it can. If you want to share that function, then you can put it in a header file and explicitly mark it as inline by including &#8216;inline&#8217; in front of the function declaration.</p>
<p>The main drawback of inlining function calls is that it can make your code larger since duplicates of the same function can be scattered around the compiled code.</p>
<p>In this case, the non-inlined version costs 3 instructions per call site, plus 27 instructions for the function itself. If we call the function from N places and the inlined code is Y instructions, then a rough guide is that inlining is smaller when N * Y &lt; N * 3 + 27.</p>
<p>Wouldn’t it be nice if Y were 3 or less? Then it’s always better to inline.</p>
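<p>The rule of thumb above can be turned into a small calculation. This sketch (function names are illustrative) uses the instruction counts from the text: 3 instructions per call and 27 for the out-of-line function. For example, with the 9-instruction inlined version that appears later in this post, inlining is smaller for up to 4 call sites, since 9N &lt; 3N + 27 only holds while N &lt; 4.5.</p>

```cpp
// Rough size model from the text, in instructions (names are illustrative):
//   out of line: callSites * callCost + functionSize
//   inlined:     callSites * inlinedSize
bool inliningIsSmaller(unsigned callSites, unsigned inlinedSize,
                       unsigned callCost = 3, unsigned functionSize = 27) {
    return callSites * inlinedSize < callSites * callCost + functionSize;
}

// Largest number of call sites for which inlining is still smaller.
// If inlinedSize <= callCost, inlining always wins, so the search stops at 'limit'.
unsigned maxCallSitesForInlining(unsigned inlinedSize, unsigned limit = 1000) {
    unsigned n = 0;
    while (n < limit && inliningIsSmaller(n + 1, inlinedSize)) ++n;
    return n;
}
```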
<p>Let&#8217;s try it out for pinMode and see how it affects the assembly size. Note that we&#8217;re now looking at the Setup function, since that&#8217;s where pinMode was being called from, and hence where the inlined code now resides.</p>
<pre class="brush: cpp; title: ; notranslate">
00000368 &lt;setup&gt;:
// Fetch the port bit mask
368:	e7 ea       	ldi	r30, 0xA7	; 167
36a:	f0 e0       	ldi	r31, 0x00	; 0
36c:	e4 91       	lpm	r30, Z+
// Set the DDR bit
36e:	9f b7       	in	r25, 0x3f	; 63
370:	f8 94       	cli
372:	84 b1       	in	r24, 0x04	; 4
374:	e8 2b       	or	r30, r24
376:	e4 b9       	out	0x04, r30	; 4
378:	9f bf       	out	0x3f, r25	; 63
// Return
37a:	08 95       	ret
</pre>
<p>Wow, that’s pretty small &#8211; just 9 instructions (not including the return for Setup) now&#8230; Where did all of the code go? </p>
<p>The compiler now knows exactly what the function parameters are, so it can go to town on optimizing the code for those particular parameters rather than having to support the general case.</p>
<ul>
<li>It got rid of the branch when the pin index is out of bounds since it knows what the pin index is now.</li>
<li>It removed the entire part of the function to deal with the input mode &#8211; not needed.</li>
<li>It worked out which data direction register is needed by looking at the array and indexing into it at compile time.</li>
<li>It’s still loading the bit mask from program memory though&#8230;</li>
</ul>
<p>Why could it resolve the DDR address at compile time, but not the bit mask? They&#8217;re both just arrays of bytes indexed by the pin.</p>
<p>The answer has to do with where the arrays are defined. Because I made the DDR address array myself, I put it in the same file as pinMode. The bit mask array is in an entirely different file, so the compiler couldn&#8217;t see its contents and index into it during compilation.</p>
<p>That&#8217;s pretty easy to fix &#8211; just move the array into the same file as pinMode.</p>
<pre class="brush: cpp; title: ; notranslate">
00000368 &lt;setup&gt;:
// Set the DDR bit
368:	8f b7       	in	r24, 0x3f	; 63
36a:	f8 94       	cli
36c:	25 9a       	sbi	0x04, 5	; 4
36e:	8f bf       	out	0x3f, r24	; 63
// Return
370:	08 95       	ret
</pre>
<p>Great! No more memory fetches! But hang on&#8230; What happened to the code that sets the DDR bit? It used to load the DDR register, &#8216;or&#8217; in the mask, then write the DDR register back out.</p>
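<p>Again, the compiler is being clever. It knows the mask at compile time, that the mask only contains one bit, and that the data direction register is within reach of the single-bit instructions, so it can use the cheaper &#8216;sbi&#8217; instruction.</p>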
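<p>The conditions the compiler relies on can be written down explicitly. The following host-side C++11 sketch (not code for the Arduino build; names are hypothetical) checks at compile time that pin 13&#8217;s mask is a single bit, that its bit index is 5, and that DDRB&#8217;s data address 0x24 maps to I/O address 0x04, within the lower 32 I/O registers that sbi and cbi can reach. Together those give exactly the sbi 0x04, 5 in the listing above.</p>

```cpp
#include <cstdint>

// Tables copied from the final pinMode in this post.
constexpr std::uint8_t kPinToBitMask[20] = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
                                            0x01, 0x02, 0x04, 0x08, 0x10, 0x20,
                                            0x01, 0x02, 0x04, 0x08, 0x10, 0x20};
constexpr std::uint8_t kPinToDdr[20] = {0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A,
                                        0x24, 0x24, 0x24, 0x24, 0x24, 0x24,
                                        0x27, 0x27, 0x27, 0x27, 0x27, 0x27};

// sbi/cbi change exactly one bit, so the mask must have exactly one bit set.
constexpr bool isSingleBit(unsigned mask) { return mask != 0 && (mask & (mask - 1)) == 0; }

// Position of the lowest set bit, i.e. the bit operand of sbi/cbi (8 if mask is 0).
constexpr unsigned bitIndex(unsigned mask, unsigned i = 0) {
    return mask == 0 ? 8 : ((mask & 1u) ? i : bitIndex(mask >> 1, i + 1));
}

// On the AVR, I/O register n lives at data address 0x20 + n.
constexpr unsigned ioAddress(unsigned dataAddress) { return dataAddress - 0x20; }

// sbi/cbi can only reach I/O registers 0-31 (data addresses 0x20-0x3F).
constexpr bool sbiCanReach(unsigned dataAddress) { return dataAddress >= 0x20 && dataAddress < 0x40; }

// Pin 13 (the Blink LED): everything the compiler needs is a constant.
static_assert(isSingleBit(kPinToBitMask[13]), "pin 13 uses a single-bit mask");
static_assert(bitIndex(kPinToBitMask[13]) == 5, "pin 13 is bit 5");
static_assert(sbiCanReach(kPinToDdr[13]), "DDRB is reachable by sbi");
static_assert(ioAddress(kPinToDdr[13]) == 0x04, "DDRB is I/O address 0x04");
```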
<h3>Interrupts</h3>
<p>If you remember, we needed to disable interrupts because we were reading, modifying, then writing a register. But now we only have one instruction to set the bit.</p>
<p>The Atmel guide says that an instruction will finish before an interrupt is processed &#8211; even if that instruction takes multiple cycles (sbi takes 2 cycles). This means that we don’t need to disable interrupts since you can’t get an interrupt at a bad time.</p>
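<p>To see why the read-modify-write sequence needed protecting, here is a host-side simulation. It is not real interrupt handling: the register and the handler are hypothetical stand-ins. If an interrupt that changes another bit in the same register fires between the read and the write, its change is lost. A single sbi cannot be split like that, because the interrupt is only serviced once the instruction has finished.</p>

```cpp
#include <cstdint>

// Host-side simulation; 'Register' and 'interruptHandler' are stand-ins.
struct Register { std::uint8_t value = 0; };

// A hypothetical interrupt handler that sets bit 0 of the same register.
void interruptHandler(Register& reg) { reg.value |= 0x01; }

// Read-modify-write as three steps (load, or, store). A real interrupt can be
// serviced between any two of them; here it optionally fires after the load.
void setBitReadModifyWrite(Register& reg, std::uint8_t mask, bool interruptAfterLoad) {
    std::uint8_t tmp = reg.value;                   // load
    if (interruptAfterLoad) interruptHandler(reg);  // interrupt sneaks in
    tmp = static_cast<std::uint8_t>(tmp | mask);    // or
    reg.value = tmp;                                // store: overwrites the handler's bit
}

// sbi: the update is a single instruction, so an interrupt can only be
// serviced before or after it, never in the middle.
void setBitSingleInstruction(Register& reg, std::uint8_t mask, bool interruptAfter) {
    reg.value = static_cast<std::uint8_t>(reg.value | mask);
    if (interruptAfter) interruptHandler(reg);
}
```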
<h3>Et Voilà!</h3>
<p>Removing the interrupt disable and restore gets rid of three more instructions (in, cli and out), so that&#8217;s it. We&#8217;re down to a single instruction to set a bit, as it should be.</p>
<pre class="brush: cpp; title: ; notranslate">
00000368 &lt;setup&gt;:
// Set the DDR bit
368:	25 9a       	sbi	0x04, 5	; 4
// Return
36a:	08 95       	ret
</pre>
<h2>Summary</h2>
<p>The code went from 41 instructions for a single call to just 1. What&#8217;s more, the savings increase the more frequently we call it. Not only is it smaller, it&#8217;s also faster.</p>
<p>I compiled and ran the Blink application using the modified pinMode function, and the program size went down to 936 bytes. Not bad for a quick bit of experimentation.</p>
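<pre class="brush: cpp; title: ; notranslate">
// Modified pinMode for ATmega328P-based boards (digital pins 0-19).
// uint8_t and INPUT come from the Arduino core headers. There is no bounds
// checking, so pin must be in the range 0-19.
void pinMode(uint8_t pin, uint8_t mode)
{
	// Bit mask for each pin within its port.
	const uint8_t digital_pin_to_bit_mask[] = { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20 };
	// Data-space address of each pin's data direction register:
	// DDRD (0x2A) for pins 0-7, DDRB (0x24) for 8-13, DDRC (0x27) for 14-19.
	const uint8_t digital_pin_to_ddr[] = { 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x2A, 0x24, 0x24, 0x24, 0x24, 0x24, 0x24, 0x27, 0x27, 0x27, 0x27, 0x27, 0x27 };

	uint8_t bit = digital_pin_to_bit_mask[pin];
	volatile uint8_t* reg = (volatile uint8_t*)digital_pin_to_ddr[pin];

	// When this is inlined with constant arguments, the compiler can fold the
	// lookups and emit a single sbi/cbi instruction.
	if (mode == INPUT)
		*reg &amp;= ~bit;
	else
		*reg |= bit;
}
</pre>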
<p>I think there&#8217;s potential for optimizations such as these elsewhere in the Arduino codebase. For example, digitalWrite has many of the same issues as pinMode, so the same solutions would apply.</p>
<p>I suspect that even more optimizations could be applied by putting more of an onus on the programmer to do things like remember to turn off pulse width modulation on a pin when it&#8217;s not needed.</p>
<p>On the other hand, I understand why some of the decisions have been made as they stand &#8211; to keep things simple. The nice thing is that the source code is provided, so it&#8217;s easy for anyone who wants to change it to suit their needs.</p>
<p>I would definitely recommend having a look at the assembly for your Arduino programs every now and then. It gives you a good idea of what&#8217;s really happening at the hardware level. It can also give you insight into how you might be able to speed things up, or reduce program size.</p>
<p>The biggest lesson I learned throughout this is this: <strong>8 year olds don&#8217;t care about optimizations and disassembly.</strong> They like flashing lights, buttons, sensors, motors and other cool stuff like that.</p>
<p>I should have known&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2011/02/01/adventures-with-arduino-from-41-instructions-to-1/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What&#8217;s wrong with this picture?</title>
		<link>http://www.rorydriscoll.com/2010/01/30/whats-wrong-with-this-picture/</link>
		<comments>http://www.rorydriscoll.com/2010/01/30/whats-wrong-with-this-picture/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 00:31:58 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Global Illumination]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=393</guid>
		<description><![CDATA[Well, you could point out a number of things to answer that question. There&#8217;s some pretty obvious aliasing, a random pixel on the ground which should be in shadow but isn&#8217;t, it&#8217;s noisy, boring etc. But that&#8217;s not my point. The point is: It&#8217;s too dark! I know it&#8217;s too dark because I know how [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-398" title="montecarlo256samples2bounces" src="http://www.rorydriscoll.com/wp-content/uploads/2010/01/montecarlo256samples2bounces.png" alt="montecarlo256samples2bounces" width="656" height="396"/></p>
<p>Well, you could point out a number of things to answer that question. There&#8217;s some pretty obvious aliasing, a random pixel on the ground which should be in shadow but isn&#8217;t, it&#8217;s noisy, boring etc. But that&#8217;s not my point. The point is: It&#8217;s too dark!</p>
<p>I know it&#8217;s too dark because I know how I rendered it, and I rendered it wrong. It still kind of looks acceptable (well to me at least) though. I&#8217;m not sure that I would say that it&#8217;s implausibly dark if I didn&#8217;t know it.</p>
<p><span id="more-393"></span></p>
<h2>How many bounces are enough?</h2>
<p>I rendered this image using a Monte Carlo estimator with two bounces of indirect light. Each bounce estimated the irradiance at the intersection point using 256 rays in a cosine-weighted stratified-sampling pattern. It&#8217;s the two bounce part that makes it wrong. Any light that bounced more than twice before heading toward the camera is totally ignored. Since this approach doesn&#8217;t converge on the correct solution to the rendering equation, it&#8217;s classified as &#8216;biased&#8217;.</p>
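<p>For reference, one common way to generate cosine-weighted stratified directions is to jitter one sample inside each cell of an n &times; n grid and map it onto the hemisphere with Malley&#8217;s method (sample the unit disk uniformly, then project up). This is a sketch, not the renderer&#8217;s actual code, and the names are hypothetical; with n = 16 it produces 256 directions. Because the pdf is cos &theta; / &pi;, the irradiance estimate reduces to &pi; times the average incoming radiance over the samples.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Vec3 { double x, y, z; };

// Cosine-weighted, stratified directions around the +z normal (a sketch).
// One jittered sample per cell of an n x n grid on [0,1)^2, mapped with
// Malley's method: uniform point on the unit disk, projected up onto the
// hemisphere. The resulting pdf is cos(theta) / pi.
std::vector<Vec3> cosineWeightedStratified(int n, std::mt19937& rng) {
    const double kPi = 3.14159265358979323846;
    std::uniform_real_distribution<double> jitter(0.0, 1.0);
    std::vector<Vec3> dirs;
    dirs.reserve(static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const double u = (i + jitter(rng)) / n;  // stratum in the radial dimension
            const double v = (j + jitter(rng)) / n;  // stratum in the angular dimension
            const double r = std::sqrt(u);           // uniform on the unit disk
            const double phi = 2.0 * kPi * v;
            const double z = std::sqrt(std::max(0.0, 1.0 - u));  // cos(theta)
            dirs.push_back({r * std::cos(phi), r * std::sin(phi), z});
        }
    }
    return dirs;
}
```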
<p>How much does light that bounced more than twice really contribute to a scene? Of course that depends on the materials in the scene quite a bit, but in this case, there&#8217;s a noticeable difference. I rendered the exact same scene using the Monte Carlo estimator for the first bounce, but for the subsequent bounces I used a path tracer. By using Russian Roulette (a topic unto itself) to terminate the path, you can get an unbiased estimate of the irradiance.</p>
<p><img class="aligncenter size-full wp-image-399" title="pathtracer4096paths80percentsurvival" src="http://www.rorydriscoll.com/wp-content/uploads/2010/01/pathtracer4096paths80percentsurvival.png" alt="pathtracer4096paths80percentsurvival" width="656" height="396" /></p>
<p>Ok, great. It&#8217;s brighter, but is it actually correct? While I was wondering about this, I came across an idea in <a href="http://www.amazon.com/Physically-Based-Rendering-Implementation-Interactive/dp/012553180X">Physically Based Rendering</a>: compare the results of my integrators against a case that has an analytical solution to the rendering equation.</p>
<h2>I now present an analytical solution to the Rendering Equation!</h2>
<h5>(&#8230; in a very simple case)</h5>
<p>Solving the rendering equation analytically for most scenes is just impossible; that&#8217;s why we have to rely on numerical methods like Monte Carlo estimation. However, the book suggests a <i>very</i> simple scene for which it can be solved. The suggested setup is that of light bouncing around the inside of a sphere. The sphere emits light internally, and reflects it diffusely to other points on the inside of the sphere. Since the sphere is rotationally invariant and the reflections are diffuse, every point on the sphere reflects and emits the same radiance in all directions.</p>
<p>So how do you solve the rendering equation for this situation? It&#8217;s fairly easy. Recall the rendering equation:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L(\vec{x},\omega)=L_e@plus;\int_{\Omega}\rho(\vec{x},\omega,{\omega}')L_i(\vec{x},{\omega}')\cos\theta\,\delta{\omega}'" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L(\vec{x},\omega)=L_e+\int_{\Omega}\rho(\vec{x},\omega,{\omega}')L_i(\vec{x},{\omega}')\cos\theta\,\delta{\omega}'" title="L(\vec{x},\omega)=L_e+\int_{\Omega}\rho(\vec{x},\omega,{\omega}')L_i(\vec{x},{\omega}')\cos\theta\,\delta{\omega}'" /></a></p>
<p>In this setup, we are using a diffuse BRDF with reflectance <i>d</i>. Dividing by &pi; normalizes it, so that it conserves energy as long as <i>d</i> &le; 1:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\rho(\vec{x},\omega,{\omega}')=\frac{d}{\pi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?\rho(\vec{x},\omega,{\omega}')=\frac{d}{\pi}" title="\rho(\vec{x},\omega,{\omega}')=\frac{d}{\pi}" /></a></p>
<p>Also, as mentioned previously, the radiance is the same at every point and in every direction, so the incoming radiance equals the outgoing radiance L and can be taken outside the integral. This reduces the rendering equation to:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{d}{\pi}\int_{\Omega}L\cos\theta\,\delta{\omega}'" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{\pi}\int_{\Omega}\cos\theta\,\delta{\omega}'" title="L=L_e+\frac{dL}{\pi}\int_{\Omega}\cos\theta\,\delta{\omega}'" /></a></p>
<p>Solving this equation for L is now pretty easy. First we have to convert from an integral over solid angle to a double integral over the spherical angles &theta; and &phi;. The important thing to remember when doing this is to introduce the new <i>sine</i> term, since &delta;&omega; = sin &theta; &delta;&theta; &delta;&phi;:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\cos\theta\sin\theta\,\delta{\theta}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\cos\theta\sin\theta\,\delta{\theta}\,\delta{\phi}" title="L=L_e+\frac{dL}{\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\cos\theta\sin\theta\,\delta{\theta}\,\delta{\phi}" /></a></p>
<p>There&#8217;s a double angle trigonometric identity that makes this easier:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\cos{x}\sin{x}=\frac{\sin{2x}}{2}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?\cos{x}\sin{x}=\frac{\sin{2x}}{2}" title="\cos{x}\sin{x}=\frac{\sin{2x}}{2}" /></a></p>
<p>So we just need to integrate the following:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\sin2\theta\,\delta{\theta}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\sin2\theta\,\delta{\theta}\,\delta{\phi}" title="L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\sin2\theta\,\delta{\theta}\,\delta{\phi}" /></a></p>
<p>Integrate over theta:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\left[\frac{-\cos2\theta}{2}\right]_{0}^{\frac{\pi}{2}}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\left[\frac{-\cos2\theta}{2}\right]_{0}^{\frac{\pi}{2}}\,\delta{\phi}" title="L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\left[\frac{-\cos2\theta}{2}\right]_{0}^{\frac{\pi}{2}}\,\delta{\phi}" /></a></p>
<p>The bracketed term evaluates to (&minus;cos &pi; + cos 0)/2 = 1, so now integrate over phi:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\,\delta{\phi}" title="L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\,\delta{\phi}" /></a></p>
<p>This integral is of course just 2&pi;. So we&#8217;re left with a very simple equation:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;dL" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+dL" title="L=L_e+dL" /></a></p>
<p>Solving for L gives us the final expected radiance at every point in the sphere:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=\frac{L_e}{1-d}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=\frac{L_e}{1-d}" title="L=\frac{L_e}{1-d}" /></a></p>
<p>This intuitively makes sense. As d grows, so does L. When d is 1, all hell breaks loose, and when d is 0 all we are left with is the emitted light. Obviously d can never be greater than 1 or the energy conservation rule would have been broken.</p>
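<p>Expanding the solution as a geometric series (valid for 0 &le; d &lt; 1) shows where the light comes from: the k-th term is light that has been reflected k times. Truncating after n bounces therefore leaves a relative error of exactly d to the power n+1. With d = 0.5, four bounces are the first to get below 5% (0.5<sup>5</sup> &asymp; 3.1%).</p>

```latex
L = \frac{L_e}{1-d} = L_e\sum_{k=0}^{\infty} d^{k} = L_e\left(1 + d + d^{2} + \cdots\right), \qquad 0 \le d < 1

L_n = L_e\sum_{k=0}^{n} d^{k} = L_e\,\frac{1-d^{n+1}}{1-d}, \qquad \frac{L-L_n}{L} = d^{n+1}
```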
<h2>The test application</h2>
<p>Alright, so all I need to do now is make a test application that fires a bunch of rays around the inside of a sphere and compare the results to the analytical solution. Or so I thought. Due to the curvature of the inside of the sphere, I found that a good number of rays I fired near the horizon were escaping the sphere.</p>
<p>Currently I apply an epsilon to the minimum ray intersection distance to prevent self-intersections, and this is what causes the problem: for a ray leaving close to the horizon, the real hit on the far side of the sphere can be closer than the epsilon, so it is rejected and the ray escapes. For now (and just for these tests), I&#8217;m instead pushing the ray starting point a small distance away from the intersection along the intersection normal. I also made the sphere really big. I&#8217;d welcome any better ideas for alleviating this problem.</p>
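<p>Here is a minimal C++ sketch of that workaround, assuming the normal used for the offset points into the sphere (towards its centre). Once the origin is strictly inside, the two roots of the ray&#8211;sphere quadratic have opposite signs, so the positive root is always the exit point and no minimum-distance epsilon is needed. The names are hypothetical, and this is not the renderer&#8217;s actual intersection code.</p>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 a) { return scale(a, 1.0 / std::sqrt(dot(a, a))); }

// Moves a hit point on the inside of the sphere a distance eps along the
// inward normal (towards the centre), so the next ray starts strictly inside.
Vec3 offsetInward(Vec3 hit, Vec3 center, double eps) {
    return add(hit, scale(normalize(sub(center, hit)), eps));
}

// Distance to the sphere along a unit direction, from an origin strictly inside.
// Solves t^2 + 2bt + k = 0 with b = dot(o - c, dir) and k = |o - c|^2 - r^2 < 0.
// The roots have opposite signs, so the positive one is the exit point.
double exitDistance(Vec3 origin, Vec3 dir, Vec3 center, double radius) {
    const Vec3 oc = sub(origin, center);
    const double b = dot(oc, dir);
    const double k = dot(oc, oc) - radius * radius;
    return -b + std::sqrt(b * b - k);
}
```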
<p>For the tests below, I set the emitted light value to 1, and the diffuse reflectance, d, to 0.5, meaning that the outgoing radiance, L, should be 2.</p>
<h2>Multi-bounce Monte Carlo results</h2>
<p>You can work out from the equation above how different numbers of bounces of light will affect the final solution. In this case, I just ran my Monte Carlo integrator with 0 to 16 bounces and produced the following graph showing the absolute error as a percentage of the expected result.</p>
<p><img class="aligncenter" src="http://spreadsheets.google.com/oimg?key=0AliVyEtgVru8dHJDZ2w3V0FBcmRSb0t4TElJbHhXNXc&amp;oid=5&amp;v=1264822032867" alt="" /></p>
<p>You can see that the error starts off really high, but drops off pretty quickly as more bounces are added. Each subsequent bounce has less and less effect on the error reduction, as expected.</p>
<p>What you have to remember, though, is that each bounce multiplies the number of rays that have to be cast, so the cost grows exponentially with the number of bounces. In real-world situations we need to cast a lot of rays over the hemisphere in order to get a correct solution. To get under 5% error you&#8217;d need four bounces with this method. Even a very modest number of rays per estimate quickly becomes unruly: 256 rays with 4 bounces means 256 to the fourth power, over 4 billion rays! Per pixel! Without anti-aliasing!</p>
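<p>The bias part of this can be reproduced without tracing any rays. This sketch (hypothetical names) applies L = Le + d * L once per bounce, treating 0 bounces as emitted light only, and measures the error against Le / (1 &minus; d). It ignores Monte Carlo noise, so it is the best case for an integrator with a fixed number of bounces. With Le = 1 and d = 0.5 it also needs four bounces to get under 5%, and 256 to the fourth power is 4,294,967,296 rays.</p>

```cpp
#include <cmath>
#include <cstdint>

// Radiance inside the sphere after a number of bounces, applying
// L <- Le + d * L once per bounce (0 bounces = emitted light only).
double radianceAfterBounces(double Le, double d, int bounces) {
    double L = Le;
    for (int i = 0; i < bounces; ++i) L = Le + d * L;
    return L;
}

// Error relative to the exact answer Le / (1 - d); assumes 0 <= d < 1.
double relativeError(double Le, double d, int bounces) {
    const double exact = Le / (1.0 - d);
    return std::fabs(exact - radianceAfterBounces(Le, d, bounces)) / exact;
}

// Fewest bounces with a relative error below 'target' (capped at 1000).
int bouncesForError(double Le, double d, double target) {
    int n = 0;
    while (n < 1000 && relativeError(Le, d, n) >= target) ++n;
    return n;
}

// Rays at the deepest level when every estimate fires raysPerEstimate rays.
std::uint64_t raysAtDeepestLevel(std::uint64_t raysPerEstimate, int bounces) {
    std::uint64_t rays = 1;
    for (int i = 0; i < bounces; ++i) rays *= raysPerEstimate;
    return rays;
}
```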
<h2>Path Tracer results</h2>
<p>Ok, so how about the path tracer? In theory the path tracer should average out to zero error, but it can potentially have very high variance. Here&#8217;s the average error over 1000 runs with different survival probabilities:</p>
<p><img src="http://spreadsheets.google.com/oimg?key=0AliVyEtgVru8dFIyWlMtWVdMTGl4RUJFZkd5blRhRXc&amp;oid=1&amp;v=1264822087193" alt="" /></p>
<p>Well, it&#8217;s not zero everywhere, but it&#8217;s pretty close in most cases. I think that the results obtained from this kind of method probably depend quite a bit on the quality of the random number generator used. I&#8217;m just using the stdlib version, so I&#8217;ll try switching it up at some point to see how it affects things. I haven&#8217;t shown it here, but the variance at low survival probabilities is incredibly high. This surely accounts for how far off the results are under about 10% survival.</p>
<p>The good thing about the path tracer is that it doesn&#8217;t require exorbitant numbers of rays. Yes, the variance can be high, but it can be reduced by tracing more paths. The net result is that it produces low error images much quicker than using standard Monte Carlo integration.</p>
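<p>The sphere is simple enough that a path tracer for it collapses to a few lines, which makes the Russian roulette weighting easy to see. In this sketch (hypothetical names, not the renderer&#8217;s code), every path vertex sees the emitted radiance Le, and with cosine-weighted sampling the diffuse BRDF contributes a throughput factor of d per bounce. A path continues with survival probability p, and a surviving path&#8217;s throughput is divided by p, so the expected value satisfies L = Le + p &middot; (d/p) &middot; L = Le + dL, the same equation as above.</p>

```cpp
#include <random>

// One path-traced sample of the radiance inside the emitting sphere (a sketch).
// Every vertex sees emitted radiance Le; with cosine-weighted sampling the
// diffuse BRDF contributes a factor of d per bounce. Russian roulette keeps the
// path going with probability p (0 < p < 1) and divides the surviving
// throughput by p, which leaves the expected value unchanged.
double sphereRadianceSample(double Le, double d, double p, std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double radiance = 0.0;
    double throughput = 1.0;
    for (;;) {
        radiance += throughput * Le;   // emission seen at this vertex
        if (uniform(rng) >= p) break;  // path terminated
        throughput *= d / p;           // survived: bounce and compensate
    }
    return radiance;
}

// Average of many independent paths; converges to Le / (1 - d).
double sphereRadianceEstimate(double Le, double d, double p, int paths, std::mt19937& rng) {
    double sum = 0.0;
    for (int i = 0; i < paths; ++i) sum += sphereRadianceSample(Le, d, p, rng);
    return sum / paths;
}
```

<p>In this simplified model, the estimator&#8217;s variance is only finite when p is greater than d squared (p &gt; 0.25 for d = 0.5), which fits the very high variance at low survival probabilities described above.</p>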
<h2>The End</h2>
<p>Well, the point of this post really was: How do you know your integrator is correct? Comparing it to something simple that can be calculated analytically is not the be-all and end-all, but it&#8217;s a good start. There are many other things that could still be wrong even if your integrator gets good results against this simple test, like how you&#8217;re calculating sample directions, your PDF etc.</p>
<p> For me, being able to compare my integrators to a known solution definitely helped me to make sure that my path tracer was getting correct results.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2010/01/30/whats-wrong-with-this-picture/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Direct3D 11 Multithreading</title>
		<link>http://www.rorydriscoll.com/2009/04/21/direct3d-11-multithreading/</link>
		<comments>http://www.rorydriscoll.com/2009/04/21/direct3d-11-multithreading/#comments</comments>
		<pubDate>Wed, 22 Apr 2009 04:11:02 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=310</guid>
		<description><![CDATA[I&#8217;ve been putting it off for a while, but with my recent trip to GDC and the arrival of the Direct3D 11 beta, I thought it was about time I switched my renderer to be multithreaded. One of the things I learned at a Direct3D 11 talk at GDC is that it works on &#8216;down-level [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been putting it off for a while, but with my recent trip to GDC and the arrival of the Direct3D 11 beta, I thought it was about time I switched my renderer to be multithreaded. One of the things I learned at a <a href="http://developer.amd.com/gpu_assets/Your%20Game%20Needs%20Direct3D%2011,%20So%20Get%20Started%20Now.pps">Direct3D 11 talk</a> at GDC is that it works on &#8216;down-level hardware&#8217;, which means DirectX 9 &amp; 10 cards. Of course, you don&#8217;t get the snazzy new hardware features, but you do get some of the benefits of the new API, like multithreading and limited compute shaders (albeit not as fast as it will be on the real hardware).<br />
<span id="more-310"></span><br />
There has been some multithreading support in earlier DirectX versions for a while now by using the multithreaded flag when creating the device. Typically though, the pattern has been to run a dedicated rendering thread and submit objects to be rendered to that thread. This allows the device to stay in single threaded mode where it is faster.</p>
<p>Things have changed a lot with Direct3D 11. The rendering functions have been moved out of the device into a separate object called the device context. The factory (resource creation) functions that remain on the device are all free threaded, meaning that they can be called from any thread. The device context functions are not thread-safe, so each device context is designed to be used from a single thread at a time.</p>
<p><img class="aligncenter size-full wp-image-331" title="directx-opengl5-w-157028-13" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/directx-opengl5-w-157028-13.png" alt="directx-opengl5-w-157028-13" width="450" height="338" /></p>
<p>The basic idea behind multithreading in Direct3D 11 is that you use the immediate device context, which you get when you create the device, on the main thread. Then, for each thread on which you&#8217;d like to be able to render, you create a deferred context (ID3D11Device::CreateDeferredContext). As you can probably guess from the names, commands executed on the immediate context get executed immediately, but those on a deferred context just get recorded, and FinishCommandList turns them into a command list. You then execute the command lists on the main thread by calling ExecuteCommandList on the immediate device context. Sounds easy enough.</p>
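<p>The record-then-replay pattern can be modelled without Direct3D. This portable C++ sketch is not the D3D11 API: its classes are hypothetical stand-ins, and only the methods named after ID3D11DeviceContext::FinishCommandList and ExecuteCommandList correspond to real calls. Each worker records into its own deferred context, and the main thread waits for all of them and then replays the command lists in a fixed order on the immediate context, so the final order does not depend on thread timing.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Portable model of the D3D11 pattern; these classes are stand-ins, not D3D11.
using Log = std::vector<std::string>;
using Command = std::function<void(Log&)>;
using CommandList = std::vector<Command>;

// Deferred context: calls are recorded, not executed.
class DeferredContext {
public:
    void Draw(const std::string& name) {
        commands_.push_back([name](Log& log) { log.push_back("draw " + name); });
    }
    // Plays the role of ID3D11DeviceContext::FinishCommandList.
    CommandList FinishCommandList() {
        CommandList list;
        list.swap(commands_);
        return list;
    }
private:
    CommandList commands_;
};

// Immediate context: commands take effect when executed.
class ImmediateContext {
public:
    // Plays the role of ID3D11DeviceContext::ExecuteCommandList.
    void ExecuteCommandList(const CommandList& list) {
        for (const Command& command : list) command(log_);
    }
    const Log& GetLog() const { return log_; }
private:
    Log log_;
};

// One worker thread and deferred context per batch; the main thread waits for
// all recording to finish, then replays the lists in batch order.
Log RenderFrame(const std::vector<std::vector<std::string>>& batches) {
    std::vector<DeferredContext> contexts(batches.size());
    std::vector<CommandList> lists(batches.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < batches.size(); ++i) {
        workers.emplace_back([&, i] {
            for (const std::string& name : batches[i]) contexts[i].Draw(name);
            lists[i] = contexts[i].FinishCommandList();
        });
    }
    for (std::thread& worker : workers) worker.join();
    ImmediateContext immediate;
    for (const CommandList& list : lists) immediate.ExecuteCommandList(list);
    return immediate.GetLog();
}
```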
<h2>Thread Pools</h2>
<p>Given that you can submit draw calls to deferred contexts on multiple threads, it makes sense to ditch the single rendering thread concept and switch to using something like a thread pool for issuing the draw calls. This scales far better than a dedicated rendering thread. It&#8217;s also pretty easy to set up a simple thread pool, and give each worker thread a deferred device context.</p>
<p>There are plenty of places on the internet to read about thread pools so I&#8217;m not going to get into it here, but one thing I can&#8217;t stress enough is to make sure that you get your synchronization right! In my initial implementation, I used my normal queue data structure, but wrapped it up in mutexes (mutices?) to make sure it was thread-safe. This worked out well since I was very confident that things were working correctly, but a quick foray into VTune told me that I was spending 40% of the time waiting on synchronization points!</p>
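<p>A mutex-wrapped queue like the one described above might look like this sketch. The names are hypothetical, and it uses standard C++ rather than the Win32 primitives in the renderer. Every push and pop takes the same lock, which makes it simple to get right, but that lock is also where contended threads end up waiting. A std::condition_variable stands in for the event that puts idle workers to sleep.</p>

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Thread-safe queue where every operation takes the same mutex (a sketch).
template <typename T>
class LockedQueue {
public:
    void Push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();  // wake one sleeping worker, like signalling an event
    }

    // Blocks until an item is available.
    T Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    // Non-blocking variant: returns false if the queue is empty.
    bool TryPop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};
```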
<p><img class="size-full wp-image-343 alignleft" title="queues" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/queues.jpg" alt="queues" width="119" height="181" /></p>
<p>After some quick digging around, I came across a few articles that Herb Sutter wrote for Dr Dobb&#8217;s Journal about producer/consumer queues. I implemented the low-lock queue recommended by Sutter, and got a good speedup of at least 30% (that number is off the top of my head, but I remember it was a lot). The relevant articles I read are <a href="http://www.ddj.com/architect/210604448">single producer/consumer queue</a>, <a href="http://www.ddj.com/architect/211601363">generalized concurrent queue</a>, and <a href="http://www.ddj.com/architect/212201163">measuring performance</a>. I still use events for sending the worker threads to sleep when there is nothing left to work on, and to wake them up when data is added to the queue.</p>
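<p>The core of the low-lock idea is that the producer and the consumer each own one index. The sketch below is a bounded single-producer/single-consumer ring buffer. It is a simpler relative of the linked-list queue in Sutter&#8217;s articles, not a transcription of it, and the names are hypothetical. Acquire/release ordering on the two indices is what publishes each slot, so no mutex is needed, but the guarantee only holds with exactly one producer thread and one consumer thread.</p>

```cpp
#include <atomic>
#include <cstddef>
#include <utility>

// Bounded single-producer/single-consumer ring buffer (a sketch). Only the
// producer writes head_ and only the consumer writes tail_; release stores
// publish slot contents and acquire loads observe them. T must be default
// constructible. Exactly one thread may push and one other thread may pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    bool TryPush(T item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = Advance(head);
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        slots_[head] = std::move(item);
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    bool TryPop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = std::move(slots_[tail]);
        tail_.store(Advance(tail), std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    static std::size_t Advance(std::size_t index) { return (index + 1) % (Capacity + 1); }

    T slots_[Capacity + 1];             // one slot stays empty to tell full from empty
    std::atomic<std::size_t> head_{0};  // next slot to write (producer)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer)
};
```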
<p>My application already stores up all of the state needed for a draw call in an object called a RenderContext, so instead of handing this render context to the renderer on the main thread, it now gets enqueued to be rendered by one of the threads in the thread pool. When the worker thread gets to it, it passes the render context to a thread-local renderer object initialized with a deferred device context. This renderer sets all of the <em>changed</em> state and issues the final draw call.</p>
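<p>Setting only the <em>changed</em> state can be sketched as a small cache per worker thread. The names are hypothetical, and the counters stand in for the real D3D11 state-setting and draw calls. One detail to watch: when a command list is finished without restoring state (FinishCommandList with RestoreDeferredContextState set to FALSE), the deferred context returns to its default state, so the cache has to be invalidated at that point.</p>

```cpp
#include <cstdint>

// Hypothetical per-draw state; real code would hold shaders, textures, etc.
struct DrawState {
    std::uint32_t shader;
    std::uint32_t material;
};

// One of these per worker thread (a sketch). It remembers what it last bound
// and only issues state calls for what changed; counters replace real API calls.
class StateFilteringRenderer {
public:
    void Draw(const DrawState& state) {
        if (!valid_ || state.shader != current_.shader) ++stateCalls_;      // e.g. set shader
        if (!valid_ || state.material != current_.material) ++stateCalls_;  // e.g. bind textures
        current_ = state;
        valid_ = true;
        ++drawCalls_;  // e.g. DrawIndexed
    }

    // Forget the cached state, e.g. after finishing a command list without
    // restoring the deferred context state.
    void Invalidate() { valid_ = false; }

    int StateCalls() const { return stateCalls_; }
    int DrawCalls() const { return drawCalls_; }

private:
    DrawState current_{};
    bool valid_ = false;
    int stateCalls_ = 0;
    int drawCalls_ = 0;
};
```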
<p>Finally, back on the main thread, it waits until all of the render contexts have been submitted to the deferred device contexts, and then executes the resulting command lists on the immediate device context.</p>
<h2>Test Scenario</h2>
<p>In order to stress my renderer a bit, I fabricated a scenario with 10,000 models. Each model has a sphere and a ground plane with their own material. I use a loose octree for culling out the models outside of the frustum, but I don&#8217;t do any sorting of any kind. This means that the alternating materials that get rendered for the sphere and then the ground put a fair amount of stress on the CPU side of the renderer.</p>
<p>My single threaded renderer took about 50 ms to render the initial view of the scene. By switching to using the thread pool, this went down to about 30 ms. A nice improvement, that&#8217;s for sure. Obviously, as fewer objects are visible, the gains of using the multithreaded renderer disappear.</p>
<h2>Profiling</h2>
<p>I was happy that the multithreading appeared to be doing its job, but I wasn&#8217;t quite satisfied because I couldn&#8217;t really tell <em>how</em> well it was doing. Time for some profiling!</p>
<p>There appear to be quite a few CPU profilers out there. First of all I downloaded an evaluation of Intel VTune. It&#8217;s pretty overwhelming, but it gave me a lot of pertinent information. The bugger is that you have to pay a hefty sum for it, so I tossed it out of the window. I also tried out <a href="http://msdn.microsoft.com/en-us/library/cc305187.aspx">Microsoft xperf</a>. This sampling profiler gave me a pretty good overview of what was expensive with the standard inclusive/exclusive view. It was a great help for quickly tracking down some areas of the code that I could very easily improve. I still use this.</p>
<p>The trouble with most of the sampling profilers is that they don&#8217;t know about frames. They just add up all of the samples over the given time period, which gives you an idea of what is happening on average. I wanted to get information about what was happening within the frame, so I implemented a really simple frame profiler.</p>
<h3>Frame Profiler</h3>
<p>An in-game profiler is a really handy tool to have. It lets you see in real-time exactly how your CPU time is being spent in one frame on each of your threads. It&#8217;s also pretty easy to set up.</p>
<p>First of all, I created a class called ThreadProfiler. As the name suggests, the ThreadProfiler class is responsible for recording events on a specific thread. This class has functions to notify it of the beginning and end of the frame, as well as when a profiling event begins and ends. All it really does is to record the name of the event, a color for display, and the timestamps when the event begins and ends. The events can be nested, so it maintains a stack of active events and records the depth of the stack for each event.</p>
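<p>Here is a minimal sketch of a ThreadProfiler along those lines. The member names and the use of std::chrono::steady_clock are assumptions, not the author&#8217;s implementation. Nesting is handled with a stack of indices of events that have begun but not yet ended, and each event records the stack depth at the time it began, which is what a frame view needs to draw nested bars.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Records nested, named, coloured events for one thread (a sketch).
class ThreadProfiler {
public:
    using Clock = std::chrono::steady_clock;

    struct Event {
        std::string name;
        std::uint32_t color;
        Clock::time_point begin;
        Clock::time_point end;
        int depth;  // nesting depth when the event began (0 = outermost)
    };

    void BeginFrame() {
        events_.clear();
        open_.clear();
        frameBegin_ = Clock::now();
    }
    void EndFrame() { frameEnd_ = Clock::now(); }

    void BeginEvent(const std::string& name, std::uint32_t color) {
        const int depth = static_cast<int>(open_.size());
        open_.push_back(events_.size());
        events_.push_back(Event{name, color, Clock::now(), Clock::time_point(), depth});
    }

    void EndEvent() {
        if (open_.empty()) return;  // unmatched EndEvent: ignore it
        events_[open_.back()].end = Clock::now();
        open_.pop_back();
    }

    const std::vector<Event>& Events() const { return events_; }
    Clock::duration FrameDuration() const { return frameEnd_ - frameBegin_; }

private:
    std::vector<Event> events_;
    std::vector<std::size_t> open_;  // indices of events that have begun but not ended
    Clock::time_point frameBegin_;
    Clock::time_point frameEnd_;
};
```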
<p>Next I created the singleton FrameProfiler class. This class holds all of the ThreadProfiler objects and forwards each event to the right one based on the current thread ID. Threads are required to register their thread ID with the frame profiler in order for events to be recorded.</p>
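<pre class="brush: cpp; title: ; notranslate">
// Holds one ThreadProfiler per registered thread and forwards each event
// to the ThreadProfiler belonging to the given thread.
class FrameProfiler : public Core::Singleton&lt;FrameProfiler&gt;
{
public:

    FrameProfiler();

    // Threads must register before any of their events are recorded.
    void RegisterThread(int threadId);

    // Frame boundaries; enabled selects whether this frame is recorded.
    void BeginFrame(bool enabled);
    void EndFrame();

    // Begin/end a (possibly nested) event on the given thread.
    void BeginEvent(int threadId, const Core::String&amp; name, uint32 color);
    void EndEvent(int threadId);

    DataStructures::ArrayList&lt;ThreadProfiler&gt;&amp; GetThreadProfilers();
    const DataStructures::ArrayList&lt;ThreadProfiler&gt;&amp; GetThreadProfilers() const;

private:

    DataStructures::ArrayList&lt;ThreadProfiler&gt; m_threadProfilers;
};
</pre>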
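<p>The forwarding step needs a way to get from a thread ID to that thread&#8217;s ThreadProfiler. A minimal sketch of that lookup follows. It assumes std::thread::id in place of the int IDs returned by Core::Platform::GetCurrentThreadId, and it assumes every thread registers at startup before any frame is profiled, so lookups during a frame are read-only and need no lock. ThreadRegistrySketch is a hypothetical name; the slot it returns would index m_threadProfilers.</p>
<pre class="brush: cpp; title: ; notranslate">
#include &lt;cstddef&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

// Hypothetical thread ID to slot lookup, as FrameProfiler might do before forwarding.
class ThreadRegistrySketch
{
public:
    // Returns the slot assigned to this thread; registering twice is harmless.
    int Register(std::thread::id id)
    {
        int existing = Find(id);
        if (existing &gt;= 0)
            return existing;
        m_ids.push_back(id);
        return static_cast&lt;int&gt;(m_ids.size()) - 1;
    }

    // Returns the slot for a registered thread, or -1 so the caller can drop the event.
    int Find(std::thread::id id) const
    {
        for (std::size_t i = 0; i &lt; m_ids.size(); ++i)
        {
            if (m_ids[i] == id)
                return static_cast&lt;int&gt;(i);
        }
        return -1;
    }

private:
    std::vector&lt;std::thread::id&gt; m_ids;
};
</pre>
<p>A linear search is fine with only a handful of threads. An unregistered thread gets -1 and its events are simply dropped, which matches the rule that threads must register to be recorded.</p>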
<p>The final piece is a really simple macro that grabs the function name and declares a scoped object, which notifies the FrameProfiler when it is constructed and again when it is destroyed at the end of the scope. This is the macro that I place into whatever function or loop I&#8217;d like to profile.</p>
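<pre class="brush: cpp; title: ; notranslate">
// Notifies the FrameProfiler when a scope is entered and left.
// Does nothing if no FrameProfiler has been created.
class ScopedProfileEvent
{
public:

    ScopedProfileEvent(const Core::String&amp; name, uint32 color)
    {
        if (FrameProfiler::IsCreated())
        {
            FrameProfiler::Instance().BeginEvent(Core::Platform::GetCurrentThreadId(), name, color);
        }
    }

    ~ScopedProfileEvent()
    {
        if (FrameProfiler::IsCreated())
        {
            FrameProfiler::Instance().EndEvent(Core::Platform::GetCurrentThreadId());
        }
    }
};

// event__LINE__ is a single token, so __LINE__ is never expanded inside it, and the
// operands of ## are not expanded either. Pasting through a helper macro gives each
// PROFILE a unique variable name (profileEvent_42, ...) so two can share a scope.
#define PROFILE_CONCAT_IMPL(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT_IMPL(a, b)
#define PROFILE(X) const Profile::ScopedProfileEvent PROFILE_CONCAT(profileEvent_, __LINE__)(Core::String(__FUNCTION__), (uint32)(X))
</pre>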
<p>Unlike a sampling profiler, this kind of instrumentation adds a certain amount of processing overhead. There are a couple of quick things you can do to help with this, though. The first is to make sure you don&#8217;t always pay for it, by compiling it out of your final builds. It&#8217;s important to do your profiling on an optimized build, so I would recommend having debug, release, and final configurations (or something similar), with the profiler compiled into release but not into final. The second thing you can do is to not run it every frame. I have it set on a key press so that I can get to the area I&#8217;d like to profile without the overhead, then hit the button to profile the next frame and display the results.</p>
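<p>Both ideas are cheap to implement. The sketch below shows one possible way, not the original code. CaptureTrigger is a hypothetical class that turns a key press into exactly one enabled frame, and its result could be passed as the enabled argument of FrameProfiler::BeginFrame. ENABLE_FRAME_PROFILER is an assumed build define that would be set in debug and release configurations only; without it, PROFILE expands to a no-op.</p>
<pre class="brush: cpp; title: ; notranslate">
// Hypothetical: turns one key press into exactly one profiled frame.
class CaptureTrigger
{
public:
    // Call from input handling when the profile key is pressed.
    void Request() { m_requested = true; }

    // Call once per frame, e.g. FrameProfiler::Instance().BeginFrame(trigger.ConsumeForThisFrame());
    // Returns true only for the first frame after a request.
    bool ConsumeForThisFrame()
    {
        bool enabled = m_requested;
        m_requested = false;
        return enabled;
    }

private:
    bool m_requested = false;
};

// Hypothetical build define: in final builds (ENABLE_FRAME_PROFILER not defined),
// PROFILE expands to nothing and its argument is never evaluated.
#if !defined(ENABLE_FRAME_PROFILER)
#define PROFILE(X) ((void)0)
#endif
</pre>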
<p>I&#8217;m not sure how accurate this would be, but you could probably compare the previous frame&#8217;s duration with the profiled frame&#8217;s duration to get a rough estimate of the overhead that the profiling functions added. I wouldn&#8217;t rely on that, though.</p>
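<p>The arithmetic is trivial, but here it is as a sketch. EstimateProfilingOverheadMs is a hypothetical helper. Using the average of several unprofiled frames as the baseline, rather than a single previous frame, should reduce (but not remove) the noise from normal frame-to-frame variation.</p>
<pre class="brush: cpp; title: ; notranslate">
// Hypothetical helper: rough profiling overhead, in milliseconds, estimated as the
// difference between a profiled frame and a baseline from unprofiled frames.
double EstimateProfilingOverheadMs(double baselineFrameMs, double profiledFrameMs)
{
    double overhead = profiledFrameMs - baselineFrameMs;
    // Frame time noise can make the profiled frame come out faster; clamp to zero.
    return overhead &gt; 0.0 ? overhead : 0.0;
}
</pre>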
<p>There&#8217;s actually quite a bit of information that can be gleaned from these profiling events, but the first thing I did was to render out the events as rectangles on a timeline. In the image below, I have two threads running. The main thread at the bottom has three levels of nested events being shown, and the top worker thread just has one.</p>
<p><img class="aligncenter size-large wp-image-313" title="oneworkerthread" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/oneworkerthread-1024x597.png" alt="oneworkerthread" width="700" height="394" /></p>
<p>Ok, there&#8217;s no legend right now, but I&#8217;m working on it. Each black/grey bar in the background represents one millisecond of frame time.</p>
<p>The bottom row on the main thread represents the update in green, the render in blue, and the call to Device::Present in red. Given the long red bar, I&#8217;d say I&#8217;m GPU limited in this scene.</p>
<p>The row above represents the breakdown of the render function from the bottom row. The cyan sliver is shadow rendering (actually I&#8217;m not rendering any shadows which is why it&#8217;s tiny). The huge magenta bar is the model rendering, and the yellow bar is post-processing.</p>
<p>The top row in the bottom thread represents the breakdown of the model rendering function. The green slivers are models being found in the octree and the red blocks are models being prepared for rendering. The large white bar is actually the command list from the worker thread being executed on the immediate device context. I was pretty surprised to see this segment so large, since I didn&#8217;t notice it in the other profilers at all.</p>
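<p>Laying out the timeline mostly comes down to mapping each event&#8217;s begin and end times to horizontal positions, and its thread and nesting depth to a row. A minimal sketch, assuming times in milliseconds relative to the start of the frame, a fixed number of rows per thread, and thread 0 (the main thread) drawn at the bottom as in the screenshot. TimelineLayout and Rect are hypothetical types, not the renderer&#8217;s actual drawing code.</p>
<pre class="brush: cpp; title: ; notranslate">
#include &lt;cassert&gt;

struct Rect
{
    float x, y, width, height; // top-left corner and size, in pixels
};

// Hypothetical layout parameters for the event timeline.
struct TimelineLayout
{
    float pixelsPerMs;   // width of one millisecond bar in the background grid
    float rowHeight;     // height of one nesting level
    int rowsPerThread;   // nesting levels reserved for each thread
    float bottomY;       // screen y of the bottom edge of the timeline (y grows downwards)

    Rect Layout(int threadIndex, int depth, double beginMs, double endMs) const
    {
        assert(endMs &gt;= beginMs);
        // Thread 0 occupies the bottom rows; deeper events sit one row higher.
        int row = threadIndex * rowsPerThread + depth;
        Rect r;
        r.x = static_cast&lt;float&gt;(beginMs) * pixelsPerMs;
        r.width = static_cast&lt;float&gt;(endMs - beginMs) * pixelsPerMs;
        r.height = rowHeight;
        r.y = bottomY - static_cast&lt;float&gt;(row + 1) * rowHeight;
        return r;
    }
};
</pre>
<p>The one-millisecond background bars then fall every pixelsPerMs pixels, so a frame that takes twice as long simply needs a smaller scale to fit on screen.</p>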
<h3>Experiments</h3>
<p>Now that I have a frame profiler, I can really experiment with my thread pool setup to see how it affects the frame. My computer has a dual-core processor, so based on Sutter&#8217;s articles, I was expecting one main thread and one worker to be the best setup. Even so, I tried running with various numbers of worker threads to see how it looked. Here&#8217;s what four worker threads look like:</p>
<p><img class="aligncenter size-large wp-image-312" title="fourworkerthreads" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/fourworkerthreads-1024x597.png" alt="fourworkerthreads" width="700" height="394" /></p>
<p>The first thing I noticed was just how much worse all of the threads fared. Each worker thread appeared to perform a tiny bit of work and then get swapped out for another thread. The main thread really suffered due to this too. This is a great example of how illuminating it is to visualize this data. The scene was already GPU-bound, so even though the rendering code was performing far worse, the frame rate actually stayed the same.</p>
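<p>One simple way to apply the one-worker-per-spare-core expectation when sizing the pool is sketched below. ChooseWorkerThreadCount is a hypothetical helper, and keeping at least one worker is an assumption about the pool. Note that std::thread::hardware_concurrency() counts logical processors (so hyperthreads are included) and may return 0 when the count is unknown.</p>
<pre class="brush: cpp; title: ; notranslate">
#include &lt;thread&gt;

// Hypothetical helper: leave one hardware thread for the main thread.
unsigned ChooseWorkerThreadCount(unsigned hardwareThreads)
{
    if (hardwareThreads &lt;= 1)
        return 1; // unknown (0) or single core: assumed minimum of one worker
    return hardwareThreads - 1;
}

unsigned ChooseWorkerThreadCountForThisMachine()
{
    return ChooseWorkerThreadCount(std::thread::hardware_concurrency());
}
</pre>
<p>On a dual-core machine this gives one worker, which is the setup the four-thread experiment above suggests is the better choice.</p>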
<p>Another experiment I wanted to run was to see how much other applications could affect the frame rate of my application. In this case, I just had Sysinternals Process Explorer running and polling the system processes every half second. It only took me a few tries to hit a frame where I could see the effect:</p>
<p><img class="aligncenter size-large wp-image-314" title="stolen" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/stolen-1024x597.png" alt="stolen" width="700" height="394" /></p>
<p>Notice the scale of the millisecond bars now &#8211; this frame took over twice as long to run as my first example with the exact same setup. You can see a big gap on the worker thread where another process stole its time. Even when it did get some time, it appears to have run very slowly.</p>
<p>Also, you can now see a large grey bar in the middle row of the main thread which shows the main thread waiting for the worker thread to finish.</p>
<p>The execution of the command list pretty consistently takes three and a half milliseconds or so. This is much higher than I had expected. I really hope that this time gets reduced with newer drivers or hardware.</p>
<p>One last thing I&#8217;ve done to investigate what is happening in my application is to display the frame rate history. I use a moving average to calculate the frame time, so I have the last 100 frames stored anyway. It&#8217;s a simple enough task to just display this.</p>
<p><img class="aligncenter size-full wp-image-364" title="frametime" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/frametime.png" alt="frametime" width="656" height="396" /></p>
<p>You can see how much the frame times vary even though the camera isn&#8217;t moving. I&#8217;d imagine this is due to other processes on my computer interfering.</p>
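<p>The frame time history only needs a small ring buffer. A minimal sketch, assuming frame times in milliseconds and a fixed history length (100 in the renderer described above); FrameTimeHistory is a hypothetical class. The average is recomputed from the stored samples rather than kept as a running sum, which costs little at this size and avoids accumulating floating-point drift over a long session.</p>
<pre class="brush: cpp; title: ; notranslate">
#include &lt;array&gt;
#include &lt;cassert&gt;
#include &lt;cstddef&gt;

// Hypothetical ring buffer of the last N frame times.
template &lt;std::size_t N&gt;
class FrameTimeHistory
{
public:
    void Add(double frameMs)
    {
        m_samples[m_next] = frameMs; // overwrites the oldest slot once full
        m_next = (m_next + 1) % N;
        if (m_count &lt; N)
            ++m_count;
    }

    // Moving average over the stored samples.
    double Average() const
    {
        if (m_count == 0)
            return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i &lt; m_count; ++i)
            sum += m_samples[i]; // the first m_count slots are always the valid ones
        return sum / static_cast&lt;double&gt;(m_count);
    }

    std::size_t Count() const { return m_count; }

    // i = 0 is the oldest stored sample, i = Count() - 1 the newest (drawing order).
    double At(std::size_t i) const
    {
        assert(i &lt; m_count);
        std::size_t oldest = (m_count == N) ? m_next : 0;
        return m_samples[(oldest + i) % N];
    }

private:
    std::array&lt;double, N&gt; m_samples{};
    std::size_t m_next = 0;  // slot the next sample will be written to
    std::size_t m_count = 0; // number of valid samples, at most N
};
</pre>
<p>Drawing the graph is then a loop from At(0) to At(Count() - 1), one bar or line segment per frame.</p>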
<h2>Final Thoughts</h2>
<p>It was a fun adventure porting my code to Direct3D 11, particularly implementing a multithreaded renderer using a thread pool. I would recommend trying it out if you have a Direct3D 10 engine at the moment.</p>
<p>The jump from Direct3D 10 to 11 is nowhere near as bad as the previous jump from 9 to 10. It took me about three hours to change my rendering code to deal with the changes. The most awkward part was probably having to pass in the device context to functions which need to map buffers, since these functions are no longer on the buffers themselves.</p>
<p>Visualizing profiling data in real time can be a real eye-opener for understanding how your code is actually running rather than how you think it may be running. It has really helped me identify good candidates for moving to using the thread pool as well as pointing out areas of the code that are taking a surprisingly large amount of frame time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2009/04/21/direct3d-11-multithreading/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

