<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CodeItNow &#187; Graphics</title>
	<atom:link href="http://www.rorydriscoll.com/category/graphics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rorydriscoll.com</link>
	<description></description>
	<lastBuildDate>Mon, 23 Jan 2012 01:50:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Derivative Maps</title>
		<link>http://www.rorydriscoll.com/2012/01/11/derivative-maps/</link>
		<comments>http://www.rorydriscoll.com/2012/01/11/derivative-maps/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 16:17:26 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=609</guid>
		<description><![CDATA[I recently came across an interesting paper, Bump Mapping Unparametrized Surfaces on the GPU by Morten Mikkelsen of Naughty Dog. This paper describes an alternative method to normal mapping, closely related to bump mapping. The alluring prospect of this technique is that it doesn’t require that a tangent space be defined. Mikkelsen is apparently well-versed [...]]]></description>
			<content:encoded><![CDATA[<p>I recently came across an interesting paper, <a href="http://jbit.net/~sparky/sfgrad_bump/mm_sfgrad_bump.pdf">Bump Mapping Unparametrized Surfaces on the GPU</a> by <a href="http://mmikkelsen3d.blogspot.com/">Morten Mikkelsen</a> of Naughty Dog. This paper describes an alternative method to normal mapping, closely related to bump mapping. The alluring prospect of this technique is that it doesn’t require that a tangent space be defined.</p>
<p>Mikkelsen is apparently well-versed in academic obfuscation (tsk!), so the paper itself can be a little hard to read. If you&#8217;re interested in reading it, then I would recommend first reading Jim Blinn’s <a href="http://research.microsoft.com/pubs/73939/p286-blinn.pdf">original bump mapping paper</a> to understand some of the derivations.</p>
<h2>But Wait! What’s Wrong with Normal Maps?</h2>
<p>Nothing really. But if something comes along that can improve quality, performance or memory consumption then it&#8217;s worth taking a look.</p>
<h2>A Quick Detour into Gradients</h2>
<p>Given a scalar height field (i.e. a two-dimensional array of scalar values), the gradient of that field is a 2D vector field where each vector points in the direction of steepest increase. The length of each vector corresponds to the rate of change in that direction.</p>
<p>The contour map below represents the scalar field generated from the function <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-1915089e5a5c29010638af5758da1d42_l3.png" class="ql-img-inline-formula" alt="&#102;&#40;&#120;&#44;&#121;&#41;&#32;&#61;&#32;&#49;&#32;&#45;&#32;&#40;&#120;&#94;&#50;&#32;&#43;&#32;&#121;&#94;&#50;&#41;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/>. The vector field shows the gradient of that scalar field. Note how each vector points towards the center, and how the vectors in the center are smaller due to the lower rate of change.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ScalarField.png"><img class="alignright width=300 height=300 wp-image-706" title="ScalarField" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ScalarField.png" alt="" /></a><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/GradientOfScalarField.png"><img class="alignright width=300 height=300 wp-image-705" title="GradientOfScalarField" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/GradientOfScalarField.png" alt="" /></a></p>
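<p>To make the plot concrete, here is a minimal sketch of that function and its analytic gradient. The names ExampleHeight and ExampleGradient are made up purely for illustration.</p>
<pre class="brush: cpp; title: ; notranslate">
// Height function from the contour plot: f(x, y) = 1 - (x^2 + y^2)
float ExampleHeight(float2 p)
{
	return 1.0 - dot(p, p);
}

// Analytic gradient of ExampleHeight: (df/dx, df/dy) = (-2x, -2y)
float2 ExampleGradient(float2 p)
{
	return -2.0 * p;
}
</pre>
<p>At (0.5, 0) the gradient is (-1, 0), which points back towards the origin. At (0.1, 0) it is (-0.2, 0): the same direction, but shorter, matching the smaller arrows near the center of the plot.</p>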
<h2>Derivative Maps</h2>
<p>The main premise of the paper is that we can project the gradient of the height field onto an underlying surface and use it to skew the surface normal to approximate the normal of the height-map surface. We can do all of this without requiring tangent vectors.</p>
<p>As with the original bump-mapping technique, it’s not exact, because some terms are dropped on account of their relatively small influence, but it’s close.</p>
<p>There are really only two important formulae to consider from the paper. The first shows how to perturb the surface normal using the <em>surface gradient</em>. Don&#8217;t confuse the surface gradient with the gradient of the height field mentioned above! As you&#8217;ll see shortly, they&#8217;re different.</p>
<p class="ql-center-displayed-equation" style="line-height: 20px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-da2c8c0e98f692b80aabe646b5371512_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#123;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#125;&#39;&#61;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#45;&#92;&#110;&#97;&#98;&#108;&#97;&#95;&#115;&#92;&#98;&#101;&#116;&#97; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>Here, <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-4819a3c333fc44a343764b1342001d8f_l3.png" class="ql-img-inline-formula" alt="&#123;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#125;&#39;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> represents the perturbed normal, <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-c7e9798e3ddea5e522bf0c1f7f53e13c_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#97;&#114;&#123;&#110;&#125;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> is the underlying surface normal, and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-3ddb240a7d1306cc2890a8e18e325135_l3.png" class="ql-img-inline-formula" alt="&#92;&#110;&#97;&#98;&#108;&#97;&#95;&#115;&#92;&#98;&#101;&#116;&#97;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> is the surface gradient. So basically, this says that the perturbed normal is the surface normal offset in the negative surface gradient direction.</p>
<p>So how do we calculate the surface gradient from the height field gradient? Well, there&#8217;s some fun math in there which I don&#8217;t want to repeat, but if you&#8217;re interested, I would recommend reading Blinn&#8217;s paper first, then Mikkelsen&#8217;s paper. You eventually arrive at:</p>
<p class="ql-center-displayed-equation" style="line-height: 43px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-c91a24911f75d5b28639848dc2f26253_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#110;&#97;&#98;&#108;&#97;&#95;&#115;&#92;&#98;&#101;&#116;&#97;&#61;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#40;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#116;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#32;&#41;&#92;&#98;&#101;&#116;&#97;&#95;&#115;&#32;&#43;&#32;&#40;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#115;&#41;&#92;&#98;&#101;&#116;&#97;&#95;&#116;&#125;&#123;&#92;&#98;&#97;&#114;&#123;&#110;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#40;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#115;&#32;&#92;&#116;&#105;&#109;&#101;&#115;&#32;&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#116;&#41;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>In addition to the symbols defined previously, <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-7edd0cc0c18eef3f61257c1cc800b3c9_l3.png" class="ql-img-inline-formula" alt="&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: -3px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-3b8516ca4e18b5434a5638717a0c3f45_l3.png" class="ql-img-inline-formula" alt="&#92;&#115;&#105;&#103;&#109;&#97;&#95;&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: -3px;"/> are the partial derivatives of the surface position, and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-11be5102f7ef033283dc776a65d7767d_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#101;&#116;&#97;&#95;&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-87c3e16c80862454881cd3a4832b53fb_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#101;&#116;&#97;&#95;&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> are the partial derivatives of the height field. The derivative directions <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula" alt="&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b4e3cbf5d4c5c6d9b702dd139f14c147_l3.png" class="ql-img-inline-formula" alt="&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> are not explicitly defined here.</p>
<p>It&#8217;s easiest to think of this as the projection of the 2D gradient onto a 3D surface along the normal. Intuitively, this says that the surface gradient direction is pushed out on orthogonal vectors to the s/n and t/n planes by however much the gradient specifies. The denominator term is there to scale up the result when the <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula" alt="&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b4e3cbf5d4c5c6d9b702dd139f14c147_l3.png" class="ql-img-inline-formula" alt="&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> are not orthogonal, or are flipped.</p>
<h2>Implementation</h2>
<p>Implementing this technique is fairly straightforward once you realise the meaning of some of the variables. Since we&#8217;re free to choose the partial derivative directions <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-ae1901659f469e6be883797bfd30f4f8_l3.png" class="ql-img-inline-formula" alt="&#115;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> and <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b4e3cbf5d4c5c6d9b702dd139f14c147_l3.png" class="ql-img-inline-formula" alt="&#116;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/>, it&#8217;s convenient for the shader to use screen-space x and y. The value <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-1c9cc40f96a1492e298e7da85a2c1692_l3.png" class="ql-img-inline-formula" alt="&#92;&#115;&#105;&#103;&#109;&#97;" title="Rendered by QuickLaTeX.com" style="vertical-align: 0px;"/> is the position, and the value <img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-b6a7605b1bcca8f1b416eaf733f34e08_l3.png" class="ql-img-inline-formula" alt="&#92;&#98;&#101;&#116;&#97;" title="Rendered by QuickLaTeX.com" style="vertical-align: -4px;"/> is the height field sample.</p>
<pre class="brush: cpp; title: ; notranslate">
// Project the height field gradient (dhdx, dhdy) onto the surface (n, dpdx, dpdy)
float3 CalculateSurfaceGradient(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
{
	float3 r1 = cross(dpdy, n);
	float3 r2 = cross(n, dpdx);

	return (r1 * dhdx + r2 * dhdy) / dot(dpdx, r1);
}

// Move the normal away from the surface normal in the opposite surface gradient direction
float3 PerturbNormal(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
{
	return normalize(n - CalculateSurfaceGradient(n, dpdx, dpdy, dhdx, dhdy));
}
</pre>
<p>So far, so good. Next we need to work out how to calculate the partial derivatives. The reason why we chose screen-space x and y to be our partial derivative directions is so that we can use the ddx and ddy shader instructions to generate the partial derivatives of both the position and the height.</p>
<p>Given a position and normal in the same coordinate-space, and a height map sample, calculating the final normal is straightforward:</p>
<pre class="brush: cpp; title: ; notranslate">
// Calculate the surface normal using screen-space partial derivatives of the height field
float3 CalculateSurfaceNormal(float3 position, float3 normal, float height)
{
	float3 dpdx = ddx(position);
	float3 dpdy = ddy(position);

	float dhdx = ddx(height);
	float dhdy = ddy(height);

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
</pre>
<p>Note that in shader model 5.0, you can use ddx_fine/ddy_fine instead of ddx/ddy to get high-precision partial derivatives.</p>
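<p>As a rough sketch, the fine-derivative variant would look something like this. The function name CalculateSurfaceNormalFine is hypothetical, and it relies on the PerturbNormal function defined earlier.</p>
<pre class="brush: cpp; title: ; notranslate">
// Same as CalculateSurfaceNormal, but using the shader model 5.0 fine derivatives
float3 CalculateSurfaceNormalFine(float3 position, float3 normal, float height)
{
	float3 dpdx = ddx_fine(position);
	float3 dpdy = ddy_fine(position);

	float dhdx = ddx_fine(height);
	float dhdy = ddy_fine(height);

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
</pre>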
<p>So how does this look? At a medium distance, I would say that it looks pretty good:</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapFar.png"><img class="aligncenter size-full wp-image-700" title="HeightMapFar" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapFar.png" alt="" /></a></p>
<p>But what about up close?</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapNear.png"><img class="aligncenter size-full wp-image-701" title="HeightMapNear" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapNear.png" alt="" /></a></p>
<p>Uh oh! What’s happening here? Well, there are a couple of problems&#8230;</p>
<p>The main problem is that the height texture is using bilinear filtering, so the gradient between any two texels is constant. This causes large blocks to become very obvious when up close. There are a couple of options for alleviating this somewhat.</p>
<p>One option is to use bicubic filtering. I haven&#8217;t tried it, but I would expect this to make a good difference. The problem is that it will incur an extra cost. Another option, suggested in the paper, is to add a detail bump texture on top. This helps quite a lot, but again it adds more cost.</p>
<p>In the image below I&#8217;ve just tiled the same texture at 10x frequency over the top. It would be better to apply some kind of noise function as in the original paper. </p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapWithDetailNear.png"><img class="aligncenter size-full wp-image-702" title="HeightMapWithDetailNear" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/HeightMapWithDetailNear.png" alt="" /></a></p>
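<p>That kind of layering might be sketched as below. The function name, the resource parameters and the detailScale factor are all assumptions made for illustration; a detailFrequency of 10 corresponds to the tiling used in the image above.</p>
<pre class="brush: cpp; title: ; notranslate">
// Sum the base height with the same texture tiled at a higher frequency
// (heightMap, heightSampler and detailScale are illustrative assumptions)
float SampleHeightWithDetail(Texture2D heightMap, SamplerState heightSampler, float2 uv, float detailFrequency, float detailScale)
{
	float base = heightMap.Sample(heightSampler, uv).r;
	float detail = heightMap.Sample(heightSampler, uv * detailFrequency).r;

	return base + detail * detailScale;
}
</pre>
<p>The combined height can then be passed to CalculateSurfaceNormal in place of the single height map sample.</p>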
<p>The second problem is more subtle. We&#8217;re getting some small block artifacts because of the way that the ddx and ddy shader instructions work. They take pairs of pixels in a pixel quad and subtract the relevant values to get the derivative. In the case of the height derivatives, we can alleviate this by performing the differencing ourselves with extra texture samples.</p>
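<p>A minimal sketch of that manual differencing follows, assuming the height lives in the red channel of a texture passed in as heightMap. The function and parameter names are hypothetical. It uses forward differences along the screen-space x and y directions, at the cost of two extra texture samples.</p>
<pre class="brush: cpp; title: ; notranslate">
// Forward-difference the height along the screen-space x and y directions
// (heightMap and heightSampler are illustrative assumptions)
float3 CalculateSurfaceNormalManual(Texture2D heightMap, SamplerState heightSampler, float3 position, float3 normal, float2 uv)
{
	float3 dpdx = ddx(position);
	float3 dpdy = ddy(position);

	// How far the uvs move when stepping one pixel in x and in y
	float2 duvdx = ddx(uv);
	float2 duvdy = ddy(uv);

	float height = heightMap.Sample(heightSampler, uv).r;
	float dhdx = heightMap.Sample(heightSampler, uv + duvdx).r - height;
	float dhdy = heightMap.Sample(heightSampler, uv + duvdy).r - height;

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
</pre>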
<p>The first problem is pretty much a killer for me. I would rather not have to cover up a fundamental implementation issue with extra fudges and more cost.</p>
<h2>What Now?</h2>
<p>It&#8217;s unfortunate that this didn&#8217;t make it into the original paper, but Mikkelsen mentions in a <a href="http://mmikkelsen3d.blogspot.com/2011/07/derivative-maps.html">blog post</a> that you can increase the quality by using precomputed height derivatives. This method requires double the texture storage (or half the resolution) of the ddx/ddy method, but produces much better results.</p>
<p>You&#8217;re probably wondering how you can possibly precompute screen-space derivatives. We don&#8217;t actually have to. Instead we can use the chain rule to transform a partial derivative from one space to another. In our case we can transform our derivatives from uv-space to screen-space if we have the partial derivatives of the uvs in screen-space.</p>
<p>To calculate dhdx you need dhdu, dhdv, dudx and dvdx:</p>
<p class="ql-center-displayed-equation" style="line-height: 38px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-976eff4caf3ee8bc616750362015dbbb_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#120;&#125;&#32;&#61;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#120;&#125;&#32;&#43;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#120;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>To calculate dhdy you need dhdu, dhdv, dudy and dvdy:</p>
<p class="ql-center-displayed-equation" style="line-height: 42px;"><span class="ql-right-eqno"> &nbsp; </span><span class="ql-left-eqno"> &nbsp; </span><img src="http://www.rorydriscoll.com/wp-content/ql-cache/quicklatex.com-0bb139f70074e81bd9b3deccbfdf55b3_l3.png"class="ql-img-displayed-equation" alt="&#92;&#98;&#101;&#103;&#105;&#110;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125; &#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#121;&#125;&#32;&#61;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#117;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#121;&#125;&#32;&#43;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#104;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#92;&#100;&#102;&#114;&#97;&#99;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#118;&#125;&#123;&#92;&#100;&#101;&#108;&#116;&#97;&#32;&#121;&#125; &#92;&#101;&#110;&#100;&#123;&#97;&#108;&#105;&#103;&#110;&#42;&#125;" title="Rendered by QuickLaTeX.com"/></p>
<p>The hlsl for this is very simple:</p>
<pre class="brush: cpp; title: ; notranslate">
float ApplyChainRule(float dhdu, float dhdv, float dud_, float dvd_)
{
	return dhdu * dud_ + dhdv * dvd_;
}
</pre>
<p>Assuming that we have a texture that stores the <em>texel-space</em> height derivatives, we can scale this up in the shader to uv-space by simply multiplying by the texture dimensions. We can then use the screen space uv derivatives and the chain rule to transform from dhdu/dhdv to dhdx/dhdy.</p>
<pre class="brush: cpp; title: ; notranslate">
// Calculate the surface normal using the uv-space gradient (dhdu, dhdv)
float3 CalculateSurfaceNormal(float3 position, float3 normal, float2 gradient, float2 uv)
{
	float3 dpdx = ddx(position);
	float3 dpdy = ddy(position);

	float dhdx = ApplyChainRule(gradient.x, gradient.y, ddx(uv.x), ddx(uv.y));
	float dhdy = ApplyChainRule(gradient.x, gradient.y, ddy(uv.x), ddy(uv.y));

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
</pre>
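<p>For reference, fetching the gradient and scaling it from texel-space to uv-space might look like the sketch below. The function and parameter names are hypothetical, and it assumes the texture stores signed derivatives directly in its first two channels, with no bias or scale to undo.</p>
<pre class="brush: cpp; title: ; notranslate">
// Fetch the texel-space derivatives and rescale them to uv-space
// (derivativeMap and derivativeSampler are illustrative assumptions)
float2 SampleGradient(Texture2D derivativeMap, SamplerState derivativeSampler, float2 uv)
{
	uint width, height;
	derivativeMap.GetDimensions(width, height);

	return derivativeMap.Sample(derivativeSampler, uv).xy * float2(width, height);
}
</pre>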
<p>So how does this look? Well, it&#8217;s pretty much the same at medium distance.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapFar.png"><img class="aligncenter size-full wp-image-698" title="DerivativeMapFar" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapFar.png" alt="" /></a></p>
<p>But it&#8217;s way better up close, since we&#8217;re now interpolating the derivatives.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapNear.png"><img class="aligncenter size-full wp-image-699" title="DerivativeMapNear" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/DerivativeMapNear.png" alt="" /></a></p>
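<p>The texel-space derivative texture itself has to come from somewhere. One possible way to build it is a small compute shader that takes central differences of the height map, as sketched here. The resource names, the thread group size and the wrap-around at the edges are all assumptions; an offline tool would do the job equally well.</p>
<pre class="brush: cpp; title: ; notranslate">
// Hypothetical resources: a single-channel height map in, a two-channel derivative map out
Texture2D&lt;float&gt; HeightMap;
RWTexture2D&lt;float2&gt; DerivativeMap;

// Build texel-space height derivatives with central differences, wrapping at the edges
[numthreads(8, 8, 1)]
void BuildDerivativeMap(uint3 id : SV_DispatchThreadID)
{
	uint width, height;
	HeightMap.GetDimensions(width, height);

	// Skip threads that fall outside the texture
	if (id.x &gt;= width || id.y &gt;= height)
		return;

	uint x0 = (id.x + width - 1) % width;
	uint x1 = (id.x + 1) % width;
	uint y0 = (id.y + height - 1) % height;
	uint y1 = (id.y + 1) % height;

	float dhdu = 0.5 * (HeightMap[uint2(x1, id.y)] - HeightMap[uint2(x0, id.y)]);
	float dhdv = 0.5 * (HeightMap[uint2(id.x, y1)] - HeightMap[uint2(id.x, y0)]);

	DerivativeMap[id.xy] = float2(dhdu, dhdv);
}
</pre>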
<h2>Conclusions</h2>
<p>In order to really draw any conclusions about this technique, I&#8217;m going to need to compare the quality, performance and memory consumption to that of normal mapping. That&#8217;s a whole other blog post waiting to happen&#8230;</p>
<p>But in theory, the pros are:</p>
<ul>
<li><b>Less mesh memory:</b> We don&#8217;t need to store a tangent vector, so this should translate into some pretty significant mesh memory savings.</li>
<li><b>Fewer interpolators:</b> We don&#8217;t need to pass the tangent vector from the vertex shader to the pixel shader, so this should be a performance gain.</li>
<li><b>Possibly less texture memory:</b> At worst this method requires two channels in a texture. At best, a normal map takes up two channels.</li>
<li><del datetime="2012-01-15T20:50:01+00:00"><b>Easy scaling:</b> It&#8217;s easy to change the height scale on the fly by scaling the height derivatives. This isn&#8217;t quite so easy to get right when using normal maps. See <a href="http://www.j3l7h.de/talks/2008-02-18_Care_and_Feeding_of_Normal_Vectors.pdf">here</a>.</del> As Stephen Hill points out in the comments below, this is a pretty weak argument, so I&#8217;m removing it.</li>
</ul>
<p>And the cons are:</p>
<ul>
<li><b>More ALU:</b> It&#8217;s going to be interesting to see the actual numbers, but this is probably the only thing that could put the nail in the coffin for derivative maps. The extra cost for ALU might be compensated partially by the fewer interpolators, but we&#8217;ll have to see.</li>
<li><b>Less flexible:</b> A normal map can represent any derivative map, but the reverse is not true. I&#8217;m not sure that this is a significant problem in practice though.</li>
<li><b>Worse quality?</b> I&#8217;m not sure about this one, but it&#8217;ll be interesting to see if the quality holds up.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2012/01/11/derivative-maps/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>UI Anti-Aliasing</title>
		<link>http://www.rorydriscoll.com/2012/01/08/ui-anti-aliasing/</link>
		<comments>http://www.rorydriscoll.com/2012/01/08/ui-anti-aliasing/#comments</comments>
		<pubDate>Sun, 08 Jan 2012 22:30:08 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=576</guid>
		<description><![CDATA[I&#8217;ve been working on making a really simple IMGUI implementation for my engine at home. I like to do a little bit of research when I&#8217;m approaching something new to me like this, so I went hunting around for publicly available implementations. While doing this, I came across Mikko Mononen&#8217;s implementation in Recast. I was [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working on making a really simple IMGUI implementation for my engine at home. I like to do a little bit of research when I&#8217;m approaching something new to me like this, so I went hunting around for publicly available implementations. While doing this, I came across <a href="http://digestingduck.blogspot.com/">Mikko Mononen&#8217;s</a> implementation in <a href="http://code.google.com/p/recastnavigation/">Recast</a>.</p>
<p>I was impressed when I ran the demo with how smooth his UI looked. It turns out that he&#8217;s using a little trick (which I&#8217;d never seen before, but I&#8217;m sure is old to many) to smooth off the edges of his UI elements.</p>
<p>Basically, the trick is to create a ring of extra vertices by extruding the edges of the polygon out by a certain amount. These extra vertices take the same color as the originals, but their alpha is set to zero. Mikko calls this &#8216;feathering&#8217;.</p>
<p>In my case, I found that I got good results by feathering just one pixel. Here&#8217;s a quick before/after comparison of my IMGUI check box at 800% zoom:</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxNoAA.png"><img class="aligncenter size-full wp-image-581" title="CheckBoxNoAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxNoAA.png" alt="" width="376" height="160" /></a></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxWithAA.png"><img class="aligncenter size-full wp-image-580" title="CheckBoxWithAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/CheckBoxWithAA.png" alt="" width="376" height="160" /></a></p>
<p>And here&#8217;s a 1-to-1 example showing rounded button corners:</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonNoAA.png"><img class="aligncenter size-full wp-image-584" title="ButtonNoAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonNoAA.png" alt="" width="160" height="60" /></a><a href="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonWithAA.png"><img class="aligncenter size-full wp-image-583" title="ButtonWithAA" src="http://www.rorydriscoll.com/wp-content/uploads/2012/01/ButtonWithAA.png" alt="" width="160" height="60" /></a></p>
<p>It&#8217;s a pretty nice improvement for a very simple technique! If you&#8217;re interested in what the code looks like, then either take a look at <a href="http://code.google.com/p/recastnavigation/source/browse/trunk/RecastDemo/Source/imgui.cpp?r=213">Mikko&#8217;s IMGUI implementation</a>, or you can find the code I use to feather my convex polygons below. </p>
<p>My implementation is a little less efficient since I calculate each edge normal twice, but I chose to keep it simple for readability.</p>
<div id="gist-1579850" class="gist">

        <div class="gist-file">
          <div class="gist-data gist-syntax">
              <div class="highlight"><pre><div class='line' id='LC1'><span class="kt">void</span> <span class="n">CalculateEdgeNormal</span><span class="p">(</span><span class="kt">float</span><span class="o">&amp;</span> <span class="n">nx</span><span class="p">,</span> <span class="kt">float</span><span class="o">&amp;</span> <span class="n">ny</span><span class="p">,</span> <span class="kt">float</span> <span class="n">x0</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y0</span><span class="p">,</span> <span class="kt">float</span> <span class="n">x1</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y1</span><span class="p">)</span></div><div class='line' id='LC2'><span class="p">{</span></div><div class='line' id='LC3'>	<span class="k">const</span> <span class="kt">float</span> <span class="n">x01</span> <span class="o">=</span> <span class="n">x1</span> <span class="o">-</span> <span class="n">x0</span><span class="p">;</span></div><div class='line' id='LC4'>	<span class="k">const</span> <span class="kt">float</span> <span class="n">y01</span> <span class="o">=</span> <span class="n">y1</span> <span class="o">-</span> <span class="n">y0</span><span class="p">;</span></div><div class='line' id='LC5'><br/></div><div class='line' id='LC6'>	<span class="k">const</span> <span class="kt">float</span> <span class="n">length</span> <span class="o">=</span> <span class="n">Sqrt</span><span class="p">(</span><span class="n">x01</span> <span class="o">*</span> <span class="n">x01</span> <span class="o">+</span> <span class="n">y01</span> <span class="o">*</span> <span class="n">y01</span><span class="p">);</span></div><div class='line' id='LC7'><br/></div><div class='line' id='LC8'>	<span class="k">const</span> <span class="kt">float</span> <span class="n">dx</span> <span class="o">=</span> <span class="n">x01</span> <span class="o">/</span> <span class="n">length</span><span class="p">;</span></div><div 
class='line' id='LC9'>	<span class="k">const</span> <span class="kt">float</span> <span class="n">dy</span> <span class="o">=</span> <span class="n">y01</span> <span class="o">/</span> <span class="n">length</span><span class="p">;</span></div><div class='line' id='LC10'><br/></div><div class='line' id='LC11'>	<span class="n">nx</span> <span class="o">=</span> <span class="n">dy</span><span class="p">;</span></div><div class='line' id='LC12'>	<span class="n">ny</span> <span class="o">=</span> <span class="o">-</span><span class="n">dx</span><span class="p">;</span></div><div class='line' id='LC13'><span class="p">}</span></div><div class='line' id='LC14'><br/></div><div class='line' id='LC15'><span class="kt">void</span> <span class="n">FeatherConvexPolygon</span><span class="p">(</span><span class="n">Primitives</span><span class="o">&amp;</span> <span class="n">primitives</span><span class="p">,</span> <span class="k">const</span> <span class="n">Vertex</span><span class="o">*</span> <span class="n">vertices</span><span class="p">,</span> <span class="kt">int</span> <span class="n">count</span><span class="p">,</span> <span class="kt">float</span> <span class="n">amount</span><span class="p">,</span> <span class="k">const</span> <span class="n">Texture</span><span class="o">*</span> <span class="n">texture</span><span class="p">)</span></div><div class='line' id='LC16'><span class="p">{</span></div><div class='line' id='LC17'>	<span class="n">Vertex</span><span class="o">*</span> <span class="n">extruded</span> <span class="o">=</span> <span class="n">Memory</span><span class="o">::</span><span class="n">Allocate</span><span class="o">&lt;</span><span class="n">Vertex</span><span class="o">&gt;</span><span class="p">(</span><span class="n">Memory</span><span class="o">::</span><span class="n">Temp</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Vertex</span><span class="p">)</span> <span class="o">*</span> <span 
class="n">count</span><span class="p">);</span></div><div class='line' id='LC18'><br/></div><div class='line' id='LC19'>	<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span></div><div class='line' id='LC20'>	<span class="p">{</span></div><div class='line' id='LC21'>		<span class="k">const</span> <span class="n">Vertex</span><span class="o">&amp;</span> <span class="n">previous</span> <span class="o">=</span> <span class="n">vertices</span><span class="p">[(</span><span class="n">i</span> <span class="o">+</span> <span class="n">count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="n">count</span><span class="p">];</span></div><div class='line' id='LC22'>		<span class="k">const</span> <span class="n">Vertex</span><span class="o">&amp;</span> <span class="n">current</span> <span class="o">=</span> <span class="n">vertices</span><span class="p">[</span><span class="n">i</span><span class="p">];</span></div><div class='line' id='LC23'>		<span class="k">const</span> <span class="n">Vertex</span><span class="o">&amp;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">vertices</span><span class="p">[(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="n">count</span><span class="p">];</span></div><div class='line' id='LC24'><br/></div><div class='line' id='LC25'>		<span class="kt">float</span> <span class="n">nx0</span><span class="p">,</span> <span class="n">ny0</span><span class="p">,</span> <span class="n">nx1</span><span class="p">,</span> <span class="n">ny1</span><span 
class="p">;</span></div><div class='line' id='LC26'><br/></div><div class='line' id='LC27'>		<span class="n">CalculateEdgeNormal</span><span class="p">(</span><span class="n">nx0</span><span class="p">,</span> <span class="n">ny0</span><span class="p">,</span> <span class="n">previous</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">previous</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">current</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">current</span><span class="p">.</span><span class="n">y</span><span class="p">);</span></div><div class='line' id='LC28'>		<span class="n">CalculateEdgeNormal</span><span class="p">(</span><span class="n">nx1</span><span class="p">,</span> <span class="n">ny1</span><span class="p">,</span> <span class="n">current</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">current</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">next</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">next</span><span class="p">.</span><span class="n">y</span><span class="p">);</span></div><div class='line' id='LC29'><br/></div><div class='line' id='LC30'>		<span class="kt">float</span> <span class="n">nx</span> <span class="o">=</span> <span class="p">(</span><span class="n">nx0</span> <span class="o">+</span> <span class="n">nx1</span><span class="p">)</span> <span class="o">*</span> <span class="mf">0.5f</span><span class="p">;</span></div><div class='line' id='LC31'>		<span class="kt">float</span> <span class="n">ny</span> <span class="o">=</span> <span class="p">(</span><span class="n">ny0</span> <span class="o">+</span> <span class="n">ny1</span><span class="p">)</span> <span class="o">*</span> <span class="mf">0.5f</span><span class="p">;</span></div><div class='line' 
id='LC32'><br/></div><div class='line' id='LC33'>		<span class="n">extruded</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">Vertex</span><span class="p">(</span><span class="n">current</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">nx</span> <span class="o">*</span> <span class="n">amount</span><span class="p">,</span> <span class="n">current</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">ny</span> <span class="o">*</span> <span class="n">amount</span><span class="p">,</span> <span class="n">Color</span><span class="p">(</span><span class="n">current</span><span class="p">.</span><span class="n">r</span><span class="p">,</span> <span class="n">current</span><span class="p">.</span><span class="n">g</span><span class="p">,</span> <span class="n">current</span><span class="p">.</span><span class="n">b</span><span class="p">,</span> <span class="mf">0.0f</span><span class="p">));</span></div><div class='line' id='LC34'>	<span class="p">}</span></div><div class='line' id='LC35'><br/></div><div class='line' id='LC36'>	<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span></div><div class='line' id='LC37'>	<span class="p">{</span></div><div class='line' id='LC38'>		<span class="k">const</span> <span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="n">count</span><span class="p">;</span></div><div class='line' id='LC39'>		
<span class="n">AddQuad</span><span class="p">(</span><span class="n">primitives</span><span class="p">,</span> <span class="n">vertices</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">extruded</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">extruded</span><span class="p">[</span><span class="n">j</span><span class="p">],</span> <span class="n">vertices</span><span class="p">[</span><span class="n">j</span><span class="p">],</span> <span class="n">texture</span><span class="p">);</span></div><div class='line' id='LC40'>	<span class="p">}</span></div><div class='line' id='LC41'><br/></div><div class='line' id='LC42'>	<span class="n">Memory</span><span class="o">::</span><span class="n">Free</span><span class="p">(</span><span class="n">extruded</span><span class="p">);</span></div><div class='line' id='LC43'><span class="p">}</span></div></pre></div>
          </div>

          <div class="gist-meta">
            <a href="https://gist.github.com/raw/1579850/4e77ec03800a74fa9f85ed24211cc2fad85d5d6d/FeatherUI.cpp" style="float:right;">view raw</a>
            <a href="https://gist.github.com/1579850#file_feather_ui.cpp" style="float:right;margin-right:10px;color:#666">FeatherUI.cpp</a>
            <a href="https://gist.github.com/1579850">This Gist</a> brought to you by <a href="http://github.com">GitHub</a>.
          </div>
        </div>
</div>

]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2012/01/08/ui-anti-aliasing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s wrong with this picture?</title>
		<link>http://www.rorydriscoll.com/2010/01/30/whats-wrong-with-this-picture/</link>
		<comments>http://www.rorydriscoll.com/2010/01/30/whats-wrong-with-this-picture/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 00:31:58 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Global Illumination]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=393</guid>
		<description><![CDATA[Well, you could point out a number of things to answer that question. There&#8217;s some pretty obvious aliasing, a random pixel on the ground which should be in shadow but isn&#8217;t, it&#8217;s noisy, boring etc. But that&#8217;s not my point. The point is: It&#8217;s too dark! I know it&#8217;s too dark because I know how [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-398" title="montecarlo256samples2bounces" src="http://www.rorydriscoll.com/wp-content/uploads/2010/01/montecarlo256samples2bounces.png" alt="montecarlo256samples2bounces" width="656" height="396"/></p>
<p>Well, you could point out a number of things to answer that question. There&#8217;s some pretty obvious aliasing, a random pixel on the ground which should be in shadow but isn&#8217;t, it&#8217;s noisy, boring etc. But that&#8217;s not my point. The point is: It&#8217;s too dark!</p>
<p>I know it&#8217;s too dark because I know how I rendered it, and I rendered it wrong. It still kind of looks acceptable (well to me at least) though. I&#8217;m not sure that I would say that it&#8217;s implausibly dark if I didn&#8217;t know it.</p>
<p><span id="more-393"></span></p>
<h2>How many bounces are enough?</h2>
<p>I rendered this image using a Monte Carlo estimator with two bounces of indirect light. Each bounce estimated the irradiance at the intersection point using 256 rays in a cosine-weighted stratified-sampling pattern. It&#8217;s the two bounce part that makes it wrong. Any light that bounced more than twice before heading toward the camera is totally ignored. Since this approach doesn&#8217;t converge on the correct solution to the rendering equation, it&#8217;s classified as &#8216;biased&#8217;.</p>
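<p>A minimal sketch of what a cosine-weighted stratified pattern can look like, assuming the 256 rays are laid out as a 16 by 16 grid of strata and the hemisphere is oriented around +Z. The function name, the <code>Direction</code> struct and the grid layout are assumptions for illustration, not the renderer&#8217;s actual code.</p>

```cpp
#include <cassert>
#include <cmath>

struct Direction { float x, y, z; };

// Maps stratum (i, j) of an n-by-n grid, plus a jitter (jx, jy) in [0, 1),
// to a unit direction on the hemisphere around +Z. The density of the
// resulting directions is proportional to cos(theta): pdf = cos(theta) / pi.
inline Direction CosineWeightedStratifiedSample(int i, int j, int n, float jx, float jy)
{
    assert(n > 0 && i >= 0 && i < n && j >= 0 && j < n);

    const float kPi = 3.14159265358979f;
    const float u = (static_cast<float>(i) + jx) / static_cast<float>(n); // in [0, 1)
    const float v = (static_cast<float>(j) + jy) / static_cast<float>(n); // in [0, 1)

    const float r = std::sqrt(u);         // radius of the point on the unit disk
    const float phi = 2.0f * kPi * v;     // angle of the point on the unit disk
    const float z = std::sqrt(1.0f - u);  // lift onto the hemisphere (z = cos theta)

    return Direction{ r * std::cos(phi), r * std::sin(phi), z };
}
```

<p>With this density and a diffuse surface, the cosine term and the division by the PDF cancel, so each ray contributes its incoming radiance with equal weight.</p>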
<p>How much does light that bounced more than twice really contribute to a scene? Of course that depends on the materials in the scene quite a bit, but in this case, there&#8217;s a noticeable difference. I rendered the same exact scene using the Monte Carlo estimator for the first bounce, but for the subsequent bounces I used a path tracer. By using Russian Roulette (a topic unto itself) to terminate the path, you can get an unbiased approximation of the irradiance.</p>
<p><img class="aligncenter size-full wp-image-399" title="pathtracer4096paths80percentsurvival" src="http://www.rorydriscoll.com/wp-content/uploads/2010/01/pathtracer4096paths80percentsurvival.png" alt="pathtracer4096paths80percentsurvival" width="656" height="396" /></p>
<p>Ok, great. It&#8217;s brighter, but is it actually correct? I was wondering about this, then I happened to come across an idea while reading <a href="http://www.amazon.com/Physically-Based-Rendering-Implementation-Interactive/dp/012553180X">Physically Based Rendering</a> to compare the results of my integrators with something that has an analytical solution to the rendering equation.</p>
<h2>I now present an analytical solution to the Rendering Equation!</h2>
<h5>(&#8230; in a very simple case)</h5>
<p>Solving the rendering equation analytically for most scenes is just impossible, which is why we have to rely on numerical methods like Monte Carlo estimation. However, the book suggests a <i>very</i> simple scene for which it can be solved. The suggested setup is that of light bouncing around the inside of a sphere. The sphere emits light internally, and reflects it diffusely to other points on the inside of the sphere. Since the sphere is rotationally invariant and the reflections are diffuse, every point on the sphere reflects and emits the same radiance in all directions.</p>
<p>So how do you solve the rendering equation for this situation? It&#8217;s fairly easy. Recall the rendering equation:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L(\vec{x},\omega)=L_e@plus;\int_{\Omega}\rho(\vec{x},\omega,{\omega}')L_i(\vec{x},{\omega}')\cos\theta\,\delta{\omega}'" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L(\vec{x},\omega)=L_e+\int_{\Omega}\rho(\vec{x},\omega,{\omega}')L_i(\vec{x},{\omega}')\cos\theta\,\delta{\omega}'" title="L(\vec{x},\omega)=L_e+\int_{\Omega}\rho(\vec{x},\omega,{\omega}')L_i(\vec{x},{\omega}')\cos\theta\,\delta{\omega}'" /></a></p>
<p>In this setup, we are using a diffuse BRDF with reflectance <i>d</i>. It&#8217;s also normalized to maintain energy conservation:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\rho(\vec{x},\omega,{\omega}')=\frac{d}{\pi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?\rho(\vec{x},\omega,{\omega}')=\frac{d}{\pi}" title="\rho(\vec{x},\omega,{\omega}')=\frac{d}{\pi}" /></a></p>
<p>Also, as mentioned previously, the outgoing radiance in all directions is the same as the incoming radiance, so this reduces the rendering equation to:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{d}{\pi}\int_{\Omega}L\cos\theta\,\delta{\omega}'" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{\pi}\int_{\Omega}\cos\theta\,\delta{\omega}'" title="L=L_e+\frac{dL}{\pi}\int_{\Omega}\cos\theta\,\delta{\omega}'" /></a></p>
<p>Solving this equation for L is now pretty easy. First we have to convert from an integral over solid angles, to a double-angle version. The important thing to remember when doing this is to introduce the new <i>sine</i> term:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\cos\theta\sin\theta\,\delta{\theta}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\cos\theta\sin\theta\,\delta{\theta}\,\delta{\phi}" title="L=L_e+\frac{dL}{\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\cos\theta\sin\theta\,\delta{\theta}\,\delta{\phi}" /></a></p>
<p>There&#8217;s a double angle trigonometric identity that makes this easier:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\cos{x}\sin{x}=\frac{\sin{2x}}{2}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?\cos{x}\sin{x}=\frac{\sin{2x}}{2}" title="\cos{x}\sin{x}=\frac{\sin{2x}}{2}" /></a></p>
<p>So we just need to integrate the following:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\sin2\theta\,\delta{\theta}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\sin2\theta\,\delta{\theta}\,\delta{\phi}" title="L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\frac{\pi}{2}}\sin2\theta\,\delta{\theta}\,\delta{\phi}" /></a></p>
<p>Integrate over theta:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\left[\frac{-\cos2\theta}{2}\right]_{0}^{\frac{\pi}{2}}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\left[\frac{-\cos2\theta}{2}\right]_{0}^{\frac{\pi}{2}}\,\delta{\phi}" title="L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\left[\frac{-\cos2\theta}{2}\right]_{0}^{\frac{\pi}{2}}\,\delta{\phi}" /></a></p>
<p>This integral over theta is just 1, so now integrate over phi:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\,\delta{\phi}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\,\delta{\phi}" title="L=L_e+\frac{dL}{2\pi}\int_{\phi=0}^{2\pi}\,\delta{\phi}" /></a></p>
<p>This integral is of course just 2&pi;. So we&#8217;re left with a very simple equation:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=L_e@plus;dL" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=L_e+dL" title="L=L_e+dL" /></a></p>
<p>Solving for L gives us the final expected radiance at every point in the sphere:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L=\frac{L_e}{1-d}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?L=\frac{L_e}{1-d}" title="L=\frac{L_e}{1-d}" /></a></p>
<p>This intuitively makes sense. As d grows, so does L. When d is 1, all hell breaks loose, and when d is 0 all we are left with is the emitted light. Obviously d can never be greater than 1 or the energy conservation rule would have been broken.</p>
<h2>The test application</h2>
<p>Alright, so all I need to do now is make a test application that fires a bunch of rays around the inside of a sphere and compare the results to the analytical solution. Well&#8230; So I thought. Due to the curvature of the inside of the sphere, I found that a good number of rays I fired near the horizon were escaping the sphere.</p>
<p>Currently I apply an epsilon to the minimum ray intersection to try and prevent self-intersections and this causes problems. For now (and just for these tests), instead I&#8217;m pushing the ray starting point away from the intersection a small amount in the direction of the intersection normal. I made the sphere really big too. I&#8217;d welcome any better ideas for alleviating this problem.</p>
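<p>The workaround of pushing the ray start point along the normal can be sketched as follows. The <code>Vec3</code> struct, the function name and the choice of offset are assumptions for illustration only.</p>

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Returns the hit point nudged along the surface normal by 'offset'.
// 'normal' is assumed to be unit length and to face the side of the surface
// that the new ray will leave from (the inside of the sphere in this test).
inline Vec3 OffsetRayOrigin(const Vec3& hitPoint, const Vec3& normal, float offset)
{
    assert(offset >= 0.0f);
    return Vec3{ hitPoint.x + normal.x * offset,
                 hitPoint.y + normal.y * offset,
                 hitPoint.z + normal.z * offset };
}
```

<p>The offset has to be small relative to the sphere radius, which is presumably why making the sphere really big helps.</p>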
<p>For the tests below, I set the emitted light value to 1, and the diffuse reflectance, d, to 0.5, meaning that the outgoing radiance, L, should be 2.</p>
<h2>Multi-bounce Monte Carlo results</h2>
<p>You can work out from the radiance equation how different numbers of bounces of light will affect the final solution. In this case, I just ran my Monte Carlo integrator from 0 to 16 bounces and produced the following graph showing the absolute error as a percentage of the expected result.</p>
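<p>As a rough check on what to expect, cutting the light off after n bounces in the sphere scene leaves a truncated geometric series, L<sub>n</sub> = L<sub>e</sub>(1 + d + &#8230; + d<sup>n</sup>). The sketch below evaluates that series and its error against L<sub>e</sub> / (1 - d), ignoring any sampling noise. The function names are made up for illustration.</p>

```cpp
#include <cassert>
#include <cmath>

// Radiance when only paths with at most 'bounces' reflections are counted:
// L_n = Le * (1 + d + d^2 + ... + d^n).
inline double TruncatedRadiance(double emitted, double reflectance, int bounces)
{
    assert(bounces >= 0);
    double radiance = 0.0;
    double throughput = 1.0;   // reflectance^k for the current bounce k
    for (int k = 0; k <= bounces; ++k)
    {
        radiance += emitted * throughput;
        throughput *= reflectance;
    }
    return radiance;
}

// Absolute error relative to the analytical answer Le / (1 - d), as a fraction.
inline double RelativeError(double emitted, double reflectance, int bounces)
{
    assert(reflectance >= 0.0 && reflectance < 1.0);
    const double exact = emitted / (1.0 - reflectance);
    return std::fabs(exact - TruncatedRadiance(emitted, reflectance, bounces)) / exact;
}
```

<p>With L<sub>e</sub> = 1 and d = 0.5 the error works out to 0.5<sup>n+1</sup>: 50% with no bounces, 6.25% with three and 3.125% with four.</p>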
<p><img class="aligncenter" src="http://spreadsheets.google.com/oimg?key=0AliVyEtgVru8dHJDZ2w3V0FBcmRSb0t4TElJbHhXNXc&amp;oid=5&amp;v=1264822032867" alt="" /></p>
<p>You can see that the error starts off really high, but drops off pretty quickly as more bounces are added. Each subsequent bounce has less and less effect on the error reduction, as expected.</p>
<p>What you have to remember though, is that each bounce adds exponentially to the number of rays that have to be cast to achieve that error. In real-world situations we need to cast a lot of rays over the hemisphere in order to get a correct solution. To get under 5% error you&#8217;d need four bounces with this method. Even using a very modest number of rays per estimation quickly becomes unruly: 256 rays with 4 bounces = over 4 billion rays needed! Per pixel! Without anti-aliasing!</p>
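<p>The ray count above is just repeated multiplication. A small sketch, with a hypothetical function name, that counts the rays cast at the deepest level when every bounce fans out into the same number of rays:</p>

```cpp
#include <cassert>
#include <cstdint>

// Rays cast at the deepest level when every bounce fans out into
// 'raysPerEstimate' new rays: raysPerEstimate^bounces.
// Overflow is not checked, so keep the inputs modest.
inline std::uint64_t RaysAtDeepestBounce(std::uint64_t raysPerEstimate, int bounces)
{
    assert(bounces >= 0);
    std::uint64_t rays = 1;
    for (int i = 0; i < bounces; ++i)
        rays *= raysPerEstimate;
    return rays;
}
```

<p>For 256 rays and 4 bounces this gives 256<sup>4</sup> = 4,294,967,296.</p>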
<h2>Path Tracer results</h2>
<p>Ok, so how about the path tracer? In theory the path tracer should average out to zero error, but can have potentially very high variance. Here&#8217;s the average error over 1000 runs with different survival probabilities:</p>
<p><img src="http://spreadsheets.google.com/oimg?key=0AliVyEtgVru8dFIyWlMtWVdMTGl4RUJFZkd5blRhRXc&amp;oid=1&amp;v=1264822087193" alt="" /></p>
<p>Well, it&#8217;s not zero everywhere, but it&#8217;s pretty close in most cases. I think that the results obtained from this kind of method probably depend quite a bit on the quality of the random number generator used. I&#8217;m just using the stdlib version, so I&#8217;ll try switching it up at some point to see how it affects things. I haven&#8217;t shown it here, but the variance at low survival probabilities is incredibly high. This surely accounts for how far off the results are under about 10% survival.</p>
<p>The good thing about the path tracer is that it doesn&#8217;t require exorbitant numbers of rays. Yes, the variance can be high, but it can be reduced by tracing more paths. The net result is that it produces low error images much quicker than using standard Monte Carlo integration.</p>
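<p>For the sphere test, a path with Russian Roulette can be sketched in a few lines. This assumes cosine-weighted sampling, so that each bounce simply scales the path throughput by d, and it uses <code>std::mt19937</code> rather than the stdlib generator mentioned above. The function names and the seed parameter are hypothetical.</p>

```cpp
#include <cassert>
#include <random>

// One path through the uniformly emitting diffuse sphere. Every surface point
// emits 'emitted'. Russian Roulette keeps the path alive with probability
// 'survival' and divides by it so that the expected value is unchanged.
inline double TracePath(double emitted, double reflectance, double survival, std::mt19937& rng)
{
    assert(survival > 0.0 && survival < 1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    double radiance = 0.0;
    double throughput = 1.0;
    for (;;)
    {
        radiance += throughput * emitted;
        if (uniform(rng) >= survival)
            break;                             // path terminated
        throughput *= reflectance / survival;  // compensate for the terminated paths
    }
    return radiance;
}

// Averages 'paths' independent paths; this should approach Le / (1 - d).
inline double AverageRadiance(double emitted, double reflectance, double survival,
                              int paths, unsigned seed)
{
    assert(paths > 0);
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < paths; ++i)
        sum += TracePath(emitted, reflectance, survival, rng);
    return sum / paths;
}
```

<p>Note that when the survival probability is lower than the reflectance, the throughput grows with every bounce instead of shrinking, which is one way to see where the high variance at low survival probabilities comes from.</p>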
<h2>The End</h2>
<p>Well, the point of this post really was: How do you know your integrator is correct? Comparing it to something simple that can be calculated analytically is not the be-all and end-all, but it&#8217;s a good start. There are many other things that could still be wrong even if your integrator gets good results against this simple test, like how you&#8217;re calculating sample directions, your PDF etc.</p>
<p> For me, being able to compare my integrators to a known solution definitely helped me to make sure that my path tracer was getting correct results.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2010/01/30/whats-wrong-with-this-picture/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Direct3D 11 Multithreading</title>
		<link>http://www.rorydriscoll.com/2009/04/21/direct3d-11-multithreading/</link>
		<comments>http://www.rorydriscoll.com/2009/04/21/direct3d-11-multithreading/#comments</comments>
		<pubDate>Wed, 22 Apr 2009 04:11:02 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=310</guid>
		<description><![CDATA[I&#8217;ve been putting it off for a while, but with my recent trip to GDC and the arrival of the Direct3D 11 beta, I thought it was about time I switched my renderer to be multithreaded. One of the things I learned at a Direct3D 11 talk at GDC is that it works on &#8216;down-level [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been putting it off for a while, but with my recent trip to GDC and the arrival of the Direct3D 11 beta, I thought it was about time I switched my renderer to be multithreaded. One of the things I learned at a <a href="http://developer.amd.com/gpu_assets/Your%20Game%20Needs%20Direct3D%2011,%20So%20Get%20Started%20Now.pps">Direct3D 11 talk</a> at GDC is that it works on &#8216;down-level hardware&#8217;, which means DirectX 9 &amp; 10 cards. Of course, you don&#8217;t get the snazzy new hardware features, but you do get some of the benefits of the new API, like multithreading and limited compute shaders (albeit not as fast as it will be on the real hardware).<br />
<span id="more-310"></span><br />
There has been some multithreading support in earlier DirectX versions for a while now by using the multithreaded flag when creating the device. Typically though, the pattern has been to run a dedicated rendering thread and submit objects to be rendered to that thread. This allows the device to stay in single threaded mode where it is faster.</p>
<p>Things have changed a lot with Direct3D 11. The rendering API has been separated from the factory functions into a separate object called the device context. The factory functions on the device are all free threaded, meaning that they can be called from any thread. The device context functions are designed to be called from the same thread.</p>
<p><img class="aligncenter size-full wp-image-331" title="directx-opengl5-w-157028-13" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/directx-opengl5-w-157028-13.png" alt="directx-opengl5-w-157028-13" width="450" height="338" /></p>
<p>The basic idea behind multithreading in Direct3D 11 is that you create an immediate device context on the main thread. Then, for each thread on which you&#8217;d like to be able to render, you create a deferred context. As you can probably guess from the names, commands executed on the immediate context get executed immediately, but those on the deferred context just get saved off into a command list. You then execute the deferred command lists on the main thread using the immediate device context. Sounds easy enough.</p>
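<p>To make the record-then-replay idea concrete, here is a portable sketch that uses hypothetical stand-in classes rather than the Direct3D 11 interfaces. In the real API the corresponding calls are <code>ID3D11Device::CreateDeferredContext</code>, <code>ID3D11DeviceContext::FinishCommandList</code> and <code>ID3D11DeviceContext::ExecuteCommandList</code>; everything named <code>...Sketch</code> below is made up for illustration.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the Direct3D 11 interfaces.
using Command = std::function<void(std::vector<std::string>&)>;
using CommandList = std::vector<Command>;

// Records commands without running them, like a deferred context.
class DeferredContextSketch
{
public:
    void Draw(const std::string& what)
    {
        m_recorded.push_back([what](std::vector<std::string>& log) { log.push_back(what); });
    }

    // Hands back everything recorded so far and starts a fresh list.
    CommandList FinishCommandList()
    {
        CommandList finished = std::move(m_recorded);
        m_recorded.clear();
        return finished;
    }

private:
    CommandList m_recorded;
};

// Runs commands as soon as it receives them, like the immediate context.
class ImmediateContextSketch
{
public:
    void Draw(const std::string& what) { m_log.push_back(what); }

    // Replays a list recorded elsewhere, in recording order.
    void ExecuteCommandList(const CommandList& list)
    {
        for (const Command& command : list)
            command(m_log);
    }

    const std::vector<std::string>& Log() const { return m_log; }

private:
    std::vector<std::string> m_log;   // stands in for work submitted to the GPU
};
```

<p>Each worker would own one deferred context and record into it without locking. Only the finished command lists are handed to the main thread for execution.</p>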
<h2>Thread Pools</h2>
<p>Given that you can submit draw calls to deferred contexts on multiple threads, it makes sense to ditch the single rendering thread concept and switch to using something like a thread pool for issuing the draw calls. This scales far better than a dedicated rendering thread. It&#8217;s also pretty easy to set up a simple thread pool, and give each worker thread a deferred render context.</p>
<p>There are plenty of places on the internet to read about thread pools so I&#8217;m not going to get into it here, but one thing I can&#8217;t stress enough is to make sure that you get your synchronization right! In my initial implementation, I used my normal queue data structure, but wrapped it up in mutexes (mutices?) to make sure it was thread-safe. This worked out well since I was very confident that things were working correctly, but a quick foray into VTune told me that I was spending 40% of the time waiting on synchronization points!</p>
<p><img class="size-full wp-image-343 alignleft" title="queues" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/queues.jpg" alt="queues" width="119" height="181" /></p>
<p>After some quick digging around, I came across a few articles that Herb Sutter wrote for Dr Dobb&#8217;s Journal about producer/consumer queues. I implemented the low-lock queue recommended by Sutter, and got a good speedup of at least 30% (that number is off the top of my head, but I remember it was a lot). The relevant articles I read are <a href="http://www.ddj.com/architect/210604448">single producer/consumer queue</a>, <a href="http://www.ddj.com/architect/211601363">generalized concurrent queue</a>, and <a href="http://www.ddj.com/architect/212201163">measuring performance</a>. I still use events for sending the worker threads to sleep when there is nothing left to work on, and to wake them up when data is added to the queue.</p>
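<p>To give a flavor of what a low-lock queue looks like, here is a sketch of a bounded single-producer/single-consumer ring buffer. This is a swapped-in, simpler design and not Sutter&#8217;s linked-list queue from the articles. It also assumes C++11 <code>std::atomic</code>, and the class name is hypothetical.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Bounded single-producer/single-consumer queue. Exactly one thread may call
// TryPush and exactly one (other) thread may call TryPop.
template <typename T>
class SpscQueueSketch
{
public:
    explicit SpscQueueSketch(std::size_t capacity)
        : m_slots(capacity + 1), m_head(0), m_tail(0)   // one slot always stays empty
    {
        assert(capacity > 0);
    }

    bool TryPush(const T& value)   // producer thread only
    {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % m_slots.size();
        if (next == m_head.load(std::memory_order_acquire))
            return false;                                  // queue is full
        m_slots[tail] = value;
        m_tail.store(next, std::memory_order_release);     // publish the item
        return true;
    }

    bool TryPop(T& value)          // consumer thread only
    {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        if (head == m_tail.load(std::memory_order_acquire))
            return false;                                  // queue is empty
        value = m_slots[head];
        m_head.store((head + 1) % m_slots.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> m_slots;
    std::atomic<std::size_t> m_head;   // next slot to read
    std::atomic<std::size_t> m_tail;   // next slot to write
};
```

<p>Neither side ever blocks, so a worker that finds the queue empty still needs something like an event to go to sleep on.</p>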
<p>My application already stores up all of the state needed for a draw call in an object called a RenderContext, so instead of passing this render context off to the renderer on the main thread, it just gets enqueued to be rendered by one of the threads in the thread pool. When the worker thread gets to it, it passes the render context off to a thread-local renderer object initialized with a deferred device context. This renderer sets all of the <em>changed</em> state and issues the final draw call.</p>
<p>Finally, back on the main thread, it waits until all of the render contexts have been submitted to the deferred device contexts, and then executes each of these on the immediate device context.</p>
<h2>Test Scenario</h2>
<p>In order to stress my renderer a bit, I fabricated a scenario with 10,000 models. Each model has a sphere and a ground plane with their own material. I use a loose octree for culling out the models outside of the frustum, but I don&#8217;t do any sorting of any kind. This means that the alternating materials that get rendered for the sphere and then the ground put a fair amount of stress on the CPU side of the renderer.</p>
<p>My single threaded renderer took about 50 ms to render the initial view of the scene. By switching to using the thread pool, this went down to about 30 ms. A nice improvement, that&#8217;s for sure. Obviously, as fewer objects are visible, the gains of using the multithreaded renderer disappear.</p>
<h2>Profiling</h2>
<p>I was happy that the multithreading appeared to be doing its job, but I wasn&#8217;t quite satisfied because I couldn&#8217;t really tell <em>how</em> well it was doing. Time for some profiling!</p>
<p>There appear to be quite a few CPU profilers out there. First of all I downloaded an evaluation of Intel VTune. It&#8217;s pretty overwhelming, but it gave me a lot of pertinent information. The bugger is that you have to pay a hefty sum for it, so I tossed it out of the window. I also tried out <a href="http://msdn.microsoft.com/en-us/library/cc305187.aspx">Microsoft xperf</a>. This sampling profiler gave me a pretty good overview of what was expensive with the standard inclusive/exclusive view. It was a great help for quickly tracking down some areas of the code that I could very easily improve. I still use this.</p>
<p>The trouble with most of the sampling profilers is that they don&#8217;t know about frames. They just add up all of the samples over the given time period which gives you an idea on average what is happening. I wanted to get information about what was happening within the frame, so I implemented a really simple frame profiler.</p>
<h3>Frame Profiler</h3>
<p>An in-game profiler is a really handy tool to have. It lets you see in real-time exactly how your CPU time is being spent in one frame on each of your threads. It&#8217;s also pretty easy to set up.</p>
<p>First of all, I created a class called ThreadProfiler. As the name suggests, the ThreadProfiler class is responsible for recording events on a specific thread. This class has functions to notify it of the beginning and end of the frame, as well as when a profiling event begins and ends. All it really does is to record the name of the event, a color for display, and the timestamps when the event begins and ends. The events can be nested, so it maintains a stack of active events and records the depth of the stack for each event.</p>
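<p>A minimal sketch of a per-thread event recorder along these lines, using <code>std::chrono</code> for timestamps. The class and member names are hypothetical and the real ThreadProfiler may well differ.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ProfileEventSketch
{
    std::string name;
    std::uint32_t color;
    int depth;                                      // nesting level, 0 = outermost
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;
};

class ThreadProfilerSketch
{
public:
    void BeginFrame() { m_events.clear(); m_active.clear(); }

    void BeginEvent(const std::string& name, std::uint32_t color)
    {
        const int depth = static_cast<int>(m_active.size());
        const auto now = std::chrono::steady_clock::now();
        m_events.push_back(ProfileEventSketch{ name, color, depth, now, now });
        m_active.push_back(m_events.size() - 1);    // remember which event is open
    }

    void EndEvent()
    {
        assert(!m_active.empty());
        m_events[m_active.back()].end = std::chrono::steady_clock::now();
        m_active.pop_back();
    }

    const std::vector<ProfileEventSketch>& Events() const { return m_events; }

private:
    std::vector<ProfileEventSketch> m_events;   // every event recorded this frame
    std::vector<std::size_t> m_active;          // stack of indices into m_events
};
```

<p>Storing indices rather than pointers on the stack keeps the open events valid when the event array grows.</p>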
<p>Next I created the singleton FrameProfiler class. The idea for this class is to hold all of the ThreadProfiler objects, and to forward events onto those classes based on the current thread ID. Threads are required to register their thread ID with the frame profiler in order for events to be recorded.</p>
<pre>
<pre class="brush: cpp; title: ; notranslate">
        class FrameProfiler : public Core::Singleton&lt;FrameProfiler&gt;
        {
        public:

            FrameProfiler();

            void RegisterThread(int threadId);

            void BeginFrame(bool enabled);
            void EndFrame();

            void BeginEvent(int threadId, const Core::String&amp; name, uint32 color);
            void EndEvent(int threadId);

            DataStructures::ArrayList&lt;ThreadProfiler&gt;&amp; GetThreadProfilers();
            const DataStructures::ArrayList&lt;ThreadProfiler&gt;&amp; GetThreadProfilers() const;

        private:

            DataStructures::ArrayList&lt;ThreadProfiler&gt; m_threadProfilers;
        };
</pre>
</pre>
<p>The final piece is a really simple macro which grabs the function name and creates an object which tells the FrameProfiler when it is created and destroyed. This is the macro that I place into whatever function or loop I&#8217;d like to profile.</p>
<pre>
<pre class="brush: cpp; title: ; notranslate">
class ScopedProfileEvent
{
public:

        ScopedProfileEvent(const Core::String&amp; name, uint32 color)
        {
                if (FrameProfiler::IsCreated())
                {
                        FrameProfiler::Instance().BeginEvent(Core::Platform::GetCurrentThreadId(), name, color);
                }
        }

        ~ScopedProfileEvent()
        {
                if (FrameProfiler::IsCreated())
                {
                        FrameProfiler::Instance().EndEvent(Core::Platform::GetCurrentThreadId());
                }
        }
};
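// Two-level concatenation so that __LINE__ is expanded before token pasting,
// giving each event object a unique name within a function.
#define PROFILE_CONCAT_INNER(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT_INNER(a, b)
#define PROFILE(X) const Profile::ScopedProfileEvent PROFILE_CONCAT(profileEvent, __LINE__)(String(__FUNCTION__), (uint32)X)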
</pre>
</pre>
<p>Unlike a sampling profiler, this kind of profiling has a certain amount of processing overhead. There are a couple of quick things you can do to help with this though. The first is just to make sure that you don&#8217;t always have the overhead, and compile it out for your final builds. It&#8217;s important to do your profiling on an optimized build, so I would recommend debug, release, and final configurations or something similar. The second thing you can do is to just not run it every frame. I have it set on a key press so that I can get to the area I&#8217;d like to profile without the overhead, then hit the button to profile the next frame and display the results.</p>
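<p>The profile-on-key-press idea can be sketched as a tiny latch whose result is passed to something like <code>BeginFrame(enabled)</code>. The class below is a hypothetical illustration, not part of the profiler shown above.</p>

```cpp
#include <cassert>

// Turns a one-off request (for example a key press) into a flag that is true
// for a fixed number of frames and false the rest of the time.
class ProfileTriggerSketch
{
public:
    // Ask for the next 'frames' frames to be profiled (one by default).
    void Request(int frames = 1)
    {
        assert(frames > 0);
        m_remaining = frames;
    }

    // Call once at the start of each frame; true means "profile this frame".
    bool ShouldProfileThisFrame()
    {
        if (m_remaining == 0)
            return false;
        --m_remaining;
        return true;
    }

private:
    int m_remaining = 0;
};
```

<p>Frames that are not being profiled then only pay for one branch per event, assuming the profiler ignores events while it is disabled.</p>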
<p>I&#8217;m not sure about how accurate this would be, but you could probably compare the previous frame&#8217;s duration to the profiled frame to get a rough estimate of the overhead that the profiling functions added. I wouldn&#8217;t rely on that though.</p>
<p>There&#8217;s actually quite a bit of information that can be gleaned from these profiling events, but the first thing I did was to render out the events as rectangles on a timeline. In the image below, I have two threads running. The main thread at the bottom has three levels of nested events being shown, and the top worker thread just has one.</p>
<p><img class="aligncenter size-large wp-image-313" title="oneworkerthread" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/oneworkerthread-1024x597.png" alt="oneworkerthread" width="700" height="394" /></p>
<p>Ok, there&#8217;s no legend right now, but I&#8217;m working on it. Each black/grey bar in the background represents one millisecond of frame time.</p>
<p>The bottom row on the main thread represents the update in green, the render in blue, and the call to Device::Present in red. Given the long red bar, I&#8217;d say I&#8217;m GPU limited in this scene.</p>
<p>The row above represents the breakdown of the render function from the bottom row. The cyan sliver is shadow rendering (actually I&#8217;m not rendering any shadows which is why it&#8217;s tiny). The huge magenta bar is the model rendering, and the yellow bar is post-processing.</p>
<p>The top row in the bottom thread represents the breakdown of the model rendering function. The green slivers are models being found in the octree and the red blocks are models being prepared for rendering. The large white bar is actually the command list from the worker thread being executed on the immediate device context. I was pretty surprised to see this segment so large, since I didn&#8217;t notice it in the other profilers at all.</p>
<h3>Experiments</h3>
<p>Now that I have a frame profiler, I can really experiment with my thread pool setup to see how it affects the frame. My computer has a dual core processor, so based on Sutter&#8217;s articles, I was expecting that one main thread and one worker would be the best setup. Even so, I tried running a variety of numbers of worker threads to see how it looked. Here&#8217;s what four threads looks like:</p>
<p><img class="aligncenter size-large wp-image-312" title="fourworkerthreads" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/fourworkerthreads-1024x597.png" alt="fourworkerthreads" width="700" height="394" /></p>
<p>The first thing I noticed was just how much worse all of the threads fared. Each worker thread appeared to perform a tiny bit of work, and then get swapped out for another thread. The main thread really suffered due to this too. This is a great example of how visualizing this data is really illuminating. The scene was already GPU bound, so even though the rendering code was performing far worse, the frame rate actually stayed the same.</p>
<p>Another experiment I wanted to run was to see just how much other applications could affect the frame rate of my application. In this case, I just had Sysinternals Process Explorer running and polling the system processes every half second. It only took me a few tries to hit a frame where I could see the effect:</p>
<p><img class="aligncenter size-large wp-image-314" title="stolen" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/stolen-1024x597.png" alt="stolen" width="700" height="394" /></p>
<p>Notice the scale of the millisecond bars now &#8211; this frame took over twice as long to run as my first example with the exact same setup. You can see a big gap on the worker thread where another process stole its time. Even when it did get some time, it appeared to be running very slowly.</p>
<p>Also, you can now see a large grey bar in the middle row of the main thread which shows the main thread waiting for the worker thread to finish.</p>
<p>The execution of the command list is pretty consistently taking up three and a half milliseconds or so. This is much higher than I had thought it would be. I really hope that this time gets reduced with newer drivers or hardware.</p>
<p>One last thing I&#8217;ve done to investigate what is happening in my application is to display the frame rate history. I use a moving average to calculate the frame time, so I have the last 100 frames stored anyway. It&#8217;s a simple enough task to just display this.</p>
<p><img class="aligncenter size-full wp-image-364" title="frametime" src="http://www.rorydriscoll.com/wp-content/uploads/2009/04/frametime.png" alt="frametime" width="656" height="396" /></p>
<p>You can see how varied the frame times are even though the camera isn&#8217;t moving. I&#8217;d imagine this is due to other processes on my computer interfering.</p>
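<p>As a rough illustration of the moving average mentioned above, here is a minimal Python sketch of a fixed-size frame time history. The class name <code>FrameTimeHistory</code> and its methods are hypothetical and are not taken from my engine; the only details carried over from the text are the window of recent frames and the averaging.</p>

```python
from collections import deque


class FrameTimeHistory:
    """Keeps the most recent frame times and reports their moving average."""

    def __init__(self, capacity=100):
        # A deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=capacity)

    def add(self, frame_time_ms):
        self.samples.append(frame_time_ms)

    def average(self):
        # Average over however many frames have been recorded so far.
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


history = FrameTimeHistory(capacity=3)
for frame_time_ms in (16.0, 17.0, 33.0, 16.0):
    history.add(frame_time_ms)

# Only the last three samples (17, 33, 16) remain in the window.
print(history.average())  # 22.0
```

<p>Because the samples are already stored for the average, drawing the history graph only needs to walk the same container.</p>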
<h2>Final Thoughts</h2>
<p>It was a fun adventure porting my code to Direct3D 11, particularly implementing a multithreaded renderer using a thread pool. I would recommend trying it out to those of you who have Direct3D 10 engines at the moment.</p>
<p>The jump from Direct3D 10 to 11 is nowhere near as bad as the previous jump from 9 to 10. It took me about three hours to change my rendering code to deal with the changes. The most awkward part was probably having to pass in the device context to functions which need to map buffers, since these functions are no longer on the buffers themselves.</p>
<p>Visualizing profiling data in real time can be a real eye-opener for understanding how your code is actually running rather than how you think it may be running. It has really helped me identify good candidates for moving to using the thread pool as well as pointing out areas of the code that are taking a surprisingly large amount of frame time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2009/04/21/direct3d-11-multithreading/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Energy Conservation In Games</title>
		<link>http://www.rorydriscoll.com/2009/01/25/energy-conservation-in-games/</link>
		<comments>http://www.rorydriscoll.com/2009/01/25/energy-conservation-in-games/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 05:53:24 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=262</guid>
		<description><![CDATA[Recently at work I was chatting with a colleague, and the topic of energy conservation for specular reflections came up. This reminded me that I&#8217;ve been sitting on a blog post for a while about just this subject, so I thought it was time to finish it. First of all, I&#8217;d like to start by [...]]]></description>
			<content:encoded><![CDATA[<p>Recently at work I was chatting with a colleague, and the topic of energy conservation for specular reflections came up. This reminded me that I&#8217;ve been sitting on a blog post for a while about just this subject, so I thought it was time to finish it.</p>
<p>First of all, I&#8217;d like to start by looking at the standard diffuse reflection model. In games, the typical formula for calculating diffuse reflection from a particular light is:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L_o = C_d L_i (\vec{N} \cdot \vec{L})"><img title="L_o = C_d L_i (\vec{N} \cdot \vec{L})" src="http://latex.codecogs.com/gif.latex?L_o&amp;space;=&amp;space;C_d&amp;space;L_i&amp;space;(\vec{N}&amp;space;\cdot&amp;space;\vec{L})" alt="" border="0" /></a></p>
<p>Where Cd is the diffuse material color, Li is the light color, <em>N</em> is the normal, and <em>L</em> is the normalized direction to the light. What&#8217;s the problem with this? Well, it&#8217;s not energy conserving. In itself, this isn&#8217;t really a problem since we don&#8217;t calculate multiple bounces of light in games, so we&#8217;re not adding energy to the scene as light bounces around, as would happen in a ray tracer. It&#8217;s a good starting point for discussion though.<br />
<span id="more-262"></span></p>
<h2>Energy Conservation</h2>
<p>As the name suggests, energy conservation is a restriction on the reflection model that requires that the total amount of reflected light cannot be more than the incoming light. It sounds sensible, but it&#8217;s often not practiced. A more formal way of stating this restriction is:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\int_{\Omega} \rho(\vec{x}, \phi, \theta) L_i \cos \theta \,\delta\omega\leq L_i"><img title="\int_{\Omega} \rho(\vec{x}, \phi, \theta) L_i \cos \theta \,\delta\omega\leq L_i" src="http://latex.codecogs.com/gif.latex?\int_{\Omega}&amp;space;\rho(\vec{x},&amp;space;\phi,&amp;space;\theta)&amp;space;L_i&amp;space;\cos&amp;space;\theta&amp;space;\,\delta\omega\leq&amp;space;L_i" alt="" border="0" /></a></p>
<p>The function ρ represents the bidirectional reflection distribution function (BRDF), and could be anything from a simple Lambertian diffuse model, to a complicated microfacet model. Either way, the energy conservation restriction still stands.</p>
<h3>Diffuse Energy Conservation</h3>
<p>I&#8217;m going to show why the diffuse lighting equation above isn&#8217;t energy conserving, and how to make it so. Let&#8217;s start by replacing the BRDF with the constant diffuse color:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\int_{\Omega} C_d L_i \cos \theta \, \delta\omega \leq L_i"><img title="\int_{\Omega} C_d L_i \cos \theta \, \delta\omega \leq L_i" src="http://latex.codecogs.com/gif.latex?\int_{\Omega}&amp;space;C_d&amp;space;L_i&amp;space;\cos&amp;space;\theta&amp;space;\,&amp;space;\delta\omega&amp;space;\leq&amp;space;L_i" alt="" border="0" /></a></p>
<p>The incoming light direction is fixed here, and we are integrating over outgoing directions. Because of this, both Cd and Li are constant over the integral, and can be pulled outside. Also, the incoming light appears on both sides of the inequality, so we can divide by Li leaving:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=C_d \int_{\Omega} \cos \theta \, \delta\omega \leq 1"><img title="C_d \int_{\Omega} \cos \theta \, \delta\omega \leq 1" src="http://latex.codecogs.com/gif.latex?C_d&amp;space;\int_{\Omega}&amp;space;\cos&amp;space;\theta&amp;space;\,&amp;space;\delta\omega&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>This integral can be solved analytically. First of all, we need to rewrite it as a double integral of the two polar coordinates φ and θ:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=C_d \int_{\phi = 0}^{2\pi}{\int_{\theta = 0}^{\frac{\pi}{2}}{\cos \theta \sin \theta\, \delta \theta}\, \delta \phi} \leq 1"><img title="C_d \int_{\phi = 0}^{2\pi}{\int_{\theta = 0}^{\frac{\pi}{2}}{\cos \theta \sin \theta\, \delta \theta}\, \delta \phi} \leq 1" src="http://latex.codecogs.com/gif.latex?C_d&amp;space;\int_{\phi&amp;space;=&amp;space;0}^{2\pi}{\int_{\theta&amp;space;=&amp;space;0}^{\frac{\pi}{2}}{\cos&amp;space;\theta&amp;space;\sin&amp;space;\theta\,&amp;space;\delta&amp;space;\theta}\,&amp;space;\delta&amp;space;\phi}&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>The extra sin θ may seem a little bit confusing at first, but it&#8217;s necessary to take into account the smaller area towards the polar region. Using the double angle <a href="http://en.wikipedia.org/wiki/List_of_trigonometric_identities">trigonometric identity</a>, this is the same as:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\frac{1}{2}C_d \int_{\phi = 0}^{2\pi}{\int_{\theta = 0}^{\frac{\pi}{2}}{\sin 2 \theta \, \delta \theta}\, \delta \phi} \leq 1"><img title="\frac{1}{2}C_d \int_{\phi = 0}^{2\pi}{\int_{\theta = 0}^{\frac{\pi}{2}}{\sin 2 \theta \, \delta \theta}\, \delta \phi} \leq 1" src="http://latex.codecogs.com/gif.latex?\frac{1}{2}C_d&amp;space;\int_{\phi&amp;space;=&amp;space;0}^{2\pi}{\int_{\theta&amp;space;=&amp;space;0}^{\frac{\pi}{2}}{\sin&amp;space;2&amp;space;\theta&amp;space;\,&amp;space;\delta&amp;space;\theta}\,&amp;space;\delta&amp;space;\phi}&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>Now we can start using some <a href="http://en.wikipedia.org/wiki/List_of_integrals_of_trigonometric_functions">trigonometric integrals</a> to integrate, firstly over θ:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\frac{1}{2}C_d \int_{\phi = 0}^{2\pi}{\left[ -\frac{1}{2} \cos 2 \theta \right]_{\theta=0}^{\frac{\pi}{2}} \, \delta \phi} \leq 1"><img title="\frac{1}{2}C_d \int_{\phi = 0}^{2\pi}{\left[ -\frac{1}{2} \cos 2 \theta \right]_{\theta=0}^{\frac{\pi}{2}} \, \delta \phi} \leq 1" src="http://latex.codecogs.com/gif.latex?\frac{1}{2}C_d&amp;space;\int_{\phi&amp;space;=&amp;space;0}^{2\pi}{\left[&amp;space;-\frac{1}{2}&amp;space;\cos&amp;space;2&amp;space;\theta&amp;space;\right]_{\theta=0}^{\frac{\pi}{2}}&amp;space;\,&amp;space;\delta&amp;space;\phi}&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>The inner integral evaluates to exactly one, so it drops out:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\frac{1}{2}C_d \int_{\phi = 0}^{2\pi}{1 \, \delta \phi} \leq 1"><img title="\frac{1}{2}C_d \int_{\phi = 0}^{2\pi}{1 \, \delta \phi} \leq 1" src="http://latex.codecogs.com/gif.latex?\frac{1}{2}C_d&amp;space;\int_{\phi&amp;space;=&amp;space;0}^{2\pi}{1&amp;space;\,&amp;space;\delta&amp;space;\phi}&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>So now, integrate over φ:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\frac{1}{2}C_d \left[ \phi \right]_{\phi = 0}^{2\pi} \leq 1"><img title="\frac{1}{2}C_d \left[ \phi \right]_{\phi = 0}^{2\pi} \leq 1" src="http://latex.codecogs.com/gif.latex?\frac{1}{2}C_d&amp;space;\left[&amp;space;\phi&amp;space;\right]_{\phi&amp;space;=&amp;space;0}^{2\pi}&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>This gives us the final inequality:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\pi C_d \leq 1"><img title="\pi C_d \leq 1" src="http://latex.codecogs.com/gif.latex?\pi&amp;space;C_d&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>Assuming that we want to keep our diffuse material color in the range [0,1], all this says is that we need to divide it by π in order to remain energy conserving. Since this is just a constant scale, it might not be worth doing in a game if the only lighting model it uses is diffuse. Most games use a more sophisticated reflection model though, at least including specular reflections.</p>
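<p>If you would rather check the result numerically than trust the algebra, the following Python sketch integrates the cosine term over the hemisphere with a simple midpoint rule. The function name and step count are my own choices for illustration; the estimate should land very close to π.</p>

```python
import math


def hemisphere_cosine_integral(steps=1000):
    """Midpoint-rule estimate of the integral of cos(theta) over the hemisphere."""
    d_theta = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        # sin(theta) is the solid angle weighting from the polar coordinates.
        total += math.cos(theta) * math.sin(theta) * d_theta
    # The integrand does not depend on phi, so that integral is just 2 * pi.
    return 2.0 * math.pi * total


integral = hemisphere_cosine_integral()
print(integral, math.pi)

# Dividing a diffuse color in [0, 1] by pi keeps the reflected total at or below one.
diffuse_color = 0.8
print(diffuse_color / math.pi * integral)
```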
<h3>Specular Energy Conservation</h3>
<p>The standard Blinn-Phong specular model is also not energy conserving. In fact, in some ways it is even worse than the diffuse model, because as you increase the specular power, you lose more and more energy. A manifestation of this problem can be that artists find it hard to get a really tight specular highlight.</p>
<p>Here&#8217;s an example of a sphere rendered at three different specular powers. Notice how much light there appears to be on the left image, and how it seems to have disappeared on the right, even though it&#8217;s supposed to be more focused. It&#8217;s at this point that artists start to ask to ramp the specular reflection color over one to compensate. This isn&#8217;t a good idea!</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-268" title="nonenergyconserving" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/nonenergyconserving.png" alt="nonenergyconserving" width="610" height="205" /></p>
<p>Instead of boosting the specular reflection color, you can switch to an energy conserving specular model. If you do this, the same spheres with the same specular powers now look like this:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-269" title="energyconserving" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/energyconserving.png" alt="energyconserving" width="610" height="205" /></p>
<p>The specular reflection on the sphere on the left is actually dimmer than with the non-conserving model. This is because the non-conserving specular model reflected too much light in this case. As the specular power increases, this time we compensate for the energy loss, and you get a really nice and tight hotspot on the right.</p>
<p>The typical Blinn-Phong reflection model is:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L_o = C_s L_i (\vec{N} \cdot \vec{H})^n"><img title="L_o = C_s L_i (\vec{N} \cdot \vec{H})^n" src="http://latex.codecogs.com/gif.latex?L_o&amp;space;=&amp;space;C_s&amp;space;L_i&amp;space;(\vec{N}&amp;space;\cdot&amp;space;\vec{H})^n" alt="" border="0" /></a></p>
<p>Somebody who knows more than me has worked out that to ensure energy conservation, the normalization factor is:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\frac{n + 2}{2\pi}"><img title="\frac{n + 2}{2\pi}" src="http://latex.codecogs.com/gif.latex?\frac{n&amp;space;+&amp;space;2}{2\pi}" alt="" border="0" /></a></p>
<p>[Edit: Thanks to Fabian "ryg" Giesen for showing that this is actually the normalization factor for the regular Phong specular model, not Blinn-Phong. A commonly used normalization factor (according to Real-Time Rendering) for Blinn-Phong is shown below.]</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=\frac{n + 8}{8\pi}"><img title="\frac{n + 8}{8\pi}" src="http://latex.codecogs.com/gif.latex?\frac{n&amp;space;+&amp;space;8}{8\pi}" alt="" border="0" /></a></p>
<p>So this makes the energy conserving function for Blinn-Phong specular reflection:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=L_o = C_s L_i \frac{n + 8}{8\pi} (\vec{N} \cdot \vec{H})^n"><img title="L_o = C_s L_i \frac{n + 8}{8\pi} (\vec{N} \cdot \vec{H})^n" src="http://latex.codecogs.com/gif.latex?L_o&amp;space;=&amp;space;C_s&amp;space;L_i&amp;space;\frac{n&amp;space;+&amp;space;8}{8\pi}&amp;space;(\vec{N}&amp;space;\cdot&amp;space;\vec{H})^n" alt="" border="0" /></a></p>
<p>I&#8217;d love to know how this was derived, but I haven&#8217;t found anything so far that explains it. The integral for raising a cosine function to a power gets pretty hairy very quickly. All I know is that it appears to work. If anyone reading this knows why, then please let me know!</p>
<p>[Edit: Fabian "ryg" Giesen very kindly posted up a derivation of the normalization factor for both the regular Phong, and Blinn-Phong specular reflection models <a href="http://www.farbrausch.de/~fg/articles/phong.pdf">here</a>. Interestingly he doesn't come up with the exact same answer for the Blinn-Phong normalization factor as shown in Real-Time Rendering. Check out the comments below to find out more about this.]</p>
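<p>To see what the normalization factor does in practice, here is a small Python sketch of the Blinn-Phong term with and without the (n + 8) / 8π scale. The function name and parameters are hypothetical, and the colors are treated as single scalar channels to keep it short.</p>

```python
import math


def blinn_phong_specular(n_dot_h, power, specular_color=1.0, light_color=1.0,
                         normalized=True):
    """Blinn-Phong specular term, optionally scaled by (n + 8) / (8 * pi)."""
    n_dot_h = max(n_dot_h, 0.0)
    scale = (power + 8.0) / (8.0 * math.pi) if normalized else 1.0
    return specular_color * light_color * scale * n_dot_h ** power


# Peak value of the highlight (N.H = 1) for a few specular powers.
for power in (4, 32, 256):
    plain = blinn_phong_specular(1.0, power, normalized=False)
    conserving = blinn_phong_specular(1.0, power)
    print(power, plain, conserving)
```

<p>The unnormalized peak is always one, whereas the normalized peak starts below one for low powers and grows with the power, which matches the dimmer left sphere and the tighter, brighter hotspot on the right.</p>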
<h3>Combined Diffuse and Specular</h3>
<p>Once you have energy-conserving models for diffuse and specular,  it&#8217;s easy to make sure that the combined model is also energy conserving. You just need to make sure that your diffuse and specular material colors don&#8217;t sum to more than one:</p>
<p><a href="http://www.codecogs.com/eqnedit.php?latex=C_d + C_s \leq 1"><img title="C_d + C_s \leq 1" src="http://latex.codecogs.com/gif.latex?C_d&amp;space;+&amp;space;C_s&amp;space;\leq&amp;space;1" alt="" border="0" /></a></p>
<p>This means that if you want your material to have more specular, you may have to reduce the diffuse. Here&#8217;s a range of variations of the same material, going from 100% diffuse to 100% specular:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-277" title="diffusetospecular" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/diffusetospecular.png" alt="diffusetospecular" width="614" height="154" /></p>
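<p>One possible way to enforce this constraint in a material system is sketched below in Python. Reducing the diffuse color rather than the specular color is just one policy, chosen here because it matches the trade-off described above; the function name is hypothetical.</p>

```python
def conserve_energy(diffuse, specular):
    """Limit the diffuse color per channel so diffuse + specular stays at or below one."""
    limited = tuple(min(d, max(1.0 - s, 0.0)) for d, s in zip(diffuse, specular))
    return limited, specular


diffuse, specular = conserve_energy((0.8, 0.5, 0.2), (0.5, 0.5, 0.5))
print(diffuse)   # (0.5, 0.5, 0.2)
print(specular)  # (0.5, 0.5, 0.5)
```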
<h2>Is It Worth It For Games?</h2>
<p>Clearly, the specular model benefits significantly from being energy conserving so I think most people would say that it&#8217;s worth it. Switching to a model where the diffuse and specular have to compete for energy might not be though, since it can be harder to tweak. I personally use this kind of model for my projects at home, but that&#8217;s because I don&#8217;t have artists to please.</p>
<p>One thing I do like about an energy conserving reflection model is that it enforces some kind of reasonable limits to the material reflections. This might help to make materials created by different people sit better together.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2009/01/25/energy-conservation-in-games/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
		<item>
		<title>Irradiance Caching: Part 2</title>
		<link>http://www.rorydriscoll.com/2009/01/24/irradiance-caching-part-2/</link>
		<comments>http://www.rorydriscoll.com/2009/01/24/irradiance-caching-part-2/#comments</comments>
		<pubDate>Sat, 24 Jan 2009 22:45:17 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Global Illumination]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Irradiance Caching]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=200</guid>
		<description><![CDATA[In my previous post, I wrote very briefly about an  important improvement to the irradiance caching algorithm &#8211; irradiance gradients &#8211; and I&#8217;m going to expand on rotational gradients this time. Gradients The gradient of a function represents both the direction and rate of change of that function as the inputs vary. For a one [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post, I wrote very briefly about an  important improvement to the irradiance caching algorithm &#8211; irradiance gradients &#8211; and I&#8217;m going to expand on rotational gradients this time.</p>
<h2>Gradients</h2>
<p>The gradient of a function represents both the direction and rate of change of that function as the inputs vary. For a one dimensional function this is simply the derivative of the function. As you move into higher dimensions, you need to consider which coordinate system the inputs for the function are specified in, as this will change how you need to calculate the gradient.</p>
<p>For now, I&#8217;m just going to focus on calculating the gradient of a function defined using normalized spherical coordinates. Unfortunately, there&#8217;s no real standard way to define spherical coordinates, and despite similar looking symbols, the values are often interchanged. I&#8217;m going to define the spherical coordinates on the unit sphere as azimuthal value φ [0, 2π), and polar value θ [0, π].</p>
<p style="text-align: center;"><img class="size-full wp-image-238 aligncenter" title="sphericalcoordinates" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/sphericalcoordinates.png" alt="sphericalcoordinates" width="290" height="264" /></p>
<p><span id="more-200"></span>When dealing with multiple dimensions, you can calculate the gradient by splitting the gradient calculation into multiple partial derivatives and summing them with appropriate vector weights. For normalized spherical coordinates the gradient is:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\nabla&amp;space;f(\phi,&amp;space;\theta)&amp;space;=&amp;space;\frac{\delta&amp;space;f}{\delta&amp;space;\theta}&amp;space;\vec{v_\theta}&amp;space;+&amp;space;\frac{1}{\sin&amp;space;\theta}\frac{\delta&amp;space;f}{\delta&amp;space;\phi}\vec{v_\phi}" alt="" /></p>
<p>The function <em>f</em> represents a scalar field, so the gradient is a vector field. Each vector points in the direction of greatest increase. Much like the integrals of functions described using spherical coordinates, you have to take care to weight the azimuthal contribution by the sine of the polar angle.</p>
<p>As a real-world example of using gradients, let&#8217;s calculate the gradient for a simple function:</p>
<p><img src="http://latex.codecogs.com/gif.latex?f(\phi,&amp;space;\theta)&amp;space;=&amp;space;\cos&amp;space;\theta" border="0" alt="" /></p>
<p>The derivatives of the function for each argument are easy to calculate:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\frac{\delta&amp;space;f}{\delta&amp;space;\theta}&amp;space;=&amp;space;-\sin&amp;space;\theta" border="0" alt="" /></p>
<p><img src="http://latex.codecogs.com/gif.latex?\frac{\delta&amp;space;f}{\delta&amp;space;\phi}&amp;space;=&amp;space;0" border="0" alt="" /></p>
<p>Combining these together with the previous definition, you can calculate the gradient vector at any point on the unit sphere for <em>f </em>using:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\nabla&amp;space;f(\phi,&amp;space;\theta)&amp;space;=&amp;space;-\vec{v_\theta}&amp;space;\sin&amp;space;\theta" border="0" alt="" /></p>
<p>There&#8217;s lots more to be learned about gradients, and a good start would be <a href="http://en.wikipedia.org/wiki/Gradient">Wikipedia</a>, and also <a href="http://www-math.mit.edu/18.013A/HTML/chapter09/section04.html">this page</a> on the MIT website.</p>
<h2>Rotational Irradiance Gradient</h2>
<p>The irradiance contribution from a direction on the hemisphere about the surface normal, specified using spherical coordinates φ [0, 2π) and θ [0, π / 2) is:</p>
<p><img src="http://latex.codecogs.com/gif.latex?f&amp;space;(\phi,&amp;space;\theta)&amp;space;=&amp;space;L(\phi,&amp;space;\theta)&amp;space;\cos&amp;space;\theta" alt="" /></p>
<p>Where L is the incident radiance in the supplied direction. So to calculate the gradient vector at any point on the hemisphere, you just need to evaluate:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\nabla&amp;space;f(\phi,&amp;space;\theta)&amp;space;=&amp;space;-\vec{v_\theta}L(\phi,&amp;space;\theta)&amp;space;\sin&amp;space;\theta" alt="" /></p>
<p>This just calculates the gradient in a specific direction, but for the irradiance gradient we need to calculate the average gradient over the entire hemisphere. We can do this at the same time as we calculate the irradiance by using a similar Monte Carlo estimator. We want to share the sampling strategy between the irradiance calculation and the rotational gradient calculation, so we&#8217;re stuck using the same pdf:</p>
<p><img src="http://latex.codecogs.com/gif.latex?p(x)&amp;space;=&amp;space;\frac&amp;space;{\cos&amp;space;\theta}{\pi}" alt="" /></p>
<p>So the estimator for the irradiance gradient becomes:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\nabla&amp;space;E&amp;space;\approx&amp;space;\frac&amp;space;{1}{N}&amp;space;\sum_{i=1}^{N}{&amp;space;\frac&amp;space;{-\vec{v_\theta}L_i\sin&amp;space;\theta}{\frac{\cos&amp;space;\theta}{\pi}}&amp;space;}" alt="" /></p>
<p>Which collapses down to:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\nabla&amp;space;E&amp;space;\approx&amp;space;\frac&amp;space;{\pi}{N}&amp;space;\sum_{i=1}^{N}{-\vec{v_\theta}L_i\tan&amp;space;\theta}" alt="" /></p>
<p>The vector <em>v</em> is a unit vector on the plane of the hemisphere pointing in the perpendicular direction to the angle φ.  There are two perpendicular vectors to φ, and which one you decide to use depends on the order you do the cross product on when evaluating the gradient. Using the left hand rule for rotation, I&#8217;m doing a clockwise rotation of φ by ninety degrees.</p>
<h2>Using the Rotational Irradiance Gradient</h2>
<p>The irradiance estimate at a point is defined by a weighted sum of irradiance cache entries:</p>
<p><img src="http://latex.codecogs.com/gif.latex?E&amp;space;=&amp;space;\frac{\sum_{1}^{N}{w_i&amp;space;E_i}}{\sum_{1}^{N}{w_i}}" alt="" /></p>
<p>Now we have the rotational gradient, we can use it to improve the estimate. The cross product of the surface normal and cache entry normal represents both a direction and magnitude of rotational difference. We then project this difference onto the irradiance gradient to calculate how much the irradiance is changing in that direction:</p>
<p><img src="http://latex.codecogs.com/gif.latex?E&amp;space;=&amp;space;\frac{\sum_{1}^{N}{w_i&amp;space;(E_i&amp;space;+&amp;space;\nabla&amp;space;E_i&amp;space;\cdot&amp;space;(\vec{n_i}&amp;space;\times&amp;space;\vec{n}))}}{\sum_{1}^{N}{w_i}}" alt="" /></p>
<p>Note that conceptually this calculation needs to be performed once for each color channel, since the gradient of the irradiance is really the gradient of three scalar fields &#8211; red, green and blue. In practice it&#8217;s easier to assume that the gradient is a three dimensional vector of colors, rather than scalars.</p>
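<p>The gradient-corrected interpolation can be sketched in a few lines of Python. This is a minimal single-channel version under my own assumptions: the weights are taken as already computed, and the tuple layout of each entry is hypothetical.</p>

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def interpolate_irradiance(normal, entries):
    """entries: iterable of (weight, irradiance, gradient, entry_normal) for one channel."""
    weighted_sum = 0.0
    weight_sum = 0.0
    for weight, irradiance, gradient, entry_normal in entries:
        # n_i x n encodes the axis and amount of rotation between the normals.
        rotation = cross(entry_normal, normal)
        weighted_sum += weight * (irradiance + dot(gradient, rotation))
        weight_sum += weight
    return weighted_sum / weight_sum if weight_sum > 0.0 else None


up = (0.0, 0.0, 1.0)
entries = [(1.0, 2.0, (5.0, 5.0, 5.0), up), (3.0, 4.0, (1.0, 1.0, 1.0), up)]
# Identical normals give a zero cross product, so this is the plain weighted average.
print(interpolate_irradiance(up, entries))  # 3.5
```

<p>For color, the same function would be run once per channel, or the scalar irradiance and gradient would be replaced by per-channel values.</p>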
<h2>Implementation</h2>
<p>The implementation is pretty straightforward, but there are a couple of optimizations that can be made because we are using a unit hemisphere for sampling. Assuming that we have the sample direction in Cartesian coordinates in local space, where the z-axis points in the polar direction, we can get the cos weighting from the z coordinate:</p>
<p><img title="z = \cos \theta" src="http://latex.codecogs.com/gif.latex?z&amp;space;=&amp;space;\cos&amp;space;\theta" alt="" /></p>
<p>Also, we can get the sine weighted projected unit vector by simply setting the z value to zero, since:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\lvert&amp;space;(x,&amp;space;y,&amp;space;0)&amp;space;\rvert&amp;space;=&amp;space;\sin&amp;space;\theta" alt="" /></p>
<p>So, to get the local space tan weighted perpendicular vector, we just need to use the following:</p>
<p><img src="http://latex.codecogs.com/gif.latex?\vec{v_\theta}&amp;space;\tan&amp;space;\theta&amp;space;=&amp;space;(\frac{y}{z},&amp;space;-\frac{x}{z},&amp;space;0)" alt="" /></p>
<p>Note that the x and y values were swapped, and the x value negated to get the perpendicular vector.</p>
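<p>Putting the estimator and the local-space shortcut together gives something like the Python sketch below. It is not my renderer&#8217;s code: the radiance callback stands in for tracing a ray, the sampling routine is a standard cosine-weighted hemisphere sampler, and all of the names are hypothetical. Radiance is a single scalar channel here.</p>

```python
import math
import random


def cosine_sample_hemisphere(rng):
    """Local-space unit direction drawn with pdf cos(theta) / pi (z is the normal)."""
    u1, u2 = rng.random(), rng.random()
    radius = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    # 1 - u1 is never zero because random() returns values in [0, 1).
    return radius * math.cos(phi), radius * math.sin(phi), math.sqrt(1.0 - u1)


def tan_weighted_perpendicular(x, y, z):
    """v_theta * tan(theta) for a unit direction (x, y, z) with z above zero."""
    return (y / z, -x / z, 0.0)


def estimate_irradiance_and_gradient(radiance, sample_count, rng):
    irradiance = 0.0
    gradient = [0.0, 0.0, 0.0]
    for _ in range(sample_count):
        x, y, z = cosine_sample_hemisphere(rng)
        incoming = radiance(x, y, z)
        # With the cos / pi pdf, each irradiance sample reduces to pi * L_i.
        irradiance += incoming
        perpendicular = tan_weighted_perpendicular(x, y, z)
        for axis in range(3):
            gradient[axis] -= incoming * perpendicular[axis]
    scale = math.pi / sample_count
    return irradiance * scale, [component * scale for component in gradient]


rng = random.Random(1)
irradiance, gradient = estimate_irradiance_and_gradient(lambda x, y, z: 1.0, 1024, rng)
print(irradiance, gradient)
```

<p>Both sums share the same samples, so the gradient costs only a few extra multiplies per ray. Samples near the horizon have a very small z, so the tan weighting can make individual contributions large.</p>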
<h2>Results</h2>
<p>I&#8217;ve rendered out a before and after shot, showing just the indirect irradiance. There are no translational gradients being used at the moment, so there are still some artefacts. Here&#8217;s the before shot:</p>
<p><img class="aligncenter size-full wp-image-231" title="nogradient" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/nogradient.png" alt="nogradient" width="656" height="396" /></p>
<p>And here&#8217;s the same render, but with rotational gradients this time:</p>
<p><img class="aligncenter size-full wp-image-232" title="rotationalgradient" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/rotationalgradient.png" alt="rotationalgradient" width="656" height="396" /></p>
<p>Note that the time to render each frame is almost exactly the same, yet the rotational gradients provide a much smoother result. Next I&#8217;ll implement translational gradients and hopefully the image will look considerably better.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2009/01/24/irradiance-caching-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Irradiance Caching: Part 1</title>
		<link>http://www.rorydriscoll.com/2009/01/18/irradiance-caching-part-1/</link>
		<comments>http://www.rorydriscoll.com/2009/01/18/irradiance-caching-part-1/#comments</comments>
		<pubDate>Mon, 19 Jan 2009 01:35:35 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Global Illumination]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Irradiance Caching]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=166</guid>
		<description><![CDATA[Solving the rendering equation with even just one bounce of indirect lighting can take a long time. The majority of time spent rendering a frame is in estimating the lighting integral. For example, rendering a single bounce of indirect lighting at 720p resolution with 256 sample rays for a Monte Carlo estimator requires about 236 [...]]]></description>
			<content:encoded><![CDATA[<p>Solving the rendering equation with even just one bounce of indirect lighting can take a long time. The majority of time spent rendering a frame is in estimating the lighting integral. For example, rendering a single bounce of indirect lighting at 720p resolution with 256 sample rays for a Monte Carlo estimator requires about 236 million rays to be cast. This doesn&#8217;t even include the rays needed for sampling the lights for direct lighting, so in practice, the total will be even higher.</p>
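<p>The ray count is simple to reproduce. The Python snippet below assumes that 720p means a 1280 by 720 image with one shading point per pixel.</p>

```python
# Assumes "720p" means a 1280 x 720 image with one shading point per pixel.
width, height = 1280, 720
rays_per_pixel = 256

indirect_rays = width * height * rays_per_pixel
print(indirect_rays)  # 235929600
```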
<p>One interesting observation made by Greg Ward in his <a href="http://radsite.lbl.gov/radiance/papers/sg88/paper.html">Siggraph &#8217;88 paper</a> is that contrary to direct lighting, where shadows and lights can cause harsh changes, the indirect lighting on a surface tends to vary relatively slowly. One way to picture why is to imagine computing the average color of what you can see from each of your eyes. Even though each eye has a slightly different view of the world, the images they see are very similar, and so the average color is also nearly the same.</p>
<p><span id="more-166"></span></p>
<p>The image below shows the same scene from my previous post with just the indirect irradiance, and it&#8217;s pretty clear that for each surface, the lighting varies in a very smooth fashion.</p>
<p><img class="aligncenter size-full wp-image-167" title="finalgatherindirectonly" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/finalgatherindirectonly.png" alt="finalgatherindirectonly" width="656" height="396" /></p>
<p>Ward proposed using this knowledge to reduce the number of times that the Monte Carlo estimator was evaluated by interpolating between nearby previously calculated values. At the time he just called it &#8216;lazy evaluation&#8217;, which I personally think is a good way to picture the idea. Later it became known as irradiance caching.</p>
<h2>Irradiance Caching</h2>
<p>The basic concept for irradiance caching is really simple: For each point on a surface at which you want to evaluate irradiance, if the cache contains any valid entries then interpolate between them. Otherwise, calculate a new irradiance entry, and add it to the cache.</p>
<p>A cache entry contains the position and normal for the point on the surface where the irradiance was evaluated as well as the irradiance value itself. One important additional piece of information that the cache requires is the range over which the entry is considered potentially valid. This range could be calculated in a number of ways, but the most common one is to use the <a href="http://en.wikipedia.org/wiki/Harmonic_mean">harmonic mean</a> of the hit distance of the rays used for the estimator. For N estimator samples, where sample i has hit distance d<sub>i</sub>, the harmonic mean is simply:</p>
<p><img src="http://latex.codecogs.com/gif.latex?H = \frac{N}{\sum_{1}^{N}{\frac{1}{d_i}}}" title="H = \frac{N}{\sum_{1}^{N}{\frac{1}{d_i}}}" /></p>
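<p>Here is a minimal sketch of that calculation, assuming the caller only passes in distances for rays that actually hit something. The function name is made up for illustration.</p>

```cpp
#include <cassert>
#include <vector>

// Harmonic mean of the ray hit distances gathered for one irradiance estimate.
// Assumes every distance is strictly positive; rays that miss the scene are
// expected to be left out by the caller (an assumption of this sketch).
float HarmonicMeanDistance(const std::vector<float>& hitDistances)
{
	assert(!hitDistances.empty());

	float sumOfReciprocals = 0.0f;
	for (float d : hitDistances)
	{
		assert(d > 0.0f);
		sumOfReciprocals += 1.0f / d;
	}

	return static_cast<float>(hitDistances.size()) / sumOfReciprocals;
}
```

<p>Because it sums reciprocals, a single short hit dominates the result: distances of 0.1 and 10 give a harmonic mean of about 0.2, which is what shrinks the valid range near corners.</p>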
<p>Using the harmonic mean distance makes the cache entry distribution very dense in corners and crevices, and sparse in open spaces. This matches up very well with where the indirect irradiance is likely to be changing the fastest. To get an idea of how the cache entry distribution looks, here&#8217;s the scene above with the cache entry positions shown as red dots:</p>
<p><img class="aligncenter size-full wp-image-176" title="irradiancecachepoints" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/irradiancecachepoints.png" alt="irradiancecachepoints" width="656" height="396" /></p>
<p>Once you can add entries into the cache, you need to know how to find whether or not a particular cache entry can be used for interpolating the irradiance at a sample point. There are potentially quite a few ways that you can discard invalid cache entries depending on how fancy you want to get. For now, I&#8217;m using three simple tests.</p>
<p>Discard the entry if any of the following are true:</p>
<ul>
<li>It is out of range of the sample point.</li>
<li>It has a normal that is too different from the sample normal.</li>
<li>It is in front of the sample point.</li>
</ul>
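<p>The three tests could be sketched roughly as follows. The Vec3 and CacheEntry types, the IsUsable name and the minCosAngle setting are all hypothetical stand-ins. The in-front test uses the sum of the two normals, which follows my reading of Ward&#8217;s paper rather than anything spelled out above.</p>

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-ins for a renderer's own math types (hypothetical).
struct Vec3 { float x, y, z; };

Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float Length(const Vec3& v) { return std::sqrt(Dot(v, v)); }

struct CacheEntry
{
	Vec3 position;
	Vec3 normal;      // unit length
	float irradiance; // a single channel, for brevity
	float range;      // harmonic mean hit distance
};

// Returns true if the entry survives all three rejection tests.
// minCosAngle is a hypothetical user setting: the smallest allowed N . Ni.
bool IsUsable(const CacheEntry& entry, const Vec3& p, const Vec3& n, float minCosAngle)
{
	assert(entry.range > 0.0f);

	const Vec3 toSample = p - entry.position;

	// 1. The entry is out of range of the sample point.
	if (Length(toSample) > entry.range)
		return false;

	// 2. The entry normal is too different from the sample normal.
	if (Dot(n, entry.normal) < minCosAngle)
		return false;

	// 3. The entry is in front of the sample point: the sample lies behind
	//    the entry when measured along the summed normals.
	if (Dot(toSample, n + entry.normal) < 0.0f)
		return false;

	return true;
}
```

<p>In practice the third test probably wants a small tolerance so that points on the same flat surface are not rejected by rounding error.</p>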
<p>Once you have a valid cache entry, you need to calculate a weight for that entry, then carry on looking for other entries that are potentially valid. As you come across each valid cache entry, you need to keep the sum of the weighted irradiance values, and the sum of the weights themselves. From these two sums, you can calculate the final interpolated irradiance:</p>
<p><img src="http://latex.codecogs.com/gif.latex?E = \frac{\sum_{1}^{N}{w_i E_i}}{\sum_{1}^{N}{w_i}}" title="E = \frac{\sum_{1}^{N}{w_i E_i}}{\sum_{1}^{N}{w_i}}" /></p>
<p>The weight for a particular cache entry is another part of the algorithm that can potentially be calculated in many different ways. For now, I&#8217;m using the weight that Ward proposes, but there&#8217;s some interesting information about the weights used at Dreamworks in <a href="http://www.graphics.cornell.edu/~jaroslav/papers/2008-irradiance_caching_class/10-EricSlides.pdf">this paper</a>. Here&#8217;s Ward&#8217;s initial weighting function:</p>
<p><img src="http://latex.codecogs.com/gif.latex?w_i = \frac{1}{\frac{\lvert \vec{P} - \vec{P_i} \rvert}{r_i} + \sqrt{1 - \vec{N} \cdot \vec{N_i}}}" title="w_i = \frac{1}{\frac{\lvert \vec{P} - \vec{P_i} \rvert}{r_i} + \sqrt{1 - \vec{N} \cdot \vec{N_i}}}" /></p>
<p>Note that you have to be a little bit wary of this function, since it is unbounded. When the sample point coincides with the cache entry and the two normals match, both terms in the denominator are zero and you will get a divide by zero.</p>
<p>Typically, you would also discard cache entries that are below some weight threshold as specified by the user. This effectively scales the density of the cache entries and allows the user to make the trade-off between speed and quality.</p>
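<p>Putting the weight, the threshold and the two running sums together might look something like the sketch below. It uses Ward&#8217;s weight with the square root of one minus the dot product of the normals. The names, the WeightedSample struct and the clamp on the denominator are assumptions made for illustration.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Ward's weight, from the distance between the sample and the entry, the
// entry's valid range, and the dot product of the two unit normals.
float WardWeight(float distance, float range, float cosNormals)
{
	assert(range > 0.0f);

	const float kMinDenominator = 1e-6f; // hypothetical guard against divide by zero
	const float denominator = distance / range + std::sqrt(std::max(0.0f, 1.0f - cosNormals));

	return 1.0f / std::max(denominator, kMinDenominator);
}

struct WeightedSample
{
	float weight;
	float irradiance; // a single channel, for brevity
};

// Weighted average of every candidate at or above minWeight.
// Returns false on a cache miss, when no candidate is good enough.
bool Interpolate(const std::vector<WeightedSample>& candidates, float minWeight, float& outIrradiance)
{
	float weightedSum = 0.0f;
	float weightSum = 0.0f;

	for (const WeightedSample& s : candidates)
	{
		if (s.weight < minWeight)
			continue;

		weightedSum += s.weight * s.irradiance;
		weightSum += s.weight;
	}

	if (weightSum <= 0.0f)
		return false;

	outIrradiance = weightedSum / weightSum;
	return true;
}
```

<p>A miss is the signal to evaluate a fresh irradiance value at the sample point and add it to the cache.</p>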
<h2>Implementation</h2>
<p>I&#8217;ve made a very bare bones implementation of irradiance caching as outlined above. At the moment I&#8217;m not using a quad tree to store the cache entries, so each cache check requires iterating through an array of entries. Clearly this is a very slow way to process the cache entries, but for now it does a decent enough job to allow me to focus on the irradiance caching algorithm itself. Here are the results:</p>
<p><img class="aligncenter size-full wp-image-186" title="irradiancecacheindirectonly" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/irradiancecacheindirectonly.png" alt="irradiancecacheindirectonly" width="656" height="396" /></p>
<p>Not very impressive, or smooth, is it? I was hoping that the simple implementation I have made would provide better results than this, but apparently not. At the moment there&#8217;s one crucial improvement to the algorithm that my implementation is missing though &#8211; <a href="http://radsite.lbl.gov/radiance/papers/erw92/paper.pdf">Irradiance Gradients</a>. Irradiance Gradients basically give a better clue as to how to interpolate the irradiance cache entries, both positionally and rotationally. I&#8217;m hoping that they will significantly reduce the artefacts visible at the moment.</p>
<p>One problem that can occur when using an irradiance cache is that later cache entries don&#8217;t contribute to previously rendered pixels. When this happens, you can see blocky artefacts where the irradiance values have been interpolated differently. Something like this:</p>
<p><img class="aligncenter size-full wp-image-187" title="irradiancecachenopregather" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/irradiancecachenopregather.png" alt="irradiancecachenopregather" width="656" height="396" /></p>
<p>One thing you can do to avoid this situation is to perform an irradiance gathering pass before doing the final render. When you perform the final render, you should have no cache misses. In my case, I am using a progressive renderer, so the cache is actually fairly well primed before rendering the 1&#215;1 pixel size.</p>
<p><img class="aligncenter size-full wp-image-188" title="irradiancecacheprogressive" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/irradiancecacheprogressive.png" alt="irradiancecacheprogressive" width="656" height="396" /></p>
<h2>Improvements</h2>
<p>In addition to irradiance gradients, there have been a load of improvements made to irradiance caching since the initial paper. The <a href="http://www.graphics.cornell.edu/~jaroslav/papers/2008-irradiance_caching_class/index.htm">course notes</a> for the Siggraph 2008 course provide details of many of these. I&#8217;ll post up some screenshots when I&#8217;ve added the irradiance gradients.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2009/01/18/irradiance-caching-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Better Sampling</title>
		<link>http://www.rorydriscoll.com/2009/01/07/better-sampling/</link>
		<comments>http://www.rorydriscoll.com/2009/01/07/better-sampling/#comments</comments>
		<pubDate>Thu, 08 Jan 2009 07:33:51 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Global Illumination]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=77</guid>
		<description><![CDATA[A couple of days ago, I compared the images my ambient occlusion integrator produced with those of Modo using similar settings. I noticed immediately how much &#8216;cleaner&#8217; the render from Modo was. Clearly there was an issue with the way I was picking my samples, so I set about improving things. My approach for generating [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of days ago, I compared the images my ambient occlusion integrator produced with those of Modo using similar settings. I noticed immediately how much &#8216;cleaner&#8217; the render from Modo was. Clearly there was an issue with the way I was picking my samples, so I set about improving things.</p>
<p>My approach for generating the ambient occlusion rays was to generate uniform random samples over the hemisphere about the normal. Based on two random numbers in the range [0,1), I calculate the normalized sample direction using the following function:</p>
<pre class="brush: cpp; title: ; notranslate">
Vector3 Sample::UniformSampleHemisphere(float u1, float u2)
{
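	// u1 is used directly as the height (the cosine of the polar angle),
	// which spreads the samples uniformly over the area of the hemisphere.
	const float r = Sqrt(1.0f - u1 * u1);
	const float phi = 2 * kPi * u2;

	return Vector3(Cos(phi) * r, Sin(phi) * r, u1);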
}
</pre>
<p>This generates points on a hemisphere from uniform variables u1 and u2, where each point has equal probability of being selected. The following image was generated with 256 random uniform samples:</p>
<p><img class="aligncenter size-full wp-image-78" title="ao256samplesrandomuniform" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/ao256samplesrandomuniform.png" alt="ao256samplesrandomuniform" width="656" height="396" /></p>
<p><span id="more-77"></span>It looks pretty noisy, that&#8217;s for sure. Part of the trouble comes from the fact that there&#8217;s no way to ensure that there&#8217;s an even distribution of the rays. A common way to alleviate this problem is to do <a href="http://en.wikipedia.org/wiki/Stratified_sampling">stratified sampling</a> instead of fully random sampling. The idea of stratified sampling is to split up the domain into evenly sized segments, and then to pick a random point from within each of those segments. You still get some randomness, but the points are more evenly distributed, which in turn reduces the variance. Less variance means less noise. Here&#8217;s the scene again, using 256 rays, but this time using stratified sampling:</p>
<p><img class="aligncenter size-full wp-image-79" title="ao256samplesstratifieduniform" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/ao256samplesstratifieduniform.png" alt="ao256samplesstratifieduniform" width="656" height="396" /></p>
<p>As expected, it&#8217;s much less noisy, and for the same amount of computation!</p>
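<p>For reference, a jittered grid is one simple way to build a stratified sampler. The sketch below is an assumption about how it might be written, not the exact code behind the image above, and the function name is hypothetical. A 16 by 16 grid gives the 256 samples used here.</p>

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Jittered (stratified) 2D samples: one random point inside each cell of a
// gridSize x gridSize grid covering the unit square.
std::vector<std::pair<float, float>> StratifiedSamples2D(int gridSize, std::mt19937& rng)
{
	assert(gridSize > 0);

	std::uniform_real_distribution<float> jitter(0.0f, 1.0f);
	const float cellSize = 1.0f / gridSize;

	std::vector<std::pair<float, float>> samples;
	samples.reserve(static_cast<std::size_t>(gridSize) * gridSize);

	for (int y = 0; y < gridSize; ++y)
	{
		for (int x = 0; x < gridSize; ++x)
		{
			// Offset the cell corner by a random amount within the cell.
			const float u1 = (x + jitter(rng)) * cellSize;
			const float u2 = (y + jitter(rng)) * cellSize;
			samples.emplace_back(u1, u2);
		}
	}

	return samples;
}
```

<p>Each (u1, u2) pair can then be passed straight to a hemisphere sampling function in place of two fully random numbers.</p>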
<h2>Sampling for Diffuse Monte Carlo Estimator</h2>
<p>The stratified sampler helps out with the indirect diffuse lighting calculation too, but one other thing you can do to reduce noise for the Monte Carlo estimator is to choose random values that have a similar &#8216;shape&#8217; to the integral you are estimating. Looking at the integral for diffuse reflections, you will see the familiar cosine term inside the integral:</p>
<p><img title="L_o = \int \frac{c}{\pi} L_i \cos \theta \,d \omega" src="http://latex.codecogs.com/gif.latex?L_o = \int \frac{c}{\pi} L_i \cos \theta \,d \omega" alt="" /></p>
<p>Where c is the diffuse material color, Li is the incoming radiance, and the division by pi normalizes the diffuse reflection so that it conserves energy.</p>
<p>Rather than wasting samples on areas of the integral where they will get scaled down by the cosine term, why not just choose proportionally fewer samples in those areas?</p>
<p>Recall that the Monte Carlo estimator for the integral of the function f(x), with probability density function p(x) is:</p>
<p><img title="F_N = \frac{1}{N} \sum_{i=1}^{N}{\frac{f(x_i)}{p(x_i)}}" src="http://latex.codecogs.com/gif.latex?F_N = \frac{1}{N} \sum_{i=1}^{N}{\frac{f(x_i)}{p(x_i)}}" alt="" /></p>
<p>The probability density function describes how likely each value is to be chosen relative to the others, and it integrates to one over the whole domain. For the uniform hemisphere sampling function above, the pdf is just a constant, (1 / (2 * pi)). This makes the Monte Carlo estimator for the diffuse integral:</p>
<p><img title="L_o \approx \frac{2c}{N} \sum_{i=1}^{N}{L_i \cos \theta}" src="http://latex.codecogs.com/gif.latex?L_o \approx \frac{2c}{N} \sum_{i=1}^{N}{L_i \cos \theta}" alt="" /></p>
<p>Rather than multiply by the cosine term above, we just want to generate proportionally fewer rays near the base of the hemisphere, where the cosine term is small. The integral of the pdf over the hemisphere must equal one, so by switching to a cosine-weighted sample distribution, the pdf becomes (cos(theta) / pi).</p>
<p>This makes the estimator:</p>
<p><img title="L_o \approx \frac{c}{\pi N} \sum_{i=1}^{N}{\frac{L_i \cos \theta}{\frac{\cos \theta}{\pi}}}" src="http://latex.codecogs.com/gif.latex?L_o \approx \frac{c}{\pi N} \sum_{i=1}^{N}{\frac{L_i \cos \theta}{\frac{\cos \theta}{\pi}}}" alt="" /></p>
<p>Which cleans up rather nicely to:</p>
<p><img title="L_o \approx \frac{c}{N} \sum_{i=1}^{N}{L_i}" src="http://latex.codecogs.com/gif.latex?L_o \approx \frac{c}{N} \sum_{i=1}^{N}{L_i}" alt="" /></p>
<p>Normally I would post a couple of images up for comparison&#8217;s sake, but in this case, the difference is pretty difficult to perceive without being able to compare one on top of the other. The difference is small, but it is definitely worth it!</p>
<p>The common way to generate a cosine weighted hemisphere sampler is to generate uniform points on a disk, and then project them up to the hemisphere. Here&#8217;s some code:</p>
<pre class="brush: cpp; title: ; notranslate">
Vector3 Sample::CosineSampleHemisphere(float u1, float u2)
{
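	// Pick a point uniformly on the unit disk: the square root keeps
	// the density constant over the area of the disk.
	const float r = Sqrt(u1);
	const float theta = 2 * kPi * u2;

	const float x = r * Cos(theta);
	const float y = r * Sin(theta);

	// Project up onto the hemisphere: z = sqrt(1 - x*x - y*y) = sqrt(1 - u1).
	return Vector3(x, y, Sqrt(Max(0.0f, 1 - u1)));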
}
</pre>
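<p>As a sanity check on the two estimators, here is a small standalone sketch that evaluates both against a toy incoming radiance equal to the cosine of the polar angle, for which the diffuse integral works out to exactly 2c/3. The toy radiance and the function names are assumptions for illustration. Only the height of each sample matters for this radiance, so the angle around the normal is left out.</p>

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Toy incoming radiance: brightest straight up, black at the horizon.
double IncomingRadiance(double cosTheta)
{
	return cosTheta;
}

// Uniform hemisphere sampling: Lo is roughly (2c / N) * sum(Li * cos(theta)).
double EstimateUniform(int n, double c, std::mt19937& rng)
{
	assert(n > 0);
	std::uniform_real_distribution<double> uniform(0.0, 1.0);

	double sum = 0.0;
	for (int i = 0; i < n; ++i)
	{
		// The uniform sampler uses u1 directly as the height of the sample.
		const double cosTheta = uniform(rng);
		sum += IncomingRadiance(cosTheta) * cosTheta;
	}

	return 2.0 * c * sum / n;
}

// Cosine-weighted sampling: Lo is roughly (c / N) * sum(Li).
double EstimateCosine(int n, double c, std::mt19937& rng)
{
	assert(n > 0);
	std::uniform_real_distribution<double> uniform(0.0, 1.0);

	double sum = 0.0;
	for (int i = 0; i < n; ++i)
	{
		// The cosine-weighted sampler has height sqrt(1 - u1).
		const double cosTheta = std::sqrt(1.0 - uniform(rng));
		sum += IncomingRadiance(cosTheta);
	}

	return c * sum / n;
}
```

<p>Both versions converge on 2c/3, but for c = 1 the per-sample variance is roughly 0.06 for the cosine-weighted estimator versus 0.36 for the uniform one, so it needs far fewer rays for the same noise level in this toy case.</p>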
<p>Just by doing these two small steps, I&#8217;ve been able to clean up my images significantly. Here&#8217;s the scene from above again, this time with single bounce final gather with 256 rays, stratified cosine-sampled:</p>
<p><img class="aligncenter size-full wp-image-93" title="finalgather256samplescosinesampled" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/finalgather256samplescosinesampled.png" alt="finalgather256samplescosinesampled" width="656" height="396" /></p>
<p>Next on my list is to take a look at path tracing, followed by irradiance caching (wasn&#8217;t that the point of all this?). This should allow me to get fairly cheap multi-bounce diffuse lighting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2009/01/07/better-sampling/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>The Holidays: Time for fun work!</title>
		<link>http://www.rorydriscoll.com/2009/01/03/the-holidays-time-for-fun-work/</link>
		<comments>http://www.rorydriscoll.com/2009/01/03/the-holidays-time-for-fun-work/#comments</comments>
		<pubDate>Sun, 04 Jan 2009 01:48:52 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Global Illumination]]></category>
		<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=70</guid>
		<description><![CDATA[For the first time in about three years, I&#8217;ve had two weeks off work. I&#8217;ve spent a lot of time just relaxing and taking a break from things, but I&#8217;ve also been able to get back to doing some graphics work. Ever since Vivendi bought Activision, the project that I was leading has been &#8220;put [...]]]></description>
			<content:encoded><![CDATA[<p>For the first time in about three years, I&#8217;ve had two weeks off work. I&#8217;ve spent a lot of time just relaxing and taking a break from things, but I&#8217;ve also been able to get back to doing some graphics work. Ever since Vivendi bought Activision, the project that I was leading has been &#8220;put on hold&#8221;, so I&#8217;ve been back on the game team. It&#8217;s not as fun for me, that&#8217;s for sure, but luckily, I have my code at home to play with, so all is not lost! With the holidays, I&#8217;ve found some motivation to get back to it.</p>
<p>What have I been doing? Well, as I was approaching the break, I read through the course notes from the Practical Global Illumination with Irradiance Caching course at Siggraph last year. I thought the course itself was really good, and very clearly presented. After blitzing through the notes again, I thought I&#8217;d have a go at writing a ray tracer. It seemed simple enough at the time, but like most things, the devil is in the details.</p>
<p>The first thing I did was to set up a really simple single-threaded ray tracer that just displayed the color of the surface it hit. This was fairly quick to get up and running once I had written a few supporting classes for the cameras and shapes. It&#8217;s not very glamorous, but it&#8217;s a start:</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2009/01/basicintegrator.png"><img class="size-full wp-image-72 aligncenter" title="basicintegrator" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/basicintegrator.png" alt="Simple Integrator" width="500" height="301" /></a></p>
<p><span id="more-70"></span>Next, I added point lights and directional lights, and wrote a new integrator to calculate direct diffuse lighting. Once you have a function to trace rays around the scene, it&#8217;s really easy to add hard-edged shadows. It looks a lot better than the solid color integrator I first used, but it&#8217;s still not very impressive.</p>
<p>Here&#8217;s the scene with a single directional light and hard-edged shadows:</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2009/01/directonly.png"><img class="size-full wp-image-73 aligncenter" title="directonly" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/directonly.png" alt="" width="500" height="301" /></a></p>
<p>I wanted to flex the ray tracer a little bit, so an easy next step was to add an ambient occlusion integrator. Initially, I just used a function to generate random uniform rays on the hemisphere around the hit normal, and used the ratio of misses to hits as the occlusion value. I found that this was really pretty noisy, so I tried using the length of the ray hits to weight the occlusion values. This definitely improved things, but it&#8217;s still pretty noisy. The obvious way to reduce the noise is to use more rays, but I&#8217;d like to find a cheaper way to do this if possible.</p>
<p>Here&#8217;s the scene rendered with the ambient occlusion integrator using 4096 rays per hit:</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2009/01/ao4096samples.png"><img class="size-full wp-image-71 aligncenter" title="ao4096samples" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/ao4096samples.png" alt="" width="500" height="301" /></a></p>
<p>The first time I tried to render this scene using 4096 ambient occlusion rays per pixel, it took about thirteen minutes. I&#8217;ve never really used release builds at home and the settings weren&#8217;t great, so I tweaked some of the project settings, and defined out asserts. This got the time down to about ten minutes. I&#8217;m running these renders on my Macbook Pro, so I have a whole other core just sitting there doing nothing. Switching to using a multithreaded renderer basically sped the renders up by a factor of two.</p>
<p>Combining some of the concepts of the ambient occlusion integrator, and the direct diffuse integrator, I created a multi-bounce diffuse integrator. Like the direct diffuse integrator, it calculates the direct diffuse lighting at the hit point. Additionally though, it uses a Monte Carlo estimator to approximate the diffuse lighting integral over the hemisphere about the normal of the hit point. It can handle any number of bounces of indirect light, but the render time increases exponentially with each bounce added. Like the ambient occlusion integrator, it requires a large number of sample rays to get an acceptable level of noise.</p>
<p>Here&#8217;s the scene again with one bounce of indirect light, and 4096 rays per hit:</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2009/01/onebounce4096samples.png"><img class="size-full wp-image-76 aligncenter" title="onebounce4096samples" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/onebounce4096samples.png" alt="" width="500" height="301" /></a></p>
<p>When a ray misses the scene, it looks up an environment color, which you can see in the background. Most of the indirect rays actually miss the scene, so this background color actually has a huge effect over the look of the scene. I should mention as well that I&#8217;m using a really simple tone mapping operator to map the HDR ray tracer values down to the 8 bit per channel texture.</p>
<p>While working on the ray tracer, I would often be playing around with the objects and lights in the scene. I quickly found out that it&#8217;s really not very fun to wait for the ray trace to complete before getting some feedback. I can reduce the number of indirect rays to make things quicker, but even at relatively low values, it can take a while to render the final scene.</p>
<p>I had already split the rendering of the scene into 32 by 32 blocks when I switched to a multi-threaded ray tracer, so it was a really simple extension to change the resolution in each of these blocks on the fly. I basically start things off by rendering with each ray covering 32 by 32 pixels, then when that completes, I immediately kick off another render at 16 by 16, and so on. Each successive render takes four times as long as the previous render, so if the 1 by 1 render takes about a minute, then you get the 8 by 8 render in about a second!</p>
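<p>The arithmetic behind that is easy to check with a short sketch. The Pass struct and function name are hypothetical, and it assumes every ray costs about the same.</p>

```cpp
#include <cassert>
#include <vector>

struct Pass
{
	int pixelSize;  // each ray covers pixelSize x pixelSize pixels
	long long rays; // primary rays traced in this pass
};

// Lists the passes of a progressive render that halves the pixel size
// each time, from startSize down to 1.
std::vector<Pass> ProgressivePasses(int width, int height, int startSize)
{
	assert(width > 0 && height > 0 && startSize > 0);

	std::vector<Pass> passes;
	for (int size = startSize; size >= 1; size /= 2)
	{
		// Round up so partially covered cells at the image edge still get a ray.
		const long long columns = (width + size - 1) / size;
		const long long rows = (height + size - 1) / size;
		passes.push_back({size, columns * rows});
	}

	return passes;
}
```

<p>For a 512 by 256 image starting at 32 by 32, the passes trace 128, 512, 2048, 8192, 32768 and 131072 rays. All of the preview passes together add up to about a third of the final pass.</p>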
<p>Here&#8217;s the scene rendered using 512 indirect samples, paused at the 4 by 4 resolution:</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2009/01/full512samples4by4.png"><img class="size-full wp-image-75 aligncenter" title="full512samples4by4" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/full512samples4by4.png" alt="" width="500" height="301" /></a></p>
<p>And here&#8217;s the scene at the conclusion of rendering (note that the time is cumulative of all the previous renders):</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2009/01/full512samples1by1.png"><img class="size-full wp-image-74 aligncenter" title="full512samples1by1" src="http://www.rorydriscoll.com/wp-content/uploads/2009/01/full512samples1by1.png" alt="" width="500" height="301" /></a></p>
<p>It&#8217;s pretty clear at the 4 by 4 resolution how the render is going to look, and it only took four seconds to get there, whereas the final scene took nearly a minute. The 1 by 1 resolution actually took only 40 seconds of that minute to render, but still, having the feedback within a tenth of the final render time seems worth the extra wait at the end.</p>
<p>That&#8217;s basically as far as I got over the past couple of weeks. Like many things I do, there seems to be more to do now than at the beginning. One of the things I&#8217;d really like to do is to be able to render out the lighting to radiosity normal maps. This would allow me to use the static precomputed lighting in my DirectX10 engine. I could also output spherical harmonic coefficients for light probes which would allow me to render dynamic objects using the precomputed lighting.</p>
<p>Well, work starts back up in a couple of days, so the amount of time I can spend on this is going to be limited again, but I&#8217;ll post any significant updates. I have another article about the lighting calculation on the way, but it&#8217;s competing for my time!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2009/01/03/the-holidays-time-for-fun-work/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Lighting: The Rendering Equation</title>
		<link>http://www.rorydriscoll.com/2008/08/24/lighting-the-rendering-equation/</link>
		<comments>http://www.rorydriscoll.com/2008/08/24/lighting-the-rendering-equation/#comments</comments>
		<pubDate>Mon, 25 Aug 2008 02:28:22 +0000</pubDate>
		<dc:creator>Rory</dc:creator>
				<category><![CDATA[Graphics]]></category>

		<guid isPermaLink="false">http://www.rorydriscoll.com/?p=56</guid>
		<description><![CDATA[Back in 2002, I started my first job in the games industry at Climax Studios in England. I have to admit, I didn&#8217;t know very much about game development at the time. Don&#8217;t get me wrong, I&#8217;d been writing little 2D games, and messing around with rubbish particle systems at home, but it was nothing [...]]]></description>
			<content:encoded><![CDATA[<p>Back in 2002, I started my first job in the games industry at Climax Studios in England. I have to admit, I didn&#8217;t know very much about game development at the time. Don&#8217;t get me wrong, I&#8217;d been writing little 2D games, and messing around with rubbish particle systems at home, but it was nothing like what I was about to get involved with. Despite my inexperience, somehow I did enough to pass the interview, and I was offered a job as a junior programmer.</p>
<p>As seemed to be typical for the time, my introduction to the industry was pretty much a trial by fire. I quickly found out that I couldn&#8217;t hope to truly understand every single new thing I encountered, so I learned to just accept some things as the truth. For example, I was told that the dot product of two normalized vectors yields the cosine of the angle between them. I just accepted this, and only took the time to find out why later on.</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/plasticmaterial.png"><img class="size-medium wp-image-58 aligncenter" title="plasticmaterial" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/plasticmaterial-295x300.png" alt="" width="295" height="300" /></a></p>
<p><span id="more-56"></span>One of the things I accepted at the beginning was the maths used to perform the lighting of our models during rendering. While I could understand how the equations appeared to yield decent looking results, I never understood where they came from, and why they worked.</p>
<p>After a while I began to wonder about this&#8230; Where did the equations for diffuse and specular reflections come from? What are the units of brightness we use for lights? What are the units for the pixels that get rendered?</p>
<p>It took a lot of reading, re-reading, and re-re-reading, for me to really understand some of these things, so now that I have a blog, I thought I would share what I learned just in case anyone else is wondering about these things too.</p>
<h2>The Rendering Equation</h2>
<p>When light hits a point on a surface, some of it might get absorbed, reflected or possibly even refracted. Also, there may be additional light being emitted from that point by a power source, or perhaps scattered in from another point on the surface. Things can get complicated pretty quickly!</p>
<p>Luckily for us, some smart guys came up with something called the <a href="http://en.wikipedia.org/wiki/Rendering_equation">rendering equation</a> to deal with these factors. The rendering equation can produce incredibly realistic-looking images, but in its original form, it can also be very costly to evaluate.</p>
<p style="text-align: center;"><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/cornellbox.png"><img class="size-medium wp-image-68 aligncenter" title="cornellbox" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/cornellbox.png" alt="" width="300" height="300" /></a></p>
<p>I&#8217;ve included a slightly simplified version of the rendering equation below. If you compare this to the version currently on Wikipedia, you&#8217;ll see that I&#8217;ve removed a couple of parameters, <em>t</em> and <em>lambda</em>.</p>
<p>Typically if you have something like the power output of a light varying over time, it&#8217;s a better idea to evaluate it before trying to solve the rendering equation. By doing this, you can assume time is constant, and so you can ignore it.</p>
<p>The lambda symbol in the original equation represents a dependency on the wavelength of the light. Without this, everything would be greyscale, so it&#8217;s an important property. Rather than dealing with wavelength explicitly, we can just treat the red, green and blue color channels independently and solve the rendering equation once for each channel. Note that in practice we end up using per-component vector mathematics to solve the equation for all three channels at the same time.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/renderingequation.png"><img class="alignnone size-full wp-image-67" title="renderingequation" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/renderingequation.png" alt="" width="509" height="48" /></a></p>
<p>At first, this may seem a little bit intimidating, but actually it&#8217;s fairly simple. I&#8217;m going to break down each part and explain what it means.</p>
<p><strong>Outgoing Light</strong></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-outgoing.png"><img class="alignnone size-full wp-image-64" title="re-outgoing" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-outgoing.png" alt="" width="509" height="48" /></a></p>
<p>This says that the rendering equation is a function which gives you the outgoing light in a particular direction <strong>w</strong> from a point <strong>x</strong> on a surface.</p>
<p><strong>Emitted Light</strong></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-emitted.png"><img class="alignnone size-full wp-image-60" title="re-emitted" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-emitted.png" alt="" width="509" height="48" /></a></p>
<p>This is any light that is being emitted from the point. Most surfaces don&#8217;t emit light, so normally you don&#8217;t see any contribution here.</p>
<p><strong>Integral</strong></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-integral.png"><img class="alignnone size-full wp-image-62" title="re-integral" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-integral.png" alt="" width="509" height="48" /></a></p>
<p>This says that the enclosed functions need to be integrated over all directions <strong>w&#8217;</strong> in the hemisphere above <strong>x</strong>. The orientation of the hemisphere is determined by the normal, <strong>n</strong>.</p>
<p><strong>BRDF</strong></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-brdf.png"><img class="alignnone size-full wp-image-59" title="re-brdf" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-brdf.png" alt="" width="509" height="48" /></a></p>
<p>This is the bidirectional reflectance distribution function (BRDF). It&#8217;s a fancy name for the ratio of the amount of light reflected in a particular direction <strong>w</strong>, to the amount received from another direction <strong>w&#8217;</strong>. The BRDF warrants its own discussion, but for now it can just be thought of as the reflection amount.</p>
<p><strong>Incoming Light</strong></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-incoming.png"><img class="alignnone size-full wp-image-61" title="re-incoming" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-incoming.png" alt="" width="509" height="48" /></a></p>
<p>This is the incoming light at the point <strong>x</strong> from the direction <strong>w&#8217;</strong>. Note that the incoming light doesn&#8217;t have to come from a light source (direct light). It may have been reflected or refracted from another point in the scene (indirect light).</p>
<p><strong>Normal Attenuation</strong></p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-normalattenuation.png"><img class="alignnone size-full wp-image-63" title="re-normalattenuation" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/re-normalattenuation.png" alt="" width="509" height="48" /></a></p>
<p>This attenuates the incoming light at <strong>x</strong> based on the cosine of the angle between the normal <strong>n</strong> and the incoming light direction <strong>w&#8217;</strong>.</p>
<h2>Making it Real-Time</h2>
<p>We use the rendering equation to perform lighting calculations in games, albeit in a simpler form. The most obvious problem for evaluating the rendering equation in a pixel shader is the integral, so we need to find a way to approximate it. One thing we can do is to split the way we deal with direct light and indirect light.</p>
<p>Since the indirect light is the harder of the two to deal with, we can approximate it. There are various different ways you might want to approximate the indirect light, from a simple ambient color, to more complex forms like spherical harmonics. Typically you would modulate your indirect light approximation with the diffuse part of your BRDF, since you don&#8217;t have a directional component to use in direction-dependent BRDFs.</p>
<p>By approximating the indirect light, we only have to worry about the direct light in the rendering equation, so we can replace the integral with a simple sum over <em>k</em> light sources. Also, we typically model the change in the BRDF over space by using textures mapped over the surface, so <strong>x</strong> is no longer needed for that function.</p>
<p>I&#8217;ve changed up the notation here to match more closely what you see in games literature. The vector <strong>v</strong> is the normalized direction from the point <strong>x</strong> to the camera. The vector <strong>l<sub>i</sub></strong> is the normalized vector from the point <strong>x</strong> to light source <em>i</em>.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/simplification.png"><img class="alignnone size-full wp-image-65" title="simplification" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/simplification.png" alt="" width="418" height="52" /></a></p>
<p>It&#8217;s not explicitly written here, but the dot product now needs to be clamped to zero to avoid negative light values.</p>
<p>This may not look much like the &#8216;diffuse plus specular&#8217; equations you commonly see in games, but it&#8217;s actually pretty close. In fact, imagine removing the emitted light and replacing the BRDF with a constant color value, and you&#8217;ll see that we have the equation for calculating diffuse lighting.</p>
<p><a href="http://www.rorydriscoll.com/wp-content/uploads/2008/08/diffuse.png"><img class="alignnone size-full wp-image-57" title="diffuse" src="http://www.rorydriscoll.com/wp-content/uploads/2008/08/diffuse.png" alt="" width="291" height="55" /></a></p>
<p>The multiply symbol inside the circle just represents the component-wise multiplication of the red, green and blue channels of the color with the corresponding channels of the incoming light.</p>
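<p>To make the connection concrete, here is a small Python sketch of the sum over lights, with the diffuse case obtained by plugging in a BRDF that ignores its arguments and returns a constant color. Everything here is an assumption for illustration: the names <code>shade</code> and <code>constant_brdf</code>, the representation of a light as a (direction, incoming light) pair, and the convention that each direction is a unit vector pointing from <strong>x</strong> toward the light.</p>

```python
def dot(a, b):
    """Dot product of two 3-vectors stored as tuples."""
    return sum(p * q for p, q in zip(a, b))

def modulate(a, b):
    """Component-wise product of two RGB triples (the circled multiply)."""
    return tuple(p * q for p, q in zip(a, b))

def shade(brdf, n, v, lights, emitted=(0.0, 0.0, 0.0)):
    """Emitted light plus the sum of each light's reflected contribution.

    lights is a list of (l, incoming) pairs. Assumption: l is a unit
    vector from the surface point toward the light, and incoming is an
    RGB triple.
    """
    total = emitted
    for l, incoming in lights:
        n_dot_l = max(0.0, dot(n, l))  # clamp so back-facing lights add nothing
        reflected = modulate(brdf(l, v), incoming)
        total = tuple(t + r * n_dot_l for t, r in zip(total, reflected))
    return total

def constant_brdf(color):
    """A BRDF that ignores both directions and returns a fixed color."""
    return lambda l, v: color

n = (0.0, 0.0, 1.0)
v = (0.0, 0.0, 1.0)
lights = [
    ((0.0, 0.0, 1.0), (1.0, 1.0, 1.0)),    # directly above the surface
    ((0.0, 0.0, -1.0), (1.0, 1.0, 1.0)),   # behind the surface, clamped away
    ((0.6, 0.0, 0.8), (0.5, 0.5, 0.5)),    # dimmer light off to one side
]
diffuse = shade(constant_brdf((1.0, 0.5, 0.25)), n, v, lights)
```

<p>Swapping <code>constant_brdf</code> for a function that actually depends on <strong>l</strong> and <strong>v</strong> is all it would take to add a direction-dependent term, which is why the two equations are so closely related.</p>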
<h2>Units&#8230;</h2>
<p>You may notice that I&#8217;ve been using fuzzy terms like &#8216;incoming light&#8217; as if it were something we all know how to measure. A good question at this point might be &#8220;what are the units being used for light?&#8221;</p>
<p style="text-align: center;"><img id="n.j60" class="aligncenter" style="width: 319px; height: 324px;" src="http://docs.google.com/File?id=dfnktsd4_58c5bhb5f9_b" alt="" width="319" height="324" /></p>
<p>Well, there are well-defined units at play here, but I&#8217;m going to say right now that the units don&#8217;t really matter. I don&#8217;t know of any game engines that attribute real-world units to lighting and material values, since those kinds of things tend to be driven more by the appearance than the physical correctness. The relative intensities of, say, a candle and a 100W light bulb are probably more important than the absolute values.</p>
<p>Having said all that, here&#8217;s the information anyway: The light is measured in units of radiance (Wsr<sup>-1</sup>m<sup>-2</sup>). Looking back at the integral in the rendering equation, you can see that the radiance is multiplied by the normal attenuation term and the differential solid angle. Integrating that product over the hemisphere converts the radiance into irradiance (Wm<sup>-2</sup>).</p>
<p>Remember that the BRDF is the ratio of light reflected to light received? Well that&#8217;s a ratio of radiance to irradiance, so the BRDF has units of sr<sup>-1</sup>.</p>
<p>One thing to be careful of is that the incoming light in the simplified version of the rendering equation (using the sum over the lights) is measured in units of irradiance (Wm<sup>-2</sup>). We have to use irradiance directly here since the typical lights we use (point, spot, directional) don&#8217;t have any area.</p>
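<p>As one example of where such an irradiance value might come from, here is a Python sketch for a point light. It assumes physically based inverse-square falloff, where the light is described by a radiant intensity in Wsr<sup>-1</sup> and the result is the irradiance on a surface facing the light directly. Games often use other falloff curves, and the function name is hypothetical.</p>

```python
def point_light_irradiance(intensity, light_position, x):
    """Irradiance (W/m^2) at x on a surface facing the light.

    Assumption: inverse-square falloff from a point light whose radiant
    intensity is given in W/sr. The cosine (normal attenuation) term is
    not applied here, since the sum over lights already includes it.
    """
    offset = tuple(p - q for p, q in zip(light_position, x))
    distance_squared = sum(c * c for c in offset)
    return intensity / distance_squared

x = (0.0, 0.0, 0.0)
near = point_light_irradiance(100.0, (0.0, 2.0, 0.0), x)  # light 2 m away
far = point_light_irradiance(100.0, (0.0, 4.0, 0.0), x)   # light 4 m away
```

<p>Doubling the distance quarters the irradiance, which is the kind of relative behavior that tends to matter more than the absolute numbers.</p>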
<h2>That&#8217;s It For Now</h2>
<p>I hope what I&#8217;ve explained so far was fairly clear, but I know I&#8217;ve left some pretty big holes here. I&#8217;m going to be looking at some of the following at a later date:</p>
<ul>
<li>What is the difference between irradiance and radiance?</li>
<li>What is the BRDF?</li>
<li>What is energy conservation, and does it matter for games?</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.rorydriscoll.com/2008/08/24/lighting-the-rendering-equation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

