Physically-Based Shading

I’ve noticed the term ‘physically-based shading’, or variants thereof, used with increasing frequency these days. I’m happy about this, since it represents a movement away from hacking around with magic numbers and formulae in shaders, towards focusing on the underlying material and lighting models. It offers consistency, predictability and constraints where previously these had been somewhat lacking. There was a great course at Siggraph this year purely about physically based shading.

The trouble is, I’m still not really sure exactly what it means…

Energy Conservation

When looking at material response, I expect that most people start with the usual Lambert for diffuse plus Blinn-Phong with a Fresnel effect for specular and continue on their happy way. At some point, they may start to read about physically-based shading, and discover the idea of energy conservation (something I wrote about before).

The standard Lambert diffuse response can emit more light than it receives. The standard Blinn-Phong specular model can either lose energy or gain energy depending on the specular power and color. If you just add the diffuse and specular responses together, materials can also emit more light than they receive.

It’s fairly easy to change these functions to be energy-conserving, and there are some benefits to doing so, but is energy-conserving Lambert and Blinn-Phong (LBP) considered ‘physically based shading’? It’s based on the concept that energy can neither be created or destroyed, right?

BRDF Model

I think what most people are referring to when they’re talking about physically based shading, is the model underlying the BRDF. For example, the Torrance-Sparrow microfacet BRDF is modeled on the idea of a surface being comprised of many tiny ideal Fresnel mirrors. The Phong BRDF is vastly simplified, but still grounded in a model of how light is reflected off a mirror.

Is more physically-based, fewer simplifications better? Have we lost any need for magic numbers and hacks?

To even think about answering this question, we have to understand what are we trying to do when we write a BRDF. In general, we’re trying to approximate the physical response of a real-world material using a combination of functions. That physical response is the ratio of radiance to irradiance based on the incoming light direction and the outgoing light direction. Ideally our BRDF will be flexible enough to handle a range of different material types within the same model.

Measure Twice, Cut Once

So if we’re approximating real-world data with our BRDF, can’t we just compare it to a real material? That’s a tricky prospect unfortunately. We can only compare our model to what we actually see, and this is the result of not only the BRDF, but the lighting environment as well. The lighting environment consists of many factors such as the number and geometry of the light emitters, the power of the lights, reflections, refractions, occlusion, volumetric scattering. It sounds impossible, doesn’t it?

There is some good news though. The boffins at the Mitsubishi Electric Research Laboratories (MERL) have laser-scanned a number of materials, and have made them freely available for research and academic use. Also, Disney Animation created a tool to visualize these scanned materials and to compare them to any BRDF written in GLSL.

BRDF Comparison

I thought it would be interesting to compare energy-conserving LBP to the Disney Principled BRDF. The Disney BRDF is energy-conserving and is based on the Torrance-Sparrow microfacet specular model and the Lambert diffuse model with some tweaks (for example, to handle diffuse retro-reflection). While it is more physically-based than straight LBP, it still contains an empirical model for the diffuse part.

To make these test images, I loaded up a MERL material in the BRDF explorer, and then used the graph views to match the parameters of each of the BRDFs as closely as possible for the peak specular direction.

The most interesting view in the BRDF explorer shows an image representing a slice of the BRDF (not the overall lighting response) as a two dimensional function. This function is parameterized in a different space than you might be used to, with the half-angle (the angle between the normal and the half-vector) vs difference-angle (the angle between the half vector and the incoming or outgoing light direction).

Where did the other two dimensions go? They’re still there… The slice just represents the theta angles, and you have to scroll through the slices for different phi values. The nice thing about this representation is that in general it’s enough to look at just one slice to get a really good idea of how well a BRDF fits the data.

For each of the following images, the Lambert-Blinn-Phong BRDF is on the left, the scanned material is in the middle, and the Disney BRDF is on the right. I’ve included the BRDF view as well as a lit sphere view.

This first material is shiny red plastic. The left side of the BRDF view clearly shows the tight specular peak. In the top left, you can see the strong Fresnel effect as the viewing direction gets to grazing angles. The darkening effect in the extremes I believe is due to the Fresnel effect, since light coming in from other angles is being mirrored away.

The LBP BRDF captures the Fresnel effect to a small amount, but cannot capture the darkening on the extremes. The Disney BRDF clearly does a better job at capturing these features of the scanned material, but still cannot quite match the reference.

In this dull red plastic, you can see the effects of the retro-reflection in both the sphere and BRDF view of the MERL material. The Disney BRDF captures this to a certain extent, but the LBP BRDF does not. Note that the Disney BRDF also did a better job at capturing the shape of the specular highlight, especially at grazing angles. The Blinn-Phong response is a compromise between the width of the specular lobe at grazing angles, and the intensity when more face on.

This is a brass material. It’s pretty clear here how inadequate Blinn-Phong is to capture the long tail of the brass specular response. The Disney BRDF fares a little better, but it’s still not close to the scanned material. It’s possible to alter the Disney BRDF slightly to allow for longer tails, but this then makes matching non-metal materials more difficult.

This steel material in appears to have some artifacts from the laser scanning process in the MERL view. Again, it’s difficult for both BRDFs to capture the specular response, but the Disney one does a little better than LBP.

How Physical Is Your Shader?

Clearly the Disney BRDF does a much better job at capturing these materials than Lambert plus Blinn-Phong. Of course, it’s more expensive to calculate too. It still contains what some would consider a ‘hack’ to handle diffuse retro-reflection, and was specifically engineered to match the MERL BRDFs. Does this make it bad? Not really. At the end of the day, we have to use the best approximation we can that will work within our budgets.

The primary benefit of the movement towards physically-based models for me is really in achieving more consistency via increasing constraints. An artist would probably tell you that it’s about achieving a closer match to the real-world materials. Both are really nice to have.

So what do you think of when someone says they’re using a physically based shader?

Comments (5)

Derivative Map Artifacts

I had been suffering from some strange artifacts on the edges of my objects when using derivative maps. After much time spent in GPU Perf Studio, I finally realised that my mipmap generation was not correct. It was introducing one extra column of garbage at every level.

My use of FXAA and anisotropic filtering was just making the problem more evident. I would recommend using regular trilinear filtering for derivative maps anyway.

So, mea culpa and all that. Let the name of Morten Mikkelsen and derivative maps be cleared!


Derivative Maps vs Normal Maps

This post is a quick follow up to my previous post on derivative maps. This time I’m going to compare the quality and performance of derivative maps with the currently accepted technique, normal maps.

I’ll be using the precomputed derivative maps for comparison since the ddx/ddy technique just isn’t acceptable in terms of quality.


Here’s the close up shot of the sphere with the moon texture again. This shows the derivative map implementation, and if you mouse over, you’ll see the normal map version.

There are some slight differences because the height of the derivative map doesn’t quite match the heights used to precompute the normal map, but overall I would say that they are remarkably similar. It looks to me that the normal map is preserving more of the detail though.

Here’s a high-contrast checkerboard, again with the normal map shown if you mouse over.

I’m no artist, but I would say the the derivative map results are close enough to the normal maps to call the technique viable from a quality standpoint.

EDIT: I had some issues with artifacts which I posted here. It turns out they were (embarrassingly) caused by my mipmap generation which was introducing a row of garbage at each level. Combined with FXAA and anisotropic filtering, this caused the weird vertical stripes I posted before.

I’ve removed the images since I don’t want to give the wrong impression of the quality of derivative maps.


I ran these tests on my Macbook Pro which has an AMD 6750M. The shader in question is a simple shader for filling out the gbuffer render targets. All shaders were compiled using shader model 5. I took the frame times from the Fraps frame counter and the other numbers came from Gpu Perf Studio.

For comparison, I’ve included an implementation with no normal perturbation at all.

Perturbation Frame Pixels Tex Inst Tex Busy ALU Inst ALU Busy ALU/Tex
None 1.08 ms 262144 3 27.5 % 14 32.1 % 4.667
Normal map 1.37 ms 262144 4 36.5 % 23 52.4 % 5.75
Derivative map 1.36 ms 262144 9 82.0 % 28 63.8 % 3.11

Despite the extra shader instructions, the derivative map method is basically as fast as normal maps on my hardware. As Mikkelsen predicted, it seems like having one fewer vertex attribute interpolator offsets the cost of the extra ALU instructions.

Note that the derivative map shader has nine texture instructions compared to just four for the normal maps. The extra five instructions are the two sets of ddx/ddy instructions, and the instruction to get the texture dimensions. The pixel shader can issue one texture instruction and one ALU instruction on the same cycle, these are essentially free.

The only performance overhead which has any impact for derivative maps are the five extra ALU instructions.


As I mentioned in my previous post, derivative maps also have the tremendous benefit of not requiring tangent vectors. In my case, with a simple vertex containing position, normal, tangent and one set of texcoords, the tangent takes up 27% of the mesh space.

Given that most games these days have tens of megabytes of mesh data, this would turn into some pretty decent memory savings. There’s also a minor benefit on the tool-side to not having to spend time generating face tangents and merging them into the vertices.


Well, for me it’s pretty clear. On my setup, derivative maps have a similar quality with the same performance but less memory. This makes them a win in my book. Of course, these numbers will vary wildly based on the API and hardware, so this can’t be taken as a blanket ‘derivative maps are better than normal maps’ statement, but they look promising. Good job Morten Mikkelsen!

I would love to see a similar comparison for the current generation of console hardware (hint, hint!).

If you have DirectX 11, then you should be able to run the demo here.

Comments (22)

Cubemap Texel Solid Angle

Warning: This post is going to be pretty math heavy. If you suck at math, then go and read this first, then come back. If you still think that you suck, then I suggest going to Khan Academy and start watching videos. You won’t regret it!

I was reading through the AMD CubeMapGen source last week and came across the code for calculating the solid angle of a cube map texel. This code piqued my interest, since it seemed very terse for what I thought would be a horrific calculation.

static float32 AreaElement( float32 x, float32 y )
	return atan2(x * y, sqrt(x * x + y * y + 1));

float32 TexelCoordSolidAngle(int32 a_FaceIdx, float32 a_U, float32 a_V, int32 a_Size)
   //scale up to [-1, 1] range (inclusive), offset by 0.5 to point to texel center.
   float32 U = (2.0f * ((float32)a_U + 0.5f) / (float32)a_Size ) - 1.0f;
   float32 V = (2.0f * ((float32)a_V + 0.5f) / (float32)a_Size ) - 1.0f;

   float32 InvResolution = 1.0f / a_Size;

	// U and V are the -1..1 texture coordinate on the current face.
	// Get projected area for this texel
	float32 x0 = U - InvResolution;
	float32 y0 = V - InvResolution;
	float32 x1 = U + InvResolution;
	float32 y1 = V + InvResolution;
	float32 SolidAngle = AreaElement(x0, y0) - AreaElement(x0, y1) - AreaElement(x1, y0) + AreaElement(x1, y1);

	return SolidAngle;

The source code for this particular part is well documented, and points you towards this thesis by Manne Öhrström (@manneohrstrom) where he gives a high level overview of the derivation. I was interested in finding out some more of the details, so I had a go myself, and this post is the result.

Why is it Useful?

When processing cube maps (for example, generating a diffuse irradiance map, or spherical harmonic approximation), you need to be able to integrate the texel values over a sphere.

One way of approximating this integral is to use a Monte Carlo estimator. This is a statistical technique that may oversample some texels and undersample other. This seems a bit wasteful considering that we have a finite number of input values. Ideally we’d like to use each texel value just once.

A naive approach to analytical integration where each texel has the same weight would result in overly bright values in the corner areas. This is because the texels in the corners project to smaller and smaller areas on the sphere. The correct approach is to factor in the solid angle during the integral, and this is what CubeMapGen does.

The Plan

Imagine a single cube map face placed at (0,0,1) and scaled such that the texel locations are all in [-1,1]. For any texel in this cube map, we want to project it onto a unit sphere sitting at the origin, then work out the area on the sphere. This area corresponds to the solid angle because the sphere is a unit sphere.

We can repeat this same calculation for any of the other cube map faces by first transforming them into the same range.

This is the high-level game plan for calculating out the solid angle:

  1. Determine a formula for projecting a position from texture-space onto the sphere.
  2. Work out how this projected position changes as the texture-space coordinates change in x and y.
  3. Imagining that these position change vectors define two sides of a microscopic quadrilateral, then calculate the microscopic area of this quad using the magnitude of the cross product.
  4. Integrate the microscopic area using the corner coordinates of a texel to calculate its area on the sphere, and solid angle.

The Details

We start off with the formula for projecting a point from its location on the texture face (x, y, 1) onto the unit sphere. This is just a standard vector normalization.

    \begin{align*} \vec{p} = \dfrac{\begin{pmatrix}x, y, 1 \end{pmatrix}}{\sqrt{x^2 + y^2+1}} \end{align*}

Note: I’ll be switching back and forth between negative and fractional exponents as I see fit. This makes things easier. Remember, x^{-n} = \frac{1}{x^n}, and x^{\frac{1}{2}} = \sqrt{x}.

We want to calculate how this projected point changes as the texture-space x and y coordinates change. We can do this separately for each axis using partial derivatives. First we’ll start by calculating how the projected z component changes along to the texture-space x axis.

Projected Z Change According to X

The z-component of p is simply:

    \begin{align*} p_z &= \dfrac{1}{\sqrt{x^2 + y^2+1}}\\ &= (x^2 + y^2+1)^{-\frac{1}{2}} \end{align*}

We need to differentiate this equation with respect to x. Because of the exponent, we need to use the chain rule to do this. The chain rule is a method for finding the derivative of the composition of two functions. First, we can reformulate the equation a little bit to make the two functions a bit clearer:

    \begin{align*} p_z = u^{-\frac{1}{2}}, u = x^2 + y^2+1 \end{align*}

In our case our first function is a function of u and our second function is a function of x and y. Given this, the chain rule says:

    \begin{align*} \frac{\partial p_z}{\partial x} = \frac{\partial p_z}{\partial u} \frac{\partial u}{\partial x} \end{align*}

We can apply this rule very easily to our reformulated functions:

    \begin{align*} \frac{\partial p_z}{\partial u} &=-\frac{u^{-\frac{3}{2}}}{2}\\ &=-\frac{1}{2(x^2+y^2+1)^{\frac{3}{2}}} \\[10] \frac{\partial u}{\partial x}&=2x\\[10] \frac{\partial p_z}{\partial x} &= -\frac{x}{(x^2+y^2+1)^{\frac{3}{2}}} \end{align*}

This equation tells us exactly how the z component of the projected point changes as the texture-space position moves along the x axis.

Projected X Change According to X

Now we’ve found the projected z component derivative, it’s going to make finding the x and y components a little easier. Why? Because we can express the x and y components in terms of the z component.

    \begin{align*} p_x &= \frac{x}{\sqrt{x^2 + y^2+1}}\\ &= x p_z \end{align*}

We don’t have the same ‘composition of functions’ setup that we did last time, so we can’t use the chain rule to differentiate this. Instead, we can use the product rule. The product rule in our case says:

    \begin{align*} \frac{\partial p_x}{\partial x} &= p_z\frac{\partial (x)}{\partial x} + x\frac{\partial (p_z)}{\partial x} \end{align*}

Applying this to equation for for the projected x component:

    \begin{align*} \frac{\partial p_x}{\partial x} &= \frac{1}{(x^2 + y^2+1)^{\frac{1}{2}}}-\frac{x^2}{(x^2+y^2+1)^{\frac{3}{2}}}\\ &= \frac{y^2+1}{(x^2+y^2+1)^{\frac{3}{2}}} \end{align*}

Projected Y Change According to X

We have a very similar derivation for the projected y derivative:

    \begin{align*} p_y = \frac{y}{\sqrt{x^2 + y^2+1}} = y p_z \end{align*}

We can use the product rule again:

    \begin{align*} \frac{\partial p_y}{\partial x} &= p_z\frac{\partial (y)}{\partial x} + y\frac{\partial (p_z)}{\partial x}\\ &= -\frac{xy}{(x^2+y^2+1)^{\frac{3}{2}}} \end{align*}

Projected Position Change According to X and Y

Putting this all together, we have our equation showing how the projected position changes as the texture-space position changes in the x direction. We can use the exact same process to work out how it moves in the y direction.

    \begin{align*} \frac{\partial \vec{p}}{\partial x} &= \frac{\begin{pmatrix}y^2+1,-xy, -x\end{pmatrix}}{(x^2+y^2+1)^{\frac{3}{2}}}\\ \frac{\partial \vec{p}}{\partial y} &= \frac{\begin{pmatrix}-xy, x^2+1,-y\end{pmatrix}}{(x^2+y^2+1)^{\frac{3}{2}}} \end{align*}

Differential Area

The next step is to calculate the differential (microscopic) area of the projected point using the partial derivatives we just calculated. Clearly at a normal scale, we wouldn’t be able to take the cross product of two projected vectors on a sphere and expect the magnitude to be the area they define on the sphere. But at this differential scale, we can treat the surface as if it is flat, so this works.

The first thing we need to do is to calculate the cross product of the partial derivatives.

    \begin{align*} \vec{r} &= \frac{\partial \vec{p}}{\partial x} \times \frac{\partial \vec{p}}{\partial y}\\ &= \frac{\begin{pmatrix}y^2+1,-xy, -x\end{pmatrix}}{(x^2 + y^2+1)^{\frac{3}{2}}} \times \frac{\begin{pmatrix}-xy, x^2+1,-y\end{pmatrix}}{(x^2 + y^2+1)^{\frac{3}{2}}} \end{align*}

Calculating the cross product for each of the components relatively straightforward.

    \begin{align*} r_x &= \frac{x^3+xy^2+x}{(x^2 + y^2+1)^3}\\ r_y &= \frac{x^2y+y^3+y}{(x^2 + y^2+1)^3}\\ r_z &= \frac{(y^2+1)(x^2+1)-x^2y^2}{(x^2 + y^2+1)^3}\\ \end{align*}

If you’re reading carefully, you’ll notice that each component has a factor of x^2+y^2+1 on the top and the bottom, so we can divide through. Combining all the components back together again, we arrive at the final equation for the perpendicular vector.

    \begin{align*} \vec{r} &= \frac{(x,y,1)}{(x^2 + y^2+1)^2} \end{align*}

Now we simply need to take the length of the result of the cross product to find the differential area on the sphere.

    \begin{align*} \partial A &= \sqrt{\vec{r} \cdot \vec{r}}\\[5] &= \sqrt{\frac{x^2+y^2+1}{(x^2 + y^2+1)^4}}\\[5] &= \frac{1}{(x^2 + y^2+1)^{\frac{3}{2}}} \end{align*}

Solid Angle

The final step is to integrate the differential area over our range of texture-space values to get the solid angle of the texel. We can start by calculating the integral between (0,0) and some point (s,t) on the cube map face.

    \begin{align*} f(s,t)&=\int_{y=0}^t\int_{x=0}^s \frac{1}{(x^2+y^2+1)^{\frac{3}{2}}} \,\mathrm{d}x\, \mathrm{d}y\\ &=\mathrm{tan}^{-1}\frac{st}{\sqrt{s^2+t^2+1}} \end{align*}

From this formula, we can calculate the area of any texel in the cube map face by adding together the two right-diagonal corners, A and C, and subtracting the left-diagonal corners, B and D.

    \begin{align*} S=f(A) - f(B) + f(C) - f(D) \end{align*}

You can see on the image below that the added areas in green are canceled out by the subtracted areas in red.

That should look familiar, since that’s exactly what the CubeMapGen code does. If you look at the surrounding source code to TexelCoordSolidAngle, then you’ll notice that there’s another method mentioned for calculating the solid angle of a texel. This method is based on Girard’s theorem, which describes how to calculate the area of a spherical triangle based on the excess of the sum of its interior angles. This method was also suggested to me on Twitter by Ignacio Castaño (@castano). I haven’t actually tried it, but it looks fascinating!

Is it Correct?

It’s always a bit daunting to get to the end of a derivation like this, and not know if the answer is correct or not. In this case, it’s pretty easy to verify if this result is correct.

Remember that the texture-space coordinates are in range [-1,1]. If we set our (x,y) values to 1, that corresponds to the top right quarter of the cube map face. We know that there are 4\pi steradians in a sphere, so that means that each face gets \frac{2\pi}{3} steradians. Since we’re only calculating for a quarter of a face we expect our result to be \frac{\pi}{6}.

    \begin{align*} f(1,1)&=\mathrm{tan}^{-1}\frac{1}{\sqrt{3}}\\ &= \frac{\pi}{6} \end{align*}

And it does. Thanks to the various people on twitter (@SebLagarde, @mattpharr, @manneohrstrom, @castano and @ChristerEricson) for engaging me in conversation over this.

Please let me know in the comments if you spot an error in this post, or if anything needs to be explained better or more easily.

Comments (17)

Derivative Maps

I recently came across an interesting paper, Bump Mapping Unparametrized Surfaces on the GPU by Morten Mikkelsen of Naughty Dog. This paper describes an alternative method to normal mapping, closely related to bump mapping. The alluring prospect of this technique is that it doesn’t require that a tangent space be defined.

Mikkelsen is apparently well-versed in academic obfuscation (tsk!), so the paper itself can be a little hard to read. If you’re interested in reading it, then I would recommend first reading Jim Blinn’s original bump mapping paper to understand some of the derivations.

But Wait! What’s Wrong with Normal Maps?

Nothing really. But if something comes along that can improve quality, performance or memory consumption then it’s worth taking a a look.

A Quick Detour into Gradients

Given a scalar height field (i.e. a two-dimensional array of scalar values), the gradient of that field is a 2D vector field where each vector points in the direction of greatest change. The length of the vectors corresponds to the rate of change.

The contour map below represents the scalar field generated from the function f(x,y) = 1 - (x^2 + y^2). The vector field shows the gradient of that scalar field. Note how each vector points towards the center, and how the vectors in the center are smaller due to the lower rate of change.

Derivative Maps

The main premise of the paper is that we can project the gradient of the height field onto an underlying surface and use it to skew the surface normal to approximate the normal of the height-map surface. We can do all of this without requiring tangent vectors.

As with the original bump-mapping technique, it’s not exact due to some terms being dropped due to their relatively small influence, but it’s close.

There are really only two important formulae to consider from the paper. The first shows how to perturb the surface normal using the surface gradient. Don’t confuse the surface gradient with the gradient of the height field mentioned above! As you’ll see shortly, they’re different.

    \begin{align*} {\bar{n}}'=\bar{n}-\nabla_s\beta \end{align*}

Here, {\bar{n}}' represents the perturbed normal, \bar{n} is the underlying surface normal, and \nabla_s\beta is the surface gradient. So basically, this says that the perturbed normal is the surface normal offset in the negative surface gradient direction.

So how do we calculate the surface gradient from the height field gradient? Well, there’s some fun math in there which I don’t want to repeat, but if you’re interested, I would recommend reading Blinn’s paper first, then Mikkelsen’s paper. You eventually arrive at:

    \begin{align*} \nabla_s\beta=\dfrac{(\sigma_t \times \bar{n} )\beta_s + (\bar{n} \times \sigma_s)\beta_t}{\bar{n} \cdot (\sigma_s \times \sigma_t)} \end{align*}

In addition to the symbols defined previously, \sigma_s and \sigma_t are the partial derivatives of the surface position, and \beta_s and \beta_t are the partial derivatives of the height field. The derivative directions s and t are not explictly defined here.

It’s easiest to think of this as the projection of the 2D gradient onto a 3D surface along the normal. Intuitively, this says that the surface gradient direction is pushed out on orthogonal vectors to the s/n and t/n planes by however much the gradient specifies. The denominator term is there to scale up the result when the s and t are not orthogonal, or are flipped.


Implementing this technique is fairly straightforward once you realise the meaning of some of the variables. Since we’re free to choose the partial derivative directions s and t, it’s convenient for the shader to use screen-space x and y. The value \sigma is the position, and the value \beta is the height field sample.

// Project the surface gradient (dhdx, dhdy) onto the surface (n, dpdx, dpdy)
float3 CalculateSurfaceGradient(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
	float3 r1 = cross(dpdy, n);
	float3 r2 = cross(n, dpdx);

	return (r1 * dhdx + r2 * dhdy) / dot(dpdx, r1);

// Move the normal away from the surface normal in the opposite surface gradient direction
float3 PerturbNormal(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
	return normalize(normal - CalculateSurfaceGradient(normal, dpdx, dpdy, dhdx, dhdy));

So far, so good. Next we need to work out how to calculate the partial derivatives. The reason why we chose screen-space x and y to be our partial derivative directions is so that we can use the ddx and ddy shader instructions to generate the partial derivatives of both the position and the height.

Given a position and normal in the same coordinate-space, and a height map sample, calculating the final normal is straighforward:

// Calculate the surface normal using screen-space partial derivatives of the height field
float3 CalculateSurfaceNormal(float3 position, float3 normal, float height)
	float3 dpdx = ddx(position);
	float3 dpdy = ddy(position);

	float dhdx = ddx(height);
	float dhdy = ddy(height);

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);

Note that in shader model 5.0, you can use ddx_fine/ddy_fine instead of ddx/ddy to get high-precision partial derivatives.

So how does this look? At a medium distance, I would say that it looks pretty good:

But what about up close?

Uh oh! What’s happening here? Well, there are a couple of problems…

The main problem is that the height texture is using bilinear filtering, so the gradient between any two texels is constant. This causes large blocks to become very obvious when up close. There are a couple of options for alleviating this somewhat.

One option is to use bicubic filtering. I haven’t tried it, but I would expect this to make a good difference. The problem is that it will incur an extra cost. Another option, suggested in the paper, is to add a detail bump texture on top. This helps quite a lot, but again it adds more cost.

In the image below I’ve just tiled the same texture at 10x frequency over the top. It would be better to apply some kind of noise function as in the original paper.

The second problem is more subtle. We’re getting some small block artifacts because of the way that the ddx and ddy shader instructions work. They take pairs of pixels in a pixel quad and subtract the relevant values to get the derivative. In the case of the height derivatives, we can alleviate this by performing the differencing ourselves with extra texture samples.

The first problem is pretty much a killer for me. I would rather not have to cover up a fundamental implementation issue with extra fudges and more cost.

What Now?

It’s unfortunate that this didn’t make it into the original paper, but Mikkelsen mentions in a blog post that you can increase the quality by using precomputed height derivatives. This method requires double the texture storage (or half the resolution) of the ddx/ddy method, but produces much better results.

You’re probably wondering how you can possibly precompute screen-space derivatives. We don’t actually have to. Instead we can use the chain rule to transform a partial derivative from one space to another. In our case we can transform our derivatives from uv-space to screen-space if we have the partial derivatives of the uvs in screen-space.

To calculate dhdx you need dhdu, dhdv, dudx and dvdx:

    \begin{align*} \dfrac{\delta h}{\delta x} = \dfrac{\delta h}{\delta u} \cdot \dfrac{\delta u}{\delta x} + \dfrac{\delta h}{\delta v} \cdot \dfrac{\delta v}{\delta x} \end{align*}

To calculate dhdy you need dhdu, dhdv, dudy and dvdy:

    \begin{align*} \dfrac{\delta h}{\delta y} = \dfrac{\delta h}{\delta u} \cdot \dfrac{\delta u}{\delta y} + \dfrac{\delta h}{\delta v} \cdot \dfrac{\delta v}{\delta y} \end{align*}

The hlsl for this is very simple:

float ApplyChainRule(float dhdu, float dhdv, float dud_, float dvd_)
	return dhdu * dud_ + dhdv * dvd_;

Assuming that we have a texture that stores the texel-space height derivatives, we can scale this up in the shader to uv-space by simply multiplying by the texture dimensions. We can then use the screen space uv derivatives and the chain rule to transform from dhdu/dhdv to dhdx/dhdy.

// Calculate the surface normal using the uv-space gradient (dhdu, dhdv)
float3 CalculateSurfaceNormal(float3 position, float3 normal, float2 gradient)
	float3 dpdx = ddx(position);
	float3 dpdy = ddy(position);

	float dhdx = ApplyChainRule(gradient.x, gradient.y, ddx(uv.x), ddx(uv.y));
	float dhdy = ApplyChainRule(gradient.x, gradient.y, ddy(uv.x), ddy(uv.y));

	return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);

So how does this look? Well, it’s pretty much the same at medium distance.

But it’s way better up close, since we’re now interpolating the derivatives.


In order to really draw any conclusions about this technique, I’m going to need to compare the quality, performance and memory consumption to that of normal mapping. That’s a whole other blog post waiting to happen…

But in theory, the pros are:

  • Less mesh memory: We don’t need to store a tangent vector, so this should translate into some pretty significant mesh memory savings.
  • Fewer interpolators: We don’t need to pass the tangent vector from the vertex shader to the pixel shader, so this should be a performance gain.
  • Possible less texture memory: At worst this method requires two channels in a texture. At best, a normal map takes up two channels.
  • Easy scaling: It’s easy to change the height scale on the fly by scaling the height derivatives. This isn’t quite so easy to get right when using normal maps. See here. As Stephen Hill points out in the comments below, this is a pretty weak argument, so I’m removing it.

And the cons are:

  • More ALU: It’s going to be interesting to see the actual numbers, but this is probably the only thing that could put the nail in the coffin for derivative maps. The extra cost for ALU might be compensated partially by the fewer interpolators, but we’ll have to see.
  • Less flexible: A normal map can represent any derivative map, but the reverse is not true. I’m not sure that this is a significant problem in practice though.
  • Worse quality? I’m not sure about this one, but it’ll be interesting to see if the quality holds up.

Comments (12)

UI Anti-Aliasing

I’ve been working on making a really simple IMGUI implementation for my engine at home. I like to do a little bit of research when I’m approaching something new to me like this, so I went hunting around for publicly available implementations. While doing this, I came across Mikko Mononen’s implementation in Recast.

I was impressed when I ran the demo with how smooth his UI looked. It turns out that he’s using a little trick (which I’d never seen before, but I’m sure is old to many) to smooth of the edges of his UI elements.

Basically, the trick is to create a ring of extra vertices by extruding the edges of the polygon out by a certain amount. These extra vertices take the same color as the originals, but their alpha is set to zero. Mikko calls this ‘feathering’.

In my case, I found that I got good results by feathering just one pixel. Here’s a quick before/after comparison of the my IMGUI check box at 800% zoom:

And here’s a 1-to-1 example showing rounded button corners:

It’s a pretty nice improvement for a very simple technique! If you’re interested in what the code looks like, then either take a look at Mikko’s IMGUI implementation, or you can find the code I use to feather my convex polygons below.

My implementation is a little less efficient since I recalculate each edge normal twice, but I chose to keep it simple for readability.

Comments (1)

Visual Studio Addins Updated

DoItNow & RevisionItNow

I’ve updated the Google Code repository for DoItNow with a newer version. I’ve removed all source control features from DoItNow and separated them into their own add-in. This should make it more compatible with other add-ins you may be using to handle source control. I’ve uploaded the Mercurial version of the add-in, but the full source is available should you want to change it back to Perforce.

I’m only using Visual Studio 2010 at home now, so the project files are all in that format at the moment. I provided Addin files which will work for Visual Studio 2008 as well though.

By request from someone at work, the open in solution dialog now performs matches on multiple (space-separated) search terms.


I’ve been testing out an idea for a replacement for the standard Visual Studio find-in-files. This is the first pass at it (let’s call it an alpha), so download at your own risk! It’s actually sits side-by-side with the existing find-in-files, so it’s pretty safe to install.

Here’s where it’s (possibly) better:

  • It can match multiple search terms and will rank results accordingly.
  • Remembers all settings from the previous searches. Pushing up or down will set things like ‘match case’ based on the search history.
  • It populates the file types drop down based on the files in the solution, and sorts them by frequency.

Here’s where it’s (definitely) worse right now:

  • It doesn’t remember the search paths you used in previous Visual Studio sessions.
  • Since it ranks results, it doesn’t present them incrementally. This means you might have to wait longer to get results.
  • You can’t cancel a search!

FindItNow ranks search results based on the number of hits on each line, as well as hits in the surrounding few lines. In order for a result to even show up, it must have all search terms present in a seven-line block.

The top matches (100% quality) have all search terms on the line in question. Worse quality matches have progressively fewer matches on the line.

e.g. Here are the results for a search I did looking for a quaternion conjugate function on some of my code:

Query: "quat conjugate"
Options: case=ignore, match=partial
Source: Entire solution
Finding... Complete

Match Quality: 100%
c:\Development.old\Libraries\C++\Math\Quaternion.h(79): 		inline Quaternion Conjugate(const Quaternion& quat)
c:\Development.old\Libraries\C++\Math\Quaternion.h(96): 			return Conjugate(quat) / Length(quat);
c:\Development.old\Libraries\C++\Math\Quaternion.h(101): 			const Quaternion result = quat * Quaternion(vec.x, vec.y, vec.z, 0) * Conjugate(quat);

Match Quality: 62%
c:\Development.old\Libraries\C++\Math\Quaternion.h(76): 			return Quaternion(lhs.x / f, lhs.y / f, lhs.z / f, lhs.w / f);
c:\Development.old\Libraries\C++\Math\Quaternion.h(81): 			return Quaternion(-quat.x, -quat.y, -quat.z, quat.w);
c:\Development.old\Libraries\C++\Math\Quaternion.h(94): 		inline Quaternion Invert(const Quaternion& quat)
c:\Development.old\Libraries\C++\Math\Quaternion.h(99): 		inline Vector3 Rotate(const Vector3& vec, const Quaternion& quat)

Total files searched: 407
Matching lines: 34
Find Time: 90 ms
Output Time: 9 ms

I’m finding it pretty useful when exploring for functions I think *should* exist in a large code-base since you don’t have to get the exact string to match.

If you’re interested in either of these, you can grab the binaries here.

Comments (4)

MockItNow: Now with Win64 goodness!

A couple of fine chaps named Julian Adams and Clement Dagneau from my previous company Black Rock Studios (née Climax Racing) took it upon themselves to tackle the daunting task of  porting MockItNow to x64 (MSVC).

They’ve kindly shared their efforts back with the main repository in google code. Feel free to profit from their hard work!

Thanks a lot Julian & Clement!


Adventures with Arduino: From 41 Instructions to 1

A couple of weeks ago, I was looking for a fun project to do with my 8 year old son. By chance, I saw a recent issue of Make Magazine which had a feature about an electronics prototyping platform called Arduino. I thought it sounded pretty cool, and could be a good introduction for my son to some basic electronics as well as some simple programming.

The Arduino is designed to be very easy to use. It comes with a set of tools based on the Wiring programming environment and libraries. The board is controlled by an Atmel 8-bit microprocessor. In comparison to what I spend my days working with, this is a very simple processor. No vector registers, no floating point registers. Even something as simple as adding two 32-bit numbers requires three assembly instructions (add word, add byte with carry, add byte with carry).

Read the rest of this entry »

Comments (3)

What’s wrong with this picture?


Well, you could point out a number of things to answer that question. There’s some pretty obvious aliasing, a random pixel on the ground which should be in shadow but isn’t, it’s noisy, boring etc. But that’s not my point. The point is: It’s too dark!

I know it’s too dark because I know how I rendered it, and I rendered it wrong. It still kind of looks acceptable (well to me at least) though. I’m not sure that I would say that it’s implausibly dark if I didn’t know it.

Read the rest of this entry »

Comments (2)