CodeItNow

Lighting: The Rendering Equation

Introduction

Back in 2002, I started my first job in the games industry at Climax Studios in England. I have to admit, I didn’t know very much about game development at the time. Don’t get me wrong, I’d been writing little 2D games, and messing around with rubbish particle systems at home, but it was nothing like what I was about to get involved with. Despite my inexperience, somehow I did enough to pass the interview, and I was offered a job as a junior programmer.

As seemed to be typical for the time, my introduction to the industry was pretty much a trial by fire. I quickly found out that I couldn’t hope to truly understand every single new thing I encountered, so I learned to just accept some things as the truth. For example, I was told that the dot product of two normalized vectors yields the cosine of the angle between them. I just accepted this, and only took the time to find out why later on.

One of the things I accepted at the beginning was the maths used to perform the lighting of our models during rendering. While I could understand how the equations appeared to yield decent looking results, I never understood where they came from, and why they worked.

After a while I began to wonder about this… Where did the equations for diffuse and specular reflections come from? What are the units of brightness we use for lights? What are the units for the pixels that get rendered?

It took lot of reading, re-reading, and re-re-reading, for me to really understand some of these things, so now that I have a blog, I thought I would share what I learned just in case anyone else is wondering about these things too.

The Rendering Equation

When light hits a point on a surface, some of it might get absorbed, reflected or possibly even refracted. Also, there may be additional light being emitted from that point by a power source, or perhaps scattered in from another point on the surface. Things can get complicated pretty quickly!

Luckily for us, some smart guys came up with something called the rendering equation to deal with these factors. The rendering equation can produce incredibly realistic-looking images, but in its original form, it can also be very costly to evaluate.

I’ve included a slightly simplified version of the rendering equation below. If you compare this to the version currently on Wikipedia, you’ll see that I’ve removed a couple of parameters, t and lamba.

Typically if you have something like the power output of a light varying over time, it’s a better to idea to evaluate it before trying to solve the rendering equation. By doing this, you can assume time is constant, and so you can ignore it.

The lambda symbol in the original equation represents a dependency on the wavelength of the light. Without this, everything would be greyscale, so it’s an important property. Rather than dealing with wavelength explicitly, we can just treat the red, green and blue color channels independently and solve the rendering equation once for each channel. Note that in practice we end up using per-component vector mathematics to solve the equation for all three channels at the same time.

At first, this may seem a little bit intimidating, but actually it’s fairly simple. I’m going to break down each part and explain what it means.

Outgoing Light
This says that the rendering equation is a function which gives you the outgoing light in a particular direction w from a point x on a surface.

Emitted Light

This is any light that is being emitted from the point. Most surfaces don’t emit light, so normally you don’t see any contribution here.

Integral

This says that the enclosed functions need to be integrated over all directions w’ in the hemisphere above x. The orientation of the hemisphere is determined by the normal, n.

BRDF

This is the bidirectional reflectance distribution function (BRDF). It’s a fancy name for the ratio of the amount of light reflected in a particular direction w, to the amount received from another direction w’. The BRDF warrants its own discussion, but for now it can just be thought of as the reflection amount.

Incoming Light

This is the incoming light at the point x from the direction w’. Note that the incoming light doesn’t have to come from a light source (direct light). It may have been reflected or refracted from another point in the scene (indirect light).

Normal Attenuation

This attenuates the incoming light at x based on the cosine of the angle between the normal n and the incoming light direction w’.

Making it Real-Time

We use the rendering equation to perform lighting calculations in games, albeit in a simpler form. The most obvious problem for evaluating the rendering equation in a pixel shader is the integral, so we need to find a way to approximate it. One thing we can do is to split the way we deal with direct light and indirect light.

Since the indirect light is the harder of the two to deal with, we can approximate it. There are various different ways you might want to approximate the indirect light, from a simple ambient color, to more complex forms like spherical harmonics. Typically you would modulate your indirect light approximation with the diffuse part of your BRDF, since you don’t have a directional component to use in direction-dependent BRDFs.


By approximating the indirect light, we only have to worry the direct light in the rendering equation, so we can replace the integral with a simple sum over k light sources. Also, we typically model the change in the BRDF over space by using textures mapped over the surface, so x is no longer needed for that function.

I’ve changed up the notation here to match more closely what you see in games literature. The vector v is the normalized direction from the point x to the camera. The vector li is the normalized vector from light source i to the point x.

It’s not explicitly written here, but the dot product now needs to be clamped to zero to avoid negative light values.
This may not look much like the ‘diffuse plus specular’ equations you commonly see in games, but it’s actually pretty close. In fact, imagine removing the emitted light, and swapping the BRDF function with a constant color value, and you’ll see that we have the equation for calculating diffuse lighting.
The multiply symbol inside the circle just represents the component-wise multiplication of the red, green and blue channels of the color with the corresponding channels of the incoming light.

Units…

You may notice that I’ve been using fuzzy terms like ‘incoming light’ as if it’s something we all know how to measure. A good question at this point might be “what are the units being used for light?”.


Well, there are well defined units at play here, but I’m going to say right now that the units don’t really matter. I don’t know of any game engines where they attribute real-world units to lighting and material values, since those kinds of things tend to be driven more by the appearance than the physical correctness. The relative intensities of, say, a candle and a 100W light bulb are probably more important than the absolute values.

Having said all that, here’s the information anyway: The light is measured in units of radiance (Wsr-1m-2). Looking back at the integral in the rendering equation, you can see that the radiance is multiplied by the differential solid angle. This converts the radiance into irradiance (Wm-2).

Remember that the BRDF is the ratio of light reflected to light received? Well that’s a ratio of radiance to irradiance, so the BRDF has units of sr-1.

One thing to be careful of is that the incoming light in the simplified version of the rendering equation (using the sum over the lights) is measured in units of irradiance (Wm-2). We have to use irradiance directly here since the typical lights we use (point, spot, directional) don’t have any area.

That’s It For Now

I hope what I’ve explained so far was fairly clear, but I know I’ve left some pretty big holes here. I’m going to be looking at some of the following at a later date:

  • What is the difference between irradiance and radiance?
  • What is the BRDF?
  • What is energy conservation, and does it matter for games?
No comments

Catmull-Clark Subdivision: The Basics

For a little under a year now, I’ve been using Modo to learn how to create 3D models. I’m doing this partly for fun, and partly to help me think about game engine features from the perspective of an artist. Along with the traditional poly-modeling tools, Modo (like many other 3D tools) also has support for subdivision surfaces. A subdivision surface basically allows you to create a low-poly control mesh for your model, and to add detail, you let the application divide each face in your mesh according to a set of rules.

Here’s an example:

Notice how few polygons there are in the leftmost model. At each increasing level of subdivision to the right, each face gets split into four new faces to add detail. This allows the square hole with sharp edges to become a nice round hole with smooth edges. If you made a mistake (as I frequently do), then it is relatively easy to fix the mistake in the low poly control mesh because there are so few vertices to deal with.

Sounds good, right? Mostly it is, but there is a fairly steep learning curve to subdivision surfaces. One of the main problems is really understanding how your original control mesh relates to the subdivided version. None of the points in the control mesh are guaranteed to remain in the exact same position once the subdivision has occurred, and working out where these points are going to move can sometimes seem like voodoo. In various forums, arguments rage on about edgeloop modeling, E-Poles, N-Poles, keeping only regular vertices, avoiding triangles, n-gons etc. I really wanted to better understand what was going on during subdivision, so I started to dig a little deeper into the details. In the rest of this article, I’m going to share what I’ve learned so far.

A Little Background First

How exactly does a 3D application decide how to subdivide a control mesh? Well, there are a few different subdivision schemes out there, but the most widely used is Catmull-Clark subdivision. Since the original paper by Ed Catmull and Jim Clark, there have been a number of improvements to the original scheme, in particular dealing with crease and boundary edges. This means that although an application might say that it uses Catmull-Clark subdivision, you’ll probably see slightly different results, particularly in regard to edge weights and boundaries.

Rules of Subdivision

The original rules for how to subdivide a control mesh are actually fairly simple. The paper describes the mathematics behind the rules for meshes with quads only, and then goes on to generalize this without proof for meshes with arbitrary polygons. I’m going to go through a very simple example here to show what happens when a mesh gets subdivided. I’m going to be concentrating solely on non-boundary sections of the mesh for simplicity. Also, the example uses a 2D mesh, but again this is only for simplicity, and the principles expand to 3D without change.

There are four basic steps to subdividing a mesh of control-points:

1. Add a new point to each face, called the face-point.
2. Add a new point to each edge, called the edge-point.
3. Move the control-point to another position, called the vertex-point.
4. Connect the new points.

Easy, right? The only question left to answer is, what are the rules for adding and moving these points? Well, like I wrote earlier, this really depends on who implemented it, but I’m going to show the original Catmull-Clark rules using screenshots from Modo. This doesn’t mean that these are the rules that Modo applies, but it should be fairly close.

Here’s the example mesh I’m going to be using:


No prizes for guessing which face we’re going to be looking at. I’ve made it a little bit skewed because that makes things more interesting. If you’re wondering about the double edges around the boundary, I’m going to be writing about those at a later date, so be patient! For now I need them to make the boundary hold shape, but they can be safely ignored for the purposes of this article.

Step One

Firstly, we need to insert a face point, but where? Well this is easy, it just needs to go at the average position of all the control points for the face:


I’ve drawn red dots approximately where each new face-point will go. The big dot is obviously the face-point for the highlighted face. The location for each dot is simply the sum of all the positions of the control-points in that face, divided by the number of control-points.

Step Two

Now we need to add edge-points. Because we’re not dealing with weighted, crease or boundary edges, this is also fairly simple. For each edge, we take the average of the two control points on either side of the edge, and the face-points of the touching faces.

Here’s an example for one edge-point, in blue, where the control points affecting its position are pink, and face-points are red. Note that the edge points don’t necessarily lie on the original edge!


For the highlighted face only, I’ve drawn all the roughly where each new edge-point will go in blue:

Step Three

This step is a little bit more complicated, but very interesting. We’re going to see how we place the vertex-points from the original control points, and why they get placed there. This time, we use a weighted average to determine where the vertex-point will be placed. A weighted average is just a normal average, except you allow some values to have greater influence on the final result than others.

Before I begin with the details of this step, I need to define a term that you may have heard used in regards to subdivision surfaces - valence. The valence of a point is simply the number of edges that connect to that point. In the example mesh, you can see that all of the control points have a valence of 4.
Alright, I’m going to dive into some math here, but I’m also going to explain what the math actually means in real terms, so don’t be put off!

The new vertex-point is placed at:

(Q/n) + (2R/n) + (S(n-3)/n)

where n is the valence, Q is the average of the surrounding face points, R is the average of all surround edge midpoints, and S is the control point.

When you break this down, it’s actually fairly simple. Looking at our example, our control-points have a valence of 4, which means n = 4. Applying this to the formula, we now get:

(Q/4) + (R/2) + (S/4)

Much simpler! What does this mean though? Well, it says that a quarter of the weighting for the vertex-point comes from the average of surrounding face-points, half of it comes from the surrounding edge midpoints, and the last quarter comes from the original control-point.

If we look at the top left control point in the highlighted face, Q is the average of the surrounding face-points, the red dots:


R is the average of the surrounding edge midpoints, the yellow dots. Note that the yellow dots represent the middle of the edge (just the average of the two end points) which is a little bit different from the edge-points we inserted above because they don’t factor in the nearby face-points.

S is just the original control-point, the pink dot:

Below is my rough approximation of where all these averages come out, using the same color scheme as before. So the yellow dot is the average of all the yellow dots above, and the red dot is the average of all the red dots above.


Now all we do is take the weighted average of these points. So remember that the red dot accounts for a quarter, the yellow dot accounts for half, and the pink dot accounts for the final quarter. So the final resting place for the vertex-point should be somewhere near the yellow dot, slightly toward the pink dot, roughly where the pink dot is below:


So now I hope you can see why the control point is going to get pulled down and to the right. All the surrounding points just get averaged together and weighted to pull the vertex around. This means that if a control-point is next to a huge face, and three small faces, the huge face is going to pull that vertex towards it much more that the small faces do.

One other thing you can see from this is that if the mesh is split into regular faces, then the control point won’t move at all because the average of all the points will be the same as the original control point. The pull from each face-point and edge midpoint gets canceled out by a matching point on the other side:

Step Four

The final step is just to connect all the points. Confusingly, Modo has two ways to subdivide a mesh which actually return slightly different results. I don’t know why this is, or how the subdivision schemes differ, but they are close enough for all intents and purposes to call the same. For clarity, I’m using a screenshot from Modo where I have subdivided using the SDS subdivide command. For the highlighted face only, I’ve drawn the new face-point in red, the new edge-points in blue, and the moved vertex-points in pink:


Below is an overlay of the original control mesh for reference. As expected, you can see that the top left control-point gets pulled to the right. The same process is applied to all faces, resulting in four times as many quads as we had previously.

One More Thing

If you look at the formula for moving the control-point to its new location, you can see that not only are the immediate neighbor points used, so are the all the rest of the points in the adjoining faces. How much any single one of these surrounding points affects the control-point isn’t very clear from the original formula. This information is actually really easy to find out just by substituting in the surrounding points into the original formula.

I’m making the assumption from hereon out that all of the surrounding faces are quads to make things easier on myself. Also, I’m not going to write down each step of the expansion here since it got pretty big, but at the end, I got the following weights:

Control-point weight = (4n-7) / 4n
Edge-neighbor weight = 3 / 2n^2
Face-neighbor weight = 1 / 4n^2

I’m using the term edge-neighbor to denote a neighboring point sharing the same edge as the control point, and face-neighbor as a neighbor that only shares the same face as the control-point. In the picture below, the edge-neighbors are cyan, and the face neighbors are yellow. Note that for quads, you have exactly n edge-neighbors, and n face-neighbors.

Applying this formula to various different valences, you get the following weights for a single point of the given type:

Valence
Control
Edge Neighbor
Face Neighbor
3 5/12 3/18 1/36
4 9/16 3/32 1/64
5 13/20 3/50 1/100
Infinity 1 0 0

What does this mean? If we look at a the valence-4 row, you can see that if you move a face-neighbor, that’s going to affect the position of the vertex-point by 1/64th of however much you move it. So if you move it 64 cm on the x-axis, then the resulting vertex-point will move 1 cm in the same direction. I tested this scenario out in Modo, and it seems to match up very closely.

Clearly, the edge-neighbors are weighted more heavily than the face-neighbors, and the control-point itself always the most influence on the final vertex-point location. Remember though, each point represents a different thing (control-point, edge-neighbor, face-neighbor, nothing) depending on which particular face is getting subdivided!

It is interesting to note what happens as the valence gets larger and larger. The control-point influence tends towards 1, while all the neighboring points tend to zero. The reverse is also true, where at valence-4, the control-point weight is about 0.56, at valence-3 it is only 0.41 which means it is getting pulled around by its neighbors a lot more.

Wrap Up

I’ve learned a lot while investigating Catmull-Clark subdivision. There’s a lot more to learn, and a lot more to share, but I don’t have the time right now. This post was much longer than I thought it would be, so I’m going to look at some other aspects of subdivision modeling in later posts. One of the next things I’m going to look at will be how the rules change for boundary points, since these are often the bane of my modeling life…

I’m not really sure this will make me a better modeler, but at least I have a much better understanding of what will happen to my mesh when it gets subdivided. Hopefully I will be able to apply this knowledge to make my modeling process more predictable.

Remember that this only applies to non-boundary/crease points, and that different applications may use different weights for the points.

If you’re interested in checking out Modo, they have a 30 day trial available on the Luxology website. It’s really rather good, and I’ve seen some absolultely stunning images on the forums which were modeled and rendered using Modo.

No comments

Counting Objects In A List in C# 3.5

While I was reading through some C# 2.0 code that a friend of mine wrote, I noticed that he was finding some objects in a List using a foreach loop. As soon as I saw that, I thought to myself that I would probably have done that using FindAll with an anonymous delegate.

That got me thinking about all the different ways you can find things in a List in C#. I’ve recently upgraded to using Visual Studio 2008, so with the advent of C# 3.5 there are even more ways to do something like this. Which method is best is not very clear, so I thought it was time for some performance tests.

The Test

I decided that a nice and simple performance test would be to find all the numbers less than 10,000 in an array of 1,000,000 random integers. To even things out, I perform the same test 100 times for each method, and take the average time. I generated the set of random numbers once, and then used that same set in all tests.

Techniques

So far, I’ve thought of eight different ways to implement the performance test. There are probably more ways, but these seem to be a fairly sensible set:

  1. Use a for loop and index into the list.
  2. Use a foreach loop.
  3. ForEach function in List.
  4. FindAll function in List with an anonymous delegate.
  5. FindAll function in List with a lambda function.
  6. Count function in List with an anonymous delegate.
  7. Count function in List with a lambda function.
  8. Linq.

Results

I ran the tests on my laptop in the standard release config. I verified that all implementations returned the same number of items.

Technique
Time(ms)
For Loop 8
ForEach Loop 6
ForEach Function 10
FindAll Function Delegate 11
FindAll Function Lambda 11
Count Function Delegate 34
Count Function Lambda 34
Linq 19

Well it turns out that my friend had indeed picked the best way to find things in an unsorted list. I was surprised that the for loop was so much slower than foreach, but then I took a look at the IL. The foreach implementation had one virtual function call to GetEnumerator at the top of the loop, and then used non-virtual function calls to get_Current on the enumerator inside the loop. By contrast, the for loop had virtual function calls to get_Item inside the loop.

Clearly, using delegates and lambda functions makes no difference. Looking at the IL in Reflector, they generated identical delegates. That’s enough for me to avoid delegates wherever possible, since the lambda functions were much easier to read.

I had heard about the relatively poor performance of Linq, but I was surprised that it was really three times slower than the best solution. There was more code to write than using FindAll with the lambda function, so it was no more brief either.

While I was looking though the list functions, I came across Count, so I thought I would include it here. I don’t know why you would ever use this function given how terribly it performed. Initially when I saw it, I thought that maybe it was used internally for Linq, but I don’t suppose it can be. I’d be interested to know what its purpose is.

I hope this was informative!

p.s. Charles just pointed me to this page explaining some of the reasons behind the current performance of Linq.

No comments

Nothing To Report

Well, there’s not much to report really. I’ve been spreading myself very thin recently, investigating a few different things, so I thought might be nice to post a status update.

Variance Shadow Maps

Yes, I’m jumping on the variance shadow maps boat. They look nice, they’re easy to implement, and have relatively small overhead over traditional shadow maps. I read the article in GPU Gems 3, which is also available here, and it was pretty easy to implement from there. It probably took me about 2 hours to go from no shadows at all to variance shadow maps. After a bit of tomfoolery to get rid of the light-leaking I ended up with the image below. I haven’t implemented any bias yet, so there are some errors there, but the VSMs appear to be working very nicely.

Better Tone Mapping

My current tone mapping algorithm doesn’t look that good, so I’ve been investigating any alternatives. I was initially put off Reinhard tone mapping when I first looked at the DirectX HDRLighting sample a long time ago because it treated each channel completely independently. This is a really nasty idea since it can change the hue of the image during tone mapping as each channel gets compressed by a different amount. I recently re-read a gamedev article and then implemented Reinhard’s tone mapping using based on the pixel luminance, which gave much nicer results, and didn’t change the hue. One note on that article though: It says that to convert to RGB from the final luminance you need to multiply by the luminance, which doesn’t seem right. What I did was to divide by the old luminance and then multiply by the tone mapped luminance. It’s hard to tell right now if it looks better or worse, it’s just different.

The image above is using this new tone mapping, but I’m not doing the dynamic image key calculation at the moment. That basically gives me three variables (image key, midzone luminance, and white luminance) which aren’t very intuitive to tweak. I’ve generally been leaving the midzone gray at 0.18 as suggested, and then tweaking the image key and white luminance to get something that looks OK.

Windows Presentation Foundation

I’ve spent a lot of time at work recently improving one of our C# tools, which is Windows Forms based. There are many things I dislike about Windows Forms, like the lack of built-in support for something useful like the command-pattern, the horrible input handling, menu shortcuts, you name it! For my own education, last weekend I started reading about the latest and greatest from Microsoft: The Windows Presentation Foundation. From my brief foray into WPF, it seems that a lot of the the things in Windows Forms that I’ve spent time trying to fix or circumvent appear to have been adressed.

For example, in Windows Forms, if you don’t handle a key-press event in a control, then no-one else gets to handle it either, unless they have specifically attached an event handler to that controls key-press event. In WPF they have the notion of tunneling and bubbling commands. This basically means that if you don’t handle that event then your parent gets a chance to handle it (it bubbles up), and vice-versa for tunneling events.

There seems to be a much greater emphasis of splitting the View from the Model (in the Model-View-Controller sense that is), with only very loose references (or none at all) required to bind everything together. I’m very interested in finding a good method for splitting the View code from the Controller and Model code, so I’ve been reading a few articles, some of which you can read here: Model-View-Controller, Model-View-ViewModel, Presentation Model. I really liked Martin Fowler’s description of Presentation Model in particular. The Model-View-ViewModel paper was also a good read, but I really dislike the resuse of the words Model and View in the terminology (it’s doubly confusing since I already have C++ classes called Model and View representing a 3D model and a view point respectively). In Windows Forms you seem to be almost forced to keep your View and Controller code tightly bound, but there appears to be much more freedom in WPF.

I also bought a book about WPF by some bloke from Microsoft called Adam Nathan, which so far has been some good reading (despite the liberal sprinkling of exclamation marks on every page). I’d love to see a book that combines WPF and something like Model-View-ViewModel to create a real-world, complicated application rather than the fairly simple examples that appear in books and online articles. If anyone knows of a book like this, then please let me know!

Oh, but it’s not all roses in WPF though. One of the first videos I watched was this one. If you’ve seen it, then you probably have the same same thought running around your head that that I did when I first watched it: WTF? You’re basically allowed to reuse some storage from a dependency property that isn’t used to affect your control. For example, assuming you’re not parented to a DockPanel, you can use something like DockPanel.Width to store an arbitrary number. It’s like the Tag in Windows Forms, but much much worse. Microsoft seem to be touting it like a really cool feature, when it feels more like a dirty hack to me. Using unused properties to store arbitrary data is a really quick way to make your code unmaintainable as far as I’m concerned. I don’t want to have to search the code for where someone put the data into the property in order to work out what it is actually storing… Urgh!

Articles

Yeah, I’m about 80% through writing one about storing function call arguments in MockItNow at the moment. It’ll be up as soon as I finish the last 80%.

That’s all!

3 comments

Materials

Introduction

A while back, I wanted to improve the material system I was using in my engine at home, since at the time, I had just just one fixed material type. I had to specify three textures for each object I wanted to render: Diffuse, Specular and Normal. If I didn’t want specular, I had to use a black specular texture. This was barely OK for very simple things, but soon I wanted more…

I’m not really going to mention anything about the lighting in this article, since that’s still a work in progress (as can be seen by the lack of shadows below), but I hope this will be interesting nonetheless.

Constraints

Designing a material system can be a bit of a can of worms, so I thought for a while about what features I really wanted.

Only One Shader

This first constraint is really all about being able to maintain the shader code, and reduce the amount of C++ code I need to write in order to use it. I’ve seen a few different engines that piece together different fragments of HLSL to form shaders. This is pretty nice since there’s absolutely no redundant code that gets compiled in, but the problem is that you can end up with lots and lots of shaders. You then need to write code to load the correct shader when you need it, write code to compile the shaders on the fly if they don’t exist etc. Compiling all the permutations is an option, but you can quickly end up having to compile thousands of shaders despite the fact that you may use only a handful. Since I’m just one guy, I want something quick to implement, and quick to maintain.

Optional Material Features

Obviously my first material system sucked because it was hard coded to one specific material type. I had to do all the specular calculations even if I didn’t want specular. Not only was this a waste of precious pixel shader instructions and texture fetches, it was also annoying to have to specifically ‘zero-out’ features that I didn’t want to use. I wanted to be able to just specify the material properties that I do want to use.

Optional Textures

When I was playing around with creating materials, I would quite often find myself wanting to have a solid color. The way I had to do this in the old system was to create a constant-colored texture. I really wanted to just be able to specify a base color for something like diffuse, then if I want to override it, I could specify a texture as well.

Implementation

There were really only two things I had to do inside my shader in order to satisfy my constraints:

  1. Use branches inside the shader to choose whether or not to execute the code for a particular feature.
  2. Use the fact that DirectX 10 returns 0 from an unbound sampler to lerp between base colors and texture samples.

Uber-Shaders and Branching

Using branches in a shader isn’t a new idea by any means. You’ve probably heard of uber-shaders before, and that’s exactly what I implemented. An uber-shader is basically a shader that does everything, but you can turn features on and off using branches. One thing you need to beware of though, is that there are two types of branches inside shaders: Static and Dynamic.


When you use static branching, you are using the contents of a constant register to decide which branch to take. As such, the shader knows which side of the branch it is going to take before the shader even executes. This means that the branches that are not taken are skipped completely, and so this is a very efficient form of branching.

Dynamic branching uses the contents of a variable inside the shader to make the branch decision. Microsoft says that “the performance hit is the cost of the branch plus the cost of the instructions on the side of the branch taken”. At first, this seems pretty good, since the performance hit of a branch instruction (if, else, endif) is only two cycles, but this isn’t quite the whole truth. When pixels are being shaded, they are part of a pixel quad. This is just a group of four pixels that are being shaded at the same time (I assume that the reason for this is to allow gradient operations to work correctly or something). All the pixels in the quad get shaded simultaneously, so if one pixel takes one side of the branch, and another takes the other side, the shader must execute both sides for the quad. This is actually worse than if there was no branch at all, since you have paid the cost for both sides of the branch, plus the cost of the branch instructions too!

Given that fact that I know at draw time which branches I want to take before each draw call, I use static branching to turn features on and off.

Sampling From Unbound Textures

There was a change in behavior in Direct3D 10 where reads from unbound samplers now return zero rather than one. This turned out to be pretty convenient, since I use this to sample from textures in a slightly different way. I just wrote a function which samples a texture, then uses the alpha channel to blend between a predetermined value and the sampled value modulated with the predetermined value.

That sentence was probably a bit wordy, here’s some psuedo-code:

float3 LerpSample(float3 baseValue, Texture2D sourceTexture)
{
float4 sample = texture.Sample(texcoord, sampler);
return Lerp(baseValue, baseValue * sampler.rgb, sample.a);
}

So imagine that I have a red base color for my diffuse color. When I call LerpSample on an unbound sampler, the alpha value is zero, so I just get back my original red color. When the sampler is bound, then the alpha value is used to blend between the base value, and the tinted sampled value.

float3 diffuseColor = LerpSample(material.baseDiffuseColor, diffuseTexture, diffuseSampler, texcoord);

For sampling a normal map, I use a slightly different function, since I just want either the base value, or the sampled value, not a blend between them.

Material Descriptions

I store my materials in xml format, and parse them and convert to a runtime format as part of my pipeline. This allows me to leave out whole sections (like diffuse, specular, reflective etc) very easily, and it is also very simple to edit. Take a look at a sample material xml file here.

Conclusion

Well that’s it really. It’s still a rather rudimentary material system, but it’s a big step up from what I used to have. I’d like to add support for multiple textures of the same type at some point, but I don’t really need it right now. Overall I’m pretty satisfied, and so far it has been very easy to use.

Here’s a breakdown the different layers used in the wood material:

Ambient Occlusion:



Indirect lighting approximation:

Direct diffuse lighting:

Direct specular highlights:

Reflections from an environment map:

All together:


And that’s it! I’ll talk a little bit about the lighting models I’ve tried out, and my current lighting solution next time.

No comments

MockItNow: Sample Code Added

I’ve just uploaded a sample project for MockItNow to Google Code. Get it here.

This file includes the Examples project, the latest version of MockItNow, and UnitTest++. The projects are provided in Visual Studio 2005 format, and should compile out of the box.

Please note that this is a slightly modified version of UnitTest++, since the current release doesn’t catch std::exception in all cases. I’m going to work with Charles at some point to integrate these changes back into the main branch of UnitTest++.

I’ve provided three configurations for the Examples project: Debug, Release and Final. Debug and Release configurations both run the unit tests, but Final has optimizations turned on, and so does not run the tests.

If you’re just interested in seeing what the the test code looks like when using MockItNow, you can browse the example file.

Enjoy!

No comments

MockItNow: Redirecting Function Calls in C++

Introduction

his is the first in a short series of articles I’m going to write about the way MockItNow works. I’m going to start off with something pretty meaty: How does MockItNow intercept the function calls and redirect them?

I tried a few ways of intercepting function calls and recording their parameters, and maybe one day I’ll explain ways not to do it, but for now, read on if you’re interested in how it works at the moment.

How Does MockItNow Intercept Function Calls and Redirect Them?

The short answer to that question is:

MockItNow uses a compiler flag to hook a call to a naked function at the beginning of every function call, before the prologue. This hook function jumps to a predetermined interceptor function and manipulates the stack pointer to cause the program to return back two levels in the stack when the interceptor function returns.

If I’d read this eight months ago then I’d be pretty confused right now. I knew roughly what a stack frame is, but special hooks? Prologues? Naked functions? Manipulating the stack pointer? That was all mumbo jumbo to me. If some or all of those are also mumbo jumbo to you, then you might want to read on. I’m going to explain what these things are, and why I needed to learn about them.

First Things First

I think it’s going to help to start off with an idea of what’s going on with the program when the mocker is intercepting and redirecting function calls. In the following scenario, we are calling a function Bar from a function Foo. Bar is going to get intercepted by the mocker and redirected to a replay function.

  1. Foo calls Bar.
  2. Bar calls compiler hook function _penter.
  3. _penter locates the replay function address and sets it into a variable jumpAddress.
  4. _penter moves the stack pointer.
  5. _penter jumps to Replay.
  6. Replay makes a note of the function call and returns
  7. Program flow returns back to Foo

Below I’ve written out a simplified view of the assembly code that this scenario would generate. I’ve interleaved the instructions to illustrate the program flow through the various functions. To help out with identifying which function is which, I’ve colored them all differently.

The important thing for the MockItNow is that we must never execute the body of Bar, which is highlighted in the second green section. If you’re smart and like this kind of thing, you might be able to work out why the body of Bar will never be called. If not, don’t worry - I’m about to explain.

Assembly Full

There are a few important things to note here:

1. The call to _penter occurs before the prologue of Bar executes.
2. _penter adds four bytes to the stack pointer in its epilogue.
3. Instead of calling ret at the end of the _penter, it uses jmp to jump to Replay.

In the next few sections, I’m going to explain some details behind why these points are important. I’m also going to fill in a little bit of background knowledge that I had to learn in order to work this out.

Stack Frames, Prologues and Epilogues

If the assembly listing didn’t put you off, you may be wondering at this point, what is a prologue? Well, as you can see in the assembly above, each function performs the exact same two operations at the start of the function:

   push ebp
   mov ebp,esp

This is the prologue! Note that if a function has any local variables then there will be another line in the prologue to allocate space for them, but in this case we don’t have any, so I’ll ignore it.

So what’s going on here? The point of the prologue is set up the current stack frame. The stack frame basically stores all the information about the current function we are in. It is here that your stack variables get stored, along with any other context the program needs to call and pass arguments to other functions, and to restore the previous stack frame.

All the prologue does is to store the previous stack frame location (ebp) by pushing it onto the stack and sets ebp to store the current stack frame location (esp).

You’ll probably also notice that apart from the _penter function, all the functions do the exact same thing at the end:

   mov esp,ebp
   pop ebp
   ret

This is called the epilogue, and it just moves the stack pointer back to where the old base pointer is stored in the stack frame, and then pops the base pointer, which restores it. One important thing to remember here is that when you call pop, it restores whatever is at the current stack pointer location into the register provided, and moves the stack pointer.

I suspect that most programmers aren’t even aware of prologues and epilogues because the compiler automatically generates a standard prologue and epilogue for each function you write. It even works out the size of all your stack variables so that it can make the stack frame big enough. I certainly never had to deal with it until I started looking into MockItNow.

There is, however, one type of function for which the compiler won’t auto-generate prologues and epilogues. These functions are called naked functions. You can declare your function as naked using __declspec(naked) before the function name.

You’ll almost never want to use a naked function unless, of course, you need to customize either the prologue or epilogue. This is exactly what needs to happen in the _penter function in order to divert the function call.

Manipulating the Instruction Pointer

There are a couple more things I need to explain before I can go through an example of how MockItNow actually works: I need to explain what happens to the stack when the call and ret instructions are encountered, and also difference between call and jmp.

I’ll start my explanations with a quick introduction to the instruction pointer register (eip) and how it is changes as a program runs. The instruction pointer stores the address of the instruction that the program is currently executing. Normally when an instruction is executed, the instruction pointer is moved to the address of the next instruction. The call, ret and jmp instructions are all exceptions to this rule though, since their sole purpose is to manipulate instruction pointer in very specific ways.

When you call a function, the instruction pointer moves to the next instruction as usual. The value of the instruction pointer register then gets pushed onto the stack, and the program sets the register to the address of the first instruction in the function you are calling. This means that the next instruction to get executed is the first instruction of the function you have called.

So how does the program know which address to copy into the instruction pointer register when you call a function? For most calls, the answer is easy: The linker tells it! If you take a look at the disassembly for a function call, you’ll normally see something like this:

   call Foo (123456h)

The number in brackets is the address that the instruction pointer is going to jump to when the call instruction is processed. Sometimes you may see the program make a function call using a register like eax. This happens if you are calling a virtual function because the function address wasn’t known at link time.

Beware that if you have incremental linking turned on then address you see in the call instruction probably isn’t the address of the actual function, but actually the address inside the incremental link table. This descrepency between the real function address and the call address is why MockItNow doesn’t work when incremental linking is turned on.

I suspect that you’ve guessed by now that the reason the call instruction pushed the old instruction pointer register onto the stack is so that the ret instruction can pop it back off. This is exactly what it does to return program flow back to the caller function.

You’ve probably also guessed that the difference between call and jmp is that jmp doesn’t push the instruction pointer onto the stack before changing it. We’re going to exploit this difference in a bit.

Putting it All Together

Hopefully everything I outlined above is clear, because now I’m going to show a simplified view of the state of the stack as the program in the scenario above executes, and we’ll see why the body of Bar never gets executed.

Remember, the contents of the stack are based on the address of stack pointer, so if you change the stack pointer, you change the contents of the stack. Also, ebp is the base pointer for that function’s stack frame, and eip is the next instruction to execute within the function.

I’m going to assume that Foo is called directly from the Test function, so the first thing that gets pushed onto the stack is the Test function return address and base pointer. The stack is represented by the table below each significant event. The top of the stack is the top of the table.

Assembly 1

There’s nothing out of the ordinary here so far. This is what you’d expect to see for a function call to Bar. It just pushed the return address for the instruction pointer onto the stack.

Assembly 2

Now things are looking a bit interesting. We have the return address for Bar on top of the return address for Foo. We’re going to manipulate this in a bit.

Assembly 3

There’s nothing out of the ordinary here so far for _penter. It has done the standard prologue, and it has done the first two instructions of the standard epliogue.

Assembly 4

Finally, this is the part where the good stuff happens! The stack pointer was incremented by four bytes, which basically removes the return address for Bar without restoring the instruction pointer. There is no trace left in the stack that Bar was ever called! The program now uses the jmp instruction to move program exectution to the Replay function. It uses jmp rather than call so that the instruction pointer register doesn’t get stored onto the stack.

Assembly 5

This function has the standard prologue and epilogue, but before the ret instruction is executed, I wanted to highlight the fact that the address on the the top of the stack is an address inside Foo.

Assembly 6

We have now jumped up two levels in the stack. I write ‘two levels’ because we were never really in Bar. Instead of returning back _penter, or to Bar, we have returned Foo. This is really important since the Replay function may be returning a value, and the next instructions in Foo will retrieve that return value.

Assembly 7

The rest is just some cleanup!

I hope that you can see how manipulating the stack caused the instruction pointer to return back from Replay to Foo. The body of Bar was never executed.

If Bar has a return value, we return a preset value from Replay, and Foo never knows that Bar was intercepted! As long as Replay has the same calling convention as Bar, you don’t have to worry about how the return values are propagated back.

Summary

Looking back to the beginning of the article, I highlighted three important points. I hope you have a better understanding now I highlighted these points in particular.

1. Bar never gets the chance to change the state of the stack. This means we don’t have to worry about removing the stack frame when we return from _penter.
2. Adding four bytes to the stack pointer removes the return address back into Bar from the stack without changing the instruction pointer. This makes it appear as though _penter was called directly from Foo.
3. Because the call instruction wasn’t used to enter the Record function from _penter, the instruction pointer was never saved to the stack, so the return address still points back to Foo.

I hope this was informative and perhaps even a little bit interesting. Next time I’m going to write about some of the the challenges I encountered when working out how to store function arguments and return values.

References

http://en.wikipedia.org/wiki/Function_prologue
http://msdn.microsoft.com/en-us/library/c63a9b7h.aspx
http://www.swansontec.com/sregisters.html

1 comment

Welcome!

I finally took the plunge to set up my own website! I’ve been wanting to do this for a while to be able to share some of the things that have interested me as I spend my spare time writing a small game engine. I hope that someone somewhere finds these things interesting!

One other reason that I set up the site is to have central place to store information about a couple of small pieces of software I wrote: DoItNow and MockItNow.

DoItNow is a little addin for Visual Studio 2005 that I use both at work and at home to speed certain parts of the development process up. It’s like Visual Assist, but not as good, and freer.

MockItNow is a C++ mocking framework. If you write unit tests in C#, then you have probably tried out using something like RhinoMocks, or maybe TypeMock. There isn’t much choice in C++, so I wanted to see if I could write something.

MockItNow was a great venture into the unknown for me. I wasn’t at all sure that I would be able to pull it off at the start. In fact, I had two failed attempts (based on ludicrous ideas) before I landed on what I use now. Along the way I learned a lot about areas I’d never looked at in great detail before, like stack frames, x86 assembly, calling conventions, type traits… The list goes on and on. I also learned some nasty little tricks that you can do in order to get at certain pieces of information I needed along the way too. I wouldn’t recommend using some of these techniques, but I had to do what I had to do.

I’ll be providing some more of the gritty details about how MockItNow works over the coming weeks, but If you can’t wait, then feel free to download it and take a look.

The site’s still a bit of a work in progress at the moment, but I set myself the goal of getting something up this weekend, so here I am. Unfortunately the great weather stole a lot of my time, but be sure that I’ll be improving things as I learn more!

1 comment