Craig Reynolds presented a paper at GDC 1999 addressing what he described as the middle level of a three-layer motion system (goal setting and strategy, steering behaviors, and locomotion). The steering behavior system decomposes the overall steering decision into smaller decisions, each focusing on a single concern, and merges the resulting directions in a final decision stage.
This system is very easy to implement, and is also very intuitive at first glance. In the paper, various behaviors are described as well as combinations of behaviors.
However, it is quite possible for the individual decisions made by each behavior to cancel each other out. For example, a seek behavior may want the entity to move due East, but an evade behavior may want the entity to move due West. In this case, it is typical to have some kind of weighting for each behavior to overcome the stalemate. But could a better decision be made?
Andrew Fray describes the steering behavior problem in this blog post and describes a potential solution called context behaviors. He goes into further detail about context behaviors in the second half of this GDC 2013 talk.
The general idea of context behaviors is for each behavior to decide how much it wants to move in each of a particular set of directions, and then to make the final decision based on looking at all possible directions. This is implemented via two context maps, one for expressing interest in each direction, and one for expressing danger. Each value in a context map is a normalized [0, 1] number expressing the degree of danger or interest.
The interest and danger maps are evaluated based on some decision-making heuristic, and one of the directions is selected as the desired steering vector.
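To make the structure concrete, here is a minimal Python sketch of context maps and a decision step. The slot count, function names, and the threshold-masking heuristic are illustrative assumptions, not taken from a particular implementation:

```python
import math

# A minimal sketch of a context-steering decision, assuming 8 fixed
# directions. Each slot holds a normalized [0, 1] value.
NUM_SLOTS = 8
DIRECTIONS = [
    (math.cos(2 * math.pi * i / NUM_SLOTS), math.sin(2 * math.pi * i / NUM_SLOTS))
    for i in range(NUM_SLOTS)
]

def seek_interest(to_goal):
    # Interest is higher the better a slot direction aligns with the goal.
    length = math.hypot(*to_goal)
    goal = (to_goal[0] / length, to_goal[1] / length)
    return [max(0.0, d[0] * goal[0] + d[1] * goal[1]) for d in DIRECTIONS]

def choose_direction(interest, danger, danger_threshold=0.5):
    # Mask out slots whose danger exceeds the threshold, then pick the
    # remaining slot with the highest interest.
    best, best_value = None, -1.0
    for i in range(NUM_SLOTS):
        if danger[i] > danger_threshold:
            continue
        if interest[i] > best_value:
            best, best_value = i, interest[i]
    return DIRECTIONS[best] if best is not None else None

interest = seek_interest((1.0, 0.0))   # goal due east
danger = [0.0] * NUM_SLOTS
danger[0] = 0.9                        # obstacle directly east
direction = choose_direction(interest, danger)
```

With the eastward slot masked out by danger, the sketch falls back to the adjacent north-east slot, which still has high interest.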
In my initial implementation, I found it quite difficult to tune the heuristic in such a way as to balance the danger and interest. For example, when moving to a patrol point very close to a wall, the danger of colliding with the wall would override the interest of the patrol point, and the entity would never arrive. A fix for this was to calculate a danger threshold (based on knowing how the danger values translate to distance) for that particular patrol point. This meant that the entity could finally arrive, but it also became increasingly tolerant of danger in general as it approached the patrol point.
I would like to briefly describe a few decisions I made while implementing context behaviors that I think might be of interest.
The first change is to include the distance of the interest or danger along each of the directions. This creates a nice closed space around the entity for both interest and danger. Armed with this information, values of high interest that are nearer than the danger in the corresponding direction can be trivially accepted. If the danger is nearer than the interest, then again we have to fall back to a heuristic that weighs the danger of going in that direction against the interest.
The decision-making heuristic can now also consider how far away the danger is, so it can perhaps choose to move in a dangerous direction for a while, safe in the knowledge that the actual danger is far away.
The output of the decision making process is another context map with the normalized decision values for each direction. The obvious thing to do is to just pick the direction with the highest decision value. This works as expected, but can produce rather stilted motion due to the limited set of directions the entity is allowed to move in.
Andrew Fray briefly mentions the idea of using linear interpolation to extract a better decision direction. I chose instead to use cubic splines to interpolate the decision values since they are continuous as long as the tangents are selected consistently. The decision context map doesn’t explicitly have tangents, so the obvious thing to do is to calculate them via central differencing. This is equivalent to using a Catmull-Rom spline. Catmull-Rom splines require four points in order to calculate a smooth value, which normally creates issues at the beginning and end of the spline, but since our domain wraps around there isn’t an issue.
One nice thing about using cubic-splines is that you can take the derivative of the cubic function and then quickly find the maximum value over the entire spline. This provides a nice continuous decision direction (well, as continuous as the decision map is). Be aware that by interpolating like this, values can go outside the normalized range!
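Here is a Python sketch of that idea: evaluate a wrapped Catmull-Rom spline over the decision map, and find its maximum by checking each segment's endpoints plus the roots of its derivative (a quadratic). The function names are mine, and this is a sketch rather than the post's actual code:

```python
import math

def catmull_rom(p0, p1, p2, p3, t):
    # Standard Catmull-Rom basis (central-difference tangents).
    return 0.5 * ((2.0 * p1)
                  + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t)

def spline_peak(values):
    # Returns (position, value) of the maximum over the wrapped spline,
    # where position is measured in slots (slot index + fractional t).
    n = len(values)
    best_pos, best_val = 0.0, -math.inf
    for i in range(n):
        p0, p1, p2, p3 = (values[(i - 1) % n], values[i],
                          values[(i + 1) % n], values[(i + 2) % n])
        # Candidate parameters: segment endpoints plus any roots of the
        # cubic's derivative that fall inside the segment.
        candidates = [0.0, 1.0]
        a = 0.5 * (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * 3.0
        b = 0.5 * (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * 2.0
        c = 0.5 * (-p0 + p2)
        if abs(a) > 1e-12:
            disc = b * b - 4.0 * a * c
            if disc >= 0.0:
                for sign in (-1.0, 1.0):
                    t = (-b + sign * math.sqrt(disc)) / (2.0 * a)
                    if 0.0 < t < 1.0:
                        candidates.append(t)
        elif abs(b) > 1e-12:
            t = -c / b
            if 0.0 < t < 1.0:
                candidates.append(t)
        for t in candidates:
            v = catmull_rom(p0, p1, p2, p3, t)
            if v > best_val:
                best_pos, best_val = i + t, v
    return best_pos, best_val

# Two equal adjacent peaks: the interpolated maximum lands between them,
# and the spline overshoots the normalized range slightly.
peak_pos, peak_val = spline_peak([0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Note how the example also demonstrates the overshoot warning above: the interpolated peak value is 1.125, outside the [0, 1] range of the input.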
I started by using eight direction vectors (I’m talking 2D here) for the context space, where the first vector pointed down the world X-axis. The problem with this is that the direction of primary interest almost certainly doesn’t align with one of the directions. I would find that danger values (based on ray casts) could appear very late and at high values, causing the entities to swerve at the last minute.
I then tried aligning the directions with the linear velocity of the entity. This worked somewhat better but it caused quite a lot of flickering as the context maps changed quite considerably based on their primary direction. Interpolating between context maps for the previous and current frames would probably help this quite a bit, but I didn’t implement it.
What I’m currently doing is aligning the context maps in the direction of the high-level goal position. There may be other potential goal points, but one is the most important and so everything is aligned to this. By doing this, the most desired direction is always aligned with one of the context map directions.
In this example, the entity is trying to move to a patrol point to the left of the large grey rectangular obstacle. There is no path to the patrol point, so the entity is relying purely on steering.
This shows the space formed by the distances in the interest context map. You can see that the context map is aligned to point from the entity towards the goal. Note that there is some interest expressed in going away from the patrol point. This is critical for decision-making purposes to allow the entity to back out of dead-ends.
This shows the interpolated values of the interest map. The circle represents an interest value of 1. In this case the interest value for each direction is simply proportional to the angle to the goal point.
Absent of any danger, the direction selected in this example would be to move directly towards the patrol point. Of course this would make the entity collide with the environment.
This next example adds danger for the static obstacles in the environment.
This is for the same situation as above, but now shows the danger distances. These have been calculated by ray-casting into the environment. This provides a good, but not perfect, approximation of the space that is safe to move in.
For static obstacles, the normalized danger values are proportional to the distance to the collision up to a specified maximum.
This shows the final decision map that was generated via the decision-making heuristic. Despite wanting to move directly to the patrol point, the high danger in that direction has forced the entity to look elsewhere. The ground collision has all but ruled out moving downwards, so the decision selected (the yellow line) is to move up and forwards.
This example looks at a situation where avoidance of other entities (all dynamic entities have the cyan circle around them) is required. The entity is still moving to a patrol point far right of the screen.
Each pair of entities is tested to see if there is a potential future collision based on their current position and velocity. The potential collision positions are marked with the yellow lines and circles. Note that the entity on the left is predicting a collision with the entity on the right, but the entity on the right has selected the closer collision with the central entity as the one to be concerned about.
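The post doesn't show the prediction code, but a standard way to test a pair of entities is a constant-velocity circle-circle sweep: solve for the earliest time the circles touch. This Python sketch (names are mine) illustrates the idea:

```python
import math

def predicted_collision_time(p1, v1, r1, p2, v2, r2):
    # Relative position and velocity: treat circle 1 as stationary.
    dpx, dpy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    radius = r1 + r2
    # |dp + dv * t| = radius  =>  a*t^2 + b*t + c = 0
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (dpx * dvx + dpy * dvy)
    c = dpx * dpx + dpy * dpy - radius * radius
    if c <= 0.0:
        return 0.0   # already overlapping
    if a == 0.0:
        return None  # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # paths never come within the combined radius
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else None

# One entity moving east at 1 unit/s towards a stationary entity 10 units
# away; circles of radius 0.5 each touch when the centers are 1 unit apart.
t = predicted_collision_time((0.0, 0.0), (1.0, 0.0), 0.5,
                             (10.0, 0.0), (0.0, 0.0), 0.5)
```

Running this for every pair and keeping each entity's nearest predicted impact gives the yellow markers described above.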
This shows the danger distances for the central entity only. The static obstacle distances are already present. The impact point has been projected onto any context map direction within 45 degrees of the impact direction. You can see that it has cut out a large space in the general direction of the predicted collision.
Like the static collisions, the dynamic collision values are initially based on the normalized predicted distance to impact. Additionally for dynamic collisions, directions which move away from the predicted impact point are scaled down. In this case, steering to the left of the potential collision is shown to be a dangerous option, but steering right is less dangerous.
The final decision shows a desire to move below the other entity as you might expect.
This briefly shows the decision map in motion. I’m assigning patrol points by mouse-clicking. Despite the lack of path-finding, the entity is capable of steering around quite a variety of obstacles.
I’ve mentioned path-finding a couple of times in this post, and I should mention that this is not a replacement for path-finding. There are absolutely situations where the entity can get stuck if the environment is the wrong shape. This kind of steering would work well when moving to path waypoints though.
As I mentioned, I’m currently generating new context maps each frame, but I think that interpolating between maps for the previous and current frame would probably be a good idea. Another option would be to just damp the context maps from the previous frame and then run the context behaviors as normal. Compared to interpolating the maps, this method can deal with sudden, imminent danger much better.
A quick aside for those not familiar with Unity: Each script has three different update functions that can be called. Update is called as you would expect, and LateUpdate is a version that’s called, well, later in the frame. Both of these should use the global (yikes) Time.deltaTime to access the variable frame time. FixedUpdate uses Time.fixedDeltaTime and runs on a fixed timestep, so it can potentially run multiple times per frame.
It seems to crop up on the Unity forums again and again and again – the correct usage of linear interpolation for damping a value. You have a value, a, and you’d like to smoothly move it towards another value b, so you decide to use linear interpolation with some arbitrary rate r. If you do something like this in your variable rate update function (Update or LateUpdate), then you have a problem:
```csharp
a = Mathf.Lerp(a, b, r);
```
This code is broken because it takes a fixed proportion of the remaining distance between a and b each frame. Since the frame rate is variable, the amount of smoothing will vary with the frame rate too.
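You can see the problem numerically. This quick Python simulation (standing in for the C# above) runs the per-frame lerp at two different frame rates for the same one second of simulated time:

```python
def lerp(a, b, t):
    return a + (b - a) * t

def simulate(frames_per_second, seconds, rate=0.1):
    # Apply the naive per-frame lerp once per frame.
    a, b = 10.0, 0.0
    for _ in range(int(frames_per_second * seconds)):
        a = lerp(a, b, rate)
    return a

at_30fps = simulate(30, 1.0)   # 30 lerp steps over one second
at_60fps = simulate(60, 1.0)   # 60 lerp steps over the same second
```

After the same simulated second, the 60 fps run has damped more than ten times further than the 30 fps run, so anything tuned at one frame rate breaks at another.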
Once you figure this out, perhaps you do some research, and you find in the Unity docs that you should really be doing this:
```csharp
a = Mathf.Lerp(a, b, r * Time.deltaTime);
```
Except, wait, this isn’t right either… The interpolation parameter can now potentially go over 1, which is not allowed. This is both wrong and even wronger. So what next?
Perhaps you decide that you should go back to the original lerp without deltaTime, but this time put it inside FixedUpdate instead.
I have some bad news for you… There is still a potential problem here. It’s a bit subtle, but things that run inside FixedUpdate are almost never at the correct state for the current frame since the update rates are different. This means that they require either extrapolation or interpolation to be displayed smoothly. Unity has an option to turn on extrapolation or interpolation for rigid bodies, so if you have this option on and you’re lerping a rigid body property then lerping will work as you would expect. However, if you’re lerping a value that isn’t extrapolated or interpolated then the smoothing is technically smooth as far as FixedUpdate is concerned, but you can still see stuttering on the screen.
One other problem that cannot be circumvented by using a fixed update is that if you need to change your update rate (for example you want to run physics at 100 Hz) and you use the plain lerp then your smoothing values will all need to be retuned.
Let’s simplify things, and look at what we’re trying to actually achieve here. For now, let’s assume that we’re always interpolating towards zero.
One way of looking at this problem would be to ask how much of the initial value should be remaining after one second. Let’s say that we have an initial value of 10, and every second we would like to lose half of the current value:

10, 5, 2.5, 1.25, 0.625, …
Let’s look at a graph of how this looks over time. We can see that it’s a nice and smooth curve going from our start value 10 down to almost zero. It will never quite reach zero, but it will get very close.
Looking at the number sequence, we can generalize it pretty easily to:

$$a(t) = 10 \cdot \left(\tfrac{1}{2}\right)^t$$
Or for an arbitrary rate $r$ in the range $(0, 1)$:

$$a(t) = a(0) \cdot r^t$$
What happens if we look more than one step ahead of the current value?

$$a(t + 1) = a(t) \cdot r$$
$$a(t + 2) = a(t) \cdot r^2$$
$$a(t + 3) = a(t) \cdot r^3$$
I hope the pattern is clear here, so we can say even more generally:

$$a(t + n) = a(t) \cdot r^n$$
This means that we can take our value at our current time t and calculate the value for an arbitrary time in the future t + n. It’s crucial to realize here that n doesn’t have to be an integer value, so it’s quite fine to use deltaTime here. This means that we can now write a frame-rate aware function that will damp to zero and use it inside our variable rate update functions:
```csharp
// Smoothing rate dictates the proportion of source remaining after one second
public static float Damp(float source, float smoothing, float dt)
{
    return source * Mathf.Pow(smoothing, dt);
}

private void Update()
{
    a = Damp(a, 0.5f, Time.deltaTime);
}

// or
private void FixedUpdate()
{
    a = Damp(a, 0.5f, Time.fixedDeltaTime);
}
```
I hear you – what if you want to go from a value a to a value b rather than to zero? The key thing to realise here is that it’s just a shift of the graph on the y-axis. If we’re now damping from 20 to 10 then it looks like this:
So we need to damp using $(a - b)$ and then add $b$ back on afterwards:

$$a(t + n) = (a(t) - b) \cdot r^n + b$$

Let’s alter our damping function to do this:
This should be looking pretty familiar… It’s in the same form as a standard Lerp but with an exponent on the rate parameter:
a(t + n) = Lerp(b, a(t), Pow(r, n))
You’ll probably notice here that the parameters are not in the order you might expect, but this is easy to fix since:
Lerp(a, b, t) = Lerp(b, a, 1 - t)
Therefore:
a(t + n) = Lerp(a(t), b, 1 - Pow(r, n))
We can write this code directly, or probably a better idea is to wrap it up into a function which will do frame-rate aware damping between two arbitrary values:
```csharp
// Smoothing rate dictates the proportion of source remaining after one second
public static float Damp(float source, float target, float smoothing, float dt)
{
    return Mathf.Lerp(source, target, 1 - Mathf.Pow(smoothing, dt));
}
```
A smoothing rate of zero will give you back the target value (i.e. no smoothing), and a rate of 1 is technically not allowed, but will just give you back the source value (i.e. infinite smoothing). Note that this is the opposite of the way a lerp parameter works, but if you so desire, you can just use the additive inverse (1 - smoothing) inside the Pow.
The keen-eyed among you may have looked at the graph and thought that it looks awfully like an exponential decay function. You would be right since it actually is an exponential decay function. To see why, let’s go back to the damping function without b in it:

$$a(t + n) = a(t) \cdot r^n$$
Now let’s compare this to the formula for exponential decay:

$$N(t) = N_0 \cdot e^{-\lambda t}$$
Let’s equate these and see what happens:

$$r^n = e^{-\lambda n}$$

Therefore:

$$\lambda = -\ln(r)$$
So an alternative way of expressing the damping function is to parameterize using lambda. This now has a range between zero and infinity, which nicely expresses the fact that you can never actually reach b when damping.
```csharp
public static float Damp(float a, float b, float lambda, float dt)
{
    return Mathf.Lerp(a, b, 1 - Mathf.Exp(-lambda * dt));
}
```
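As a quick numerical check, here are both forms ported to Python (a plain lerp standing in for Mathf.Lerp). With lambda computed as -ln(r), they produce identical results for any dt:

```python
import math

def damp_pow(a, b, smoothing, dt):
    # Smoothing rate form: proportion remaining after one second.
    return a + (b - a) * (1.0 - math.pow(smoothing, dt))

def damp_exp(a, b, lam, dt):
    # Exponential decay form.
    return a + (b - a) * (1.0 - math.exp(-lam * dt))

smoothing = 0.5
lam = -math.log(smoothing)   # lambda ≈ 0.693 for a smoothing rate of 0.5

# One 16 ms frame of damping from 20 towards 10, computed both ways.
via_pow = damp_pow(20.0, 10.0, smoothing, 0.016)
via_exp = damp_exp(20.0, 10.0, lam, 0.016)

# After a full second, half of (a - b) should remain: 10 + 0.5 * 10 = 15.
after_one_second = damp_pow(20.0, 10.0, smoothing, 1.0)
```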
If you look around at other code, you’ll see the exponential decay form used commonly, but just know that it’s just another form of the frame-rate aware Lerp (or the other way around, depending on how you look at it).
Below is a graph showing both forms of damping with lambda calculated from the smoothing rate accordingly. As you can see, they both perfectly match.
Finally, here is the same graph, but this time with a random time interval used.
I hope this clears up some of the confusion over how to use Lerp correctly when damping a value.
The trouble is, I’m still not really sure exactly what it means…
When looking at material response, I expect that most people start with the usual Lambert for diffuse plus Blinn-Phong with a Fresnel effect for specular and continue on their happy way. At some point, they may start to read about physically-based shading, and discover the idea of energy conservation (something I wrote about before).
The standard Lambert diffuse response can emit more light than it receives. The standard Blinn-Phong specular model can either lose energy or gain energy depending on the specular power and color. If you just add the diffuse and specular responses together, materials can also emit more light than they receive.
It’s fairly easy to change these functions to be energy-conserving, and there are some benefits to doing so, but is energy-conserving Lambert and Blinn-Phong (LBP) considered ‘physically based shading’? It’s based on the concept that energy can neither be created nor destroyed, right?
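To see concretely what energy conservation means for the diffuse term, here is a quick Python check: integrating the reflected energy of a constant (Lambert) BRDF over the hemisphere with a simple Riemann sum. Without the 1/π factor, an albedo-0.8 surface reflects roughly 2.5 times the energy it receives:

```python
import math

def hemisphere_reflectance(brdf_value):
    # Integrate f * cos(theta) over the hemisphere:
    # integral of f * cos(theta) * sin(theta) dtheta dphi.
    total, steps = 0.0, 400
    for i in range(steps):
        theta = (i + 0.5) * (math.pi / 2) / steps
        total += brdf_value * math.cos(theta) * math.sin(theta) * (math.pi / 2 / steps)
    return total * 2.0 * math.pi   # the phi integral is just 2*pi by symmetry

albedo = 0.8
naive = hemisphere_reflectance(albedo)                  # ≈ albedo * pi: emits extra energy
conserving = hemisphere_reflectance(albedo / math.pi)   # ≈ albedo: conserves energy
```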
I think what most people are referring to when they’re talking about physically based shading, is the model underlying the BRDF. For example, the Torrance-Sparrow microfacet BRDF is modeled on the idea of a surface being comprised of many tiny ideal Fresnel mirrors. The Phong BRDF is vastly simplified, but still grounded in a model of how light is reflected off a mirror.
Is more physically based, with fewer simplifications, better? Have we lost any need for magic numbers and hacks?
To even think about answering this question, we have to understand what are we trying to do when we write a BRDF. In general, we’re trying to approximate the physical response of a real-world material using a combination of functions. That physical response is the ratio of radiance to irradiance based on the incoming light direction and the outgoing light direction. Ideally our BRDF will be flexible enough to handle a range of different material types within the same model.
So if we’re approximating real-world data with our BRDF, can’t we just compare it to a real material? That’s a tricky prospect unfortunately. We can only compare our model to what we actually see, and this is the result of not only the BRDF, but the lighting environment as well. The lighting environment consists of many factors such as the number and geometry of the light emitters, the power of the lights, reflections, refractions, occlusion, and volumetric scattering. It sounds impossible, doesn’t it?
There is some good news though. The boffins at the Mitsubishi Electric Research Laboratories (MERL) have laser-scanned a number of materials, and have made them freely available for research and academic use. Also, Disney Animation created a tool to visualize these scanned materials and to compare them to any BRDF written in GLSL.
I thought it would be interesting to compare energy-conserving LBP to the Disney Principled BRDF. The Disney BRDF is energy-conserving and is based on the Torrance-Sparrow microfacet specular model and the Lambert diffuse model with some tweaks (for example, to handle diffuse retro-reflection). While it is more physically-based than straight LBP, it still contains an empirical model for the diffuse part.
To make these test images, I loaded up a MERL material in the BRDF explorer, and then used the graph views to match the parameters of each of the BRDFs as closely as possible for the peak specular direction.
The most interesting view in the BRDF explorer shows an image representing a slice of the BRDF (not the overall lighting response) as a two dimensional function. This function is parameterized in a different space than you might be used to, with the half-angle (the angle between the normal and the half-vector) vs difference-angle (the angle between the half vector and the incoming or outgoing light direction).
Where did the other two dimensions go? They’re still there… The slice just represents the theta angles, and you have to scroll through the slices for different phi values. The nice thing about this representation is that in general it’s enough to look at just one slice to get a really good idea of how well a BRDF fits the data.
For each of the following images, the Lambert-Blinn-Phong BRDF is on the left, the scanned material is in the middle, and the Disney BRDF is on the right. I’ve included the BRDF view as well as a lit sphere view.
This first material is shiny red plastic. The left side of the BRDF view clearly shows the tight specular peak. In the top left, you can see the strong Fresnel effect as the viewing direction gets to grazing angles. The darkening effect in the extremes I believe is due to the Fresnel effect, since light coming in from other angles is being mirrored away.
The LBP BRDF captures the Fresnel effect to a small amount, but cannot capture the darkening on the extremes. The Disney BRDF clearly does a better job at capturing these features of the scanned material, but still cannot quite match the reference.
In this dull red plastic, you can see the effects of the retro-reflection in both the sphere and BRDF view of the MERL material. The Disney BRDF captures this to a certain extent, but the LBP BRDF does not. Note that the Disney BRDF also did a better job at capturing the shape of the specular highlight, especially at grazing angles. The Blinn-Phong response is a compromise between the width of the specular lobe at grazing angles, and the intensity when more face on.
This is a brass material. It’s pretty clear here how inadequate Blinn-Phong is to capture the long tail of the brass specular response. The Disney BRDF fares a little better, but it’s still not close to the scanned material. It’s possible to alter the Disney BRDF slightly to allow for longer tails, but this then makes matching non-metal materials more difficult.
This steel material appears to have some artifacts from the laser scanning process in the MERL view. Again, it’s difficult for both BRDFs to capture the specular response, but the Disney one does a little better than LBP.
Clearly the Disney BRDF does a much better job at capturing these materials than Lambert plus Blinn-Phong. Of course, it’s more expensive to calculate too. It still contains what some would consider a ‘hack’ to handle diffuse retro-reflection, and was specifically engineered to match the MERL BRDFs. Does this make it bad? Not really. At the end of the day, we have to use the best approximation we can that will work within our budgets.
The primary benefit of the movement towards physically-based models for me is really in achieving more consistency via increasing constraints. An artist would probably tell you that it’s about achieving a closer match to the real-world materials. Both are really nice to have.
So what do you think of when someone says they’re using a physically based shader?
My use of FXAA and anisotropic filtering was just making the problem more evident. I would recommend using regular trilinear filtering for derivative maps anyway.
So, mea culpa and all that. Let the name of Morten Mikkelsen and derivative maps be cleared!
I’ll be using the precomputed derivative maps for comparison since the ddx/ddy technique just isn’t acceptable in terms of quality.
Here’s the close up shot of the sphere with the moon texture again. This shows the derivative map implementation, and if you mouse over, you’ll see the normal map version.
There are some slight differences because the height of the derivative map doesn’t quite match the heights used to precompute the normal map, but overall I would say that they are remarkably similar. It looks to me that the normal map is preserving more of the detail though.
Here’s a high-contrast checkerboard, again with the normal map shown if you mouse over.
I’m no artist, but I would say that the derivative map results are close enough to the normal maps to call the technique viable from a quality standpoint.
EDIT: I had some issues with artifacts which I posted here. It turns out they were (embarrassingly) caused by my mipmap generation which was introducing a row of garbage at each level. Combined with FXAA and anisotropic filtering, this caused the weird vertical stripes I posted before.
I’ve removed the images since I don’t want to give the wrong impression of the quality of derivative maps.
I ran these tests on my Macbook Pro which has an AMD 6750M. The shader in question is a simple shader for filling out the gbuffer render targets. All shaders were compiled using shader model 5. I took the frame times from the Fraps frame counter and the other numbers came from Gpu Perf Studio.
For comparison, I’ve included an implementation with no normal perturbation at all.
| Perturbation | Frame | Pixels | Tex Inst | Tex Busy | ALU Inst | ALU Busy | ALU/Tex |
|---|---|---|---|---|---|---|---|
| None | 1.08 ms | 262144 | 3 | 27.5 % | 14 | 32.1 % | 4.667 |
| Normal map | 1.37 ms | 262144 | 4 | 36.5 % | 23 | 52.4 % | 5.75 |
| Derivative map | 1.36 ms | 262144 | 9 | 82.0 % | 28 | 63.8 % | 3.11 |
Despite the extra shader instructions, the derivative map method is basically as fast as normal maps on my hardware. As Mikkelsen predicted, it seems like having one fewer vertex attribute interpolator offsets the cost of the extra ALU instructions.
Note that the derivative map shader has nine texture instructions compared to just four for the normal maps. The extra five instructions are the two sets of ddx/ddy instructions, and the instruction to get the texture dimensions. Since the pixel shader can issue one texture instruction and one ALU instruction on the same cycle, these are essentially free.
The only performance overhead which has any impact for derivative maps comes from the five extra ALU instructions.
As I mentioned in my previous post, derivative maps also have the tremendous benefit of not requiring tangent vectors. In my case, with a simple vertex containing position, normal, tangent and one set of texcoords, the tangent takes up 27% of the mesh space.
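That 27% figure is easy to check. This back-of-the-envelope calculation assumes 32-bit floats and a float3 tangent (a float4 tangent carrying a handedness sign would take an even larger share):

```python
# Bytes per attribute, assuming 32-bit floats.
position = 3 * 4   # float3
normal   = 3 * 4   # float3
tangent  = 3 * 4   # float3
texcoord = 2 * 4   # float2

vertex_size = position + normal + tangent + texcoord   # 44 bytes
tangent_share = tangent / vertex_size                  # ≈ 0.27
```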
Given that most games these days have tens of megabytes of mesh data, this would turn into some pretty decent memory savings. There’s also a minor benefit on the tool-side to not having to spend time generating face tangents and merging them into the vertices.
Well, for me it’s pretty clear. On my setup, derivative maps have a similar quality with the same performance but less memory. This makes them a win in my book. Of course, these numbers will vary wildly based on the API and hardware, so this can’t be taken as a blanket ‘derivative maps are better than normal maps’ statement, but they look promising. Good job Morten Mikkelsen!
I would love to see a similar comparison for the current generation of console hardware (hint, hint!).
If you have DirectX 11, then you should be able to run the demo here.
I was reading through the AMD CubeMapGen source last week and came across the code for calculating the solid angle of a cube map texel. This code piqued my interest, since it seemed very terse for what I thought would be a horrific calculation.
```cpp
static float32 AreaElement(float32 x, float32 y)
{
    return atan2(x * y, sqrt(x * x + y * y + 1));
}

float32 TexelCoordSolidAngle(int32 a_FaceIdx, float32 a_U, float32 a_V, int32 a_Size)
{
    // Scale up to [-1, 1] range (inclusive), offset by 0.5 to point to texel center.
    float32 U = (2.0f * ((float32)a_U + 0.5f) / (float32)a_Size) - 1.0f;
    float32 V = (2.0f * ((float32)a_V + 0.5f) / (float32)a_Size) - 1.0f;

    float32 InvResolution = 1.0f / a_Size;

    // U and V are the -1..1 texture coordinate on the current face.
    // Get projected area for this texel.
    float32 x0 = U - InvResolution;
    float32 y0 = V - InvResolution;
    float32 x1 = U + InvResolution;
    float32 y1 = V + InvResolution;
    float32 SolidAngle = AreaElement(x0, y0) - AreaElement(x0, y1)
                       - AreaElement(x1, y0) + AreaElement(x1, y1);

    return SolidAngle;
}
```
The source code for this particular part is well documented, and points you towards this thesis by Manne Öhrström (@manneohrstrom) where he gives a high level overview of the derivation. I was interested in finding out some more of the details, so I had a go myself, and this post is the result.
When processing cube maps (for example, generating a diffuse irradiance map, or spherical harmonic approximation), you need to be able to integrate the texel values over a sphere.
One way of approximating this integral is to use a Monte Carlo estimator. This is a statistical technique that may oversample some texels and undersample others. This seems a bit wasteful considering that we have a finite number of input values. Ideally we’d like to use each texel value just once.
A naive approach to analytical integration where each texel has the same weight would result in overly bright values in the corner areas. This is because the texels in the corners project to smaller and smaller areas on the sphere. The correct approach is to factor in the solid angle during the integral, and this is what CubeMapGen does.
Imagine a single cube map face placed at (0,0,1) and scaled such that the texel locations are all in [-1,1]. For any texel in this cube map, we want to project it onto a unit sphere sitting at the origin, then work out the area on the sphere. This area corresponds to the solid angle because the sphere is a unit sphere.
We can repeat this same calculation for any of the other cube map faces by first transforming them into the same range.
This is the high-level game plan for calculating the solid angle:

1. Project the texel from the cube map face onto the unit sphere.
2. Calculate the partial derivatives of the projected position with respect to the texture-space coordinates.
3. Use the cross product of the partial derivatives to find the differential area on the sphere.
4. Integrate the differential area over the texel to get the solid angle.
We start off with the formula for projecting a point from its location on the texture face $(x, y, 1)$ onto the unit sphere. This is just a standard vector normalization:

$$p = \frac{(x, y, 1)}{\sqrt{x^2 + y^2 + 1}}$$
Note: I’ll be switching back and forth between negative and fractional exponents as I see fit. This makes things easier. Remember, $x^{-n} = \frac{1}{x^n}$, and $x^{\frac{1}{2}} = \sqrt{x}$.
We want to calculate how this projected point changes as the texture-space x and y coordinates change. We can do this separately for each axis using partial derivatives. First we’ll start by calculating how the projected z component changes along the texture-space x axis.
The z-component of $p$ is simply:

$$p_z = \frac{1}{\sqrt{x^2 + y^2 + 1}} = (x^2 + y^2 + 1)^{-\frac{1}{2}}$$
We need to differentiate this equation with respect to x. Because of the exponent, we need to use the chain rule to do this. The chain rule is a method for finding the derivative of the composition of two functions. First, we can reformulate the equation a little bit to make the two functions a bit clearer:

$$p_z = f(g(x, y)) \quad \text{where} \quad f(u) = u^{-\frac{1}{2}}, \quad g(x, y) = x^2 + y^2 + 1$$
In our case our first function $f$ is a function of $u$, and our second function $g$ is a function of $x$ and $y$. Given this, the chain rule says:

$$\frac{\partial p_z}{\partial x} = f'(g(x, y)) \cdot \frac{\partial g}{\partial x}$$
We can apply this rule very easily to our reformulated functions:

$$\frac{\partial p_z}{\partial x} = -\tfrac{1}{2}(x^2 + y^2 + 1)^{-\frac{3}{2}} \cdot 2x = \frac{-x}{(x^2 + y^2 + 1)^{\frac{3}{2}}}$$
This equation tells us exactly how the z component of the projected point changes as the texture-space position moves along the x axis.
Now we’ve found the projected z component derivative, it’s going to make finding the x and y components a little easier. Why? Because we can express the x and y components in terms of the z component:

$$p_x = x \cdot p_z \qquad p_y = y \cdot p_z$$
We don’t have the same ‘composition of functions’ setup that we did last time, so we can’t use the chain rule to differentiate this. Instead, we can use the product rule. The product rule in our case says:

$$\frac{\partial}{\partial x}(x \cdot p_z) = \frac{\partial x}{\partial x} \cdot p_z + x \cdot \frac{\partial p_z}{\partial x} = p_z + x \cdot \frac{\partial p_z}{\partial x}$$
Applying this to the equation for the projected x component:

$$\frac{\partial p_x}{\partial x} = p_z + x \cdot \frac{\partial p_z}{\partial x} = \frac{1}{(x^2 + y^2 + 1)^{\frac{1}{2}}} - \frac{x^2}{(x^2 + y^2 + 1)^{\frac{3}{2}}} = \frac{y^2 + 1}{(x^2 + y^2 + 1)^{\frac{3}{2}}}$$
We have a very similar derivation for the projected y derivative:
We can use the product rule again:

$$\frac{\partial p_y}{\partial x} = \frac{\partial}{\partial x}(y \cdot p_z) = y \cdot \frac{\partial p_z}{\partial x} = \frac{-xy}{(x^2 + y^2 + 1)^{\frac{3}{2}}}$$
Putting this all together, we have our equation showing how the projected position changes as the texture-space position changes in the x direction:

$$\frac{\partial p}{\partial x} = \frac{(y^2 + 1,\ -xy,\ -x)}{(x^2 + y^2 + 1)^{\frac{3}{2}}}$$

We can use the exact same process to work out how it moves in the y direction:

$$\frac{\partial p}{\partial y} = \frac{(-xy,\ x^2 + 1,\ -y)}{(x^2 + y^2 + 1)^{\frac{3}{2}}}$$
The next step is to calculate the differential (microscopic) area of the projected point using the partial derivatives we just calculated. Clearly at a normal scale, we wouldn’t be able to take the cross product of two projected vectors on a sphere and expect the magnitude to be the area they define on the sphere. But at this differential scale, we can treat the surface as if it is flat, so this works.
The first thing we need to do is to calculate the cross product of the partial derivatives.
Calculating the cross product for each of the components is relatively straightforward:

$$\frac{\partial \vec{p}}{\partial x} \times \frac{\partial \vec{p}}{\partial y} = \begin{pmatrix} x(x^2 + y^2 + 1) \\ y(x^2 + y^2 + 1) \\ x^2 + y^2 + 1 \end{pmatrix} (x^2 + y^2 + 1)^{-3}$$
If you’re reading carefully, you’ll notice that each component has a factor of $(x^2 + y^2 + 1)$ on the top and the bottom, so we can divide through. Combining all the components back together again, we arrive at the final equation for the perpendicular vector:

$$\frac{\partial \vec{p}}{\partial x} \times \frac{\partial \vec{p}}{\partial y} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} (x^2 + y^2 + 1)^{-2}$$
Now we simply need to take the length of the result of the cross product to find the differential area on the sphere:

$$dA = \left\| \frac{\partial \vec{p}}{\partial x} \times \frac{\partial \vec{p}}{\partial y} \right\| = \sqrt{x^2 + y^2 + 1} \, (x^2 + y^2 + 1)^{-2} = (x^2 + y^2 + 1)^{-\frac{3}{2}}$$
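If you want to double-check the algebra, here’s a small Python sketch (names mine) confirming that the cross product of the two derived partial-derivative vectors has length $(x^2 + y^2 + 1)^{-\frac{3}{2}}$:

```python
import math

def dp_dx(x, y):
    # derived above: (y^2 + 1, -xy, -x) * (x^2 + y^2 + 1)^(-3/2)
    s = (x * x + y * y + 1.0) ** -1.5
    return ((y * y + 1.0) * s, -x * y * s, -x * s)

def dp_dy(x, y):
    # derived above: (-xy, x^2 + 1, -y) * (x^2 + y^2 + 1)^(-3/2)
    s = (x * x + y * y + 1.0) ** -1.5
    return (-x * y * s, (x * x + 1.0) * s, -y * s)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

x, y = 0.25, 0.6
c = cross(dp_dx(x, y), dp_dy(x, y))
differential_area = math.sqrt(c[0] ** 2 + c[1] ** 2 + c[2] ** 2)
expected = (x * x + y * y + 1.0) ** -1.5
```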
The final step is to integrate the differential area over our range of texture-space values to get the solid angle of the texel. We can start by calculating the integral between $(0, 0)$ and some point $(x, y)$ on the cube map face:

$$\int_0^y \!\! \int_0^x (s^2 + t^2 + 1)^{-\frac{3}{2}} \, ds \, dt = \arctan\!\left( \frac{xy}{\sqrt{x^2 + y^2 + 1}} \right)$$
From this formula, we can calculate the area of any texel in the cube map face by adding together the two right-diagonal corners, A and C, and subtracting the left-diagonal corners, B and D.
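In Python, the whole scheme might look like this (a sketch mirroring the CubeMapGen approach; function names are mine). Summing every texel on a face should recover one sixth of the sphere:

```python
import math

def area_element(x, y):
    # Integral of the differential area from (0, 0) to (x, y)
    return math.atan2(x * y, math.sqrt(x * x + y * y + 1.0))

def texel_solid_angle(x0, y0, x1, y1):
    # Add the two right-diagonal corners, subtract the left-diagonal corners
    return (area_element(x1, y1) - area_element(x0, y1)
            - area_element(x1, y0) + area_element(x0, y0))

# Summing every texel on one face should give one sixth of the sphere
n = 64
step = 2.0 / n
total = 0.0
for i in range(n):
    for j in range(n):
        x0, y0 = -1.0 + i * step, -1.0 + j * step
        total += texel_solid_angle(x0, y0, x0 + step, y0 + step)

expected_face = 2.0 * math.pi / 3.0
```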
You can see on the image below that the added areas in green are canceled out by the subtracted areas in red.
That should look familiar, since that’s exactly what the CubeMapGen code does. If you look at the surrounding source code to TexelCoordSolidAngle, then you’ll notice that there’s another method mentioned for calculating the solid angle of a texel. This method is based on Girard’s theorem, which describes how to calculate the area of a spherical triangle based on the excess of the sum of its interior angles. This method was also suggested to me on Twitter by Ignacio Castaño (@castano). I haven’t actually tried it, but it looks fascinating!
It’s always a bit daunting to get to the end of a derivation like this, and not know if the answer is correct or not. In this case, it’s pretty easy to verify if this result is correct.
Remember that the texture-space coordinates are in the range [-1, 1]. If we set $x$ and $y$ to 1, that corresponds to the top-right quarter of the cube map face. We know that there are $4\pi$ steradians in a sphere, so that means that each face gets $\frac{2\pi}{3}$ steradians. Since we’re only calculating for a quarter of a face we expect our result to be $\frac{\pi}{6}$:

$$\arctan\!\left( \frac{1 \cdot 1}{\sqrt{1 + 1 + 1}} \right) = \arctan\!\left( \frac{1}{\sqrt{3}} \right) = \frac{\pi}{6}$$
And it does. Thanks to the various people on twitter (@SebLagarde, @mattpharr, @manneohrstrom, @castano and @ChristerEricson) for engaging me in conversation over this.
Please let me know in the comments if you spot an error in this post, or if anything needs to be explained more clearly.
Mikkelsen is apparently well-versed in academic obfuscation (tsk!), so the paper itself can be a little hard to read. If you’re interested in reading it, then I would recommend first reading Jim Blinn’s original bump mapping paper to understand some of the derivations.
Nothing really. But if something comes along that can improve quality, performance or memory consumption then it’s worth taking a look.
Given a scalar height field (i.e. a two-dimensional array of scalar values), the gradient of that field is a 2D vector field where each vector points in the direction of greatest change. The length of the vectors corresponds to the rate of change.
The contour map below represents the scalar field generated from the function . The vector field shows the gradient of that scalar field. Note how each vector points towards the center, and how the vectors in the center are smaller due to the lower rate of change.
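To make the gradient concrete, here’s a small NumPy sketch using a hypothetical radial field (the post’s original function isn’t reproduced here): the gradient vectors point toward the central peak and shrink near the flat centre.

```python
import numpy as np

# Hypothetical scalar field, chosen only to illustrate the gradient
xs = np.linspace(-2.0, 2.0, 41)
ys = np.linspace(-2.0, 2.0, 41)
X, Y = np.meshgrid(xs, ys)
F = -(X ** 2 + Y ** 2)

# np.gradient differentiates along each array axis: rows (y) first, then columns (x)
dFdy, dFdx = np.gradient(F, ys, xs)
magnitude = np.hypot(dFdx, dFdy)
```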
The main premise of the paper is that we can project the gradient of the height field onto an underlying surface and use it to skew the surface normal to approximate the normal of the height-map surface. We can do all of this without requiring tangent vectors.
As with the original bump-mapping technique, it’s not exact, since some terms with relatively small influence are dropped, but it’s close.
There are really only two important formulae to consider from the paper. The first shows how to perturb the surface normal using the surface gradient. Don’t confuse the surface gradient with the gradient of the height field mentioned above! As you’ll see shortly, they’re different.
$$\vec{n}' = \vec{n} - \nabla_s h$$

Here, $\vec{n}'$ represents the perturbed normal, $\vec{n}$ is the underlying surface normal, and $\nabla_s h$ is the surface gradient. So basically, this says that the perturbed normal is the surface normal offset in the negative surface gradient direction.
So how do we calculate the surface gradient from the height field gradient? Well, there’s some fun math in there which I don’t want to repeat, but if you’re interested, I would recommend reading Blinn’s paper first, then Mikkelsen’s paper. You eventually arrive at:

$$\nabla_s h = \frac{\left( \frac{\partial \vec{p}}{\partial t} \times \vec{n} \right) \frac{\partial h}{\partial s} + \left( \vec{n} \times \frac{\partial \vec{p}}{\partial s} \right) \frac{\partial h}{\partial t}}{\frac{\partial \vec{p}}{\partial s} \cdot \left( \frac{\partial \vec{p}}{\partial t} \times \vec{n} \right)}$$
In addition to the symbols defined previously, $\frac{\partial \vec{p}}{\partial s}$ and $\frac{\partial \vec{p}}{\partial t}$ are the partial derivatives of the surface position, and $\frac{\partial h}{\partial s}$ and $\frac{\partial h}{\partial t}$ are the partial derivatives of the height field. The derivative directions $s$ and $t$ are not explicitly defined here.
It’s easiest to think of this as the projection of the 2D gradient onto a 3D surface along the normal. Intuitively, this says that the surface gradient direction is pushed out on vectors orthogonal to the $s$/$\vec{n}$ and $t$/$\vec{n}$ planes by however much the gradient specifies. The denominator term is there to scale up the result when $\frac{\partial \vec{p}}{\partial s}$ and $\frac{\partial \vec{p}}{\partial t}$ are not orthogonal, or are flipped.
Implementing this technique is fairly straightforward once you realise the meaning of some of the variables. Since we’re free to choose the partial derivative directions $s$ and $t$, it’s convenient for the shader to use screen-space x and y. The value $\vec{p}$ is the position, and the value $h$ is the height field sample.
```hlsl
// Project the surface gradient (dhdx, dhdy) onto the surface (n, dpdx, dpdy)
float3 CalculateSurfaceGradient(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
{
    float3 r1 = cross(dpdy, n);
    float3 r2 = cross(n, dpdx);

    return (r1 * dhdx + r2 * dhdy) / dot(dpdx, r1);
}

// Move the normal away from the surface normal in the opposite surface gradient direction
float3 PerturbNormal(float3 n, float3 dpdx, float3 dpdy, float dhdx, float dhdy)
{
    return normalize(n - CalculateSurfaceGradient(n, dpdx, dpdy, dhdx, dhdy));
}
```
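To convince myself the projection behaves, here’s a NumPy port of those two functions (names mine, for sanity-checking only). For a flat surface in the xy plane, the perturbed normal should be proportional to (-dhdx, -dhdy, 1):

```python
import numpy as np

def calculate_surface_gradient(n, dpdx, dpdy, dhdx, dhdy):
    # Python port of the HLSL CalculateSurfaceGradient above
    r1 = np.cross(dpdy, n)
    r2 = np.cross(n, dpdx)
    return (r1 * dhdx + r2 * dhdy) / np.dot(dpdx, r1)

def perturb_normal(n, dpdx, dpdy, dhdx, dhdy):
    p = n - calculate_surface_gradient(n, dpdx, dpdy, dhdx, dhdy)
    return p / np.linalg.norm(p)

# Flat surface in the xy plane: the perturbed normal should tilt
# against the height gradient, i.e. be proportional to (-dhdx, -dhdy, 1)
n = np.array([0.0, 0.0, 1.0])
dpdx = np.array([1.0, 0.0, 0.0])
dpdy = np.array([0.0, 1.0, 0.0])
result = perturb_normal(n, dpdx, dpdy, 0.5, -0.25)

expected = np.array([-0.5, 0.25, 1.0])
expected /= np.linalg.norm(expected)
```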
So far, so good. Next we need to work out how to calculate the partial derivatives. The reason why we chose screen-space x and y to be our partial derivative directions is so that we can use the ddx and ddy shader instructions to generate the partial derivatives of both the position and the height.
Given a position and normal in the same coordinate-space, and a height map sample, calculating the final normal is straightforward:
```hlsl
// Calculate the surface normal using screen-space partial derivatives of the height field
float3 CalculateSurfaceNormal(float3 position, float3 normal, float height)
{
    float3 dpdx = ddx(position);
    float3 dpdy = ddy(position);

    float dhdx = ddx(height);
    float dhdy = ddy(height);

    return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
```
Note that in shader model 5.0, you can use ddx_fine/ddy_fine instead of ddx/ddy to get high-precision partial derivatives.
So how does this look? At a medium distance, I would say that it looks pretty good:
But what about up close?
Uh oh! What’s happening here? Well, there are a couple of problems…
The main problem is that the height texture is using bilinear filtering, so the gradient between any two texels is constant. This causes large blocks to become very obvious when up close. There are a couple of options for alleviating this somewhat.
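You can see why in one dimension: with (bi)linear filtering, the derivative of the reconstructed height between any two texels is the same everywhere inside that span, so the gradient only changes at texel boundaries. A tiny Python illustration:

```python
def lerp(h0, h1, t):
    # 1-D slice of bilinear filtering between two neighbouring texels
    return h0 * (1.0 - t) + h1 * t

# The finite-difference slope of the filtered height is constant across
# the whole texel span (here, (5 - 2) / 1 = 3 everywhere)
d1 = (lerp(2.0, 5.0, 0.21) - lerp(2.0, 5.0, 0.20)) / 0.01
d2 = (lerp(2.0, 5.0, 0.81) - lerp(2.0, 5.0, 0.80)) / 0.01
```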
One option is to use bicubic filtering. I haven’t tried it, but I would expect it to make a noticeable difference. The problem is that it will incur an extra cost. Another option, suggested in the paper, is to add a detail bump texture on top. This helps quite a lot, but again it adds more cost.
In the image below I’ve just tiled the same texture at 10x frequency over the top. It would be better to apply some kind of noise function as in the original paper.
The second problem is more subtle. We’re getting some small block artifacts because of the way that the ddx and ddy shader instructions work. They take pairs of pixels in a pixel quad and subtract the relevant values to get the derivative. In the case of the height derivatives, we can alleviate this by performing the differencing ourselves with extra texture samples.
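A CPU-side sketch of that idea in Python (names mine): take the height samples of the neighbouring texels and difference them ourselves, rather than relying on quad-based ddx/ddy:

```python
import numpy as np

def height_derivatives(height, i, j):
    # Central differences from neighbouring texels (wrapping at the edges),
    # in place of the quad-based ddx/ddy instructions
    h, w = height.shape
    dhdu = (height[i, (j + 1) % w] - height[i, (j - 1) % w]) * 0.5
    dhdv = (height[(i + 1) % h, j] - height[(i - 1) % h, j]) * 0.5
    return dhdu, dhdv

# A ramp that rises by one unit per texel along u
ramp = np.tile(np.arange(8.0), (8, 1))
dhdu, dhdv = height_derivatives(ramp, 3, 3)
```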
The first problem is pretty much a killer for me. I would rather not have to cover up a fundamental implementation issue with extra fudges and more cost.
It’s unfortunate that this didn’t make it into the original paper, but Mikkelsen mentions in a blog post that you can increase the quality by using precomputed height derivatives. This method requires double the texture storage (or half the resolution) of the ddx/ddy method, but produces much better results.
You’re probably wondering how you can possibly precompute screen-space derivatives. We don’t actually have to. Instead we can use the chain rule to transform a partial derivative from one space to another. In our case we can transform our derivatives from uv-space to screen-space if we have the partial derivatives of the uvs in screen-space.
To calculate dhdx you need dhdu, dhdv, dudx and dvdx:

$$\frac{\partial h}{\partial x} = \frac{\partial h}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial h}{\partial v} \frac{\partial v}{\partial x}$$
To calculate dhdy you need dhdu, dhdv, dudy and dvdy:

$$\frac{\partial h}{\partial y} = \frac{\partial h}{\partial u} \frac{\partial u}{\partial y} + \frac{\partial h}{\partial v} \frac{\partial v}{\partial y}$$
The hlsl for this is very simple:
```hlsl
float ApplyChainRule(float dhdu, float dhdv, float dud_, float dvd_)
{
    return dhdu * dud_ + dhdv * dvd_;
}
```
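We can verify the transform numerically with made-up analytic mappings (Python; the mappings here are illustrative, not from the post):

```python
import math

def apply_chain_rule(dhdu, dhdv, dud_, dvd_):
    return dhdu * dud_ + dhdv * dvd_

# Hypothetical mappings: u(x, y) = 2x + y, v(x, y) = x - 3y,
# and a height field h(u, v) = sin(u) * v
x, y = 0.4, 1.1
u, v = 2 * x + y, x - 3 * y
dhdu = math.cos(u) * v
dhdv = math.sin(u)
dudx, dvdx = 2.0, 1.0
dhdx_chain = apply_chain_rule(dhdu, dhdv, dudx, dvdx)

# Direct finite difference of h(u(x, y), v(x, y)) with respect to x
def h_of_xy(x, y):
    return math.sin(2 * x + y) * (x - 3 * y)

eps = 1e-6
dhdx_direct = (h_of_xy(x + eps, y) - h_of_xy(x - eps, y)) / (2 * eps)
```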
Assuming that we have a texture that stores the texel-space height derivatives, we can scale this up in the shader to uv-space by simply multiplying by the texture dimensions. We can then use the screen space uv derivatives and the chain rule to transform from dhdu/dhdv to dhdx/dhdy.
```hlsl
// Calculate the surface normal using the uv-space gradient (dhdu, dhdv)
float3 CalculateSurfaceNormal(float3 position, float3 normal, float2 gradient, float2 uv)
{
    float3 dpdx = ddx(position);
    float3 dpdy = ddy(position);

    float dhdx = ApplyChainRule(gradient.x, gradient.y, ddx(uv.x), ddx(uv.y));
    float dhdy = ApplyChainRule(gradient.x, gradient.y, ddy(uv.x), ddy(uv.y));

    return PerturbNormal(normal, dpdx, dpdy, dhdx, dhdy);
}
```
So how does this look? Well, it’s pretty much the same at medium distance.
But it’s way better up close, since we’re now interpolating the derivatives.
In order to really draw any conclusions about this technique, I’m going to need to compare the quality, performance and memory consumption to that of normal mapping. That’s a whole other blog post waiting to happen…
But in theory, the pros are:
And the cons are:
I was impressed when I ran the demo with how smooth his UI looked. It turns out that he’s using a little trick (which I’d never seen before, but I’m sure is old news to many) to smooth off the edges of his UI elements.
Basically, the trick is to create a ring of extra vertices by extruding the edges of the polygon out by a certain amount. These extra vertices take the same color as the originals, but their alpha is set to zero. Mikko calls this ‘feathering’.
In my case, I found that I got good results by feathering just one pixel. Here’s a quick before/after comparison of my IMGUI check box at 800% zoom:
And here’s a 1-to-1 example showing rounded button corners:
It’s a pretty nice improvement for a very simple technique! If you’re interested in what the code looks like, then either take a look at Mikko’s IMGUI implementation, or you can find the code I use to feather my convex polygons below.
My implementation is a little less efficient since I recalculate each edge normal twice, but I chose to keep it simple for readability.
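For reference, here’s a rough Python sketch of that kind of feathering (my own reconstruction, not Mikko’s code): each vertex of a convex, counter-clockwise polygon is pushed out along its averaged edge normal to form the zero-alpha ring. Like my version, it recalculates each edge normal twice for readability.

```python
import math

def feather_polygon(points, feather=1.0):
    # Build an outer ring of vertices by pushing each vertex out along the
    # average of its two adjacent (outward) edge normals. In a renderer the
    # ring vertices would get alpha 0 while the originals keep alpha 1.
    # Assumes a convex polygon wound counter-clockwise.
    n = len(points)
    ring = []
    for i in range(n):
        px, py = points[(i - 1) % n]
        cx, cy = points[i]
        qx, qy = points[(i + 1) % n]

        # Outward normals of the two edges meeting at this vertex
        n0 = (cy - py, -(cx - px))
        n1 = (qy - cy, -(qx - cx))

        ax, ay = n0[0] + n1[0], n0[1] + n1[1]
        length = math.hypot(ax, ay) or 1.0
        ring.append((cx + ax / length * feather, cy + ay / length * feather))
    return ring

# Feather a unit square by one pixel
ring = feather_polygon([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
```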
I’ve updated the Google Code repository for DoItNow with a newer version. I’ve removed all source control features from DoItNow and separated them into their own add-in. This should make it more compatible with other add-ins you may be using to handle source control. I’ve uploaded the Mercurial version of the add-in, but the full source is available should you want to change it back to Perforce.
I’m only using Visual Studio 2010 at home now, so the project files are all in that format at the moment. I’ve provided Addin files which will work for Visual Studio 2008 as well, though.
By request from someone at work, the open in solution dialog now performs matches on multiple (space-separated) search terms.
I’ve been testing out an idea for a replacement for the standard Visual Studio find-in-files. This is the first pass at it (let’s call it an alpha), so download at your own risk! It actually sits side-by-side with the existing find-in-files, so it’s pretty safe to install.
Here’s where it’s (possibly) better:
Here’s where it’s (definitely) worse right now:
FindItNow ranks search results based on the number of hits on each line, as well as hits in the surrounding few lines. In order for a result to even show up, it must have all search terms present in a seven-line block.
The top matches (100% quality) have all search terms on the line in question. Worse quality matches have progressively fewer matches on the line.
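My best guess at that ranking, sketched in Python (the real FindItNow logic may well differ; names are mine):

```python
def rank_lines(lines, terms):
    # Guess at a FindItNow-style ranking: a line qualifies only if every
    # search term appears within a seven-line window around it; its quality
    # is the fraction of terms present on the line itself.
    terms = [t.lower() for t in terms]
    results = []
    for i, line in enumerate(lines):
        window = " ".join(lines[max(0, i - 3):i + 4]).lower()
        if all(t in window for t in terms):
            on_line = sum(1 for t in terms if t in line.lower())
            if on_line:
                results.append((i, 100 * on_line // len(terms)))
    return sorted(results, key=lambda r: -r[1])

lines = [
    "inline Quaternion Conjugate(const Quaternion& quat)",
    "return -quat;",
]
ranked = rank_lines(lines, ["quat", "conjugate"])
```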
e.g. Here are the results for a search I did looking for a quaternion conjugate function on some of my code:
```
Query: "quat conjugate"
Options: case=ignore, match=partial
Source: Entire solution

Finding... Complete

Match Quality: 100%
--------------------
c:\Development.old\Libraries\C++\Math\Quaternion.h(79): inline Quaternion Conjugate(const Quaternion& quat)
c:\Development.old\Libraries\C++\Math\Quaternion.h(96): return Conjugate(quat) / Length(quat);
c:\Development.old\Libraries\C++\Math\Quaternion.h(101): const Quaternion result = quat * Quaternion(vec.x, vec.y, vec.z, 0) * Conjugate(quat);

Match Quality: 62%
-------------------
c:\Development.old\Libraries\C++\Math\Quaternion.h(76): return Quaternion(lhs.x / f, lhs.y / f, lhs.z / f, lhs.w / f);
c:\Development.old\Libraries\C++\Math\Quaternion.h(81): return Quaternion(-quat.x, -quat.y, -quat.z, quat.w);
c:\Development.old\Libraries\C++\Math\Quaternion.h(94): inline Quaternion Invert(const Quaternion& quat)
c:\Development.old\Libraries\C++\Math\Quaternion.h(99): inline Vector3 Rotate(const Vector3& vec, const Quaternion& quat)

Total files searched: 407
Matching lines: 34
Find Time: 90 ms
Output Time: 9 ms
```
I’m finding it pretty useful when exploring for functions I think *should* exist in a large code-base since you don’t have to get the exact string to match.
If you’re interested in either of these, you can grab the binaries here.
They’ve kindly shared their efforts back with the main repository on Google Code. Feel free to profit from their hard work!
Thanks a lot Julian & Clement!