# Managing Transformations in Hierarchy

• Introduction

One of the most fundamental aspects of 3D engine design is management of spatial relationship between objects. The most intuitive way of handling this issue is to organize objects in a tree structure (hierarchy), where each node stores its local transformation, relative to its parent.

The most common way to define the local transformation is to use a socalled TRS system, where the transformation is composed of translation, rotation, and scale. This system is very easy to use for both programmers using the engine as well as non-technical users like level designers. In this chapter we describe the theory behind such a system.

One problem with the system is decomposition of a matrix back to TRS. It turns out that this problem is often ill-defined and no robust solution exists. We present an approximate solution that works reasonably well in the majority of cases.

• Theory

Keeping objects in hierarchy is a well-known concept. Every object can have a number of children and only one parent. It can also be convenient to store and manage a list of pointers to the children so that we have fast access to them. The aforementioned structure is in fact a tree.

We assume that a node stores its translation, rotation, and scale (TRS) that are relative to its parent. Therefore, we say these properties are local. When we move an object, we drag all its children with it. If we increase scale of the object, then all of its children will become larger too.

Local TRS uniquely defines a local transformation matrix M. We transform vector v in the following way:

where S is an arbitrary scale matrix, R is an arbitrary rotation matrix, T is a translation matrix, and T is the vector matrix T is made of.

To render an object, we need to obtain its global (world) transformation by composing local transformations of all the object’s ancestors up in the hierarchy.

The composition is achieved by simply multiplying local matrices. Given a vector v0, its local matrix M0, and the local matrix M1 of v0’s parent, we can find the global position v2:

Using vector notation for translation, we get

RS != S’R’

Skew Problem

Applying a nonuniform scale (coming from object A) that follows a local rotation (objects B and C) will cause objects (B and C) to be skewed. Skew can appear during matrices composition but it becomes a problem during the decomposition, as it cannot be expressed within a single TRS node. We give an approximate solution to this issue in Section 3.2.4.

Let an object have n ancestors in the hierarchy tree. Let M1,M2, · · · ,Mn be their local transformation matrices, M0 be a local transformation matrix of the considered object, and Mi = SiRiTi.

MTRSΣ = M0M1 · · ·Mn

MTR = R0T0R1T1 · · ·RnTn

TR可以很好的叠加来获得世界坐标的TR

MSΣ = MRSΣMR

here we have the skew and the scale combined. We use diagonal elements of MSΣ to get the scale, and we choose to ignore the rest that is responsible for the skew.

Scale 的话用这边算出来的对角线来表示，其他的结果丢弃采用上面的TR，这样的结果就可以避免skew.

In a 3D engine we often need to modify objects’ parent-children relationship.

we want to change the local transformation such that the global transformation is still the same. Obviously, that forces us to recompute local TRS values of the object whose parent we’re changing.

To get from the current local space to a new local space (parent changes, global transform stays the same), we first need to find the global transform of the object by going up in the hierarchy to the root node. Having done this we need to go down the hierarchy to which our new parent belongs.

LetM’0 be the new parent’s local transformation matrix. Let that new parent have n’ ancestors in the hierarchy tree with local transformations M’1,M’2, · · · ,M’n’, where M’i = S’iR’iT’i. The new local transformation matrix can thus be found using the following formula:

Alternative Systems

• Implementation

# Object Space Ambient Occlusion for Molecular Dynamics (OSAO)

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter14.html

Michael Bunnell

NVIDIA Corporation

In this chapter we describe a new technique for computing diffuse light transfer and show how it can be used to compute global illumination for animated scenes. Our technique is efficient enough when implemented on a fast GPU to calculate ambient occlusion and indirect lighting data on the fly for each rendered frame. It does not have the limitations of precomputed radiance transfer(光辐射传输) (PRT) or precomputed ambient occlusion techniques, which are limited to rigid objects that do not move relative to one another (Sloan 2002). Figure 14-1 illustrates how ambient occlusion and indirect lighting enhance environment lighting.

【这里介绍一种高效的基于GPU运算的ambient occlusion技术。这里突破了一般预计算方式的只可应用于静态对象的局限。】

Figure 14-1 Adding Realism with Ambient Occlusion and Indirect Lighting

Our technique works by treating polygon meshes as a set of surface elements that can emit, transmit, or reflect light and that can shadow each other. This method is so efficient because it works without calculating the visibility of one element to another. Instead, it uses a much simpler and faster technique based on approximate shadowing to account for occluding (blocking) geometry.

14.1 Surface Elements

The first step in our algorithm is to convert the polygonal data to surface elements to make it easy to calculate how much one part of a surface shadows or illuminates another.

【这里算法的第一步就是将多边形数据转化成surface elements。】

Figure 14-2 illustrates the basic concept. We define a surface element as an oriented disk with a position, normal, and area. An element has a front face and a back face. Light is emitted and reflected from the front-facing side. Light is transmitted and shadows are cast from the back. We create one element per vertex of the mesh. Assuming that the vertices are defined with a position and normal already, we just need to calculate the area of each element. We calculate the area at a vertex as the sum of one-third of the area of the triangles that share the vertex (or one-fourth of the area for quads). Heron’s formula for the area of a triangle with sides of length a, b, and c is:

where s is half the perimeter of the triangle: (a + b + c)/2.

Figure 14-2 Converting a Polygonal Mesh to Elements

We store element data (position, normal, and area) in texture maps because we will be using a fragment program (that is, a pixel shader) to do all the ambient occlusion calculations. Assuming that vertex positions and normals will change for each frame, we need to be able to change the values in the texture map quickly.

One option is to keep vertex data in a texture map from the start and to do all the animation and transformation from object space to eye (or world) space with fragment programs instead of vertex programs. We can use render-to-vertex-array to create the array of vertices to be sent down the regular pipeline, and then use a simple pass-through vertex shader.

Another, less efficient option is to do the animation and transformation on the CPU and load a texture with the vertex data each frame.

14.2 Ambient Occlusion

Ambient occlusion is a useful technique for adding shadowing to diffuse objects lit with environment lighting. Without shadows, diffuse objects lit from many directions look flat and unrealistic. Ambient occlusion provides soft shadows by darkening surfaces that are partially visible to the environment. It involves calculating the accessibility value, which is the percentage of the hemisphere above each surface point not occluded by geometry (Landis 2002). In addition to accessibility, it is also useful to calculate the direction of least occlusion, commonly known as the bent normal. The bent normal is used in place of the regular normal when shading the surface for more accurate environment lighting.

AO解释，在对象表面生成软阴影可以有效的提高真实感。】

We can calculate the accessibility(辅助) value at each element as 1 minus the amount by which all the other elements shadow the element. We refer to the element that is shadowed as the receiver and to the element that casts the shadow as the emitter. We use an approximation based on the solid angle of an oriented disk to calculate the amount by which an emitter element shadows a receiver element. Given that A is the area of the emitter, the amount of shadow can be approximated by:

As illustrated in Figure 14-3, qE is the angle between the emitter’s normal and the vector from the emitter to the receiver. qR is the corresponding angle for the receiver element. The max(1, 4 x cos qR ) term is added to the disk solid angle formula to ignore emitters that do not lie in the hemisphere above the receiver without causing rendering artifacts for elements that lie near the horizon.

【这一段在解释变量含义】

Figure 14-3 The Relationship Between Receiver and Emitter Elements

Here is the fragment program function to approximate the element-to-element occlusion:

【下面是计算函数的实现】

We calculate the accessibility values(辅助值) in two passes.

【这里计算包含两个pass

In the first pass, we approximate the accessibility for each element by summing the fraction(分数) of the hemisphere(半球) subtended(对着) by every other element and subtracting(减法) the result from 1.

【第一个pass是根据上面的公式来近似计算每一个element的分数】

After the first pass, some elements will generally be too dark because other elements that are in shadow are themselves casting shadows. So we use a second pass to do the same calculation, but this time we multiply each form factor by the emitter element’s accessibility from the last pass.

【经过第一步会导致有些elements太暗了，原因在于存在投影的过度叠加。因此第二个pass做同样的计算，但是这里我们乘上每一个emitter elements的上一步计算出来的辅助值。】

The effect is that elements that are in shadow will cast fewer shadows on other elements, as illustrated in Figure 14-4. After the second pass, we have removed any double shadowing.

However, surfaces that are triple shadowed or more will end up being too light. We can use more passes to get a better approximation, but we can approximate the same answer by using a weighted average of the combined results of the first and second passes. Figure 14-5 shows the results after each pass, as well as a ray-traced solution for comparison. The bent normal calculation is done during the second pass. We compute the bent normal by first multiplying the normalized vector between elements and the form factor. Then we subtract this result from the original element normal.

【其实通过上面的两步还是得不到很好的结果，比如第二步只去除的是双重叠加的效果，如果是三重叠加我们还需要更进一步的 pass来去除叠加效果，这是个无底洞。 因此我们采用对第二步的结果再设置权重值的方式来获得更好的近似效果，下下图就是结果展示。】

Figure 14-4 Correcting for Occlusion by Overlapping Objects

Figure 14-5 Comparing Models Rendered with Our Technique to Reference Images

We calculate the occlusion result by rendering a single quad (or two triangles) so that one pixel is rendered for each surface element. The shader calculates the amount of shadow received at each element and writes it as the alpha component of the color of the pixel. The results are rendered to a texture map so the second pass can be performed with another render. In this pass, the bent normal is calculated and written as the RGB value of the color with a new shadow value that is written in the alpha component.

14.2.2 Improving Performance

Even though the element-to-element shadow calculation is very fast (a GeForce 6800 can do 150 million of these calculations per second), we need to improve our algorithm to work on more than a couple of thousand elements in real time. We can reduce the amount of work by using simplified geometry for distant surfaces. This approach works well for diffuse lighting environments because the shadows are so soft that those cast by details in distant geometry are not visible. Fortunately, because we do not use the polygons themselves in our technique, we can create surface elements to represent simplified geometry without needing to create alternate polygonal models. We simply group elements whose vertices are neighbors in the original mesh and represent them with a single, larger element. We can do the same thing with the larger elements, creating fewer and even larger elements, forming a hierarchy. Now instead of traversing every single element for each pixel we render, we traverse the hierarchy of elements. If the receiver element is far enough away from the emitter—say, four times the radius of the emitter—we use it for our calculation. Only if the receiver is close to an emitter do we need to traverse its children (if it has any). See Figure 14-6. By traversing a hierarchy in this way, we can improve the performance of our algorithm from O(n 2) to O(n log n) in practice. The chart in Figure 14-7 shows that the performance per vertex stays consistent as the number of vertices in the hierarchy increases.

【其实这样的element to element(pixel to pixel)的计算已经很快了。我们要增强我们的算法来尽可能多的支持顶点(element/pixel)数。这里的想法就是通过空间几何关系，相邻的一些定点可以组合当作一个element group（计算的时候当作一个element）来处理，然后起作用再细分，就是一般层次化的方法。】

Figure 14-6 Hierarchical Elements

Figure 14-7 Ambient Occlusion Shader Performance for Meshes of Different Densities

【性能图示】

We calculate a parent element’s data using its direct descendants in the hierarchy. We calculate the position and normal of a parent element by averaging the positions and normals of its children. We calculate its area as the sum of its children’s areas. We can use a shader for these calculations by making one pass of the shader for each level in the hierarchy, propagating the values from the leaf nodes up. We can then use the same technique to average the results of an occlusion pass that are needed for a following pass or simply treat parent nodes the same as children and avoid the averaging step. It is worth noting that the area of most animated elements varies little, if at all, even for nonrigid objects; therefore, the area does not have to be recalculated for each frame.

【这里交代父节点（高层次）的数据来源】

The ambient occlusion fragment shader appears in Listing 14-1.

14.3 Indirect Lighting and Area Lights

We can add an extra level of realism to rendered images by adding indirect lighting caused by light reflecting off diffuse surfaces (Tabellion 2004). We can add a single bounce of indirect light using a slight variation of the ambient occlusion shader. We replace the solid angle function with a disk-to-disk radiance transfer function. We use one pass of the shader to transfer the reflected or emitted light and two passes to shadow the light.

For indirect lighting, first we need to calculate the amount of light to reflect off the front face of each surface element. If the reflected light comes from environment lighting, then we compute the ambient occlusion data first and use it to compute the environment light that reaches each vertex. If we are using direct lighting from point or directional lights, we compute the light at each element just as if we are shading the surface, including shadow mapping. We can also do both environment lighting and direct lighting and sum the two results. We then multiply the light values by the color of the surface element, so that red surfaces reflect red, yellow surfaces reflect yellow, and so on. Area lights are handled just like light-reflective diffuse surfaces except that they are initialized with a light value to emit.

Here is the fragment program function to calculate element-to-element radiance transfer:

Equation 14-2 Disk-to-Disk Form Factor Approximation

We calculate the amount of light transferred from one surface element to another using the geometric term of the disk-to-disk form factor given in Equation 14-2. We leave off the visibility factor, which takes into account blocking (occluding) geometry. Instead we use a shadowing technique like the one we used for calculating ambient occlusion—only this time we use the same form factor that we used to transfer the light. Also, we multiply the shadowing element’s form factor by the three-component light value instead of a single-component accessibility value.

【我们使用上面的公式来计算光线从一个element transfer 到另一个。也就是说我们这里用了OSAO那种思想来做光线的传播。】

We now run one pass of our radiance-transfer shader to calculate the maximum amount of reflected or emitted light that can reach any element. Then we run a shadow pass that subtracts from the total light at each element based on how much light reaches the shadowing elements. Just as with ambient occlusion, we can run another pass to improve the lighting by removing double shadowing. Figure 14-8 shows a scene lit with direct lighting plus one and two bounces of indirect lighting.

Figure 14-8 Combining Direct and Indirect Lighting

14.4 Conclusion

Global illumination techniques such as ambient occlusion and indirect lighting greatly enhance the quality of rendered diffuse surfaces. We have presented a new technique for calculating light transfer to and from diffuse surfaces using the GPU. This technique is suitable for implementing various global illumination effects in dynamic scenes with deformable geometry.

【废话不解释】

14.5 References

Landis, Hayden. 2002. “Production-Ready Global Illumination.” Course 16 notes, SIGGRAPH 2002.

Pharr, Matt, and Simon Green. 2004. “Ambient Occlusion.” In GPU Gems, edited by Randima Fernando, pp. 279–292. Addison-Wesley.

Sloan, Peter-Pike, Jan Kautz, and John Snyder. 2002. “Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments.” ACM Transactions on Graphics (Proceedings of SIGGRAPH 2002) 21(3), pp. 527–536.

Tabellion, Eric, and Arnauld Lamorlette. 2004. “An Approximate Global Illumination System for Computer Generated Films.” ACM Transactions on Graphics (Proceedings of SIGGRAPH 2004) 23(3), pp. 469–476.

# Screen Space Ambient Occlusion(SSAO)

• BACKGROUND

Ambient occlusion is an approximation of the amount by which a point on a surface is occluded by the surrounding geometry, which affects the accessibility of that point by incoming light. (主要看是否靠近物体)

In effect, ambient occlusion techniques allow the simulation of proximity shadows – the soft shadows that you see in the corners of rooms and the narrow spaces between objects. (用于模拟软阴影)

Ambien occlusion is often subtle, but will dramatically improve the visual realism of a computer-generated scene:

The basic idea is to compute an occlusion factor(阻塞要素) for each point on a surface and incorporate(合并) this into the lighting model, usually by modulating the ambient term such that more occlusion = less light, less occlusion = more light. Computing the occlusion factor can be expensive; offline renderers typically do it by casting a large number of rays in a normal-oriented hemisphere to sample the occluding geometry around a point. In general this isn’t practical for realtime rendering.

To achieve interactive frame rates, computing the occlusion factor needs to be optimized as far as possible. One option is to pre-calculate it, but this limits how dynamic a scene can be (the lights can move around, but the geometry can’t).(速度是大问题)

• CRYSIS METHOD

Way back in 2007, Crytek implemented a realtime solution for Crysis, which quickly became the yardstick for game graphics. The idea is simple: use per-fragment depth information as an approximation of the scene geometry and calculate the occlusion factor in screen space. This means that the whole process can be done on the GPU, is 100% dynamic and completely independent of scene complexity. Here we’ll take a quick look at how the Crysis method works, then look at some enhancements.

Rather than(与其) cast(投射) rays in a hemisphere, Crysis samples the depth buffer at points derived(来源) from samples in a sphere:[在深度buffer以当前点为中心的一个圆内取sample]

This works in the following way:

• project each sample point into screen space to get the coordinates into the depth buffer(获得深度图及坐标)
• sample the depth buffer(取深度图的sample)
• if the sample position is behind the sampled depth (i.e. inside geometry), it contributes to the occlusion factor(sample平均值小于其本身深度值，则起作用)

Clearly the quality of the result is directly proportional to the number of samples, which needs to be minimized in order to achieve decent performance. Reducing the number of samples, however, produces ugly ‘banding’ artifacts in the result. This problem is remedied by randomly rotating the sample kernel at each pixel, trading banding for high frequency noise which can be removed by blurring the result.

The Crysis method produces occlusion factors with a particular ‘look’ – because the sample kernel is a sphere, flat walls end up looking grey because ~50% of the samples end up being inside the surrounding geometry. Concave corners darken as expected, but convex ones appear lighter since fewer samples fall inside geometry. Although these artifacts are visually acceptable, they produce a stylistic effect which strays somewhat from photorealism.

• NORMAL-ORIENTED HEMISPHERE

Rather than sample a spherical kernel at each pixel, we can sample within a hemisphere, oriented along the surface normal at that pixel. This improves the look of the effect with the penalty of requiring per-fragment normal data. For a deferred renderer, however, this is probably already available, so the cost is minimal (especially when compared with the improved quality of the result).

(改进：去法线方向的半球内的sample)

• Generating the Sample Kernel

The first step is to generate the sample kernel itself. The requirements are that

• sample positions fall within the unit hemisphere
• sample positions are more densely clustered towards the origin. This effectively attenuates the occlusion contribution according to distance from the kernel centre – samples closer to a point occlude it more than samples further away

Generating the hemisphere is easy:

This creates sample points on the surface of a hemisphere oriented along the z axis.(先建一个标准半球) The choice of orientation is arbitrary(随意) – it will only affect the way we reorient the kernel in the shader. The next step is to scale each of the sample positions to distribute them within the hemisphere. This is most simply done as:

which will produce an evenly distributed set of points. What we actually want is for the distance from the origin to falloff as we generate more points, according to a curve like this:(权重和距离相关)

• Generating the Noise Texture

Next we need to generate a set of random values used to rotate the sample kernel, which will effectively increase the sample count and minimize the ‘banding’ artefacts mentioned previously.

Note that the z component is zero; since our kernel is oriented along the z-axis, we want the random rotation to occur around that axis.(竟然是random rotation!难道不能是顶点或者面法线更符合实际情况？)

These random values are stored in a texture and tiled over(铺满) the screen. The tiling of the texture causes the orientation of the kernel to be repeated and introduces regularity into the result. By keeping the texture size small we can make this regularity occur at a high frequency, which can then be removed with a blur step that preserves the low-frequency detail of the image. Using a 4×4 texture and blur kernel produces excellent results at minimal cost. This is the same approach as used in Crysis.

With all the prep work done, we come to the meat of the implementation: the shader itself. There are actually two passes: calculating the occlusion factor, then blurring the result.

Calculating the occlusion factor requires first obtaining the fragment’s view space position and normal:

I reconstruct the view space position by combining the fragment’s linear depth with the interpolated vViewRay. See Matt Pettineo’s blog for a discussion of other methods for reconstructing position from depth. The important thing is that origin ends up being the fragment’s view space position.

Retrieving(检索) the fragment’s normal is a little more straightforward(直截了当); the scale/bias and normalization steps are necessary unless you’re using some high precision format to store the normals:

Next we need to construct a change-of-basis matrix to reorient our sample kernel along the origin’s normal. We can cunningly(巧妙地) incorporate(合并) the random rotation here, as well:

The first line retrieves a random vector rvec from our noise texture. uNoiseScale is a vec2 which scales vTexcoord to tile the noise texture. So if our render target is 1024×768 and our noise texture is 4×4, uNoiseScale would be (1024 / 4, 768 / 4). (This can just be calculated once when initialising the noise texture and passed in as a uniform).

The next three lines use the Gram-Schmidt process to compute an orthogonal basis, incorporating our random rotation vector rvec.

The last line constructs the transformation matrix from our tangent, bitangent and normal vectors. The normal vector fills the z component of our matrix because that is the axis along which the base kernel is oriented.

Next we loop through the sample kernel (passed in as an array of vec3, uSampleKernel), sample the depth buffer and accumulate the occlusion factor:

Getting the view space sample position is simple; we multiply by our orientation matrix tbn, then scale the sample by uRadius (a nice artist-adjustable factor, passed in as a uniform) then add the fragment’s view space position origin.

We now need to project sample (which is in view space) back into screen space to get the texture coordinates with which we sample the depth buffer. This step follows the usual process – multiply by the current projection matrix (uProjectionMat), perform w-divide then scale and bias to get our texture coordinate: offset.xy.

Next we read sampleDepth out of the depth buffer (uTexLinearDepth). If this is in front of the sample position, the sample is ‘inside’ geometry and contributes to occlusion. If sampleDepth is behind the sample position, the sample doesn’t contribute to the occlusion factor. Introducing a rangeCheck helps to prevent erroneous occlusion between large depth discontinuities:

As you can see, rangeCheck works by zeroing any contribution from outside the sampling radius.

The final step is to normalize the occlusion factor and invert it, in order to produce a value that can be used to directly scale the light contribution.

The blur shader is very simple: all we want to do is average a 4×4 rectangle around each pixel to remove the 4×4 noise pattern:

The only thing to note in this shader is uTexelSize, which allows us to accurately sample texel centres based on the resolution of the AO render target.

• CONCLUSION

The normal-oriented hemisphere method produces a more realistic-looking than the basic Crysis method, without much extra cost, especially when implemented as part of a deferred renderer where the extra per-fragment data is readily available. It’s pretty scalable, too – the main performance bottleneck is the size of the sample kernel, so you can either go for fewer samples or have a lower resolution AO target.

A demo implementation is available here.

# Anti-aliasing

1.顺序栅格超级采样（Ordered Grid Super-Sampling，简称OGSS），采样时选取2个邻近像素。

2.旋转栅格超级采样（Rotated Grid Super-Sampling，简称RGSS），采样时选取4个邻近像素。

NVIDIA已经移除了CSAA，可能这种抗锯齿技术有点落伍了吧，论画质不如TXAA，论性能不如FXAA，而且只有NVIDIA支持，兼容性也是个问题。

TXAA的原理就是通过HDR后处理管线从硬件层面上提供颜色矫正处理，后期处理的方式实际上原理和FXAA差不多：整合硬件AA以及类似于CG电影中所采用的复杂的高画质过滤器，来减少抗锯齿中出现的撕裂和抖动现象。

TXAA 是一种全新的电影风格抗锯齿技术，旨在减少时间性锯齿 (运动中的蠕动和闪烁) 该技术集时间性过滤器、硬件抗锯齿以及定制的 CG 电影式抗锯齿解算法于一身。 要过滤屏幕上任意特定的像素，TXAA 需要使用像素内部和外部的采样以及之前帧中的采样，以便提供超级高画质的过滤。 TXAA 在标准 2xMSAA 4xMSAA 的基础上改进了时间性过滤。 例如，在栅栏或植物上以及在运动画面中，TXAA 已经开始接近、有时甚至超过了其它高端专业抗锯齿算法的画质。TXAA 由于采用更高画质的过滤，因而与传统 MSAA 较低画质的过滤相比，图像更加柔和。

NVIDIA（英伟达）根据MSAA改进出的一种抗锯齿技术。目前只有使用麦克斯韦架构GPU的显卡才可以使用。在 Maxwell 上，英伟达推出了用于光栅化的可编程采样位置，它们被存储在随机存取存储器 (RAM) 中。如此一来便为更灵活、更创新的全新抗锯齿技术创造了机会，这类抗锯齿技术能够独特地解决现代游戏引擎所带来的难题，例如高画质抗锯齿对性能的更高要求。只要在NVIDIA控制面板里为程序开启MFAA并在游戏中选择MSAA就可以开启。画面表现明显强于同级别的MSAA，这种全新抗锯齿技术在提升边缘画质的同时能够将性能代价降至最低。通过在时间和空间两方面交替使用抗锯齿采样格式，4xMFAA 的性能代价仅相当于 2xMSAA，但是抗锯齿效果却与 4xMSAA 相当。[3]

# Defered/Forward Rendering

http://www.cnblogs.com/polobymulberry/p/5126892.html

1. rendering path的技术基础

2. 几种常用的Rendering Path

Rendering Path其实指的就是渲染场景中光照的方式。由于场景中的光源可能很多，甚至是动态的光源。所以怎么在速度和效果上达到一个最好的结果确实很困难。以当今的显卡发展为契机，人们才衍生出了这么多的Rendering Path来处理各种光照。

2.1 Forward Rendering

Forward Rendering优缺点

3.对于某个几何体，光源对其作用的程度是不同，所以有些作用程度特别小的光源可以不进行考虑。典型的例子就是Unity中只考虑重要程度最大的4个光源。

2.2 Deferred Rendering

Deferred Rendering（延迟渲染）顾名思义，就是将光照处理这一步骤延迟一段时间再处理。具体做法就是将光照处理这一步放在已经三维物体生成二维图片之后进行处理。也就是说将物空间的光照处理放到了像空间进行处理。要做到这一步，需要一个重要的辅助工具——G-Buffer。G-Buffer主要是用来存储每个像素对应的Position，Normal，Diffuse Color和其他Material parameters。根据这些信息，我们就可以在像空间中对每个像素进行光照处理[3]。下面是Deferred Rendering的核心伪代码。

. Depth Buffer

.Specular Intensity/Power

.Normal Buffer

.Diffuse Color Buffer

.Deferred Lighting Results

Deferred Rendering的最大的优势就是将光源的数目和场景中物体的数目在复杂度层面上完全分开。也就是说场景中不管是一个三角形还是一百万个三角形，最后的复杂度不会随光源数目变化而产生巨大变化。从上面的伪代码可以看出deferred rendering的复杂度为 。

2.2.1 Light Pre-Pass

Light Pre-Pass最早是由Wolfgang Engel在他的博客[2]中提到的。具体的做法是

(1)只在G-Buffer中存储Z值和Normal值。对比Deferred Render，少了Diffuse Color Specular Color以及对应位置的材质索引值。

(2)FS阶段利用上面的G-Buffer计算出所必须的light properties，比如Normal*LightDir,LightColor,Specularlight properties。将这些计算出的光照进行alpha-blend并存入LightBuffer（就是用来存储light propertiesbuffer）。

(3)最后将结果送到forward rendering渲染方式计算最后的光照效果。

2.2.2 Tile-Based Deferred Rendering

TBDR主要思想就是将屏幕分成一个个小块tile。然后根据这些Depth求得每个tilebounding box。对每个tilebounding boxlight进行求交，这样就得到了对该tile有作用的light的序列。最后根据得到的序列计算所在tile的光照效果。[4][5]

2.3 Forward+

Forward+ == Forward + Light Culling[6]Forward+ 很类似Tiled-based Deferred Rendering。其具体做法就是先对输入的场景进行z-prepass，也就是说关闭写入color，只向z-buffer写入z值。注意此步骤是Forward+必须的，而其他渲染方式是可选的。接下来来的步骤和TBDR很类似，都是划分tiles，并计算bounding box。只不过TBDR是在G-Buffer中完成这一步骤的，而Forward+是根据Z-Buffer。最后一步其实使用的是forward方式，即在FS阶段对每个pixel根据其所在tilelight序列计算光照效果。而TBDR使用的是基于G-Bufferdeferred rendering

Forward+的优势还有很多，其实大多就是传统Forward Rendering本身的优势，所以Forward+更像一个集各种Rendering Path优势于一体的Rendering Path

3. 总结

3.1 Rendering Equation

3.2 Forward Renderng

3.3 Deferred Rendering

3.4 Forward+ Rendering

[1] Shawn Hargreaves. (2004) “Deferred Shading”. [Online] Available:

[2] Wolfgang Engel. (March 16, 2008) “Light Pre-Pass Renderer”. [Online] Available: