Category: Basic Game Tech

RayTracing – Adding Reflection and Refraction

The other advantage of ray-tracing is that, by extending the idea of ray propagation, we can very easily simulate effects like reflection and refraction, both of which are handy in simulating glass materials or mirror surfaces. In a 1979 paper entitled “An Improved Illumination Model for Shaded Display”, Turner Whitted was the first to describe how to extend Appel’s ray-tracing algorithm for more advanced rendering. Whitted’s idea extended Appel’s model of shooting rays to incorporate computations for both reflection and refraction.【扩展光线追踪,其模拟的方法很容易模拟反射折射】


In optics, reflection and refraction are well known phenomena. Although a whole later lesson is dedicated to reflection and refraction, we will look quickly at what is needed to simulate them. We will take the example of a glass ball, an object which has both refractive and reflective properties. As long as we know the direction of the ray intersecting the ball, it is easy to compute what happens to it. Both reflection and refraction directions are based on the normal at point of intersection and the direction of the incoming ray (the primary ray). To compute the refraction direction we also need to specify the index of refraction of the material. Although we said earlier that rays travel on a straight line, we can visualize refraction as the ray being bent. When a photon hits an object of a different medium (and thus a different index of refraction), its direction changes. The science of this will be discussed in more depth later. As long as we remember that these two effects depend of the normal vector and the incoming ray direction, and that refraction depends of the refractive index of the material we are ready to move on.【我们举个玻璃球的例子来看折射反射,一根光线射到玻璃球,折返射的方向都可根据物理规则知道,如下图】



Similarly, we must also be aware of the fact that an object like a glass ball is reflective and refractive at the same time. We need to compute both for a given point on the surface, but how do we mix them together? Do we take 50% of the reflection result and mix it with 50% of the refraction result? Unfortunately, it is more complicated than that. The mixing of values is dependent upon the angle between primary ray (or viewing direction) and both the normal of the object and the index of refraction. Fortunately for us, however, there is an equation that calculates precisely how each should be mixed. This equation is know as the Fresnel equation. To remain concise, all we need to know, for now, is that it exists and it will be useful in the future in determining the mixing values.【那么我们如何混合折返射,混合的比例和入射光线的角度相关,就是Fresnel函数表示的】


So let’s recap. How does the Whitted algorithm work? We shoot a primary ray from the eye and the closest intersection (if any) with objects in the scene. If the ray hits an object which is not a diffuse or opaque object, we must do extra computational work. To compute the resulting color at that point on, say for example, the glass ball, you need to compute the reflection color and the refraction color and mix them together. Remember, we do that in three steps. Compute the reflection color, compute the refraction color, and then apply the Fresnel equation.【算法流程就是,从眼睛发出射线射到第一个物体,如果物体不是不透明的,就需要拆分成折返射光线继续参与光线追踪,最后的结果做颜色比例混合。下面就是再说这三步】


  1. First we compute the reflection direction. For that we need two items: the normal at the point of intersection and the primary ray’s direction. Once we obtain the reflection direction, we shoot a new ray in that direction. Going back to our old example, let’s say the reflection ray hits the red sphere. Using Appel’s algorithm, we find out how much light reaches that point on the red sphere by shooting a shadow ray to the light. That obtains a color (black if it is shadowed) which is then multiplied by the light intensity and returned to the glass ball’s surface.
  2. Now we do the same for the refraction. Note that, because the ray goes through the glass ball it is said to be a transmission ray (light has traveled from one side of the sphere to other; it was transmitted). To compute the transmission direction we need the normal at the hit point, the primary ray direction, and the refractive index of the material (in this example it may be something like 1.5 for glass material). With the new direction computed, the refractive ray continues on its course to the other side of the glass ball. There again, because it changes medium, the ray is refracted one more time. As you can see in the adjacent image, the direction of the ray changes when the ray enters and leaves the glass object. Refraction takes place every time there’s a change of medium and that two media, the one the ray exits from and the one it gets in, have a different index of refraction. As you probably know the refraction index of air is very close to 1 and the refraction index of glass is around 1.5). Refraction has for effect to bend the ray slightly. This process is what makes objects appear shifted when looking through or at objects of different refraction indexes. Let’s imagine now that when the refracted ray leaves the glass ball it hits a green sphere. There again we compute the local illumination at the point of intersection between the green sphere and refracted ray (by shooting a shadow ray). The color (black if it is shadowed) is then multiplied by the light intensity and returned to the glass ball’s surface
  3. Lastly, we compute the Fresnel equation. We need the refractive index of the glass ball, the angle between the primary ray, and the normal at the hit point. Using a dot product (we will explain that later), the Fresnel equation returns the two mixing values.


Here is some pseudo code to reinforce how it works:


One last, beautiful thing about this algorithm is that it is recursive (that is also a curse in a way, too!). In the case we have studied so far, the reflection ray hits a red, opaque sphere and the refraction ray hits a green, opaque, and diffuse sphere. However, we are going to imagine that the red and green spheres are glass balls as well. To find the color returned by the reflection and the refraction rays, we would have to follow the same process with the red and the green spheres that we used with the original glass ball. This is a serious drawback of the ray tracing algorithm and can actually be nightmarish in some cases. Imagine that our camera is in a box which has only reflective faces. Theoretically, the rays are trapped and will continue bouncing off of the box’s walls endlessly (or until you stop the simulation). For this reason, we have to set an arbitrary limit that prevents the rays from interacting, and thus recursing endlessly. Each time a ray is either reflected or refracted its depth is incremented. We simply stop the recursion process when the ray depth is greater than the maximum recursion depth.【这个算法是递归的,这点要注意,最好设置合理的条件已产生合理的结果。】


























RayTracing – Implementing the Raytracing Algorithm


We have covered everything there is to say! We are now prepared to write our first ray-tracer. You should now be able to guess how the ray-tracing algorithm works.【我们开始来实现算法】


First of all, take a moment to notice that the propagation of light in nature is just a countless number of rays emitted from light sources that bounce around until they hit the surface of our eye. Ray-tracing is, therefore, elegant in the way that it is based directly on what actually happens around us. Apart from the fact that it follows the path of light in the reverse order, it is nothing less that a perfect nature simulator.【光线在自然界的传播只是从光源发射的无数光线,它们会反射到我们的眼睛表面。



The ray-tracing algorithm takes an image made of pixels. For each pixel in the image, it shoots a primary ray into the scene. The direction of that primary ray is obtained by tracing a line from the eye to the center of that pixel. Once we have that primary ray’s direction set, we check every object of the scene to see if it intersects with any of them. In some cases, the primary ray will intersect more than one object. When that happens, we select the object whose intersection point is the closest to the eye. We then shoot a shadow ray from the intersection point to the light (Figure 6, top). If this particular ray does not intersect an object on its way to the light, the hit point is illuminated. If it does intersect with another object, that object casts a shadow on it (figure 2).ray-tracing基于图片的pixel,对于每一个pixel,我们从眼睛所在位置向pixel位置发出射线,然后我们检查场景每一个物体与光线的相交关系。很多情况下会与多个物体相交,这时候我们处理离眼睛最近的那个对象。发射shadow light,如果这光线只与这对象相交,则是亮的,否则是其他物体投下的阴影区域】



If we repeat this operation for every pixel, we obtain a two-dimensional representation of our three-dimensional scene (figure 3).【遍历pixel获得图像结果】



Here is an implementation of the algorithm in pseudocode:【伪代码】



The beauty of ray-tracing, as one can see, is that it takes just a few lines to code; one could certainly write a basic ray-tracer in 200 lines. Unlike other algorithms, such as a scanline renderer, ray-tracing takes very little effort to implement.ray-trace的美妙在于,一个基本的实现就200行左右,如上图所示】


This technique was first described by Arthur Appel in 1969 by a paper entitled “Some Techniques for Shading Machine Renderings of Solids”. So, if this algorithm is so wonderful why didn’t it replace all the other rendering algorithms? The main reason, at the time (and even today to some extent), was speed. As Appel mentions in his paper:【这技术在1969年首次提出,但是在实际使用中没有推广的原因在于渲染时间还是很长】


“This method is very time consuming, usually requiring for useful results several thousands times as much calculation time as a wire frame drawing. About one half of of this time is devoted to determining the point to point correspondence of the projection and the scene.”


In other words, it is slow (but as Kajiya – one of the most influential researchers of all computer graphics history -once said: “ray tracing is not slow – computers are”). It is extremely time consuming to find the intersection between rays and geometry. For decades, the algorithm’s speed has been the main drawback of ray-tracing. However, as computers become faster, it is less and less of an issue. Although one thing must still be said: comparatively to other techniques, like the z-buffer algorithm, ray-tracing is still much slower. However, today, with fast computers, we can compute a frame that used to take one hour in a few minutes or less. In fact, real-time and interactive ray-tracers are a hot topic.【换句话说就是慢,射线求交慢,但是在硬件越来越好的情况下,这越来越不是问题。但是相对于光栅化来讲,还是非常慢,但是实时的光线追踪已经是很热门的研究课题。】


To summarize, it is important to remember (again) that the rendering routine can be looked at as two separate processes. One step determines if a point is visible at a particular pixel (the visibility part), the second shades that point (the shading part). Unfortunately, both of the two steps require expensive and time consuming ray-geometry intersection tests. The algorithm is elegant and powerful but forces us to trade rendering time for accuracy and vise versa. Since Appel published his paper a lot of research has been done to accelerate the ray-object intersection routines. By combining these acceleration schemes with the new technology in computers, it has become easier to use ray-tracing to the point where it has been used in nearly every production rendering software.【总结一下光线追踪渲染可以分为两步,首先决定这个对象对于这个像素是否可见,然后对于这个点调色。不过这两步都需要射线求交计算,都非常耗时。】































RayTracing – Raytracing Algorithm in a Nutshell


The phenomena described by Ibn al-Haytham explains why we see objects. Two interesting remarks can be made based on his observations: firstly, without light we cannot see anything and secondly, without objects in our environment, we cannot see light. If we were to travel in intergalactic space, that is what would typically happen. If there is no matter around us, we cannot see anything but darkness even though photons are potentially moving through that space. Ibn al-Haytham解释了我们为什么可以看到物体,是基于两个有趣的现象,首先是没有光线我们看不到任何东西,其次是没有物体的世界我们看不到光线。】



Forward Tracing


If we are trying to simulate the light-object interaction process in a computer generated image, then there is another physical phenomena which we need to be aware of. Compared to the total number of rays reflected by an object, only a select few of them will ever reach the surface of our eye. Here is an example. Imagine we have created a light source which emits only one single photon at a time. Now let’s examine what happens to that photon. It is emitted from the light source and travels in a straight line path until it hits the surface of our object. Ignoring photon absorption, we can assume the photon is reflected in a random direction. If the photons hits the surface of our eye, we “see” the point where the photon was reflected from (figure 1).【在模拟光照过程的时候,我们需要注意的是光线经过物体的反射,只有少部分光线进入眼睛,下图就是在说明这个事情。】



We can now begin to look at the situation in terms of computer graphics. First, we replace our eyes with an image plane composed of pixels. In this case, the photons emitted will hit one of the many pixels on the image plane, increasing the brightness at that point to a value greater than zero. This process is repeated multiple times until all the pixels are adjusted, creating a computer generated image. This technique is called forward ray-tracing because we follow the path of the photon forward from the light source to the observer.【我们来模拟这个过程,首先用Image代替眼睛,光线从光源出发,Image接收到光线就增加亮度,直到走完所有的光线。这个方法叫做forward ray-tracing。】


However do you see a potential problem with this approach?【但是你会发现这个方法存在问题】


The problem is the following: in our example we assumed that the reflected photon always intersected the surface of the eye. In reality, rays are essentially reflected in every possible direction, each of which have a very, very small probability of actually hitting the eye. We would potentially have to cast zillions of photons from the light source to find only one photon that would strike the eye. In nature this is how it works, as countless photons travel in all directions at the speed of light. In the computer world, simulating the interaction of that many photons with objects in a scene is just not practical solution for reasons we will now explain.【问题是我们只有投射足够量的光子,其中的一小部分才会真的与眼睛相交变成有效的画面的一部分】


So you may think: “Do we really need to shoot photons in random directions? Since we know the eye’s position, why not just send the photon in that direction and see which pixel in the image it passes through, if any?” That would certainly be one possible optimization, however we can only use this method for certain types of material. For reasons we will explain in a later lesson on light-matter interaction, directionality is not important for diffuse surfaces. This is because a photon that hits a diffuse surface can be reflected in any direction within the hemisphere centered around the normal at the point of contact. However, if the surface is a mirror, and does not have diffuse characteristics, the ray can only be reflected in a very precise direction; the mirrored direction (something which we will learn how to compute later on). For this type of surface, we can not decide to artificially change the direction of the photon if it’s actually supposed to follow the mirrored direction. Meaning that this solution is not completely satisfactory.【因此我们就想,我们怎样提高光子的投射效率,一种方法是人工干预方向,在每一次的折返射的时候摒弃掉一些方向的光线,但是这样的做法存在的问题是,对于镜子这样的对象你无法有效的处理】


Even if we do decide to use this method, with a scene made up of diffuse objects only, we would still face one major problem. We can visualize the process of shooting photons from a light into a scene as if you were spraying light rays (or small particles of paint) onto an object’s surface. If the spray is not dense enough, some areas would not be illuminated uniformly.【不用上述方法的另一个原因是对于场景中占大多数的diffuse的物体,你无法通过上述方法化简计算量】


Imagine that we are trying to paint a teapot by making dots with a white marker pen onto a black sheet of paper (consider every dot to be a photon). As we see in the image below, to begin with only a few photons intersect with the teapot object, leaving many uncovered areas. As we continue to add dots, the density of photons increases until the teapot is “almost” entirely covered with photons making the object more easily recognisable.【下图所示我们想绘制一个茶壶,这个方法的绘制过程表现就是一个一个随机的白点增加的过程】


But shooting 1000 photons, or even X times more, will never truly guarantee that the surface of our object will be totally covered with photons. That’s a major drawback of this technique. In other words, we would probably have to let the program run until we decide that it had sprayed enough photons onto the object’s surface to get an accurate representation of it. This implies that we would need watch the image as it’s being rendered in order to decide when to stop the application. In a production environment, this simply isn’t possible. Plus, as we will see, the most expensive task in a ray-tracer is finding ray-geometry intersections. Creating many photons from the light source is not an issue, but, having to find all of their intersections within the scene would be prohibitively expensive.【但问题在于实际实现的过程中,无论你发射了多少条有限的光线,你都很难把所有的茶壶中间的黑洞填白,这事情是不可控的,而且代价昂贵】



Conclusion: Forward ray-tracing (or light tracing because we shoot rays from the light) makes it technically possible simulate the way light travels in nature on a computer. However, this method, as discussed, is not efficient or practical. In a seminal paper entitled “An Improved Illumination Model for Shaded Display” and published in 1980, Turner Whitted (one of the earliest researchers in computer graphics) wrote:forward是一种计算机模拟的方式,但是这个方法不实用。An Improved Illumination Model for Shaded Display这篇写到:】


“In an obvious approach to ray tracing, light rays emanating from a source are traced through their paths until they strike the viewer. Since only a few will reach the viewer, this approach is wasteful. In a second approach suggested by Appel, rays are traced in the opposite direction, from the viewer to the objects in the scene”.forward这种方法太浪费了,我们是否反过来思考光线的走势】


We will now look at this other mode, Whitted talks about.


Backward Tracing


Instead of tracing rays from the light source to the receptor (such as our eye), we trace rays backwards from the receptor to the objects. Because this direction is the reverse of what happens in nature, it is fittingly called backward ray-tracing or eye tracing because we shoot rays from the eye position?(figure 2). This method provides a convenient solution to the flaw of forward ray-tracing. Since our simulations cannot be as fast and as perfect as nature, we must compromise and trace a ray from the eye into the scene. If the ray hits an object then we find out how much light it receives by throwing another ray (called a light or shadow ray) from the hit point to the scene’s light. Occasionally this “light ray” is obstructed by another object from the scene, meaning that our original hit point is in a shadow; it doesn’t receive any illumination from the light. For this reason, we don’t name these rays light rays?but instead shadow rays. In CG literature, the first ray we shoot from the eye into the scene is called a primary ray, visibility ray, or camera ray.【我们来看反向光线追踪,如下图所示,其做法就是光线从眼睛出发反向去传播,直到回传到光源。】





In computer graphics the concept of shooting rays either from the light or from the eye is called path tracing. The term ray-tracing can also be used but the concept of path tracing suggests that this method of making computer generated images relies on following the path from the light to the camera (or vice versa). By doing so in an physically realistic way, we can easily simulate optical effects such caustics or the reflection of light by other surface in the scene (indirect illumination). These topics will be discussed in other lessons.【在计算机图形学中,从光线或从眼睛射出射线的概念被称为路径追踪。
术语光线跟踪也可以使用,但路径跟踪的概念表明,这种制作计算机生成图像的方法依赖于从光源到相机的路径(反之亦然)。 通过物理上逼真的方式,我们可以很容易地模拟光学效应,如焦场或场景中其他表面的反射(间接照明)。 这些主题将在其他课程中讨论。】



























RayTracing – How Does It Work?


To begin this lesson, we will explain how a three-dimensional scene is made into a viewable two-dimensional image. Once we understand that process and what it involves, we will be able to utilize a computer to simulate an “artificial” image by similar methods. We like to think of this section as the theory that more advanced CG is built upon.【这课程我们首先来解释怎么从3D场景获得2D图像】


In the second section of this lesson, we will introduce the ray-tracing algorithm and explain, in a nutshell, how it works. We have received email from various people asking why we are focused on ray-tracing rather than other algorithms. The truth is, we are not. Why did we chose to focus on ray-tracing in this introductory lesson? Simply because this algorithm is the most straightforward way of simulating the physical phenomena that cause objects to be visible. For that reason, we believe ray-tracing is the best choice, among other techniques, when writing a program that creates simple images.【然后我们会介绍ray-tracing算法,仅仅因为这个算法是模拟引起物体可见的物理现象的最直接的方式。


To start, we will lay the foundation with the ray-tracing algorithm. However, as soon as we have covered all the information we need to implement a scanline renderer, for example, we will show how to do that as well.【在了解ray-tracing之前,我们首先回顾一下扫描线算法】



How Does an Image Get Created?


Although it seems unusual to start with the following statement, the first thing we need to produce an image, is a two-dimensional surface (this surface needs to be of some area and cannot be a point). With this in mind, we can visualize a picture as a cut made through a pyramid whose apex is located at the center of our eye and whose height is parallel to our line of sight (remember, in order to see something, we must view along a line that connects to that object). We will call this cut, or slice, mentioned before, the image plane (you can see this image plane as the canvas used by painters). An image plane is a computer graphics concept and we will use it as a two-dimensional surface to project our three-dimensional scene upon. Although it may seem obvious, what we have just described is one of the most fundamental concepts used to create images on a multitude of different apparatuses. For example, an equivalent in photography is the surface of the film (or as just mentioned before, the canvas used by painters).【根据图形学的概念渲染就是用2D Image来展示3D 场景】




Perspective Projection


Let’s imagine we want to draw a cube on a blank canvas. The easiest way of describing the projection process is to start by drawing lines from each corner of the three-dimensional cube to the eye. To map out the object’s shape on the canvas, we mark a point where each line intersects with the surface of the image plane. For example, let us say that c0 is a corner of the cube and that it is connected to three other points: c1c2, and c3. After projecting these four points onto the canvas, we get c0′c1′c2′, and c3′. If c0c1 defines an edge, then we draw a line from c0′ to c1′. If c0c2 defines an edge, then we draw a line from c0′ to c2′.【在image上绘制一个Cube,最简单的方法就是顶点投影,然后顶点之间的连线处理】


If we repeat this operation for remaining edges of the cube, we will end up with a two-dimensional representation of the cube on the canvas. We have then created our first image using perspective projection. If we continually repeat this process for each object in the scene, what we get is an image of the scene as it appears from a particular vantage point. It was only at the beginning of the 15th century that painters started to understand the rules of perspective projection.【重复上述方法到6个面,就画完了Cube,在重复用于场景每一个物体,就渲染完成。这就是15世纪,画家从这方法开始理解透视】




Light and Color


Once we know where to draw the outline of the three-dimensional objects on the two-dimensional surface, we can add colors to complete the picture.【上面画完线框,下面上色】


To summarize quickly what we have just learned: we can create an image from a three-dimensional scene in a two step process. The first step consists of projecting the shapes of the three-dimensional objects onto the image surface (or image plane). This step requires nothing more than connecting lines from the objects features to the eye. An outline is then created by going back and drawing on the canvas where these projection lines intersect the image plane. As you may have noticed, this is a geometric process. The second step consists of adding colors to the picture’s skeleton.【快速总结,创建Image分为两步:第一步是投影,第二步是上色】


An object’s color and brightness, in a scene, is mostly the result of lights interacting with an object’s materials. Light is made up of photons (electromagnetic particles) that have, in other words, an electric component and a magnetic component. They carry energy and oscillate like sound waves as they travel in straight lines. Photons are emitted by a variety of light sources, the most notable example being the sun. If a group of photons hit an object, three things can happen: they can be either absorbed, reflected or transmitted. The percentage of photons reflected, absorbed, and transmitted varies from one material to another and generally dictates how the object appears in the scene. However, the one rule that all materials have in common is that the total number of incoming photons is always the same as the sum of reflected, absorbed and transmitted photons. In other words, if we have 100 photons illuminating a point on the surface of the object, 60 might be absorbed and 40 might be reflected. The total is still 100. In this particular case, we will never tally 70 absorbed and 60 reflected, or 20 absorbed and 50 reflected because the total of transmitted, absorbed and reflected photons has to be 100.【物体的颜色和亮度,是物体材质和光照合力的结果,具体解释就是光学那套。】


In science, we only differentiate two types of materials, metals which are called conductors and dielectrics. Dielectris include things such a glass, plastic, wood, water, etc. These materials have the property to be electrical insulators (pure water is an electrical insulator). Note that a dielectric material can either be transparent or opaque. Both the glass balls and the plastic balls in the image below are dielectric materials. In fact, every material is in away or another transparent to some sort of electromagnetic radiation. X-rays for instance can pass through the body.【材质分类我们只关心透明和不透明,不透明的会挡住光线穿过】


An object can also be made out of a composite, or a multi-layered, material. For example, one can have an opaque object (let’s say wood for example) with a transparent coat of varnish on top of it (which makes it look both diffuse and shiny at the same time like the colored plastic balls in the image below).【还有一种是半透明,比如皮肤这种,可以看作是有多层材质】



Let’s consider the case of opaque and diffuse objects for now. To keep it simple, we will assume that the absorption process is responsible for the object’s color. White light is made up of “red”, “blue”, and “green” photons. If a white light illuminates a red object, the absorption process filters out (or absorbs) the “green” and the “blue” photons. Because the object does not absorb the “red” photons, they are reflected. This is the reason why this object appears red. Now, the reason we see the object at all, is because some of the “red” photons reflected by the object travel towards us and strike our eyes. Each point on an illuminated area, or object, radiates (reflects) light rays in every direction. Only one ray from each point strikes the eye perpendicularly and can therefore be seen. Our eyes are made of photoreceptors that convert the light into neural signals. Our brain is then able to use these signals to interpret the different shades and hues (how, we are not exactly sure). This a very simplistic approach to describe the phenomena involved. Everything is explained in more detail in the lesson on color (which you can find in the section Mathematics and Physics for Computer Graphics.【光照原理的例子,初中物理不解释】



Like the concept of perspective projection, it took a while for humans to understand light. The Greeks developed a theory of vision in which objects are seen by rays of light emanating from the eyes. An Arab scientist, Ibn al-Haytham (c. 965-1039), was the first to explain that we see objects because the sun’s rays of light; streams of tiny particles traveling in straight lines were reflected from objects into our eyes, forming images (Figure 3). Now let us see how we can simulate nature with a computer!【这哥们第一次解释我们看到物体是因为光照。下面我们开始讲解怎么用计算机模拟这个物理现象】
































































Managing Transformations in Hierarchy

  • Introduction


One of the most fundamental aspects of 3D engine design is management of spatial relationship between objects. The most intuitive way of handling this issue is to organize objects in a tree structure (hierarchy), where each node stores its local transformation, relative to its parent.


The most common way to define the local transformation is to use a socalled TRS system, where the transformation is composed of translation, rotation, and scale. This system is very easy to use for both programmers using the engine as well as non-technical users like level designers. In this chapter we describe the theory behind such a system.


One problem with the system is decomposition of a matrix back to TRS. It turns out that this problem is often ill-defined and no robust solution exists. We present an approximate solution that works reasonably well in the majority of cases.


  • Theory



Keeping objects in hierarchy is a well-known concept. Every object can have a number of children and only one parent. It can also be convenient to store and manage a list of pointers to the children so that we have fast access to them. The aforementioned structure is in fact a tree.



We assume that a node stores its translation, rotation, and scale (TRS) that are relative to its parent. Therefore, we say these properties are local. When we move an object, we drag all its children with it. If we increase scale of the object, then all of its children will become larger too.







Local TRS uniquely defines a local transformation matrix M. We transform vector v in the following way:

where S is an arbitrary scale matrix, R is an arbitrary rotation matrix, T is a translation matrix, and T is the vector matrix T is made of.



To render an object, we need to obtain its global (world) transformation by composing local transformations of all the object’s ancestors up in the hierarchy.

The composition is achieved by simply multiplying local matrices. Given a vector v0, its local matrix M0, and the local matrix M1 of v0’s parent, we can find the global position v2:

Using vector notation for translation, we get


RS != S’R’


Skew Problem



Applying a nonuniform scale (coming from object A) that follows a local rotation (objects B and C) will cause objects (B and C) to be skewed. Skew can appear during matrices composition but it becomes a problem during the decomposition, as it cannot be expressed within a single TRS node. We give an approximate solution to this issue in Section 3.2.4.


Let an object have n ancestors in the hierarchy tree. Let M1,M2, · · · ,Mn be their local transformation matrices, M0 be a local transformation matrix of the considered object, and Mi = SiRiTi.

MTRSΣ = M0M1 · · ·Mn

MTR = R0T0R1T1 · · ·RnTn



here we have the skew and the scale combined. We use diagonal elements of MSΣ to get the scale, and we choose to ignore the rest that is responsible for the skew.

Scale 的话用这边算出来的对角线来表示,其他的结果丢弃采用上面的TR,这样的结果就可以避免skew.




In a 3D engine we often need to modify objects’ parent-children relationship.

we want to change the local transformation such that the global transformation is still the same. Obviously, that forces us to recompute local TRS values of the object whose parent we’re changing.


To get from the current local space to a new local space (parent changes, global transform stays the same), we first need to find the global transform of the object by going up in the hierarchy to the root node. Having done this we need to go down the hierarchy to which our new parent belongs.


LetM’0 be the new parent’s local transformation matrix. Let that new parent have n’ ancestors in the hierarchy tree with local transformations M’1,M’2, · · · ,M’n’, where M’i = S’iR’iT’i. The new local transformation matrix can thus be found using the following formula:




Alternative Systems


这边主要讲 Scale 处理,和skew相关

做法:除了叶节点存储x,y,z不相同的,各项异的scale值(三维向量)(nonuniform scale in last node),其他节点存储的是uniform scale值(不是三维向量,是值)这样可以有效的解决skew问题且实现简单。


  • Implementation














Object Space Ambient Occlusion for Molecular Dynamics (OSAO)

Michael Bunnell

NVIDIA Corporation

In this chapter we describe a new technique for computing diffuse light transfer and show how it can be used to compute global illumination for animated scenes. Our technique is efficient enough when implemented on a fast GPU to calculate ambient occlusion and indirect lighting data on the fly for each rendered frame. It does not have the limitations of precomputed radiance transfer(光辐射传输) (PRT) or precomputed ambient occlusion techniques, which are limited to rigid objects that do not move relative to one another (Sloan 2002). Figure 14-1 illustrates how ambient occlusion and indirect lighting enhance environment lighting.

【这里介绍一种高效的基于GPU运算的ambient occlusion技术。这里突破了一般预计算方式的只可应用于静态对象的局限。】


Figure 14-1 Adding Realism with Ambient Occlusion and Indirect Lighting

Our technique works by treating polygon meshes as a set of surface elements that can emit, transmit, or reflect light and that can shadow each other. This method is so efficient because it works without calculating the visibility of one element to another. Instead, it uses a much simpler and faster technique based on approximate shadowing to account for occluding (blocking) geometry.

【我们的技术把多边形表面看作是一组表面的单元集合,他们之间可以emit, transmit, reflect shadow,通过这样的近似可以简单快速的获得起到阻塞效果的几何的形状。】

14.1 Surface Elements

The first step in our algorithm is to convert the polygonal data to surface elements to make it easy to calculate how much one part of a surface shadows or illuminates another.

【这里算法的第一步就是将多边形数据转化成surface elements。】

Figure 14-2 illustrates the basic concept. We define a surface element as an oriented disk with a position, normal, and area. An element has a front face and a back face. Light is emitted and reflected from the front-facing side. Light is transmitted and shadows are cast from the back. We create one element per vertex of the mesh. Assuming that the vertices are defined with a position and normal already, we just need to calculate the area of each element. We calculate the area at a vertex as the sum of one-third of the area of the triangles that share the vertex (or one-fourth of the area for quads). Heron’s formula for the area of a triangle with sides of length a, b, and c is:


where s is half the perimeter of the triangle: (a + b + c)/2.

【下图展示的就是这一步的概念示意,surface element定义成圆形表面包含位置/法向/area信息。 surface包含正反面,光线从正面emit/reflect,反面形成transmit/shadow

对于多边形的每一个顶点生成一个surface element. 顶点的位置法线直接赋予surface elementarea的计算由使用到这个顶点的三角形的面积总和的三分之一,计算公式如上。】


Figure 14-2 Converting a Polygonal Mesh to Elements

We store element data (position, normal, and area) in texture maps because we will be using a fragment program (that is, a pixel shader) to do all the ambient occlusion calculations. Assuming that vertex positions and normals will change for each frame, we need to be able to change the values in the texture map quickly.

One option is to keep vertex data in a texture map from the start and to do all the animation and transformation from object space to eye (or world) space with fragment programs instead of vertex programs. We can use render-to-vertex-array to create the array of vertices to be sent down the regular pipeline, and then use a simple pass-through vertex shader.

Another, less efficient option is to do the animation and transformation on the CPU and load a texture with the vertex data each frame.

【我们需要把surface elementposition/normal/area的信息存储到texture用于pixel shader. 假设顶点的位置法线是每个frame都变化的,因此我们需要快速改变texture的值。

一种可行的方案是一直保持一开始的时候的顶点信息,之后动画的变化完全由eye space/pixel shader来替代object space/vertex shader的处理,然后render to vertex array生成顶点数组,再交由正常的流水线再处理,之后就是一个简单的vertex shader可以搞定了。


14.2 Ambient Occlusion

Ambient occlusion is a useful technique for adding shadowing to diffuse objects lit with environment lighting. Without shadows, diffuse objects lit from many directions look flat and unrealistic. Ambient occlusion provides soft shadows by darkening surfaces that are partially visible to the environment. It involves calculating the accessibility value, which is the percentage of the hemisphere above each surface point not occluded by geometry (Landis 2002). In addition to accessibility, it is also useful to calculate the direction of least occlusion, commonly known as the bent normal. The bent normal is used in place of the regular normal when shading the surface for more accurate environment lighting.


We can calculate the accessibility(辅助) value at each element as 1 minus the amount by which all the other elements shadow the element. We refer to the element that is shadowed as the receiver and to the element that casts the shadow as the emitter. We use an approximation based on the solid angle of an oriented disk to calculate the amount by which an emitter element shadows a receiver element. Given that A is the area of the emitter, the amount of shadow can be approximated by:


Equation 14-1 Shadow Approximation

【计算辅助值:1减去所有其他element在此的阴影。Element 作为接收者 shadowed,作为发光者造成阴影。因为发光者和接收阴影着的角度都是已知的,,我们采用上面的公式来估算,配合下面的示意图。Aemitter的面积。】

As illustrated in Figure 14-3, qE is the angle between the emitter’s normal and the vector from the emitter to the receiver. qR is the corresponding angle for the receiver element. The max(1, 4 x cos qR ) term is added to the disk solid angle formula to ignore emitters that do not lie in the hemisphere above the receiver without causing rendering artifacts for elements that lie near the horizon.



Figure 14-3 The Relationship Between Receiver and Emitter Elements

Here is the fragment program function to approximate the element-to-element occlusion:


bgt_4_6计算机生成了可选文字: El em entShadow (Elo at3 oat oat El oats e as sme that rs quar e d, o at3 oat (1 emittuArea has already divided Ey r qrt(emi t terkrea/rSquared dot

14.2.1 The Multipass Shadowing Algorithm

We calculate the accessibility values(辅助值) in two passes.


In the first pass, we approximate the accessibility for each element by summing the fraction(分数) of the hemisphere(半球) subtended(对着) by every other element and subtracting(减法) the result from 1.


After the first pass, some elements will generally be too dark because other elements that are in shadow are themselves casting shadows. So we use a second pass to do the same calculation, but this time we multiply each form factor by the emitter element’s accessibility from the last pass.

【经过第一步会导致有些elements太暗了,原因在于存在投影的过度叠加。因此第二个pass做同样的计算,但是这里我们乘上每一个emitter elements的上一步计算出来的辅助值。】

The effect is that elements that are in shadow will cast fewer shadows on other elements, as illustrated in Figure 14-4. After the second pass, we have removed any double shadowing.

【效果如下图所示,通过第二步我们解决的是double shadowing导致的太暗的问题】

However, surfaces that are triple shadowed or more will end up being too light. We can use more passes to get a better approximation, but we can approximate the same answer by using a weighted average of the combined results of the first and second passes. Figure 14-5 shows the results after each pass, as well as a ray-traced solution for comparison. The bent normal calculation is done during the second pass. We compute the bent normal by first multiplying the normalized vector between elements and the form factor. Then we subtract this result from the original element normal.

【其实通过上面的两步还是得不到很好的结果,比如第二步只去除的是双重叠加的效果,如果是三重叠加我们还需要更进一步的 pass来去除叠加效果,这是个无底洞。 因此我们采用对第二步的结果再设置权重值的方式来获得更好的近似效果,下下图就是结果展示。】


Figure 14-4 Correcting for Occlusion by Overlapping Objects


Figure 14-5 Comparing Models Rendered with Our Technique to Reference Images

We calculate the occlusion result by rendering a single quad (or two triangles) so that one pixel is rendered for each surface element. The shader calculates the amount of shadow received at each element and writes it as the alpha component of the color of the pixel. The results are rendered to a texture map so the second pass can be performed with another render. In this pass, the bent normal is calculated and written as the RGB value of the color with a new shadow value that is written in the alpha component.

【每一个pass,一个surface element当作一个pixel来处理,这样shader将每个element计算得到的阴影值作为这个pixelalpha值,结果渲染到texture map,这样就可以用于下一个passnormal值当作textureRGB分量参与计算。】

14.2.2 Improving Performance

Even though the element-to-element shadow calculation is very fast (a GeForce 6800 can do 150 million of these calculations per second), we need to improve our algorithm to work on more than a couple of thousand elements in real time. We can reduce the amount of work by using simplified geometry for distant surfaces. This approach works well for diffuse lighting environments because the shadows are so soft that those cast by details in distant geometry are not visible. Fortunately, because we do not use the polygons themselves in our technique, we can create surface elements to represent simplified geometry without needing to create alternate polygonal models. We simply group elements whose vertices are neighbors in the original mesh and represent them with a single, larger element. We can do the same thing with the larger elements, creating fewer and even larger elements, forming a hierarchy. Now instead of traversing every single element for each pixel we render, we traverse the hierarchy of elements. If the receiver element is far enough away from the emitter—say, four times the radius of the emitter—we use it for our calculation. Only if the receiver is close to an emitter do we need to traverse its children (if it has any). See Figure 14-6. By traversing a hierarchy in this way, we can improve the performance of our algorithm from O(n 2) to O(n log n) in practice. The chart in Figure 14-7 shows that the performance per vertex stays consistent as the number of vertices in the hierarchy increases.

【其实这样的element to element(pixel to pixel)的计算已经很快了。我们要增强我们的算法来尽可能多的支持顶点(element/pixel)数。这里的想法就是通过空间几何关系,相邻的一些定点可以组合当作一个element group(计算的时候当作一个element)来处理,然后起作用再细分,就是一般层次化的方法。】


Figure 14-6 Hierarchical Elements


Figure 14-7 Ambient Occlusion Shader Performance for Meshes of Different Densities


We calculate a parent element’s data using its direct descendants in the hierarchy. We calculate the position and normal of a parent element by averaging the positions and normals of its children. We calculate its area as the sum of its children’s areas. We can use a shader for these calculations by making one pass of the shader for each level in the hierarchy, propagating the values from the leaf nodes up. We can then use the same technique to average the results of an occlusion pass that are needed for a following pass or simply treat parent nodes the same as children and avoid the averaging step. It is worth noting that the area of most animated elements varies little, if at all, even for nonrigid objects; therefore, the area does not have to be recalculated for each frame.


The ambient occlusion fragment shader appears in Listing 14-1.


bgt_4_11计算机生成了可选文字: Ambi entOcc1usi o at4 o at4 o at4 i for m 1 as tRe tMap i for m posi ti onMap i for m i for m oat o at4 o at4 o at3 o at3 o at3 oat o at4 or-mal = rNorma_1 rmnurl el em entNorm rEKUNIT3) nom COL posi t 1 vector receiva t used to calculate El at3 bent N oat oat H th recenvu nonal


计算机生成了可选文字: (em t terlndewx texREcr om XYZ not shed Yaversal , emi t terlnd xy) emi t terlnd xy) ge t t calc squued. r qrt(d2). value = texREcr s receiver close to puent < —4*emitterkrea) (p s hav e go eruchy em 1 t terkre E, rNorma_1, eNorma_L modulate normal by last remlt bentNorma1 value total value; (1 only need normal for last else retu-n Eloat4 normali ze(bentNorma1),

Example 14-1. Ambient Occlusion Shader

14.3 Indirect Lighting and Area Lights

We can add an extra level of realism to rendered images by adding indirect lighting caused by light reflecting off diffuse surfaces (Tabellion 2004). We can add a single bounce of indirect light using a slight variation of the ambient occlusion shader. We replace the solid angle function with a disk-to-disk radiance transfer function. We use one pass of the shader to transfer the reflected or emitted light and two passes to shadow the light.


For indirect lighting, first we need to calculate the amount of light to reflect off the front face of each surface element. If the reflected light comes from environment lighting, then we compute the ambient occlusion data first and use it to compute the environment light that reaches each vertex. If we are using direct lighting from point or directional lights, we compute the light at each element just as if we are shading the surface, including shadow mapping. We can also do both environment lighting and direct lighting and sum the two results. We then multiply the light values by the color of the surface element, so that red surfaces reflect red, yellow surfaces reflect yellow, and so on. Area lights are handled just like light-reflective diffuse surfaces except that they are initialized with a light value to emit.

【这里解释怎么合兵:首先我们要得到直接光照的结果和OSAO的结果,直接光照结果的计算来自于一般的光照计算方法方法shadow map。亮度就是两种光照结果只和,颜色就是光线颜色。面积光就当作是发光表面来处理。】

Here is the fragment program function to calculate element-to-element radiance transfer:

element-to-element radiance transfer处理的代码片段】


计算机生成了可选文字: oat El o at3 El oats e oat o at3 oat that ttuArea has vided by PI ate ( dot


Equation 14-2 Disk-to-Disk Form Factor Approximation

We calculate the amount of light transferred from one surface element to another using the geometric term of the disk-to-disk form factor given in Equation 14-2. We leave off the visibility factor, which takes into account blocking (occluding) geometry. Instead we use a shadowing technique like the one we used for calculating ambient occlusion—only this time we use the same form factor that we used to transfer the light. Also, we multiply the shadowing element’s form factor by the three-component light value instead of a single-component accessibility value.

【我们使用上面的公式来计算光线从一个element transfer 到另一个。也就是说我们这里用了OSAO那种思想来做光线的传播。】

We now run one pass of our radiance-transfer shader to calculate the maximum amount of reflected or emitted light that can reach any element. Then we run a shadow pass that subtracts from the total light at each element based on how much light reaches the shadowing elements. Just as with ambient occlusion, we can run another pass to improve the lighting by removing double shadowing. Figure 14-8 shows a scene lit with direct lighting plus one and two bounces of indirect lighting.

【我们首先用一个pass来跑radiance-transfer shader来计算element之间的光线的发出和反射来得到每一个element的光线总和,接着跑shadow pass:从到达element的光线总和的结果再减去这个pass计算的结果就是AO的结果,处理多重阴影的覆盖问题就是通过多个pass和参数解,见上面的讲解。下图展示结果】


Figure 14-8 Combining Direct and Indirect Lighting

14.4 Conclusion

Global illumination techniques such as ambient occlusion and indirect lighting greatly enhance the quality of rendered diffuse surfaces. We have presented a new technique for calculating light transfer to and from diffuse surfaces using the GPU. This technique is suitable for implementing various global illumination effects in dynamic scenes with deformable geometry.


14.5 References

Landis, Hayden. 2002. “Production-Ready Global Illumination.” Course 16 notes, SIGGRAPH 2002.

Pharr, Matt, and Simon Green. 2004. “Ambient Occlusion.” In GPU Gems, edited by Randima Fernando, pp. 279–292. Addison-Wesley.

Sloan, Peter-Pike, Jan Kautz, and John Snyder. 2002. “Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments.” ACM Transactions on Graphics (Proceedings of SIGGRAPH 2002) 21(3), pp. 527–536.

Tabellion, Eric, and Arnauld Lamorlette. 2004. “An Approximate Global Illumination System for Computer Generated Films.” ACM Transactions on Graphics (Proceedings of SIGGRAPH 2004) 23(3), pp. 469–476.

Screen Space Ambient Occlusion(SSAO)


Ambient occlusion is an approximation of the amount by which a point on a surface is occluded by the surrounding geometry, which affects the accessibility of that point by incoming light. (主要看是否靠近物体)

In effect, ambient occlusion techniques allow the simulation of proximity shadows – the soft shadows that you see in the corners of rooms and the narrow spaces between objects. (用于模拟软阴影)

Ambien occlusion is often subtle, but will dramatically improve the visual realism of a computer-generated scene:



The basic idea is to compute an occlusion factor(阻塞要素) for each point on a surface and incorporate(合并) this into the lighting model, usually by modulating the ambient term such that more occlusion = less light, less occlusion = more light. Computing the occlusion factor can be expensive; offline renderers typically do it by casting a large number of rays in a normal-oriented hemisphere to sample the occluding geometry around a point. In general this isn’t practical for realtime rendering.

To achieve interactive frame rates, computing the occlusion factor needs to be optimized as far as possible. One option is to pre-calculate it, but this limits how dynamic a scene can be (the lights can move around, but the geometry can’t).(速度是大问题)


Way back in 2007, Crytek implemented a realtime solution for Crysis, which quickly became the yardstick for game graphics. The idea is simple: use per-fragment depth information as an approximation of the scene geometry and calculate the occlusion factor in screen space. This means that the whole process can be done on the GPU, is 100% dynamic and completely independent of scene complexity. Here we’ll take a quick look at how the Crysis method works, then look at some enhancements.

Rather than(与其) cast(投射) rays in a hemisphere, Crysis samples the depth buffer at points derived(来源) from samples in a sphere:[在深度buffer以当前点为中心的一个圆内取sample]



This works in the following way:

  • project each sample point into screen space to get the coordinates into the depth buffer(获得深度图及坐标)
  • sample the depth buffer(取深度图的sample)
  • if the sample position is behind the sampled depth (i.e. inside geometry), it contributes to the occlusion factor(sample平均值小于其本身深度值,则起作用)

Clearly the quality of the result is directly proportional to the number of samples, which needs to be minimized in order to achieve decent performance. Reducing the number of samples, however, produces ugly ‘banding’ artifacts in the result. This problem is remedied by randomly rotating the sample kernel at each pixel, trading banding for high frequency noise which can be removed by blurring the result.


The Crysis method produces occlusion factors with a particular ‘look’ – because the sample kernel is a sphere, flat walls end up looking grey because ~50% of the samples end up being inside the surrounding geometry. Concave corners darken as expected, but convex ones appear lighter since fewer samples fall inside geometry. Although these artifacts are visually acceptable, they produce a stylistic effect which strays somewhat from photorealism.


Rather than sample a spherical kernel at each pixel, we can sample within a hemisphere, oriented along the surface normal at that pixel. This improves the look of the effect with the penalty of requiring per-fragment normal data. For a deferred renderer, however, this is probably already available, so the cost is minimal (especially when compared with the improved quality of the result).



  • Generating the Sample Kernel

The first step is to generate the sample kernel itself. The requirements are that

  • sample positions fall within the unit hemisphere
  • sample positions are more densely clustered towards the origin. This effectively attenuates the occlusion contribution according to distance from the kernel centre – samples closer to a point occlude it more than samples further away

Generating the hemisphere is easy:

This creates sample points on the surface of a hemisphere oriented along the z axis.(先建一个标准半球) The choice of orientation is arbitrary(随意) – it will only affect the way we reorient the kernel in the shader. The next step is to scale each of the sample positions to distribute them within the hemisphere. This is most simply done as:

which will produce an evenly distributed set of points. What we actually want is for the distance from the origin to falloff as we generate more points, according to a curve like this:(权重和距离相关)


  • Generating the Noise Texture

Next we need to generate a set of random values used to rotate the sample kernel, which will effectively increase the sample count and minimize the ‘banding’ artefacts mentioned previously.

Note that the z component is zero; since our kernel is oriented along the z-axis, we want the random rotation to occur around that axis.(竟然是random rotation!难道不能是顶点或者面法线更符合实际情况?)

These random values are stored in a texture and tiled over(铺满) the screen. The tiling of the texture causes the orientation of the kernel to be repeated and introduces regularity into the result. By keeping the texture size small we can make this regularity occur at a high frequency, which can then be removed with a blur step that preserves the low-frequency detail of the image. Using a 4×4 texture and blur kernel produces excellent results at minimal cost. This is the same approach as used in Crysis.

  • The SSAO Shader

With all the prep work done, we come to the meat of the implementation: the shader itself. There are actually two passes: calculating the occlusion factor, then blurring the result.

Calculating the occlusion factor requires first obtaining the fragment’s view space position and normal:

I reconstruct the view space position by combining the fragment’s linear depth with the interpolated vViewRay. See Matt Pettineo’s blog for a discussion of other methods for reconstructing position from depth. The important thing is that origin ends up being the fragment’s view space position.

Retrieving(检索) the fragment’s normal is a little more straightforward(直截了当); the scale/bias and normalization steps are necessary unless you’re using some high precision format to store the normals:

Next we need to construct a change-of-basis matrix to reorient our sample kernel along the origin’s normal. We can cunningly(巧妙地) incorporate(合并) the random rotation here, as well:

(这儿可以看作shader 如何使用random数范例)

The first line retrieves a random vector rvec from our noise texture. uNoiseScale is a vec2 which scales vTexcoord to tile the noise texture. So if our render target is 1024×768 and our noise texture is 4×4, uNoiseScale would be (1024 / 4, 768 / 4). (This can just be calculated once when initialising the noise texture and passed in as a uniform).

The next three lines use the Gram-Schmidt process to compute an orthogonal basis, incorporating our random rotation vector rvec.

The last line constructs the transformation matrix from our tangent, bitangent and normal vectors. The normal vector fills the z component of our matrix because that is the axis along which the base kernel is oriented.

Next we loop through the sample kernel (passed in as an array of vec3, uSampleKernel), sample the depth buffer and accumulate the occlusion factor:

Getting the view space sample position is simple; we multiply by our orientation matrix tbn, then scale the sample by uRadius (a nice artist-adjustable factor, passed in as a uniform) then add the fragment’s view space position origin.

We now need to project sample (which is in view space) back into screen space to get the texture coordinates with which we sample the depth buffer. This step follows the usual process – multiply by the current projection matrix (uProjectionMat), perform w-divide then scale and bias to get our texture coordinate: offset.xy.

Next we read sampleDepth out of the depth buffer (uTexLinearDepth). If this is in front of the sample position, the sample is ‘inside’ geometry and contributes to occlusion. If sampleDepth is behind the sample position, the sample doesn’t contribute to the occlusion factor. Introducing a rangeCheck helps to prevent erroneous occlusion between large depth discontinuities:


As you can see, rangeCheck works by zeroing any contribution from outside the sampling radius.

The final step is to normalize the occlusion factor and invert it, in order to produce a value that can be used to directly scale the light contribution.

  • The Blur Shader

The blur shader is very simple: all we want to do is average a 4×4 rectangle around each pixel to remove the 4×4 noise pattern:

The only thing to note in this shader is uTexelSize, which allows us to accurately sample texel centres based on the resolution of the AO render target.



The normal-oriented hemisphere method produces a more realistic-looking than the basic Crysis method, without much extra cost, especially when implemented as part of a deferred renderer where the extra per-fragment data is readily available. It’s pretty scalable, too – the main performance bottleneck is the size of the sample kernel, so you can either go for fewer samples or have a lower resolution AO target.

A demo implementation is available here.




超级采样抗锯齿(Super-Sampling Anti-aliasing,简称SSAA)此是早期抗锯齿方法,比较消耗资源,但简单直接,先把图像映射到缓存并把它放大,再用超级采样把放大后的图像像素进行采样,一般选取2个或4个邻近像素,把这些采样混合起来后,生成的最终像素,令每个像素拥有邻近像素的特征,像素与像素之间的过渡色彩,就变得近似,令图形的边缘色彩过渡趋于平滑。再把最终像素还原回原来大小的图像,并保存到帧缓存也就是显存中,替代原图像存储起来,最后输出到显示器,显示出一帧画面。这样就等于把一幅模糊的大图,通过细腻化后再缩小成清晰的小图。如果每帧都进行抗锯齿处理,游戏或视频中的所有画面都带有抗锯齿效果。而将图像映射到缓存并把它放大时,放大的倍数被用于分别抗锯齿的效果,如:图1AA后面的x2x4x8就是原图放大的倍数。 超级采样抗锯齿中使用的采样法一般有两种:

1.顺序栅格超级采样(Ordered Grid Super-Sampling,简称OGSS),采样时选取2个邻近像素。

2.旋转栅格超级采样(Rotated Grid Super-Sampling,简称RGSS),采样时选取4个邻近像素。


多重采样抗锯齿(MultiSampling Anti-Aliasing,简称MSAA)是一种特殊的超级采样抗锯齿(SSAA)。MSAA首先来自于OpenGL。具体是MSAA只对Z缓存(Z-Buffer)和模板缓存(Stencil Buffer)中的数据进行超级采样抗锯齿的处理。可以简单理解为只对多边形的边缘进行抗锯齿处理。这样的话,相比SSAA对画面中所有数据进行处理,MSAA对资源的消耗需求大大减弱,不过在画质上可能稍有不如SSAA


覆盖采样抗锯齿(CoverageSampling Anti-Aliasing,简称CSAA)是nVidiaG80及其衍生产品首次推向实用化的AA技术,也是目前nVidia GeForce 8/9/G200系列独享的AA技术。CSAA就是在MSAA基础上更进一步的节省显存使用量及带宽,简单说CSAA就是将边缘多边形里需要取样的子像素坐标覆盖掉,把原像素坐标强制安置在硬件和驱动程序预先算好的坐标中。这就好比取样标准统一的MSAA,能够最高效率的执行边缘取样,效能提升非常的显著。比方说16xCSAA取样性能下降幅度仅比4xMSAA略高一点,处理效果却几乎和8xMSAA一样。8xCSAA有着4xMSAA的处理效果,性能消耗却和2xMSAA相同。[1]



可编程过滤抗锯齿(Custom Filter Anti-Aliasing)技术起源于AMD-ATIR600家庭。简单地说CFAA就是扩大取样面积的MSAA,比方说之前的MSAA是严格选取物体边缘像素进行缩放的,而CFAA则可以通过驱动和谐灵活地选择对影响锯齿效果较大的像素进行缩放,以较少的性能牺牲换取平滑效果。显卡资源占用也比较小。


快速近似抗锯齿(Fast Approximate Anti-Aliasing) 它是传统MSAA(多重采样抗锯齿)效果的一种高性能近似值。它是一种单程像素着色器,和MLAA一样运行于目标游戏渲染管线的后期处理阶段,但不像后者那样使用DirectCompute,而只是单纯的后期处理着色器,不依赖于任何GPU计算API。正因为如此,FXAA技术对显卡没有特殊要求,完全兼容NVIDIAAMD的不同显卡(MLAA仅支持A)DX9DX10DX11





TXAA 是一种全新的电影风格抗锯齿技术,旨在减少时间性锯齿 (运动中的蠕动和闪烁) 该技术集时间性过滤器、硬件抗锯齿以及定制的 CG 电影式抗锯齿解算法于一身。 要过滤屏幕上任意特定的像素,TXAA 需要使用像素内部和外部的采样以及之前帧中的采样,以便提供超级高画质的过滤。 TXAA 在标准 2xMSAA 4xMSAA 的基础上改进了时间性过滤。 例如,在栅栏或植物上以及在运动画面中,TXAA 已经开始接近、有时甚至超过了其它高端专业抗锯齿算法的画质。TXAA 由于采用更高画质的过滤,因而与传统 MSAA 较低画质的过滤相比,图像更加柔和。



NVIDIA(英伟达)根据MSAA改进出的一种抗锯齿技术。目前只有使用麦克斯韦架构GPU的显卡才可以使用。在 Maxwell 上,英伟达推出了用于光栅化的可编程采样位置,它们被存储在随机存取存储器 (RAM) 中。如此一来便为更灵活、更创新的全新抗锯齿技术创造了机会,这类抗锯齿技术能够独特地解决现代游戏引擎所带来的难题,例如高画质抗锯齿对性能的更高要求。只要在NVIDIA控制面板里为程序开启MFAA并在游戏中选择MSAA就可以开启。画面表现明显强于同级别的MSAA,这种全新抗锯齿技术在提升边缘画质的同时能够将性能代价降至最低。通过在时间和空间两方面交替使用抗锯齿采样格式,4xMFAA 的性能代价仅相当于 2xMSAA,但是抗锯齿效果却与 4xMSAA 相当。[3]


Defered/Forward Rendering

1. rendering path的技术基础

在介绍各种光照渲染方式之前,首先必须介绍一下现代的图形渲染管线。这是下面提到的几种Rendering Path的技术基础。


目前主流的游戏和图形渲染引擎,包括底层的API(如DirectXOpenGL)都开始支持现代的图形渲染管线。现代的渲染管线也称为可编程管线(Programmable Pipeline),简单点说就是将以前固定管线写死的部分(比如顶点的处理,像素颜色的处理等等)变成在GPU上可以进行用户自定义编程的部分,好处就是用户可以自由发挥的空间增大,缺点就是必须用户自己实现很多功能。

下面简单介绍下可编程管线的流程。以OpenGL绘制一个三角形举例。首先用户指定三个顶点传给Vertex Shader。然后用户可以选择是否进行Tessellation Shader(曲面细分可能会用到)和Geometry Shader(可以在GPU上增删几何信息)。紧接着进行光栅化,再将光栅化后的结果传给Fragment Shader进行pixel级别的处理。最后将处理的像素传给FrameBuffer并显示到屏幕上。

2. 几种常用的Rendering Path

Rendering Path其实指的就是渲染场景中光照的方式。由于场景中的光源可能很多,甚至是动态的光源。所以怎么在速度和效果上达到一个最好的结果确实很困难。以当今的显卡发展为契机,人们才衍生出了这么多的Rendering Path来处理各种光照。

2.1 Forward Rendering


Forward Rendering是绝大数引擎都含有的一种渲染方式。要使用Forward Rendering,一般在Vertex Shader或Fragment Shader阶段对每个顶点或每个像素进行光照计算,并且是对每个光源进行计算产生最终结果。下面是Forward Rendering的核心伪代码[1]。

比如在Unity3D 4.x引擎中,对于下图中的圆圈(表示一个Geometry),进行Forward Rendering处理。




也就是说,对于ABCD四个光源我们在Fragment Shader中我们对每个pixel处理光照,对于DEFG光源我们在Vertex Shader中对每个vertex处理光照,而对于GH光源,我们采用球调和(SH)函数进行处理。

Forward Rendering优缺点

很明显,对于Forward Rendering,光源数量对计算复杂度影响巨大,所以比较适合户外这种光源较少的场景(一般只有太阳光)。

但是对于多光源,我们使用Forward Rendering的效率会极其低下。光源数目和复杂度是成线性增长的。


1.多在vertex shader中进行光照处理,因为有一个几何体有10000个顶点,那么对于n个光源,至少要在vertex shader中计算10000n次。而对于在fragment shader中进行处理,这种消耗会更多,因为对于一个普通的1024×768屏幕,将近有8百万的像素要处理。所以如果顶点数小于像素个数的话,尽量在vertex shader中进行光照。

2.如果要在fragment shader中处理光照,我们大可不必对每个光源进行计算时,把所有像素都对该光源进行处理一次。因为每个光源都有其自己的作用区域。比如点光源的作用区域是一个球体,而平行光的作用区域就是整个空间了。对于不在此光照作用区域的像素就不进行处理。但是这样做的话,CPU端的负担将加重,因为要计算作用区域。


2.2 Deferred Rendering


Deferred Rendering(延迟渲染)顾名思义,就是将光照处理这一步骤延迟一段时间再处理。具体做法就是将光照处理这一步放在已经三维物体生成二维图片之后进行处理。也就是说将物空间的光照处理放到了像空间进行处理。要做到这一步,需要一个重要的辅助工具——G-Buffer。G-Buffer主要是用来存储每个像素对应的Position,Normal,Diffuse Color和其他Material parameters。根据这些信息,我们就可以在像空间中对每个像素进行光照处理[3]。下面是Deferred Rendering的核心伪代码。


首先我们用存储各种信息的纹理图。比如下面这张Depth Buffer,主要是用来确定该像素距离视点的远近的。


. Depth Buffer



.Specular Intensity/Power



.Normal Buffer

下图使用了Diffuse Color Buffer。


.Diffuse Color Buffer

这是使用Deferred Rendering最终的结果。


.Deferred Lighting Results

Deferred Rendering的最大的优势就是将光源的数目和场景中物体的数目在复杂度层面上完全分开。也就是说场景中不管是一个三角形还是一百万个三角形,最后的复杂度不会随光源数目变化而产生巨大变化。从上面的伪代码可以看出deferred rendering的复杂度为 。

但是Deferred Rendering局限性也是显而易见。比如我在G-Buffer存储以下数据



对于Deferred Rendering的优化也是一个很有挑战的问题。下面简单介绍几种降低Deferred Rendering存取带宽的方式。最简单也是最容易想到的就是将存取的G-Buffer数据结构最小化,这也就衍生出了light pre-pass方法。另一种方式是将多个光照组成一组,然后一起处理,这种方法衍生了Tile-based deferred Rendering。

2.2.1 Light Pre-Pass

Light Pre-Pass最早是由Wolfgang Engel在他的博客[2]中提到的。具体的做法是

(1)只在G-Buffer中存储Z值和Normal值。对比Deferred Render,少了Diffuse Color Specular Color以及对应位置的材质索引值。

(2)FS阶段利用上面的G-Buffer计算出所必须的light properties,比如Normal*LightDir,LightColor,Specularlight properties。将这些计算出的光照进行alpha-blend并存入LightBuffer(就是用来存储light propertiesbuffer)。

(3)最后将结果送到forward rendering渲染方式计算最后的光照效果。

相对于传统的Deferred Render,使用Light Pre-Pass可以对每个不同的几何体使用不同的shader进行渲染,所以每个物体的material properties将有更多变化。这里我们可以看出相对于传统的Deferred Render,它的第二步(见伪代码)是遍历每个光源,这样就增加了光源设置的灵活性,而Light Pre-Pass第三步使用的其实是forward rendering,所以可以对每个mesh设置其材质,这两者是相辅相成的,有利有弊。另一个Light Pre-Pass的优点是在使用MSAA上很有利。虽然并不是100%使用上了MSAA(除非使用DX10/11的特性),但是由于使用了Z值和Normal值,就可以很容易找到边缘,并进行采样。

下面这两张图,左边是使用传统Deferred Render绘制的,右边是使用Light Pre-Pass绘制的。这两张图在效果上不应该有太大区别。


2.2.2 Tile-Based Deferred Rendering

TBDR主要思想就是将屏幕分成一个个小块tile。然后根据这些Depth求得每个tilebounding box。对每个tilebounding boxlight进行求交,这样就得到了对该tile有作用的light的序列。最后根据得到的序列计算所在tile的光照效果。[4][5]

对比Deferred Render,之前是对每个光源求取其作用区域light volume,然后决定其作用的的pixel,也就是说每个光源要求取一次。而使用TBDR,只要遍历每个pixel,让其所属tile与光线求交,来计算作用其上的light,并利用G-Buffer进行Shading。一方面这样做减少了所需考虑的光源个数,另一方面与传统的Deferred Rendering相比,减少了存取的带宽。

2.3 Forward+

Forward+ == Forward + Light Culling[6]Forward+ 很类似Tiled-based Deferred Rendering。其具体做法就是先对输入的场景进行z-prepass,也就是说关闭写入color,只向z-buffer写入z值。注意此步骤是Forward+必须的,而其他渲染方式是可选的。接下来来的步骤和TBDR很类似,都是划分tiles,并计算bounding box。只不过TBDR是在G-Buffer中完成这一步骤的,而Forward+是根据Z-Buffer。最后一步其实使用的是forward方式,即在FS阶段对每个pixel根据其所在tilelight序列计算光照效果。而TBDR使用的是基于G-Bufferdeferred rendering

实际上,forward+deferred运行的更快。我们可以看出由于Forward+只要写深度缓存就可以,而Deferred Render除了深度缓存,还要写入法向缓存。而在Light Culling步骤,Forward+只需要计算出哪些light对该tile有影响即可。而Deferred Render还在这一部分把光照处理给做了。而这一部分,Forward+是放在Shading阶段做的。所以Shading阶段Forward+耗费更多时间。但是对目前硬件来说,Shading耗费的时间没有那么多。


Forward+的优势还有很多,其实大多就是传统Forward Rendering本身的优势,所以Forward+更像一个集各种Rendering Path优势于一体的Rendering Path


3. 总结

首先我们列出Rendering Equation,然后对比Forward RenderingDeferred RenderingForward+ Rendering[6]

3.1 Rendering Equation

其中点 处有一入射光,其光强为 ,入射角度为 。根据函数 来计算出射角为 处的出射光强度。最后在辅以出射光的相对于视点可见性 。注意此处的 为场景中总共有 个光源。



3.2 Forward Renderng

由于Forward本身对多光源支持力度不高,所以此处对于每个点 的处理不再考虑所有的 个光源,仅仅考虑少量的或者说经过挑选的 个光源。可以看出这样的光照效果并不完美。另外,每个光线的 是计算不了的。


3.3 Deferred Rendering

由于Deferred Rendering使用了light culling,所以不用遍历场景中的所有光源,只需遍历经过light culling后的 个光源即可。并且Deferred Rendering将计算BxDF的部分单独分出来了。


3.4 Forward+ Rendering




[1] Shawn Hargreaves. (2004) “Deferred Shading”. [Online] Available: (April 15,2015)

[2] Wolfgang Engel. (March 16, 2008) “Light Pre-Pass Renderer”. [Online] Available: 14,2015)

[3] Klint J. Deferred Rendering in Leadwerks Engine[J]. Copyright Leadwerks Corporation, 2008.

[4] 龚敏敏.(April 22, 2012) “Forward框架的逆袭:解析Forward+渲染”. [Online] Available: 13,2015)

[5] Lauritzen A. Deferred rendering for current and future rendering pipelines[J]. SIGGRAPH Course: Beyond Programmable Shading, 2010: 1-34.

[6] Harada T, McKee J, Yang J C. Forward+: Bringing deferred lighting to the next level[J]. 2012.