
Software Rasterization – The Visibility Problem

In the second chapter of this lesson, we learned that in the third coordinate of the projected point (the point in screen space) we store the original vertex z-coordinate (the z-coordinate of the point in camera space):



Finding the z-coordinate of a point on the surface of the triangle is useful when a pixel overlaps more than one triangle. And the way we find that z-coordinate is by interpolating the original vertices' z-coordinates using the barycentric coordinates that we learned about in the previous chapter. In other words, we can treat the z-coordinates of the triangle vertices as any other vertex attribute, and interpolate them the same way we interpolated colors in the previous chapter. Before we look into the details of how this z-coordinate is computed, let's first explain why we need to do so.



The Depth-Buffer or Z-Buffer Algorithm and Hidden Surface Removal


When a pixel overlaps a triangle, what we actually see through that pixel is a small area on the surface of that triangle, which for simplification we will reduce to a single point (denoted P in figure 1). Thus to each pixel covering a triangle corresponds a point on the surface of that triangle. Of course, if a pixel covers more than one triangle, we then have several of these points. The problem when this happens is to find which one of these points is actually visible. We have illustrated this concept in 2D in figure 2. We could test triangles from back to front (this technique would require sorting triangles by decreasing depth first) but this doesn't always work when triangles intersect each other (figure 2, bottom). The only reliable solution is to compute the depth of each triangle a pixel overlaps, and then compare these depth values to find out which one is the closest to the camera. If you look at figure 2, you can see that a pixel in the image overlaps two triangles at P1 and P2. However P1's z-coordinate (Z1) is lower than P2's z-coordinate (Z2), thus we can deduce that P1 is in front of P2. Note that this technique is needed because triangles are tested in "random" order. As mentioned before, we could sort triangles in decreasing depth order, but this is not good enough. Generally, they are just tested in the order they are specified in the program, and for this reason, a triangle T1 that is closer to the camera can be tested before a triangle T2 that is further away. If we were not comparing these triangles' depths, then we would end up in this case seeing the triangle which was tested last (T2) when in fact we should be seeing T1. As mentioned many times before, this is called the visibility problem or hidden surface problem. Algorithms for ordering objects so that they are drawn correctly are called visible surface algorithms or hidden surface removal algorithms.
The depth-buffer or z-buffer algorithm that we are going to study next, belongs to this category of algorithms.




One solution to the visibility problem is to use a depth-buffer or z-buffer. A depth-buffer is nothing more than a two-dimensional array of floats that has the same dimensions as the frame-buffer and that is used to store the depth of the objects as the triangles are being rasterized. When this array is created, we initialize each pixel in the array with a very large number. If we find that a pixel overlaps the current triangle, we do as follows:

  • We first compute the z-coordinate or depth of the point on the triangle that the pixel overlaps.
  • We then compare that current triangle depth with the value stored in the depth-buffer for that pixel.
  • If we find that the value stored in the depth-buffer is greater than the depth of the point on the triangle, then the new point is closer to the observer or the camera than the point stored in the depth-buffer at that pixel location. The value stored in the depth-buffer is then replaced with the new depth, and the frame-buffer is updated with the current triangle's color. On the other hand, if the value stored in the depth-buffer is smaller than the current depth sample, then the triangle that the pixel overlaps is hidden by the object whose depth is currently stored in the depth-buffer.
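The steps above can be sketched in C++ as follows. This is a minimal illustration, not the lesson's actual code: the names `depthTest`, `zBuffer`, `frameBuffer` and the `Rgb` type are assumptions, and both buffers are assumed to be flat arrays indexed by `y * imageWidth + x`.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

struct Rgb { uint8_t r, g, b; };

// Depth-buffer test for one pixel (x, y) covered by the current triangle.
// z is the depth of the point on the triangle that the pixel overlaps.
void depthTest(std::vector<float>& zBuffer, std::vector<Rgb>& frameBuffer,
               int x, int y, int imageWidth, float z, const Rgb& triangleColor)
{
    int i = y * imageWidth + x;
    if (z < zBuffer[i]) {                 // closer than anything seen so far
        zBuffer[i] = z;                   // store the new closest depth
        frameBuffer[i] = triangleColor;   // and the triangle's color
    }
    // otherwise the point is hidden: leave both buffers untouched
}
```

The depth-buffer would be initialized with a very large number such as `std::numeric_limits<float>::max()`, so the first triangle tested at a pixel always passes the comparison.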




Finding Z by Interpolation


Hopefully the principle of the depth-buffer is simple and easy to understand. All we need to do now is explain how depth values are computed. First let's repeat one more time what that depth value is. When a pixel overlaps a triangle, it actually overlaps a small area on the surface of the triangle, which as mentioned in the introduction we will reduce to a point for simplification (point P in figure 1). What we want to find here is this point's z-coordinate. As also mentioned earlier in this chapter, if we know the triangle vertices' z-coordinates (which we do, they are stored in the projected points' z-coordinates), all we need to do is interpolate these coordinates using P's barycentric coordinates (figure 4):




Technically this sounds reasonable, though unfortunately it doesn't work. Let's see why. The problem is not the formula itself, which is perfectly fine. The problem is that once the vertices of a triangle are projected onto the canvas (once we have performed the perspective divide), then z, the value we want to interpolate, doesn't vary linearly anymore across the surface of the 2D triangle. This is easier to demonstrate with a 2D example.


The secret lies in figure 4. Imagine that we want to find the "image" of a line defined in 2D space by two vertices V0 and V1. The canvas is represented by the horizontal green line. This line is one unit away (along the z-axis) from the coordinate system origin. If we trace lines from V0 and V1 to the origin, then we intersect the green line in two points (denoted V0' and V1' in the figure). The z-coordinate of these points is 1 since they lie on the canvas, which is 1 unit away from the origin. The x-coordinates of the points can easily be computed using perspective projection. We just need to divide the original vertices' x-coordinates by their z-coordinates. We get:



The goal of the exercise is to find the z-coordinate of P, a point on the line defined by V0 and V1. In this example, all we know about P is the position of its projection P', on the green line. The coordinates of P' are {0,1}. The problem is similar to trying to find the z-coordinate of a point on the triangle that a pixel overlaps. In our example, P' would be the pixel and P would be the point on the triangle that the pixel overlaps. What we need to do now is compute the "barycentric coordinate" of P' with respect to V0' and V1'. Let's call the resulting value λ. Like our triangle barycentric coordinates, λ is also in the range [0,1]. To find λ, we just need to take the distance between V0' and P' (along the x-axis), and divide this number by the distance between V0' and V1'. If linearly interpolating the z-coordinates of the original vertices V0 and V1 using λ to find the depth of P works, then we should get the number 4 (we can easily see by just looking at the illustration that the coordinates of P are {0,4}). Let's first compute λ:



If we now linearly interpolate the z-coordinates of V0 and V1 to find the z-coordinate of P, we get:



Clearly this is not the value we expect! Interpolating the original vertices' z-coordinates, using P's "barycentric coordinates" or λ in this example, to find P's z-coordinate doesn't work. Why? The reason is simple. Perspective projection preserves lines, but does not preserve distances. It's quite easy to see in figure 4 that the ratio of the distance between V0 and P over the distance between V0 and V1 (0.666) is not the same as the ratio of the distance between V0' and P' over the distance between V0' and V1' (0.833). If λ were equal to 0.666 it would work fine, but here is the problem: it's equal to 0.833 instead! So, how do we find the z-coordinate of P?


The solution to the problem is to compute the inverse of P's z-coordinate by interpolating the inverse of the z-coordinates of V0 and V1 using λ. In other words, the solution is:




If we now take the inverse of this result, we get 4 for P's z-coordinate, which is the correct result! As mentioned before, the solution is to linearly interpolate the inverse of the vertices' z-coordinates using barycentric coordinates, and invert the resulting number to find the depth of P (its z-coordinate). In the case of our triangle, the formula is:
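In code, the triangle version of this computation can be sketched as follows (a sketch only; the function and variable names are illustrative):

```cpp
// Depth of the point P that the pixel overlaps, from the z-coordinates of
// the projected vertices and the barycentric coordinates of P.
float pointDepth(float v0z, float v1z, float v2z,
                 float lambda0, float lambda1, float lambda2)
{
    // linearly interpolate the inverse z-coordinates...
    float oneOverZ = lambda0 / v0z + lambda1 / v1z + lambda2 / v2z;
    // ...then invert the result to get the depth of P
    return 1 / oneOverZ;
}
```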



Let's now look into this problem more formally. Why do we need to interpolate the vertices' inverse z-coordinates? The formal explanation is a bit complicated and you can skip it if you want. Let's consider a line in camera space defined by two vertices whose coordinates are denoted (X0,Z0) and (X1,Z1). The projections of these vertices on the screen are denoted S0 and S1 respectively (in our example, we will assume that the distance between the camera origin and the canvas is 1, as shown in figure 5). Let's call S a point on the line defined by S0 and S1. S has a corresponding point P on the 2D line whose coordinates are (X,Z) (we assume in this example that the screen, the vertical line on which the points are projected, is 1 unit away from the coordinate system origin). Finally, the parameters t and q are defined such that:
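The equations of this derivation are missing here; the following is a hedged reconstruction of how the argument goes, using only the definitions above. P moves along the camera-space line with parameter q, S moves along the screen line with parameter t:

```latex
\begin{aligned}
X &= X_0 + q\,(X_1 - X_0), \qquad Z = Z_0 + q\,(Z_1 - Z_0),\\
S &= S_0 + t\,(S_1 - S_0), \qquad S_0 = \frac{X_0}{Z_0}, \quad S_1 = \frac{X_1}{Z_1}.
\end{aligned}
```

Since the canvas is 1 unit away, S = X/Z. Substituting the expressions for X and Z and solving for t in terms of q gives

```latex
t = \frac{q\,Z_1}{Z_0 + q\,(Z_1 - Z_0)},
```

and substituting this back yields

```latex
\frac{1}{Z} = \frac{1}{Z_0} + t\left(\frac{1}{Z_1} - \frac{1}{Z_0}\right).
```

So 1/Z, not Z, is a linear function of the screen-space parameter t, which is exactly why we must interpolate the inverse z-coordinates.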







































Software Rasterization – The Rasterization Stage


Rasterization: What Are We Trying to Solve?


In the previous chapter, we learned how to perform the first step of the rasterization algorithm in a way, which is to project the triangle from 3D space onto the canvas. This definition is not entirely accurate in fact, since what we actually did was to transform the triangle from camera space to screen space, which as mentioned in the previous chapter, is also a three-dimensional space. However the x- and y-coordinates of the vertices in screen space correspond to the position of the triangle vertices on the canvas, and by converting them from screen space to NDC space and then finally from NDC space to raster space, what we actually get in the end are the vertices' 2D coordinates in raster space. Finally, we also know that the z-coordinate of the vertices in screen space holds the original z-coordinate of the vertices in camera space (inverted so that we deal with positive numbers rather than negative ones).


What we need to do next is loop over the pixels in the image and find out if any of these pixels overlap the "projected image of the triangle" (figure 1). In graphics APIs specifications, this test is sometimes called the inside-outside test or the coverage test. If they do, we then set the pixel in the image to the triangle's color. The idea is simple, but of course, we now need to come up with a method to find if a given pixel overlaps a triangle. This is essentially what we will study in this chapter. We will learn about the method that is typically used in rasterization to solve this problem. It uses a technique known as the edge function, which we are now going to describe and study. This edge function is also going to provide valuable information about the position of the pixel within the projected image of the triangle, known as barycentric coordinates. Barycentric coordinates play an essential role in computing the actual depth (or the z-coordinate) of the point on the surface of the triangle that the pixel overlaps. We will also explain what barycentric coordinates are in this chapter and how they are computed.

At the end of this chapter, you will be able to actually produce a very basic rasterizer. In the next chapter, we will actually look into the possible issues with this very naive implementation of the rasterization algorithm. We will list what these issues are as well as study how they are typically addressed.


A lot of research has been done to optimize the algorithm. The goal of this lesson is not to teach you how to write or develop an optimized and efficient renderer based on the rasterization algorithm. The goal of this lesson is to teach the basic principles of the rendering technique. Don't think, though, that the techniques we present in these chapters are not actually used. They are used to some extent, but how they are actually implemented either on the GPU or in a CPU version of a production renderer is just likely to be a highly optimized version of the same idea. What is truly important is to understand the principle and how it works in general. From there, you can study on your own the different techniques which are used to speed up the algorithm. But the techniques presented in this lesson are generic and make up the foundations of any rasterizer.



Keep in mind that drawing a triangle (since the triangle is the primitive we will use in this case) is a two-step problem:

  • We first need to find which pixels overlap the triangle.
  • We then need to define which color the pixels overlapping the triangle should be set to, a process that is called shading.



The Edge Function


As mentioned above, there are several possible methods to find if a pixel overlaps a triangle. It would be good to document older techniques, but in this lesson, we will only present the method that is generally used today. This method was presented by Juan Pineda in 1988 in a paper called "A Parallel Algorithm for Polygon Rasterization" (see references in the last chapter).


Before we look into Pineda's technique itself, we will first describe the principle of his method. Let's say that the edge of a triangle can be seen as a line splitting the 2D plane (the plane of the image) in two (as shown in figure 2). The principle of Pineda's method is to find a function, which he called the edge function, so that when we test on which side of this line a given point is (the point P in figure 2), the function returns a negative number when it is to the left of the line, a positive number when it is to the right of this line, and zero when the point is exactly on the line.



In figure 2, we applied this method to the first edge of the triangle (defined by the vertices v0-v1. Be careful, the order is important). If we now apply the same method to the two other edges (v1-v2 and v2-v0), we can clearly see that there is an area (the white triangle) within which all points are positive (figure 3). If we take a point within this area, then we will find that this point is to the right of all three edges of the triangle. If P is in fact a point in the centre of a pixel, we can then use this method to find if the pixel overlaps the triangle. If for this point, we find that the edge function returns a positive number for all three edges, then the pixel is contained in the triangle (or may lie on one of its edges). The function Pineda uses also happens to be linear, which means that it can be computed incrementally, but we will come back to this point later.



Now that we understand the principle, let's find out what that function is. The edge function is defined as (for the edge defined by vertices V0 and V1):



As the paper mentions, this function has the useful property that its value is related to the position of the point (x,y) relative to the edge defined by the points V0 and V1:

  • E(P) > 0 if P is to the “right” side
  • E(P) = 0 if P is exactly on the line
  • E(P) < 0 if P is to the "left" side
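In code, the edge function can be sketched like this (`Vec2` is a minimal stand-in for whatever 2D point type the renderer uses):

```cpp
struct Vec2 { float x, y; };

// Edge function for the edge v0 -> v1 and the point p. Its sign tells on
// which side of the edge p lies; it is the 2D cross product (v1-v0) x (p-v0).
float edgeFunction(const Vec2& v0, const Vec2& v1, const Vec2& p)
{
    return (p.x - v0.x) * (v1.y - v0.y) - (p.y - v0.y) * (v1.x - v0.x);
}
```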


In fact this function is mathematically equivalent to the magnitude of the cross product between the vectors (v1-v0) and (P-v0). We can also write these vectors in a matrix form (presenting this as a matrix has no other interest than just presenting the two vectors in a neat way):



If we write A = (P-V0) and B = (V1-V0), then we can also write the vectors A and B as a 2×2 matrix:



The determinant of this matrix can be computed as:




Understanding what's actually happening is easier when we look at the result of a cross product between two 3D vectors (figure 4).



In 3D, the cross-product returns another 3D vector which is perpendicular (or orthogonal) to the two original vectors. But as you can see in figure 4, the magnitude of that vector also changes with the orientation of the two vectors with respect to each other. In figure 4, we assume a right-hand coordinate system. When the two vectors A (red) and B (blue) are either pointing exactly in the same direction or in opposite directions, the magnitude of the third vector C (in green) is zero. Vector A has coordinates (1,0,0) and is fixed. When vector B has coordinates (0,0,-1), then the green vector, vector C, has coordinates (0,-1,0). If we were to find its "signed" magnitude, we would find that it is equal to -1. On the other hand, when vector B has coordinates (0,0,1), then C has coordinates (0,1,0) and its signed magnitude is equal to 1. In one case the "signed" magnitude is negative, and in the second case, the signed magnitude is positive. In fact, in 3D, the magnitude of the cross product of two vectors can be interpreted as the area of the parallelogram having A and B as sides, as shown in figure 5 (read the Wikipedia article on the cross product to get more details on this interpretation):



Obviously an area should always be positive, though the sign of the above equation provides an indication of the orientation of the vectors A and B with respect to each other. When, with respect to A, B is within the half plane defined by vector A and a vector orthogonal to A (let's call this vector D; note that A and D form a 2D Cartesian coordinate system), then the result of the equation is positive. When B is within the opposite half plane, the result of the equation is negative (figure 6). Another way of explaining this result is that the result is positive when the angle θ is in the range ]0,π[ and negative when θ is in the range ]π,2π[. Note that when θ is exactly equal to 0 or π, the cross-product or the edge function returns 0.



To find if a point is inside a triangle, all we care about really is the sign of the function we used to compute the area of the parallelogram. However, the area itself also plays an important role in the rasterization algorithm; it is used to compute the barycentric coordinates of the point in the triangle, a technique we will study next. The cross product in 3D and 2D has the same geometric interpretation, thus the cross-product between two 2D vectors also returns the "signed" area of the parallelogram defined by the two vectors. The only difference is that in 3D, to compute the area of the parallelogram you need to use this equation:


while in 2D, this area is given by the cross-product itself (which as mentioned before can also be interpreted as the determinant of a 2×2 matrix):



From a practical point of view, all we need to do now, is test the sign of the edge function computed for each edge of the triangle and another vector defined by a point and the first vertex of the edge (figure 7).



If all three tests are positive or equal to 0, then the point is inside the triangle (or lies on one of the edges of the triangle). If any one of the tests is negative, then the point is outside the triangle. In code we get:
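The code listing itself is missing here; a minimal reconstruction of the test just described might look like this (the names `pixelCoversTriangle` and `Vec2` are assumptions, not the lesson's actual identifiers):

```cpp
struct Vec2 { float x, y; };

float edgeFunction(const Vec2& a, const Vec2& b, const Vec2& p)
{
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

// The point p covers the triangle if the edge function is positive (or
// zero, for points lying exactly on an edge) for all three edges.
bool pixelCoversTriangle(const Vec2& v0, const Vec2& v1, const Vec2& v2,
                         const Vec2& p)
{
    bool inside = true;
    inside &= edgeFunction(v0, v1, p) >= 0;
    inside &= edgeFunction(v1, v2, p) >= 0;
    inside &= edgeFunction(v2, v0, p) >= 0;
    return inside;
}
```

Note that with this sign convention the result depends on the winding of v0, v1, v2, which is discussed further down in this chapter.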




Alternative to the Edge Function


There are obviously other ways than the edge function method to find if pixels overlap triangles, however as mentioned in the introduction of this chapter, we won't study them in this lesson. Just for reference though, the other common technique is called scanline rasterization. It is based on the Bresenham algorithm that is generally used to draw lines. GPUs use the edge method mostly because it is more generic than the scanline approach, which is also more difficult to run in parallel than the edge method, but we won't provide more information on this topic in this lesson.



Be Careful! Winding Order Matters


One of the things we haven't talked about yet, but which has a great importance in CG, is the order in which you declare the vertices making up the triangles. There are two possible conventions, which you can see illustrated in figure 8: clockwise or counter-clockwise ordering or winding. Winding is important because it essentially defines one important property of the triangle, which is the orientation of its normal. Remember that the normal of the triangle can be computed from the cross product of the two vectors A=(V2-V0) and B=(V1-V0). Let's say that V0={0,0,0}, V1={1,0,0} and V2={0,-1,0}, then (V1-V0)={1,0,0} and (V2-V0)={0,-1,0}. Let's now compute the cross product of these two vectors:



As expected, the two normals are pointing in opposite directions. The orientation of the normal has a great importance for lots of different reasons, but one of the most important ones is called face culling. Most rasterizers, and even ray-tracers for that matter, may not render triangles whose normal is facing away from the camera. This is called back-face culling. Most rendering APIs such as OpenGL or DirectX give the option to turn back-face culling off, however you should still be aware that vertex ordering plays a role in what's actually rendered, among many other things. And not surprisingly, the edge function is one of these other things. Before we get to explaining why it matters in our particular case, let's say that there is no particular rule when it comes to choosing the order. In reality, so many details in a renderer implementation may change the orientation of the normal that you can't assume that by declaring vertices in a certain order, you will get the guarantee that the normal will be oriented a certain way. For instance, rather than using the vectors (V1-V0) and (V2-V0) in the cross-product, you could as well have used (V0-V1) and (V2-V1) instead. It would have produced the same normal but flipped. Even if you use the vectors (V1-V0) and (V2-V0), remember that the order of the vectors in the cross-product changes the sign of the normal: A×B = -B×A. So the direction of your normal also depends on the order of the vectors in the cross-product. For all these reasons, don't try to assume that declaring vertices in one order rather than the other will give you one result or the other. What's important though, is that you stick to the convention you have chosen. Generally, graphics APIs such as OpenGL and DirectX expect triangles to be declared in counter-clockwise order. We will also use counter-clockwise winding. Now let's see how ordering impacts the edge function.





Barycentric Coordinates


Computing barycentric coordinates is not necessary to get the rasterization algorithm working. For a really naive implementation of the rendering technique, all you need is to project the vertices and use a technique such as the edge function that we described above, to find if pixels are inside triangles. These are the only two necessary steps to produce an image. However the result of the edge function, which as we explained above can be interpreted as the area of the parallelogram defined by vectors A and B, can actually directly be used to compute these barycentric coordinates. Thus, it makes sense to study the edge function and the barycentric coordinates at the same time.


Before we get any further though, let's explain what these barycentric coordinates are. First, they come in a set of three floating point numbers which in this lesson, we will denote λ0, λ1 and λ2. Many different conventions exist, but Wikipedia uses the greek letter lambda (λ) as well, which is also used by other authors (the greek letter omega (ω) is also sometimes used). This doesn't matter, you can call them the way you want. In short, the coordinates can be used to define any point on the triangle in the following manner:


Where as usual, V0, V1 and V2 are the vertices of a triangle. These coordinates can take on any value, but for points which are inside the triangle (or lying on one of its edges) they can only be in the range [0,1] and the sum of the three coordinates is equal to 1. In other words:


This is a form of interpolation if you want. They are also sometimes defined as weights for the triangle's vertices (which is why in the code we will denote them with the letter w). A point overlapping the triangle can be defined as "a little bit of V0 plus a little bit of V1 plus a little bit of V2". Note that when any of the coordinates is 1 (which means that the others in this case are necessarily 0), then the point P is equal to one of the triangle's vertices. For instance if λ2=1 then P is equal to V2. Interpolating the triangle's vertices to find the position of a point inside the triangle is not that useful. But the method can also be used to interpolate across the surface of the triangle any quantity or variable that has been defined at the triangle's vertices. Imagine for instance that you have defined a color at each vertex of the triangle. Say V0 is red, V1 is green and V2 is blue (figure 12). What you want to do is find how these three colors are interpolated across the surface of the triangle. If you know the barycentric coordinates of a point P on the triangle, then its color CP (which is a combination of the triangle vertices' colors) is defined as:



This is a very handy technique which is going to be useful to shade triangles. Data associated with the vertices of triangles are called vertex attributes. This is a very common and very important technique in CG. The most common vertex attributes are colors, normals and texture coordinates. What this means in practice is that generally, when you define a triangle, you don't only pass on to the renderer the triangle vertices but also its associated vertex attributes. For example if you want to shade the triangle you may need a color and a normal vertex attribute, which means that each triangle will be defined by 3 points (the triangle vertex positions), 3 colors (the color of the triangle vertices) and 3 normals (the normal of the triangle vertices). Normals too can be interpolated across the surface of the triangle. Interpolated normals are used in a technique called smooth shading, which was first introduced by Henri Gouraud. We will explain this technique later when we get to shading.




Let's see how it looks in code. We were already computing the edge functions before to test if points were inside triangles. Only, in our previous implementation we were just returning true or false depending on whether the result of the function was either positive or negative. To compute the barycentric coordinates, we need the actual result of the edge function. We can also use the edge function to compute the area (multiplied by 2) of the triangle. Here is a version of an implementation that tests if a point P is inside a triangle and if so, computes its barycentric coordinates:
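Since the listing itself is missing here, the following sketch shows the idea. The names w0, w1, w2 follow the "weights" convention mentioned above; the function name and the Vec2 type are assumptions:

```cpp
struct Vec2 { float x, y; };

float edgeFunction(const Vec2& a, const Vec2& b, const Vec2& p)
{
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

// Returns true if p is inside (or on an edge of) the triangle v0-v1-v2,
// and if so fills w0, w1, w2 with the barycentric coordinates of p.
bool insideWithBarycentric(const Vec2& v0, const Vec2& v1, const Vec2& v2,
                           const Vec2& p, float& w0, float& w1, float& w2)
{
    float area = edgeFunction(v0, v1, v2);  // twice the triangle's area
    w0 = edgeFunction(v1, v2, p);           // weight tied to the edge opposite v0
    w1 = edgeFunction(v2, v0, p);
    w2 = edgeFunction(v0, v1, p);
    if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
        w0 /= area; w1 /= area; w2 /= area; // normalize so w0 + w1 + w2 = 1
        return true;
    }
    return false;
}
```

A color defined at each vertex can then be interpolated as `c = w0 * c0 + w1 * c1 + w2 * c2`, as in the formula above.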




Interpolate vs. Extrapolate


One thing worth noticing is that the computation of barycentric coordinates works independently of the point's position with respect to the triangle. In other words, the coordinates are valid whether the point is inside or outside the triangle. When the point is inside, using the barycentric coordinates to evaluate the value of a vertex attribute is called interpolation, and when the point is outside, we speak of extrapolation. This is an important detail because in some cases, we will have to evaluate the value of a given vertex attribute for points that potentially don't overlap triangles. To be more specific, this will be needed to compute the derivatives of the triangle texture coordinates for example. These derivatives are used to filter textures properly. If you are interested in learning more about this particular topic we invite you to read the lesson on Texture Mapping. In the meantime, all you need to remember is that barycentric coordinates are valid even when the point doesn't cover the triangle. You also need to know about the difference between vertex attribute extrapolation and interpolation.



Rasterization Rules


In some special cases, a pixel may overlap more than one triangle. This happens when a pixel lies exactly on an edge shared by two triangles as shown in figure 17. Such a pixel would pass the coverage test for both triangles. If they are semi-transparent, a dark edge may appear where the pixels overlap the two triangles as a result of the way semi-transparent objects are combined with each other (imagine two super-imposed semi-transparent sheets of plastic. The surface is more opaque and looks darker than the individual sheets). You would get something similar to what you can see in figure 18, which is a darker line where the two triangles share an edge.




The solution to this problem is to come up with some sort of rule that guarantees that a pixel is never counted twice when it lies on an edge shared by two triangles. How do we do that? Most graphics APIs such as OpenGL and DirectX define something which they call the top-left rule. We already know the coverage test returns true if a point is either inside the triangle or if it lies on any of the triangle's edges. What the top-left rule says though, is that the pixel or point is considered to overlap a triangle if it is either inside the triangle or lies on either a triangle's top edge or any edge that is considered to be a left edge. What is a top and a left edge? If you look at figure 19, you can easily see what we mean by top and left edge.


  • A top edge is an edge that is perfectly horizontal and whose defining vertices are above the third one. Technically this means that the y-coordinate of the vector V[(X+1)%3]-V[X] is equal to 0 and that its x-coordinate is positive (greater than 0).
  • A left edge is essentially an edge that is going up. Keep in mind that in our case, vertices are defined in clockwise order. An edge is considered to go up if its respective vector V[(X+1)%3]-V[X] (where X can either be 0, 1 or 2) has a positive y-coordinate.
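A sketch of how these two definitions might be implemented (`isTopLeftEdge` and the `Vec2` type are illustrative names, not from any API; the vector tested is b - a for the edge going from a to b):

```cpp
struct Vec2 { float x, y; };

// Following the definitions above (vertices in clockwise order):
// a top edge is perfectly horizontal and points in the positive x direction;
// a left edge is an edge going "up", i.e. with a positive y-coordinate.
bool isTopLeftEdge(const Vec2& a, const Vec2& b)
{
    float ex = b.x - a.x, ey = b.y - a.y;
    bool top  = (ey == 0) && (ex > 0);
    bool left = (ey > 0);
    return top || left;
}
```

A pixel lying exactly on an edge (edge function equal to 0) would then only be counted when `isTopLeftEdge` returns true for that edge, which guarantees a shared edge is drawn only once.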






Putting Things Together: Finding if a Pixel Overlaps a Triangle


Let’s test the different techniques we learned about in this chapter in a program that produces an actual image. We will just assume that we have projected the triangle already (check the last chapter of this lesson for a complete implementation of the rasterization algorithm). We will also assign a color to each vertex of the triangle. Here is how the image is formed. We loop over all the pixels in the image and test if they overlap the triangle using the edge function method. All three edges of the triangle are tested against the current position of the pixel, and if the edge function returns a positive number for all the edges, then the pixel overlaps the triangle. We can then compute the pixel’s barycentric coordinates and use these coordinates to shade the pixel by interpolating the color defined at each vertex of the triangle. The content of the frame-buffer is saved to a PPM file (which you can read with Photoshop). The output of the program is shown in figure 20.【At this point rasterization is complete: we can fill the triangle with interpolated colors, resolve pixels covered by the triangle, and output the result shown in the figure.】
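The core of such a program can be sketched as follows (a minimal sketch; the names and layout are ours, not the original source, and the PPM output is omitted):

```cpp
// Loop over all pixels, evaluate the edge function against the three edges,
// and if all results are positive, shade the pixel by barycentric
// interpolation of the vertex colors.
struct Vec3 { float x, y, z; };

// Edge function: positive if point p lies on the "inside" of edge (a -> b)
// for a clockwise-defined triangle in raster space.
float edgeFunction(const Vec3 &a, const Vec3 &b, const Vec3 &p)
{ return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x); }

void rasterize(const Vec3 &v0, const Vec3 &v1, const Vec3 &v2,
               const Vec3 &c0, const Vec3 &c1, const Vec3 &c2,
               int w, int h, Vec3 *framebuffer)
{
    float area = edgeFunction(v0, v1, v2);  // twice the triangle area
    for (int j = 0; j < h; ++j) {
        for (int i = 0; i < w; ++i) {
            Vec3 p = {i + 0.5f, j + 0.5f, 0};  // pixel center
            float w0 = edgeFunction(v1, v2, p);
            float w1 = edgeFunction(v2, v0, p);
            float w2 = edgeFunction(v0, v1, p);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0) {  // pixel overlaps triangle
                w0 /= area; w1 /= area; w2 /= area;  // barycentric coords
                framebuffer[j * w + i] = {
                    w0 * c0.x + w1 * c1.x + w2 * c2.x,
                    w0 * c0.y + w1 * c1.y + w2 * c2.y,
                    w0 * c0.z + w1 * c1.z + w2 * c2.z };
            }
        }
    }
}
```

Writing the resulting framebuffer out as a PPM file then produces an image like figure 20.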





Conclusion and What’s Next?


There are many interesting techniques and pieces of trivia related to the topic of barycentric coordinates, but this lesson is just an introduction to the rasterization algorithm, so we won’t go any further. One piece of trivia that is interesting to know, though, is that barycentric coordinates are constant along lines parallel to an edge (as shown in figure 21).【There are many related techniques; only the basics are covered here — explore the rest on your own.】



In this lesson, we learned two important methods and various concepts.

  • First, we learned about the edge function and how it can be used to find if a point P overlaps a triangle. The edge function is computed for each edge of the triangle, using a second vector defined by the edge's first vertex and the point P. If the function is positive for all three edges, then point P overlaps the triangle.【The edge function decides whether a pixel lies inside the triangle.】
  • Furthermore, we also learned that the results of the edge function can be used to compute the barycentric coordinates of point P. These coordinates can be used to interpolate vertex data or vertex attributes across the surface of the triangle. They can be interpreted as weights for the various vertices. The most common vertex attributes are color, normal and texture coordinates.【The edge function also yields the barycentric coordinates, used to interpolate colors, normals and texture coordinates.】























Software Rasterization – The Projection Stage

Quick Review


In the previous chapter, we gave a high-level overview of the rasterization rendering technique. It can be decomposed into two main stages: first, the projection of the triangle’s vertices onto the canvas, then the rasterization of the triangle itself. Rasterization really means, in this case, “breaking apart” the triangle’s shape into pixels or raster-element squares; this is what pixels used to be called in the past. In this chapter, we will review the first step. We have already described this method in the two previous lessons, thus we won’t explain it here again. If you have any doubt about the principles behind perspective projection, check these lessons again. However, in this chapter, we will study a couple of new tricks related to projection that are going to be useful when we get to the lesson on the perspective projection matrix. We will learn about a new method to remap the coordinates of the projected vertices from screen space to NDC space. We will also learn more about the role of the z-coordinate in the rasterization algorithm and how it should be handled at the projection stage.【The previous chapter gave the overall picture and the main steps; this one focuses on how the first step is done.】


Keep in mind, as already mentioned in the previous chapter, that the goal of the rasterization rendering technique is to solve the visibility or hidden surface problem, which is to determine which parts of a 3D object are visible and which parts are hidden.【Always remember: rasterization exists to decide visibility.】


Projection: What Are We Trying to Solve?


What are we trying to solve at this stage of the rasterization algorithm? As explained in the previous chapter, the principle of rasterization is to find if pixels in the image overlap triangles. To do so, we first need to project triangles onto the canvas, and then convert their coordinates from screen space to raster space. Pixels and triangles are then defined in the same space, which means that it becomes possible to compare their respective coordinates (we can check the coordinates of a given pixel against the raster-space coordinates of a triangle’s vertices).

The goal of this stage is thus to convert the vertices making up triangles from camera space to raster space.



Projecting Vertices: Mind the Z-Coordinate!


In the previous two lessons, we mentioned that when we compute the raster coordinates of a 3D point, what we really need in the end are its x- and y-coordinates (the position of the 3D point in the image). As a quick reminder, recall that these 2D coordinates are obtained by dividing the x- and y-coordinates of the 3D point in camera space by the point's z-coordinate (what we called the perspective divide), and then remapping the resulting 2D coordinates from screen space to NDC space, and then from NDC space to raster space. Keep in mind that because the image plane is positioned at the near clipping plane, we also need to multiply the x- and y-coordinates by the near clipping plane. Again, we explained this process in great detail in the previous two lessons.


Note that so far, we have been considering points in screen space as essentially 2D points (we didn’t need to use the points’ z-coordinate after the perspective divide). From now on though, we will declare points in screen space as 3D points and set their z-coordinate to the camera-space point's z-coordinate as follows:




It is best at this point to set the projected point's z-coordinate to the opposite (negated value) of the original point's z-coordinate, which, as you know by now, is negative. Dealing with positive z-coordinates will make everything simpler later on (but this is not mandatory).【Note that the stored z value is negated.】
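The projection step with the stored depth can be sketched as follows (a minimal sketch; the function and struct names are ours, and the near-clipping-plane handling is reduced to the single multiply mentioned above):

```cpp
// Project a camera-space point onto the image plane (at distance `near`)
// and keep the negated, hence positive, z-coordinate for visibility tests.
struct Vec3 { float x, y, z; };

Vec3 projectToScreen(const Vec3 &pCamera, float near)
{
    Vec3 pScreen;
    pScreen.x = near * pCamera.x / -pCamera.z;  // perspective divide
    pScreen.y = near * pCamera.y / -pCamera.z;
    pScreen.z = -pCamera.z;  // store positive depth for later visibility tests
    return pScreen;
}
```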


Keeping track of the vertex z-coordinate in camera space is needed to solve the visibility problem. Understanding why is easier if you look at figure 1. Imagine two vertices v1 and v2 which, when projected onto the canvas, have the same raster coordinates (as shown in figure 1). If we project v1 before v2, then v2 will be visible in the image when it should actually be v1 (v1 is clearly in front of v2). However, if we store the z-coordinates of the vertices along with their 2D raster coordinates, we can use these coordinates to decide which point is closest to the camera, independently of the order in which the vertices are projected (as shown in the code fragment below).【Storing the z value is precisely what makes the visibility test possible: compare depths to resolve occlusion, as in the figure.】
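A plausible sketch of such a fragment (names are ours): when two projected vertices land on the same raster pixel, keep the one with the smaller, i.e. closer, stored z-coordinate.

```cpp
// Projected vertex carrying its raster x, y and the stored depth z.
struct Vertex { float x, y, z; };

// Of two vertices projecting to the same pixel, the one with the smaller
// z-coordinate is in front, regardless of projection order.
const Vertex &closestVertex(const Vertex &v1, const Vertex &v2)
{ return (v1.z < v2.z) ? v1 : v2; }
```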




What we want to render though are triangles, not vertices. So the question is, how does the method we just learned about apply to triangles? In short, we will use the triangle's vertex coordinates to find the position of the point on the triangle that the pixel overlaps (and thus its z-coordinate). This idea is illustrated in figure 2. If a pixel overlaps two or more triangles, we should be able to compute the position of the points on the triangles that the pixel overlaps, and use the z-coordinates of these points, as we did with the vertices, to know which triangle is the closest to the camera. This method will be described in detail in chapter 4 (The Depth Buffer: Finding the Depth Value of a Sample by Interpolation).【The z comparison uses the z-buffer approach described earlier; see chapter 4 for the details.】


Screen Space is Also Three-Dimensional


To summarize, to go from camera space to screen space (which is the process during which the perspective divide happens), we need to:【to summarize】

  • Perform the perspective divide: that is, dividing the point's x- and y-coordinates in camera space by its z-coordinate.【x and y come from the coordinate transform】
  • But also set the projected point's z-coordinate to the original point's z-coordinate (the point in camera space).【z is carried over from the original z】


Practically, this means that our projected point is not a 2D point anymore, but in fact a 3D point. Or to say it differently, screen space is not two- but three-dimensional. In his thesis, Ed Catmull writes:【From the above, screen space is itself a 3D space.】


Screen-space is also three-dimensional, but the objects have undergone a perspective distortion so that an orthogonal projection of the object onto the x-y plane would result in the expected perspective image (Ed Catmull's thesis, 1974).


You should now be able to understand this quote. The process is also illustrated in figure 3. First the geometry vertices are defined in camera space (top image). Then, each vertex undergoes a perspective divide. That is, the vertex x- and y-coordinates are divided by its z-coordinate, but as mentioned before, we also set the resulting projected point's z-coordinate to the opposite of the original vertex z-coordinate. This, by the way, implies a change of direction of the z-axis of the screen-space coordinate system. As you can see, the z-axis is now pointing inward rather than outward (middle image in figure 3). But the most important thing to notice is that the resulting object is a deformed version of the original object, but nonetheless a three-dimensional object. Furthermore, what Ed Catmull means when he writes “an orthogonal projection of the object onto the x-y plane, would result in the expected perspective image”, is that once the object is in screen space, if we trace lines perpendicular to the x-y image plane from the object to the canvas, we get a perspective representation of that object (as shown in figure 4). This is an interesting observation, because it means that the image creation process can be seen as a perspective projection followed by an orthographic projection. Don’t worry if you don’t clearly understand the difference between perspective and orthographic projection; it is the topic of the next lesson. However, try to remember this observation, as it will come in handy later.【Figures 3 and 4 illustrate the projection.】



Remapping Screen Space Coordinates to NDC Space


In the previous two lessons, we explained that once in screen space, the x- and y-coordinates of the projected points need to be remapped to NDC space. In the previous lessons, we also explained that in NDC space, points on the canvas have their x- and y-coordinates contained in the range [0,1]. In the GPU world though, coordinates in NDC space are contained in the range [-1,1]. Sadly, this is one of those conventions again that we need to deal with. We could have kept the convention [0,1], but because GPUs are the reference when it comes to rasterization, it is best to stick to the way the term is defined in the GPU world.【Screen space is first remapped to NDC space; the range can be [0,1], but the GPU convention is [-1,1].】


Thus once the points have been converted from camera space to screen space, the next step is to remap them from the range [l,r] and [b,t], for the x- and y-coordinates respectively, to the range [-1,1]. The terms l, r, b, t here denote the left, right, bottom and top coordinates of the canvas. By re-arranging the terms, we can easily find an equation that performs the remapping we want:【Once points are in screen space, remap x and y from [l,r] and [b,t] to [-1,1], derived as follows.】
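The derivation, reconstructed step by step from the canvas bounds (starting from a point whose x-coordinate lies in the range [l, r]):

```latex
l \leq x \leq r \\[4pt]
0 \leq x - l \leq r - l \\[4pt]
0 \leq \frac{x - l}{r - l} \leq 1 \\[4pt]
0 \leq \frac{2(x - l)}{r - l} \leq 2 \\[4pt]
-1 \leq \frac{2(x - l)}{r - l} - 1 \leq 1 \\[4pt]
-1 \leq \frac{2x}{r - l} - \frac{r + l}{r - l} \leq 1
```

The two terms of the final line, \(2x/(r-l)\) and \((r+l)/(r-l)\), are the ones the next paragraph refers to as the red and green terms.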








This is a very important equation, because the red and green terms of the equation in the middle of the formula will become coefficients of the perspective projection matrix. We will study this matrix in the next lesson. But for now, we will just apply this equation to remap the x-coordinate of a point in screen space to NDC space (any point that lies on the canvas has its coordinates contained in the range [-1,1] when defined in NDC space). If we apply the same reasoning to the y-coordinate, we get:【The middle terms become coefficients of the perspective projection matrix; the same method applied to the y-axis gives the analogous equation.】
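Applying the same rearrangement to the y-coordinate over the range [b, t] gives:

```latex
-1 \leq \frac{2y}{t - b} - \frac{t + b}{t - b} \leq 1
```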




Putting Things Together


At the end of this lesson, we can now perform the first stage of the rasterization algorithm, which can be decomposed into two steps:【the two steps covered in this chapter are as follows】


  • Convert a point from camera space to screen space. This essentially projects the point onto the canvas, but keep in mind that we also need to store the original point's z-coordinate. The point in screen space is three-dimensional, and its z-coordinate will be useful to solve the visibility problem later.【camera space to screen space】

  • We then convert the x- and y-coordinates of these points in screen space to NDC space using the remapping equations derived above.【screen space to NDC space】


From there, it is extremely simple to convert the coordinates to raster space. We just need to remap the x- and y-coordinates in NDC space to the range [0,1] and multiply the resulting number by the image width and height respectively (don’t forget that in raster space the y-axis goes down while in NDC space it goes up. Thus we need to change y’s direction during this remapping process). In code we get:
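The conversion just described can be sketched as follows (a minimal sketch; the names are ours, not the original source):

```cpp
// Remap a point from NDC space ([-1,1] on both axes) to raster space:
// shift to [0,1], flip y, then scale by the image dimensions.
struct Vec3 { float x, y, z; };

Vec3 ndcToRaster(const Vec3 &pNDC, int imageWidth, int imageHeight)
{
    Vec3 pRaster;
    pRaster.x = (pNDC.x + 1) / 2 * imageWidth;
    pRaster.y = (1 - pNDC.y) / 2 * imageHeight;  // y goes down in raster space
    pRaster.z = pNDC.z;  // depth is carried through unchanged
    return pRaster;
}
```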






























Software Rasterization – An Overview of the Rasterization Algorithm






Rasterization: a Practical Implementation




An Overview of the Rasterization Algorithm


The rasterization rendering technique is surely the most commonly used technique to render images of 3D scenes, and yet it is probably the least understood and the least properly documented technique of all (especially compared to ray-tracing).【Rasterization is the foundational technique of 3D rendering; everyone should understand it.】


Why this is so depends on different factors. First, it’s a technique from the past. We don’t mean to say the technique is obsolete, quite the contrary, but most of the techniques that are used to produce an image with this algorithm were developed somewhere between the 1960s and the early 1980s. In the world of computer graphics, this is the Middle Ages, and the knowledge about the papers in which these techniques were developed tends to be lost. Rasterization is also the technique used by GPUs to produce 3D graphics. Hardware technology has changed a lot since GPUs were first invented, but the fundamental techniques they implement to produce images haven’t changed much since the early 1980s (the hardware changed, but the underlying pipeline by which an image is formed hasn’t). In fact these techniques are so fundamental, and consequently so deeply integrated within the hardware architecture, that no one pays attention to them anymore (only people designing GPUs can tell what they really do, and this is far from being a trivial task; but designing a GPU and understanding the principle of the rasterization algorithm are two different things; thus explaining the latter should actually not be that hard!).【The reason: a technique with history is not an obsolete one. Rasterization is still how GPUs render images; because it is implemented directly in hardware, few people pay attention to it, yet it remains the most fundamental rendering technique.】


Regardless, we thought it was urgent and important to correct this situation. We believe this lesson to be the first resource that provides a clear and complete picture of the algorithm, as well as a full practical implementation of the technique. If you found in this lesson the answers you have been desperately looking for anywhere else, please consider making a donation! This work is provided to you for free and requires many hours of hard work.【This aims to be the most complete explanation of rasterization available.】




Rasterization and ray tracing both try to solve the visibility or hidden surface problem, but in a different order (the visibility problem was introduced in the lesson Rendering an Image of a 3D Scene, an Overview). Both algorithms have in common that they essentially use techniques from geometry to solve that problem. In this lesson, we will describe briefly how the rasterization (you can write rasterisation if you prefer UK English to US English) algorithm works. Understanding the principle is quite simple, but implementing it requires using a series of techniques, notably from the field of geometry, which you will also find explained in this lesson.【Both rasterization and ray tracing start from the geometry to solve surface visibility, but in a different order; later chapters give the details, this one explains the algorithm.】


The program we will develop in this lesson to demonstrate how rasterization works in practice is important, because we will use it again in the next lessons to implement the ray-tracing algorithm as well. Having both algorithms implemented in the same program will allow us to more easily compare the output produced by the two rendering techniques (they should both produce the same result, at least before shading is applied) and their performance. It will be a great way of better understanding the pros and cons of both algorithms.【The focus here is on how rasterization works, but we will also implement ray tracing and compare the two in quality and performance.】


The Rasterization Algorithm


In fact there is not one but multiple rasterization algorithms, but to go straight to the point, all these different algorithms are based upon the same overall principle. In other words, they are all variants of the same idea. It is this idea or principle that we will refer to when we speak of rasterization in this lesson.【There are many variants of the algorithm, but the overall principle is the same.】


What is that idea? In the previous lessons, we already talked about the difference between rasterization and ray-tracing. We also suggested that the rendering process can essentially be decomposed into two main tasks: visibility and shading. Rasterization, to say things quickly, is essentially a method to solve the visibility problem. Visibility consists of being able to tell which parts of 3D objects are visible to the camera. Some parts of these objects can be hidden because they are either outside the camera’s visible area or hidden by other objects.【Rendering decomposes into two main tasks, visibility and shading; rasterization solves visibility, i.e. which objects the camera can see.】


Solving this problem can be done in essentially two ways. You can trace a ray through every pixel in the image to find out the distance between the camera and any object this ray intersects (if any). The object visible through that pixel is the object with the smallest intersection distance (generally denoted t). This is the technique used in ray tracing. Note that in this particular case, you create an image by looping over all pixels in the image, tracing a ray for each one of these pixels, and then finding out if these rays intersect any of the objects in the scene. In other words, the algorithm requires two main loops. The outer loop iterates over the pixels in the image and the inner loop iterates over the objects in the scene:【The first way is ray tracing: shoot a ray from the camera through each pixel and find the objects it intersects; this needs two loops, pixels in the outer loop and objects in the inner one.】
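The double-loop structure can be sketched as follows. This is only a structural sketch (the names are ours, and the intersection test is a stand-in that reports a fixed hit distance per "triangle"); what matters is that pixels are in the outer loop and objects in the inner one:

```cpp
#include <limits>
#include <vector>

struct Triangle { float hitDistance; };  // placeholder geometry

// Inner loop: over the objects (triangles). Returns the distance to the
// closest "hit"; infinity if nothing is hit.
float trace(const std::vector<Triangle> &triangles)
{
    float tNearest = std::numeric_limits<float>::infinity();
    for (const Triangle &tri : triangles)
        if (tri.hitDistance < tNearest)
            tNearest = tri.hitDistance;  // keep the closest intersection
    return tNearest;
}

// Outer loop: over the pixels of the image.
std::vector<float> render(const std::vector<Triangle> &triangles,
                          int width, int height)
{
    std::vector<float> depth(width * height);
    for (int j = 0; j < height; ++j)
        for (int i = 0; i < width; ++i)
            depth[j * width + i] = trace(triangles);
    return depth;
}
```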


Note that in this example, the objects are actually considered to be made of triangles (and triangles only). Rather than iterating over the objects, we just consider the objects as a pool of triangles and iterate over the triangles instead. For reasons we have already explained in the previous lessons, the triangle is often used as the basic rendering primitive both in ray tracing and in rasterization (GPUs require the geometry to be triangulated).【In both ray tracing and rasterization, the triangle is the basic primitive.】


Ray tracing is the first possible approach to solve the visibility problem. We say the technique is image centric because we shoot rays from the camera into the scene (we start from the image), as opposed to the other way around, which is the approach we will be using in rasterization.【We call the ray-tracing approach image centric.】


Rasterization takes the opposite approach. To solve for visibility, it actually “projects” triangles onto the screen; in other words, we go from a 3D representation to a 2D representation of the triangle, using perspective projection. This can easily be done by projecting the vertices making up the triangle onto the screen (using perspective projection, as we just explained). The next step in the algorithm is to use some technique to fill up all the pixels of the image that are covered by that 2D triangle. These two steps are illustrated in figure 2. From a technical point of view, they are very simple to perform. The projection step only requires a perspective divide and a remapping of the resulting coordinates from image space to raster space, a process we already covered in the previous lessons. Finding out which pixels of the image the resulting triangles cover is also very simple and will be described later.【The second way is rasterization, the opposite of ray tracing: first project the triangle onto the screen with perspective projection, turning the 3D triangle into a 2D one, then fill in the pixels it covers, as in figure 2.】



What does the algorithm look like compared to the ray tracing approach? First, note that rather than iterating over all the pixels in the image first, in rasterization, in the outer loop, we need to iterate over all the triangles in the scene. Then, in the inner loop, we iterate over all pixels in the image, and find out if the current pixel is “contained” within the “projected image” of the current triangle (figure 2). In other words, the inner and outer loops of the two algorithms are swapped.【This algorithm also needs two loops: objects in the outer loop, pixels in the inner one.】
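The swapped loops can be sketched as follows (our naming, and a crude point-in-bounding-box stand-in for the coverage test; the real edge-function test is the topic of a later chapter):

```cpp
#include <vector>

struct Triangle2D { float xmin, ymin, xmax, ymax; };  // crude 2D footprint

// Stand-in coverage test: is the pixel center inside the footprint?
bool pixelOverlaps(const Triangle2D &tri, int i, int j)
{
    float x = i + 0.5f, y = j + 0.5f;  // pixel center
    return x >= tri.xmin && x <= tri.xmax && y >= tri.ymin && y <= tri.ymax;
}

// Outer loop: triangles. Inner loops: pixels. Returns, per pixel, how many
// triangles cover it.
std::vector<int> rasterize(const std::vector<Triangle2D> &triangles,
                           int width, int height)
{
    std::vector<int> coverage(width * height, 0);
    for (const Triangle2D &tri : triangles)
        for (int j = 0; j < height; ++j)
            for (int i = 0; i < width; ++i)
                if (pixelOverlaps(tri, i, j))
                    coverage[j * width + i] += 1;
    return coverage;
}
```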


This algorithm is object centric, because we actually start from the geometry and walk our way back to the image, as opposed to the approach used in ray tracing, where we start from the image and walk our way back into the scene.【We call the rasterization approach object centric.】


Both algorithms are simple in their principle, but they differ in their complexity when it comes to actually implementing them and finding solutions to the different problems they require to solve. In ray tracing, generating the rays is simple, but finding the intersection of a ray with the geometry can reveal itself to be difficult (depending on the type of geometry you deal with) and is also potentially computationally expensive. But let’s ignore ray tracing for now. In the rasterization algorithm we need to project vertices onto the screen, which is simple and fast, and we will see that the second step, finding out if a pixel is contained within the 2D representation of a triangle, has an equally simple geometric solution. In other words, computing an image using the rasterization approach relies on two very simple and fast techniques (the perspective divide and finding out if a pixel lies within a 2D triangle). Rasterization is a good example of an “elegant” algorithm. The techniques it relies on have simple solutions; they are also easy to implement and produce predictable results. For all these reasons, the algorithm is very well suited to the GPU and is actually the rendering technique GPUs apply to generate images of 3D objects (it can also easily be run in parallel).【In principle the two methods have a similar structure, but their implementation costs differ greatly: ray-object intersection is expensive, while projection is much cheaper and easy to implement; it is in fact implemented in GPU hardware.】


In summary:

  • Converting geometry to triangles makes the process simpler. If all primitives are converted to the triangle primitive, we can write fast and efficient functions to project triangles onto the screen and check if pixels lie within these 2D triangles.【everything is handled as triangles】
  • Rasterization is object centric. We project geometry onto the screen and determine its visibility by looping over all pixels in the image.【object centric】
  • It relies on mostly two techniques: projecting vertices onto the screen and finding out if a given pixel lies within a 2D triangle.【two steps】
  • The rendering pipeline run on GPUs is based on the rasterization algorithm.【GPUs use rasterization】


Hopefully at this point of the lesson, you have understood how the image of a 3D scene (made of triangles) is generated using the rasterization approach. Of course, what we described so far is the simplest form of the algorithm. First, it can be optimized greatly; furthermore, we haven’t explained yet what happens when two triangles projected onto the screen overlap the same pixels in the image. When that happens, how do we decide which one of these two (or more) triangles is visible to the camera? We will now answer these two questions.【Now that you know the basic approach to rasterizing a 3D scene, we look at what happens when two projected triangles overlap, how visibility is decided, and how to optimize.】



Optimizing: 2D Triangles Bounding Box


The problem with the naive implementation of the rasterization algorithm we gave so far is that it requires, in the inner loop, iterating over all pixels in the image, even though only a small number of these pixels may be contained within the triangle (as shown in figure 3). Of course this depends on the size of the triangle on the screen. But considering we are not interested in rendering one triangle but an object made up of potentially a few hundred to a few million triangles, it is unlikely that in a typical production example these triangles will be very large in the image.【The naive approach iterates over every pixel of the image for each object, even when the object covers only a handful of pixels.】


There are different ways of minimising the number of tested pixels, but the most common one consists of computing the 2D bounding box of the projected triangle, and iterating over the pixels contained in that 2D bounding box rather than the pixels of the entire image. While some of these pixels might still lie outside the triangle, on average this already considerably improves the performance of the algorithm. This idea is illustrated in figure 3.【The most common optimization is to compute the triangle's 2D bounding box and iterate only over the pixels inside it.】




Computing the 2D bounding box of a triangle is actually very simple. We just need to find the minimum and maximum x- and y-coordinates of the three vertices making up the triangle in raster space. This is illustrated in the following pseudo code:【Computing the 2D bounding box only requires the min/max x and y values of the three vertices — very simple.】
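A minimal sketch of that computation (names are ours, not the original source):

```cpp
#include <algorithm>

// 2D bounding box of a projected triangle: the min/max of the three
// vertices' raster-space coordinates.
struct Vec2 { float x, y; };

void boundingBox(const Vec2 &v0, const Vec2 &v1, const Vec2 &v2,
                 float &xmin, float &ymin, float &xmax, float &ymax)
{
    xmin = std::min({v0.x, v1.x, v2.x});
    ymin = std::min({v0.y, v1.y, v2.y});
    xmax = std::max({v0.x, v1.x, v2.x});
    ymax = std::max({v0.y, v1.y, v2.y});
}
```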



Once we have calculated the 2D bounding box of the triangle (in raster space), we just need to loop over the pixels defined by that box. But you need to be very careful about the way you convert the raster coordinates, which in our code are defined as floats rather than integers. First note that one or two vertices may be projected outside the boundaries of the canvas. Thus, their raster coordinates may be lower than 0 or greater than the image size. We solve this problem by clamping the pixel coordinates to the range [0, Image Width – 1] for the x-coordinate, and [0, Image Height – 1] for the y-coordinate. Furthermore, we will need to round off the minimum and maximum coordinates of the bounding box to the nearest integer value (note that this works fine when we iterate over the pixels in the loop, because we initialize the loop variable to xmin or ymin and break from the loop when the variable x or y is greater than xmax or ymax). All these tests need to be applied before using the final fixed-point (or integer) bounding box coordinates in the loop. Here is the pseudo-code:【With the 2D bounding box, we iterate over the pixels it contains. Be careful: the raster coordinates here are floats, not ints; clamp x to [0, Width - 1] and y to [0, Height - 1], and round the float bounds to the integer pixel range they cover.】
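A sketch of that loop (our naming; here we just count the candidate pixels, where a real rasterizer would run the coverage test at the marked line):

```cpp
#include <algorithm>
#include <cmath>

// Round the floating-point bounding box to integers, clamp it to the image
// boundaries, then iterate over the pixels it contains.
int countCandidatePixels(float xmin, float ymin, float xmax, float ymax,
                         int imageWidth, int imageHeight)
{
    // The bounding box is entirely outside the canvas: nothing to do.
    if (xmin > imageWidth - 1 || xmax < 0 || ymin > imageHeight - 1 || ymax < 0)
        return 0;
    // Round off, then clamp to [0, width - 1] x [0, height - 1].
    int x0 = std::max(0, (int)std::floor(xmin));
    int x1 = std::min(imageWidth - 1, (int)std::floor(xmax));
    int y0 = std::max(0, (int)std::floor(ymin));
    int y1 = std::min(imageHeight - 1, (int)std::floor(ymax));
    int count = 0;
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            ++count;  // run the coverage test for pixel (x, y) here
    return count;
}
```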




The Image or Frame-Buffer


Our goal is to produce an image of the scene. We have two ways of visualizing the result of the program: either displaying the rendered image directly on the screen, or saving the image to disk and using a program such as Photoshop to preview it later on. In both cases though, we need to store the image that is being rendered while it’s being rendered, and for that purpose we use what we call in CG an image buffer or frame-buffer.

It is nothing else than a two-dimensional array of colors that has the size of the image. Before the rendering process starts, the frame-buffer is created and the pixels are all set to black. At render time, when the triangles are rasterized, if a given pixel overlaps a given triangle, we store the color of that triangle in the frame-buffer at that pixel location. When all triangles have been rasterized, the frame-buffer contains the image of the scene. All that is left to do then is either display the content of the buffer on the screen, or save it to a file. In this lesson, we will choose the latter option.【Whether the result is displayed or saved to disk, it is first cached in the frame-buffer: create it and fill it with black, then during rasterization store the triangle color at every pixel the triangle overlaps; once every object is rasterized, the frame-buffer holds the scene image.】



When Two Triangles Overlap the Same Pixel: The Depth Buffer (or Z-Buffer)


Keep in mind that the goal of the rasterization algorithm is to solve the visibility problem. In order to display 3D objects, it is necessary to determine which surfaces are visible. In the early days of computer graphics, two methods were used to solve the “hidden surface” problem (the other name for the visibility problem): the Newell algorithm and the z-buffer. We only mention the Newell algorithm for historical reasons; we won’t study it in this lesson because it is not used anymore. We will only study the z-buffer method, which is used by GPUs.【Rasterization exists to solve visibility; today the hidden surface problem is solved with the z-buffer method alone.】


There is one last thing, though, that we need to do in order to get a basic rasterizer working. We need to account for the fact that more than one triangle may overlap the same pixel in the image (as shown in figure 5). When this happens, how do we decide which triangle is visible? The solution to this problem is actually very simple. We will use what we call a z-buffer, also called a depth buffer, two terms that you may have heard or read about quite often already. A z-buffer is nothing more than another two-dimensional array with the same dimensions as the image; however, rather than being an array of colors, it is simply an array of floating-point numbers.【When several triangles project onto the same pixel, the z-buffer decides which one is visible.】


Before we start rendering the image, we initialise each pixel in this array to a very large number. When a pixel overlaps a triangle, we also read the value stored in the z-buffer at that pixel location. As you may have guessed, this array is used to store the distance from the camera to the nearest triangle that any pixel in the image overlaps. Since this value is initially set to infinity (or any very large number), the first time we find that a given pixel X overlaps a triangle T1, the distance from the camera to that triangle is necessarily lower than the value stored in the z-buffer. We then replace the value stored for that pixel with the distance to T1. Next, when the same pixel X is tested and we find that it overlaps another triangle T2, we compare the distance from the camera to this new triangle with the distance stored in the z-buffer (which at this point stores the distance to the first triangle T1). If the distance to the second triangle is lower than the distance to the first, then T2 is visible and T1 is hidden by T2. Otherwise T2 is hidden by T1, and T1 remains visible. In the first case, we update the value in the z-buffer with the distance to T2; in the second case, the z-buffer doesn’t need to be updated, since the first triangle T1 is still the closest triangle we have found for that pixel so far. As you can see, the z-buffer is used to store the distance from each pixel to the nearest object in the scene (we don’t really use the distance, but we will give the details further in the lesson).【Initialise the z-buffer with a very large value. Each time a triangle covers a pixel, read the z-buffer at that pixel, which stores the distance from the camera to the closest surface seen through it so far. If the new triangle's depth is smaller than the stored value, replace it; otherwise leave it unchanged.】


In figure 5, we can see that the red triangle is behind the green triangle in 3D space. If we were to render the red triangle first, and the green triangle second, for a pixel that overlaps both triangles we would store in the z-buffer at that pixel location first a very large number (that happens when the z-buffer is initialized), then the distance to the red triangle, and finally the distance to the green triangle.【As shown in figure 5.】


You may wonder how we find the distance from the camera to the triangle. Let’s first look at an implementation of this algorithm in pseudo-code and we will come back to this point later (for now let’s just assume the function pixelContainedIn2DTriangle computes that distance for us):
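The loop can be sketched as follows. The names are ours, not the original lesson's (which wraps the coverage and depth computation into a function it calls pixelContainedIn2DTriangle); here each "triangle" is reduced to a flat stand-in with a constant depth and a rectangular footprint, so the z-buffer logic itself is what is on display:

```cpp
#include <limits>
#include <vector>

struct Tri { float xmin, ymin, xmax, ymax, z; };  // crude stand-in geometry

// For every triangle and every pixel it covers, compare its depth against
// the z-buffer and keep the closest triangle per pixel.
void render(const std::vector<Tri> &triangles, int w, int h,
            std::vector<float> &zbuffer, std::vector<int> &framebuffer)
{
    zbuffer.assign(w * h, std::numeric_limits<float>::infinity());
    framebuffer.assign(w * h, -1);  // -1 = background
    for (int t = 0; t < (int)triangles.size(); ++t) {
        const Tri &tri = triangles[t];
        for (int j = 0; j < h; ++j)
            for (int i = 0; i < w; ++i) {
                float x = i + 0.5f, y = j + 0.5f;  // pixel center
                bool covered = x >= tri.xmin && x <= tri.xmax &&
                               y >= tri.ymin && y <= tri.ymax;
                if (covered && tri.z < zbuffer[j * w + i]) {
                    zbuffer[j * w + i] = tri.z;  // closest depth so far
                    framebuffer[j * w + i] = t;  // this triangle is visible
                }
            }
    }
}
```

Note that the result is independent of the order in which the triangles are submitted, which is the whole point of the z-buffer.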





What’s Next?


Obviously this is only a very high-level description of the algorithm (figure 6), but this should hopefully already give you an idea of what we will need in the program in order to produce an image.【Figure 6 gives the overall structure.】



We will need:

  • An image-buffer (a 2D array of colors),
  • A depth-buffer (a 2D array of floats),
  • Triangles (the geometry making up the scene),
  • A function to project vertices of the triangles onto the canvas,
  • A function to rasterize the projected triangles,
  • Some code to save the content of the image buffer to disk.


In the next chapter we will see how coordinates are converted from camera to raster space. The method is of course identical to the one we studied and presented in the previous lesson, however we will present a few more tricks along the way. In chapter three, we will actually learn how to rasterize triangles. In chapter four, we will study in detail how the z-buffer algorithm works. As usual, we will conclude this lesson with a practical example. [The next chapter covers the coordinate conversion.]
































Software Occlusion Culling – Intel



The technique divides scene objects into occluders and occludees and culls occludees based on a depth comparison with the occluders, which are software rasterized to the depth buffer. [The approach.]

The sample code uses frustum culling and is optimized with the Streaming SIMD Extensions (SSE) instruction set and multi-threading to achieve up to an 8X performance speedup compared to a non-culled display of the sample scene. [The benefit.]




If we can lower the computational cost of occluded objects, we can improve the application’s performance with no impact to the scene quality. [If occluded objects are never sent to the GPU, performance improves.]

Using the GPU-generated depth buffer on the CPU is one technique for early detection of occluded objects. However, doing occlusion culling in this manner may cause latency and overhead issues and introduce visual artifacts since the GPU may be several frames behind the CPU. To avoid these problems, we propose software rasterization of the depth buffer on the CPU. [A GPU-generated depth buffer is one option, but it introduces latency and artifacts because the GPU lags behind the CPU.]


In this approach, we use the CPU to rasterize the depth buffer. Then, we use axis-aligned bounding box tests to determine if an object is occluded. Occluded objects are removed from the rendering pipeline to reduce overhead. We do not cull visible and partially visible objects; we submit them to the GPU for rendering. The sample code associated with the Software Occlusion Culling sample implements this technique. Additionally, in this sample, the software rasterizer is vectorized using SSE and multi-threaded to improve performance. [Here the CPU rasterizes the depth buffer, axis-aligned bounding box tests cull occluded objects, and the survivors are submitted to the GPU.]


Occluders: Objects that are sufficiently large to hide or occlude other objects in the scene. [Objects that occlude other objects.]

Occludees: Objects that are hidden, or occluded, by other objects in the scene. [Objects hidden by other objects.]


Software Occlusion Culling has two steps: depth buffer rasterization and depth test culling. The next sections provide details about these steps.


Depth Buffer Rasterization


The occluders in the scene are rasterized to the depth buffer on the CPU. Figure 1 shows a screenshot of the Software Occlusion Culling sample. The castle walls and the ground plane are considered occluders in the scene. The castle walls, along with the attached tiny wooden trim and decorations, are used as occluders to avoid special pre-processing of the art assets. Ideally, it would have been better to hand pick just the large castle walls as occluders, but we wanted to make sure that the software occlusion culling algorithm worked with a realistic scene without the need for expensive content changes. [The walls occlude most of the scene, so most of what is behind them need not be drawn.]


Our rasterizer divides the frame buffer into tiles and bins occluder triangles based on their intersection with the tiles. The rasterizer bins triangles into multiple tiles when the triangles cross tile boundaries. When processing a tile, a thread steps through the binned triangles and uses bounding box traversal to rasterize the triangles to the depth buffer. [Is this describing deriving the traversal bounding box automatically from each triangle?] The rasterizer checks if the pixel under consideration is inside the triangle, and if so, it uses barycentric coordinates to interpolate the depth value at the pixel location. The depth buffer rasterization process updates the depth buffer at the pixel position if the newly computed depth is closer to the observer than the depth already stored at the same pixel location.
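The inner loop described here — point-in-triangle test, barycentric depth interpolation, closest-wins depth update — can be sketched like this (a minimal scalar Python sketch, not Intel's binned SSE rasterizer; vertices are assumed already projected to raster space as (x, y, z) triples):

```python
def edge(ax, ay, bx, by, px, py):
    # Twice the signed area of triangle (a, b, p).
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)

def rasterize_depth(tri, depth, width, height):
    """Rasterize one occluder triangle into `depth` (row-major list),
    keeping the nearest z per pixel."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return  # degenerate triangle
    # Bounding box traversal of candidate pixels, clamped to the buffer.
    xmin = max(0, int(min(x0, x1, x2)))
    xmax = min(width - 1, int(max(x0, x1, x2)))
    ymin = max(0, int(min(y0, y1, y2)))
    ymax = min(height - 1, int(max(y0, y1, y2)))
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            px, py = x + 0.5, y + 0.5
            # Barycentric weights; dividing by the signed area makes the
            # test winding-independent.
            w0 = edge(x1, y1, x2, y2, px, py) / area
            w1 = edge(x2, y2, x0, y0, px, py) / area
            w2 = edge(x0, y0, x1, y1, px, py) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel centre is outside the triangle
            z = w0 * z0 + w1 * z1 + w2 * z2  # interpolated depth
            i = y * width + x
            if z < depth[i]:  # closer to the observer wins
                depth[i] = z
```

The same weights could interpolate any other vertex attribute; here only depth is needed since the buffer exists purely for occlusion tests.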


Once the depth buffer has been rasterized on the CPU, all the occludees in the scene have to be depth tested against the depth buffer to determine which occludees are visible and which can be culled. [Once rasterization is done, each occludee is depth-tested against the buffer to see whether it is occluded.]


Depth Test Culling


We use object space axis-aligned bounding boxes (AABBs) to depth test the occludees in the scene against the CPU-generated depth buffer. The depth test algorithm treats all objects in the scene, including the occluders (castle walls and ground plane), as occludees. Using AABBs makes the depth tests more conservative. If the AABB is occluded, then the object contained inside it is also occluded and can be culled. If the AABB is visible, then the assumption is that the object contained inside might also be visible. However, as the bounding box is conservative, this assumption may not always be true, and we may have some false positives. [Occludees are tested using their AABBs.]


To determine if an object is clipped by the near clip plane, we could use the homogeneous coordinate w of the vertices that make up the bounding box. The near clip plane is set to 1.0 in the sample. Objects that are in front of the camera have w > 1.0. So if any of the vertices that make up the bounding box has w < 1.0, then the object is being clipped by the near clip plane. [Near-plane clipping.]


However, we store 1/w instead of the homogeneous coordinate w to avoid dividing by w at multiple locations in the rasterizer.


So the bounding box of an occludee is being clipped if 1/w > 1.0 or 1/w < 0.0 for any vertex of the bounding box, in which case we trivially accept the occludee as visible. [Working with 1/w instead of w makes this test easier to handle.]
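As a sketch, the trivial-accept test reduces to a one-liner (a hypothetical helper, not the sample's actual vectorized code; `inv_w_values` would hold 1/w for the eight corners of the occludee's bounding box):

```python
def clipped_by_near_plane(inv_w_values):
    """True when the bounding box crosses the near clip plane, in which
    case the occludee is trivially accepted as visible. With the near
    plane at w = 1.0, points in front of the camera satisfy
    0.0 < 1/w <= 1.0; anything outside that range means clipping."""
    return any(iw > 1.0 or iw < 0.0 for iw in inv_w_values)
```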





Found the paper: Masked Software Occlusion Culling


Parallel Graphics in Frostbite –
Current & Future


Similarly, Battlefield 3 also builds its occluders by hand.


Umbra occlusion culling also relies on manually prepared models.

Occlusion Culling in Alan Wake

For solving visibility from the camera's point of view, Alan Wake used a hierarchical hardware based occlusion query algorithm [Bittner et al. 2004], [Guthe et al. 2006]. The advantages of using a hardware based culling algorithm outweighed the content pipeline issues associated with using a typical software occlusion system based on either precomputed potentially visible sets (PVS) or artist generated occlusion geometry and a software rasterizer [Aila and Miettinen 2004]. Occlusion culling minimized the overdraw, and the bottleneck shifted to shadow map generation.


GPU-Driven Rendering Pipelines



Killzone's software rasterization approach

[The approach here uses physics meshes whose sizes are adjusted by hand.]

We didn’t want artists to hand-make occluders for the entire level, so we looked for an automatic solution. We experimented with using visual meshes, but good polygon reduction proved difficult to get right. The physics mesh is a much better choice: it’s available for each mesh in the game, it’s clean, and it has a sufficiently low polygon count. Unfortunately, physics meshes can be slightly larger than visual meshes, causing objects to disappear. The worst offenders have to be fixed manually.

Here’s an example of occluders generated automatically from the physics mesh. You can notice there’s some unnecessary detail, but in general the quality is pretty good.

Even using physics meshes, there was still too much occluder geometry.

We needed to reject occluders that were unlikely to contribute much to the occlusion buffer. Our simple heuristic rejects all small objects, or objects whose name suggests that they are not good occluders. We also reject meshes whose surface area suggests that they are thin or have too many holes. Artists can of course step in and override the heuristics or provide their custom occluders for difficult cases and optimization.

[There are too many physics meshes, so some must be filtered out: oddly shaped, holey, and small ones are all rejected.]



Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3

Only LOD0 meshes are considered as occluders.





Umbra analysis

Umbra 3 — the system Unity uses — takes the PVS approach: a region-based precomputation method for occlusion culling.

Umbra pre-processes the scene offline, voxelizes it, then structures it into portals and cells to create a spatial database used at runtime. Queries can then be performed in real-time and involve the generation of a low-resolution hierarchical depth-buffer of the scene, performed by a highly-optimized software rasterizer.


It makes sense on a lot of levels, and recently Intel also published another example of a purely CPU-based solution — similar to the Umbra technique, occluders are software-rasterized to create a hierarchical depth buffer. [Intel's implementation can be seen as similar to Umbra's: it uses a hierarchical depth buffer.]

Umbra’s culling system precomputes a simplified visibility database representing a 3D world. A fast runtime library then traverses the database in order to solve visibility for a view. [Umbra's approach precomputes data offline; at runtime that data is used for occlusion culling.]

We call these databases tomes. They consist of automatically placed portals and cells. These are common concepts in the world of visibility: a cell is a volume in 3D space, within which visibility is similar. The world is divided into cells. Portals are links between cells: openings in space leading to another cell that also have a shape. The difficult problem is finding good portals and cells in a fast and practical fashion. [The precomputation is a fast, automatic voxelization of the scene.]

Umbra’s precomputation step, Umbra optimizer, solves this problem. It starts with voxelization – discretization of 3D space. The algorithm begins by generating small “raw” cells. These are then improved by combining them together into bigger final cells, so that the loss in occlusion is minimal. [After voxel subdivision, voxels are recombined to form the final cells.]

Each raw cell is a continuous set of empty voxels not bigger than some user defined size. Portals are generated on open surfaces between these cells. The user controls database accuracy by supplying the size for these raw cells. This “raw cell size” together with the voxel size are the two most important user controlled parameters in Umbra. They are publicly given more understandable names: “smallest occluder” (raw cell size) and “smallest hole” (voxel size). Controlling these parameters allows scaling the data for any environment and requirements. Several versions of data are generated for different levels of detail – occlusion near the camera is more important than farther away. [A cell is a block at least one voxel in size; its maximum size is user-controlled, so take care to set these parameters well.]

Umbra’s runtime walks these cells linked by portals to figure out the visible volume. Visible objects are collected along the path. Compared to other kinds of portal culling systems, Umbra generates a lot of portals. The runtime thus presents quite an engineering challenge: it needs to be fast and optimized on each platform to handle the high portal count. A common method for portal culling is narrowing the view frustum to fit encountered portals. Umbra doesn’t do this – Umbra rasterizes the portals instead. In the end, a low-resolution depth buffer is generated. [At runtime, Umbra walks the portal links between cells and collects visible objects along the way. Where other portal systems narrow the view frustum at each portal, Umbra instead rasterizes the portals into a low-resolution depth buffer.]
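A heavily simplified sketch of such a runtime walk (plain Python; it deliberately omits the portal rasterization and depth buffer that Umbra actually uses, and the cell-graph layout is invented for illustration):

```python
from collections import deque

def collect_visible(cells, start_cell):
    """Walk the cell-and-portal graph outward from the camera's cell,
    collecting each visited cell's objects. `cells` maps a cell id to
    {"objects": [...], "portals": [neighbouring cell ids]}. The real
    runtime would stop expanding through portals that rasterize as
    fully occluded; this sketch visits every reachable cell."""
    visible, seen = [], {start_cell}
    queue = deque([start_cell])
    while queue:
        cell = queue.popleft()
        visible.extend(cells[cell]["objects"])
        for neighbour in cells[cell]["portals"]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return visible
```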

Some visibility approaches are based on hand made occluder models – a representation of what’s solid in the world. You could see Umbra as a method for modelling empty space instead. This approach has the benefit that all inaccuracy can always be countered by growing the empty space (i.e. portals). Sources of inaccuracy include voxelization and low rasterization resolution. Conversely this allows us to make the runtime faster by using lower database and runtime resolutions, while always retaining error-free output. [The usual approach models occluders by hand; Umbra models the empty space instead. The core idea is to mark the cells that are not occluded — find what can be seen, and everything else can be ignored — so a coarser voxelization only costs accuracy, never correctness.]

My previous blog post discussed the global nature of visibility. Umbra is divided into a local offline operation and a global runtime one. Unlike PVS systems, this allows fast reuse of partial results from previous precomputations – and thus fast local updates to the database. [As in the first post: Umbra splits the work into a local offline part and a global runtime part, which allows reuse of precomputed partial results.]

There are also benefits in having a separate set of static data on which the runtime algorithm operates. It makes the system independent from anything else: it’s parallelizable and easier to integrate. An application can begin to execute Umbra’s runtime as soon as the camera’s position is known for a frame. [A further benefit: the data is self-contained, parallelizable, and easy to integrate.]


There’s an early and simple solution to the visibility problem aptly called the painter’s algorithm. Like an old renaissance master, the painter’s algorithm has you drawing everything in back-to-front order. While the result looks correct, it causes pentimenti – overdraw that wastes computing cycles. Computer scientists knew how to solve the visibility problem in the 70s like the artists did in the 1700s. The challenge is in solving it in a fast and practical fashion. The art of doing so is called occlusion culling, and we’ve thought about it quite a bit here at Umbra. [With the painter's algorithm, much of what is drawn ends up hidden in the final image: overdraw.]

Modern rendering uses z-buffers – sort of the same thing as the painter’s algorithm but for each point instead of each surface. While that does solve the problem, it does so very late in the rendering pipeline. By the time we get around to rejecting anything, we’ve wasted a lot of time processing things hiding behind walls. A good visibility solution should know what the hidden objects are very early on, so that we can avoid even sending them for rendering. Avoiding spending time on hidden objects allows for more budget with visible ones. [Modern rendering uses a z-buffer per pixel, which solves visibility but very late; knowing early which objects are hidden avoids sending them to the GPU at all and saves a great deal of work.]

Of course you could solve visibility by doing manual work. And indeed, many occlusion culling systems are based on hand tuning portals or occluders. Less manual work means better workflow for anyone creating content for the system. The bad end of the automation spectrum would require tagging all visibility manually, a huge task that would quickly become impractical. An ideal solution would be at the other end – not requiring manual work and also not placing restrictions on content. [Hand-tuned occlusion culling works, but we want this step automated.]

It’s worth mentioning a couple of obvious things. A visibility solution should also produce correct results. No visible object should be reported hidden. To guarantee a visually correct result, any inaccuracy should be on the conservative side: reporting hidden objects visible. It should also be fast: too slow occlusion culling would negate the benefits. [However much we simplify, the final image must stay correct; it is acceptable to leave some actually-hidden objects unculled.]

One big practical issue with visibility is that it is a global problem that involves everything – a small change in visibility can affect the other side of a 3D world. This must somehow be reflected in the computation we perform to solve it. Reducing it to a local problem would allow us to divide and conquer: easy parallelization and reuse of partial results. A generic visibility solution doesn’t really lend itself to this. This also means the amount of work performed by the algorithm is at least relative to the visible volume. We can compute some parts beforehand offline, however, and some parts at runtime. This is a choice we do get to make, and it allows us to find some extent of localization. [Reducing the global problem to a local one enables divide and conquer: parallelization and reuse of partial results.]


Some more information, from the Umbra 3 slides:

This is the high level overview of the workflow.

From the original geometry we want to create a data structure that can be used for efficient visibility queries. And the whole process should be both fast and automatic (and allow at least some amount of dynamic changes to the data). [Precompute a data structure for later visibility queries, while supporting some changes to the data.]

In runtime visibility query the basic operation is figuring out which objects are visible from specific view points.

These are the requirements we had; we need to create a data structure and algorithm within these bounds. [The data structure and algorithm must be designed within the following bounds.]

  • For any input 3D scene, memory usage and processing time of the visibility system must be bounded and independent of the complexity of the input. [Bounded cost.]
  • The whole process should be automatic. [Fully automatic.]
  • It should allow fast content iteration. [Fast iteration.]
  • And it should set no limits on the size of the game worlds. [Arbitrarily large worlds.]

So what is visibility?

It’s the set of all the lines in space which do not intersect occluding geometry.


Creating a data structure for an accurate visibility representation from a polygon soup is theoretically possible, but in practice the data structure would be so large and so slow to compute that it’s unusable. [An exact visibility data structure for a large scene would be far too big to be practical.]

Instead we need to use conservative visibility, where we have bounded time and space and within these bounds give the best results we can.

Visibility is ultimately reasoning about space. Interesting thing here to note is that once we have a topological data structure like this it can be used for other things as well, not just visibility.


This is the high level overview of the workflow.

From the original geometry we want to create a data structure that can be used for efficient visibility queries. And the whole process should be both fast and automatic (and allow at least some amount of dynamic changes to the data). [We precompute a practical visibility-query data structure from the original geometry.]

In runtime visibility query the basic operation is figuring out which objects are visible from specific view points. [At runtime we query this data to determine visibility.]

Here you’re looking at a top down view of an example scene. [Top-down view of the scene.]

The requirement was that there should be no manual markup or other requirements on the input geometry. So what we take as input is all the level geometry as it is.

So we really don’t have any other information besides the long list of polygons, which are possibly grouped into objects.

Doing geometric operations with polygons directly has all kinds of difficulties related to floating point accuracy. Also, almost all real life game levels contain some small modeling errors such as t-vertices and cracks between objects. We need to be able to work with this kind of input as well.


The approach we chose is to create a cell-and-portal graph by grouping the voxels together based on proximity and connectivity. Cells are created from groups of voxels. Portals are then created on the boundaries of these groups. We chose to create portals because in the past they have been proven to be an efficient way to represent visibility, and we solve the issues of manually placed portals by generating them automatically.


In contrast to manually placed portals, we might generate thousands of portals, which allows us to have accurate visibility in outdoor spaces as well.

By controlling the number of output cells and portals we can choose the output resolution of the visibility data so that it meets the memory and performance requirements.


In addition to the cell and portal graph, we need to be able to figure out in which cell the view point is located when making a visibility query. [When making a visibility query we also need to know which cell the view point is in.]

We create a KD-tree from the voxels, which can be packed very efficiently with only a few bits per node. The leaf nodes of the tree contain indices to cells in the graph. [Build a KD-tree from the voxels.]
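The view-tree lookup can be sketched as a plain KD-tree descent (an illustrative structure, not Umbra's bit-packed format; inner nodes and leaves are represented as tuples here):

```python
def find_cell(node, point):
    """Locate the cell containing `point` by walking a KD-tree whose
    leaves hold cell indices. Inner nodes are
    ("split", axis, value, left_child, right_child); leaves are
    ("leaf", cell_index)."""
    while node[0] == "split":
        _, axis, value, left, right = node
        # Descend into the half-space the query point falls in.
        node = left if point[axis] < value else right
    return node[1]
```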

The final data structure contains the cell-and-portal graph, the view tree, and a list of static objects contained in each cell. [The final data structure.]

We call the output data structure a tome .. Tome of visibility .. It contains all the information about the topological structure needed for visibility determination in a 3D scene.






The basic operation is the visibility query, which uses the tome data to figure out what is visible from a specific view point inside the scene. [Given the view point in the scene, find the visible objects.]

The goal is to create a depth buffer, or occlusion buffer, from the tome, which can be then used to very efficiently test the visibility of objects.

The first step is to figure out in which cell the view point is located. This is done by traversing the view tree based on the query origin. This is a simple tree traversal and it’s a fast operation. [Step one: find which cell the view point is in.]

There are several options on how we could use the portal data. We could do a traditional recursive portal traversal, clipping the view frustum as we go through each portal. Or we could do ray-tracing or cone-tracing in the graph.

The approach we chose is to rasterize the portals using a custom software rasterizer optimized for this purpose. With the rasterizer we need to touch each portal only once, as opposed to recursive portal traversal, which can suffer from exponential blowup if there are a lot of portal intersections in screen space.

(We could also serve other kinds of queries, such as connectivity.) [Visibility is resolved with a software rasterizer; the benefit is that each portal is touched only once.]


A really useful property of the rasterizer is that it produces a depth buffer as output, which is an almost optimal data structure for doing further visibility tests on the query output.

Also with rasterization we can choose the output resolution based on the platform and accuracy requirements.

Since we’re rasterizing portals instead of occluders, it’s trivial to implement conservative rasterization, which is a requirement for getting correct results in lower resolutions.

[Rasterization outputs a depth buffer, which is then used for further visibility tests; rasterizing portals instead of occluders keeps the rasterization granularity controllable.]

After we have generated the depth buffer, testing objects for visibility is quite straightforward bounding box vs. depth buffer visibility test.

If we keep the depth buffer resolution small enough to fit into the CPU cache, this test is super-super fast, and also trivial to parallelize.
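The bounding-box-versus-depth-buffer test can be sketched as follows (scalar Python for illustration; the real implementation is vectorized, and the box is reduced here to its screen-space rectangle at its nearest depth):

```python
def aabb_visible(depth, width, x0, y0, x1, y1, box_z):
    """Conservative test of a bounding box against the depth buffer.
    `depth` is a row-major buffer; the box covers the pixel rectangle
    [x0, x1] x [y0, y1] and `box_z` is its nearest depth (smaller =
    closer). Visible as soon as the box is in front of the stored
    depth at any covered pixel."""
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if box_z < depth[y * width + x]:
                return True   # box pokes out in front of the occluders
    return False              # every covered pixel is nearer: culled
```

Because the whole box is tested at its single nearest depth, the result errs on the visible side, matching the conservative behaviour described above.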

[Visibility is decided by testing bounding boxes against the depth buffer.]





SurRender Umbra: A Visibility Determination Framework for Dynamic Environments

Efficient Algorithms for occlusion culling and shadows (Jan2005)

dPVS: An Occlusion Culling System for Massive Dynamic Environments (2004)















Occlusion Culling – oxel



  1. Hierarchical Z-Buffer Occlusion Culling


Hierarchical Z-Buffer Culling Steps

  1. Bake step – Have your artists prepare occlusion geometry for things in the world that make sense as occluders: buildings, walls, etc. They should all be super cheap to render — boxes/planes. I actually ran across this paper, Geometric Simplification For Efficient Occlusion Culling In Urban Scenes, that sounded like a neat way of automating the process. [Prepare the occluder geometry.]
  2. CPU – Take all the occlusion meshes and frustum cull them. [Frustum-cull the occluders on the CPU.]
  3. GPU – Render the remaining occluders to a ‘depth buffer’. The depth buffer should not be full sized; in my code I’m using 512×256. There is a Frostbite paper that mentions using a 256×114-ish sized buffer for a similar solution to do occlusion culling. The ‘depth buffer’ should just be mip 0 in a full mip chain of render targets (not the actual depth buffer). [Render the depth buffer; it need not be full sized.]
  4. GPU – Now downsample the RT containing depth information, filling out the entire mip chain. You’ll do this by rendering a fullscreen effect with a pixel shader taking the last level of the mip chain and downsampling it into the next, preserving the highest depth value in a sample group of 4 pixels. For DX11, you can just constrain the shader resource view so that you can both read and render from the same mip chain. For DX9 you’ll have to use StretchRect to copy from a second mip chain, since you can’t sample and render to the same mip chain in DX9. In my code I actually found a more optimized solution: by ping-ponging between 2 mip chains, one containing even and the other odd levels, and a single branch in your shader code, you can get around the overhead of having to do the StretchRect and just sample from a different mip chain based on the even/odd mip level you need. [Downsample the render target to fill the whole depth mip chain.]
  5. CPU – Gather all the bounding spheres for everything in your level that could possibly be visible. [Gather the bounding spheres of everything potentially visible.]
  6. GPU – DX11: send the list of bounds to a compute shader, which computes the screen space width of the sphere, then uses the width to compute the mip level to sample from the HiZ map generated in step 4, such that the sphere covers no more than 2 pixels wide. So large objects in screen space will sample from very high values in the mip chain since they require a coarse view of the world, whereas small objects in screen space will sample from very low values in the mip chain. In DX9 the process is basically the same; the only difference is that you’ll render a point list of vertices that, instead of a Float3 position, are Float4 bounds (xyz = position, w = radius). You’ll also send down a stream of texcoords that will represent x/y pixel values of where to encode the results of the occlusion test for that bound. Instead of a compute shader you’ll process the vertices using a vertex shader; you’ll also need to use the pixel location provided in the texcoord stream to make sure the results of the test are written out to that point in a render target, and in a pixel shader you’ll need to do the sampling to test to see if it’s visible, and output a color like white for culled, black for visible. [Send the bounds plus the HiZ map from step 4 to a compute shader, which picks a mip level and tests each bound for occlusion.]
  7. CPU – Try to do some work on the CPU after the occluder rendering and culling process is kicked off. For me the entire process took about 0.74 ms of GPU time on a Radeon 5450, with 900 bounds. The overhead of generating the HiZ mip chain and dispatching the culling process is the real bottleneck, though; there’s little difference between 900 bounds and 10,000 bounds. [CPU work can overlap the occluder rendering and culling pass.]
  8. CPU – Read back the results. In DX11 you’re just reading back a buffer output by a compute shader. For DX9 you’ll have to copy the render target rendered to in step 6, containing the checker pattern of black and white pixels, and then iterate over the pixels on the CPU to know what is visible and what is hidden. [Read back the results.]
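Steps 4 and 6 can be sketched as follows (plain Python standing in for the shader code; `choose_mip` assumes the "no more than ~2 pixels wide" rule quoted above):

```python
import math

def downsample_max(depth, width, height):
    """One Hi-Z downsampling step (step 4): each output texel keeps the
    highest (farthest) depth of its 2x2 source block."""
    out = []
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            block = [depth[(y + dy) * width + (x + dx)]
                     for dy in (0, 1) for dx in (0, 1)]
            out.append(max(block))
    return out

def choose_mip(screen_width_px):
    """Pick the mip level (step 6) at which a sphere of the given
    screen-space width covers no more than ~2 pixels: each mip halves
    the width, so we need ceil(log2(width / 2)) halvings."""
    return max(0, math.ceil(math.log2(max(screen_width_px, 1.0) / 2.0)))
```

Sampling the chosen mip then gives a conservative farthest depth over the sphere's footprint, against which the sphere's nearest depth is compared.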


Hierarchical Z-Buffer Downsampling Code



Hierarchical Z-Buffer Culling Code:












  2. Hierarchical Z-Buffer Occlusion Culling – Shadows


The goal is to cull shadows using the Z-buffer; see the paper: CC Shadow Volumes.



  1. Create a HZB from the point of view of the player. [Build an HZB from the player's viewpoint.]
  2. Create a HZB from the point of view of the shadow casting light. [Build an HZB from the light's viewpoint.]
  3. Cull the shadow casting objects using the light's HZB. If an object is culled, mark it as culled. [Cull the casters.]
  4. If the object isn't culled, we need to compute a rough approximation of the extents of its shadow volume. You can think of this volume as being the minimum depth points on your bounding box (A-D) and the maximum depth sampled from where those points project onto the light's HZB (E-H) (Figure 1). Now that you have a bounding volume for the shadow, you can transform that volume into the camera space of the player and test it against the player's HZB, just as if the extents were any other object you were attempting to cull with the HZB (Figure 2). If you can't see the volume, you can write out that the caster is culled; otherwise write it as visible. [Only when an object survives culling do we need to compute its shadow volume.]
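The control flow of the steps above can be sketched as follows (the three callables are hypothetical placeholders standing in for the HZB tests and volume approximation, not APIs from the article):

```python
def classify_caster(caster, light_hzb_test, camera_hzb_test,
                    shadow_volume_bounds):
    """Two-HZB shadow-caster culling.

    light_hzb_test(caster)       -> True if visible from the light,
    shadow_volume_bounds(caster) -> approximate shadow-volume extents,
    camera_hzb_test(bounds)      -> True if visible to the player.
    """
    if not light_hzb_test(caster):
        return "culled"  # step 3: hidden even from the light itself
    # Step 4: only surviving casters pay for the shadow-volume
    # approximation and the second (player-view) HZB test.
    bounds = shadow_volume_bounds(caster)
    return "visible" if camera_hzb_test(bounds) else "culled"
```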



  3. Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes


Characteristics of an occlusion volume:

  1. Conservativeness – Doesn’t extend beyond the surface of the mesh
  2. Simplicity – The occlusion volume is made of very few triangles or is fast to render
  3. Volume Conservation – Closely matches the original mesh’s volume
  4. Dynamic – Some games have large movable occluders or destroyable walls


Normal methods of simplifying a mesh, such as typical triangle simplification, cause problems in both the area of conservativeness and volume conservation. In some cases you can use the physics collision mesh if available. One thing to be aware of is that when the physics volume is larger than the visual mesh, it can cause false occlusions leading to things popping into view. Also, not every mesh needs a physics volume, nor are physics volumes always meshes. Will Vale's talk from GDC 2011 covered using physics collision meshes pretty thoroughly for HZB-OC; you should check it out. [Using the collision mesh directly can cause errors, because collision volumes are generally larger than the corresponding visual meshes.]




  1. Find all the voxels completely inside a mesh


The voxelization algorithm I ended up using comes from the paper Fast Parallel Surface and Solid Voxelization on GPUs [Schwarz 2010]. I used section 3.1 for surface voxelization and section 4.1 for solid voxelization. Both are required because a voxel is considered part of the solid volume if the center of the voxel is inside the mesh. We want the entire voxel to be inside the mesh, though, so you have to remove the excess voxels from the solid set using the surface set. [Source of the method.]



My first implementation was in C++ and entirely on the CPU, which took about 20 seconds to run on a mesh with ~10,000 triangles and a 50³ voxel volume. In the version I’m presenting here I moved to C#; when I did that, the time went up to about 60 seconds. So I ended up porting a similar version of the CPU implementation to the GPU using OpenCL (Cloo); on the same data it ran in about 6 seconds on my Nvidia 540M, with no attempt to optimize. [CPU vs. GPU performance difference.]


One unfortunate limitation of Schwarz’s solid voxelization algorithm is that it requires watertight meshes. The reason for this is that you’re shooting a ray down a column of voxels until you intersect with a triangle. You’re then xor’ing the bit for each voxel, signaling that this column of voxels is inside or outside. Voxels that are inside a mesh will be xor’ed an odd number of times, whereas outside voxels will be xor’ed an even number of times. So if there is a hole in the mesh, you’d leave an incomplete trail of xor’s in that column, leaving every voxel from the last triangle to the bounding box marked as ‘inside’. [A limitation of the algorithm.]
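The XOR parity rule for one voxel column can be sketched as follows (a toy version; `hit_depths` marks the voxel indices where the ray crosses a triangle):

```python
def solid_voxelize_column(depth, hit_depths):
    """Classify one column of `depth` voxels as inside/outside by ray
    parity: every surface crossing flips the 'inside' flag, so voxels
    behind an odd number of crossings are inside. A hole in the mesh
    (a missing crossing) leaves the rest of the column wrongly marked
    inside, which is exactly the failure mode described above."""
    inside = [False] * depth
    flag = False
    for z in range(depth):
        if z in hit_depths:  # crossing a surface flips the parity
            flag = not flag
        inside[z] = flag
    return inside
```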


Two papers I’ve run across that might contain solutions to this problem are An Automatic Hole-Filling Algorithm for Polygon Meshes and Complete Polygonal Scene Voxelization. [Possible solutions to the problem.]


  2. Find the voxel at the densest point in the volume



  3. Expand a box from this point until all sides collide with the mesh surface or another box


Once you’ve found the densest voxel, you create a 1³-voxel-sized box at that location. You then proceed to iteratively expand each side of the box in voxel space until you can’t expand any side without entering an empty voxel or another box that has already been placed. [Starting from the densest voxel, grow the largest box possible.]


  4. Repeat 2-3 until you’ve consumed X% of the total volume



  5. Filter small or useless boxes (Unimplemented)



  6. Use a Constructive Solid Geometry (CSG) algorithm to merge the boxes you create




  4. Oxel: A Tool for Occluder Generation







































Android Terrain Rendering



To my surprise, I found out that the GPU (PowerVR SGX 540) in my venerable Nexus S (2010) supports vertex texture fetch (VTF). That is, accessing texture pixels in the vertex shader — a very useful feature for terrain rendering. About a year ago, when I started investigating terrain rendering on Android devices, I did some searching for VTF support on Android devices and figured out it wasn’t there yet (similar to the situation years ago when desktop OpenGL 2.0 was released with support for texture sampling in GLSL vertex shaders, but most GL implementations just reported GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS to be zero). Now I don’t know how I missed it on my own phone; maybe there was some Android update with updated GPU drivers during the last year? I have no idea how many other devices support it now. Hopefully, the newest ones with OpenGL ES 3 all support it. I wouldn’t be surprised if among GLES 2 devices only PowerVR + Android 4+ combinations supported it.

[OpenGL ES 3.0 fully supports vertex texture fetch (VTF).]




Anyway, let’s focus on terrain rendering – rough outline:


  • Put the entire height-map into a texture. [Prepare the height-map texture.]
  • Have a small 2D grid (say 16×16 or 32×32) geometry ready for rendering terrain tiles. [Prepare the 2D grid.]
  • Build a quad tree over the terrain. The root node covers the entire terrain and each child then covers 1/4th of its parent’s area. [Build a quadtree over the terrain.]
  • Now we can start rendering; do this every frame: [Render.]
    • Traverse the quadtree starting from the root. [Traverse the quadtree.]
    • For each child node, test if the geometry grid provides sufficient detail for rendering the area covered by this node: [Test each node for sufficient detail.]
      • YES it does: mark this node for rendering and end traversal of this subtree. [Mark the node for rendering and stop descending.]
      • NO it does not: continue traversal and test the children of this node (unless we’re at a leaf already). [Keep descending into the children.]
    • Now take this list of marked nodes and render them. The same 2D grid is used to render each tile: it’s scaled according to the tile’s covered area and its vertices are displaced by height values read from the texture. [Render all marked nodes.]
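The traversal in the steps above can be sketched as follows (a minimal recursive version; `lod_sufficient` is an assumed error-metric callback, and nodes are plain dicts with a `children` list that is empty at the leaves):

```python
def select_tiles(node, camera, lod_sufficient, selected):
    """Collect the quadtree nodes to render this frame: if the shared
    grid gives enough detail for this node's area (or the node is a
    leaf), mark it for rendering and stop; otherwise recurse into the
    four children."""
    if lod_sufficient(node, camera) or not node["children"]:
        selected.append(node)  # render this tile with the shared 2D grid
        return
    for child in node["children"]:
        select_tiles(child, camera, lod_sufficient, selected)
```

Each selected node is then drawn with the same grid, scaled to the node's area, with heights fetched in the vertex shader via VTF.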


This is basically what I originally wanted for Multilevel Geomipmapping years ago but couldn't do in the end because of the state of VTF support on desktop at that time. 【This was the original idea, but it couldn't be implemented at first because of the state of VTF support.】


So what exactly is the benefit of VTF over, let's say, geomipmapping here? 【What does VTF buy us?】


The main benefit is the ability to get the height of the terrain at any position (and multiple times) when processing each tile vertex. In traditional geomipmapping, even if you can move tile vertices around, it's no use since you have only one fixed height value available. With VTF, you can modify a vertex's position within the tile as you like and still get the correct height value. 【The benefit is that the height of every vertex can easily be fetched and modified.】

This greatly simplifies tasks like connecting neighboring tiles with different levels of detail. No ugly skirts or special stitching strips of geometry are needed, as you can simply move edge points around in the vertex shader. Geomorphing solutions also become usable without much work. And you can display larger terrains as well: with geomipmapping you always have to draw a fixed number of tiles (the visible leaves), a number that goes up fast when you enlarge the terrain, whereas VTF may allow you to draw a fixed number of tiles regardless of the actual terrain size (as distant tiles cover a much larger area compared to geomipmap tiles with a fixed area). One more benefit: terrain normals can be calculated inside the shaders from neighboring height values.
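To illustrate why per-vertex height access makes stitching trivial, here is the snapping rule as a hypothetical CPU-side helper (in the renderer the same arithmetic would run per vertex in the vertex shader): on an edge shared with a neighbor at half the resolution, each odd vertex takes the average height of its two even neighbors, so both tiles emit identical edge geometry.

```java
class EdgeSnap {
    // edgeHeights: heights of the fine tile's vertices along the shared edge
    // (odd length; even indices coincide with the coarse neighbor's vertices).
    static float[] snapEdgeHeights(float[] edgeHeights) {
        float[] snapped = edgeHeights.clone();
        // Every odd vertex is collapsed onto the coarse neighbor's edge line.
        for (int i = 1; i < snapped.length - 1; i += 2)
            snapped[i] = 0.5f * (edgeHeights[i - 1] + edgeHeights[i + 1]);
        return snapped;
    }
}
```

With VTF the fine tile can do this on the fly because it can re-sample the heightmap at the neighbors' positions; without it, the single baked-in height per vertex makes the snap impossible in the shader.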


Finally, since the heightmap is now a regular texture, you get filtering and access to compressed texture formats to stuff more of the terrain into memory.






A few weeks ago I had some sort of an urge to try game and/or graphics development on Android. I've been doing some Android application development lately, so I already had all the needed tools set up. However, what I wanted was some cross-platform engine or framework so that I could develop mostly on the desktop. Working with emulators and physical devices is tolerable when doing regular app development, where of course you want to use the platform's native UI framework. For games or other hardware-stressing apps I really wouldn't like to work like that. The emulator is useless as it doesn't support OpenGL ES 2.0 and is generally very slow. And it's no pleasure with a physical device either: grabbing it off the table all the time, apk upload and installation takes time, etc. So I started looking for some solutions. 【Iterating on graphics code directly on a mobile device is far too slow.】


The Search


Initially, I considered using something like Oxygene. It uses Object Pascal syntax and can compile to .NET bytecode and also Java bytecode (and use .NET and Java libs). However, I would still need to use some framework for graphics, input, etc., so I would be locked into the .NET or Java world anyway. I'm somewhat inclined to pick up Oxygene in the future for something (there's also a native OSX and iOS target in the works), but not right now. Also, it's certainly not cheap for just a few test programs. Then there is Xamarin, Mono for Android and iOS (MonoGame should work here?), but that's even more expensive. When you look at Android game engines and frameworks specifically, you end up with things like AndEngine, Cocos2D-X, and libGDX. 【Oxygene alone isn't enough.】


After a little bit of research, I settled on libGDX. It is a Java framework with some native parts (for tasks where JVM performance may be inadequate) currently targeting desktop (Win/OSX/Linux), Android, and HTML5 (JavaScript + WebGL, in fact). Graphics are based on OpenGL (desktop) and OpenGL ES (Android and web). The great thing is that you can do most of the development in the desktop Java version with faster debugging and code hot swap. One can also use OpenGL commands directly, which is a must (at least for me). 【libGDX: a cross-platform game development framework based on OpenGL.】


The Test


I wanted to do some test program first, and I decided to convert the glSOAR terrain renderer to Java and OpenGL ES (GLES). The original is written in Object Pascal using desktop OpenGL. The plan was to convert it to Java and OpenGL ES with as few modifications as possible. At least for 3D graphics, libGDX is a relatively thin layer on top of GLES. You have some support classes like texture, camera, and vertex buffer, but you still need to know what a projection matrix is, bind resources before rendering, etc. Following is a listing of a few ideas, tasks, problems, etc. from the conversion (focusing on Java and libGDX). 【The original glSOAR is a desktop OpenGL terrain renderer; the goal here is to port it to mobile.】




Using Eclipse, I created the project structure for libGDX based projects. It is quite neat actually: you have a main Java project with all the shared platform-independent code and then one project for each platform (desktop, Android, …). Each platform project has just a few lines of starter code that instantiates your shared main project for the respective platform (of course you can do more here). 【Project setup.】


Two problems here though. Firstly, changes in the main project won't trigger a rebuild of the Android project, so you have to trigger it manually (e.g. by adding/deleting an empty line in the Android project code) before running on Android. This is actually a bug in ADT Eclipse plugin version 20, so hopefully it will be fixed in v21 (you can star this issue). 【Problem 1: building and debugging is inconvenient.】


The second issue is asset management, but that is easy to fix. I want to use the same textures and other assets in the version for each platform, so they should be located in some directory shared between all projects (like inside the main project, or a few directory levels up). The thing is that for Android all assets are required to be located in the assets subdirectory of the Android project. The recommended solution for libGDX here is to store the assets in the Android project and create links (link folder or link source) in Eclipse for the desktop project pointing to the Android assets. I didn't really like that. I want my stuff to be where I want it to be, so I created some file system links instead. I used the mklink command in Windows to create junctions (as Total Commander wouldn't follow symlinks properly): 【Problem 2: asset management is inconvenient.】


d:\Projects\Tests\glSoarJava> mklink /J glSoarAndroid\assets\data Data

Junction created for glSoarAndroid\assets\data <<===>> Data

d:\Projects\Tests\glSoarJava> mklink /J glSoarDesktop\data Data

Junction created for glSoarDesktop\data <<===>> Data


Now I have a shared Data folder at the same level as the project folders. In the future though, I guess there will be some platform-specific assets needed (like different texture compressions, sigh).




Too bad Java does not have any user-defined value types (like struct in C#). I split the TVertex type into three float arrays: one for the 3D position (at least value-type vectors would be nice; is vertices[idx * 3 + 1] instead of vertices[idx].y really more readable?) and the rest for the SOAR LOD-related parameters. Making TVertex a class and spawning millions of instances didn't seem like a good idea even before I began, and it would be impossible to pass the positions to OpenGL that way anyway. 【A custom data layout for positions and the SOAR LOD parameters.】
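The structure-of-arrays split can be sketched like this (field names are my own; I am assuming the SOAR per-vertex parameters are an error term and a bounding-sphere radius, as in the usual formulation):

```java
class VertexData {
    final float[] positions;  // x,y,z interleaved: vertex i occupies [i*3 .. i*3+2]
    final float[] errors;     // SOAR per-vertex error term (assumed)
    final float[] radii;      // SOAR per-vertex bounding-sphere radius (assumed)

    VertexData(int vertexCount) {
        positions = new float[vertexCount * 3];
        errors = new float[vertexCount];
        radii = new float[vertexCount];
    }

    void setPosition(int i, float x, float y, float z) {
        positions[i * 3] = x;
        positions[i * 3 + 1] = y;
        positions[i * 3 + 2] = z;
    }

    float y(int i) {
        return positions[i * 3 + 1];  // the vertices[idx].y equivalent
    }
}
```

The flat positions array can be handed to OpenGL directly, while the LOD parameters stay CPU-side in parallel arrays indexed by the same vertex index.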


Things like data for VBOs, vertex arrays, etc. are passed to OpenGL in Java using descendants of the (direct) Buffer class. The same goes for stuff like generating IDs with glGen[Textures|Buffers|…] and basically everything that takes pointer-to-memory parameters. Of course, it makes sense in an environment where you cannot just touch memory as you want. Still, it is kind of annoying for someone not used to it. At least libGDX comes with some buffer utils, including fast copying through native JNI code. 【How data is stored and passed to GL.】
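The typical pattern looks like this in plain java.nio (no libGDX utilities): allocate a direct, native-ordered ByteBuffer, view it as a FloatBuffer, fill it, and flip it before handing it to a call such as glBufferData.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class BufferUtil {
    // Wrap a float[] in a direct FloatBuffer suitable for passing to OpenGL.
    static FloatBuffer toDirectFloatBuffer(float[] data) {
        FloatBuffer fb = ByteBuffer
                .allocateDirect(data.length * 4)    // 4 bytes per float
                .order(ByteOrder.nativeOrder())     // GL expects native byte order
                .asFloatBuffer();
        fb.put(data);
        fb.flip();                                  // position 0, limit = length
        return fb;
    }
}
```

Forgetting nativeOrder() or flip() are the two classic mistakes here; the first silently corrupts data on some platforms, the second makes GL see an empty buffer.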




The roots of the SOAR terrain rendering method are quite old today and come from the times when it was okay to spend CPU time to limit the number of triangles sent to the GPU as much as possible. That has been the complete opposite of the situation on PCs for a long time (unless you had an old integrated GPU, that is). I guess today that is true for the hardware in mobile devices as well (at least the new ones). And there will also be some Java/Dalvik overhead… 【The original SOAR trade-offs don't suit modern hardware.】


Anyway, this is just an exercise, so the end result may very well be useless; the experience gained during development is what counts.







I have added an experimental VTF based terrain renderer to the Terrain Rendering Demo for Android testbed and it looks promising. Stitching of the tiles works flawlessly. More work is needed on selecting nodes for rendering (render a node or split it into children?). Currently, there's only a simple distance based metric, but I want to devise something that takes the classical "screen-space error" into account. And maybe some fiddling with geomorphing on top… 【Go look at the demo directly.】



Terrain Rendering Demo for Android


I finally got around to releasing the Android terrain rendering demo I've been working on for the last few months (a few moments here and there). I did the bulk of the work in November 2012, partly described in the posts Porting glSOAR to Android and OpenGL ES, Part 1 and Porting glSOAR to Android and OpenGL ES, Part 2 (the third part is still just a draft). 【The terrain demo is finally finished.】


Anyway, here's the current version, which supports the Geomipmapping and SOAR terrain rendering methods. Gritty details about the internals will follow in some future post. There is also a nearly identical desktop version for reference, an advantage of using libGDX for this. 【The demo supports Geomipmapping and SOAR terrain rendering.】













glSOAR is an Object Pascal implementation of the SOAR terrain rendering algorithm using OpenGL. It can be compiled with the FPC or Delphi compiler and works in Windows and Linux (benchmarking mode only). No fancy OpenGL extensions are required; it should work with any OpenGL 1.2 or newer drivers. 【glSOAR is an OpenGL implementation of the SOAR algorithm.】


The SOAR algorithm was published by Peter Lindstrom and Valerio Pascucci in 2001; papers and the original C implementation can be found online. SOAR starts each frame with a coarse terrain mesh and uses view-dependent refinement (standard longest-edge bisection) to calculate the mesh to be rendered, so the stress is on the CPU. 【SOAR's basic idea is view-dependent terrain mesh refinement.】









Terrain Simplification Simplified: A General Framework for View-Dependent Out-of-Core Visualization


Abstract—This paper describes a general framework for out-of-core rendering and management of massive terrain surfaces. The two key components of this framework are: view-dependent refinement of the terrain mesh; and a simple scheme for organizing the terrain data to improve coherence and reduce the number of paging events from external storage to main memory. 【Two key techniques: view-dependent refinement, and terrain data layout optimization.】




























Phase-Functioned Neural Networks for Character Control













Input x: Trajectory Positions, Directions, Heights; Previous Joint Positions, Velocities…



















A phase-functioned NN means we use a special kind of NN in which different motions get different network weights, which avoids blending motions together; see the paper for the details.

What we end up with is a simple animation-driven model that replaces the state machine and blend tree.











  1. The entire database must be kept in memory
  2. The data must be processed by hand
  3. Some complex acceleration schemes are required


  1. Virtually unlimited data capacity (arbitrary motions)
  2. Fast, real-time, with low memory usage







  1. Special handling is needed to resolve ambiguity
  2. All possible input trajectories must be provided up front
  3. Multi-layer CNNs are still too slow for games




  1. Only holds up for 10
  2. Floating cannot be avoided
  3. Ambiguity cannot be avoided


  1. How do we handle large-scale data?
  2. How do we resolve the ambiguity problem?
  3. How do we make the generated results look good?






  1. We want terrain data and motion data to be learned together
  2. But capturing motion and terrain at the same time is troublesome
  3. So: build a heightmap database, then fit each motion clip to a patch of a heightmap



  1. The final results look good
  2. The character trajectory uses a windowed representation
  3. Gait, terrain height, and other information are added on top



The phase is a scalar in [0, 2π] representing the current pose within the current locomotion cycle.

Figure: the input is the current frame's pose, the output is the next frame's pose, and the network's weights come from the phase function.





  1. Feed in the phase to generate the weights
  2. Run the NN with those weights on the input to get the output
  3. Measure the output error
  4. Backpropagate through the NN and the phase function to update the control point values
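Step 1, generating weights from the phase, is a cubic Catmull-Rom spline through four sets of control-point weights laid out around the phase circle. A minimal sketch, with each weight set flattened to a float[] (names are mine, not the paper's code):

```java
class PhaseFunction {
    // control[j][i]: i-th network weight at control point j (4 points in the
    // paper), spaced evenly around the phase circle [0, 2*pi).
    static float[] weights(float phase, float[][] control) {
        int k = control.length;
        float t = k * phase / (float) (2 * Math.PI);  // segment-space position
        int p1 = ((int) Math.floor(t)) % k;
        int p0 = (p1 + k - 1) % k;                    // wrap around the circle
        int p2 = (p1 + 1) % k;
        int p3 = (p1 + 2) % k;
        float w = t - (float) Math.floor(t);          // fraction within the segment
        float[] out = new float[control[0].length];
        for (int i = 0; i < out.length; i++) {
            float a0 = control[p0][i], a1 = control[p1][i];
            float a2 = control[p2][i], a3 = control[p3][i];
            // Catmull-Rom cubic through the four surrounding control points
            out[i] = a1
                   + w * 0.5f * (a2 - a0)
                   + w * w * (a0 - 2.5f * a1 + 2f * a2 - 0.5f * a3)
                   + w * w * w * (1.5f * (a1 - a2) + 0.5f * (a3 - a0));
        }
        return out;
    }
}
```

Because the spline is differentiable in both the phase and the control points, the error can be backpropagated through it to the control-point weights, which is exactly step 4.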





  1. The phase is constrained to [0, 2π], so within this range the weights can be precomputed
  2. At runtime, interpolate between the precomputed results
  3. This gives a balance between speed and memory
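That trade-off can be sketched as follows. The names and the use of linear interpolation are my simplification; the cheapest variant would just pick the nearest precomputed entry, and a finer table costs more memory for less interpolation error.

```java
import java.util.function.Function;

class PrecomputedPhase {
    // Sample an exact-but-slow phase function at n evenly spaced phases.
    static float[][] precompute(int n, Function<Float, float[]> phaseFn) {
        float[][] table = new float[n][];
        for (int i = 0; i < n; i++)
            table[i] = phaseFn.apply((float) (2 * Math.PI * i / n));
        return table;
    }

    // Runtime lookup: linearly blend the two nearest precomputed weight sets.
    static float[] lookup(float phase, float[][] table) {
        int n = table.length;
        float t = (float) (n * phase / (2 * Math.PI));
        int i0 = ((int) Math.floor(t)) % n;
        int i1 = (i0 + 1) % n;                 // wraps around the phase circle
        float w = t - (float) Math.floor(t);
        float[] out = new float[table[0].length];
        for (int j = 0; j < out.length; j++)
            out[j] = (1 - w) * table[i0][j] + w * table[i1][j];
        return out;
    }
}
```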



  1. Training the model is very time-consuming
  2. Artists get no direct feedback when editing or tweaking
  3. Results are hard to predict, and when something goes wrong it is hard to know why


  1. NNs handle huge amounts of data easily and can produce an enormous variety of results
  2. The semantic decomposition resolves the ambiguity problem
  3. The simple structure and parameterized usage make quality easy to control




  1. First, look at how the demo is implemented
  2. Then, look at how the network is handled


























































  • First, SDL initialization. Simple DirectMedia Layer (SDL) is a cross-platform development library designed to provide low-level access to audio, keyboard, mouse, joystick, and graphics hardware via OpenGL and Direct3D; it is a commonly used cross-platform support library for OpenGL work.



  • GLEW initialization

  • Resource loading

    • Options defines the settings we are allowed to tweak:

    • CameraOrbit: initial camera setup

    • LightDirectional: initial scene lighting setup


    • Character: initial character setup






    • Trajectory: initial trajectory setup


    • IK: initial IK setup



    • Shader-related setup


    • Heightmap: initial heightmap setup




      The Load() function simply reads floats one by one into vector<vector<float>> data:

      The xxx.txt files are used to generate data, while the xxx_ao.txt files are used to generate the vbo/tbo (positions, colors, etc.) and vbo_data/tbo_data (index data).

    • Areas: initial area setup


    • The PFNN model


      ArrayXf is Eigen's float array type. Under Load() the model files are loaded: lots and lots of files!


    • Loading the game world

      The load_world functions, from what I can tell, mainly do terrain labeling, so it seems the program needs these terrain labels in order to run?

  • The Game Loop
    • Input handling



    • Rendering




  • Update the camera (directly from key input)


  • Update the target direction and velocity (directly from key input)


  • Update the gait (preprocessing step 1)

    The current trajectory parameters are derived from the previous frame's trajectory parameters and the options settings.

  • Predict the future trajectory (preprocessing step 2)

    The trajectory parameters from the previous step are blended with the character parameters into the trajectory_positions_blend object.

  • Collision handling (preprocessing step 3)

    trajectory_positions_blend is adjusted using the walls information in areas.

    Here trajectory_positions_blend is also written back into trajectory.

  • Jumps (preprocessing step 4)

    trajectory is adjusted using the jump information in areas.

  • Crouch areas (preprocessing step 5)

    trajectory is adjusted using the crouch_pos information in areas.

  • Walls (preprocessing step 6)

    trajectory is adjusted directly using the walls information in areas.

  • Trajectory rotation (preprocessing step 7)

    trajectory->rotations is adjusted.

  • Trajectory heights (preprocessing step 8)

    trajectory is adjusted using values from the heightmap.

  • Input trajectory positions and directions (PFNN input step 1)

    pfnn->Xp is filled from the trajectory information.

  • Input trajectory gaits (PFNN input step 2)

    pfnn->Xp is filled from the trajectory information.

  • Input current joint positions, velocities, and rotations (PFNN input step 3)

    pfnn->Xp is filled from the trajectory information.

  • Input trajectory heights (PFNN input step 4)

    pfnn->Xp is filled from the trajectory information.

  • Perform Regression 【the core step: the model's predict】


  • Timing: this step just measures the predict time, for debugging.
  • Build Local Transform (PFNN output)



  • IK handling




  • Render Shadow
  • Render Terrain
  • Render Character
  • Render the Rest
  • Render Crouch Area
  • Render Jump Areas
  • Render Walls
  • Render Trajectory
  • Render Joints
  • UI Elements
  • PFNN Visual
  • Display UI




  • Update Past Trajectory

    The trajectory data is shifted along.

  • Update Current Trajectory


  • Collide with walls

    Trajectory collision update.

  • Update Future Trajectory

    The trajectory is updated according to the PFNN results.

  • Update Phase

  • Update Camera


















































The AI4Animation project











  • The NeuralNetwork class does only one thing: it lets the user pick the NN model. In other words, it handles UI presentation and logic, nothing else. The information an NN should contain lives in the Model class. The figure below shows the data structures stored in the model:




  • The Controller class handles input, mainly WSADQE. It also holds one very important member, the Styles array, which appears to record the style weights.
  • The Character class drives the skeleton.

The central intermediate data structure is the Trajectory class: simply an array of data points plus methods operating on the array and on single points. Each point carries a rich set of data, including various transforms and states:



  • Everything below executes only if an NN model is present.
  • Update Target Direction / Velocity


    TargetDirection = TargetDirection blended with the current direction defined by the Trajectory, weighted by TargetBlending.

    TargetVelocity = TargetVelocity blended with the Controller's input velocity, weighted by TargetBlending.

  • Update Gait

    Each style value of Trajectory.Points[RootPointIndex] = the current value blended with whether the user selected that style, weighted by GaitTransition.

  • Predict Future Trajectory

    Predicted position = interpolation between the extrapolation of (current position minus previous position) and the TargetPosition.

    Predicted style = continuation of the current style.

    Predicted direction = interpolation between the current direction and the TargetDirection.

  • Avoid Collisions


  • Input Trajectory Positions / Directions


  • Input Trajectory Gaits


  • Input Previous Bone Positions / Velocities


  • Input Trajectory Heights


  • Predict 【run the model】

  • Update Past Trajectory (points with i < RootPointIndex)

    Each point Trajectory.Points[i] is updated with the values of point i+1 (i.e. the points shift forward by one).

  • Update Current Trajectory (the point at RootPointIndex)


  • Update Future Trajectory (points with RootPointIndex+1 < i < Trajectory.Points.Length)

    Each point's new position = its old position + an interpolation between the current direction and the distance and direction blended from the model's output (several layered interpolation weights are considered here).

  • Avoid Collisions

    Same as step 5.

  • Compute Posture


    Each positions[i] = NN output * 0.5 + (the position reached by advancing the previous position along the previous direction) * 0.5;

    Each Velocities[i] = the NN output.

  • Update Posture


  • Map to Character
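The Compute Posture rule from the update list above can be written out as a standalone sketch (flattened arrays; the names are hypothetical): each joint position is an even blend of the network's prediction and the position implied by integrating the previous frame's velocity.

```java
class Posture {
    // nnPositions: joint positions predicted by the network this frame.
    // prevPositions/prevVelocities: last frame's joint state, same layout.
    static float[] blendPositions(float[] nnPositions,
                                  float[] prevPositions,
                                  float[] prevVelocities) {
        float[] out = new float[nnPositions.length];
        for (int i = 0; i < out.length; i++)
            out[i] = 0.5f * nnPositions[i]
                   + 0.5f * (prevPositions[i] + prevVelocities[i]);
        return out;
    }
}
```

The 50/50 blend damps frame-to-frame jitter in the raw network output while still letting the prediction correct drift from pure velocity integration.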