Author: sc1991327

Software Rasterizer – the visibility problem

In the second chapter of this lesson, we learned that in the third coordinate of the projected point (the point in screen space) we store the original vertex z-coordinate (the z-coordinate of the point in camera space):【From chapter two we know that we store this extra z-value.】

 
 

 
 

Finding the z-coordinate of a point on the surface of the triangle is useful when a pixel overlaps more than one triangle. And the way we find that z-coordinate is by interpolating the original vertices' z-coordinates using the barycentric coordinates that we learned about in the previous chapter. In other words, we can treat the z-coordinates of the triangle vertices as any other vertex attribute, and interpolate them the same way we interpolated colors in the previous chapter. Before we look into the details of how this z-coordinate is computed, let's first explain why we need to do so.【When several triangles overlap a pixel, we can compare z-values to decide visibility; here we look at how the z-value is handled.】

 
 

 
 

The Depth-Buffer or Z-Buffer Algorithm and Hidden Surface Removal

 
 

When a pixel overlaps a point, what we actually see through that pixel is a small area on the surface of a triangle, which for simplification we will reduce to a single point (denoted P in figure 1). Thus to each pixel covering a triangle corresponds a point on the surface of that triangle. Of course, if a pixel covers more than one triangle, we then have several of these points. The problem when this happens is to find which one of these points is actually visible. We have illustrated this concept in 2D in figure 2. We could test triangles from back to front (this technique would require sorting the triangles by decreasing depth first), but this doesn't always work when triangles intersect each other (figure 2, bottom). The only reliable solution is to compute the depth of each triangle a pixel overlaps, and then compare these depth values to find out which one is the closest to the camera. If you look at figure 2, you can see that a pixel in the image overlaps two triangles at P1 and P2. However P1's z-coordinate (Z1) is lower than P2's z-coordinate (Z2), thus we can deduce that P1 is in front of P2. Note that this technique is needed because triangles are tested in "random" order. As mentioned before we could sort the triangles in decreasing depth order, but this is not good enough. Generally, they are just tested in the order they are specified in the program, and for this reason, a triangle T1 that is closer to the camera can be tested before a triangle T2 that is further away. If we were not comparing these triangles' depths, then we would end up in this case seeing the triangle which was tested last (T2) when in fact we should be seeing T1. As mentioned many times before, this is called the visibility problem or hidden surface problem. Algorithms for ordering objects so that they are drawn correctly are called visible surface algorithms or hidden surface removal algorithms. The depth-buffer or z-buffer algorithm that we are going to study next belongs to this category of algorithms.【When one point maps to a pixel, we can rasterize it directly (figure 1); when several points map to the same pixel, the reliable approach is to compare their depths and draw the closest one (figure 2).】

 
 

 
 

 
 

One solution to the visibility problem is to use a depth-buffer or z-buffer. A depth-buffer is nothing more than a two-dimensional array of floats that has the same dimensions as the frame-buffer and that is used to store the objects' depth as the triangles are being rasterized. When this array is created, we initialize each pixel in the array with a very large number. If we find that a pixel overlaps the current triangle, we do as follows (see the sketch after this list):【The solution to visibility is a depth-buffer. It corresponds one-to-one with the frame-buffer and stores the depth of whatever each pixel currently displays; it is initialized to a very large value. When a triangle overlaps a pixel, follow the steps below.】

  • We first compute the z-coordinate or depth of the point on the triangle that the pixel overlaps.【Compute the depth of the point on the triangle covered by the pixel.】
  • We then compare that current triangle depth with the value stored in the depth buffer for that pixel.【Compare the depth of P on this triangle with the depth stored in the depth-buffer for that pixel.】
  • If we find that the value stored in the depth-buffer is greater than the depth of the point on the triangle, then the new point is closer to the observer or the camera than the point stored in the depth buffer at that pixel location. The value stored in the depth-buffer is then replaced with the new depth, and the frame-buffer is updated with the current triangle color. On the other hand, if the value stored in the depth-buffer is smaller than the current depth sample, then the triangle that the pixel overlaps is hidden by the object whose depth is currently stored in the depth-buffer.【If the new depth is smaller than the stored depth, replace it and update the color; otherwise the point is occluded.】
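A minimal C++ sketch of this per-pixel test (the names and buffer layout are illustrative, not the lesson's actual code):

// Depth test for one pixel (x, y). 'z' is the interpolated depth of the point
// on the current triangle; 'depthBuffer' is initialized to a very large value
// and 'frameBuffer' to the background color, both of size width * height.
#include <cstdint>

struct Color { uint8_t r, g, b; };

void shadePixel(int x, int y, float z, const Color &triangleColor,
                float *depthBuffer, Color *frameBuffer, int width)
{
    if (z < depthBuffer[y * width + x]) {            // closer than what we stored so far
        depthBuffer[y * width + x] = z;              // keep the new depth
        frameBuffer[y * width + x] = triangleColor;  // and the triangle's color
    }
    // otherwise the point is hidden: leave both buffers untouched
}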

 
 

 
 

 
 

Finding Z by Interpolation

 
 

Hopefully the principle of the depth-buffer is simple and easy to understand. All we need to do now is explain how depth values are computed. First let's repeat one more time what that depth value is. When a pixel overlaps a triangle, it actually overlaps a small area on the surface of the triangle, which as mentioned in the introduction we will reduce to a point for simplification (point P in figure 1). What we want to find here is this point's z-coordinate. As also mentioned earlier in this chapter, if we know the triangle vertices' z-coordinates (which we do, they are stored in the projected points' z-coordinates), all we need to do is interpolate these coordinates using P's barycentric coordinates (figure 4):【The depth-buffer algorithm itself is simple; what we still need to work out is how P's depth is computed. We obtain it by interpolation; but is interpolating with the barycentric coordinates directly valid?】

 
 

 
 

 
 

Technically this sounds reasonable, though unfortunately it doesn't work. Let's see why. The problem is not in the formula itself, which is perfectly fine. The problem is that once the vertices of a triangle are projected onto the canvas (once we have performed the perspective divide), then z, the value we want to interpolate, doesn't vary linearly anymore across the surface of the 2D triangle. This is easier to demonstrate with a 2D example.【It doesn't work, because after projection the z-value no longer varies linearly across the 2D triangle.】

 
 

The secret lies in figure 4. Imagine that we want to find the "image" of a line defined in 2D space by two vertices V0 and V1. The canvas is represented by the horizontal green line. This line is one unit away (along the z-axis) from the coordinate system origin. If we trace lines from V0 and V1 to the origin, then we intersect the green line at two points (denoted V0′ and V1′ in the figure). The z-coordinate of these points is 1 since they lie on the canvas, which is 1 unit away from the origin. The x-coordinate of the points can easily be computed using perspective projection. We just need to divide the original vertex x-coordinates by their z-coordinates. We get:【As in figure 4, what actually matters is where P sits along the original segment; interpolating with that ratio would be correct.】

 
 

 
 

The goal of the exercise is to find the z-coordinate of P, a point on the line defined by V0 and V1. In this example, all we know about P is the position of its projection P′ on the green line. The coordinates of P′ are {0,1}. The problem is similar to trying to find the z-coordinate of a point on the triangle that a pixel overlaps. In our example, P′ would be the pixel and P would be the point on the triangle that the pixel overlaps. What we need to do now is compute the "barycentric coordinate" of P′ with respect to V0′ and V1′. Let's call the resulting value λ. Like our triangle barycentric coordinates, λ is also in the range [0,1]. To find λ, we just need to take the distance between V0′ and P′ (along the x-axis), and divide this number by the distance between V0′ and V1′. If linearly interpolating the z-coordinates of the original vertices V0 and V1 using λ to find the depth of P works, then we should get the number 4 (we can easily see by just looking at the illustration that the coordinates of P are {0,4}). Let's first compute λ:【Our goal is to find P's z-coordinate. We can still compute the ratio λ, using the formula below.】

 
 

 
 

If we now linearly interpolate V0's and V1's z-coordinates to find P's z-coordinate we get:【The linear interpolation for P's z-coordinate is written below.】

 
 

 
 

Clearly this is not the value we expect! Interpolating the original vertices' z-coordinates, using P's "barycentric coordinates" or λ in this example, to find P's z-coordinate doesn't work. Why? The reason is simple. Perspective projection preserves lines, but does not preserve distances. It's quite easy to see in figure 4 that the ratio of the distance between V0 and P over the distance between V0 and V1 (0.666) is not the same as the ratio of the distance between V0′ and P′ over the distance between V0′ and V1′ (0.833). If λ were equal to 0.666 it would work fine, but here is the problem: it's equal to 0.833 instead! So, how do we find the z-coordinate of P?【Clearly, as figure 4 shows, this does not give the z-value of P we want. How do we deal with it?】
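To make the numbers concrete (assuming, as the figure suggests, V0 = {-2,2} and V1 = {1,5}, which is consistent with P = {0,4} and with the two ratios quoted above): the projected points are V0′ = -2/2 = -1 and V1′ = 1/5 = 0.2, so

λ = (0 - (-1)) / (0.2 - (-1)) = 1 / 1.2 ≈ 0.833

while the true parameter along the original segment is (4 - 2) / (5 - 2) ≈ 0.666. Linear interpolation with λ gives Z = 2 + 0.833 × (5 - 2) = 4.5, not the expected 4, hence the need for the different approach described next.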

 
 

The solution to the problem is to compute the inverse of P's z-coordinate by interpolating the inverse of the vertices V0's and V1's z-coordinates using λ. In other words, the solution is:

1 / P.z = (1 / V0.z) × (1 - λ) + (1 / V1.z) × λ

【The solution is to compute the inverse of P's z-coordinate, which gives the formula above.】

 
 

 
 

 
 

If we now take the inverse of this result, we get for P's z-coordinate the value 4, which is the correct result! As mentioned before, the solution is to linearly interpolate the inverse of the vertices' z-coordinates using barycentric coordinates, and invert the resulting number to find the depth of P (its z-coordinate). In the case of our triangle, the formula is:

1 / P.z = λ0 × (1 / V0.z) + λ1 × (1 / V1.z) + λ2 × (1 / V2.z)

【If you plug in the values, the result is indeed correct. As stated, the solution is to linearly interpolate the inverse of the vertices' z-coordinates using the barycentric coordinates and then invert the result to find P's depth. The triangle formula is above.】
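In code, the triangle case could look like this small sketch (the names are illustrative; lambda0/1/2 are the pixel's barycentric coordinates and v0z/v1z/v2z the projected vertices' z-coordinates):

// Depth of the point covered by the pixel, obtained by interpolating 1/z.
float interpolateDepth(float v0z, float v1z, float v2z,
                       float lambda0, float lambda1, float lambda2)
{
    // lambda0 + lambda1 + lambda2 = 1 for a point inside the triangle
    float oneOverZ = lambda0 / v0z + lambda1 / v1z + lambda2 / v2z;
    return 1.f / oneOverZ;
}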

 
 

 
 

Let's now look into this problem more formally. Why do we need to interpolate the vertices' inverse z-coordinates? The formal explanation is a bit complicated and you can skip it if you want. Let's consider a line in camera space defined by two vertices whose coordinates are denoted (X0,Z0) and (X1,Z1). The projections of these vertices on the screen are denoted S0 and S1 respectively (in our example, we will assume that the distance between the camera origin and the canvas is 1, as shown in figure 5). Let's call S a point on the line defined by S0 and S1; S corresponds to a point P on the original line, whose coordinates are (X,Z). S itself lies on the screen, at Z = 1 (we assume in this example that the screen, or the vertical line on which the points are projected, is 1 unit away from the coordinate system origin). Finally, the parameters t and q are defined such that:【Let's think about why we need to interpolate the inverse z-coordinate. Working in the XZ plane, consider a line in camera space whose projection on the screen is the segment S0S1; the parameters are defined as shown below.】

 
 

 
 

That is, the point P on the original line and the corresponding screen point S can be written as:

X = X0 × (1 - t) + X1 × t and Z = Z0 × (1 - t) + Z1 × t

or, equivalently:

X = X0 + t × (X1 - X0) and Z = Z0 + t × (Z1 - Z0)

The interpolation on the screen reads S = S0 × (1 - q) + S1 × q. S itself is one-dimensional; there is no z-value to interpolate on the screen line.

Because the screen line lies one unit away from the origin, the projected coordinates are obtained by dividing x by z:

S0 = X0 / Z0, S1 = X1 / Z1, S = X / Z

Replacing X, S, S0 and S1 by the expressions above gives:

(X0 + t × (X1 - X0)) / (Z0 + t × (Z1 - Z0)) = (X0 / Z0) × (1 - q) + (X1 / Z1) × q

Putting the right-hand side over the common denominator Z0 × Z1:

(X0 + t × (X1 - X0)) / (Z0 + t × (Z1 - Z0)) = (X0 × Z1 × (1 - q) + X1 × Z0 × q) / (Z0 × Z1)

Cross-multiplying and solving for t, the terms involving X0 and X1 factor out and cancel, so t can be expressed as:

t = (Z0 × q) / (Z1 × (1 - q) + Z0 × q)

Substituting this t back into the formula for Z gives:

Z = Z0 + t × (Z1 - Z0) = (Z0 × Z1) / (Z1 × (1 - q) + Z0 × q) = 1 / ((1 - q) / Z0 + q / Z1)

and therefore:

1 / Z = (1 / Z0) × (1 - q) + (1 / Z1) × q

This is the formula that gives the z-value of P: it is the inverse of the depth that varies linearly along the projected segment, which is why we interpolate 1/z (with q here, or with the barycentric coordinates in the triangle case) and invert the result.

Software Rasterizer – the rasterization stage

 
 

Rasterization: What Are We Trying to Solve?

 
 

In the previous chapter, we learned how to perform the first step of the rasterization algorithm in a way, which is to project the triangle from 3D space onto the canvas. This definition is not entirely accurate in fact, since what we actually did was to transform the triangle from camera space to screen space, which as mentioned in the previous chapter is also a three-dimensional space. However the x- and y-coordinates of the vertices in screen space correspond to the position of the triangle vertices on the canvas, and by converting them from screen space to NDC space and then finally from NDC space to raster space, what we actually get in the end are the vertices' 2D coordinates in raster space. Finally, we also know that the z-coordinate of the vertices in screen space holds the original z-coordinate of the vertices in camera space (inverted so that we deal with positive numbers rather than negative ones).【The previous chapter projected the triangle from 3D space onto the canvas, ending up with 2D raster-space coordinates plus a z-value.】

 
 

What we need to do next is to loop over the pixels in the image and find out if any of these pixels overlap the "projected image of the triangle" (figure 1). In graphics API specifications, this test is sometimes called the inside-outside test or the coverage test. If they do, we then set the pixel in the image to the triangle's color. The idea is simple but of course, we now need to come up with a method to find if a given pixel overlaps a triangle. This is essentially what we will study in this chapter. We will learn about the method that is typically used in rasterization to solve this problem. It uses a technique known as the edge function, which we are now going to describe and study. This edge function is also going to provide valuable information about the position of the pixel within the projected image of the triangle, known as barycentric coordinates. Barycentric coordinates play an essential role in computing the actual depth (or the z-coordinate) of the point on the surface of the triangle that the pixel overlaps. We will also explain what barycentric coordinates are in this chapter and how they are computed.【Next we loop over the pixels and find those overlapping the projected triangle; graphics APIs call this the inside-outside or coverage test. The technique used is the edge function, which also yields barycentric coordinates.】

At the end of this chapter, you will be able to actually produce a very basic rasterizer. In the next chapter, we will look into the possible issues with this very naive implementation of the rasterization algorithm. We will list what these issues are as well as study how they are typically addressed.【A basic implementation will be given here; common issues and their fixes come next.】

 
 

A lot of research has been done to optimize the algorithm. The goal of this lesson is not to teach you how to write or develop an optimized and efficient renderer based on the rasterization algorithm. The goal of this lesson is to teach the basic principles of the rendering technique. Don't think though that the techniques we present in these chapters are not actually used. They are used to some extent, but how they are actually implemented, either on the GPU or in a CPU version of a production renderer, is just likely to be a highly optimized version of the same idea. What is truly important is to understand the principle and how it works in general. From there, you can study on your own the different techniques which are used to speed up the algorithm. But the techniques presented in this lesson are generic and make up the foundations of any rasterizer.【A lot of research has gone into optimizing this step, but this lesson only teaches the basic method; you can study the optimizations on your own.】

 
 

【Keep in mind】

Keep in mind that drawing a triangle (since the triangle is the primitive we will use in this case) is a two-step problem:

  • We first need to find which pixels overlap the triangle.【Find which pixels the triangle overlaps.】
  • We then need to define which colors the pixels overlapping the triangle should be set to, a process that is called shading.【Decide the pixels' colors.】

 
 

 
 

The Edge Function

 
 

As mentioned above, there are several possible methods to find if a pixel overlaps a triangle. It would be good to document older techniques, but in this lesson we will only present the method that is generally used today. This method was presented by Juan Pineda in 1988 in a paper called "A Parallel Algorithm for Polygon Rasterization" (see references in the last chapter).【This is the method we will use.】

 
 

Before we look into Pineda's technique itself, we will first describe the principle of his method. Let's say that the edge of a triangle can be seen as a line splitting the 2D plane (the plane of the image) in two (as shown in figure 2). The principle of Pineda's method is to find a function, which he called the edge function, so that when we test on which side of this line a given point is (the point P in figure 2), the function returns a negative number when it is to the left of the line, a positive number when it is to the right of this line, and zero when the point is exactly on the line.【Each edge can be seen as a line splitting the plane in two. The method defines an edge function; testing which side of the line a point is on returns a negative number on the left, a positive number on the right, and zero on the line, as in the figure below.】

 
 

 
 

In figure 2, we applied this method to the first edge of the triangle (defined by the vertices v0-v1; be careful, the order is important). If we now apply the same method to the two other edges (v1-v2 and v2-v0), we can then clearly see that there is an area (the white triangle) within which all points are positive (figure 3). If we take a point within this area, then we will find that this point is to the right of all three edges of the triangle. If P is in fact a point in the centre of a pixel, we can then use this method to find if the pixel overlaps the triangle. If for this point we find that the edge function returns a positive number for all three edges, then the pixel is contained in the triangle (or may lie on one of its edges). The function Pineda uses also happens to be linear, which means that it can be computed incrementally, but we will come back to this point later.【The figure above considers a single edge; here we consider all three edges, as in the figure below. With the edges taken in order, a point whose three results are all positive is inside the triangle.】

 
 

 
 

Now that we understand the principle, let's find out what that function is. The edge function is defined as follows (for the edge defined by vertices V0 and V1):【Here is the formula.】

 
 

 
 

As the paper mentions, this function has the useful property that its value is related to the position of the point (x,y) relative to the edge defined by the points V0 and V1 (a code sketch follows this list):【The possible results and their meaning are listed below.】

  • E(P) > 0 if P is to the “right” side
  • E(P) = 0 if P is exactly on the line
  • E(P) < 0 if P is to the "left" side
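As a sketch, one common way to write this function in C++ (following Pineda's definition and the sign convention above, where "right" of the edge V0→V1 is positive):

// Edge function for the edge V0->V1 evaluated at point P.
// > 0: P is on the "right" side, < 0: "left" side, == 0: exactly on the line.
struct Vec2f { float x, y; };

float edgeFunction(const Vec2f &v0, const Vec2f &v1, const Vec2f &p)
{
    return (p.x - v0.x) * (v1.y - v0.y) - (p.y - v0.y) * (v1.x - v0.x);
}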

 
 

In fact this function is mathematically equivalent to the magnitude of the cross product between the vectors (v1-v0) and (P-v0). We can also write these vectors in matrix form (presenting this as a matrix has no other interest than presenting the two vectors in a neat way):【The two vectors can also be written in matrix form.】

 
 

 
 

If we write A = (P - V0) and B = (V1 - V0), then we can also write the vectors A and B as a 2×2 matrix:

 
 

 
 

The determinant of this matrix can be computed as:

 
 

【This is the final formula in matrix form.】

 
 

Understanding what's actually happening is easier when we look at the result of a cross product between two 3D vectors (figure 4).【To understand this, look at figure 4.】

 
 

 
 

In 3D, the cross product returns another 3D vector which is perpendicular (or orthogonal) to the two original vectors. But as you can see in figure 4, the magnitude of that orthogonal vector also changes with the orientation of the two vectors with respect to each other. In figure 4, we assume a right-hand coordinate system. When the two vectors A (red) and B (blue) are either pointing exactly in the same direction or in opposite directions, the magnitude of the third vector C (in green) is zero. Vector A has coordinates (1,0,0) and is fixed. When vector B has coordinates (0,0,-1), then the green vector C has coordinates (0,-1,0). If we were to find its "signed" magnitude, we would find that it is equal to -1. On the other hand, when vector B has coordinates (0,0,1), then C has coordinates (0,1,0) and its signed magnitude is equal to 1. In one case the "signed" magnitude is negative, and in the second case the signed magnitude is positive. In fact, in 3D, the magnitude of the cross product can be interpreted as the area of the parallelogram having A and B as sides, as shown in figure 5 (read the Wikipedia article on the cross product to get more details on this interpretation):【The point is that the sign of the cross product of A and B indicates their relative orientation, which is what marks out the "positive" region.】

 
 

 
 

Obviously an area should always be positive, though the sign of the above equation provides an indication of the orientation of the vectors A and B with respect to each other. When, with respect to A, B is within the half plane defined by vector A and a vector orthogonal to A (let's call this vector D; note that A and D form a 2D Cartesian coordinate system), then the result of the equation is positive. When B is within the opposite half plane, the result of the equation is negative (figure 6). Another way of explaining this result is that the result is positive when the angle θ is in the range ]0,π[ and negative when θ is in the range ]π,2π[. Note that when θ is exactly equal to 0 or π, the cross product or the edge function returns 0.【An alternative half-plane interpretation of the formula above (not very intuitive; it's fine to skip this).】

 
 

 
 

To find if a point is inside a triangle, all we really care about is the sign of the function we used to compute the area of the parallelogram. However, the area itself also plays an important role in the rasterization algorithm; it is used to compute the barycentric coordinates of the point in the triangle, a technique we will study next. The cross product in 3D and 2D has the same geometric interpretation, thus the cross product between two 2D vectors also returns the "signed" area of the parallelogram defined by the two vectors. The only difference is that in 3D, to compute the area of the parallelogram you need to use this equation:【To decide whether a point is inside the triangle, only the sign matters, but the area itself is also important: it is used to compute the point's barycentric coordinates. The 3D formula is below.】

 
 

while in 2D, this area is given by the cross product itself (which as mentioned before can also be interpreted as the determinant of a 2×2 matrix):【The 2D formula.】

 
 

 
 

From a practical point of view, all we need to do now is test the sign of the edge function computed for each edge of the triangle and a second vector defined by a point and the first vertex of the edge (figure 7).【In practice, for each point we evaluate the edge function of all three edges, as shown below.】

 
 

 
 

If all three tests are positive or equal to 0, then the point is inside the triangle (or lies on one of the edges of the triangle). If any one of the tests is negative, then the point is outside the triangle. In code we get:【If all results are positive, the point is inside the triangle.】
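The original listing is not reproduced here; a minimal sketch of the test, reusing the edgeFunction helper above (and assuming the vertices are declared in the winding order for which the edge function is positive inside the triangle):

// Coverage test for a sample point p against the triangle (v0, v1, v2).
bool pixelOverlapsTriangle(const Vec2f &v0, const Vec2f &v1, const Vec2f &v2,
                           const Vec2f &p)
{
    bool inside = true;
    inside &= edgeFunction(v0, v1, p) >= 0;
    inside &= edgeFunction(v1, v2, p) >= 0;
    inside &= edgeFunction(v2, v0, p) >= 0;
    return inside;
}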

 
 

 
 

 
 

Alternative to the Edge Function

 
 

There are obviously other ways than the edge function method to find if pixels overlap triangles, however as mentioned in the introduction of this chapter, we won't study them in this lesson. Just for reference though, the other common technique is called scanline rasterization. It is based on the Bresenham algorithm that is generally used to draw lines. GPUs use the edge method mostly because it is more generic than the scanline approach, which is also more difficult to run in parallel than the edge method, but we won't provide more information on this topic in this lesson.【Alternatives you can read up on yourself; references are provided.】

 
 

 
 

Be Careful! Winding Order Matters

 
 

One of the things we have not been talking about yet, but which has a great importance in CG, is the order in which you declare the vertices making up the triangles. There are two possible conventions, which you can see illustrated in figure 8: clockwise or counter-clockwise ordering, also called winding. Winding is important because it essentially defines one important property of the triangle, which is the orientation of its normal. Remember that the normal of the triangle can be computed from the cross product of the two vectors A=(V2-V0) and B=(V1-V0). Let's say that V0={0,0,0}, V1={1,0,0} and V2={0,-1,0}; then (V1-V0)={1,0,0} and (V2-V0)={0,-1,0}. Let's now compute the cross product of these two vectors:【A very important point for this algorithm: pay attention to the triangle's vertex order; clockwise and counter-clockwise give opposite results.】

 
 

 
 

As expected, the two normals are pointing in opposite directions. The orientation of the normal has a great importance for lots of different reasons, but one of the most important ones is called face culling. Most rasterizers, and even ray-tracers for that matter, may not render triangles whose normal is facing away from the camera. This is called back-face culling. Most rendering APIs such as OpenGL or DirectX give the option to turn back-face culling off, however you should still be aware that vertex ordering plays a role in what's actually rendered, among many other things. And not surprisingly, the edge function is one of these other things. Before we get to explaining why it matters in our particular case, let's say that there is no particular rule when it comes to choosing the order. In reality so many details in a renderer implementation may change the orientation of the normal that you can't assume that by declaring vertices in a certain order, you get the guarantee that the normal will be oriented a certain way. For instance, rather than using the vectors (V1-V0) and (V2-V0) in the cross product, you could just as well have used (V0-V1) and (V2-V1) instead. It would have produced the same normal but flipped. Even if you use the vectors (V1-V0) and (V2-V0), remember that the order of the vectors in the cross product changes the sign of the normal: A×B = -B×A. So the direction of your normal also depends on the order of the vectors in the cross product. For all these reasons, don't try to assume that declaring vertices in one order rather than the other will give you one result or the other. What's important though, is that you stick to the convention you have chosen. Generally, graphics APIs such as OpenGL and DirectX expect triangles to be declared in counter-clockwise order. We will also use counter-clockwise winding. Now let's see how ordering impacts the edge function.【Different vertex orders give opposite normals. Normals matter for many reasons, e.g. back-face culling, which OpenGL/DirectX support; the same ordering question also affects the edge function and therefore rasterization.】
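As a quick numerical check of the example above (a small sketch; the cross product is written out explicitly):

// Normal of the example triangle for the two possible vector orders.
struct Vec3f { float x, y, z; };

Vec3f cross(const Vec3f &a, const Vec3f &b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// With V0={0,0,0}, V1={1,0,0}, V2={0,-1,0}:
//   cross({1,0,0}, {0,-1,0}) = {0, 0, -1}   (one order)
//   cross({0,-1,0}, {1,0,0}) = {0, 0,  1}   (the opposite order: flipped normal)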

 
 

 
 

 
 

 
 

Barycentric Coordinates

 
 

Computing barycentric coordinates is not necessary to get the rasterization algorithm working. For a really naive implementation of the rendering technique, all you need is to project the vertices and use a technique like the edge function described above to find if pixels are inside triangles. These are the only two necessary steps to produce an image. However the result of the edge function, which as we explained above can be interpreted as the area of the parallelogram defined by the vectors A and B, can actually be used directly to compute these barycentric coordinates. Thus, it makes sense to study the edge function and the barycentric coordinates at the same time.【Barycentric coordinates are not required for rasterization itself, but they come almost for free from the edge function and are useful elsewhere.】

 
 

Before we get any further though, let's explain what these barycentric coordinates are. First, they come in a set of three floating point numbers which in this lesson we will denote λ0, λ1 and λ2. Many different conventions exist, but Wikipedia uses the greek letter lambda (λ) as well, which is also used by other authors (the greek letter omega ω is also sometimes used). This doesn't matter, you can call them the way you want. In short, the coordinates can be used to define any point on the triangle in the following manner:

P = λ0 × V0 + λ1 × V1 + λ2 × V2

【Any point P can be expressed in terms of the three vertices as written above.】

 
 

Where as usual, V0, V1 and V2 are the vertices of the triangle. These coordinates can take on any value, but for points which are inside the triangle (or lying on one of its edges) they can only be in the range [0,1] and the sum of the three coordinates is equal to 1. In other words: λ0 + λ1 + λ2 = 1.【If the three points are the triangle's vertices, then for P inside the triangle each λ lies in [0,1] and they sum to 1.】

 
 

This is a form of interpolation if you want. They are also sometimes defined as weights for the triangle's vertices (which is why in the code we will denote them with the letter w). A point overlapping the triangle can be defined as "a little bit of V0 plus a little bit of V1 plus a little bit of V2". Note that when any of the coordinates is 1 (which means that the others in this case are necessarily 0) then the point P is equal to one of the triangle's vertices. For instance if λ2 = 1 then P is equal to V2. Interpolating the triangle's vertices to find the position of a point inside the triangle is not that useful. But the method can also be used to interpolate across the surface of the triangle any quantity or variable that has been defined at the triangle's vertices. Imagine for instance that you have defined a color at each vertex of the triangle. Say V0 is red, V1 is green and V2 is blue (figure 12). What you want to do is find how these three colors are interpolated across the surface of the triangle. If you know the barycentric coordinates of a point P on the triangle, then its color CP (which is a combination of the triangle vertices' colors) is defined as:【This is exactly the interpolation we want; it can be used directly for smooth color interpolation. As in figure 12, the color of point P is given by the formula below.】

 
 

 
 

This is a very handy technique which is going to be useful to shade triangles. Data associated with the vertices of triangles are called vertex attributes. This is a very common and very important technique in CG. The most common vertex attributes are colors, normals and texture coordinates. What this means in practice is that generally, when you define a triangle you don't only pass on to the renderer the triangle vertices but also its associated vertex attributes. For example if you want to shade the triangle you may need a color and a normal vertex attribute, which means that each triangle will be defined by 3 points (the triangle vertex positions), 3 colors (the colors of the triangle vertices) and 3 normals (the normals of the triangle vertices). Normals too can be interpolated across the surface of the triangle. Interpolated normals are used in a technique called smooth shading, which was first introduced by Henri Gouraud. We will explain this technique later when we get to shading.【This is how smooth shading is implemented.】

 
 

【The original article also derives the relationship between the triangle areas and the λ values; the derivation is omitted here, but the conclusion matters: each λ is the ratio of the corresponding sub-triangle area to the full triangle area, i.e. the ratio of the corresponding edge function value to the edge function of the whole triangle.】

 
 

Let's see how it looks in code. We were already computing the edge functions before to test if points were inside triangles. Only, in our previous implementation we were just returning true or false depending on whether the result of the function was positive or negative. To compute the barycentric coordinates, we need the actual result of the edge function. We can also use the edge function to compute the area (multiplied by 2) of the triangle. Here is a version of an implementation that tests if a point P is inside a triangle and, if so, computes its barycentric coordinates:【Code: the edge function based implementation.】
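The original listing is not reproduced here; a sketch along those lines, reusing the edgeFunction helper above (w0, w1, w2 play the role of λ0, λ1, λ2):

// Coverage test that also returns the barycentric coordinates of p.
// edgeFunction(v0, v1, v2) gives twice the triangle area, used to normalize
// the three sub-areas so that w0 + w1 + w2 = 1 inside the triangle.
bool computeBarycentric(const Vec2f &v0, const Vec2f &v1, const Vec2f &v2,
                        const Vec2f &p, float &w0, float &w1, float &w2)
{
    float area = edgeFunction(v0, v1, v2);   // 2 * triangle area
    w0 = edgeFunction(v1, v2, p);            // sub-area opposite v0
    w1 = edgeFunction(v2, v0, p);            // sub-area opposite v1
    w2 = edgeFunction(v0, v1, p);            // sub-area opposite v2
    if (w0 < 0 || w1 < 0 || w2 < 0) return false;  // outside the triangle
    w0 /= area; w1 /= area; w2 /= area;
    return true;
}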

 
 

 
 

 
 

Interpolate vs. Extrapolate

 
 

One thing worth noticing is that the computation of barycentric coordinates works independently of the point's position with respect to the triangle. In other words, the coordinates are valid whether the point is inside or outside the triangle. When the point is inside, using the barycentric coordinates to evaluate the value of a vertex attribute is called interpolation, and when the point is outside, we speak of extrapolation. This is an important detail because in some cases we will have to evaluate the value of a given vertex attribute for points that potentially don't overlap triangles. To be more specific, this will be needed to compute the derivatives of the triangle texture coordinates for example. These derivatives are used to filter textures properly. If you are interested in learning more about this particular topic we invite you to read the lesson on Texture Mapping. In the meantime, all you need to remember is that barycentric coordinates are valid even when the point doesn't cover the triangle. You also need to know about the difference between vertex attribute extrapolation and interpolation.【Note that expressing P in terms of the three vertices works whether or not P is inside the triangle; inside it is called interpolation, outside extrapolation, which is useful e.g. for texture-coordinate derivatives.】

 
 

 
 

Rasterization Rules

 
 

In some special cases, a pixel may overlap more than one triangle. This happens when a pixel lies exactly on an edge shared by two triangles, as shown in figure 17. Such a pixel would pass the coverage test for both triangles. If they are semi-transparent, a dark edge may appear where the pixels overlap the two triangles, as a result of the way semi-transparent objects are combined with each other (imagine two superimposed semi-transparent sheets of plastic: the surface is more opaque and looks darker than the individual sheets). You would get something similar to what you can see in figure 18, which is a darker line where the two triangles share an edge.【Consider the special case shown in the figure: two triangles share an edge; if both are semi-transparent, rasterization produces a darker line there because pixels on the shared edge are drawn twice.】

 
 

 
 

 
 

The solution to this problem is to come up with some sort of rule that guarantees that a pixel can never be covered twice by two triangles sharing an edge. How do we do that? Most graphics APIs such as OpenGL and DirectX define something which they call the top-left rule. We already know the coverage test returns true if a point is either inside the triangle or if it lies on any of the triangle's edges. What the top-left rule says though, is that the pixel or point is considered to overlap a triangle if it is either inside the triangle or lies on a top edge or any edge that is considered to be a left edge. What is a top and left edge? If you look at figure 19, you can easily see what we mean by top and left edge.【The solution is a rule that breaks the tie. OpenGL/DirectX use the top-left rule: a pixel lying exactly on an edge is counted only if it is a top or left edge, defined as follows.】

 
 

  • A top edge is an edge that is perfectly horizontal and whose defining vertices are above the third one. Technically this means that the y-coordinate of the vector V[(X+1)%3]-V[X] is equal to 0 and that its x-coordinate is positive (greater than 0).
  • A left edge is essentially an edge that is going up. Keep in mind that in our case, vertices are defined in clockwise order. An edge is considered to go up if its respective vector V[(X+1)%3]-V[X] (where X can be 0, 1 or 2) has a positive y-coordinate. A sketch of this test follows the list.
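A small sketch of the per-edge test (written for the clockwise convention used in the list above; edge is the 2D vector V[(X+1)%3] - V[X]):

// Is this edge a "top" or a "left" edge under the top-left rule?
bool isTopLeftEdge(const Vec2f &edge)
{
    bool isTop  = (edge.y == 0 && edge.x > 0);  // horizontal edge above the third vertex
    bool isLeft = (edge.y > 0);                 // edge going "up"
    return isTop || isLeft;
}

// During the coverage test, a sample lying exactly on an edge (its weight w is 0)
// is then accepted only if that edge is a top or left edge, e.g. for w0:
//   overlaps &= (w0 == 0) ? isTopLeftEdge(edge v1->v2) : (w0 > 0);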

 
 

 
 

 
 

 
 

 
 

Putting Things Together: Finding if a Pixel Overlaps a Triangle

 
 

Let's test the different techniques we learned about in this chapter in a program that produces an actual image. We will just assume that we have projected the triangle already (check the last chapter of this lesson for a complete implementation of the rasterization algorithm). We will also assign a color to each vertex of the triangle. Here is how the image is formed. We will loop over all the pixels in the image and test if they overlap the triangle using the edge function method. All three edges of the triangle are tested against the current position of the pixel, and if the edge function returns a positive number for all the edges then the pixel overlaps the triangle. We can then compute the pixel's barycentric coordinates and use these coordinates to shade the pixel by interpolating the color defined at each vertex of the triangle. The content of the frame-buffer is saved to a PPM file (that you can read with Photoshop). The output of the program is shown in figure 20.【At this point we can complete the rasterization step: fill the triangle with interpolated colors and output an image like the one in the figure.】
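A condensed sketch of such a program, reusing the Vec2f type, edgeFunction and computeBarycentric helpers from above (the raster-space vertices and colors are made-up values for illustration, not the ones used in the lesson):

#include <cstdio>
#include <cstdint>

int main()
{
    const int width = 512, height = 512;
    // One already-projected triangle in raster space, with one color per vertex.
    Vec2f v0 = {50, 450}, v1 = {450, 450}, v2 = {250, 50};
    float c0[3] = {1, 0, 0}, c1[3] = {0, 1, 0}, c2[3] = {0, 0, 1};

    uint8_t *frameBuffer = new uint8_t[width * height * 3]();  // cleared to black

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Vec2f p = {x + 0.5f, y + 0.5f};   // sample at the pixel center
            float w0, w1, w2;
            if (computeBarycentric(v0, v1, v2, p, w0, w1, w2)) {
                // interpolate the vertex colors with the barycentric coordinates
                for (int k = 0; k < 3; ++k) {
                    float c = w0 * c0[k] + w1 * c1[k] + w2 * c2[k];
                    frameBuffer[(y * width + x) * 3 + k] = (uint8_t)(c * 255);
                }
            }
        }
    }

    // Save the frame-buffer to a binary PPM file.
    FILE *f = fopen("raster2d.ppm", "wb");
    fprintf(f, "P6\n%d %d\n255\n", width, height);
    fwrite(frameBuffer, 1, width * height * 3, f);
    fclose(f);
    delete[] frameBuffer;
    return 0;
}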

 
 

 
 

 
 

 
 

Conclusion and What’s Next?

 
 

There are many interesting techniques and pieces of trivia related to barycentric coordinates, but this lesson is just an introduction to the rasterization algorithm, thus we won't go any further. One piece of trivia that is interesting to know though, is that barycentric coordinates are constant along lines parallel to an edge (as shown in figure 21).【There are many more related techniques; this is only an introduction, the rest you can explore on your own.】

 
 

【Summary】

In this lesson, we learned two important methods and various concepts.

  • First, we learned about the edge function and how it can be used to find if a point P overlaps a triangle. The edge function is computed for each edge of the triangle, using a second vector defined by the edge's first vertex and the point P. If for all three edges the function is positive, then the point P overlaps the triangle.【Use the edge function to decide whether a pixel is inside the triangle.】
  • Furthermore, we also learned that the results of the edge function can also be used to compute the barycentric coordinates of point P. These coordinates can be used to interpolate vertex data or vertex attributes across the surface of the triangle. They can be interpreted as weights for the various vertices. The most common vertex attributes are colors, normals and texture coordinates.【The edge function also yields the barycentric coordinates, used to interpolate colors, normals and texture coordinates.】

 
 


Software Rasterizer – the projection stage

Quick Review

 
 

In the previous chapter, we gave a high-level overview of the rasterization rendering technique. It can be decomposed into two main stages: first, the projection of the triangle's vertices onto the canvas, then the rasterization of the triangle itself. Rasterization really means in this case "breaking apart" the triangle's shape into pixels or raster element squares; this is what pixels used to be called in the past. In this chapter, we will review the first step. We have already described this method in the two previous lessons, thus we won't explain it here again. If you have any doubt about the principles behind perspective projection, check these lessons again. However, in this chapter, we will study a couple of new tricks related to projection that are going to be useful when we get to the lesson on the perspective projection matrix. We will learn about a new method to remap the coordinates of the projected vertices from screen space to NDC space. We will also learn more about the role of the z-coordinate in the rasterization algorithm and how it should be handled at the projection stage.【The previous article gave the overall idea and the main steps; this one covers the first step in detail.】

 
 

Keep in mind, as already mentioned in the previous chapter, that the goal of the rasterization rendering technique is to solve the visibility or hidden surface problem, which is to determine which parts of a 3D object are visible and which parts are hidden.【Always remember that the software rasterizer is there to solve visibility.】

 
 

Projection: What Are We Trying to Solve?

 
 

What are we trying to solve here at that stage of the rasterization algorithm? As explained in the previous chapter, the principle of rasterization is to find if pixels in the image overlap triangles. To do so, we first need to project triangles onto the canvas, and then convert their coordinates from screen space to raster space. Pixels and triangles are then defined in the same space, which means that it becomes possible to compare their respective coordinates (we can check the coordinates of a given pixel against the raster-space coordinates of a triangle’s vertices).

The goal of this stage, is thus to convert the vertices making up triangles from camera space to raster space.

【What we solve here is going from 3D to the 2D plane.】

 
 

Projecting Vertices: Mind the Z-Coordinate!

 
 

In the previous two lessons, we mentioned that when we compute the raster coordinates of a 3D point, what we really need in the end are its x- and y-coordinates (the position of the 3D point in the image). As a quick reminder, recall that these 2D coordinates are obtained by dividing the x- and y-coordinates of the 3D point in camera space by the point's respective z-coordinate (what we called the perspective divide), and then remapping the resulting 2D coordinates from screen space to NDC space and then from NDC space to raster space. Keep in mind that because the image plane is positioned at the near clipping plane, we also need to multiply the x- and y-coordinates by the near clipping plane. Again, we explained this process in great detail in the previous two lessons.

 
 

Note that so far, we have been considering points in screen space as essentially 2D points (we didn't need to use the points' z-coordinate after the perspective divide). From now on though, we will declare points in screen space as 3D points and set their z-coordinate to the camera-space points' z-coordinate as follows:

【Convert x and y to the 2D position; the z-value also needs handling; the screen-space x, y and z are obtained as below.】

 
 

 
 

It is best at this point to set the projected point's z-coordinate to the negative of the original point's z-coordinate, which as you know by now is negative. Dealing with positive z-coordinates will make everything simpler later on (but this is not mandatory).【Note that z is negated.】

 
 

Keeping track of the vertex z-coordinate in camera space is needed to solve the visibility problem. Understanding why is easier if you look at figure 1. Imagine two vertices v1 and v2 which, when projected onto the canvas, have the same raster coordinates (as shown in figure 1). If we project v1 before v2, then v2 will be visible in the image when it should actually be v1 (v1 is clearly in front of v2). However, if we store the z-coordinates of the vertices along with their 2D raster coordinates, we can use these coordinates to decide which point is closest to the camera, independently of the order in which the vertices are projected (as shown in the code fragment below).【Keeping the z-value is exactly what lets us solve visibility: compare against the value stored in the z-buffer to determine occlusion, as in the figure.】
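The original code fragment is not reproduced here; the idea can be sketched as follows (projectToRaster is a hypothetical helper standing for the perspective divide and remapping described in this chapter, with z kept as the positive camera-space depth):

// Two camera-space vertices landing on the same pixel: the stored z decides
// which one is visible, whatever the order in which they are processed.
struct Vec3f { float x, y, z; };

Vec3f projectToRaster(const Vec3f &pCamera);  // hypothetical helper

Vec3f visibleVertex(const Vec3f &v1camera, const Vec3f &v2camera)
{
    Vec3f v1 = projectToRaster(v1camera);
    Vec3f v2 = projectToRaster(v2camera);
    // assuming both land on the same pixel, the smaller (positive) z wins
    return (v1.z < v2.z) ? v1 : v2;
}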

 
 

 
 

 
 

What we want to render though are triangles, not vertices. So the question is, how does the method we just learned about apply to triangles? In short, we will use the triangle vertices' coordinates to find the position of the point on the triangle that the pixel overlaps (and thus its z-coordinate). This idea is illustrated in figure 2. If a pixel overlaps two or more triangles, we should be able to compute the position of the points on the triangles that the pixel overlaps, and use the z-coordinates of these points as we did with the vertices, to know which triangle is the closest to the camera. This method is described in detail in chapter 4 (The Depth Buffer: Finding the Depth Value of a Sample by Interpolation).【How the z-values are actually compared is the z-buffer approach described before; details are in chapter 4.】

 
 

Screen Space is Also Three-Dimensional

 
 

To summarize, to go from camera space to screen space (which is the process during which the perspective divide happens), we need to:【To summarize:】

  • Perform the perspective divide: that is, dividing the point's x- and y-coordinates in camera space by the point's z-coordinate.【x and y come from the perspective divide.】
  • But also set the projected point's z-coordinate to the original point's z-coordinate (the point in camera space).【z is taken from the original z.】

 
 

Practically, this means that our projected point is not a 2D point anymore, but in fact a 3D point. Or to say it differently, that screen space is not two- but three-dimensional. In his thesis, Ed Catmull writes:【From this we can see that screen space is also a 3D space.】

 
 

Screen-space is also three-dimensional, but the objects have undergone a perspective distortion so that an orthogonal projection of the object onto the x-y plane, would result in the expected perspective image (Ed-Catmull’s Thesis, 1974).

 
 

You should now be able to understand this quote. The process is also illustrated in figure 3. First the geometry vertices are defined in camera space (top image). Then, each vertex undergoes a perspective divide. That is, the vertex x- and y-coordinates are divided by its z-coordinate, but as mentioned before, we also set the resulting projected point's z-coordinate to the negative of the original vertex z-coordinate. This, by the way, implies a change of direction of the z-axis of the screen-space coordinate system. As you can see, the z-axis is now pointing inward rather than outward (middle image in figure 3). But the most important thing to notice is that the resulting object is a deformed version of the original object, but nonetheless a three-dimensional object. Furthermore, what Ed Catmull means when he writes "an orthogonal projection of the object onto the x-y plane, would result in the expected perspective image", is that once the object is in screen space, if we trace lines perpendicular to the x-y image plane from the object to the canvas, then we get a perspective representation of that object (as shown in figure 4). This is an interesting observation because it means that the image creation process can be seen as a perspective projection followed by an orthographic projection. Don't worry if you don't clearly understand the difference between the perspective and orthographic projection. It is the topic of the next lesson. However, try to remember this observation, as it will become handy later.【Figures 3 and 4 illustrate this projection.】

 
 

 
 

Remapping Screen Space Coordinates to NDC Space

 
 

In the previous two lessons, we explained that once in screen space, the x- and y-coordinates of the projected points need to be remapped to NDC space. In the previous lessons, we also explained that in NDC space, points on the canvas have their x- and y-coordinates contained in the range [0,1]. In the GPU world though, coordinates in NDC space are contained in the range [-1,1]. Sadly, this is one of these conventions again that we need to deal with. We could have kept the convention [0,1], but because GPUs are the reference when it comes to rasterization, it is best to stick to the way the term is defined in the GPU world.【Screen space is first remapped to NDC space; the range can be [0,1], but on GPUs it is usually [-1,1].】

 
 

Thus once the points have been converted from camera space to screen space, the next step is to remap them from the range [l,r] and [b,t], for the x- and y-coordinate respectively, to the range [-1,1]. The terms l, r, b, t here denote the left, right, bottom and top coordinates of the canvas. By re-arranging the terms, we can easily find an equation that performs the remapping we want:【Once points go from camera space to screen space, remap x from [l,r] and y from [b,t] to [-1,1]; the derivation is shown below.】

 
 

 
 

 
 

 
 

 
 

 
 

 
 

This is a very important equation because the red and green terms of the equation in the middle of the formula will become coefficients of the perspective projection matrix. We will study this matrix in the next lesson. But for now, we will just apply this equation to remap the x-coordinate of a point in screen space to NDC space (any point that lies on the canvas has its coordinates contained in the range [-1,1] when defined in NDC space). If we apply the same reasoning to the y-coordinate we get:【The red and green terms become coefficients of the perspective projection matrix; applying the same reasoning to the y-coordinate gives the result below.】

 
 

 
 

 
 

Putting Things Together

 
 

At the end of this lesson, we can now perform the first stage of the rasterization algorithm, which you can decompose into two steps:【The two steps covered in this chapter are listed below.】

 
 

  • Convert a point in camera space to screen space. This essentially projects the point onto the canvas, but keep in mind that we also need to store the original point's z-coordinate. The point in screen space is three-dimensional, and the z-coordinate will be useful to solve the visibility problem later on.【camera space to screen space】

  • We then convert the x- and y-coordinates of these points in screen space to NDC space using the following formulas:screen space to NDC space

 
 

From there, it is extremely simple to convert the coordinates to raster space. We just need to remap the x- and y-coordinates in NDC space to the range [0,1] and multiply the resulting number by the image width and height respectively (don’t forget that in raster space the y-axis goes down while in NDC space it goes up. Thus we need to change y’s direction during this remapping process). In code we get:
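The original listing is not reproduced here; a sketch of the whole conversion described above, reusing the Vec3f struct from the earlier sketch (l, r, b, t are the canvas boundaries and nearClip the near clipping plane; the camera looks down the negative z-axis):

// Camera space -> screen space -> NDC space -> raster space.
Vec3f cameraToRaster(const Vec3f &pCamera, float l, float r, float b, float t,
                     float nearClip, int imageWidth, int imageHeight)
{
    // perspective divide (image plane at the near clipping plane)
    float xScreen = nearClip * pCamera.x / -pCamera.z;
    float yScreen = nearClip * pCamera.y / -pCamera.z;
    // screen space -> NDC space, range [-1,1]
    float xNDC = 2 * xScreen / (r - l) - (r + l) / (r - l);
    float yNDC = 2 * yScreen / (t - b) - (t + b) / (t - b);
    // NDC space -> raster space (y goes down in raster space)
    Vec3f pRaster;
    pRaster.x = (xNDC + 1) / 2 * imageWidth;
    pRaster.y = (1 - yNDC) / 2 * imageHeight;
    pRaster.z = -pCamera.z;   // keep the camera-space depth as a positive value
    return pRaster;
}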

 
 

Software Rasterizer – An Overview of the Rasterization Algorithm

Various tutorials:

  1. OpenGL https://github.com/ssloy/tinyrenderer/wiki
  2. OpenGL http://www.3dcpptutorials.sk/index.php?id=15
  3. OpenGL ES 2.0 https://github.com/hmwill/GLESv20

 
 

The software rasterizer I need here does not depend on OpenGL or DirectX.

 
 

Rasterization: a Practical Implementation

https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation

 
 

Code:

https://www.scratchapixel.com/code.php?id=26&origin=/lessons/3d-basic-rendering/rasterization-practical-implementation

 
 

An Overview of the Rasterization Algorithm

 
 

The rasterization rendering technique is surely the most commonly used technique to render images of 3D scenes, and yet, it is probably the least understood and the least properly documented technique of all (especially compared to ray tracing).【Rasterization is the foundational technique for rendering 3D graphics; everyone should understand it.】

 
 

Why this is so depends on different factors. First, it's a technique from the past. We don't mean to say the technique is obsolete, quite the contrary, but most of the techniques that are used to produce an image with this algorithm were developed somewhere between the 1960s and the early 1980s. In the world of computer graphics, these are the middle ages, and the knowledge about the papers in which these techniques were developed tends to be lost. Rasterization is also the technique used by GPUs to produce 3D graphics. Hardware technology has changed a lot since GPUs were first invented, but the fundamental techniques they implement to produce images haven't changed much since the early 1980s (the hardware changed, but the underlying pipeline by which an image is formed hasn't). In fact these techniques are so fundamental, and consequently so deeply integrated within the hardware architecture, that no one pays attention to them anymore (only people designing GPUs can tell what they really do, and this is far from being a trivial task; but designing a GPU and understanding the principle of the rasterization algorithm are two different things; thus explaining the latter should actually not be that hard!).【Why: being an old technique does not mean it is obsolete. Rasterization is what GPUs use to render images; because it is implemented directly in hardware few people pay attention to it, yet it really is the foundational rendering technique.】

 
 

Regardless, we thought it was urgent and important to correct this situation. With this lesson, we believe we offer the first resource that provides a clear and complete picture of the algorithm as well as a full practical implementation of the technique. If you found in this lesson the answers you have been desperately looking for anywhere else, please consider making a donation! This work is provided to you for free and requires many hours of hard work.【This provides one of the most complete explanations of rasterization.】

 
 

Introduction

 
 

Rasterization and ray tracing try to solve the visibility or hidden surface problem, but in a different order (the visibility problem was introduced in the lesson Rendering an Image of a 3D Scene, an Overview). Both algorithms have in common that they essentially use techniques from geometry to solve that problem. In this lesson, we will describe briefly how the rasterization (you can write rasterisation if you prefer UK English to US English) algorithm works. Understanding the principle is quite simple, but implementing it requires using a series of techniques, notably from the field of geometry, that you will also find explained in this lesson.【Rasterization and ray tracing both use geometry to solve surface visibility, but in a different order; this lesson mainly explains the algorithm.】

 
 

The program we will develop in this lesson to demonstrate how rasterization works in practice is important, because we will use it again in the next lessons to implement the ray-tracing algorithm as well. Having both algorithms implemented in the same program will allow us to more easily compare the output produced by the two rendering techniques (they should both produce the same result, at least before shading is applied) and their performance. It will be a great way of better understanding the pros and cons of both algorithms.【The focus here is understanding how rasterization works, but we will also implement ray tracing and compare the two in features and performance.】

 
 

The Rasterization Algorithm

 
 

In fact there is not one but multiple rasterization algorithms, but to go straight to the point, let's say that all these different algorithms are based upon the same overall principle. In other words, all these algorithms are just variants of the same idea. It is this idea or principle we will refer to when we speak of rasterization in this lesson.【There are many algorithms, but the overall principle is the same.】

 
 

What is that idea? In the previous lessons, we already talked about the difference between rasterization and ray tracing. We also suggested that the rendering process can essentially be decomposed into two main tasks: visibility and shading. Rasterization, to say things quickly, is essentially a method to solve the visibility problem. Visibility consists of being able to tell which parts of 3D objects are visible to the camera. Some parts of these objects can be hidden because they are either outside the camera's visible area or hidden by other objects.【Rendering can be seen as two main steps, visibility and shading. Rasterization solves visibility: which objects the camera can see, with hidden parts discarded.】

 
 

Solving this problem can be done in essentially two ways. You can trace a ray through every pixel in the image to find out the distance between the camera and any object this ray intersects (if any). The object visible through that pixel is the object with the smallest intersection distance (generally denoted t). This is the technique used in ray tracing. Note that in this particular case, you create an image by looping over all pixels in the image, tracing a ray for each one of these pixels, and then finding out if these rays intersect any of the objects in the scene. In other words, the algorithm requires two main loops. The outer loop iterates over the pixels in the image and the inner loop iterates over the objects in the scene:【The first way is ray tracing: shoot a ray from the camera through each pixel and keep the nearest intersected object; the algorithm needs two loops, iterating over pixels first and objects second.】
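The lesson's pseudo-code is not reproduced here; the loop structure is roughly (pseudo-code, not compilable):

// Ray tracing: image first, objects second.
for (each pixel in the image) {
    Ray ray = buildCameraRay(pixel);
    float tNearest = INFINITY;
    for (each triangle in the scene) {
        float t;
        if (intersect(ray, triangle, t) && t < tNearest) {
            tNearest = t;                  // keep the closest intersection so far
            framebuffer[pixel] = triangle.color;
        }
    }
}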

 
 

Note that in this example, the objects are actually considered to be made of triangles (and triangles only). Rather than iterating over the objects, we just consider the objects as a pool of triangles and iterate over the triangles instead. For reasons we have already explained in the previous lessons, the triangle is often used as the basic rendering primitive both in ray tracing and in rasterization (GPUs require the geometry to be triangulated).【Whether ray tracing or rasterizing, the triangle is the basic primitive.】

 
 

Ray tracing is the first possible approach to solve the visibility problem. We say the technique is image centric because we shoot rays from the camera into the scene (we start from the image), as opposed to the other way around, which is the approach we will be using in rasterization.【We call the ray-tracing approach image centric.】

 
 

Rasterization takes the opposite approach. To solve for visibility, it actually "projects" triangles onto the screen; in other words, we go from a 3D representation to a 2D representation of each triangle, using perspective projection. This can easily be done by projecting the vertices making up the triangle onto the screen (using perspective projection, as we just explained). The next step in the algorithm is to use some technique to fill up all the pixels of the image that are covered by that 2D triangle. These two steps are illustrated in figure 2. From a technical point of view, they are very simple to perform. The projection step only requires a perspective divide and a remapping of the resulting coordinates from image space to raster space, a process we already covered in the previous lessons. Finding out which pixels in the image the resulting triangles cover is also very simple and will be described later.【The second way is rasterization, which works the other way around: first project each triangle onto the screen with perspective projection (a 2D representation instead of 3D), then fill the pixels covered by that 2D triangle; see figure 2.】

 
 

 
 

What does the algorithm look like compared to the ray-tracing approach? First, note that rather than iterating over all the pixels in the image first, in rasterization the outer loop iterates over all the triangles in the scene. Then, in the inner loop, we iterate over all pixels in the image, and find out if the current pixel is "contained" within the "projected image" of the current triangle (figure 2). In other words, the inner and outer loops of the two algorithms are swapped.【This algorithm also needs two loops, but iterating over triangles first and pixels second.】
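In the same pseudo-code style as above (again a sketch, not compilable):

// Rasterization: triangles first, pixels second.
for (each triangle in the scene) {
    ProjectedTriangle proj = projectOntoScreen(triangle);
    for (each pixel in the image) {
        if (pixelOverlaps(proj, pixel)) {
            framebuffer[pixel] = triangle.color;
        }
    }
}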

 
 

This algorithm is object centric because we actually start from the geometry and work our way back to the image, as opposed to the approach used in ray tracing where we start from the image and work our way back into the scene.【We call the rasterization approach object centric.】

 
 

Both algorithms are simple in their principle, but they differ slightly in their complexity when it comes to actually implementing them and finding solutions to the different problems they need to solve. In ray tracing, actually generating the rays is simple, but finding the intersection of a ray with the geometry can reveal itself to be difficult (depending on the type of geometry you deal with) and is also potentially computationally expensive. But let's ignore ray tracing for now. In the rasterization algorithm we need to project vertices onto the screen, which is simple and fast, and we will see that the second step, which requires finding out if a pixel is contained within the 2D representation of a triangle, has an equally simple geometric solution. In other words, computing an image using the rasterization approach relies on two very simple and fast techniques (the perspective divide and finding out if a pixel lies within a 2D triangle). Rasterization is a good example of an "elegant" algorithm. The techniques it relies on have simple solutions; they are also easy to implement and produce predictable results. For all these reasons, the algorithm is very well suited for the GPU and is actually the rendering technique applied by GPUs to generate images of 3D objects (it can also easily be run in parallel).【In principle the two frameworks are similar, but their implementation complexity differs a lot: ray-object intersection is expensive, while projection is much cheaper and easier to implement, and in fact is implemented in GPU hardware.】

 
 

In summary:

  • Converting geometry to triangles makes the process simpler. If all primitives are converted to the triangle primitive, we can write fast and efficient functions to project triangles onto the screen and check if pixels lie within these 2D triangles.【Everything is handled as triangles.】
  • Rasterization is object centric. We project geometry onto the screen and determine visibility by looping over all pixels in the image.【object centric】
  • It relies on mostly two techniques: projecting vertices onto the screen and finding out if a given pixel lies within a 2D triangle.【the two steps】
  • The rendering pipeline run on GPUs is based on the rasterization algorithm.【GPUs use rasterization.】

 
 

Hopefully at this point of the lesson, you have understood the way the image of a 3D scene (made of triangles) is generated using the rasterization approach. Of course, what we described so far is the simplest form of the algorithm. First, it can be optimized greatly, but furthermore we haven't explained yet what happens when two triangles projected onto the screen overlap the same pixels in the image. When that happens, how do we define which one of these two (or more) triangles is visible to the camera? We will now answer these two questions.【Luckily you now understand the basic way a 3D scene is rasterized; next we discuss what happens when two projected triangles overlap the same pixel, how visibility is decided, and how to optimize.】

 
 

 
 

Optimizing: 2D Triangles Bounding Box

 
 

The problem with the naive implementation of the rasterization algorithm we gave so far is that it requires, in the inner loop, iterating over all pixels in the image, even though only a small number of these pixels may be contained within the triangle (as shown in figure 3). Of course this depends on the size of the triangle on screen. But considering we are not interested in rendering one triangle but an object made up of anywhere from a few hundred to a few million triangles, it is unlikely that, in a typical production example, these triangles will be very large in the image.【The basic rasterizer iterates over every pixel of the image for every object, even when the object covers only a handful of pixels.】

 
 

There are different ways of minimizing the number of tested pixels, but the most common one consists of computing the 2D bounding box of the projected triangle and iterating over the pixels contained in that 2D bounding box rather than over the pixels of the entire image. While some of these pixels might still lie outside the triangle, on average this already improves the performance of the algorithm considerably. This idea is illustrated in figure 3.【The most common optimization is to compute the triangle's 2D bounding box and only process the pixels inside it.】

 
 

 
 

 
 

Computing the 2D bounding box of a triangle is actually very simple. We just need to find the minimum and maximum x- and y-coordinates of the three vertices making up the triangle in raster space. This is illustrated in the following pseudo code:【Computing the 2D bounding box only needs the min and max x/y of the three vertices — very simple.】
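The referenced pseudo-code is not reproduced in these notes; a minimal sketch of the idea (raster-space min/max of the three vertices, with assumed struct names) could look like this:

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };   // raster-space vertex (x, y in pixels, z kept for depth)

struct BBox2D { float xmin, ymin, xmax, ymax; };

// 2D bounding box of a projected triangle: min/max of the vertex coordinates.
BBox2D triangleBBox(const Vec3 &v0, const Vec3 &v1, const Vec3 &v2)
{
    BBox2D bbox;
    bbox.xmin = std::min({v0.x, v1.x, v2.x});
    bbox.ymin = std::min({v0.y, v1.y, v2.y});
    bbox.xmax = std::max({v0.x, v1.x, v2.x});
    bbox.ymax = std::max({v0.y, v1.y, v2.y});
    return bbox;
}
```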

 
 

 
 

Once we have calculated the 2D bounding box of the triangle (in raster space), we just need to loop over the pixels defined by that box. But you need to be very careful about the way you convert the raster coordinates, which in our code are defined as floats rather than integers. First note that one or two vertices may be projected outside the boundaries of the canvas, so their raster coordinates may be lower than 0 or greater than the image size. We solve this problem by clamping the pixel coordinates to the range [0, Image Width – 1] for the x coordinate and [0, Image Height – 1] for the y coordinate. Furthermore, we need to round the minimum and maximum coordinates of the bounding box to the nearest integer values (note that this works fine when we iterate over the pixels in the loop, because we initialize the variable to xmin or ymin and break from the loop when the variable x or y becomes greater than xmax or ymax). All these tests need to be applied before using the final fixed-point (or integer) bounding box coordinates in the loop. Here is the pseudo-code:【With the 2D bounding box, we iterate over the pixels it contains. Be careful with the raster coordinates: the author's code uses floats rather than ints, so clamp x to [0, Width - 1] and y to [0, Height - 1] and convert the float bounds to the enclosing integer range before looping.】
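Again, the pseudo-code itself is missing from these notes; a hedged sketch of the clamping and integer conversion (reusing the BBox2D helper above) might be:

```cpp
#include <algorithm>
#include <cmath>

// Clamp the float bounding box to the image and convert it to integer pixel bounds.
// imageWidth/imageHeight and BBox2D are the same assumptions as in the sketch above.
void rasterLoopBounds(const BBox2D &bbox, int imageWidth, int imageHeight,
                      int &x0, int &y0, int &x1, int &y1)
{
    x0 = std::max(0, (int)std::floor(bbox.xmin));
    y0 = std::max(0, (int)std::floor(bbox.ymin));
    x1 = std::min(imageWidth  - 1, (int)std::floor(bbox.xmax));
    y1 = std::min(imageHeight - 1, (int)std::floor(bbox.ymax));
    // The caller then loops: for (y = y0; y <= y1; ++y) for (x = x0; x <= x1; ++x) ...
}
```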

 
 

 
 

 
 

The Image or Frame-Buffer

 
 

Our goal is to produce an image of the scene. We have two ways of visualizing the result of the program: either displaying the rendered image directly on the screen, or saving the image to disk and using a program such as Photoshop to preview it later on. In both cases, though, we need to store the image somewhere while it is being rendered, and for that purpose we use what we call in CG an image buffer or frame-buffer.

It is nothing more than a two-dimensional array of colors that has the size of the image. Before the rendering process starts, the frame-buffer is created and the pixels are all set to black. At render time, when the triangles are rasterized, if a given pixel overlaps a given triangle, then we store the color of that triangle in the frame-buffer at that pixel location. When all triangles have been rasterized, the frame-buffer contains the image of the scene. All that is left to do then is either display the content of the buffer on the screen or save it to a file. In this lesson, we will choose the latter option.【There are two ways to visualize the result (display it or save it to disk); either way it is accumulated in the frame-buffer first. The buffer is created and cleared to black; during rasterization, each pixel overlapping a triangle receives that triangle's color; once every object has been rasterized, the frame-buffer holds the scene image and can be saved or displayed.】

 
 

 
 

When Two Triangles Overlap the Same Pixel: The Depth Buffer (or Z-Buffer)

 
 

Keep in mind that the goal of the rasterization algorithm is to solve the visibility problem. In order to display 3D objects, it is necessary to determine which surfaces are visible. In the early days of computer graphics two methods were used to solve the “hidden surface” problem (the other name for the visibility problem): the Newell algorithm and the z-buffer. We only mention the Newell algorithm for historical reasons; we won’t study it in this lesson because it is not used anymore. We will only study the z-buffer method, which is used by GPUs.【Rasterization exists to solve the visibility problem; today the z-buffer is essentially the only method used for hidden surfaces.】

 
 

There is one last thing, though, that we need to do in order to get a basic rasterizer working. We need to account for the fact that more than one triangle may overlap the same pixel in the image (as shown in figure 5). When this happens, how do we decide which triangle is visible? The solution to this problem is actually very simple. We will use what we call a z-buffer, also called a depth buffer, two terms that you may have heard or read about quite often already. A z-buffer is nothing more than another two-dimensional array with the same dimensions as the image; however, rather than being an array of colors, it is simply an array of floating-point numbers.【When several triangles project onto the same pixel, the z-buffer decides which one wins.】

 
 

Before we start rendering the image, we initialize each pixel in this array to a very large number. When a pixel overlaps a triangle, we also read the value stored in the z-buffer at that pixel location. As you may have guessed, this array is used to store the distance from the camera to the nearest triangle that any pixel in the image overlaps. Since this value is initially set to infinity (or any very large number), the first time we find that a given pixel X overlaps a triangle T1, the distance from the camera to that triangle is necessarily lower than the value stored in the z-buffer. What we do then is replace the value stored for that pixel with the distance to T1. Next, when the same pixel X is tested and we find that it overlaps another triangle T2, we compare the distance from the camera to this new triangle with the distance stored in the z-buffer (which at this point stores the distance to the first triangle T1). If the distance to the second triangle is lower than the distance to the first triangle, then T2 is visible and T1 is hidden by T2; otherwise, T2 is hidden by T1 and T1 remains visible. In the first case we update the value in the z-buffer with the distance to T2; in the second case the z-buffer doesn’t need to be updated, since the first triangle T1 is still the closest triangle we have found for that pixel so far. As you can see, the z-buffer is used to store the distance of each pixel to the nearest object in the scene (we don’t really use the distance, but we will give the details further in the lesson).【The z-buffer is initialized to the farthest value. For every pixel a triangle overlaps, the stored value (distance of the nearest surface seen so far) is read; if the new triangle is closer, the value is replaced, otherwise nothing changes.】

 
 

In figure 5, we can see that the red triangle is behind the green triangle in 3D space. If we were to render the red triangle first and the green triangle second, then for a pixel overlapping both triangles we would store in the z-buffer at that pixel location first a very large number (when the z-buffer is initialized), then the distance to the red triangle, and finally the distance to the green triangle.【As shown in figure 5.】

 
 

You may wonder how we find the distance from the camera to the triangle. Let’s first look at an implementation of this algorithm in pseudo-code and we will come back to this point later (for now let’s just assume the function pixelContainedIn2DTriangle computes that distance for us):

【Implementation】
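The implementation referenced above is not included in these notes; as a minimal sketch (reusing the small structs from the earlier sketches, and assuming a depth-returning variant of the containment test), it could look like this:

```cpp
#include <vector>
#include <limits>

// Assumed variant of the containment test that also outputs the interpolated depth.
bool pixelContainedIn2DTriangle(const Triangle2D &t, float px, float py, float *z);

// z-buffer loop over one triangle's clamped bounding box [x0,x1] x [y0,y1].
// zBuffer is assumed initialized to std::numeric_limits<float>::infinity().
void rasterizeWithDepth(const Triangle2D &tri,
                        std::vector<int> &frameBuffer,
                        std::vector<float> &zBuffer,
                        int width, int x0, int y0, int x1, int y1)
{
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            float z;                                   // depth of the point the pixel covers
            if (pixelContainedIn2DTriangle(tri, x + 0.5f, y + 0.5f, &z)) {
                if (z < zBuffer[y * width + x]) {      // closer than what's stored?
                    zBuffer[y * width + x]     = z;    // keep the nearest depth
                    frameBuffer[y * width + x] = tri.color;
                }
            }
        }
    }
}
```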

 
 

 
 

 
 

What’s Next?

 
 

Obviously this is only a very high-level description of the algorithm (figure 6), but it should hopefully already give you an idea of what we will need in the program in order to produce an image.【Figure 6 gives the overall structure.】

 
 

 
 

We will need:

  • An image-buffer (a 2D array of colors),
  • A depth-buffer (a 2D array of floats),
  • Triangles (the geometry making up the scene),
  • A function to project vertices of the triangles onto the canvas,
  • A function to rasterize the projected triangles,
  • Some code to save the content of the image buffer to disk.

 
 

In the next chapter we will see how coordinates are converted from camera to raster space. The method is of course identical to the one we studied and presented in the previous lesson, but we will present a few more tricks along the way. In chapter three, we will learn how to rasterize triangles. In chapter four, we will study in detail how the z-buffer algorithm works. As usual, we will conclude this lesson with a practical example.【The next chapter covers the coordinate conversion.】


Software Occlusion Culling – intel

https://software.intel.com/en-us/articles/software-occlusion-culling

https://software.intel.com/en-us/blogs/2013/03/22/software-occlusion-culling-update

https://software.intel.com/en-us/blogs/2013/09/06/software-occlusion-culling-update-2

 
 

Abstract:

The technique divides scene objects into occluders and occludees and culls occludees based on a depth comparison with the occluders that are software rasterized to the depth buffer.【The approach.】

The sample code uses frustum culling and is optimized with Streaming SIMD Extensions (SSE) instruction set and multi-threading to achieve up to 8X performance speedup compared to a non-culled display of the sample scene.【The benefits.】

 
 

Introduction:

 
 

If we can lower the computational cost of occluded objects, we can improve the application’s performance with no impact to the scene quality.【If occluded objects are never sent to the GPU for rendering, performance improves.】

Using the GPU-generated depth buffer on the CPU is one technique for early detection of occluded objects. However, doing occlusion culling in this manner may cause latency and overhead issues and introduce visual artifacts since the GPU may be several frames behind the CPU. To avoid these problems, we propose software rasterization of the depth buffer on the CPU.【Using the GPU-generated depth buffer is one option, but it costs latency and can cause artifacts because the GPU runs several frames behind the CPU.】

 
 

In this approach, we use the CPU to rasterize the depth buffer. Then, we use axis-aligned bounding box tests to determine if an object is occluded. Occluded objects are removed from the rendering pipeline to reduce overhead. We do not cull visible and partially visible objects; we submit them to the GPU for rendering. The sample code associated with the Software Occlusion Culling sample implements this technique. Additionally, in this sample, the software rasterizer is vectorized using SSE and multi-threaded to improve performance.【Here the CPU rasterizes the depth buffer, occludees are tested with axis-aligned bounding boxes, and only the surviving objects are submitted to the GPU for rendering.】

 
 

Occluders: Objects that are sufficiently large to hide or occlude other objects in the scene.【Objects that hide other objects.】

Occludees: Objects that are hidden, or occluded, by other objects in the scene.【Objects hidden by other objects.】

 
 

Software Occlusion Culling has two steps: depth buffer rasterization and depth test culling. The next sections provide details about these steps.

 
 

Depth Buffer Rasterization

 
 

The occluders in the scene are rasterized to the depth buffer on the CPU. Figure 1 shows a screenshot of the Software Occlusion Culling sample. The castle walls and the ground plane are considered as occluders in the scene. The castle walls along with the attached tiny wooden trim and decorations are used as occluders to avoid special pre-processing of the art assets. Ideally, it would have been better to hand pick just the large castle walls as occluders, but we wanted to make sure that the software occlusion culling algorithm worked with a realistic scene without the need for expensive content changes.【Large surfaces such as the castle walls hide most of the scene, so everything behind them need not be drawn.】

 
 

Our rasterizer divides the frame buffer into tiles and bins occluder triangles based on the intersection with the tiles. The rasterizer bins triangles into multiple tiles when the triangles cross tile boundaries. When processing a tile, a thread steps through the binned triangles and uses bounding box traversal to rasterize the triangles to the depth buffer.【Is this describing deriving the bounding box automatically from each triangle?】 The rasterizer checks if the pixel under consideration is inside the triangle, and if so, it uses barycentric coordinates to interpolate the depth value at the pixel location. The depth buffer rasterization process updates the depth buffer at the pixel position if the newly computed depth is closer to the observer as compared to the already existing depth at the same pixel location.
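A rough sketch of that per-pixel work (edge test, barycentric depth interpolation, depth update), under assumed data layouts and a "smaller z = closer" convention — not Intel's actual SSE code:

```cpp
struct ScreenVertex { float x, y, z; };

// Signed edge function used both for the inside test and as a barycentric weight.
inline float edge(const ScreenVertex &a, const ScreenVertex &b, float px, float py)
{
    return (px - a.x) * (b.y - a.y) - (py - a.y) * (b.x - a.x);
}

void shadePixelDepth(const ScreenVertex &v0, const ScreenVertex &v1, const ScreenVertex &v2,
                     float px, float py, float *depthTexel)
{
    float area = edge(v0, v1, v2.x, v2.y);
    if (area == 0.0f) return;                                      // degenerate triangle
    float w0 = edge(v1, v2, px, py);
    float w1 = edge(v2, v0, px, py);
    float w2 = edge(v0, v1, px, py);
    if (w0 * area < 0 || w1 * area < 0 || w2 * area < 0) return;   // pixel outside the triangle
    float z = (w0 * v0.z + w1 * v1.z + w2 * v2.z) / area;          // barycentric depth
    if (z < *depthTexel) *depthTexel = z;                          // keep the nearest depth
}
```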

 
 

Once the depth buffer has been rasterized on the CPU, all the occludees in the scene have to be depth tested against the depth buffer to determine which occludees are visible and which can be culled.【After rasterization, every occludee is depth-tested against the buffer to decide whether it is occluded.】

 
 

Depth Test Culling

 
 

We use object space axis-aligned bounding boxes (AABBs) to depth test the occludees in the scene against the CPU-generated depth buffer. The depth test algorithm treats all objects in the scene including the occluders (castle walls and ground plane) as occludees. Using AABB makes the depth tests more conservative. If the AABB is occluded, then the object contained inside it is also occluded and can be culled. If the AABB is visible, then the assumption is that the object contained inside might also be visible. However, as the bounding box is conservative, this assumption may not always be true, and we may have some false positives.【Occludees are tested using their AABBs.】

 
 

To determine if an object is clipped by the near clip plane, we could use the homogeneous coordinate w of the vertices that make up the bounding box. The near clip plane is set to 1.0 in the sample. Objects that are in front of the camera have w > 1.0. So if any of the vertices that make up the bounding box has w < 1.0, then the object is being clipped by the near clip plane.【Near-plane clipping.】

 
 

However, we store 1/w instead of the homogeneous coordinate w to avoid dividing by w at multiple locations in the rasterizer.

 

So the bounding box of an occludee is being clipped if 1/w > 1.0 or 1/w < 0.0 for any vertex of the bounding box, in which case we trivially accept the occludee as visible.【Storing 1/w instead of w makes this easier to handle.】
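A small sketch of that test, assuming the eight bounding-box corners have already been transformed and 1/w stored as described (this is an illustration of the stated rule, not the sample's code):

```cpp
// Returns true if the occludee's bounding box may be clipped by the near plane,
// in which case it is trivially accepted as visible (the conservative choice).
// invW[i] is assumed to hold 1/w for each of the 8 transformed AABB corners,
// with the near plane at w == 1.0 as in the Intel sample text above.
bool clippedByNearPlane(const float invW[8])
{
    for (int i = 0; i < 8; ++i)
        if (invW[i] > 1.0f || invW[i] < 0.0f)   // i.e. w < 1.0, or w <= 0 (behind the camera)
            return true;
    return false;
}
```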

 
 

The rest of the article covers how to use the demo and the additional optimizations; not detailed here.

 
 

 
 

Found the paper: Masked Software Occlusion Culling

http://fileadmin.cs.lth.se/graphics/research/papers/2016/culling/culling.pdf

 
 

Parallel Graphics in Frostbite –
Current & Future

This talk describes how Frostbite does it: occluder geometry is authored by hand.

http://s09.idav.ucdavis.edu/talks/04-JAndersson-ParallelFrostbite-Siggraph09.pdf

Likewise, Battlefield 3 also uses hand-authored occluder meshes.

http://www.gamedevs.org/uploads/culling-the-battlefield-battlefield3.pdf

 
 

Umbra occlusion culling also works from manually prepared models.

Occlusion Culling in Alan Wake https://pdfs.semanticscholar.org/7420/c5455107bd4eddaab24f4b8736d8ab19d357.pdf

For solving visibility from the cameras point of view, Alan Wake used a hierarchical hardware based occlusion query algorithm [Bittner et al. 2004], [Guthe et al. 2006]. The advantages of using a hardware based culling algorithm outweighed the content pipeline issues associated with using a typical software occlusion system based on either precomputed potentially visible sets (PVS) or artist generated occlusion geometry and software rasterizer [Aila and Miettinen 2004]. Occlusion culling minimized the overdraw and the bottleneck shifted to shadow map generation.

 
 

GPU-Driven Rendering Pipelines

http://advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf

The approach here: modular model authoring and rendering.

 
 

Killzone's software rasterization approach
https://www.guerrilla-games.com/read/practical-occlusion-culling-in-killzone-3

【The approach here uses physics meshes, with sizes adjusted by hand where needed, as occluders.】

We didn’t want artists to hand-make occluders for the entire level, so we looked for an automatic solution. We experimented with using visual meshes, but the good polygon reduction proved to be difficult to get right. Physics mesh is much better choice. It’s available for each mesh in the game and it’s cleaned and sufficiently low polygon count. Unfortunately the physics meshes can be slightly larger than visual meshes causing objects to disappear. Worst offenders have to be fixed manually.

Here’s an example of occluders generated automatically from physics mesh. You can notice there’s some unnecessary detail, but in general the quality is pretty good

Even using physics mesh there was too much occluder geometry.

We needed to reject occluders that were unlikely to contribute much to the occlusion buffer. Our simple heuristics rejects all small objects or objects where the name suggests that they are not good occluders. We also reject meshes whose surface area suggests that they are thin or with too many holes. Artists can of course step in and override the heuristics or provide their custom occluders for difficult cases and optimization.

【There are too many physics meshes, so some are filtered out: oddly shaped, holed, or small ones are rejected.】

 
 

 
 

Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3

https://www.slideshare.net/Umbra3/solving-visibility-and-streaming-in-the-the-witcher-3-wild-hunt-with-umbra-3

Only LOD0 meshes are considered as occluders.

 
 

They appear to have a tool that generates data, but is it really occlusion data? My guess is that it is all the data to stream to the client for the current position, including LOD meshes and so on — i.e. streaming data, not generated occluder meshes.

 
 

 
 

Umbra analysis

 
 

https://umbra3d.com/blog/overview-on-popular-occlusion-culling-techniques/

Then I looked at Umbra 3, i.e. the approach used by Unity: PVS, a region-based precomputation method for occlusion culling.

Umbra pre-processes the scene offline, voxelizes it, then structures it into portals and cells to create a spatial database used at runtime. Queries can then be performed in real-time and involve the generation of a low-resolution hierarchical depth-buffer of the scene, performed by an highly-optimized software rasterizer.

 
 

It makes sense on a lot of levels and recently Intel also published another example of a purely CPU-based solution — similar to the Umbra technique, occluders are software-rasterized to create a hierarchical depth buffer.【Intel's implementation can be seen as similar to Umbra's, using a hierarchical depth buffer.】

 
 

https://umbra3d.com/blog/occlusion-culling-solving-the-visibility-problem-part-ii/

Umbra’s culling system precomputes a simplified visibility database representing a 3D world. A fast runtime library then traverses the database in order to solve visibility for a view.【Umbra precomputes data offline and uses it at runtime for occlusion culling.】

We call these databases tomes. They consist of automatically placed portals and cells. These are common concepts in the world of visibility: a cell is a volume in 3D space, within which visibility is similar. The world is divided into cells. Portals are links between cells: openings in space leading to another cell that also have a shape. The difficult problem is finding good portals and cells in fast and practical fashion.【The precomputation performs fast, automatic voxelization of the scene.】

Umbra’s precomputation step, Umbra optimizer, solves this problem. It starts with voxelization – discretization of 3D space. The algorithm begins by generating small “raw” cells. These are then improved by combining them together into bigger final cells, so that loss in occlusion is minimal.【After the voxel subdivision, the raw cells are recombined into the final cells.】

Each raw cell is a continuous set of empty voxels not bigger than some user defined size. Portals are generated on open surfaces between these cells. User controls database accuracy by supplying the size for these raw cells. This “raw cell size” together with voxel size are two most important user controlled parameters in Umbra. They are publicly given more understandable names: “smallest occluder” (raw cell size) and “smallest hole” (voxel size). Controlling these parameters allows scaling the data for any environment and requirements. Several versions of data are generated for different levels of detail – occlusion near camera is more important than farther away.【A cell is a block of voxels; its maximum size is user-controlled, so these two parameters need to be set with care.】

Umbra’s runtime walks these cells linked by portals to figure out the visible volume. Visible objects are collected along the path. Compared to other kind of portal culling systems, Umbra generates a lot of portals. Runtime thus presents quite an engineering challenge: it needs to be fast and optimized on each platform to handle high portal count. Common method for portal culling is narrowing the view frustum to fit encountered portals. Umbra doesn’t do this – Umbra rasterizes the portals instead. In the end, a low-resolution depth buffer is generated.【At runtime the links between cells are walked to collect visible objects. The common approach of narrowing the frustum per portal does not scale to so many portals; Umbra instead rasterizes the portals at a coarse granularity for culling.】

Some visibility approaches are based on hand made occluder models – a representation of what’s solid in the world. You could see Umbra as a method for modelling empty space instead. This approach has the benefit that all inaccuracy can always be countered by growing the empty space (i.e. portals). Sources of inaccuracy include voxelization and low rasterization resolution. Conversely this allows us make the runtime faster by using lower database and runtime resolutions, while always retaining error-free output.【The usual approach hand-models solid occluders; Umbra models the empty space instead: mark the cells that can be seen, and whatever is not reached can be ignored. Coarser voxels then only cost accuracy, never correctness.】

My previous blog post discussed global nature of visibility. Umbra is divided into a local offline operation and a global runtime one. Unlike PVS systems, this allows fast reuse of partial results from previous precomputations – and thus fast local updates to the database.【Recalling the first post: Umbra splits the work into a local offline part and a global runtime part, which lets partial precomputation results be reused.】

There are also benefits in having a separate set of static data on which the runtime algorithm operates. It makes the system independent from anything else: it’s parallelizable and easier to integrate. An application can begin to execute Umbra’s runtime as soon as camera’s position is known for a frame.【Another benefit: the runtime works on an independent set of static data, which is parallelizable and easy to integrate.】

 
 

https://umbra3d.com/blog/occlusion-culling-solving-the-visibility-problem/

【Nothing essential to note here.】

There’s an early and simple solution to the visibility problem aptly called the painter’s algorithm ( https://en.wikipedia.org/wiki/Painter’s_algorithm ). Like an old renaissance master, the painter’s algorithm has you drawing everything in back-to-front order. While the result looks correct, it causes pentimenti – overdraw that wastes computing cycles. Computer scientists knew how to solve the visibility problem in the 70s like the artists did in the 1700s. The challenge is in solving it in a fast and practical fashion. The art of doing so is called occlusion culling, and we’ve thought about it quite a bit here at Umbra.【The painter's algorithm causes a lot of overdraw: much of what gets painted is never seen.】

Modern rendering uses z-buffers – sort of the same thing as painter’s algorithm but for each point instead of each surface. While that does solve the problem, it does that very late in the rendering pipeline. By the time we get around rejecting anything, we’ve wasted a lot of time processing things hiding behind walls. A good visibility solution should know what the hidden objects are very early on, so that we can avoid even sending them for rendering. Avoiding spending time on hidden objects allows for more budget with visible ones.【Modern rendering uses the z-buffer for depth, but it rejects hidden fragments very late; knowing early which objects are hidden means they are never even submitted, which saves a lot of work.】

Of course you could solve visibility by doing manual work. And indeed, many occlusion culling systems are based on hand tuning portals or occluders. Less manual work means better workflow for anyone creating content for the system. The bad end of automation spectrum would require tagging all visibility manually, a huge task that will quickly become impractical. An ideal solution would be in the other end – not require manual work and also not place restrictions on content.【Hand-tuned occlusion culling works, but ideally this step is automated.】

It’s worth mentioning a couple obvious things. A visibility solution should also produce correct results. No visible object should be reported hidden. To guarantee visually correct result, any inaccuracy should be on the conservative side: reporting hidden objects visible. It should also be fast: too slow occlusion culling would negate the benefits.【However much we simplify, the final image must stay correct; err on the side of keeping objects that could have been culled.】

One big practical issue with visibility is that it is a global problem that involves everything – a small change in visibility can affect the other side of a 3D world. This must somehow be reflected in the computation we perform to solve it. Reducing something into a local problem would allow to divide and conquer: easy parallelization and reuse of partial results. A generic visibility solution doesn’t really lend itself to this. This also means amount of work performed by the algorithm is at least relative to visible volume. We can compute some parts beforehand offline however and some parts at runtime. This is a choice we do get to make, and it allows us to find some extent of localization.【Turning the global problem into local ones allows divide and conquer: parallel computation and reuse of partial results.】

 
 

Some more information: the Umbra 3 slides

https://www.slideshare.net/sampol1/nexon-developer-conference

This is the high level overview of the workflow.

From the original geometry we want to create a data structure that can be used for efficient visibility queries. And the whole process should be both fast and automatic. (and allow at least some amount of dynamic changes to the data)【Precompute the data structure used later for visibility queries, while still supporting some dynamic changes.】

In runtime visibility query the basic operation is figuring out which objects are visible from specific view points.

These are the requirements we had, we need to create a data structure and algorithm within these bounds.【The data structure and algorithm must satisfy the following requirements.】

  • For any input 3D scene, memory usage and processing time of the visibility system must be bounded and independent of the complexity of the input.【Fast enough.】
  • The whole process should be automatic.【Fully automatic.】
  • It should allow fast content iteration.【Fast iteration.】
  • And it should set no limits on the size of the game worlds.【Arbitrarily large worlds.】

So what is visibility?

It’s the set of all the lines in space which do not intersect occluding geometry.

 
 

Creating a data structure for accurate visibility representation from a polygon soup is theoretically possible, but in practice the data structure would be so large and so slow to compute it’s unusable.【An exact-visibility data structure built from the geometry is far too large for big scenes to be practical.】

Instead we need to use conservative visibility, where we have bounded time and space and within these bounds give the best results we can.

Visibility is ultimately reasoning about space. Interesting thing here to note is that once we have a topological data structure like this it can be used for other things as well, not just visibility.

【Within bounded time and space we can coarsen the data-structure nodes and handle visibility conservatively; this topological structure may also be reusable for other purposes.】

This is the high level overview of the workflow.

From the original geometry we want to create a data structure that can be used for efficient visibility queries. And the whole process should be both fast and automatic. (and allow at least some amount of dynamic changes to the data)【From the geometry, precompute a practical data structure for visibility queries.】

In runtime visibility query the basic operation is figuring out which objects are visible from specific view points.【At runtime, query that data to obtain visibility.】

Here you’re looking at top down view of an example scene.【Top-down view of the scene.】

The requirement was that there should be no manual markup or other requirements on the input geometry. So what we take as input is all the level geometry as it is.

So we really don’t have any other information besides the long list of polygons, which are possibly grouped into objects.

【The input has no hand-authored visibility markup, only geometry.】
Doing geometric operations with polygons directly has all kinds of difficulties related to floating point accuracy. Also, almost all real life game levels contain some small modeling errors such as t-vertices and cracks between objects. We need to be able to work with this kind of input as well.

【We also have to tolerate small modeling errors in the input without the method breaking down.】

The approach we chose is to create a cell-and-portal graph by grouping the voxels together based on proximity and connectivity. Cells are created from groups of voxels. Portals are then created on the boundaries of these groups.We chose to create portals because in the past they have been proven to be an efficient way to represent visibility, and we solve the issues of manually placed portals by generating them automatically.

【Voxels are grouped automatically into cells based on proximity and connectivity.】

In constrast to manually placed portals, we might generate thousands of portals which allows us to have accurate visibility in outdoor spaces as well.

By controlling the number of output cells and portals we can choose the output resolution of the visibility data so that it meets the memory and performance requirements.

【By controlling the number of output cells we control the granularity of the visibility data.】

In addition to the cell and portal graph we need to be able to figure out in which cell the viewpoint is located when making a visibility query.【For a visibility query we also need to know which cell the viewpoint is in.】

We create a KD-tree from the voxels, which can be packed very efficiently with only a few bits per node. The leaf nodes of the tree contain indices to cells in the graph.【A KD-tree is built from the voxels.】

The final data structure contains the cell-and-portal graph, the view tree, and a list of static objects contained in each cell.【The final data structure.】

We call the output data structure a tome .. Tome of visibility .. It contains all the information about topological structure need for visibility determination in a 3D scene.

One tome per sub-region.

 
 

Now let's look at how this data structure is used:

Select

Merge

The basic operation is the visibility query, which uses the tome data to figure out what is visible from a specific view point inside the scene.【Given the current position in the scene, find the visible objects.】

The goal is to create a depth buffer, or occlusion buffer, from the tome, which can be then used to very efficiently test the visibility of objects.

First step is to figure out in which cell the view point is located. This is done by traversing the view tree based on the query origin. This is a simple tree traversal and it’s a fast operation.【Step 1: find which cell the viewpoint is in.】

There are several options on how we could use the portal data. We could do a traditional recursive portal traversal, clipping the view frustum as we go through each portal. Or we could do ray-tracing or cone-tracing in the graph.

The approach we choose is to rasterize the portals using a custom software rasterizer optimized for this purpose. With rasterizer we need to touch each portal only once, as opposed to recursive portal traversal which can suffer from exponential blowup if there’s a lot of portal intersections in the screenspace.

(We could also run other kinds of queries on the graph, e.g. for connectivity.)【Visibility is handled by software-rasterizing the portals; the benefit is that each portal only has to be touched once.】

 
 

Also really useful property of the rasterizer is that it produces a depth buffer as output, which is almost optimal data structure for doing further visibility tests on the query output.

Also with rasterization we can choose the output resolution based on the platform and accuracy requirements.

Since we’re rasterizing portals instead of occluders, it’s trivial to implement conservative rasterization, which is a requirement for getting correct results in lower resolutions.

【Rasterization outputs a depth buffer that is then used for visibility tests; rasterizing portals instead of occluders keeps the resolution controllable (and conservative).】

After we have generated the depth buffer, testing objects for visibility is quite straightforward bounding box vs. depth buffer visibility test.

If we keep the depth buffer resolution small enough to fit into the CPU cache, this test is super-super fast, and also trivial to parallelize.

【Visibility is decided by testing bounding boxes against the depth buffer.】

 
 

More on Umbra:

https://gulu-dev.com/post/2016-02-01-occlusion-culling【someone else's write-up】

 
 

A fairly detailed description of the early Umbra approach (2000):

SurRender Umbra: A Visibility Determination Framework for Dynamic Environments

https://gulu-dev.com/post/2016-02-01-occlusion-culling

Efficient Algorithms for occlusion culling and shadows (Jan2005)

dPVS: An Occlusion Culling System for Massive Dynamic Environments (2004)


Occlusion Culling – oxel

Series overview:

http://www.nickdarnell.com/oxel/

 
 

  1. Hierarchical Z-Buffer Occlusion Culling

http://www.nickdarnell.com/hierarchical-z-buffer-occlusion-culling/

 
 

Hierarchical Z-Buffer Culling Steps

  1. Bake step – Have your artists prepare occlusion geometry for things in the world that make sense as occluders, buildings, walls…etc. They should all be super cheap to render, boxes/planes. I actually ran across this paper, Geometric Simplification For Efficient Occlusion Culling In Urban Scenes ( http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.28.8189&rep=rep1&type=pdf ), that sounded like a neat way of automating the process.【Generate the occluder geometry.】
  2. CPU – Take all the occlusion meshes and frustum cull them.【Frustum-cull the occluders on the CPU.】
  3. GPU – Render the remaining occluders to a ‘depth buffer’. The depth buffer should not be full sized, in my code I’m using 512×256. There is a Frostbite paper that mentions using a 256×114-ish sized buffer for a similar solution to do occlusion culling. The ‘depth buffer’ should just be mip 0 in a full mip chain of render targets (not the actual depth buffer).【Render the occluders to a reduced-size depth buffer.】
  4. GPU – Now downsample the RT containing depth information filling out the entire mipchain. You’ll do this by rendering a fullscreen effect with a pixel shader taking the last level of the mip chain and down sampling it into the next, preserving the highest depth value in a sample group of 4 pixels. For DX11, you can just constrain the shader resource view so that you can both read and render from the same mip chain. For DX9 you’ll have to use StretchRect to copy from a second mip chain, since you can’t sample and render to the same mip chain in DX9. In my code I actually found a more optimized solution, by ping-ponging between 2 mip chains one containing even and the other odd levels, and a single branch in your shader code you can get around the overhead of having to do the StretchRect and just sample from a different mip chain based on the even/odd mip level you need.【Downsample the render target to fill the whole mip chain with depth information.】
  5. CPU – Gather all the bounding spheres for everything in your level that could possibly be visible.【Gather all the bounding spheres.】
  6. GPU – DX11 send the list of bounds to a compute shader, which computes the screen space width of the sphere then uses the width to compute the mip level to sample from the HiZ map generated in step 4, such that the sphere covers no more than 2 pixels wide. So large objects in screen space will sample from very high values in the mip chain since they require a coarse view of the world. Whereas small objects in screen space will sample from very low values in the mip chain. In DX9 the process is basically the same, the only difference is that you’ll render a point list of vertices, that instead of a Float3 position are Float4 bounds (xyz = position, w = radius). You’ll also send down a stream of texcoords that will represent x/y pixel values of where to encode the results of the occlusion test for that bound. Instead of a compute shader you’ll process the vertices using a vertex shader, you’ll also need to use the pixel location provided in the texcoord stream to make sure the results of the test are written out to that point in a render target, and in a pixel shader you’ll need to do the sampling to test to see if it’s visible, and output a color like white for culled, black for visible.【Send all the bounds plus the HiZ map from step 4 to a compute shader, which compares them to decide occluded or not.】
  7. CPU – Try to do some work on the CPU after the occluder rendering and culling process is kicked off, for me the entire process took about 0.74 ms of GPU time on a Radeon 5450, with 900 bounds. The overhead of generating the HiZ mip chain and dispatching the culling process is the real bottleneck though, there’s little difference between 900 bounds and 10,000 bounds.【CPU work can overlap with the occluder rendering and culling pass.】
  8. CPU – Read back the results. DX11 you’re just reading back a buffer output by a compute shader. For DX9 you’ll have to copy the render target rendered to in step 6 containing the checker pattern of black and white pixels and then iterate over the pixels on the CPU to know what is visible and what is hidden.【Read back the results.】

     
     

Hierarchical Z-Buffer Downsampling Code
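The downsampling shader is only shown as screenshots on the original page and is not reproduced in these notes. As a stand-in, here is a hedged CPU-side C++ sketch of the same max-reduction (each mip texel keeps the farthest of the 2x2 texels below it) — not the author's HLSL:

```cpp
#include <vector>
#include <algorithm>

// Build one HiZ mip level from the previous one by keeping the farthest
// (maximum) depth of each 2x2 block. Assumes even source dimensions.
std::vector<float> downsampleMaxDepth(const std::vector<float> &src, int srcW, int srcH)
{
    int dstW = srcW / 2, dstH = srcH / 2;
    std::vector<float> dst(dstW * dstH);
    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < dstW; ++x) {
            float d0 = src[(2 * y)     * srcW + 2 * x];
            float d1 = src[(2 * y)     * srcW + 2 * x + 1];
            float d2 = src[(2 * y + 1) * srcW + 2 * x];
            float d3 = src[(2 * y + 1) * srcW + 2 * x + 1];
            dst[y * dstW + x] = std::max(std::max(d0, d1), std::max(d2, d3));
        }
    return dst;
}
```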

 
 

 
 

Hierarchical Z-Buffer Culling Code:

The main functions are listed here; see the original page for the complete code.

 
 

Data structures

Point-to-plane distance:

Sphere culling condition:

The main function follows; not reproduced here.
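Since the screenshots of those functions are not reproduced here, a rough sketch of the core test (screen-space sphere size → HiZ mip selection → depth compare), under assumed conventions and definitely not the original code:

```cpp
#include <algorithm>
#include <cmath>

// Conservative sphere-vs-HiZ visibility test (sketch). Assumptions:
// - viewZ: positive view-space distance of the sphere center; radius in the same units
// - proj00: first diagonal element of the projection matrix
// - sampleHiZ(level, u, v) returns the farthest depth stored in that mip texel
// - depths are view-space distances; larger = farther
bool sphereVisible(float viewZ, float radius, float u, float v,
                   float proj00, int hiZWidth, int numMips,
                   float (*sampleHiZ)(int level, float u, float v))
{
    if (viewZ - radius <= 0.0f) return true;                 // crosses the near plane: accept
    // Approximate projected diameter in pixels, then pick a mip coarse enough
    // that the sphere spans only a texel or two at that level.
    float pixels = (radius * proj00 / viewZ) * hiZWidth;
    int level = std::min(numMips - 1,
                         std::max(0, (int)std::ceil(std::log2(std::max(pixels, 1.0f)))));
    float storedFarthest = sampleHiZ(level, u, v);           // farthest occluder depth there
    return (viewZ - radius) <= storedFarthest;               // nearest point not behind occluders
}
```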

 
 

Example:

Red marks occluded objects, white marks visible ones; the arrow keys move the camera.

 
 

 
 

  2. Hierarchical Z-Buffer Occlusion Culling – Shadows

http://www.nickdarnell.com/hierarchical-z-buffer-occlusion-culling-shadows/

 
 

The goal is to cull shadows using the Z-buffer; see the paper: CC Shadow Volumes

http://gamma.cs.unc.edu/CCSV/ccsvEGSR04.pdf

 
 

Steps:

  1. Create a HZB from the point of view of the player.【Build an HZB from the player's point of view.】
  2. Create a HZB from the point of view of the shadow casting light.【Build an HZB from the shadow-casting light.】
  3. Cull the shadow casting objects using the lights HZB. 1. If the object is culled, mark it as culled.【Cull the shadow casters against the light's HZB.】
  4. If the object isn’t culled, we need to compute a rough approximation of the extents of shadow volume. You can think of this volume as being the minimum depth points on your bounding box (A-D) and the maximum depth sampled from where those points project onto the light’s HZB (E-H) (Figure 1). Now that you have a bounding volume for the shadow, you can transform that volume into the camera space of the player, and test it against the player’s HZB; just as if the extents were any other object you were attempting to cull with the HZB (Figure 2). If you can’t see the volume, you can write out that the caster is culled, otherwise write it as visible.【Only if the object isn't culled do we need to compute its approximate shadow volume.】

 
 

 
 

  3. Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes

 
 

Desired occlusion volume properties:

  1. Conservativeness – Doesn’t extend beyond the surface of the mesh
  2. Simplicity – The occlusion volume is made of very few triangles or is fast to render
  3. Volume Conservation – Closely matches the original mesh’s volume
  4. Dynamic – Some games have large movable occluders or destroyable walls

 
 

Normal methods of simplifying a mesh such as typical triangle simplification cause problems in both the area of conservativeness and volume conservation. In some cases you can use the physics collision mesh if available. One thing to be aware of is that when the physics volume is larger than the visual mesh it can cause false occlusions leading to things popping into view. Also not every mesh needs a physics volume nor are physics volumes always meshes. Will Vale’s talk from GDC 2011 covered using physics collision meshes pretty thoroughly for HZB-OC, you should check it out.【Using the collision mesh directly can cause errors, because it is usually larger than the visual mesh.】

 
 

Technique steps:

 
 

  1. Find all the voxels completely inside a mesh

 
 

The voxelization algorithm I ended up using comes from this paper Fast Parallel Surface and Solid Voxelization on GPUs [Schwarz 2010]. ( http://research.michael-schwarz.com/publ/files/vox-siga10.pdf ) I used the 3.1 section for surface voxelization and the 4.1 section for solid voxelization. Both are required because a voxel is considered part of the solid volume if the center of the voxel is inside the mesh. We want the entire voxel to be inside the mesh though. So you have to remove the excess voxels from the solid set using the surface set.【Source of the method.】

 
 

 
 

My first implementation was in C++ and entirely on the CPU which took about 20 seconds to run on a mesh with ~10,000 triangles and a 50^3 voxel volume. In the version I’m presenting here I moved to C#; when I did that, time went up to about 60 seconds. So I ended up porting a similar version of the CPU implementation to the GPU using OpenCL (Cloo https://sourceforge.net/projects/cloo/ ); on the same data it ran in about 6 seconds on my Nvidia 540M, with no attempt to optimize.【CPU vs. GPU performance difference.】

 
 

One unfortunate limitation of Schwarz’s solid voxelization algorithm is that it requires watertight meshes. The reason for this is you’re shooting a ray down a column of voxels until you intersect with a triangle. You’re then xor’ing the bit for each voxel signaling that this column of voxels is inside or outside. Voxels that are inside a mesh will be xor’ed an odd number of times, whereas outside voxels will be xor’ed an even number of times. So if there is a hole in the mesh, you’d leave an incomplete trail of xor’s in that column, leaving every voxel from the last triangle to the bounding box marked as ‘inside’.【A limitation of the algorithm.】

 
 

Two papers I’ve run across that might contain solutions to this problem are An Automatic Hole-Filling Algorithm for Polygon Meshes and Complete Polygonal Scene Voxelization.【Possible fixes for this problem.】

 
 

  2. Find the voxel at the densest point in the volume

 
 

 
 

  3. Expand a box from this point until all sides collide with the mesh surface or another box

 
 

Once you’ve found the densest voxel you’re going to create a 1^3 voxel sized box at that location. You’ll then proceed to iteratively expand each side of the box in voxel space until you can’t expand any side of the box without entering an empty voxel or another box that has already been placed.【Starting from the densest voxel, grow the largest possible box.】

 
 

  4. Repeat 2-3 until you’ve consumed X% of the total volume

 
 

 
 

  5. Filter small or useless boxes (Unimplemented)

 
 

 
 

  6. Use a Constructive Solid Geometry (CSG) algorithm to merge the boxes you create

 
 

 
 

 
 

  4. Oxel: A Tool for Occluder Generation

 
 

The final tool: http://www.nickdarnell.com/oxel-a-tool-for-occluder-generation/


Android Terrain Rendering

Starting from this post:

http://galfar.vevb.net/wp/2013/android-terrain-rendering-vertex-texture-fetch-part-1/

 
 

To my surprise, I found out that GPU (PowerVR SGX 540) in my venerable Nexus S (2010) supports vertex texture fetch (VTF). That is, accessing texture pixels in vertex shader — a very useful feature for terrain rendering. About a year ago, when I started investigating terrain rendering on Android devices, I did some searching for VTF support on Android devices and figured out it’s not there yet (similar to situation years ago when desktop OpenGL 2.0 was released with support for texture sampling in GLSL verter shaders but most GL implementation just reported GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS to be zero). Now I don’t know how I missed it on my own phone, maybe there was some Android update with updated GPU drivers during last year? I have no idea how many other devices support it now. Hopefully, newest ones with OpenGL ES 3 support it all. I wouldn’t be surprised if among GLES 2 devices only PowerVR + Android 4+ combinations supported it.

【OpenGL ES 3.0 fully supports vertex texture fetch (VTF).】

 
 

Overview

 
 

Anyway, let’s focus on terrain rendering – rough outline:

 
 

  • Put the entire height-map into a texture.【Put the heightmap in a texture.】
  • Have a small 2D grid (say 16×16 or 32×32) geometry ready for rendering terrain tiles.【Prepare a small 2D grid mesh.】
  • Build a quad tree over the terrain. Root node covers entire terrain and each child then covers 1/4th of parent's area.【Build a quadtree over the terrain.】
  • Now we can start rendering, do this every frame:【Per-frame rendering.】
    • Traverse the quadtree starting from the root.【Traverse the quadtree.】
    • For each child node test if the geometry grid provides sufficient detail for rendering the area covered by this node:【For each node, test whether the grid gives enough detail for its area.】
      • YES it does, mark this node for rendering and end traversal of this subtree.【Mark the node for rendering and stop descending.】
      • NO it does not, continue traversal and test children of this node (unless we're at leaf already).【Otherwise keep descending into its children.】
    • Now take this list of marked nodes and render them. The same 2D grid is used to render each tile: it's scaled according to tile's covered area and its vertices are displaced by height values read from texture.【Render all marked nodes with the same grid, scaled to the tile and displaced by heights read from the texture — see the sketch below.】
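A hedged sketch of that quadtree traversal; the detail test here is a simple distance-vs-node-size metric, which is an assumption for illustration, not the post's exact criterion:

```cpp
#include <cmath>
#include <vector>

struct TerrainNode {
    float centerX, centerZ, halfSize;     // area covered by this node (world units)
    int children[4];                      // indices into the node array, -1 if leaf
};

// Collect the nodes to draw: stop descending when the fixed grid gives enough
// detail for the node (judged here by camera distance relative to node size).
void selectNodes(const std::vector<TerrainNode> &nodes, int nodeIdx,
                 float camX, float camZ, float detailFactor, std::vector<int> &toDraw)
{
    const TerrainNode &n = nodes[nodeIdx];
    float dx = n.centerX - camX, dz = n.centerZ - camZ;
    float dist = std::sqrt(dx * dx + dz * dz);
    bool enoughDetail = dist > detailFactor * n.halfSize;   // far away: coarse node suffices
    if (enoughDetail || n.children[0] < 0) {
        toDraw.push_back(nodeIdx);                          // render this tile
        return;
    }
    for (int i = 0; i < 4; ++i)                             // otherwise refine
        selectNodes(nodes, n.children[i], camX, camZ, detailFactor, toDraw);
}
```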

       
       

This is basically what I originally wanted for Multilevel Geomipmapping years ago but couldn’t do in the end because of the state of VTF support on desktop at that time.【This was the original idea, but it wasn't implemented at first because of the state of VTF support.】

 
 

So what exactly is the benefit of VTF over let’s say geomipmapping here?【What does VTF buy us?】

 
 

Main benefit is the ability to get the height of the terrain at any position (and multiple times) when processing each tile vertex. In traditional geomipmapping, even if you can move tile vertices around it’s no use since you have only one fixed height value available. With VTF, you can modify the tile position of a vertex as you like and still be able to get the correct height value.【The benefit is that the height at any vertex position can be fetched (and the position modified) easily in the vertex shader.】

This greatly simplifies tasks like connecting neighboring tiles with different levels of detail. No ugly skirts or special stitching stripes of geometry are needed as you can simply move edge points around in the vertex shader. Also geomorphing solutions begin to be usable without much work. And you can display larger terrains as well. With geomipmapping you always have to draw fixed number of tiles (visible leaves) — a number that goes up fast when you enlarge the terrain. VTF may allow you draw fixed number of tiles — regardless of actual terrain size (as distant tiles cover much larger area compared to geomipmap tiles with fixed area). Another one, terrain normals can be calculated inside the shaders from neighboring height values.

【很好的支持大地图的LOD

Finally, since heightmap is now a regular texture you get filtering and access to compressed texture formats to stuff more of the terrain to the memory.

【Compressed texture formats are also supported.】

 
 

 
 

Then see:

http://galfar.vevb.net/wp/2012/porting-glsoar-to-android-and-opengl-es-part-1/

http://galfar.vevb.net/wp/2012/porting-glsoar-to-android-and-opengl-es-part-2/

 
 

A few weeks ago I had some sort of an urge to try game and/or graphics development on Android. I’ve been doing some Android application development lately so I had all the needed tools setup already. However, what I wanted was some cross-platform engine or framework so that I could develop mostly on desktop. Working with emulators and physical devices is tolerable when doing regular app development and of course you want to use UI framework native for the platform. For games or other hardware stressing apps I really wouldn’t like to work like that. Emulator is useless as it doesn’t support OpenGL ES 2.0 and is generally very slow. And it’s no pleasure with physical device either, grabbing it all the time off the table, apk upload and installation takes time, etc. So I started looking for some solutions.【Iterating on graphics code on mobile devices/emulators is painfully slow.】

 
 

The Search

 
 

Initially, I considered using something like Oxygene. It uses Object Pascal syntax, can compile to .NET bytecode and also Java bytecode (and use .NET and Java libs). However, I would still need to use some framework for graphics, input, etc. so I would be locked in .NET or Java world anyway. I’m somewhat inclined to pick up Oxygene in the future for something (there’s also native OSX and iOS target in the works) but not right now. Also, it’s certainly not cheap for just a few test programs. Then there is Xamarin – Mono for Android and iOS (MonoGame should work here?) but even more expensive. When you look at Android game engines and frameworks specifically you end up with things like AndEngine, Cocos2D-X, and libGDX.【Oxygene wasn't a good fit.】

http://www.elementscompiler.com/elements/oxygene/default.aspx

 
 

After a little bit of research, I settled on libGDX. It is a Java framework with some native parts (for tasks where JVM performance may be inadequate) currently targeting desktop (Win/OSX/Linux), Android, and HTML5 (Javascript + WebGL in fact). Graphics are based on OpenGL (desktop) and OpenGL ES (Android and web). Great thing is that you can do most of the development on desktop Java version with faster debugging and hot swap code. One can also use OpenGL commands directly which is a must (at least for me).【libGDX: a cross-platform game development framework on top of OpenGL.】

 
 

The Test

 
 

I wanted to do some test program first and I decided to convert the glSOAR terrain renderer to Java and OpenGL ES (GLES). The original is written in Object Pascal using desktop OpenGL. The plan was to convert it to Java and OpenGL ES with as few modifications as possible. At least for 3D graphics, libGDX is a relatively thin layer on top of GLES. You have some support classes like texture, camera, vertex buffer, etc. but you still need to know what’s a projection matrix, bind resources before rendering, etc. Following is the listing of a few ideas, tasks, problems, etc. during the conversion (focusing on Java and libGDX).【The original glSOAR is a desktop OpenGL terrain renderer; the goal here is to port it so it also works on mobile.】

 
 

PROJECT SETUP

 
 

Using Eclipse, I created the project structure for libGDX based projects. It is quite neat actually, you have a main Java project with all the shared platform independent code and then one project for each platform (desktop, Android, …). Each platform project has just a few lines of starter code that instantiates your shared main project for the respective platform (of course you can do more here).【Project setup.】

 
 

Two problems here though. Firstly, changes in the main project won’t trigger a rebuild of the Android project so you have to trigger it manually (like by adding/deleting an empty line in the Android project code) before running on Android. This is actually a bug in ADT Eclipse plugin version 20 so hopefully it will be fixed for v21 (you can star this issue).【Issue 1: building and debugging is awkward.】

 
 

Second issue is asset management but that is easy to fix. I want to use the same textures and other assets in the version for each platform so they should be located in some directory shared between all projects (like inside the main project, or a few dir levels up). The thing is that for Android all assets are required to be located in the assets subdirectory of the Android project. The recommended solution for libGDX here is to store the assets in the Android project and create links (link folder or link source) in Eclipse for the desktop project pointing to the Android assets. I didn’t really like that. I want my stuff to be where I want it to be so I created some file system links instead. I used the mklink command in Windows to create junctions (as Total Commander wouldn’t follow symlinks properly):【Issue 2: asset management is awkward.】

 
 

d:\Projects\Tests\glSoarJava> mklink /J glSoarAndroid\assets\data Data

Junction created for glSoarAndroid\assets\data <> Data

d:\Projects\Tests\glSoarJava> mklink /J glSoarDesktop\data Data

Junction created for glSoarDesktop\data <> Data

 
 

Now I have shared Data folder at the same level as project folders. In future though, I guess there will be some platform specific assets needed (like different texture compressions, sigh).

 
 

JAVA

 
 

Too bad Java does not have any user defined value types (like struct in C#). I did split the TVertex type into three float arrays. One for 3D position (at least value type vectors would be nice, vertices[idx * 3 + 1] instead of vertices[idx].y now, which is more readable?) and the rest for SOAR LOD related parameters. Making TVertex a class and spawning millions of instances didn’t seem like a good idea even before I began. It would be impossible to pass positions to OpenGL anyway.【Custom data layout (parallel float arrays) for vertex positions and the SOAR LOD parameters.】

 
 

Things like data for VBOs, vertex arrays, etc. are passed to OpenGL in Java using descendants of the Buffer (direct) class. Even stuff like generating IDs with glGen[Textures|Buffers|…] and well basically everything that takes pointer-to-memory parameters. Of course, it makes sense in an environment where you cannot just touch the memory as you want. Still it is kind of annoying for someone not used to that. At least libGDX comes with some buffer utils, including fast copying through native JNI code.【How data gets marshalled to OpenGL.】

 
 

PERFORMANCE

 
 

The roots of the SOAR terrain rendering method are quite old today and come from the times when it was okay to spend CPU time to limit the number of triangles sent to GPUs as much as possible. That has been the complete opposite of the situation on PCs for a long time (unless you had an old integrated GPU, that is). I guess today that is true for hardware in mobile devices as well (at least the new ones). And there will also be some Java/Dalvik overhead…【The original SOAR approach is CPU-heavy and doesn't perform well on modern hardware.】

 
 

Anyway, this is just an exercise so it the end result may very well be useless and the experience gained during development is what counts.

 
 

【The second part isn't summarized further here; it covers small implementation issues rather than the algorithm or architecture itself.】

 
 

 
 

Implementation

 
 

I have added an experimental VTF based terrain renderer to the Terrain Rendering Demo for Android ( http://galfar.vevb.net/wp/2013/terrain-rendering-demo-for-android/ — covered directly below) testbed and it looks promising. Stitching of the tiles works flawlessly. More work is needed on selecting nodes for rendering (render node or split to children?). Currently, there’s only a simple distance based metric but I want to devise something that takes classical “screen-space error” into account. And maybe some fiddling with geomorphing on top …【Go straight to the demo.】

 
 

https://bitbucket.org/galfar/tests-glterrainjava

 
 

 
 

Terrain Rendering Demo for Android

 
 

I finally got around to releasing the Android terrain rendering demo I’ve been working on the last few months (a few moments here and there). I did the bulk of the work in November 2012, partly described in the posts Porting glSOAR to Android and OpenGL ES, Part 1 and Porting glSOAR to Android and OpenGL ES, Part 2 – the third part is still just a draft.【The terrain demo is finally released.】

 
 

Anyway, here’s the current version which supports the Geomipmapping and SOAR terrain rendering methods. Gritty details about the internals will follow in some future post. There is also a nearly identical desktop version for reference — an advantage of using libGDX for this.【The demo supports both geomipmapping and SOAR terrain rendering.】

 
 

 
 

【See the original post for download and installation.】

http://galfar.vevb.net/wp/2013/terrain-rendering-demo-for-android/

 
 

Controls【This part covers the demo controls; not repeated here.】

 
 

【The rest covers demo usage in detail; see the original post.】

 
 

 
 

 
 

SOAR

 
 

glSOAR is an Object Pascal implementation of the SOAR terrain rendering algorithm using OpenGL. It can be compiled using the FPC or Delphi compiler and works in Windows and Linux (only benchmarking mode). No fancy OpenGL extensions are required, it should work with any OpenGL 1.2 and newer drivers.【glSOAR is an OpenGL implementation of the SOAR algorithm.】

 
 

The SOAR algorithm was published by Peter Lindstrom and Valerio Pascucci in 2001. Papers and the original C implementation can be found at https://computation.llnl.gov/casc/SOAR/SOAR.html. SOAR starts with a coarse terrain mesh each frame and uses view-dependent refinement (standard longest edge bisection) to calculate the mesh to be rendered – the stress is on the CPU.【SOAR's basic idea is view-dependent refinement of the terrain mesh, with the work on the CPU.】

 
 

【See the source files for download.】

http://galfar.vevb.net/wp/projects/terrain-rendering/soar/

 
 

 
 

 
 

【SOAR home page】

https://computation.llnl.gov/casc/SOAR/

 
 

 
 

Terrain Simplification Simplified: A General Framework for View-Dependent Out-of-Core Visualization

 
 

Abstract—This paper describes a general framework for out-of-core rendering and management of massive terrain surfaces. The two key components of this framework are: view-dependent refinement of the terrain mesh; and a simple scheme for organizing the terrain data to improve coherence and reduce the number of paging events from external storage to main memory.【Two key components: view-dependent refinement, and a data layout optimized for out-of-core paging.】


 

Phase-Functioned Neural Networks for Character Control

Theory:

 
 

http://theorangeduck.com/page/phase-functioned-neural-networks-character-control

 
 

Video notes:

Speaker background: PhD in machine learning for character animation.

The animation system is the black box in the middle of the diagram.

The user feeds the animation system fairly high-level commands, such as a stick direction on the plane indicating which way we want to walk.

The animation system is like a huge state machine: different states correspond to different animations, with blending between them. Writing this complex system directly in code is complicated and hard to maintain.

So we reconsider whether a simple algorithm could implement this complex interactive animation system.

We would like the inputs and outputs of the animation system to be nothing more than a set of parameters.

Looking at the original complex animation system again, if we treat its inputs and outputs as the parameters of an animation model, this is achievable; it is like picking entries out of a huge database.

The second thing we want is to directly output the next pose.

This is possible: the basic idea is to treat the animation system as a black box that, given some input, produces the desired output; the concrete approach follows.

Input x: trajectory positions, directions, heights; previous joint positions, velocities, …

Output y: the transform of each joint.

To train we first need data. Motion capture was used: takes of roughly ten minutes each, about two hours of unstructured data in total.

Height matching: the captured data assumes the feet are exactly on the ground; a large variety of terrains is then fitted to the motions to obtain the corresponding training data.

Now let's look at how the learning works.

A neural network is essentially just a function.

For example, this function maps an input to the corresponding output.

The inputs and outputs of our function are as shown in the figure.

These can be quantized as vectors for input and output.

The simplest single-layer NN, as an example, looks like this; the parameters w and b are what we need to learn.

The inputs are the known x and y.

The outputs are the learned w and b, as shown here.

The resulting function is the behaviour function.

Different functions may be involved; this one, for example, is a nonlinear activation (blending) function.

These two are very similar.

A deep, multi-layer function behaves like this.

This example is the formula of a three-layer neural network.
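For reference, a minimal sketch of one layer of such a feed-forward pass (the ELU activation and two hidden layers of 512 units come from the talk; everything else here is illustrative):

```cpp
#include <cmath>
#include <vector>

// One dense layer: y = activation(W * x + b). A three-layer network chains this
// three times, applying the ELU activation only on the two hidden layers.
std::vector<float> layer(const std::vector<std::vector<float>> &W,
                         const std::vector<float> &b,
                         const std::vector<float> &x, bool activate)
{
    std::vector<float> y(b.size());
    for (size_t i = 0; i < b.size(); ++i) {
        float s = b[i];
        for (size_t j = 0; j < x.size(); ++j) s += W[i][j] * x[j];
        y[i] = activate ? (s > 0.0f ? s : std::exp(s) - 1.0f) : s;   // ELU
    }
    return y;
}
```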

Training works by feeding in inputs, measuring the error of the resulting output, and then adjusting the network parameters; that is the basic idea.

GPU computation is used to save training time.

A phase-functioned NN means a special kind of NN is used: different network weights for different phases of the motion, which avoids blending unrelated motions together; see the paper for details.

This is the final result: a simple animation-driving model that replaces the state machine and blend tree.

Then the demo is shown.

Performance figures.

Conclusions.

 
 

First, a complete pass over the slides:

The SIGGRAPH presentation slides.

Goal: fast, compact and expressive character control in games.

Showcase of the final results.

Part one: background.

Where previous work can be improved:

  1. The entire database must be kept in memory.
  2. The data must be processed by hand.
  3. Some complex acceleration schemes are required.

What can a NN bring?

  1. Virtually unlimited data capacity (arbitrary motions).
  2. Fast, real-time, with low memory usage.

But how do we generate the motion?

CNN: learn the mapping from the user's control signal to the character's motion.

demo

What is the problem?

Ambiguity: the same input can lead to different character motions.

In practice:

  1. Special handling is needed to resolve the ambiguity.
  2. The entire input trajectory has to be provided up front.
  3. A multi-layer CNN is still too slow for games.

RNN: learn the mapping from the previous frame to the next frame.

demo

Quality of the RNN results:

  1. Only holds up for about 10 seconds.
  2. Floating cannot be avoided.
  3. Ambiguity cannot be avoided.

To summarize the problems we face:

  1. How do we handle large-scale data?
  2. How do we resolve the ambiguity?
  3. How do we make the generated results look good?

Data capture

Unstructured motion capture: about two hours of data in total, roughly ten minutes per clip; many tables and chairs were laid out to imitate complex terrain, so that as many complex situations as possible are covered.

demo

demo

Terrain fitting

  1. We want the terrain data to be learned together with the motion data.
  2. But capturing motion and terrain data at the same time is cumbersome.
  3. So build a heightmap database and fit each motion clip to a patch of a heightmap.

Example

Parameter setup:

  1. The final effect is good.
  2. The character trajectory uses a windowed representation.
  3. Gait, terrain height and similar information are added.

Neural network part

PFNN: a NN whose weights are produced by a phase function.

The phase is a scalar in 0-2π indicating where the current pose lies within the current locomotion cycle.

Figure: the input is the current-frame pose, the output is the next-frame pose, and the NN's parameters come from the phase function.

demo

Characteristics: a feed-forward NN with two hidden layers, 512 hidden units per layer, and ELU activation.

The phase function's output is the NN weights: a cyclic cubic interpolation of four control points, each control point being a full set of NN weights (a sketch follows below).
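
Below is a small C++/Eigen sketch of such a phase function, assuming the cyclic Catmull-Rom form described in the paper; the names and the one-matrix-at-a-time treatment are simplifications (in the real network every weight matrix and bias vector has its own four control points).

    #include <Eigen/Dense>
    #include <cmath>

    using Eigen::MatrixXf;

    // alpha[0..3] are the four learned control points for one weight matrix;
    // phase is in [0, 2*pi). Returns the blended weights for this frame.
    MatrixXf phase_function(const MatrixXf alpha[4], float phase)
    {
        const float two_pi = 6.2831853f;
        float t = (phase / two_pi) * 4.0f;        // which of the 4 segments we are in
        float w = t - std::floor(t);              // position inside that segment
        int k1 = static_cast<int>(t) % 4;         // segment's start control point
        int k0 = (k1 + 3) % 4;
        int k2 = (k1 + 1) % 4;
        int k3 = (k1 + 2) % 4;

        // Standard Catmull-Rom cubic through the cyclic control points.
        return alpha[k1]
             + w * (0.5f * alpha[k2] - 0.5f * alpha[k0])
             + w * w * (alpha[k0] - 2.5f * alpha[k1] + 2.0f * alpha[k2] - 0.5f * alpha[k3])
             + w * w * w * (1.5f * alpha[k1] - 1.5f * alpha[k2] + 0.5f * alpha[k3] - 0.5f * alpha[k0]);
    }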

Training algorithm:

  1. Feed in the phase to generate the weights.
  2. Run the NN with those weights and the input values to get the output.
  3. Measure the output error.
  4. Back-propagate through the NN and the phase function to update the control-point values.

Results

demo

Conclusions

Phase-function precomputation: evaluating this function at runtime is relatively expensive for a game (see the sketch after this list):

  1. The phase is confined to 0-2π, so the function can be precomputed over this range.
  2. At runtime, interpolate between the precomputed results.
  3. This gives a balance between speed and memory.
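
A rough sketch of that precomputation idea follows; the struct name and the linear blend between the two nearest samples are assumptions, and samples is assumed to be non-empty, filled up front by evaluating the phase function at evenly spaced phases.

    #include <Eigen/Dense>
    #include <cmath>
    #include <vector>

    using Eigen::MatrixXf;

    // Evaluate the phase function at N evenly spaced phases once, up front, then
    // blend the two nearest samples at runtime. N trades memory for accuracy.
    struct PrecomputedPhaseFunction {
        std::vector<MatrixXf> samples;   // N precomputed weight matrices (non-empty)

        MatrixXf lookup(float phase) const {
            const float two_pi = 6.2831853f;
            int n = static_cast<int>(samples.size());
            float t = (phase / two_pi) * n;
            int i = static_cast<int>(t) % n;   // lower sample
            int j = (i + 1) % n;               // upper sample (wraps around)
            float w = t - std::floor(t);       // blend factor
            return (1.0f - w) * samples[i] + w * samples[j];
        }
    };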

Performance figures

Drawbacks:

  1. Training the model takes a very long time.
  2. Artists get no direct feedback when editing or modifying the data.
  3. The results are hard to predict, and when something goes wrong it is hard to tell why.

Advantages:

  1. A NN easily handles huge amounts of data and can produce an enormous variety of results.
  2. The semantic decomposition resolves the ambiguity problem.
  3. The simple structure and parameterized usage make the quality easy to control.

 
 

 
 

The practical part proceeds in two steps:

  1. First, look at how the demo is implemented.
  2. Then, look at how the network is handled.


AIAnimation code analysis

After much struggle I got it running, but it cannot be controlled. My guess was that the input code for Ubuntu and Windows is incompatible, so one of the main goals of the code analysis is to replace the input handling:

 

Code structure analysis

Everything starts from the main function:

  • First comes SDL initialization. Simple DirectMedia Layer (SDL) is a cross-platform development library designed to provide low level access to audio, keyboard, mouse, joystick, and graphics hardware via OpenGL and Direct3D. In short, a commonly used cross-platform OpenGL support library.

    This is where I realized the unresponsiveness was probably not an input-compatibility problem: looking at the code, it turns out only a gamepad is supported.

    Unfortunately, it runs very sluggishly on Windows; the frame rate feels below 10 fps!

  • GLEW initialization

  • Resource loading

    • Options defines the settings we are allowed to modify:

    • CameraOrbit is the camera initialization setup.

    • LightDirectional is the scene lighting initialization setup.

      This is just a bunch of GL settings.

    • Character is the character initialization setup.

      This part defines and loads the character data, and it is important!

      First, the character data is stored in the following four files, annoyingly in binary format:

      vertices, triangles, parent indices and xforms, one file each.

      The file-reading functions are simple: the data is read straight into the corresponding containers. The character data structure is as follows:

      The last part here is a forward-kinematics implementation, which is simple: each child joint follows its parent (see the forward-kinematics sketch after this list).

    • Trajectory is the trajectory initialization setup.

      This just defines the data structure of the motion trajectory, as shown below:

    • IK initialization setup

      The IK data structure is as follows:

      A two_joint function is also provided here; we will come back to it when it is used, since its purpose is not obvious yet.

    • Shader related

      This part loads the shaders and hooks them up to OpenGL.

    • Heightmap initialization setup

      Here we mainly look at how the height data is read and stored.

      An example of the height data, which consists of two files:

       
       

       
       

      The Load() function simply reads the floats one by one and stores them in vector<vector<float>> data:

      The xxx.txt file is used to build data, while xxx_ao.txt is used to build the vbo/tbo (positions, colors, etc.) and vbo_data/tbo_data (index data).

    • Areas initialization setup

      Its data structure is as follows:

    • The PFNN model

      Loading and initializing the model; first its data structure:

      ArrayXf is Eigen's type for storing a float array. The load function below reads the model files, and there are a lot of them!

      Looking at the file structure shown above, the data belonging to the pfnn model mainly consists of the network model and the character.

    • Loading the game world

      The load_world functions mainly seem to set up terrain markers, so it appears the program needs these terrain markers in order to run.

  • The Game Loop
    • Input handling

      Currently only a gamepad is supported. SDL provides a cross-platform input module; details omitted, see the figure below.

      In fact not all of the interaction lives here: many of the main controls are written directly in the rendering part, though they all go through the SDL interface.

    • Rendering

      It consists of three parts: pre-processing, render processing and post-processing; we will look at each in turn.
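
As promised above, here is a minimal sketch of that forward-kinematics step: each joint's world transform is its parent's world transform times its own local transform, with joints visited so that parents come before children. glm types are used for brevity; treat this as an illustration rather than the demo's exact code.

    #include <glm/glm.hpp>
    #include <cstddef>
    #include <vector>

    // parents[i] is the index of joint i's parent, or -1 for the root.
    // Joints are assumed ordered so that a parent always precedes its children.
    void forward_kinematics(const std::vector<int>& parents,
                            const std::vector<glm::mat4>& local_xform,
                            std::vector<glm::mat4>& world_xform)
    {
        world_xform.resize(local_xform.size());
        for (std::size_t i = 0; i < local_xform.size(); ++i) {
            world_xform[i] = (parents[i] < 0)
                ? local_xform[i]                              // root joint
                : world_xform[parents[i]] * local_xform[i];   // child follows parent
        }
    }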

 
 

Pre-processing

  • Update the camera (driven directly by input)

    The right stick controls camera rotation and L/R zoom in/out; these act directly on the camera parameters.

  • Update the target direction and velocity (driven directly by input)

    This also responds directly to the input: the input determines the user's desired target direction and velocity.

  • Update the gait (algorithm data pre-processing, step 1)

    The current trajectory parameters are derived from the previous frame's trajectory parameters and the options.

  • Predict the future trajectory (algorithm data pre-processing, step 2)

    The trajectory parameters from the previous step and the character parameters are blended to obtain the trajectory_positions_blend object.

  • Collision handling (algorithm data pre-processing, step 3)

    The trajectory_positions_blend values are adjusted according to the walls in areas.

    There is one more step here that writes the trajectory_positions_blend values back into trajectory.

  • Jumping (algorithm data pre-processing, step 4)

    The trajectory values are adjusted according to the jump areas in areas.

  • Crouch areas (algorithm data pre-processing, step 5)

    The trajectory values are adjusted according to crouch_pos in areas.

  • Walls (algorithm data pre-processing, step 6)

    The trajectory values are adjusted directly according to the walls in areas.

  • Trajectory rotation (algorithm data pre-processing, step 7)

    The trajectory->rotations values are adjusted.

  • Trajectory height (algorithm data pre-processing, step 8)

    The trajectory values are adjusted according to the heightmap.

  • Input trajectory positions and directions (PFNN input, step 1)

    Trajectory data fills pfnn->Xp.

  • Input trajectory gaits (PFNN input, step 2)

    Trajectory data fills pfnn->Xp.

  • Input current joint positions, velocities and rotations (PFNN input, step 3)

    Trajectory data fills pfnn->Xp.

  • Input trajectory heights (PFNN input, step 4)

    Trajectory data fills pfnn->Xp.

  • Perform Regression 【the core step: the model's predict; see the sketch after this list】

    The pfnn inputs were set up above; what still has to be passed here is the argument of the predict function, which is character->phase.

  • Timing: this step just measures the predict time, for debugging.
  • Build Local Transform (PFNN output)

    This step uses the pfnn output to obtain each joint's position/velocity/rotation.

    One more thing is needed here: the joint data obtained above is in world space and must be converted to local space.

  • IK processing

    This step applies the joint data obtained above, joint by joint, to the character's IK joints.
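
To tie the PFNN-input steps together, here is a rough sketch of how the pre-processing feeds the network. The names follow these notes rather than the exact demo code, Xp is assumed to be pre-sized to the full input dimension, and the body of predict is omitted.

    #include <Eigen/Dense>

    struct PFNN {
        Eigen::ArrayXf Xp, Yp;        // network input / output
        void predict(float phase);    // evaluates the phase-functioned network (omitted)
    };

    void fill_and_run(PFNN& pfnn,
                      const Eigen::ArrayXf& traj_positions_directions,  // input step 1
                      const Eigen::ArrayXf& traj_gaits,                 // input step 2
                      const Eigen::ArrayXf& joint_positions_velocities, // input step 3
                      const Eigen::ArrayXf& traj_heights,               // input step 4
                      float phase)
    {
        int offset = 0;
        auto pack = [&](const Eigen::ArrayXf& a) {
            pfnn.Xp.segment(offset, a.size()) = a;   // Xp must already be sized
            offset += static_cast<int>(a.size());
        };
        pack(traj_positions_directions);
        pack(traj_gaits);
        pack(joint_positions_velocities);
        pack(traj_heights);

        pfnn.predict(phase);   // "Perform Regression": Yp now holds the next-frame joint data
    }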

 
 

Render processing

  • Render Shadow
  • Render Terrain
  • Render Character
  • Render the Rest
  • Render Crouch Area
  • Render Jump Areas
  • Render Walls
  • Render Trajectory
  • Render Joints
  • UI Elements
  • PFNN Visual
  • Display UI

These are all plain OpenGL calls, unrelated to how the AI data is used, so I will not go into them.

 
 

Post-processing

  • Update Past Trajectory

    Trajectory data is shifted forward.

  • Update Current Trajectory

    Trajectory values are recomputed.

  • Collide with walls

    Trajectory collision update.

  • Update Future Trajectory

    The trajectory is updated from the pfnn output.

  • Update Phase (see the sketch after this list)

  • Update Camera
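
A tiny sketch of the Update Phase step: the network outputs a phase increment for the frame, which is damped while the character is idle and wrapped back into 0-2π. The parameter names and the exact damping weights here are assumptions, not lifted from the demo.

    #include <cmath>

    // phase and the return value are in [0, 2*pi). predicted_dphase is the phase
    // increment produced by the network for this frame; move_amount in [0, 1] is
    // assumed to be 1 while moving and 0 while idle, damping the increment.
    float update_phase(float phase, float move_amount, float predicted_dphase)
    {
        const float two_pi = 6.2831853f;
        float dphase = (0.1f + 0.9f * move_amount) * predicted_dphase;
        return std::fmod(phase + dphase, two_pi);
    }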

The AI4Animation project

Despite many attempts, I could not get the demos in this project to open cleanly without errors and actually run.

So, building on the AIAnimation analysis, let's first figure out how this project is meant to be used.

 

The focus is on the character, so let's look at how the character is set up:

Above are the setups used by the two demos; as you can see, each has just one important C# file, so we compare and analyze them.

Original corresponds to the SIGGRAPH 17 work and Adam to the SIGGRAPH 18 work; we look at 17 first.

 

First, the overall structure:

The second class inherits from Editor; its job is to build the hierarchical Animation menu in the editor, while the remaining three parts are each handled by a separate class.

These three objects also form the three sub-tab menu items shown in the figure above.


  • The NeuralNetwork class does only one thing: it lets the user choose the NN model. In other words, this class handles the UI presentation and logic and nothing else; the information a NN actually needs lives in the Model class. The figure below shows the data structures stored in the model:

    Next, the interface functions:

    These interfaces exist so that multiple NN methods can be supported and extended.

    The rest are Tensor-related functions. Tensor is a wrapper around Eigen data, so the actual computation is implemented by Eigen; a set of operations tied to these data structures is provided here as well.

    Finally, the important remaining part of the model is Parameters; on the Unity side this mostly amounts to loading, reading and saving them.

  • The Controller class handles input, mainly WSADQE. Another important member is the Styles array, which apparently records the state weights.
  • Character: this drives the skeleton.

The Trajectory class, the core intermediate data, is just an array of data points plus operations on the array and on individual points. A single point carries a lot of data, including various transforms and state:

 
 

All of the core usage is in the Update function; the approach should be exactly the same as in AIAnimation, so we can compare the two:

  • Everything below runs only if a NN model is present.
  • Update Target Direction / Velocity

    What happens here:

    TargetDirection = TargetDirection blended with the current direction defined by the Trajectory, using the TargetBlending weight.

    TargetVelocity = TargetVelocity blended with the velocity from the Controller input, using the TargetBlending weight.

  • Update Gait

    Each Style value of Trajectory.Points[RootPointIndex] = the current value blended with whether the user has selected that style, using the GaitTransition weight.

  • Predict Future Trajectory

    Predicted positions = interpolation between continuing the step from the previous position to the current one, and the TargetPosition.

    Predicted style = the current style carried forward.

    Predicted direction = interpolation between the current direction and the TargetDirection.

  • Avoid Collisions

    Make sure the new positions are valid, i.e. collisions are taken into account.

  • Input Trajectory Positions / Directions

    Feed NN.Model with each Trajectory point's position and direction (x and z components).

  • Input Trajectory Gaits

    Feed NN.Model with each Trajectory point's Style array.

  • Input Previous Bone Positions / Velocities

    Feed NN.Model with each joint's position and velocity.

  • Input Trajectory Heights

    Feed NN.Model with each Trajectory point's height (y component).

  • Predict 【run the model】

  • Update Past Trajectory (points with i < RootPointIndex)

    Update each point Trajectory.Points[i]: position i takes the value of position i+1 (i.e. shift everything one point forward).

  • Update Current Trajectory (the point at RootPointIndex)

    Build a new Trajectory.Points[RootPointIndex] from the NN output, setting its position and direction.

  • Update Future Trajectory (points with RootPointIndex+1 < i < Trajectory.Points.Length)

    Each point's new position = its old position + an interpolation between the current direction and the distance/direction blended from the model output (several influences are interpolated here).

  • Avoid Collisions

    Same approach as step 5 above.

  • Compute Posture (see the sketch after this list)

    The positions and rotations arrays store each joint's transform;

    each positions[i] = 0.5 * the NN result + 0.5 * the position reached by advancing the previous position along the previous velocity;

    each Velocities[i] = the NN result.

  • Update Posture

    Each joint's position and rotation is taken directly from the corresponding entries of the arrays from the previous step.

  • Map to Character

    The transforms are applied to the character.
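
Finally, the Compute Posture blend from the list above, sketched in C++ for consistency with the AIAnimation notes even though this project is Unity/C#; the 50/50 weighting follows the description here.

    #include <Eigen/Dense>
    #include <cstddef>
    #include <vector>

    // Blend the network's predicted joint positions with a simple extrapolation of
    // the previous frame (previous position advanced by previous velocity), and
    // take the velocities from the network directly.
    void compute_posture(const std::vector<Eigen::Vector3f>& nn_positions,
                         const std::vector<Eigen::Vector3f>& nn_velocities,
                         std::vector<Eigen::Vector3f>& positions,   // in: last frame, out: new
                         std::vector<Eigen::Vector3f>& velocities)  // in: last frame, out: new
    {
        for (std::size_t i = 0; i < nn_positions.size(); ++i) {
            Eigen::Vector3f extrapolated = positions[i] + velocities[i];
            positions[i]  = 0.5f * nn_positions[i] + 0.5f * extrapolated;
            velocities[i] = nn_velocities[i];
        }
    }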