Category: Advanced Game Tech

Terrain Raymarching


Around 2002 I made my first attempts to visualize fractal mountains and test procedural texturing. At the time I was not able to raytrace polygons and my rasterization engine was really bad, so I chose to write a “raymarcher” to create some pictures, because it’s a really simple technique. What I implemented was quite brute force and very slow, but it was very simple to code, something I could afford, and it allowed me to make some nice images, with reflections and some participating media too.【When the author experimented with terrain rendering he found performance to be a big problem, which motivates the content below.】


That’s in fact the beauty of the technique – an incredibly simple piece of code that produces interesting images. It’s perfect for small size demos, which is why it has been used so much (think of Hymalaya/TBC or Ixaleno/rgba – the image on the right of this text, which is created from a 4 kilobyte executable). In fact you can code a basic prototype in a few minutes (Ixaleno was pretty much done in one day, coded from scratch during 2007 Christmas; then of course I had to spend some extra days tuning the colors etc., but that’s another story).【This is a very simple technique; you can get the code working in a few minutes.】


The basic idea is to have a height function y = f(x,z) that defines, for each 2D point (x,z) in the plane, the height of the terrain at that point.【The technique works on heightmap terrain; this is the heightmap’s defining function.】



The purpose of this article is not to design such a function so that one gets an interesting landscape (see articles like the one on advanced value noise), but to show how to render it with the simple raymarching algorithm.【We don’t discuss generating interesting terrain here (see the linked article); we discuss how to render it.】


Once you have such a function f(x,z), the goal is to use a raytracing setup to render the image and do other effects such as shadows or reflections. That means that for a given ray with some starting point in space (like the camera position) and a direction (like the view direction) we want to compute the intersection of the ray and the terrain f. The simplest approach is to slowly advance along the ray in small steps, and at each sampling point determine if we are above or below the terrain level. The image below shows the process.【For heightmap terrain, rendering and lighting both reduce to computing how a ray, starting from the camera or a light position, intersects the terrain, as in the figure below.】


We start at some point on the ray close to the camera (the leftmost blue point in the image). We evaluate the terrain function f at the current x and z coordinates to get the altitude of the terrain h: h = f(x,z). Now we compare the altitude of the blue sampling point y with h, and we see that y > h, or in other words, that the blue point is above the mountains. So, we step to the next blue point on the ray, and we repeat the process. At some point, perhaps, one of the sampling points will fall below the terrain, like the yellow point in the image. When that happens, y < h, and we know our ray crossed the terrain surface. We can just stop here and mark the current yellow point as the intersection point (even if we know it’s slightly further than the real intersection point), or take the last blue point as the intersection point (slightly closer than the real intersection), or the average of the last blue and yellow points.【We take a point (x, y, z) on the ray, look up the height h = f(x, z), and compare y with h. If y > h the point is above ground, otherwise below; the spot where it switches from above to below is the intersection.】
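The marching loop described above is missing from these notes; it might be sketched as follows. This is a hypothetical C++ reconstruction, not the article’s original code: the height function f() is a stand-in, and the mint/maxt/delt constants (discussed below) are illustrative values.

```cpp
#include <cmath>

// Stand-in terrain height function; a real one would be fractal noise.
float f(float x, float z)
{
    return 0.5f * std::sin(x) + 0.5f * std::cos(z);
}

struct Vec3 { float x, y, z; };

// March along the ray p = ro + t*rd in fixed steps of delt. Return true and
// the hit distance in resT when a sample point falls below the terrain.
bool castRay(const Vec3& ro, const Vec3& rd, float& resT)
{
    const float mint = 0.001f;  // where the ray starts
    const float maxt = 10.0f;   // visibility distance
    const float delt = 0.01f;   // step size
    for (float t = mint; t < maxt; t += delt)
    {
        const Vec3 p = { ro.x + rd.x * t, ro.y + rd.y * t, ro.z + rd.z * t };
        if (p.y < f(p.x, p.z))
        {
            // Report the midpoint between the last two samples (the average
            // of the last blue point and the yellow point).
            resT = t - 0.5f * delt;
            return true;
        }
    }
    return false;  // no terrain hit: the ray goes to the sky
}
```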





The mint, maxt and delt constants should be adapted for every scene. The first one is the distance to the near clipping plane; you can set it to zero. The second one is the maximum distance the ray is allowed to travel, i.e., the visibility distance. The third one is the step size, and it directly influences the rendering speed and the quality of the image: the bigger it is, the faster the render, but the lower the terrain sampling quality will be.【delt, mint and maxt must be tuned per scene; mint is where the ray starts (near clipping distance), maxt is where it ends (visibility distance), and delt is the step size.】


As you can see the code is terribly simple. There are many optimizations and improvements possible of course. For example, the intersection can be located more accurately by doing a linear approximation of the terrain altitudes between the blue and yellow points and computing the analytical intersection between the ray and the idealized terrain.【Many optimizations are possible; for example, interpolating between the blue and yellow points improves the precision of the intersection.】


The other optimization is to note that as the marching moves the potential intersection point further and further away (the bigger t becomes), the less important the error becomes, as geometric details get smaller in screen space the further they are from the camera. In fact, details decay inverse-linearly with the distance, so we can make our error threshold or step size delt linear in the distance. This saves a lot of rendering time and gives more uniform artifacts than the naive approach. This trick is described in several places, for example in the “Texturing And Modeling – A Procedural Approach” book.【Another optimization: the further the potential intersection is, the less the error matters, because distant details occupy less screen space. So we can let the error (step size) grow linearly with distance.】
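On top of the same loop, the distance-proportional step might look like this (again a hypothetical sketch; terrainH() and the constants are illustrative):

```cpp
#include <cmath>

// Stand-in terrain height function.
float terrainH(float x, float z)
{
    return 0.5f * std::cos(0.5f * x) + 0.5f * std::sin(0.5f * z);
}

// Same marching idea, but the step grows linearly with the distance t, so
// far-away terrain is sampled more coarsely where the error matters less.
bool castRayAdaptive(float ox, float oy, float oz,
                     float dx, float dy, float dz, float& resT)
{
    const float mint = 0.1f;
    const float maxt = 100.0f;
    for (float t = mint; t < maxt; t += 0.01f * t)  // step proportional to t
    {
        if (oy + dy * t < terrainH(ox + dx * t, oz + dz * t))
        {
            resT = t;
            return true;
        }
    }
    return false;
}
```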



So the complete algorithm to build an image is simple. For every pixel of the screen, construct a ray that starts at the camera position and passes through the pixel location as if the screen were right in front of the viewer, and cast the ray. Once the intersection point is found, the color of the terrain must be gathered, then shaded, and a color must be returned. That’s what the terrainColor() function must do. If there is no intersection with the terrain, then the right color must be computed for the sky. So, the main code looks like this:【So the complete image-building algorithm is: for every pixel, cast a ray, find the terrain intersection, and fill the pixel with the shaded terrain color (or the sky color on a miss).】
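The main loop referred to here did not survive in these notes; a minimal C++ sketch might look like the following, with placeholder castRay(), terrainColor() and skyColor() implementations. The pinhole-style ray construction is an assumption for illustration, and colors are stored as Vec3 with x = red, y = green, z = blue.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// --- placeholders standing in for the real terrain renderer ---
float f(float x, float z) { return 0.0f; }                // flat terrain at y = 0

bool castRay(const Vec3& ro, const Vec3& rd, float& resT) // fixed-step march
{
    for (float t = 0.001f; t < 20.0f; t += 0.01f)
        if (ro.y + rd.y * t < f(ro.x + rd.x * t, ro.z + rd.z * t))
        {
            resT = t;
            return true;
        }
    return false;
}

Vec3 terrainColor(const Vec3&, const Vec3&, float) { return { 0.4f, 0.3f, 0.2f }; }
Vec3 skyColor()                                    { return { 0.5f, 0.7f, 0.9f }; }

// For every pixel, build a ray from the camera through the pixel and shade
// either the terrain hit point or the sky.
std::vector<Vec3> renderImage(int width, int height, const Vec3& camPos)
{
    std::vector<Vec3> image(width * height);
    for (int j = 0; j < height; ++j)
        for (int i = 0; i < width; ++i)
        {
            // Ray through the pixel; screen plane one unit in front of the
            // camera, which looks down the -z axis.
            float u = (2.0f * i - width) / height;
            float v = (2.0f * j - height) / height;
            float len = std::sqrt(u * u + v * v + 1.0f);
            Vec3 rd = { u / len, -v / len, -1.0f / len };
            float t = 0.0f;
            image[j * width + i] = castRay(camPos, rd, t)
                                 ? terrainColor(camPos, rd, t) : skyColor();
        }
    return image;
}
```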




Usually terrainColor() will need to compute first the intersection point p, then compute the normal to the surface n, do some lighting/shading s based on that normal (even cast a shadow ray by reusing the castRay() function), decide the color of the terrain surface m at the intersection point, combine the shading information with it, and probably do some fog calculations. Something like this:【The work terrainColor() has to do is shown in the following code.】
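The shading code itself is missing from these notes; here is a hypothetical sketch of terrainColor() along the lines described above. The terrain function, light direction, material color and fog constants are all illustrative assumptions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b)  { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 operator*(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }

// Stand-in terrain height function.
float f(float x, float z) { return 0.1f * std::sin(3.0f * x) * std::cos(3.0f * z); }

// Surface normal from central differences of the height function.
Vec3 getNormal(const Vec3& p)
{
    const float eps = 0.001f;
    Vec3 n = { f(p.x - eps, p.z) - f(p.x + eps, p.z),
               2.0f * eps,
               f(p.x, p.z - eps) - f(p.x, p.z + eps) };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return n * (1.0f / len);
}

// Compute intersection point p, normal n, diffuse shading s, material m,
// then blend toward a fog color with distance.
Vec3 terrainColor(const Vec3& ro, const Vec3& rd, float t)
{
    const Vec3 p = ro + rd * t;                       // intersection point
    const Vec3 n = getNormal(p);                      // surface normal
    const Vec3 sunDir = { 0.577f, 0.577f, 0.577f };   // assumed light direction
    float s = std::fmax(0.0f, n.x * sunDir.x + n.y * sunDir.y + n.z * sunDir.z);
    const Vec3 m = { 0.4f, 0.3f, 0.2f };              // terrain surface color
    Vec3 col = m * s;
    float fog = 1.0f - std::exp(-0.05f * t);          // exponential fog
    const Vec3 fogCol = { 0.6f, 0.7f, 0.8f };
    return col * (1.0f - fog) + fogCol * fog;
}
```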














Software Rasterization – a practical implementation

Improving the Rasterization Algorithm


All the techniques we presented in the previous chapters are really the foundation of the rasterization algorithm, though we have only implemented them in a very basic way. The GPU rendering pipeline and other rasterization-based production renderers use the same concepts, but with highly optimized versions of these algorithms. Presenting all the different tricks that are used to speed up the algorithm goes way beyond the scope of an introduction. We will just review some of them quickly now, but we plan to devote a lesson to this topic in the future.【Everything in the previous chapters is the foundation of rasterization, implemented in its most basic form. GPU-based implementations share the same concepts, only with heavy optimizations.】


  • Aliasing and Anti-Aliasing


    First, let’s consider one basic problem with 3D rendering. If you zoom in on the image of the triangle that we rendered in the previous chapter, you will notice that the edges of the triangle are not regular (in fact, this is not specific to the edges of the triangle; you can also see that the checkerboard pattern is irregular at the edges of the squares). The steps that you can easily see in figure 1 are called jaggies. These jagged or stair-stepped edges (depending on how you prefer to call them) are not really an artifact; they are just the result of the fact that the triangle is broken down into pixels.【If you zoom in on the triangle image, the edges are no longer continuous; these are jaggies, and they exist because the triangle is rasterized into discrete pixels.】



    What we do with the rasterization process is break down a continuous surface (the triangle) into discrete elements (the pixels), a process that we already mentioned in the introduction to rendering. The problem is similar to trying to represent a continuous curve or surface with Lego bricks: you simply can’t, you will always see the bricks (figure 2). The solution to this problem in rendering is called anti-aliasing (also denoted AA). Rather than rendering only 1 sample per pixel (checking if the pixel overlaps the triangle by testing if the point at the center of the pixel covers the triangle), we split the pixel into sub-pixels and repeat the coverage test for each sub-pixel. Of course each sub-pixel is nothing else than another brick, so this doesn’t solve the problem entirely; nonetheless it allows us to capture the edges of objects with slightly more precision. Pixels are most of the time divided into an N by N grid of sub-pixels where N is generally a power of 2 (2, 4, 8, etc.), though it can technically take any value greater than or equal to 1. There are in fact different ways of addressing this aliasing issue; the method we described belongs to the category of sampling-based anti-aliasing methods.【Rasterization breaks a continuous surface into discrete pixels; anti-aliasing reduces the resulting artifacts, and there are many concrete methods for it.】







    The pixel’s final color is computed as the sum of all the sub-pixel colors divided by the total number of sub-pixels. Let’s take an example (as with pixels, if a sub-pixel or sample covers a triangle, it takes on the color of that triangle, otherwise it takes on the background color, which is usually black). Imagine that the triangle is white. If only 2 of the 4 samples overlap the triangle, the final pixel color will be equal to (0+0+1+1)/4 = 0.5. The pixel won’t be completely white, but it won’t be completely black either. Thus rather than having a “binary” transition between the edge of the triangle and the background, that transition is more gradual, which has the effect of visually reducing the stair-stepped edge artifact. This is what we call anti-aliasing. To understand anti-aliasing fully you need to study signal processing theory, which again is a very large and pretty complex topic on its own.【The pixel’s final color is the average of all sub-pixel colors, where each sub-pixel takes the color of whatever it covers; this averaging is the anti-aliasing.】
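As a toy illustration of that averaging (a hypothetical coverage test: a half-plane at x = 0.5 stands in for a triangle edge, white triangle on black background):

```cpp
// Average n×n sub-pixel samples to get the final pixel value. Each sample
// takes the triangle color (1.0) if it is inside, else the background (0.0).
float pixelColor(float px, float py, int n)
{
    float sum = 0.0f;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
        {
            // Sample at the center of each sub-pixel.
            float sx = px + (i + 0.5f) / n;
            float sy = py + (j + 0.5f) / n;
            (void)sy;                           // the edge only depends on x here
            sum += (sx >= 0.5f) ? 1.0f : 0.0f;  // coverage test
        }
    return sum / (n * n);                       // average of the sub-samples
}
```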


    The reason why it is best to choose N as a power of 2 is that most processors these days can run several instructions in parallel, and the number of instructions run in parallel is also generally a power of 2. You can look on the Web for things such as SSE instruction sets, which are specific to CPUs, but GPUs use the same concept. SSE is a feature available on most modern CPUs that can be used to run generally 4 or 8 floating-point calculations at the same time (in one cycle). All this means is that, for the price of 1 floating-point operation, you actually get 3 or 7 for free. This can in theory speed up your rendering time by a factor of 4 or 8 (you can never quite reach that level of performance because you need to pay a small penalty for setting these instructions up). You can use SSE instructions for example to render 2×2 sub-pixels for the cost of computing 1 pixel, and as a result you get smoother edges (the stair-stepped edges are less visible).【The number of sub-pixels is best chosen as a power of 2, to match SIMD widths.】



  • Rendering Blocks of Pixels


    Another common technique to accelerate rasterization is to render blocks of pixels, but rather than testing all pixels contained in the block, we first test the pixels at the corners of the block. GPU algorithms can use blocks of 8×8 pixels (in fact, the technique that is actually used is more elaborate and is based on a concept of tiles, but we won’t detail it here). If all four corners of that 8×8 grid cover the triangle, then necessarily the other pixels of the grid cover the triangle as well (as shown in figure 7). In that case, there is no need to test all the other pixels, which obviously saves a lot of time; they can just be filled with the triangle’s colors. If vertex attributes need to be interpolated across the pixel block, this is also straightforward: once you have computed them at the block’s corners, all you need to do is interpolate them linearly in both directions (horizontally and vertically). This optimisation only works when triangles are close to the screen and thus large in screen space; small triangles don’t benefit from it.【One way to accelerate rasterization: group pixels into blocks, test the block’s four corners against the triangle first, and only refine when the corners disagree. This only helps triangles that are large in screen space.】



  • Optimizing the Edge Function


    The edge function too can be optimized. Let’s have a look at the function implementation again:【The edge function implementation, shown below, can also be optimized.】
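The implementation in question did not survive in these notes; it can be reconstructed along these lines (a common form of the edge function, consistent with the 2 multiplications and 5 subtractions counted later in the text):

```cpp
struct Vec2 { float x, y; };

// Positive if c lies on one side of the edge (a, b), negative on the other;
// the sign is what the coverage test uses.
float edgeFunction(const Vec2& a, const Vec2& b, const Vec2& c)
{
    return (c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x);
}
```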



    Recall that a and b in this function are triangle vertices and that c is the pixel coordinates (in raster space). One interesting thing to note is that this function is going to be called for each pixel contained within the bounding box of the triangle, yet while we iterate over multiple pixels only c changes; the variables a and b stay the same. Suppose we evaluate the equation one time and get a result w0:【a and b are triangle vertices, c is the pixel coordinate; the function runs for every pixel in the bounding box, so we can incrementally update the previous w0 instead of re-evaluating. Suppose the current w0 is:】
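The evaluation that belongs here, reconstructed in the edge function’s usual form:

w0 = (c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x)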



    Now consider c changing by s (the per-pixel step); the new w0 is:
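The updated value can be reconstructed as follows: if c.x increases by s while a and b stay fixed, expanding the products shows that

w0_new = w0 + s * (b.y - a.y)

(and similarly w0_new = w0 - s * (b.x - a.x) when c.y steps by s). Since s * (b.y - a.y) is constant for a given triangle, it can be computed once, and each subsequent pixel then costs a single addition.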









    The edge function uses 2 multiplications and 5 subtractions, but with this trick it can be reduced to a simple addition (of course you need to compute a few initial values first). This technique is well documented on the internet. We won’t be using it in this lesson, but we will study it in more detail and implement it in another lesson devoted to advanced rasterization techniques.【The result: 2 multiplications and 5 subtractions collapse into a single addition per pixel, a large performance win.】



    Fixed Point Coordinates


    Finally, and to conclude this section, we will briefly talk about the technique that consists of converting vertex coordinates, initially defined in floating-point format, to fixed-point format just before the rasterization stage. Fixed-point is the fancy word (in fact the correct technical term) for integer. When vertex coordinates are converted from NDC to raster space, they are thus also converted from floating-point numbers to fixed-point numbers. Why do we do that? There is no easy and quick answer to this question, but in short, GPUs use fixed-point arithmetic because, from a computing point of view, manipulating integers is easier and faster than manipulating floats or doubles (it only requires logical bit operations). Again this is just a very generic explanation. The conversion from floating-point to integer coordinates, and how the rasterization process is implemented using integer coordinates, is a large and complex topic which is barely documented on the Internet (you will find very little information about it, which is strange considering that this very process is central to the way modern GPUs work).【Fixed-point replaces floating-point because integer arithmetic is fast, but the implementation is complex.】



    The conversion step involves rounding off the vertex coordinates to the nearest integer. Though if you only do that, you sort of snap the vertex coordinates to the nearest pixel corner coordinates. This is not so much an issue when you render a still image, but it creates visual artefacts with animation (vertices are snapped to different pixels from frame to frame).【The conversion rounds vertex coordinates to the nearest integer. For a still image this is fine, but across the frames of an animation it produces visual artifacts, as vertices snap to different pixels from frame to frame.】

    The workaround is to convert the number to the smallest integer value while also reserving some bits to encode the sub-pixel position of the vertex (the fractional part of the vertex position). GPUs typically use 4 bits to encode sub-pixel precision (you can search graphics API documentation for the term sub-pixel precision). In other words, on a 32-bit integer, 1 bit might be used to encode the number’s sign, 27 bits to encode the vertex’s integer position, and 4 bits to encode the fractional position of the vertex within the pixel. This means that the vertex position is “snapped” to the closest corner of a 16×16 sub-pixel grid, as shown in figure 8 (with 4 bits, you can represent any integer in the range [0,15]).

    【Summary: convert to an integer but reserve 4 bits for the vertex’s sub-pixel position; with 1 sign bit, 27 integer bits and 4 fractional bits in a 32-bit integer, the vertex snaps to the nearest corner of a 16×16 sub-pixel grid (figure 8).】

    The vertex position is still snapped to some grid corner, but snapping in this case is less of a problem than when the vertex is snapped to the pixel coordinates. This conversion process leads to many other issues, one of which is integer overflow (overflow occurs when the result of an arithmetic operation produces a number greater than what can be encoded with the available bits). This may happen because integers cover a smaller range of values than floats. Things also get complex when anti-aliasing is thrown into the mix. It would take a lesson of its own to explore the topic in detail.【The vertex is still snapped to a grid corner, but the error is smaller than snapping to pixel coordinates. The conversion raises other issues such as integer overflow, and interacts with anti-aliasing; a full treatment needs its own lesson.】


    Fixed-point coordinates allow us to speed up the rasterization process and the edge function even more. This is one of the reasons for converting vertex coordinates to integers. These optimisation techniques will be presented in another lesson.【Using fixed-point coordinates also speeds up rasterization itself.】



    Notes Regarding our Implementation of the Rasterization Algorithm



    Finally, we are going to quickly review the code provided in the source code chapter. Here is a description of its main components:


    • We will use the function computeScreenCoordinates to compute the screen coordinates using the method detailed in the lesson devoted to the pinhole camera model. This is essentially needed to be sure that our render output can be compared to a render in Maya, which also uses a physically-based camera model.



































Software Rasterization – perspective correct interpolation and vertex attributes

Interpolating Vertex Attributes


All we need to get a basic rasterizer working is to know how to project triangles onto the screen, convert the projected coordinates to raster space, then rasterize triangles, and potentially use a depth-buffer to solve the visibility problem. That would already be enough to create images of 3D scenes that are both perspective correct and in which the visibility problem is solved (objects that are hidden by others do not appear in front of the objects that are supposed to occlude them). This is already a good result. The code we have presented to do this is functional, but could be optimized greatly; however, optimizing the rasterization algorithm is not something we will look into in this lesson.【Up to this chapter you have everything needed to create images; the code works, though optimizations remain possible.】


Perspective Correct Vertex Attribute Interpolation


So, what are we going to talk about in this chapter? In the chapter devoted to rasterization, we talked about using barycentric coordinates to interpolate vertex data or vertex attributes (which is the most common name). We can use barycentric coordinates to interpolate depth values of course, and this is one of their main functions, but they can also be used to interpolate vertex attributes; vertex attributes play a very important role in rendering, especially when it comes to lighting and shading. We will provide more information on how vertex attributes are used in shading when we get to shading in this section, but you don’t need to know anything about shading to understand the concept of perspective correct interpolation of vertex attributes.【What does this chapter cover? Using barycentric coordinates to interpolate vertex data and attributes; attributes are essential for lighting and shading.】


As you already know, we need to store the z-coordinates of the original vertices (camera-space vertices) in the z-coordinates of our projected vertices (screen-space vertices). This is required so that we can compute the depth of points lying across the surface of the projected triangle. Depth, as explained in the last chapter, is required to solve the visibility problem and is computed by linearly interpolating the reciprocals of the triangle vertices’ z-coordinates using barycentric coordinates. The exact same technique can be used to interpolate any other variable we want, as long as the value of this variable is defined at the triangle’s vertices, similarly to the way we stored the vertices’ original z-coordinates in the projected points’ z-coordinates. For example, it is very common to store a color at the triangle vertices. The other two most common variables or attributes (which is the proper term in CG) stored at the triangle vertices are texture coordinates and normals. Texture coordinates are 2D coordinates used for texturing (a technique we will study in this section). Normals are used in shading and define the orientation of the surface (check the lessons on shading to learn more about normals and smooth shading in particular). In this lesson we will more specifically use colors and texture coordinates to illustrate the problem of perspective correct interpolation.【We will also discuss interpolating colors and texture coordinates.】


As mentioned in the chapter on the rasterization stage, we can specify colors or anything else we want for the triangle vertices. These attributes can be interpolated using barycentric coordinates to find what their value should be for any point inside the triangle. In other words, vertex attributes must be interpolated across the surface of a triangle when it is rasterized. The process is as follows:【If we use barycentric coordinates to interpolate vertex attributes, the steps are as follows.】


  • You can assign as many vertex attributes to the triangle’s vertices as you want. They are defined on the original 3D triangle (in camera space). In our example, we will assign two vertex attributes, one for color and one for texture coordinates.【Gather the attributes of the triangle’s vertices in camera space.】
  • The triangle is projected onto the screen (the triangle’s vertices are converted from camera space to raster space).【Project the triangle to the screen.】
  • While in screen space, the triangle is “rasterized”. If a pixel overlaps the triangle, the barycentric coordinates of that pixel are computed.【Rasterize the triangle in screen space; for each covered pixel, compute its barycentric coordinates.】
  • Colors (or texture coordinates) defined at the triangle’s corners or vertices are interpolated using the previously computed barycentric coordinates, with the following formula:【Interpolate the attributes using the computed barycentric coordinates.】

    Where λ0, λ1 and λ2 are the pixel’s barycentric coordinates, and C0, C1 and C2 are the colors of the triangle’s vertices. The result CP is assigned to the current pixel in the frame-buffer. The same can be done to compute the texture coordinates of the point on the triangle that the pixel overlaps.【Texture coordinates are interpolated with barycentric coordinates in the same way.】

    These coordinates are used for texturing (check the lesson on Texture Mapping in this section if you wish to learn about texture coordinates and texturing).
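The interpolation formula referred to in the last step above, reconstructed from the surrounding definitions:

CP = λ0 * C0 + λ1 * C1 + λ2 * C2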



This technique, though, just doesn’t work.【This method doesn’t work as-is.】


To understand why, let’s see what happens to a point located in the middle of a 3D quad. As you can see in the top view of figure 2, we clearly have a quad, and point P is clearly in the middle of that quad (P is located at the intersection of the quad’s diagonals). Though, when we look at this quad from a random viewpoint, it is easy to see that depending on the quad’s orientation with respect to the camera, P no longer appears to be in the center of the quad. This is due to perspective projection, which as mentioned before preserves lines but doesn’t preserve distances. Remember, though, that barycentric coordinates are computed in screen space.【Take a quad with P at its center: because of perspective projection, the on-screen P is not at the center of the projected quad, even though barycentric coordinates are computed in that screen space.】


Imagine that the quad is made of two triangles. In 3D space, P is located at equal distance between V1 and V2, thus its barycentric coordinates in 3D space are (0, 0.5, 0.5). Though in screen space, since P is closer to V1 than it is to V2, λ1 is greater than λ2 (and λ0 is equal to 0). The problem is that these are the coordinates that are used to interpolate the triangle’s vertex attributes. If V1 is white and V2 is black, then the color at P should be 0.5. But if λ1 is greater than λ2, then we will get a value greater than 0.5. Clearly, the technique we are using to interpolate vertex attributes doesn’t work. Let’s assume, as in figure 1, that λ1 and λ2 are equal to 0.666 and 0.334 respectively. If we interpolate the triangle’s vertex colors, we get:【Putting numbers on the previous example gives the interpolation below.】
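The computation that belongs here, reconstructed with those values (C1 = 1 for the white vertex V1, C2 = 0 for the black vertex V2):

CP = λ1 * C1 + λ2 * C2 = 0.666 * 1 + 0.334 * 0 = 0.666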



We get 0.666 for the color of P and not 0.5 as we should. There is a problem, and this problem relates to some extent to what we learned in the previous chapter regarding the interpolation of the vertices’ z-coordinates.【The result is wrong.】




Hopefully, finding the right solution is not hard. Let’s imagine that we have a triangle with two z-coordinates Z0 and Z1 on each side of the triangle, as shown in figure 3. If we connect these two points, we can interpolate the z-coordinate of a point on this line using linear interpolation. We can do the same thing with two values of a vertex attribute, C0 and C1, defined at the same positions on the triangle as Z0 and Z1 respectively. Technically, because both Z and C are computed using linear interpolation, we can write the following equality (equation 1):【Finding the right answer is not hard: pick two points 0 and 1 on the triangle and interpolate both C and Z between them.】
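Equation 1 is missing from these notes; reconstructed from the description (both Z and C vary linearly between the two points, so corresponding points satisfy):

(Z - Z0) / (Z1 - Z0) = (C - C0) / (C1 - C0)    (equation 1)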


We also know from the last chapter that (equation 2):【From the previous chapter we know the equation for Z.】
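Equation 2, reconstructed (q is the interpolation parameter obtained in screen space):

Z = 1 / ( (1/Z0) * (1 - q) + (1/Z1) * q )    (equation 2)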

The first thing we are going to do is substitute the equation for Z (equation 2) into the left-hand side of equation 1. The trick to simplify the resulting equation is to multiply the numerator and denominator of equation 2 by Z0Z1 to get rid of the 1/Z0 and 1/Z1 terms (this is of course not explained anywhere but here on Scratchapixel):【Combining the two equations, the derivation below solves for C.】
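The intermediate result, reconstructed: multiplying the numerator and denominator of equation 2 by Z0Z1 gives

Z = Z0 * Z1 / ( Z1 * (1 - q) + Z0 * q )

and substituting this into equation 1 and solving for C yields

C = ( C0 * Z1 * (1 - q) + C1 * Z0 * q ) / ( Z1 * (1 - q) + Z0 * q )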


If we now multiply the numerator and the denominator by 1/(Z0Z1), we can then extract a factor of Z from the right-hand side of the equation:【Simplifying further.】
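The final result, reconstructed: after dividing the numerator and denominator by Z0Z1, the denominator becomes 1/Z, so

C = Z * ( (C0/Z0) * (1 - q) + (C1/Z1) * q )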

It took a while to get to that result, but this is a very fundamental equation in rasterization (which, by the way, is almost never explained anywhere; it is sometimes stated, but the steps to get to it are almost never provided), because it is used to interpolate vertex attributes, a very important and common feature in rendering.【The resulting expression for C, used to interpolate vertex attributes, is one of the fundamental formulas of rasterization.】


What the equation says is that to interpolate a vertex attribute correctly, we first need to divide the vertex attribute value by the z-coordinate of the vertex it is attached to, then linearly interpolate the results using q (which in our case will be the barycentric coordinates of the pixel on the 2D triangle), and finally multiply the result by Z, the depth of the point on the triangle that the pixel overlaps (the depth of the point in camera space where the vertex attribute is being interpolated). Here is another version of the code we used in chapter three, which shows an example of perspective correct vertex attribute interpolation:【The equation says: first divide each vertex attribute by its vertex’s z, linearly interpolate with q, then multiply back by the point’s z; the code below implements this.】
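The code itself is missing from these notes; a minimal self-contained helper in the spirit of that version might look like this (hypothetical names and signature, not the lesson’s exact code):

```cpp
struct Vec3 { float x, y, z; };

// Perspective-correct interpolation of a scalar per-vertex attribute at a
// pixel with barycentric coordinates (l0, l1, l2). v0..v2 are the projected
// vertices whose z still holds the camera-space z; c0..c2 are the attribute
// values at the vertices.
float interpolateAttribute(const Vec3& v0, const Vec3& v1, const Vec3& v2,
                           float l0, float l1, float l2,
                           float c0, float c1, float c2)
{
    // Depth: interpolate the reciprocals of the vertices' z-coordinates.
    float oneOverZ = l0 / v0.z + l1 / v1.z + l2 / v2.z;
    float z = 1.0f / oneOverZ;
    // Attribute: divide each value by its vertex z, interpolate, multiply by z.
    return z * (l0 * c0 / v0.z + l1 * c1 / v1.z + l2 * c2 / v2.z);
}
```

When all three vertices share the same z (a triangle parallel to the screen), this reduces to plain barycentric interpolation, as expected.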



Computing the sample depth requires using the reciprocals of the vertices’ z-coordinates. For this reason, we can pre-compute these values before we loop over all pixels (line 52). If we decide to use perspective correct interpolation, then the vertex attribute values are divided by the z-coordinate of the vertex they are associated with (lines 48-50).【We pre-compute the reciprocals of z before the per-pixel loop.】


The following image shows, on the left, an image computed without perspective correct interpolation, in the middle an image with it, and on the right the content of the z-buffer (as a greyscale image; the closer the object is to the screen, the brighter). The difference is subtle, though you can see in the left image how each color seems to roughly fill up the same area. This is due to the fact that colors in this case are interpolated within the “space” of the 2D triangle (as if the triangle were a flat surface parallel to the plane of the screen). However, if you inspect the triangle vertices (and the depth buffer), you will notice that the triangle is not at all parallel to the screen (it is oriented at a certain angle). In fact, because the vertex “painted” green is closer to the camera than the other two, this part of the triangle fills up a larger part of the screen, which is visible in the middle image (the green area is larger than the blue or red area). The image in the middle shows the correct interpolation, and is what you will get if you render this triangle with a graphics API such as OpenGL or Direct3D.【Compare the two left images below: the left one uses incorrect (2D) interpolation, so each color fills roughly equal area; the middle uses correct (perspective) interpolation, so the green region near the camera is larger; the right image is the z-buffer.】



The difference between correct and incorrect perspective interpolation is even more visible when applied to texturing. In the next example, we assigned texture coordinates to the triangle vertices as vertex attributes, and use these coordinates to apply a checkerboard pattern to the triangle. Rendering the triangle with and without perspective correct interpolation is left as an exercise; in the image below you can see the result with it. As you can see, it also matches an image of the same triangle with the same pattern rendered in Maya, so our code so far seems to do the right thing. As with color, all you need to do (and this is true of all vertex attributes) is divide the texture coordinates (generally denoted ST coordinates) by the z-coordinate of the vertex they are associated with, and later in the code multiply the interpolated texture coordinate value by Z. Here are the changes we made to the code:【The difference between correct and incorrect interpolation is even more obvious with texture mapping.】
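The diff did not survive in these notes; here is a sketch of the two pieces involved, under the assumption that the ST coordinates were pre-divided by their vertex z before rasterization (hypothetical helpers, not the lesson’s exact code):

```cpp
#include <cmath>

// Procedural checkerboard: freq squares along each axis of ST space.
float checkerboard(float s, float t, int freq)
{
    int is = static_cast<int>(std::floor(s * freq));
    int it = static_cast<int>(std::floor(t * freq));
    return ((is + it) % 2 == 0) ? 1.0f : 0.0f;  // alternate white/black squares
}

// st0..st2 are the per-vertex ST coordinates already divided by the vertex z;
// z is the interpolated depth at the pixel, (l0, l1, l2) its barycentrics.
void interpolateST(const float st0[2], const float st1[2], const float st2[2],
                   float l0, float l1, float l2, float z,
                   float& s, float& t)
{
    s = z * (l0 * st0[0] + l1 * st1[0] + l2 * st2[0]);
    t = z * (l0 * st0[1] + l1 * st1[1] + l2 * st2[1]);
}
```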































Software Rasterization – the visibility problem

In the second chapter of this lesson, we learned that in the third coordinate of the projected point (the point in screen space) we store the original vertex z-coordinate (the z-coordinate of the point in camera space):【From the second chapter we know that we store the extra z value.】
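The stored relation, written out (reconstructed; the exact sign depends on whether the camera looks down the negative z-axis):

Pprojected.z = Pcamera.z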



Finding the z-coordinate of a point on the surface of the triangle is useful when a pixel overlaps more than one triangle. And the way we find that z-coordinate is by interpolating the original vertices’ z-coordinates using the barycentric coordinates that we learned about in the previous chapter. In other words, we can treat the z-coordinates of the triangle vertices as any other vertex attribute and interpolate them the same way we interpolated colors in the previous chapter. Before we look into the details of how this z-coordinate is computed, let’s start by explaining why we need to do so.【When several triangles overlap one pixel, we determine visibility by comparing z values; here we look at how the z value is handled.】



The Depth-Buffer or Z-Buffer Algorithm and Hidden Surface Removal


When a pixel overlaps a point, what we actually see through that pixel is a small area on the surface of a triangle, which for simplification we will reduce to a single point (denoted P in figure 1). Thus to each pixel covering a triangle corresponds a point on the surface of that triangle. Of course, if a pixel covers more than one triangle, we then have several of these points, and the problem is to find which one of them is actually visible. We have illustrated this concept in 2D in figure 2. We could test triangles from back to front (this technique would require sorting the triangles by decreasing depth first), but this doesn’t always work when triangles intersect each other (figure 2, bottom). The only reliable solution is to compute the depth of each triangle a pixel overlaps, and then compare these depth values to find out which one is closest to the camera. If you look at figure 2, you can see that a pixel in the image overlaps two triangles at P1 and P2. However, P1’s z-coordinate (Z1) is lower than P2’s z-coordinate (Z2), thus we can deduce that P1 is in front of P2. Note that this technique is needed because triangles are tested in “random” order. As mentioned before, we could sort the triangles in decreasing depth order, but this is not good enough. Generally, they are just tested in the order they are specified in the program, and for this reason a triangle T1 that is closer to the camera can be tested before a triangle T2 that is further away. If we did not compare these triangles’ depths, then in this case we would end up seeing the triangle that was tested last (T2), when in fact we should be seeing T1. As mentioned many times before, this is called the visibility problem or hidden surface problem. Algorithms for ordering objects so that they are drawn correctly are called visible surface algorithms or hidden surface removal algorithms. The depth-buffer or z-buffer algorithm that we are going to study next belongs to this category of algorithms.【When one point maps to a pixel, we rasterize it directly (figure 1); when several points map to the same pixel, the reliable approach is to compare their depths and draw the closest one (figure 2).】




One solution to the visibility problem is to use a depth-buffer or z-buffer. A depth-buffer is nothing more than a two-dimensional array of floats that has the same dimensions as the frame-buffer and that is used to store the objects' depth as the triangles are being rasterized. When this array is created, we initialize each pixel in the array with a very large number. If we find that a pixel overlaps the current triangle, we do as follows:【The solution to visibility is a depth-buffer: it maps one-to-one to the frame-buffer and stores the depth of whatever each pixel currently shows, initialized to a very large value. When a triangle overlaps a pixel, follow the steps below】

  • We first compute the z-coordinate or depth of the point on the triangle that the pixel overlaps.【compute the depth of the point on the triangle that the pixel covers】
  • We then compare that current triangle depth with the value stored in the depth buffer for that pixel.【compare the depth of this point P with the depth stored in the depth-buffer for that pixel】
  • If we find that the value stored in the depth-buffer is greater than the depth of the point on the triangle, then the new point is closer to the observer or the camera than the point stored in the depth buffer at that pixel location. The value stored in the depth-buffer is then replaced with the new depth, and the frame-buffer is updated with the current triangle color. On the other hand, if the value stored in the depth-buffer is smaller than the current depth sample, then the triangle that the pixel overlaps is hidden by the object whose depth is currently stored in the depth-buffer.【if the new depth is smaller than the stored one, replace it and draw; otherwise the point is occluded】
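The three steps above can be sketched as a tiny depth-buffer class. This is a minimal illustration, not the lesson's actual code; the names `DepthBuffer` and `testAndSet` are ours:

```cpp
#include <vector>
#include <limits>

// Minimal depth-buffer following the three steps above.
struct DepthBuffer {
    int width, height;
    std::vector<float> depth;  // one float per pixel, initialized to "infinity"
    DepthBuffer(int w, int h)
        : width(w), height(h),
          depth(w * h, std::numeric_limits<float>::max()) {}
    // Returns true (and stores z) if the new sample is closer than what
    // the buffer currently holds for pixel (x, y); the caller then also
    // updates the frame-buffer with the triangle's color.
    bool testAndSet(int x, int y, float z) {
        float &stored = depth[y * width + x];
        if (z < stored) { stored = z; return true; }  // visible: update depth
        return false;  // hidden by a previously rasterized triangle
    }
};
```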




Finding Z by Interpolation


Hopefully the principle of the depth-buffer is simple and easy to understand. All we need to do now is explain how depth values are computed. First let's repeat one more time what that depth value is. When a pixel overlaps a triangle, it actually overlaps a small area on the surface of the triangle, which as mentioned in the introduction we will reduce to a point for simplification (point P in figure 1). What we want to find here is this point's z-coordinate. As also mentioned earlier in this chapter, if we know the triangle vertices' z-coordinates (which we do, they are stored in the projected points' z-coordinates), all we need to do is interpolate these coordinates using P's barycentric coordinates (figure 4):【The depth-buffer algorithm itself is simple; what remains is how to compute the depth of P. We obtain it by interpolation — but does barycentric interpolation work directly?】




Technically this sounds reasonable, though unfortunately it doesn't work. Let's see why. The problem is not in the formula itself, which is perfectly fine. The problem is that once the vertices of a triangle are projected onto the canvas (once we have performed the perspective divide), then z, the value we want to interpolate, doesn't vary linearly anymore across the surface of the 2D triangle. This is easier to demonstrate with a 2D example.【It doesn't work, because after projection z no longer varies linearly across the 2D triangle】


The secret lies in figure 4. Imagine that we want to find the "image" of a line defined in 2D space by two vertices V0 and V1. The canvas is represented by the horizontal green line. This line is one unit away (along the z-axis) from the coordinate system origin. If we trace lines from V0 and V1 to the origin, then we intersect the green line at two points (denoted V0' and V1' in the figure). The z-coordinate of these points is 1 since they lie on the canvas which is 1 unit away from the origin. The x-coordinate of the points can easily be computed using perspective projection. We just need to divide the original vertex x-coordinates by their z-coordinates. We get:【As figure 4 shows, what matters is where P sits along the original segment; interpolating with that camera-space ratio would be correct】



The goal of the exercise is to find the z-coordinate of P, a point on the line defined by V0 and V1. In this example, all we know about P is the position of its projection P', on the green line. The coordinates of P' are {0,1}. The problem is similar to trying to find the z-coordinate of a point on the triangle that a pixel overlaps. In our example, P' would be the pixel and P would be the point on the triangle that the pixel overlaps. What we need to do now, is compute the "barycentric coordinate" of P' with respect to V0' and V1'. Let's call the resulting value λ. Like our triangle barycentric coordinates, λ is also in the range [0,1]. To find λ, we just need to take the distance between V0' and P' (along the x-axis), and divide this number by the distance between V0' and V1'. If linearly interpolating the z-coordinates of the original vertices V0 and V1 using λ to find the depth of P works, then we should get the number 4 (we can easily see by just looking at the illustration that the coordinates of P are {0,4}). Let's first compute λ:【The goal is P's z-coordinate; we start by computing the screen-space "barycentric" ratio λ of P', as follows】



If we now linearly interpolate V0 and V1's z-coordinates to find P's z-coordinate, we get:【the naive linear interpolation of P's z is】



Clearly this is not the value we expect! Interpolating the original vertices' z-coordinates, using P's "barycentric coordinates" or λ in this example, to find P's z-coordinate doesn't work. Why? The reason is simple. Perspective projection preserves lines, but does not preserve distances. It's quite easy to see in figure 4 that the ratio of the distance between V0 and P over the distance between V0 and V1 (0.666) is not the same as the ratio of the distance between V0' and P' over the distance between V0' and V1' (0.833). If λ was equal to 0.666 it would work fine, but here is the problem: it's equal to 0.833 instead! So, how do we find the z-coordinate of P?【As figure 4 makes clear, this does not give the z we want: the screen-space ratio (0.833) differs from the camera-space ratio (0.666). So how do we proceed?】


The solution to the problem is to compute the inverse of P's z-coordinate by interpolating the inverses of the z-coordinates of the vertices V0 and V1 using λ. In other words, the solution is:【The fix is to interpolate the reciprocal of z instead, which immediately gives the formula below】




If we now take the inverse of this result, we get for P's z-coordinate the value 4, which is the correct result! As mentioned before, the solution is to linearly interpolate the vertices' inverse z-coordinates using barycentric coordinates, and invert the resulting number to find the depth of P (its z-coordinate). In the case of our triangle, the formula is:【Plugging in the values, the ratio now comes out right. As stated: linearly interpolate the vertices' 1/z with the barycentric coordinates, then invert the result to get the depth of P. For a triangle, the formula is as follows】
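The worked example can be reproduced in code. A minimal sketch, assuming V0 = (-2, 2) and V1 = (1, 5) in the xz-plane — these exact coordinates are our assumption, chosen to be consistent with the text's numbers (λ = 0.833, true depth 4):

```cpp
// Perspective-correct depth: interpolate 1/z with the screen-space
// factor lambda, then invert the result.
float interpolateDepth(float z0, float z1, float lambda) {
    float oneOverZ = (1.0f / z0) * (1.0f - lambda) + (1.0f / z1) * lambda;
    return 1.0f / oneOverZ;
}
```

With the assumed vertices, V0 projects to x = -2/2 = -1 and V1 to x = 1/5 = 0.2, so at P' = {0,1} we get λ = (0 - (-1)) / (0.2 - (-1)) ≈ 0.833; `interpolateDepth(2, 5, 0.833)` returns 4, while naive linear interpolation of z returns the wrong value 4.5.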



Let's now look into this problem more formally. Why do we need to interpolate the vertices' inverse z-coordinates? The formal explanation is a bit complicated and you can skip it if you want. Let's consider a line in camera space defined by two vertices whose coordinates are denoted (X0,Z0) and (X1,Z1). The projections of these vertices on the screen are denoted S0 and S1 respectively (in our example, we will assume that the distance between the camera origin and the canvas is 1 as shown in figure 5). Let's call S a point on the line defined by S0 and S1. S has a corresponding point P on the 2D line whose coordinates are (X,Z = 1) (we assume in this example that the screen or the vertical line on which the points are projected is 1 unit away from the coordinate system origin). Finally, the parameters t and q are defined such that:【Why the inverse z-coordinates? Consider a line in camera space in the XZ plane; it projects onto the screen as the segment S0S1, with the parameters defined as shown below】







































Software Rasterizer – the rasterization stage


Rasterization: What Are We Trying to Solve?


In the previous chapter, we learned how to perform the first step of the rasterization algorithm in a way, which is to project the triangle from 3D space onto the canvas. This definition is not entirely accurate in fact, since what we actually did was to transform the triangle from camera space to screen space, which as mentioned in the previous chapter, is also a three-dimensional space. However the x- and y-coordinates of the vertices in screen-space correspond to the position of the triangle vertices on the canvas, and by converting them from screen-space to NDC space and then finally from NDC-space to raster-space, what we actually get in the end are the vertices' 2D coordinates in raster space. Finally, we also know that the z-coordinates of the vertices in screen-space hold the original z-coordinates of the vertices in camera space (inverted so that we deal with positive numbers rather than negative ones).【The previous chapter projected the triangle from 3D space onto the canvas; we end up with the vertices' 2D raster-space coordinates plus their z values】


What we need to do next, is to loop over the pixels in the image and find out if any of these pixels overlap the "projected image of the triangle" (figure 1). In graphics APIs specifications, this test is sometimes called the inside-outside test or the coverage test. If they do, we then set the pixel in the image to the triangle's color. The idea is simple but of course, we now need to come up with a method to find if a given pixel overlaps a triangle. This is essentially what we will study in this chapter. We will learn about the method that is typically used in rasterization to solve this problem. It uses a technique known as the edge function which we are now going to describe and study. This edge function is also going to provide valuable information about the position of the pixel within the projected image of the triangle known as barycentric coordinates. Barycentric coordinates play an essential role in computing the actual depth (or the z-coordinate) of the point on the surface of the triangle that the pixel overlaps. We will also explain what barycentric coordinates are in this chapter and how they are computed.【Next we loop over the pixels and test which ones the triangle overlaps — what graphics APIs call the inside-outside or coverage test — using a technique called the edge function, which also yields barycentric coordinates】

At the end of this chapter, you will be able to actually produce a very basic rasterizer. In the next chapter, we will look into the possible issues with this very naive implementation of the rasterization algorithm. We will list what these issues are as well as study how they are typically addressed.【A basic implementation is given here; common issues and their fixes come in the next chapter】


A lot of research has been done to optimize the algorithm. The goal of this lesson is not to teach you how to write or develop an optimized and efficient renderer based on the rasterization algorithm. The goal of this lesson is to teach the basic principles of the rendering technique. Don't think though that the techniques we present in these chapters are not actually used. They are used to some extent, but how they are actually implemented either on the GPU or in a CPU version of a production renderer, is just likely to be a highly optimized version of the same idea. What is truly important is to understand the principle and how it works in general. From there, you can study on your own the different techniques which are used to speed up the algorithm. But the techniques presented in this lesson are generic and make up the foundations of any rasterizer.【Much research has gone into optimizing this step; this lesson only teaches the basic method, and you can study the optimizations on your own】



Keep in mind that drawing a triangle (since the triangle is the primitive we will use in this case) is a two-step problem:

  • We first need to find which pixels overlap the triangle.【find which pixels the triangle overlaps】
  • We then need to define which colors the pixels overlapping the triangle should be set to, a process that is called shading.【decide the color of each such pixel】



The Edge Function


As mentioned above, there are several possible methods to find if a pixel overlaps a triangle. It would be good to document older techniques, but in this lesson, we will only present the method that is generally used today. This method was presented by Juan Pineda in 1988 in a paper called "A Parallel Algorithm for Polygon Rasterization" (see references in the last chapter).【the method we adopt here】


Before we look into Pineda's technique itself, we will first describe the principle of his method. Let's say that the edge of a triangle can be seen as a line splitting the 2D plane (the plane of the image) in two (as shown in figure 2). The principle of Pineda's method is to find a function which he called the edge function, so that when we test on which side of this line a given point is (the point P in figure 2), the function returns a negative number when it is to the left of the line, a positive number when it is to the right of this line, and zero when the point is exactly on the line.【Each edge can be seen as a line splitting the plane in two; the method finds an edge function and tests which side of the line a point is on — left gives a negative number, right a positive one, as shown below】



In figure 2, we applied this method to the first edge of the triangle (defined by the vertices v0-v1. Be careful, the order is important). If we now apply the same method to the two other edges (v1-v2 and v2-v0), we can clearly see that there is an area (the white triangle) within which all points are positive (figure 3). If we take a point within this area, then we will find that this point is to the right of all three edges of the triangle. If P is in fact a point in the centre of a pixel, we can then use this method to find if the pixel overlaps the triangle. If for this point, we find that the edge function returns a positive number for all three edges, then the pixel is contained in the triangle (or may lie on one of its edges). The function Pineda uses also happens to be linear, which means that it can be computed incrementally, but we will come back to this point later.【The previous figure considered a single edge; with all three edges (figure 3), and the vertices declared in order, a point whose three results are all positive lies inside the triangle】



Now that we understand the principle, let's find out what that function is. The edge function is defined as (for the edge defined by vertices V0 and V1):【the formula】



As the paper mentions, this function has the useful property that its value is related to the position of the point (x,y) relative to the edge defined by the points V0 and V1:【the meaning of the result is as follows】

  • E(P) > 0 if P is to the "right" side
  • E(P) = 0 if P is exactly on the line
  • E(P) < 0 if P is to the "left" side
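The sign properties above can be captured in a few lines. This is a sketch; `Vec2` and `edgeFunction` are illustrative names, using the determinant form A.x·B.y − A.y·B.x with A = P − V0 and B = V1 − V0 as defined further below:

```cpp
struct Vec2 { float x, y; };

// Edge function for the edge V0 -> V1 evaluated at P.
// Equivalent to the 2D cross product of A = (P - V0) and B = (V1 - V0).
float edgeFunction(const Vec2 &v0, const Vec2 &v1, const Vec2 &p) {
    return (p.x - v0.x) * (v1.y - v0.y) - (p.y - v0.y) * (v1.x - v0.x);
}
```

For the horizontal edge (0,0)→(1,0), points below the line (walking along the edge, to the right with y pointing up) give a positive value, points above give a negative one, and points on the line give zero.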


In fact this function is mathematically equivalent to the magnitude of the cross product between the vectors (v1-v0) and (P-v0). We can also write these vectors in matrix form (presenting this as a matrix has no other interest than presenting the two vectors in a neat way):【the two vectors can also be written in matrix form】



If we write that A = (P − V0) and B = (V1 − V0), then we can also write the vectors A and B as a 2×2 matrix:



The determinant of this matrix can be computed as:




Understanding what's actually happening is easier when we look at the result of a cross product between two 3D vectors (figure 4).【to build intuition, look at figure 4】



In 3D, the cross-product returns another 3D vector which is perpendicular (or orthogonal) to the two original vectors. But as you can see in figure 4, the magnitude of that orthogonal vector also changes with the orientation of the two vectors with respect to each other. In figure 4, we assume a right-hand coordinate system. When the two vectors A (red) and B (blue) are either pointing exactly in the same direction or in opposite directions, the magnitude of the third vector C (in green) is zero. Vector A has coordinates (1,0,0) and is fixed. When vector B has coordinates (0,0,-1), then the green vector, vector C, has coordinates (0,-1,0). If we were to find its "signed" magnitude, we would find that it is equal to -1. On the other hand, when vector B has coordinates (0,0,1), then C has coordinates (0,1,0) and its signed magnitude is equal to 1. In one case the "signed" magnitude is negative, and in the second case, the signed magnitude is positive. In fact, in 3D, the magnitude of this vector can be interpreted as the area of the parallelogram having A and B as sides, as shown in figure 5 (read the Wikipedia article on the cross product to get more details on this interpretation):【The point here: the formula can be seen as a change of frame — treating A, B and C as a coordinate system, the sign of the result marks out a "positive" half of space】



Obviously an area should always be positive, though the sign of the above equation provides an indication about the orientation of the vectors A and B with respect to each other. When, with respect to A, B is within the half plane defined by vector A and a vector orthogonal to A (let's call this vector D; note that A and D form a 2D Cartesian coordinate system), then the result of the equation is positive. When B is within the opposite half plane, the result of the equation is negative (figure 6). Another way of explaining this result is that the result is positive when the angle θ is in the range ]0,π[ and negative when θ is in the range ]π,2π[. Note that when θ is exactly equal to 0 or π, the cross product or the edge function returns 0.【An alternative half-plane interpretation of the same formula; it is not very intuitive and can be skipped】



To find if a point is inside a triangle, all we really care about is the sign of the function we used to compute the area of the parallelogram. However, the area itself also plays an important role in the rasterization algorithm; it is used to compute the barycentric coordinates of the point in the triangle, a technique we will study next. The cross product in 3D and 2D has the same geometric interpretation, thus the cross product between two 2D vectors also returns the "signed" area of the parallelogram defined by the two vectors. The only difference is that in 3D, to compute the area of the parallelogram you need to use this equation:【To test whether a point is inside the triangle we only need the sign, but the area itself matters too: it is used to compute the barycentric coordinates. The 3D formula is】


while in 2D, this area is given by the cross product itself (which as mentioned before can also be interpreted as the determinant of a 2×2 matrix):【the 2D formula】



From a practical point of view, all we need to do now is test the sign of the edge function computed for each edge of the triangle and another vector defined by a point and the first vertex of the edge (figure 7).【in practice, we evaluate the three edge functions for each point, as follows】



If all three tests are positive or equal to 0, then the point is inside the triangle (or lies on one of its edges). If any one of the tests is negative, then the point is outside the triangle. In code we get:【if all three results are positive the point is inside the triangle】
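The code block the text refers to is not reproduced here; a minimal sketch of the three-edge test (names are ours; the triangle is assumed wound so that the inside is the positive side of every edge function):

```cpp
struct Vec2 { float x, y; };

float edgeFunction(const Vec2 &v0, const Vec2 &v1, const Vec2 &p) {
    return (p.x - v0.x) * (v1.y - v0.y) - (p.y - v0.y) * (v1.x - v0.x);
}

// A point overlaps the triangle if the edge function is >= 0 for all
// three edges (>= rather than > so points on an edge count as inside).
bool pointInTriangle(const Vec2 &v0, const Vec2 &v1, const Vec2 &v2,
                     const Vec2 &p) {
    return edgeFunction(v0, v1, p) >= 0 &&
           edgeFunction(v1, v2, p) >= 0 &&
           edgeFunction(v2, v0, p) >= 0;
}
```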




Alternative to the Edge Function


There are obviously other ways than the edge function method to find if pixels overlap triangles, however as mentioned in the introduction of this chapter, we won't study them in this lesson. Just for reference though, the other common technique is called scanline rasterization. It is based on the Bresenham algorithm that is generally used to draw lines. GPUs use the edge method mostly because it is more generic than the scanline approach, which is also more difficult to run in parallel than the edge method, but we won't provide more information on this topic in this lesson.【see the references for the alternative methods】



Be Careful! Winding Order Matters


One of the things we have not been talking about yet, but which has great importance in CG, is the order in which you declare the vertices making up the triangles. There are two possible conventions, which you can see illustrated in figure 8: clockwise or counter-clockwise ordering or winding. Winding is important because it essentially defines one important property of the triangle, which is the orientation of its normal. Remember that the normal of the triangle can be computed from the cross product of the two vectors A=(V2-V0) and B=(V1-V0). Let's say that V0={0,0,0}, V1={1,0,0} and V2={0,-1,0}, then (V1-V0)={1,0,0} and (V2-V0)={0,-1,0}. Let's now compute the cross product of these two vectors:【A crucial point for this algorithm: the winding of the triangle's vertices matters — clockwise and counter-clockwise orders give opposite results】



As expected, the two normals are pointing in opposite directions. The orientation of the normal has a great importance for lots of different reasons, but one of the most important ones is called face culling. Most rasterizers, and even ray-tracers for that matter, may not render triangles whose normal is facing away from the camera. This is called back-face culling. Most rendering APIs such as OpenGL or DirectX give the option to turn back-face culling off; however, you should still be aware that vertex ordering plays a role in what's actually rendered, among many other things. And not surprisingly, the edge function is one of these other things. Before we get to explaining why it matters in our particular case, let's say that there is no particular rule when it comes to choosing the order. In reality so many details in a renderer implementation may change the orientation of the normal that you can't assume that by declaring vertices in a certain order, you get the guarantee that the normal will be oriented a certain way. For instance, rather than using the vectors (V1-V0) and (V2-V0) in the cross product you could just as well have used (V0-V1) and (V2-V1) instead. It would have produced the same normal but flipped. Even if you use the vectors (V1-V0) and (V2-V0), remember that the order of the vectors in the cross product changes the sign of the normal: A×B = −B×A. So the direction of your normal also depends on the order of the vectors in the cross product. For all these reasons, don't try to assume that declaring vertices in one order rather than the other will give you one result or the other. What's important though, is that you stick to the convention you have chosen. Generally, graphics APIs such as OpenGL and DirectX expect triangles to be declared in counter-clockwise order. We will also use counter-clockwise winding. Now let's see how ordering impacts the edge function.【Different vertex orders give differently oriented normals; normals matter for back-face culling, which OpenGL/DirectX rely on, and winding also directly affects the edge function used for rasterization】
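The winding example above can be checked numerically with a small cross-product helper (a sketch using the text's vertices V0={0,0,0}, V1={1,0,0}, V2={0,-1,0}; names are illustrative):

```cpp
struct Vec3 { float x, y, z; };

// 3D cross product; swapping the operands flips the resulting normal.
Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}
```

With (V1-V0)={1,0,0} and (V2-V0)={0,-1,0}, `cross` returns {0,0,-1}; swapping the two arguments returns {0,0,1} — the same normal, flipped.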





Barycentric Coordinates


Computing barycentric coordinates is not necessary to get the rasterization algorithm working. For a really naive implementation of the rendering technique, all you need is to project the vertices and use a technique like the edge function that we described above to find if pixels are inside triangles. These are the only two necessary steps to produce an image. However the result of the edge function, which as we explained above can be interpreted as the area of the parallelogram defined by vectors A and B, can actually be used directly to compute these barycentric coordinates. Thus, it makes sense to study the edge function and the barycentric coordinates at the same time.【Barycentric coordinates are not required for rasterization itself, but the edge function gives them almost for free, so we compute them here for later use】


Before we get any further though, let's explain what these barycentric coordinates are. First, they come in a set of three floating point numbers which in this lesson we will denote λ0, λ1 and λ2. Many different conventions exist, but Wikipedia uses the greek letter lambda as well (λ), which is also used by other authors (the greek letter omega ω is also sometimes used). This doesn't matter, you can call them the way you want. In short, the coordinates can be used to define any point on the triangle in the following manner:【given three vertices, any point P can be expressed as follows】


Where as usual, V0, V1 and V2 are the vertices of a triangle. These coordinates can take on any value, but for points which are inside the triangle (or lying on one of its edges) they can only be in the range [0,1] and the sum of the three coordinates is equal to 1. In other words:【if P is inside the triangle, each λ lies in [0,1] and the three sum to 1, as shown below】


This is a form of interpolation if you want. They are also sometimes defined as weights for the triangle's vertices (which is why in the code we will denote them with the letter w). A point overlapping the triangle can be defined as "a little bit of V0 plus a little bit of V1 plus a little bit of V2". Note that when any of the coordinates is 1 (which means that the others in this case are necessarily 0) then the point P is equal to one of the triangle's vertices. For instance if λ2 = 1 then P is equal to V2. Interpolating the triangle's vertices to find the position of a point inside the triangle is not that useful. But the method can also be used to interpolate across the surface of the triangle any quantity or variable that has been defined at the triangle's vertices. Imagine for instance that you have defined a color at each vertex of the triangle. Say V0 is red, V1 is green and V2 is blue (figure 12). What you want to do is find how these three colors are interpolated across the surface of the triangle. If you know the barycentric coordinates of a point P on the triangle, then its color CP (which is a combination of the triangle vertices' colors) is defined as:【This is exactly the interpolation we wanted, and it applies directly to smoothly blending colors: the color of point P in figure 12 follows from the formula below】
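The color formula can be sketched as follows (illustrative names; the three weights are the barycentric coordinates λ0, λ1, λ2):

```cpp
struct Color { float r, g, b; };

// Interpolate a per-vertex attribute (here a color) with the
// barycentric coordinates l0, l1, l2 (which must sum to 1).
Color interpolateColor(const Color &c0, const Color &c1, const Color &c2,
                       float l0, float l1, float l2) {
    return { l0 * c0.r + l1 * c1.r + l2 * c2.r,
             l0 * c0.g + l1 * c1.g + l2 * c2.g,
             l0 * c0.b + l1 * c1.b + l2 * c2.b };
}
```

At the centroid (λ0 = λ1 = λ2 = 1/3) of a red/green/blue triangle, this returns an even gray-ish mix (1/3, 1/3, 1/3).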



This is a very handy technique which is going to be useful to shade triangles. Data associated with the vertices of triangles are called vertex attributes. This is a very common and very important technique in CG. The most common vertex attributes are colors, normals and texture coordinates. What this means in practice, is that generally when you define a triangle you don't only pass on to the renderer the triangle vertices but also its associated vertex attributes. For example if you want to shade the triangle you may need a color and a normal vertex attribute, which means that each triangle will be defined by 3 points (the triangle vertex positions), 3 colors (the color of the triangle vertices) and 3 normals (the normal of the triangle vertices). Normals too can be interpolated across the surface of the triangle. Interpolated normals are used in a technique called smooth shading which was first introduced by Henri Gouraud. We will explain this technique later when we get to shading.【this is how smooth (Gouraud) shading is implemented】




Let's see how it looks in code. We were already computing the edge functions before to test if points were inside triangles. Only, in our previous implementation we were just returning true or false depending on whether the result of the function was positive or negative. To compute the barycentric coordinates, we need the actual result of the edge function. We can also use the edge function to compute the area (multiplied by 2) of the triangle. Here is a version of an implementation that tests if a point P is inside a triangle and, if so, computes its barycentric coordinates:【code: the edge function reused for barycentric coordinates】
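The implementation the paragraph refers to is not reproduced here; a minimal version along the same lines (names are ours; the letter w denotes the weights, as the text suggests, and the triangle is assumed wound so its signed area is positive):

```cpp
struct Vec2f { float x, y; };

float edgeFunction(const Vec2f &a, const Vec2f &b, const Vec2f &p) {
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

// Tests whether P is inside the triangle and, if so, returns its
// barycentric coordinates in w0/w1/w2.
bool insideWithBarycentric(const Vec2f &v0, const Vec2f &v1, const Vec2f &v2,
                           const Vec2f &p, float &w0, float &w1, float &w2) {
    float area = edgeFunction(v0, v1, v2);  // twice the triangle area
    w0 = edgeFunction(v1, v2, p);  // weight of v0, from the opposite edge
    w1 = edgeFunction(v2, v0, p);  // weight of v1
    w2 = edgeFunction(v0, v1, p);  // weight of v2
    if (w0 < 0 || w1 < 0 || w2 < 0) return false;
    w0 /= area; w1 /= area; w2 /= area;  // normalize so w0 + w1 + w2 = 1
    return true;
}
```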




Interpolate vs. Extrapolate


One thing worth noticing is that the computation of barycentric coordinates works independently of the point's position with respect to the triangle. In other words, the coordinates are valid whether the point is inside or outside the triangle. When the point is inside, using the barycentric coordinates to evaluate the value of a vertex attribute is called interpolation, and when the point is outside, we speak of extrapolation. This is an important detail because in some cases, we will have to evaluate the value of a given vertex attribute for points that potentially don't overlap triangles. To be more specific, this will be needed to compute the derivatives of the triangle texture coordinates for example. These derivatives are used to filter textures properly. If you are interested in learning more about this particular topic we invite you to read the lesson on Texture Mapping. In the meantime, all you need to remember is that barycentric coordinates are valid even when the point doesn't cover the triangle. You also need to know about the difference between vertex attribute extrapolation and interpolation.【Expressing P with the three vertices holds whether or not P is inside the triangle: inside we call it interpolation, outside extrapolation; the latter is useful e.g. when computing texture-coordinate derivatives】



Rasterization Rules


In some special cases, a pixel may overlap more than one triangle. This happens when a pixel lies exactly on an edge shared by two triangles, as shown in figure 17. Such a pixel would pass the coverage test for both triangles. If they are semi-transparent, a dark edge may appear where the pixels overlap the two triangles as a result of the way semi-transparent objects are combined with each other (imagine two super-imposed semi-transparent sheets of plastic. The surface is more opaque and looks darker than the individual sheets). You would get something similar to what you can see in figure 18, which is a darker line where the two triangles share an edge.【A special case, shown in the figure: two triangles share an edge, and a pixel on that edge is drawn by both; with semi-transparent triangles, rasterization then leaves a darker line along the shared edge】




The solution to this problem is to come up with some sort of rule that guarantees that a pixel is never counted twice when it lies on an edge shared by two triangles. How do we do that? Most graphics APIs such as OpenGL and DirectX define something which they call the top-left rule. We already know the coverage test returns true if a point is either inside the triangle or if it lies on any of the triangle edges. What the top-left rule says though, is that the pixel or point is considered to overlap a triangle if it is either inside the triangle or lies on either a triangle top edge or any edge that is considered to be a left edge. What is a top and left edge? If you look at figure 19, you can easily see what we mean by top and left edge.【The fix is a tie-breaking rule. OpenGL/DirectX use the top-left rule: a pixel on a shared edge belongs to a triangle only when that edge is a top or a left edge, determined as follows】


  • A top edge is an edge that is perfectly horizontal and whose defining vertices are above the third one. Technically this means that the y-coordinate of the vector V[(X+1)%3]-V[X] is equal to 0 and that its x-coordinate is positive (greater than 0).
  • A left edge is essentially an edge that is going up. Keep in mind that in our case, vertices are defined in clockwise order. An edge is considered to go up if its respective vector V[(X+1)%3]-V[X] (where X can be 0, 1 or 2) has a positive y-coordinate.
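The two rules can be sketched as a small predicate (illustrative names, following the text's definitions: "top" means horizontal and pointing in +x, "left" means a positive y-coordinate):

```cpp
struct V2 { float x, y; };

// Top-left rule classification for the edge v0 -> v1.
bool isTopLeftEdge(const V2 &v0, const V2 &v1) {
    float dx = v1.x - v0.x, dy = v1.y - v0.y;
    bool top  = (dy == 0 && dx > 0);  // horizontal edge, x-coordinate positive
    bool left = (dy > 0);             // edge "going up": positive y-coordinate
    return top || left;
}
```

A pixel whose center lies exactly on a shared edge is then drawn only by the triangle for which that edge is a top or left edge, so it is never counted twice.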






Putting Things Together: Finding if a Pixel Overlaps a Triangle


Let's test the different techniques we learned about in this chapter in a program that produces an actual image. We will just assume that we have projected the triangle already (check the last chapter of this lesson for a complete implementation of the rasterization algorithm). We will also assign a color to each vertex of the triangle. Here is how the image is formed. We will loop over all the pixels in the image and test if they overlap the triangle using the edge function method. All three edges of the triangle are tested against the current position of the pixel, and if the edge function returns a positive number for all the edges then the pixel overlaps the triangle. We can then compute the pixel's barycentric coordinates and use these coordinates to shade the pixel by interpolating the color defined at each vertex of the triangle. The content of the frame-buffer is saved to a PPM file (that you can read with Photoshop). The output of the program is shown in figure 20.【At this point we can complete the rasterization step: fill the triangle with interpolated colors, resolve overlapping pixels, and output an image like the one shown】
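The loop described above can be sketched as follows. This is a minimal in-memory version that skips the PPM output; all names are ours, and the triangle is assumed declared so that the inside is the positive side of all three edge functions:

```cpp
#include <vector>

struct P2 { float x, y; };
struct Rgb { float r, g, b; };

float edgeFn(const P2 &a, const P2 &b, const P2 &p) {
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

// Rasterize one projected 2D triangle into an RGB frame-buffer, shading
// each covered pixel with barycentric-interpolated vertex colors.
std::vector<Rgb> rasterize(int w, int h, const P2 v[3], const Rgb c[3]) {
    std::vector<Rgb> frame(w * h, Rgb{0, 0, 0});
    float area = edgeFn(v[0], v[1], v[2]);  // twice the triangle area
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            P2 p = { x + 0.5f, y + 0.5f };  // sample at the pixel center
            float w0 = edgeFn(v[1], v[2], p);
            float w1 = edgeFn(v[2], v[0], p);
            float w2 = edgeFn(v[0], v[1], p);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0) {  // coverage test passed
                w0 /= area; w1 /= area; w2 /= area;
                frame[y * w + x] = { w0 * c[0].r + w1 * c[1].r + w2 * c[2].r,
                                     w0 * c[0].g + w1 * c[1].g + w2 * c[2].g,
                                     w0 * c[0].b + w1 * c[1].b + w2 * c[2].b };
            }
        }
    return frame;
}
```

Writing the returned buffer out as a PPM (`P6` header, then the pixel bytes) reproduces the kind of image shown in figure 20.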





Conclusion and What’s Next?


There are many interesting techniques and trivia related to the topic of barycentric coordinates, but this lesson is just an introduction to the rasterization algorithm thus we won't go any further. One piece of trivia that is interesting to know though, is that barycentric coordinates are constant along lines parallel to an edge (as shown in figure 21).【There are many more related techniques; only the basics are covered here — the rest is left to further reading】



In this lesson, we learned two important methods and various concepts.

  • First, we learned about the edge function and how it can be used to find if a point P overlaps a triangle. The edge function is computed for each edge of the triangle, and a second vector defined by the edge's first vertex and another point P. If for all three edges the function is positive, then point P overlaps the triangle.【the edge function settles whether a pixel lies inside a triangle】
  • Furthermore, we also learned that the result of the edge function can also be used to compute the barycentric coordinates of point P. These coordinates can be used to interpolate vertex data or vertex attributes across the surface of the triangle. They can be interpreted as weights for the various vertices. The most common vertex attributes are color, normal and texture coordinates.【the edge function also yields the barycentric coordinates, used to interpolate colors, normals and texture coordinates】























Software Rasterizer – the projection stage

Quick Review


In the previous chapter, we gave a high-level overview of the rasterization rendering technique. It can be decomposed into two main stages: first, the projection of the triangle's vertices onto the canvas, then the rasterization of the triangle itself. Rasterization really means in this case "breaking apart" the triangle's shape into pixels or raster element squares; this is how pixels used to be called in the past. In this chapter, we will review the first step. We have already described this method in the two previous lessons, thus we won't explain it here again. If you have any doubt about the principles behind perspective projection, check these lessons again. However, in this chapter, we will study a couple of new tricks related to projection that are going to be useful when we get to the lesson on the perspective projection matrix. We will learn about a new method to remap the coordinates of the projected vertices from screen space to NDC space. We will also learn more about the role of the z-coordinate in the rasterization algorithm and how it should be handled at the projection stage.【The previous chapter gave the overall approach and main stages; this one focuses on how the first stage is done】


Keep in mind, as already mentioned in the previous chapter, that the goal of the rasterization rendering technique is to solve the visibility or hidden surface problem, which is to determine which parts of a 3D object are visible and which parts are hidden.【always remember: the software rasterizer exists to solve the visibility problem】


Projection: What Are We Trying to Solve?


What are we trying to solve here at that stage of the rasterization algorithm? As explained in the previous chapter, the principle of rasterization is to find if pixels in the image overlap triangles. To do so, we first need to project triangles onto the canvas, and then convert their coordinates from screen space to raster space. Pixels and triangles are then defined in the same space, which means that it becomes possible to compare their respective coordinates (we can check the coordinates of a given pixel against the raster-space coordinates of a triangle’s vertices).

The goal of this stage is thus to convert the vertices making up triangles from camera space to raster space.



Projecting Vertices: Mind the Z-Coordinate!


In the previous two lessons, we mentioned that when we compute the raster coordinates of a 3D point, what we really need in the end are its x- and y-coordinates (the position of the 3D point in the image). As a quick reminder, recall that these 2D coordinates are obtained by dividing the x- and y-coordinates of the 3D point in camera space by the point's z-coordinate (what we called the perspective divide), and then remapping the resulting 2D coordinates from screen space to NDC space, and from NDC space to raster space. Keep in mind that because the image plane is positioned at the near clipping plane, we also need to multiply the x- and y-coordinates by the near clipping plane distance. Again, we explained this process in great detail in the previous two lessons.


Note that so far, we have been considering points in screen space as essentially 2D points (we didn't need to use the points' z-coordinate after the perspective divide). From now on though, we will declare points in screen space as 3D points, and set their z-coordinate to the camera-space points' z-coordinate as follows:




It is best at this point to set the projected point's z-coordinate to the negative of the original point's z-coordinate, which as you know by now is itself negative. Dealing with positive z-coordinates will make everything simpler later on (but this is not mandatory).【note: the z value is negated】
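Under the conventions described here (camera looking down the negative z-axis, image plane at the near clipping plane), the projection step can be sketched as follows. This is a minimal illustration, not the original code; the function name and the `near` parameter are assumptions:

```python
def camera_to_screen(x, y, z, near=1.0):
    """Project a camera-space point onto the image plane.

    The camera looks down the negative z-axis, so z is negative in
    front of the camera. We keep a full 3D point: x and y undergo the
    perspective divide (scaled by the near-plane distance), and the
    stored z is negated so that depth values are positive.
    """
    x_screen = near * x / -z
    y_screen = near * y / -z
    z_screen = -z  # negate: dealing with positive depth is simpler
    return x_screen, y_screen, z_screen
```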


Keeping track of the vertex z-coordinate in camera space is needed to solve the visibility problem. Understanding why is easier if you look at figure 1. Imagine two vertices v1 and v2 which, when projected onto the canvas, have the same raster coordinates (as shown in figure 1). If we project v1 before v2, then v2 will be visible in the image when it should actually be v1 (v1 is clearly in front of v2). However, if we store the z-coordinate of the vertices along with their 2D raster coordinates, we can use these coordinates to decide which point is closest to the camera, independently of the order in which the vertices are projected (as shown in the code fragment below).【we keep the z value to solve the visibility problem: compare it against the stored depth to find the occlusion order, as in the figure】
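Since the original code fragment is not reproduced here, a minimal sketch of that comparison (with hypothetical names) could look like this:

```python
def closest_vertex(v1, v2):
    """Pick the visible vertex among two projected vertices that share
    the same raster coordinates. Each vertex is (x_raster, y_raster, z)
    with a positive depth z; the smaller depth is closer to the camera,
    regardless of projection order."""
    return v1 if v1[2] < v2[2] else v2
```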




What we want to render though are triangles, not vertices. So the question is, how does the method we just learned about apply to triangles? In short, we will use the triangle's vertex coordinates to find the position of the point on the triangle that the pixel overlaps (and thus its z-coordinate). This idea is illustrated in figure 2. If a pixel overlaps two or more triangles, we should be able to compute the position of the points on the triangles that the pixel overlaps, and use the z-coordinates of these points, as we did with the vertices, to know which triangle is closest to the camera. This method will be described in detail in chapter 4 (The Depth Buffer. Finding the Depth Value of a Sample by Interpolation).【how z values are compared: the z-buffer approach mentioned earlier; see chapter 4 for the details】


Screen Space is Also Three-Dimensional


To summarize, to go from camera space to screen space (which is the process during which the perspective divide happens), we need to:【a summary】

  • Perform the perspective divide: that is, dividing the point's x- and y-coordinates in camera space by the point's z-coordinate.【the x and y values come from the coordinate transform】
  • But also set the projected point's z-coordinate to the original point's z-coordinate (the point in camera space).【the z value comes from the original z】


Practically, this means that our projected point is not a 2D point anymore, but in fact a 3D point. Or to say it differently, screen space is not two- but three-dimensional. In his thesis, Ed Catmull writes:【it follows that screen space is itself a 3D space】


Screen-space is also three-dimensional, but the objects have undergone a perspective distortion so that an orthogonal projection of the object onto the x-y plane, would result in the expected perspective image (Ed Catmull's thesis, 1974).


You should now be able to understand this quote. The process is also illustrated in figure 3. First, the geometry vertices are defined in camera space (top image). Then, each vertex undergoes a perspective divide. That is, the vertex x- and y-coordinates are divided by its z-coordinate, but as mentioned before, we also set the resulting projected point's z-coordinate to the negative of the original vertex z-coordinate. This, by the way, implies a change of direction of the z-axis of the screen-space coordinate system. As you can see, the z-axis is now pointing inward rather than outward (middle image in figure 3). But the most important thing to notice is that the resulting object is a deformed version of the original object, but nonetheless a three-dimensional object. Furthermore, what Ed Catmull means when he writes "an orthogonal projection of the object onto the x-y plane, would result in the expected perspective image" is that once the object is in screen space, if we trace lines perpendicular to the x-y image plane from the object to the canvas, we get a perspective representation of that object (as shown in figure 4). This is an interesting observation because it means that the image creation process can be seen as a perspective projection followed by an orthographic projection. Don't worry if you don't clearly understand the difference between perspective and orthographic projection. It is the topic of the next lesson. However, try to remember this observation, as it will come in handy later.【figures 3 and 4 illustrate the projection】



Remapping Screen Space Coordinates to NDC Space


In the previous two lessons, we explained that once in screen space, the x- and y-coordinates of the projected points need to be remapped to NDC space. In the previous lessons, we also explained that in NDC space, points on the canvas had their x- and y-coordinates contained in the range [0,1]. In the GPU world though, coordinates in NDC space are contained in the range [-1,1]. Sadly, this is one of those conventions we just need to deal with. We could have kept the convention [0,1], but because GPUs are the reference when it comes to rasterization, it is best to stick to the way the term is defined in the GPU world.【screen space must first be converted to NDC space; the range can be [0,1], but on GPUs it is usually [-1,1]】


Thus, once the points have been converted from camera space to screen space, the next step is to remap them from the ranges [l,r] and [b,t] for the x- and y-coordinates respectively, to the range [-1,1]. The terms l, r, b and t here denote the left, right, bottom and top coordinates of the canvas. By re-arranging the terms, we can easily find an equation that performs the remapping we want:【once the points are in screen space, remap the x- and y-coordinates from [l,r] and [b,t] to [-1,1]; the derivation follows】
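The derivation for the x-coordinate (a reconstruction consistent with the surrounding text) starts from the fact that a visible point satisfies l ≤ x ≤ r:

```latex
l \leq x \leq r \\
0 \leq x - l \leq r - l \\
0 \leq \frac{x - l}{r - l} \leq 1 \\
0 \leq \frac{2(x - l)}{r - l} \leq 2 \\
-1 \leq {\color{red}\frac{2x}{r - l}} - {\color{green}\frac{r + l}{r - l}} \leq 1
```

The red and green terms in the last line are the ones referred to below as the future coefficients of the perspective projection matrix.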








This is a very important equation, because the red and green terms in the middle of the formula will become coefficients of the perspective projection matrix. We will study this matrix in the next lesson. But for now, we will just apply this equation to remap the x-coordinate of a point in screen space to NDC space (any point that lies on the canvas has its coordinates contained in the range [-1,1] when defined in NDC space). If we apply the same reasoning to the y-coordinate, we get:【the middle terms become coefficients of the projection matrix; the result is the remap from screen space to NDC space. The same reasoning applied to the y-axis gives the following】
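Applying the same steps to y over the range [b,t] (a reconstruction, mirroring the x derivation term by term):

```latex
-1 \leq \frac{2y}{t - b} - \frac{t + b}{t - b} \leq 1
```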




Putting Things Together


At the end of this lesson, we can now perform the first stage of the rasterization algorithm, which can be decomposed into two steps:【the two steps covered in this chapter】


  • Convert a point from camera space to screen space. This essentially projects the point onto the canvas, but keep in mind that we also need to store the original point's z-coordinate. The point in screen space is three-dimensional, and the z-coordinate will be useful to solve the visibility problem later.【camera space to screen space】

  • We then convert the x- and y-coordinates of these points in screen space to NDC space using the formulas derived above.【screen space to NDC space】


From there, it is extremely simple to convert the coordinates to raster space. We just need to remap the x- and y-coordinates in NDC space to the range [0,1], and multiply the resulting numbers by the image width and height respectively (don't forget that in raster space the y-axis goes down, while in NDC space it goes up; thus we need to flip y's direction during this remapping). In code we get:
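As the original code listing is not reproduced here, this is a minimal sketch of that last remapping step, with illustrative names:

```python
def ndc_to_raster(x_ndc, y_ndc, image_width, image_height):
    """Remap NDC coordinates in [-1,1] to raster space.

    The y-axis is flipped because in raster space y points down,
    while in NDC space it points up.
    """
    x_raster = (x_ndc + 1) / 2 * image_width
    y_raster = (1 - y_ndc) / 2 * image_height
    return x_raster, y_raster
```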






























Software Rasterization – An Overview of the Rasterization Algorithm






Rasterization: a Practical Implementation






The rasterization rendering technique is surely the most commonly used technique to render images of 3D scenes, and yet it is probably the least understood and least properly documented technique of all (especially compared to ray tracing).【rasterization is the foundational technique of 3D rendering, and everyone should understand it】


Why this is so depends on different factors. First, it's a technique from the past. We don't mean to say the technique is obsolete, quite the contrary, but most of the techniques used to produce an image with this algorithm were developed somewhere between the 1960s and the early 1980s. In the world of computer graphics, this is the Middle Ages, and knowledge about the papers in which these techniques were developed tends to be lost. Rasterization is also the technique used by GPUs to produce 3D graphics. Hardware technology has changed a lot since GPUs were first invented, but the fundamental techniques they implement to produce images haven't changed much since the early 1980s (the hardware changed, but the underlying pipeline by which an image is formed hasn't). In fact, these techniques are so fundamental, and consequently so deeply integrated within the hardware architecture, that no one pays attention to them anymore (only people designing GPUs can tell what they really do, and this is far from being a trivial task; but designing a GPU and understanding the principle of the rasterization algorithm are two different things; thus explaining the latter should actually not be that hard!).【the reasons: being an old technique doesn't make it obsolete; rasterization is what GPUs use to render images, and because it is implemented directly in hardware few people pay attention to it, yet it truly is the most fundamental rendering technique】


Regardless, we thought it was urgent and important to correct this situation. With this lesson, we believe we offer the first resource that provides a clear and complete picture of the algorithm, as well as a full practical implementation of the technique. If you found in this lesson the answers you have been desperately looking for elsewhere, please consider making a donation! This work is provided to you for free and requires many hours of hard work.【this aims to be the most complete explanation of rasterization available】




Rasterization and ray tracing try to solve the visibility or hidden surface problem, but in a different order (the visibility problem was introduced in the lesson Rendering an Image of a 3D Scene, an Overview). Both algorithms have in common that they essentially use techniques from geometry to solve that problem. In this lesson, we will describe briefly how the rasterization (you can write rasterisation if you prefer UK English to US English) algorithm works. Understanding the principle is quite simple, but implementing it requires using a series of techniques, notably from the field of geometry, that you will also find explained in this lesson.【both rasterization and ray tracing start from the geometry and solve the surface visibility problem, but in a different order; the details come in later chapters — this lesson explains the algorithm】


The program we will develop in this lesson to demonstrate how rasterization works in practice is important, because we will use it again in the next lessons to implement the ray tracing algorithm as well. Having both algorithms implemented in the same program will allow us to more easily compare their outputs (they should both produce the same result, at least before shading is applied) and their performance. It will be a great way of better understanding the pros and cons of both algorithms.【the focus here is understanding how rasterization works, but we will also implement ray tracing and compare the strengths, weaknesses and performance of both】


The Rasterization Algorithm


In fact, there is not one but multiple rasterization algorithms; but to go straight to the point, all these different algorithms are based upon the same overall principle. In other words, they are all just variants of the same idea. It is this idea or principle that we will refer to when we speak of rasterization in this lesson.【there are many variants of the algorithm, but the overall principle is the same】


What is that idea? In the previous lessons, we already talked about the difference between rasterization and ray tracing. We also suggested that the rendering process can essentially be decomposed into two main tasks: visibility and shading. Rasterization, to say things quickly, is essentially a method to solve the visibility problem. Visibility consists of being able to tell which parts of 3D objects are visible to the camera. Some parts of these objects can be hidden because they are either outside the camera's visible area or hidden by other objects.【rendering decomposes into two main tasks, visibility and shading; rasterization solves the visibility part: which objects the camera can see, and which are hidden】


Solving this problem can be done in essentially two ways. You can trace a ray through every pixel in the image to find the distance between the camera and any object this ray intersects (if any). The object visible through that pixel is the object with the smallest intersection distance (generally denoted t). This is the technique used in ray tracing. Note that in this particular case, you create an image by looping over all pixels in the image, tracing a ray for each of these pixels, and then finding out whether these rays intersect any of the objects in the scene. In other words, the algorithm requires two main loops: the outer loop iterates over the pixels in the image, and the inner loop iterates over the objects in the scene:【the first way to solve visibility is ray tracing: shoot a ray from the camera through each pixel and process the object it hits; this needs two loops, first over every pixel, then over every object】
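The original loop listing is not reproduced here; a minimal Python sketch of the two nested loops, with a hypothetical `cast_ray` callback standing in for the actual ray-object intersection code:

```python
import math

def ray_trace(width, height, objects, cast_ray):
    """Ray-tracing visibility: outer loop over pixels, inner loop over
    objects. `cast_ray(x, y, obj)` is assumed to return the intersection
    distance t, or None if the ray through pixel (x, y) misses `obj`."""
    image = [[None] * width for _ in range(height)]
    for y in range(height):                # outer loop: pixels
        for x in range(width):
            closest = math.inf
            for obj in objects:            # inner loop: objects
                t = cast_ray(x, y, obj)
                if t is not None and t < closest:
                    closest = t            # keep the smallest distance t
                    image[y][x] = obj
    return image
```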


Note that in this example, the objects are actually considered to be made of triangles (and triangles only). Rather than iterating over the objects, we just consider the objects as a pool of triangles and iterate over the triangles instead. For reasons we have already explained in the previous lessons, the triangle is often used as the basic rendering primitive, both in ray tracing and in rasterization (GPUs require the geometry to be triangulated).【in both ray tracing and rasterization, the triangle is the basic rendering primitive】


Ray tracing is the first possible approach to solve the visibility problem. We say the technique is image centric because we shoot rays from the camera into the scene (we start from the image), as opposed to the other way around, which is the approach we will be using in rasterization.【the ray-tracing-based approach is called image centric】


Rasterization takes the opposite approach. To solve for visibility, it actually "projects" triangles onto the screen; in other words, we go from a 3D representation of the triangle to a 2D representation, using perspective projection. This can easily be done by projecting the vertices making up the triangle onto the screen (using perspective projection, as we just explained). The next step in the algorithm is to use some technique to fill up all the pixels of the image that are covered by that 2D triangle. These two steps are illustrated in figure 2. From a technical point of view, they are very simple to perform. The projection step only requires a perspective divide and a remapping of the resulting coordinates from image space to raster space, a process we already covered in the previous lessons. Finding out which pixels in the image the resulting triangles cover is also very simple and will be described later.【the second way is rasterization, which works in the opposite direction: first use perspective projection to project each triangle onto the screen, i.e. represent it in 2D rather than 3D; then fill in the pixels that the projected triangle covers, see figure 2】



What does the algorithm look like compared to the ray tracing approach? First, note that rather than iterating over all the pixels in the image first, in rasterization the outer loop iterates over all the triangles in the scene. Then, in the inner loop, we iterate over all pixels in the image and find out whether the current pixel is "contained" within the "projected image" of the current triangle (figure 2). In other words, the inner and outer loops of the two algorithms are swapped.【this algorithm also needs two loops, but first over every triangle, then over every pixel】
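The swapped loops can be sketched in the same style; `project` and `pixel_in_triangle` are hypothetical callbacks standing in for the projection and coverage tests described in the text:

```python
def rasterize(width, height, triangles, project, pixel_in_triangle):
    """Rasterization visibility: outer loop over triangles, inner loop
    over pixels — the mirror image of the ray-tracing loops.
    `project(tri)` returns the triangle's 2D projection, and
    `pixel_in_triangle(x, y, projected)` tells whether pixel (x, y)
    is contained within that projected triangle."""
    image = [[None] * width for _ in range(height)]
    for tri in triangles:                  # outer loop: triangles
        projected = project(tri)           # perspective-project once
        for y in range(height):            # inner loop: pixels
            for x in range(width):
                if pixel_in_triangle(x, y, projected):
                    image[y][x] = tri
    return image
```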


This algorithm is object centric, because we actually start from the geometry and walk our way back to the image, as opposed to the approach used in ray tracing, where we started from the image and walked our way back into the scene.【the rasterization-based approach is called object centric】


Both algorithms are simple in principle, but they differ in complexity when it comes to actually implementing them and solving the different problems they involve. In ray tracing, generating the rays is simple, but finding the intersection of a ray with the geometry can turn out to be difficult (depending on the type of geometry you deal with) and is potentially computationally expensive. But let's ignore ray tracing for now. In the rasterization algorithm, we need to project vertices onto the screen, which is simple and fast, and we will see that the second step, which requires finding out whether a pixel is contained within the 2D representation of a triangle, has an equally simple geometric solution. In other words, computing an image using the rasterization approach relies on two very simple and fast techniques (the perspective projection process, and finding out if a pixel lies within a 2D triangle). Rasterization is a good example of an "elegant" algorithm. The techniques it relies on have simple solutions; they are also easy to implement and produce predictable results. For all these reasons, the algorithm is very well suited for the GPU, and is actually the rendering technique applied by GPUs to generate images of 3D objects (it can also easily be run in parallel).【in theory the two approaches have a similar structure, but their implementation complexity differs greatly: intersecting rays with geometry is very expensive, while projection in rasterization is far cheaper and easy to implement — indeed it is implemented directly in GPU hardware】


In summary:

  • Converting geometry to triangles makes the process simpler. If all primitives are converted to the triangle primitive, we can write fast and efficient functions to project triangles onto the screen and check whether pixels lie within these 2D triangles.【everything is handled as triangles】
  • Rasterization is object centric. We project geometry onto the screen and determine its visibility by looping over all pixels in the image.【object centric】
  • It relies mostly on two techniques: projecting vertices onto the screen, and finding out whether a given pixel lies within a 2D triangle.【the two steps】
  • The rendering pipeline run on GPUs is based on the rasterization algorithm.【GPUs use rasterization】


Hopefully, at this point of the lesson, you have understood how the image of a 3D scene (made of triangles) is generated using the rasterization approach. Of course, what we described so far is the simplest form of the algorithm. It can be optimized greatly, and we also haven't yet explained what happens when two triangles projected onto the screen overlap the same pixels in the image. When that happens, how do we decide which of these two (or more) triangles is visible to the camera? We will now answer these two questions.【luckily you now know the basic approach; next we cover what happens when two projected triangles overlap, how visibility is decided, and how to optimize】



Optimizing: 2D Triangles Bounding Box


The problem with the naive implementation of the rasterization algorithm we gave so far is that it requires, in the inner loop, iterating over all pixels in the image, even though only a small number of these pixels may be contained within the triangle (as shown in figure 3). Of course, this depends on the size of the triangle on the screen. But considering we are not interested in rendering one triangle but objects made up of potentially a few hundred to a few million triangles, it is unlikely that, in a typical production example, these triangles will be very large in the image.【the basic method iterates over every pixel in the image for each triangle, even when the triangle covers only a handful of pixels】


There are different ways of minimising the number of tested pixels, but the most common one consists of computing the 2D bounding box of the projected triangle, and iterating over the pixels contained in that 2D bounding box rather than the pixels of the entire image. While some of these pixels might still lie outside the triangle, on average this already considerably improves the performance of the algorithm. This idea is illustrated in figure 3.【the most common optimization is to compute the projected triangle's 2D bounding box and only iterate over the pixels inside it】




Computing the 2D bounding box of a triangle is actually very simple. We just need to find the minimum and maximum x- and y-coordinates of the three vertices making up the triangle in raster space. This is illustrated in the following pseudo code:【the 2D bounding box only needs the min and max of the three vertices' x and y values — very simple】
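The pseudo code itself is not reproduced here; a minimal Python sketch of that computation:

```python
def bounding_box(v0, v1, v2):
    """2D bounding box of a projected triangle in raster space.
    Each vertex is an (x, y) pair; we just take the min and max of
    the three x- and y-coordinates."""
    xmin = min(v0[0], v1[0], v2[0])
    ymin = min(v0[1], v1[1], v2[1])
    xmax = max(v0[0], v1[0], v2[0])
    ymax = max(v0[1], v1[1], v2[1])
    return xmin, ymin, xmax, ymax
```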



Once we have calculated the 2D bounding box of the triangle (in raster space), we just need to loop over the pixels defined by that box. But you need to be very careful about the way you convert the raster coordinates, which in our code are defined as floats rather than integers. First, note that one or two vertices may be projected outside the boundaries of the canvas, so their raster coordinates may be lower than 0 or greater than the image size. We solve this problem by clamping the pixel coordinates to the range [0, Image Width – 1] for the x-coordinate, and [0, Image Height – 1] for the y-coordinate. Furthermore, we need to round off the minimum and maximum coordinates of the bounding box to the nearest integer values (note that this works fine when we iterate over the pixels in the loop, because we initialize the loop variable to xmin or ymin and break from the loop when the variable x or y is greater than xmax or ymax). All these tests need to be applied before using the final fixed-point (or integer) bounding box coordinates in the loop. Here is the pseudo-code:【with the 2D bounding box, iterate over the pixels inside it; but be careful with the raster coordinates: this implementation uses floats rather than ints, x ranges from 0 to Width-1 and y from 0 to Height-1, and the float bounds must be rounded and clamped to integer pixel coordinates】
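Since the pseudo-code listing is missing here, this sketch shows the rounding and clamping described above (the function name is illustrative):

```python
import math

def clipped_pixel_bounds(xmin, ymin, xmax, ymax, width, height):
    """Round the float bounding box outward to integers and clamp it
    to the image, so the pixel loop stays inside the frame buffer.
    If the triangle is entirely off-screen the returned range is
    empty (x0 > x1 or y0 > y1) and the loop simply does nothing."""
    x0 = max(0, math.floor(xmin))
    y0 = max(0, math.floor(ymin))
    x1 = min(width - 1, math.ceil(xmax))
    y1 = min(height - 1, math.ceil(ymax))
    return x0, y0, x1, y1
```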




The Image or Frame-Buffer


Our goal is to produce an image of the scene. We have two ways of visualizing the result of the program: either displaying the rendered image directly to the screen, or saving the image to disk and using a program such as Photoshop to preview it later. In both cases though, we need somewhere to store the image while it is being rendered, and for that purpose we use what we call in CG an image buffer or frame-buffer.

It is nothing else than a two-dimensional array of colors that has the size of the image. Before the rendering process starts, the frame-buffer is created and all pixels are set to black. At render time, when the triangles are rasterized, if a given pixel overlaps a given triangle, we store the color of that triangle in the frame-buffer at that pixel location. When all triangles have been rasterized, the frame-buffer contains the image of the scene. All that is left to do then is either display the content of the buffer on the screen, or save it to a file. In this lesson, we will choose the latter option.【the result can be displayed on screen or saved to disk, but either way it is first cached in the frame buffer: at initialization the buffer is created and filled with black; during rasterization, if a pixel overlaps a triangle, the triangle's color is stored at that pixel; once every object has been rasterized, the frame buffer holds the scene image, ready to save or display】



When Two Triangles Overlap the Same Pixel: The Depth Buffer (or Z-Buffer)


Keep in mind that the goal of the rasterization algorithm is to solve the visibility problem. In order to display 3D objects, it is necessary to determine which surfaces are visible. In the early days of computer graphics, two methods were used to solve the "hidden surface" problem (the other name for the visibility problem): the Newell algorithm and the z-buffer. We only mention the Newell algorithm for historical reasons; we won't study it in this lesson because it is not used anymore. We will only study the z-buffer method, which is the one used by GPUs.【rasterization exists to solve visibility; today the z-buffer is essentially the only method used for the hidden surface problem】


There is one last thing we need to do in order to get a basic rasterizer working. We need to account for the fact that more than one triangle may overlap the same pixel in the image (as shown in figure 5). When this happens, how do we decide which triangle is visible? The solution to this problem is actually very simple. We will use what we call a z-buffer, also called a depth buffer, two terms that you may have heard or read about quite often already. A z-buffer is nothing more than another two-dimensional array with the same dimensions as the image; however, rather than being an array of colors, it is simply an array of floating-point numbers.【when several triangles project onto the same pixel, the z-buffer resolves which one is visible】


Before we start rendering the image, we initialise each pixel in this array to a very large number. When a pixel overlaps a triangle, we also read the value stored in the z-buffer at that pixel location. As you may have guessed, this array is used to store the distance from the camera to the nearest triangle that any pixel in the image overlaps. Since this value is initially set to infinity (or any very large number), the first time we find that a given pixel X overlaps a triangle T1, the distance from the camera to that triangle is necessarily lower than the value stored in the z-buffer. What we do then is replace the value stored for that pixel with the distance to T1. Next, when the same pixel X is tested and we find that it overlaps another triangle T2, we compare the distance from the camera to this new triangle with the distance stored in the z-buffer (which at this point stores the distance to the first triangle T1). If the distance to the second triangle is lower than the distance to the first, then T2 is visible and T1 is hidden by T2. Otherwise, T2 is hidden by T1, and T1 remains visible. In the first case, we update the value in the z-buffer with the distance to T2; in the second case, the z-buffer doesn't need to be updated, since the first triangle T1 is still the closest triangle we have found for that pixel so far. As you can see, the z-buffer is used to store the distance of each pixel to the nearest object in the scene (we don't really use the distance, but we will give the details further in the lesson).【initialize the z-buffer to a very far value; when a triangle is rasterized, each overlapped pixel reads the z-buffer value, which stores the distance from the camera to the nearest surface found at that pixel so far; if the new triangle's value at that pixel is smaller, replace it, otherwise leave it alone】


In figure 5, we can see that the red triangle is behind the green triangle in 3D space. If we were to render the red triangle first and the green triangle second, then for a pixel overlapping both triangles we would store in the z-buffer at that pixel location first a very large number (when the z-buffer is initialized), then the distance to the red triangle, and finally the distance to the green triangle.【as shown in figure 5】


You may wonder how we find the distance from the camera to the triangle. Let's first look at an implementation of this algorithm in pseudo-code, and we will come back to this point later (for now, let's just assume the function pixelContainedIn2DTriangle computes that distance for us):
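The original pseudo-code is not reproduced here; this is a minimal Python sketch of the loop, assuming a hypothetical pixelContainedIn2DTriangle-style callback that returns the depth under a pixel, or None when the pixel does not overlap the triangle. Triangles are represented as dicts with a "color" entry purely for illustration:

```python
import math

def render(width, height, triangles, pixel_contained_in_2d_triangle):
    """Rasterization with a z-buffer, as described above:
    frame buffer initialized to black, z-buffer to a very large value;
    a pixel is overwritten only when the new depth is smaller."""
    frame_buffer = [[(0, 0, 0)] * width for _ in range(height)]  # black
    z_buffer = [[math.inf] * width for _ in range(height)]       # "far away"
    for tri in triangles:
        for y in range(height):
            for x in range(width):
                z = pixel_contained_in_2d_triangle(x, y, tri)
                if z is not None and z < z_buffer[y][x]:
                    z_buffer[y][x] = z                 # new nearest triangle
                    frame_buffer[y][x] = tri["color"]  # keep its color
    return frame_buffer
```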





What’s Next?


Obviously, this is only a very high-level description of the algorithm (figure 6), but it should hopefully already give you an idea of what we will need in the program in order to produce an image.【figure 6 gives the overall structure】



We will need:

  • An image-buffer (a 2D array of colors),
  • A depth-buffer (a 2D array of floats),
  • Triangles (the geometry making up the scene),
  • A function to project vertices of the triangles onto the canvas,
  • A function to rasterize the projected triangles,
  • Some code to save the content of the image buffer to disk.


In the next chapter, we will see how coordinates are converted from camera space to raster space. The method is of course identical to the one we studied and presented in the previous lesson; however, we will present a few more tricks along the way. In chapter three, we will learn how to rasterize triangles. In chapter four, we will study in detail how the z-buffer algorithm works. As usual, we will conclude this lesson with a practical example.【the next chapter covers coordinate conversion】
































Software Occlusion Culling – Intel



The technique divides scene objects into occluders and occludees, and culls occludees based on a depth comparison with the occluders, which are software-rasterized to the depth buffer.【the approach】

The sample code uses frustum culling and is optimized with the Streaming SIMD Extensions (SSE) instruction set and multi-threading to achieve up to 8X performance speedup compared to a non-culled display of the sample scene.【the benefit】




If we can lower the computational cost of occluded objects, we can improve the application's performance with no impact on scene quality.【by not submitting occluded objects to the GPU for rendering, performance improves】

Using the GPU-generated depth buffer on the CPU is one technique for early detection of occluded objects. However, doing occlusion culling in this manner may cause latency and overhead issues, and introduce visual artifacts, since the GPU may be several frames behind the CPU. To avoid these problems, we propose software rasterization of the depth buffer on the CPU.【the GPU-generated depth buffer is one option, but it introduces latency, overhead and artifacts】


In this approach, we use the CPU to rasterize the depth buffer. Then, we use axis-aligned bounding box tests to determine whether an object is occluded. Occluded objects are removed from the rendering pipeline to reduce overhead. We do not cull visible and partially visible objects; we submit them to the GPU for rendering. The sample code associated with the Software Occlusion Culling sample implements this technique. Additionally, in this sample, the software rasterizer is vectorized using SSE and multi-threaded to improve performance.【here the CPU does the work: axis-aligned bounding box tests cull occluded objects, and whatever survives culling is submitted to the GPU for rendering】


Occluders: Objects that are sufficiently large to hide or occlude other objects in the scene.【objects that occlude others】

Occludees: Objects that are hidden, or occluded, by other objects in the scene.【objects occluded by others】


Software Occlusion Culling has two steps: depth buffer rasterization and depth test culling. The next sections provide details about these steps.


Depth Buffer Rasterization


The occluders in the scene are rasterized to the depth buffer on the CPU. Figure 1 shows a screenshot of the Software Occlusion Culling sample. The castle walls and the ground plane are treated as occluders in the scene. The castle walls, along with the attached tiny wooden trim and decorations, are used as occluders to avoid special pre-processing of the art assets. Ideally, it would have been better to hand-pick just the large castle walls as occluders, but we wanted to make sure that the software occlusion culling algorithm worked with a realistic scene without the need for expensive content changes.【the walls occlude most of what lies behind them, which therefore need not be drawn】


Our rasterizer divides the frame buffer into tiles and bins occluder triangles based on their intersection with the tiles. The rasterizer bins triangles into multiple tiles when the triangles cross tile boundaries. When processing a tile, a thread steps through the binned triangles and uses bounding box traversal to rasterize the triangles to the depth buffer.【is this describing deriving the bounding boxes automatically from the triangles?】 The rasterizer checks whether the pixel under consideration is inside the triangle, and if so, it uses barycentric coordinates to interpolate the depth value at the pixel location. The depth buffer rasterization process updates the depth buffer at the pixel position if the newly computed depth is closer to the observer than the depth already stored at the same pixel location.


Once the depth buffer has been rasterized on the CPU, all the occludees in the scene have to be depth tested against the depth buffer to determine which occludees are visible and which can be culled.【once rasterization is done, test whether each occludee is hidden】


Depth Test Culling


We use object space axis-aligned bounding boxes (AABBs) to depth test the occludees in the scene against the CPU-generated depth buffer. The depth test algorithm treats all objects in the scene, including the occluders (castle walls and ground plane), as occludees. Using AABBs makes the depth tests more conservative. If the AABB is occluded, then the object contained inside it is also occluded and can be culled. If the AABB is visible, then the assumption is that the object contained inside might also be visible. However, as the bounding box is conservative, this assumption may not always be true, and we may have some false positives.【test each object through its AABB】
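A sketch of the conservative occludee test, assuming the projected AABB has already been reduced to a screen rectangle plus its nearest corner depth (names are hypothetical, not the sample's API; smaller depth = closer):

```python
def aabb_occluded(depth, w, h, rect, aabb_min_z):
    """Conservative test: rect = (x0, y0, x1, y1) is the screen-space
    bounds of the projected AABB corners; aabb_min_z is the nearest
    (smallest) depth of any corner. The box is occluded only if every
    depth-buffer texel it touches is closer than its nearest point."""
    x0, y0, x1, y1 = rect
    for y in range(max(y0, 0), min(y1 + 1, h)):
        for x in range(max(x0, 0), min(x1 + 1, w)):
            if depth[y * w + x] >= aabb_min_z:
                return False  # some texel is at least as far: may be visible
    return True
```

Because the rectangle and the nearest depth both over-approximate the object, the test can only err toward "visible", matching the false positives mentioned above.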


To determine if an object is clipped by the near clip plane, we could use the homogeneous coordinate w of the vertices that make up the bounding box. The near clip plane is set to 1.0 in the sample. Objects that are in front of the camera have w > 1.0. So if any of the vertices that make up the bounding box has w < 1.0, then the object is being clipped by the near clip plane.【near-plane clipping】


However, we store 1/w instead of the homogeneous coordinate w to avoid dividing by w at multiple locations in the rasterizer.


So the bounding box of an occludee is being clipped if 1/w > 1.0 or 1/w < 0.0 for any vertex of the bounding box, in which case we trivially accept the occludee as visible.【storing 1/w instead of w makes this test easier to handle】
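The trivial-accept rule can be written down directly; a minimal sketch, assuming `inv_ws` holds 1/w for the eight box corners:

```python
def near_clip_trivial_accept(inv_ws):
    """inv_ws: 1/w for each of the 8 bounding-box vertices after the
    view-projection transform (near plane at w == 1). A vertex in front
    of the camera has w > 1, i.e. 0 < 1/w < 1. Any 1/w above 1.0 or
    below 0.0 means the box crosses the near plane, so the occludee is
    trivially accepted as visible."""
    return any(iw > 1.0 or iw < 0.0 for iw in inv_ws)
```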





Found the paper: Masked Software Occlusion Culling


Parallel Graphics in Frostbite –
Current & Future


Similarly, Battlefield 3 also builds its occluders by hand.


Umbra occlusion culling also works from manually prepared models.

Occlusion Culling in Alan Wake

For solving visibility from the camera's point of view, Alan Wake used a hierarchical hardware-based occlusion query algorithm [Bittner et al. 2004], [Guthe et al. 2006]. The advantages of using a hardware-based culling algorithm outweighed the content pipeline issues associated with using a typical software occlusion system based on either precomputed potentially visible sets (PVS) or artist-generated occlusion geometry and a software rasterizer [Aila and Miettinen 2004]. Occlusion culling minimized the overdraw, and the bottleneck shifted to shadow map generation.


GPU-Driven Rendering Pipelines



Killzone's software rasterization approach

【the approach here uses hand-resized physics meshes as occluders】

We didn’t want artists to hand-make occluders for the entire level, so we looked for an automatic solution. We experimented with using visual meshes, but good polygon reduction proved difficult to get right. The physics mesh is a much better choice: it’s available for each mesh in the game, and it’s clean and has a sufficiently low polygon count. Unfortunately the physics meshes can be slightly larger than the visual meshes, causing objects to disappear. The worst offenders have to be fixed manually.

Here’s an example of occluders generated automatically from the physics mesh. You can notice there’s some unnecessary detail, but in general the quality is pretty good.

Even using the physics meshes there was too much occluder geometry.

We needed to reject occluders that were unlikely to contribute much to the occlusion buffer. Our simple heuristic rejects all small objects, or objects whose names suggest that they are not good occluders. We also reject meshes whose surface area suggests that they are thin or have too many holes. Artists can of course step in and override the heuristic or provide their own custom occluders for difficult cases and optimization.

【there are too many physics meshes, so we filter: oddly shaped, holed, or small ones are all rejected】



Solving Visibility and Streaming in The Witcher 3: Wild Hunt with Umbra 3

Only LOD0 meshes are used as occluders.





Umbra Analysis

Umbra 3, which is also what Unity uses, performs occlusion culling with a PVS-like, region-based precomputation approach.

Umbra pre-processes the scene offline, voxelizes it, then structures it into portals and cells to create a spatial database used at runtime. Queries can then be performed in real-time and involve the generation of a low-resolution hierarchical depth buffer of the scene, performed by a highly optimized software rasterizer.


It makes sense on a lot of levels, and recently Intel also published another example of a purely CPU-based solution — similar to the Umbra technique, occluders are software-rasterized to create a hierarchical depth buffer.【Intel's implementation can be seen as similar to Umbra's, using a hierarchical depth buffer】

Umbra’s culling system precomputes a simplified visibility database representing a 3D world. A fast runtime library then traverses the database in order to solve visibility for a view.【Umbra precomputes a visibility database, which the runtime then uses for occlusion culling】

We call these databases tomes. They consist of automatically placed portals and cells. These are common concepts in the world of visibility: a cell is a volume in 3D space within which visibility is similar. The world is divided into cells. Portals are links between cells: openings in space, with a shape, leading to another cell. The difficult problem is finding good portals and cells in a fast and practical fashion.【the precomputation performs fast, automatic voxelization of the scene】

Umbra’s precomputation step, the Umbra optimizer, solves this problem. It starts with voxelization – discretization of 3D space. The algorithm begins by generating small “raw” cells. These are then improved by combining them into bigger final cells, so that the loss in occlusion is minimal.【after subdividing into voxels, they are regrouped into the final cells】

Each raw cell is a continuous set of empty voxels not bigger than some user-defined size. Portals are generated on open surfaces between these cells. The user controls database accuracy by supplying the size for these raw cells. This “raw cell size” together with the voxel size are the two most important user-controlled parameters in Umbra. They are publicly given more understandable names: “smallest occluder” (raw cell size) and “smallest hole” (voxel size). Controlling these parameters allows scaling the data for any environment and requirements. Several versions of the data are generated for different levels of detail – occlusion near the camera is more important than farther away.【a cell is a block of voxels up to a user-controlled maximum size, so take care when setting these parameters】

Umbra’s runtime walks these cells linked by portals to figure out the visible volume. Visible objects are collected along the path. Compared to other kinds of portal culling systems, Umbra generates a lot of portals. The runtime thus presents quite an engineering challenge: it needs to be fast and optimized on each platform to handle the high portal count. A common method for portal culling is narrowing the view frustum to fit the encountered portals. Umbra doesn’t do this – Umbra rasterizes the portals instead. In the end, a low-resolution depth buffer is generated.【the runtime walks the links between cells to collect visible objects; the usual approach is to narrow the visible region at each portal to prune work, whereas Umbra instead rasterizes the portals and culls at a coarser granularity】
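As a toy illustration of walking cells through portals and collecting objects (not Umbra's implementation; `portal_open` is a hypothetical stand-in for the real per-portal test, whether frustum narrowing or portal rasterization):

```python
from collections import deque

def collect_visible(cells, portals, start, portal_open):
    """cells: {cell_id: [object, ...]}; portals: {cell_id: [(portal_id,
    neighbor_cell), ...]}. Walks every cell reachable through open
    portals from the camera's cell and gathers the objects it passes."""
    visible, seen, queue = [], {start}, deque([start])
    while queue:
        cell = queue.popleft()
        visible.extend(cells.get(cell, []))       # objects along the path
        for pid, nxt in portals.get(cell, []):
            if nxt not in seen and portal_open(pid):
                seen.add(nxt)
                queue.append(nxt)
    return visible
```

In the real system the "open" decision is what costs: each portal is rasterized once into the low-resolution depth buffer rather than tested recursively.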

Some visibility approaches are based on hand-made occluder models – a representation of what’s solid in the world. You could see Umbra as a method for modelling empty space instead. This approach has the benefit that all inaccuracy can always be countered by growing the empty space (i.e. the portals). Sources of inaccuracy include voxelization and the low rasterization resolution. Conversely, this allows us to make the runtime faster by using lower database and runtime resolutions, while always retaining error-free output.【the usual approach is hand-made occluder models; Umbra instead models the empty space, essentially marking which cells can be seen – whatever is unreachable can be ignored – so a coarser voxel size only costs accuracy and never produces wrong results】

My previous blog post discussed the global nature of visibility. Umbra is divided into a local offline operation and a global runtime one. Unlike PVS systems, this allows fast reuse of partial results from previous precomputations – and thus fast local updates to the database.【recalling the first post: Umbra splits the work into a local offline part and a global runtime part, which allows precomputed partial results to be reused】

There are also benefits in having a separate set of static data on which the runtime algorithm operates. It makes the system independent from anything else: it’s parallelizable and easier to integrate. An application can begin to execute Umbra’s runtime as soon as the camera’s position is known for a frame.【another benefit is that the data is independent, so the work parallelizes well】


There’s an early and simple solution to the visibility problem aptly called the painter’s algorithm. Like an old renaissance master, the painter’s algorithm has you drawing everything in back-to-front order. While the result looks correct, it causes pentimenti – overdraw that wastes computing cycles. Computer scientists knew how to solve the visibility problem in the 70s like the artists did in the 1700s. The challenge is in solving it in a fast and practical fashion. The art of doing so is called occlusion culling, and we’ve thought about it quite a bit here at Umbra.【with the painter's algorithm, much of what is drawn ends up invisible: overdraw】

Modern rendering uses z-buffers – sort of the same thing as the painter’s algorithm but for each point instead of each surface. While that does solve the problem, it does so very late in the rendering pipeline. By the time we get around to rejecting anything, we’ve wasted a lot of time processing things hiding behind walls. A good visibility solution should know what the hidden objects are very early on, so that we can avoid even sending them for rendering. Avoiding spending time on hidden objects allows more budget for visible ones.【modern rendering uses a z-buffer for depth, which rejects hidden pixels very late; knowing early which objects will not be seen avoids a lot of work and improves rendering efficiency】

Of course you could solve visibility by doing manual work. And indeed, many occlusion culling systems are based on hand-tuning portals or occluders. Less manual work means a better workflow for anyone creating content for the system. The bad end of the automation spectrum would require tagging all visibility manually, a huge task that quickly becomes impractical. An ideal solution would be at the other end – requiring no manual work and placing no restrictions on content.【manual occlusion culling works, but we want this step automated】

It’s worth mentioning a couple of obvious things. A visibility solution should also produce correct results. No visible object should be reported hidden. To guarantee a visually correct result, any inaccuracy should be on the conservative side: reporting hidden objects visible. It should also be fast: occlusion culling that is too slow would negate its benefits.【however much we simplify, the final image must stay correct; it is acceptable to leave some hidden objects unculled】

One big practical issue with visibility is that it is a global problem that involves everything – a small change in visibility can affect the other side of a 3D world. This must somehow be reflected in the computation we perform to solve it. Reducing it to a local problem would allow us to divide and conquer: easy parallelization and reuse of partial results. A generic visibility solution doesn’t really lend itself to this. This also means the amount of work performed by the algorithm is at least proportional to the visible volume. We can, however, compute some parts beforehand offline and some parts at runtime. This is a choice we do get to make, and it allows us to find some extent of localization.【turning the global problem into a local one enables divide-and-conquer parallelism】


Some more background: the Umbra 3 slides

This is the high level overview of the workflow.

From the original geometry we want to create a data structure that can be used for efficient visibility queries. And the whole process should be both fast and automatic (and allow at least some amount of dynamic changes to the data).【precompute the data structure used later for visibility queries, while supporting some dynamic changes】

In runtime visibility query the basic operation is figuring out which objects are visible from specific view points.

These are the requirements we had; we need to create a data structure and algorithm within these bounds.【the data structure and algorithm must satisfy the following】

  • For any input 3D scene, memory usage and processing time of the visibility system must be bounded and independent of the complexity of the input.【bounded cost】
  • The whole process should be automatic.【fully automatic】
  • It should allow fast content iteration.【fast iteration】
  • And it should set no limits on the size of the game worlds.【arbitrarily large worlds】

So what is visibility?

It’s the set of all the lines in space which do not intersect occluding geometry.


Creating a data structure for an accurate visibility representation from a polygon soup is theoretically possible, but in practice the data structure would be so large and so slow to compute that it’s unusable.【an exact visibility data structure for a large scene would be far too big to be practical】

Instead we need to use conservative visibility, where we have bounded time and space and within these bounds give the best results we can.

Visibility is ultimately reasoning about space. Interesting thing here to note is that once we have a topological data structure like this it can be used for other things as well, not just visibility.



Here you’re looking at a top-down view of an example scene.【top-down view of the scene】

The requirement was that there should be no manual markup or other requirements on the input geometry. So what we take as input is all the level geometry as it is.

So we really don’t have any other information besides the long list of polygons, which are possibly grouped into objects.

Doing geometric operations with polygons directly has all kinds of difficulties related to floating point accuracy. Also, almost all real life game levels contain some small modeling errors such as t-vertices and cracks between objects. We need to be able to work with this kind of input as well.


The approach we chose is to create a cell-and-portal graph by grouping the voxels together based on proximity and connectivity. Cells are created from groups of voxels. Portals are then created on the boundaries of these groups. We chose to create portals because in the past they have been proven to be an efficient way to represent visibility, and we solve the issues of manually placed portals by generating them automatically.


In contrast to manually placed portals, we might generate thousands of portals, which allows us to have accurate visibility in outdoor spaces as well.

By controlling the number of output cells and portals we can choose the output resolution of the visibility data so that it meets the memory and performance requirements.


In addition to the cell-and-portal graph, we need to be able to figure out in which cell the viewpoint is located when making a visibility query.【when making a visibility query we also need to know which cell the view point is in】

We create a KD-tree from the voxels, which can be packed very efficiently with only a few bits per node. The leaf nodes of the tree contain indices to cells in the graph.【build a KD-tree from the voxels】
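A sketch of the lookup, with a hypothetical flattened node layout; Umbra's real tree is packed into a few bits per node, but the traversal is the same idea:

```python
def find_cell(nodes, point):
    """nodes: KD-tree flattened into a list; inner nodes are
    ("split", axis, value, left_index, right_index), leaves are
    ("leaf", cell_index). Returns the index of the cell containing
    `point` by descending one split per level."""
    i = 0
    while True:
        node = nodes[i]
        if node[0] == "leaf":
            return node[1]
        _, axis, value, left, right = node
        i = left if point[axis] < value else right
```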

The final data structure contains the cell-and-portal graph, the view tree, and a list of static objects contained in each cell.【the final data structure】

We call the output data structure a tome – a “tome of visibility”. It contains all the information about the topological structure needed for visibility determination in a 3D scene.






The basic operation is the visibility query, which uses the tome data to figure out what is visible from a specific view point inside the scene.【use the current view position in the scene to find the visible objects】

The goal is to create a depth buffer, or occlusion buffer, from the tome, which can be then used to very efficiently test the visibility of objects.

The first step is to figure out which cell the view point is located in. This is done by traversing the view tree based on the query origin. This is a simple tree traversal and a fast operation.【step one: find which cell the view point is in】

There are several options on how we could use the portal data. We could do a traditional recursive portal traversal, clipping the view frustum as we go through each portal. Or we could do ray-tracing or cone-tracing in the graph.

The approach we chose is to rasterize the portals using a custom software rasterizer optimized for this purpose. With the rasterizer we need to touch each portal only once, as opposed to recursive portal traversal, which can suffer from exponential blowup if there are a lot of portal intersections in screen space.

(We could also serve other kinds of queries, such as connectivity.)【visibility is resolved with a software rasterizer; the benefit is that each portal only needs to be processed once】


Another really useful property of the rasterizer is that it produces a depth buffer as output, which is an almost optimal data structure for doing further visibility tests on the query output.

Also with rasterization we can choose the output resolution based on the platform and accuracy requirements.

Since we’re rasterizing portals instead of occluders, it’s trivial to implement conservative rasterization, which is a requirement for getting correct results in lower resolutions.

【rasterization outputs depth information that then drives the visibility tests; rasterizing portals instead of occluders keeps the rasterization granularity controllable】

After we have generated the depth buffer, testing objects for visibility is a quite straightforward bounding-box-versus-depth-buffer visibility test.

If we keep the depth buffer resolution small enough to fit into the CPU cache, this test is super-super fast, and also trivial to parallelize.

【compare bounding boxes against the depth buffer to decide visibility】





SurRender Umbra: A Visibility Determination Framework for Dynamic Environments

Efficient Algorithms for Occlusion Culling and Shadows (Jan 2005)

dPVS: An Occlusion Culling System for Massive Dynamic Environments (2004)















Occlusion Culling – oxel



  1. Hierarchical Z-Buffer Occlusion Culling


Hierarchical Z-Buffer Culling Steps

  1. Bake step – Have your artists prepare occlusion geometry for things in the world that make sense as occluders: buildings, walls, etc. They should all be super cheap to render – boxes/planes. I actually ran across a paper, Geometric Simplification For Efficient Occlusion Culling In Urban Scenes, that sounded like a neat way of automating the process.【prepare the occluder geometry】
  2. CPU – Take all the occlusion meshes and frustum cull them.【frustum-cull the occluders on the CPU】
  3. GPU – Render the remaining occluders to a ‘depth buffer’. The depth buffer should not be full sized; in my code I’m using 512×256. There is a Frostbite paper that mentions using a 256×114-ish sized buffer for a similar solution to do occlusion culling. The ‘depth buffer’ should just be mip 0 in a full mip chain of render targets (not the actual depth buffer).【render the depth buffer; it need not be full size】
  4. GPU – Now downsample the RT containing depth information, filling out the entire mip chain. You’ll do this by rendering a fullscreen effect with a pixel shader taking the last level of the mip chain and downsampling it into the next, preserving the highest depth value in a sample group of 4 pixels. For DX11, you can just constrain the shader resource view so that you can both read and render from the same mip chain. For DX9 you’ll have to use StretchRect to copy from a second mip chain, since you can’t sample and render to the same mip chain in DX9. In my code I actually found a more optimized solution: by ping-ponging between 2 mip chains, one containing even and the other odd levels, and a single branch in your shader code, you can get around the overhead of having to do the StretchRect and just sample from a different mip chain based on the even/odd mip level you need.【downsample the RT, propagating the depth information through the whole mip chain】
  5. CPU – Gather all the bounding spheres for everything in your level that could possibly be visible.【gather all the bounds】
  6. GPU – DX11: send the list of bounds to a compute shader, which computes the screen-space width of each sphere, then uses the width to compute the mip level to sample from the HiZ map generated in step 4, such that the sphere covers no more than 2 pixels' width. So large objects in screen space will sample from very high levels in the mip chain, since they require a coarse view of the world, whereas small objects in screen space will sample from very low levels in the mip chain. In DX9 the process is basically the same; the only difference is that you’ll render a point list of vertices that, instead of a Float3 position, are Float4 bounds (xyz = position, w = radius). You’ll also send down a stream of texcoords that represent the x/y pixel values of where to encode the results of the occlusion test for that bound. Instead of a compute shader you’ll process the vertices using a vertex shader; you’ll also need to use the pixel location provided in the texcoord stream to make sure the results of the test are written out to that point in a render target, and in a pixel shader you’ll need to do the sampling to test whether it’s visible, and output a color like white for culled, black for visible.【send all the bounds, together with the HiZ map from step 4, to a compute shader that decides occlusion】
  7. CPU – Try to do some work on the CPU after the occluder rendering and culling process is kicked off; for me the entire process took about 0.74 ms of GPU time on a Radeon 5450 with 900 bounds. The overhead of generating the HiZ mip chain and dispatching the culling process is the real bottleneck, though; there’s little difference between 900 bounds and 10,000 bounds.【overlap CPU work with the occluder rendering and culling】
  8. CPU – Read back the results. In DX11 you’re just reading back a buffer output by a compute shader. For DX9 you’ll have to copy the render target rendered to in step 6 containing the checker pattern of black and white pixels and then iterate over the pixels on the CPU to know what is visible and what is hidden.【read back the results】


Hierarchical Z-Buffer Downsampling Code
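A CPU-side Python sketch of the downsampling described in step 4; the article's actual implementation is a pixel shader, so this is illustrative only (larger depth = farther, and each coarser texel keeps the farthest depth of its 2×2 footprint to stay conservative):

```python
def build_hiz(depth, w, h):
    """Build a max-depth mip chain from mip 0 (a w*h row-major list of
    floats). Assumes power-of-two dimensions for brevity; edges are
    clamped when a dimension collapses to 1."""
    mips = [depth]
    while w > 1 or h > 1:
        nw, nh = max(w // 2, 1), max(h // 2, 1)
        prev, cur = mips[-1], [0.0] * (nw * nh)
        for y in range(nh):
            for x in range(nw):
                # Gather the 2x2 block (clamped at edges), keep the max.
                samples = [prev[min(2 * y + dy, h - 1) * w
                                + min(2 * x + dx, w - 1)]
                           for dy in (0, 1) for dx in (0, 1)]
                cur[y * nw + x] = max(samples)
        mips.append(cur)
        w, h = nw, nh
    return mips
```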



Hierarchical Z-Buffer Culling Code:
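A CPU-side sketch of the culling test from step 6: pick the mip level where the sphere's screen-space size shrinks to about two texels, then compare depths (larger depth = farther here; the level mapping and names are assumptions, not the article's shader):

```python
import math

def sphere_visible(mips, base_w, base_h, cx, cy, radius_px, sphere_depth):
    """mips: max-depth chain with mip 0 sized base_w x base_h (powers of
    two). Sample the level where the sphere's diameter covers <= ~2
    texels at the sphere's center, and compare against its depth."""
    # Mip level where the diameter 2*radius_px shrinks to <= 2 texels.
    level = max(0, math.ceil(math.log2(max(radius_px, 1.0))))
    level = min(level, len(mips) - 1)
    w = max(base_w >> level, 1)
    h = max(base_h >> level, 1)
    x = min(int(cx) >> level, w - 1)
    y = min(int(cy) >> level, h - 1)
    # Visible if the sphere is not behind the farthest occluder depth
    # stored at this texel.
    return sphere_depth <= mips[level][y * w + x]
```

A production version would sample the texels the sphere actually touches rather than a single center texel; one sample keeps the sketch short.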












  2. Hierarchical Z-Buffer Occlusion Culling – Shadows


The goal is to cull shadows using the Z-buffer; see the paper CC Shadow Volumes.



  1. Create a HZB from the point of view of the player.【build an HZB from the player's viewpoint】
  2. Create a HZB from the point of view of the shadow-casting light.【build an HZB from the light's viewpoint】
  3. Cull the shadow-casting objects using the light's HZB. If an object is culled, mark it as culled.【cull the shadow casters】
  4. If the object isn't culled, we need to compute a rough approximation of the extents of its shadow volume. You can think of this volume as being the minimum depth points on your bounding box (A-D) and the maximum depth sampled from where those points project onto the light's HZB (E-H) (Figure 1). Now that you have a bounding volume for the shadow, you can transform that volume into the camera space of the player and test it against the player's HZB, just as if the extents were any other object you were attempting to cull with the HZB (Figure 2). If you can't see the volume, you can write out that the caster is culled; otherwise write it as visible.【only if the object is not culled do we need to compute the shadow volume approximation】



  3. Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes


Characteristics of a good occlusion volume:

  1. Conservativeness – Doesn’t extend beyond the surface of the mesh
  2. Simplicity – The occlusion volume is made of very few triangles or is fast to render
  3. Volume Conservation – Closely matches the original mesh’s volume
  4. Dynamic – Some games have large movable occluders or destroyable walls


Normal methods of simplifying a mesh, such as typical triangle simplification, cause problems with both conservativeness and volume conservation. In some cases you can use the physics collision mesh if available. One thing to be aware of is that when the physics volume is larger than the visual mesh, it can cause false occlusions leading to things popping into view. Also, not every mesh needs a physics volume, nor are physics volumes always meshes. Will Vale's talk from GDC 2011 covered using physics collision meshes pretty thoroughly for HZB-OC; you should check it out.【using the collision mesh directly can cause errors, because collision meshes are generally larger than the corresponding visual mesh】




  1. Find all the voxels completely inside a mesh


The voxelization algorithm I ended up using comes from the paper Fast Parallel Surface and Solid Voxelization on GPUs [Schwarz 2010]. I used section 3.1 for surface voxelization and section 4.1 for solid voxelization. Both are required because a voxel is considered part of the solid volume if the center of the voxel is inside the mesh. We want the entire voxel to be inside the mesh, though, so you have to remove the excess voxels from the solid set using the surface set.【source of the method】



My first implementation was in C++ and entirely on the CPU, which took about 20 seconds to run on a mesh with ~10,000 triangles and a 50³ voxel volume. In the version I'm presenting here I moved to C#; when I did that, the time went up to about 60 seconds. So I ended up porting a similar version of the CPU implementation to the GPU using OpenCL (Cloo); on the same data it ran in about 6 seconds on my Nvidia 540M, with no attempt to optimize.【CPU vs. GPU performance difference】


One unfortunate limitation of Schwarz's solid voxelization algorithm is that it requires watertight meshes. The reason for this is that you're shooting a ray down a column of voxels until you intersect with a triangle. You then XOR the bit for each voxel, signaling that this column of voxels is inside or outside. Voxels that are inside a mesh will be XORed an odd number of times, whereas outside voxels will be XORed an even number of times. So if there is a hole in the mesh, you'd leave an incomplete trail of XORs in that column, leaving every voxel from the last triangle to the bounding box marked as 'inside'.【a limitation of the algorithm】
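The XOR parity rule for a single voxel column can be sketched as follows (illustrative Python, not Schwarz's GPU code); an unmatched crossing from a hole flips every voxel below it, wrongly marking them as inside:

```python
def solid_column(hits, depth_count):
    """Parity (XOR) fill of one voxel column: `hits` are the voxel
    indices where the ray down the column crosses a triangle. Each
    crossing flips the inside/outside state of every voxel from that
    index to the bounding box, so voxels flipped an odd number of
    times end up inside. A non-watertight mesh leaves an unmatched
    crossing, and everything below it stays marked 'inside'."""
    inside = [False] * depth_count
    for h in hits:
        for i in range(h, depth_count):
            inside[i] = not inside[i]
    return inside
```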


Two papers I’ve run across that might contain solutions to this problem are An Automatic Hole-Filling Algorithm for Polygon Meshes and Complete Polygonal Scene Voxelization.【possible solutions to the problem】


  2. Find the voxel at the densest point in the volume



  3. Expand a box from this point until all sides collide with the mesh surface or another box


Once you've found the densest voxel, you create a 1×1×1-voxel box at that location. You then proceed to iteratively expand each side of the box in voxel space until you can't expand any side without entering an empty voxel or another box that has already been placed.【starting from the densest voxel, grow the largest box possible】
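A sketch of the greedy expansion in Python; illustrative only, and collisions with previously placed boxes can be handled by removing their voxels from the `solid` set:

```python
def _slab(lo, hi, axis, coord):
    """Yield the voxels of the box face that would be added by moving
    one side of the box [lo, hi] to `coord` along `axis`."""
    ranges = [range(lo[i], hi[i] + 1) for i in range(3)]
    ranges[axis] = [coord]
    for x in ranges[0]:
        for y in ranges[1]:
            for z in ranges[2]:
                yield (x, y, z)

def expand_box(solid, nx, ny, nz, seed):
    """Grow an axis-aligned box (in voxel coordinates) from `seed` one
    slab at a time; a side stops growing when the next slab would leave
    `solid` (the set of (x, y, z) voxels fully inside the mesh).
    Returns the inclusive (lo, hi) corners."""
    lo, hi = list(seed), list(seed)
    grew = True
    while grew:
        grew = False
        for axis in range(3):
            for sign, bound in ((-1, lo), (1, hi)):
                cand = bound[axis] + sign
                limit = -1 if sign < 0 else (nx, ny, nz)[axis]
                if cand == limit:
                    continue  # hit the edge of the voxel grid
                # Every voxel in the new slab must still be solid.
                if all(v in solid for v in _slab(lo, hi, axis, cand)):
                    bound[axis] = cand
                    grew = True
    return tuple(lo), tuple(hi)
```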


  4. Repeat 2-3 until you’ve consumed X% of the total volume



  5. Filter small or useless boxes (Unimplemented)



  6. Use a Constructive Solid Geometry (CSG) algorithm to merge the boxes you create




  4. Oxel: A Tool for Occluder Generation







































Android Terrain Rendering



To my surprise, I found out that the GPU (PowerVR SGX 540) in my venerable Nexus S (2010) supports vertex texture fetch (VTF) – that is, accessing texture pixels in the vertex shader, a very useful feature for terrain rendering. About a year ago, when I started investigating terrain rendering on Android devices, I did some searching for VTF support on Android devices and figured out it wasn't there yet (similar to the situation years ago when desktop OpenGL 2.0 was released with support for texture sampling in GLSL vertex shaders but most GL implementations just reported GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS as zero). Now I don't know how I missed it on my own phone; maybe there was some Android update with updated GPU drivers during the last year? I have no idea how many other devices support it now. Hopefully the newest ones with OpenGL ES 3 all support it. I wouldn't be surprised if among GLES 2 devices only PowerVR + Android 4+ combinations supported it.

【OpenGL ES 3.0 fully supports vertex texture fetch (VTF)】




Anyway, let’s focus on terrain rendering – rough outline:


  • Put the entire height-map into a texture.【prepare the height-map texture】
  • Have a small 2D grid (say 16×16 or 32×32) geometry ready for rendering terrain tiles.【prepare the 2D grid】
  • Build a quad tree over the terrain. The root node covers the entire terrain and each child then covers 1/4th of its parent's area.【build the terrain quadtree】
  • Now we can start rendering; do this every frame:【render each frame】
    • Traverse the quadtree starting from the root.【traverse the quadtree】
    • For each child node, test if the geometry grid provides sufficient detail for rendering the area covered by this node:【for each node, check whether the grid gives enough detail for its area】
      • YES it does: mark this node for rendering and end traversal of this subtree.【mark the node for rendering and stop descending】
      • NO it does not: continue traversal and test the children of this node (unless we're at a leaf already).【keep traversing the children】
    • Now take this list of marked nodes and render them. The same 2D grid is used to render each tile: it's scaled according to the tile's covered area and its vertices are displaced by height values read from the texture.【collect all marked nodes and render them】
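The traversal above can be sketched as follows; the LOD metric and its 0.05 threshold are placeholder assumptions, standing in for a real projected-error test:

```python
def select_tiles(node, camera, grid_res, out):
    """node: (center_x, center_z, half_size, children), children is None
    for leaves. A tile is detailed enough when the world-space size of
    one grid cell, relative to the camera distance, falls below a
    screen-error threshold (both the metric and 0.05 are placeholders)."""
    cx, cz, half, children = node
    dist = max(((cx - camera[0]) ** 2 + (cz - camera[1]) ** 2) ** 0.5, 1e-6)
    detailed_enough = (2.0 * half / grid_res) / dist < 0.05
    if detailed_enough or children is None:
        out.append(node)          # render this tile with the shared grid
    else:
        for child in children:    # otherwise refine into the four children
            select_tiles(child, camera, grid_res, out)
```

Each selected tile is then drawn with the same grid mesh, scaled to the tile's area, with the vertex shader fetching heights from the texture.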


This is basically what I originally wanted for Multilevel Geomipmapping years ago but couldn't do in the end because of the state of VTF support on desktop at that time.【this was the original idea, but it couldn't be implemented back then because of missing VTF support】


So what exactly is the benefit of VTF over, let's say, geomipmapping here?【what does VTF buy us?】


The main benefit is the ability to get the height of the terrain at any position (and multiple times) when processing each tile vertex. In traditional geomipmapping, even if you can move tile vertices around, it's no use since you have only one fixed height value available. With VTF, you can modify the tile position of a vertex as you like and still get the correct height value.【the benefit is easy access to the correct height for any vertex at any position】

This greatly simplifies tasks like connecting neighboring tiles with different levels of detail. No ugly skirts or special stitching strips of geometry are needed, as you can simply move edge points around in the vertex shader. Geomorphing solutions also begin to be usable without much work. And you can display larger terrains as well. With geomipmapping you always have to draw a fixed number of tiles (visible leaves) – a number that goes up fast when you enlarge the terrain. VTF may allow you to draw a fixed number of tiles regardless of actual terrain size (as distant tiles cover a much larger area compared to geomipmap tiles with a fixed area). One more: terrain normals can be calculated inside the shaders from neighboring height values.


Finally, since the heightmap is now a regular texture, you get filtering and access to compressed texture formats to fit more of the terrain into memory.






A few weeks ago I had some sort of an urge to try game and/or graphics development on Android. I've been doing some Android application development lately, so I had all the needed tools set up already. However, what I wanted was some cross-platform engine or framework so that I could develop mostly on desktop. Working with emulators and physical devices is tolerable when doing regular app development, and of course you want to use the UI framework native to the platform. For games or other hardware-stressing apps I really wouldn't like to work like that. The emulator is useless as it doesn't support OpenGL ES 2.0 and is generally very slow. And it's no pleasure with a physical device either: grabbing it off the table all the time, apk upload and installation takes time, etc. So I started looking for some solutions.【iterating on graphics code on mobile platforms is painfully slow】


The Search


Initially, I considered using something like Oxygene. It uses Object Pascal syntax and can compile to .NET bytecode and also Java bytecode (and use .NET and Java libs). However, I would still need some framework for graphics, input, etc., so I would be locked into the .NET or Java world anyway. I'm somewhat inclined to pick up Oxygene in the future for something (there are also native OSX and iOS targets in the works) but not right now. Also, it's certainly not cheap for just a few test programs. Then there is Xamarin – Mono for Android and iOS (MonoGame should work here?) but even more expensive. When you look at Android game engines and frameworks specifically, you end up with things like AndEngine, Cocos2D-X, and libGDX.【Oxygene doesn't cover enough on its own】


After a little bit of research, I settled on libGDX. It is a Java framework with some native parts (for tasks where JVM performance may be inadequate), currently targeting desktop (Win/OSX/Linux), Android, and HTML5 (JavaScript + WebGL, in fact). Graphics are based on OpenGL (desktop) and OpenGL ES (Android and web). The great thing is that you can do most of the development in the desktop Java version with faster debugging and hot code swap. One can also use OpenGL commands directly, which is a must (at least for me).【libGDX: a cross-platform game development framework based on OpenGL】


The Test


I wanted to do some test program first, and I decided to convert the glSOAR terrain renderer to Java and OpenGL ES (GLES). The original is written in Object Pascal using desktop OpenGL. The plan was to convert it to Java and OpenGL ES with as few modifications as possible. At least for 3D graphics, libGDX is a relatively thin layer on top of GLES. You have some support classes like texture, camera, and vertex buffer, but you still need to know what a projection matrix is, bind resources before rendering, etc. What follows is a list of ideas, tasks, and problems from the conversion (focusing on Java and libGDX).【the original glSOAR is a desktop OpenGL terrain renderer; the goal here is to port it to mobile】




Using Eclipse, I created the project structure for libGDX-based projects. It is quite neat actually: you have a main Java project with all the shared platform-independent code and then one project for each platform (desktop, Android, …). Each platform project has just a few lines of starter code that instantiates your shared main project for the respective platform (of course you can do more here).


Two problems here though. Firstly, changes in the main project won't trigger a rebuild of the Android project, so you have to trigger it manually (e.g. by adding/deleting an empty line in the Android project code) before running on Android. This is actually a bug in ADT Eclipse plugin version 20, so hopefully it will be fixed for v21 (you can star this issue).


The second issue is asset management, but that is easy to fix. I want to use the same textures and other assets in the version for each platform, so they should be located in some directory shared between all projects (like inside the main project, or a few directory levels up). The thing is that for Android all assets are required to be located in the assets subdirectory of the Android project. The recommended solution for libGDX here is to store the assets in the Android project and create links (link folder or link source) in Eclipse for the desktop project pointing to the Android assets. I didn't really like that. I want my stuff to be where I want it to be, so I created some file system links instead. I used the mklink command in Windows to create junctions (as Total Commander wouldn't follow symlinks properly):


d:\Projects\Tests\glSoarJava> mklink /J glSoarAndroid\assets\data Data

Junction created for glSoarAndroid\assets\data <<===>> Data

d:\Projects\Tests\glSoarJava> mklink /J glSoarDesktop\data Data

Junction created for glSoarDesktop\data <<===>> Data


Now I have a shared Data folder at the same level as the project folders. In the future though, I guess some platform-specific assets will be needed (like different texture compressions, sigh).




Too bad Java does not have any user-defined value types (like struct in C#). I split the TVertex type into three float arrays: one for the 3D position (at least value-type vectors would be nice; it's vertices[idx * 3 + 1] instead of vertices[idx].y now – which is more readable?) and the rest for the SOAR LOD-related parameters. Making TVertex a class and spawning millions of instances didn't seem like a good idea even before I began, and it would be impossible to pass the positions to OpenGL that way anyway.
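As a sketch of that structure-of-arrays layout (the names here are hypothetical, not the actual glSOAR fields), the positions can live in one packed, GL-ready float array while the SOAR parameters sit in parallel arrays:

```java
// Hypothetical sketch of a structure-of-arrays vertex store; the field
// names (errors, radii) stand in for the SOAR LOD parameters.
public class TerrainVertices {
    final float[] positions; // packed x,y,z: 3 floats per vertex, GL-ready
    final float[] errors;    // per-vertex geometric error (SOAR)
    final float[] radii;     // per-vertex bounding sphere radius (SOAR)

    TerrainVertices(int count) {
        positions = new float[count * 3];
        errors    = new float[count];
        radii     = new float[count];
    }

    void setPosition(int idx, float x, float y, float z) {
        positions[idx * 3]     = x;
        positions[idx * 3 + 1] = y; // the vertices[idx].y of old
        positions[idx * 3 + 2] = z;
    }

    float y(int idx) { return positions[idx * 3 + 1]; }
}
```

The upside is that the packed positions array can later be handed to a VBO in one go, which millions of per-vertex class instances could never do.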


Things like data for VBOs, vertex arrays, etc. are passed to OpenGL in Java using descendants of the (direct) Buffer class. The same goes for generating IDs with glGen[Textures|Buffers|…] and basically everything that takes pointer-to-memory parameters. Of course, it makes sense in an environment where you cannot just touch memory as you want. Still, it is kind of annoying for someone not used to that. At least libGDX comes with some buffer utils, including fast copying through native JNI code.
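For illustration, here is a minimal helper (plain java.nio, no libGDX specifics) that wraps a float array in the kind of direct, native-order FloatBuffer the GL bindings expect:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BufferUtil {
    // Allocate a direct FloatBuffer in native byte order and fill it;
    // this is the form glBufferData and friends require in Java.
    static FloatBuffer directFloats(float[] src) {
        FloatBuffer fb = ByteBuffer
                .allocateDirect(src.length * 4)  // 4 bytes per float
                .order(ByteOrder.nativeOrder())  // match the native side
                .asFloatBuffer();
        fb.put(src);
        fb.flip(); // rewind so GL reads from the beginning
        return fb;
    }
}
```

A heap-allocated `FloatBuffer.wrap(src)` would compile just as well but fails at run time with many GL calls, which is exactly the sort of thing that trips up someone coming from a native language.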




The roots of the SOAR terrain rendering method are quite old today and come from the times when it was okay to spend CPU time to limit the number of triangles sent to the GPU as much as possible. That has been the complete opposite of the situation on PCs for a long time (unless you had an old integrated GPU, that is). I guess today that is true for hardware in mobile devices as well (at least the new ones). And there will also be some Java/Dalvik overhead…


Anyway, this is just an exercise, so the end result may very well be useless; the experience gained during development is what counts.


I have added an experimental VTF-based terrain renderer to the Terrain Rendering Demo for Android testbed and it looks promising. Stitching of the tiles works flawlessly. More work is needed on selecting nodes for rendering (render the node or split it into children?). Currently, there's only a simple distance-based metric, but I want to devise something that takes the classical "screen-space error" into account. And maybe some fiddling with geomorphing on top…
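The classical screen-space error test could look roughly like this (a sketch under my own assumptions, not the demo's actual code; I use the vertical FOV and viewport height for the perspective scale): a node's geometric error is projected to pixels at its distance and compared against a tolerance.

```java
public class ScreenSpaceError {
    // Pixels-per-world-unit at unit distance for a perspective
    // projection with the given viewport height and vertical FOV.
    static double perspectiveScale(double viewportHeight, double fovYRadians) {
        return viewportHeight / (2.0 * Math.tan(fovYRadians / 2.0));
    }

    // Geometric error projected to screen pixels at the given distance.
    static double projectedError(double geomError, double distance,
                                 double viewportHeight, double fovYRadians) {
        return geomError * perspectiveScale(viewportHeight, fovYRadians) / distance;
    }

    // Split the node when its projected error exceeds the pixel tolerance.
    static boolean shouldSplit(double geomError, double distance,
                               double viewportHeight, double fovYRadians,
                               double tolerancePixels) {
        return projectedError(geomError, distance, viewportHeight, fovYRadians)
                > tolerancePixels;
    }
}
```

Unlike a pure distance metric, this automatically refines nodes with large geometric error earlier and lets flat, low-error nodes stay coarse even when close.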



Terrain Rendering Demo for Android


I finally got around to releasing the Android terrain rendering demo I've been working on for the last few months (a few moments here and there). I did the bulk of the work in November 2012, partly described in the posts Porting glSOAR to Android and OpenGL ES, Part 1 and Porting glSOAR to Android and OpenGL ES, Part 2 – the third part is still just a draft.


Anyway, here's the current version, which supports the Geomipmapping and SOAR terrain rendering methods. Gritty details about the internals will follow in some future post. There is also a nearly identical desktop version for reference – an advantage of using libGDX for this.


glSOAR is an Object Pascal implementation of the SOAR terrain rendering algorithm using OpenGL. It can be compiled with the FPC or Delphi compiler and works on Windows and Linux (benchmarking mode only). No fancy OpenGL extensions are required; it should work with any OpenGL 1.2 or newer drivers.


The SOAR algorithm was published by Peter Lindstrom and Valerio Pascucci in 2001. Papers and the original C implementation can be found at the authors' site. SOAR starts each frame with a coarse terrain mesh and uses view-dependent refinement (standard longest edge bisection) to calculate the mesh to be rendered – the stress is on the CPU.
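The longest-edge-bisection recursion at the core of that refinement can be sketched as follows (a toy version: the activity test here just refines uniformly to a fixed level, where real SOAR would test the hypotenuse midpoint's projected error and bounding sphere against the view):

```java
import java.util.ArrayList;
import java.util.List;

public class Bisection {
    // Each leaf triangle is stored as {ax, az, lx, lz, rx, rz}.
    final List<float[]> leaves = new ArrayList<>();

    // Placeholder for SOAR's view-dependent test: split while level > 0.
    boolean active(float midX, float midZ, int level) {
        return level > 0;
    }

    // Triangle (apex, left, right); (left, right) is the hypotenuse.
    void refine(float ax, float az, float lx, float lz,
                float rx, float rz, int level) {
        float mx = 0.5f * (lx + rx), mz = 0.5f * (lz + rz);
        if (!active(mx, mz, level)) {
            leaves.add(new float[]{ax, az, lx, lz, rx, rz});
            return;
        }
        // Split at the hypotenuse midpoint; both children keep the
        // (apex, left, right) convention, so recursion stays uniform.
        refine(mx, mz, ax, az, lx, lz, level - 1);
        refine(mx, mz, rx, rz, ax, az, level - 1);
    }
}
```

Because every split replaces one triangle with two, refinement cost scales with the output mesh size, which is exactly why the method leans on the CPU.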




Terrain Simplification Simplified: A General Framework for View-Dependent Out-of-Core Visualization


Abstract—This paper describes a general framework for out-of-core rendering and management of massive terrain surfaces. The two key components of this framework are: view-dependent refinement of the terrain mesh; and a simple scheme for organizing the terrain data to improve coherence and reduce the number of paging events from external storage to main memory.