软光 – a practical implementation

Improving the Rasterization Algorithm

 
 

All the techniques we presented in the previous chapters are really the foundation of the rasterization algorithm, though we have only implemented them in a very basic way. The GPU rendering pipeline and other rasterization-based production renderers use the same concepts, but they rely on highly optimized versions of these algorithms. Presenting all the different tricks that are used to speed up the algorithm goes way beyond the scope of an introduction. We will just review some of them quickly now, but we plan to devote a lesson to this topic in the future.【All the techniques from the previous chapters are the foundation of the rasterization algorithm, and what we implemented is the most basic version; GPU-based approaches share the same concepts but use heavily optimized implementations.】

 
 

  • Aliasing and Anti-Aliasing

     
     

    First, let’s consider one basic problem with 3D rendering. If you zoom in on the image of the triangle that we rendered in the previous chapter, you will notice that the edges of the triangle are not regular (in fact, this is not specific to the edges of the triangle; you can also see that the checkerboard pattern is irregular on the edges of the squares). The steps that you can easily see in figure 1 are called jaggies. These jagged edges or stair-stepped edges (depending on how you prefer to call them) are not really an artifact. They are simply the result of the fact that the triangle is broken down into pixels.【If you zoom in on the image of the triangle, you will see that its edges are no longer continuous; these are jaggies, and they exist because the triangle has been rasterized into discrete pixel blocks for display.】

     
     

     
     

    What we do with the rasterization process is break down a continuous surface (the triangle) into discrete elements (the pixels), a process that we already mentioned in the introduction to rendering. The problem is similar to trying to represent a continuous curve or surface with Lego bricks: you simply can’t, and you will always see the bricks (figure 2). The solution to this problem in rendering is called anti-aliasing (also denoted AA). Rather than rendering only 1 sample per pixel (checking if the pixel overlaps the triangle by testing whether the point at the center of the pixel is covered by the triangle), we split the pixel into sub-pixels and repeat the coverage test for each sub-pixel. Of course, each sub-pixel is nothing else than another brick, so this doesn’t solve the problem entirely; nonetheless, it allows us to capture the edges of objects with slightly more precision. Pixels are most of the time divided into N by N sub-pixels, where N is generally a power of 2 (2, 4, 8, etc.), though it can technically take any value greater than or equal to 1 (1, 2, 3, 4, 5, etc.). There are in fact different ways of addressing this aliasing issue; the method we described belongs to the category of sampling-based anti-aliasing methods.【Rasterization breaks a continuous surface down into discrete pixels, which causes aliasing; anti-aliasing reduces it, and there are several different methods for doing so.】

     
     

     
     

     
     

     
     

     
     

     
     

    The pixel’s final color is computed as the sum of all the sub-pixel colors divided by the total number of sub-pixels. Let’s take an example (as with pixels, if a sub-pixel or sample covers the triangle, it takes on the color of that triangle; otherwise it takes on the background color, which is usually black). Imagine that the triangle is white. If only 2 of the 4 samples overlap the triangle, the final pixel color will be equal to (0+0+1+1)/4=0.5. The pixel won’t be completely white, but it won’t be completely black either. Thus, rather than having a “binary” transition between the edge of the triangle and the background, that transition is more gradual, which visually reduces the stair-stepped edge artifact. This is what we call anti-aliasing. To fully understand anti-aliasing you need to study signal processing theory, which again is a very large and pretty complex topic of its own.【The final pixel color is the average of all the sub-pixel colors, where each sub-pixel takes the color of whatever it covers; this averaging is what produces the anti-aliasing effect.】
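
    To make the idea concrete, here is a minimal sketch of this sampling-based approach. The function coversTriangle stands in for the coverage test described in this lesson and is only a placeholder, not code from the lesson’s source:

        bool coversTriangle(float x, float y); // placeholder: the per-sample coverage test (e.g. three edge function tests)

        // Minimal sketch: average the coverage of N x N sub-samples within one pixel.
        float antialiasedPixel(int x, int y, float triColor, float bgColor, int N = 4)
        {
            float sum = 0;
            for (int j = 0; j < N; ++j) {
                for (int i = 0; i < N; ++i) {
                    // take the sample at the center of each sub-pixel
                    float sx = x + (i + 0.5f) / N;
                    float sy = y + (j + 0.5f) / N;
                    sum += coversTriangle(sx, sy) ? triColor : bgColor;
                }
            }
            return sum / (N * N); // average of all the sub-samples
        }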

     
     

    The reason why it is best to choose N as a power of 2 is that most processors these days can run several instructions in parallel, and the number of instructions run in parallel is also generally a power of 2. You can look on the Web for things such as the SSE instruction set, which is specific to CPUs, but GPUs use the same concept. SSE is a feature that is available on most modern CPUs and that can be used to run generally 4 or 8 floating-point calculations at the same time (in one cycle). All this means is that, for the price of 1 floating-point operation, you actually get 3 or 7 for free. This can in theory speed up your rendering time by a factor of 4 or 8 (you can never quite reach that level of performance, though, because you need to pay a small penalty for setting these instructions up). You can use SSE instructions, for example, to render 2×2 sub-pixels for the cost of computing 1 pixel, and as a result you get smoother edges (the stair-stepped edges are less visible).【The number of sub-pixels is best chosen as a power of 2 so that it maps well onto SIMD instructions such as SSE.】
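
    As an illustration only (this is not part of the lesson’s code, and the function name edgeFunction4 is made up for this example), SSE intrinsics can evaluate the edge function for 4 sample positions at once:

        #include <xmmintrin.h> // SSE intrinsics

        // Evaluate the edge function for 4 sample x-coordinates at once (same row y).
        // a = (ax, ay) and b = (bx, by) are two triangle vertices; cx holds the
        // 4 sample x positions, cy is the shared y position of the samples.
        __m128 edgeFunction4(float ax, float ay, float bx, float by, __m128 cx, float cy)
        {
            __m128 dx = _mm_sub_ps(cx, _mm_set1_ps(ax));      // c.x - a.x, for 4 samples
            __m128 dy = _mm_set1_ps(cy - ay);                 // c.y - a.y, same for all 4
            __m128 t0 = _mm_mul_ps(dx, _mm_set1_ps(by - ay)); // (c.x - a.x) * (b.y - a.y)
            __m128 t1 = _mm_mul_ps(dy, _mm_set1_ps(bx - ax)); // (c.y - a.y) * (b.x - a.x)
            return _mm_sub_ps(t0, t1);                        // 4 edge function results
        }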

     
     

     
     

  • Rendering Blocks of Pixels

     
     

    Another common technique to accelerate rasterization is to render blocks of pixels; rather than testing all the pixels contained in a block, we first test the pixels at the corners of the block. GPU algorithms can use blocks of 8×8 pixels (in fact, the technique that is actually used is more elaborate and is based on a concept of tiles, but we won’t detail it here). If all four corners of that 8×8 grid cover the triangle, then necessarily the other pixels of the grid also cover the triangle (as shown in figure 7). In that case, there is no need to test all the other pixels, which obviously saves a lot of time: they can just be filled with the triangle’s colors. If vertex attributes need to be interpolated across the pixel block, this is also straightforward, because once you have computed them at the block’s corners, all you need to do is interpolate them linearly in both directions (horizontally and vertically). This optimisation only works when triangles are close to the screen and thus large in screen space; small triangles don’t benefit from this technique.【One way to accelerate rasterization: group pixels into blocks, test the block’s four corners against the triangle first, and only fall back to per-pixel work when the corners disagree. This works best for triangles that are large on screen.】
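
    A minimal sketch of that corner test follows, again with placeholder helpers (coversTriangle, fillPixel and fillBlock are illustrative names, not part of the lesson’s source code):

        bool coversTriangle(float x, float y);   // placeholder: per-sample coverage test
        void fillPixel(int x, int y);            // placeholder: write one pixel
        void fillBlock(int x0, int y0);          // placeholder: fill a fully covered block

        const int BLOCK = 8; // block size in pixels

        // Process one 8x8 block whose top-left pixel is (x0, y0).
        void rasterizeBlock(int x0, int y0)
        {
            // Test only the 4 corner pixels of the block first.
            bool c00 = coversTriangle(x0 + 0.5f,         y0 + 0.5f);
            bool c10 = coversTriangle(x0 + BLOCK - 0.5f, y0 + 0.5f);
            bool c01 = coversTriangle(x0 + 0.5f,         y0 + BLOCK - 0.5f);
            bool c11 = coversTriangle(x0 + BLOCK - 0.5f, y0 + BLOCK - 0.5f);

            if (c00 && c10 && c01 && c11) {
                // All four corners are inside the triangle (a convex shape), so the
                // whole block is covered: fill it directly, interpolating attributes
                // from the corner values if needed.
                fillBlock(x0, y0);
            }
            else {
                // Mixed case: fall back to testing every pixel of the block.
                for (int y = y0; y < y0 + BLOCK; ++y)
                    for (int x = x0; x < x0 + BLOCK; ++x)
                        if (coversTriangle(x + 0.5f, y + 0.5f)) fillPixel(x, y);
            }
        }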

     
     

     
     

  • Optimizing the Edge Function

     
     

    The edge function, too, can be optimized. Let’s have a look at the function implementation again:【The edge function, whose implementation is shown below, can also be optimized.】
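
    For reference, here is a sketch of that implementation, assuming the Vec3f vertex type used in the previous chapter:

        // Edge function from the previous chapter: its sign tells on which side of
        // the edge going from a to b the point c lies.
        float edgeFunction(const Vec3f &a, const Vec3f &b, const Vec3f &c)
        {
            return (c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x);
        }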

     
     

     
     

    Recall that a and b in this function are the triangle vertices and that c is the pixel coordinate (in raster space). One interesting thing to note is that this function is going to be called for each pixel contained within the bounding box of the triangle, though while we iterate over multiple pixels only c changes; the variables a and b stay the same. Suppose we evaluate the equation one time and get a result w0:【a and b are triangle vertices and c is the pixel coordinate; since the function runs for every pixel in the bounding box, can we update w0 incrementally from the previous value? Suppose the current w0 is:】

     
     

     
     

        w0 = (c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x)

    Then suppose that c.x and c.y are incremented by some amount s (the per-pixel step). The new value of w0 becomes:

        w0_new = (c.x + s - a.x) * (b.y - a.y) - (c.y + s - a.y) * (b.x - a.x)

     
     

     
     

    Subtracting the first equation from the second, we get:

        w0_new - w0 = s * (b.y - a.y) - s * (b.x - a.x)

     
     

     
     

    The term on the right-hand side of this equation is constant for a given triangle, since a, b and s are all fixed. Let’s call it w0_step; we then simply get:

        w0_new = w0 + w0_step

     
     

     
     

    The edge function uses 2 mults and 5 subs, but with this trick it can be reduced to a simple addition per pixel (of course, you need to compute a few initial values). This technique is well documented on the internet. We won’t be using it in this lesson, but we will study it in more detail and implement it in another lesson devoted to advanced rasterization techniques.【The net result is that the 2 multiplications and 5 subtractions are replaced by a single addition per pixel, which greatly improves performance.】
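
    As an informal illustration of the trick (not the code used in this lesson), an inner rasterization loop could update the edge value incrementally like this:

        // Incremental sketch: compute w0 once at the start of a scanline, then update
        // it with one addition per pixel as x increases by 1 (so the step s = 1 here).
        // v1 and v2 are two of the triangle's vertices in raster space, y is the scanline.
        float w0     = edgeFunction(v1, v2, Vec3f(xmin + 0.5f, y + 0.5f, 0));
        float w0step = v2.y - v1.y; // how much w0 changes when c.x increases by 1

        for (int x = xmin; x <= xmax; ++x) {
            // ... test w0 (and w1, w2, maintained the same way) against 0 here ...
            w0 += w0step; // a single addition replaces the full edge function
        }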

     
     

     
     

    Fixed Point Coordinates

     
     

    Finally, and to conclude this section, we will briefly talk about the technique that consists of converting the vertex coordinates, which are initially defined in floating-point format, to fixed-point format just before the rasterization stage. Fixed-point is the fancy word (in fact the correct technical term) for integer. When vertex coordinates are converted from NDC to raster space, they are also converted from floating-point numbers to fixed-point numbers. Why do we do that? There is no easy and quick answer to this question, but to be short, let’s just say that GPUs use fixed-point arithmetic because, from a computing point of view, manipulating integers is easier and faster than manipulating floats or doubles (it only requires logical bit operations). Again, this is just a very generic explanation. The conversion from floating-point to integer coordinates, and how the rasterization process is implemented using integer coordinates, is a large and complex topic which is barely documented on the Internet (you will find very little information about it, which is very strange considering that this very process is central to the way modern GPUs work).【The fixed-point format replaces the floating-point format because integer addition, subtraction and multiplication are fast, but the implementation is fairly complex.】

     
     

    【In short, this approach is complex to implement, so we won’t discuss or use it in detail here.】

    The conversion step involves rounding off the vertex coordinates to the nearest integer. Though if you only do that, you essentially snap the vertex coordinates to the nearest pixel corner coordinates. This is not so much an issue when you render a still image, but it creates visual artifacts with animation (vertices get snapped to different pixels from frame to frame).

    The workaround is to convert the number to the smallest integer value while also reserving some bits to encode the sub-pixel position of the vertex (the fractional part of the vertex position). GPUs typically use 4 bits to encode sub-pixel precision (you can search graphics API documentation for the term sub-pixel precision). In other words, on a 32-bit integer, 1 bit might be used to encode the number’s sign, 27 bits are used to encode the vertex’s integer position, and 4 bits are used to encode the fractional position of the vertex within the pixel. This means that the vertex position is “snapped” to the closest corner of a 16×16 sub-pixel grid, as shown in figure 8 (with 4 bits, you can represent any integer number in the range [0:15]).

    Somehow the vertex position is still snapped to some grid corner, but snapping in this case is less of a problem than when the vertex is snapped to the pixel coordinates. This conversion process leads to many other issues, one of which is integer overflow (overflow occurs when the result of an arithmetic operation produces a number greater than the largest number that can be encoded with the available bits). This may happen because integers cover a smaller range of values than floats. Things also get somewhat complex when anti-aliasing is thrown into the mix. It would take a lesson of its own to explore the topic in detail.
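
    To give a feel for what such a conversion might look like (a rough sketch only, not how any particular GPU does it), a raster-space coordinate could be stored in a 28.4 fixed-point format by scaling by 16 and rounding:

        #include <cstdint>
        #include <cmath>

        // Rough sketch: convert a raster-space coordinate to 28.4 fixed point,
        // i.e. 4 bits of sub-pixel precision (positions snapped to a 16x16 grid
        // inside each pixel, following the 4-bit example given above).
        int32_t toFixed(float x)
        {
            return static_cast<int32_t>(std::lround(x * 16.0f)); // units of 1/16th of a pixel
        }

        // Converting back to a float, e.g. for display or debugging.
        float toFloat(int32_t fx)
        {
            return fx / 16.0f;
        }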

     
     

    Fixed-point coordinates allow the rasterization process and the edge function to be sped up even further. This is one of the reasons for converting vertex coordinates to integers. These optimisation techniques will be presented in another lesson.【Using fixed-point coordinates can further improve the efficiency of rasterization.】

     
     

     
     

    Notes Regarding our Implementation of the Rasterization Algorithm

     
     

    Finally, we are going to quickly review the code provided in the source code chapter. Here is a description of its main components:

     
     

    • We will use the function computeScreenCoordinates to compute the screen coordinates, using the method detailed in the lesson devoted to the pinhole camera model. This is essentially needed to make sure that our render output can be compared to a render in Maya, which also uses a physically-based camera model.