Tag: Rendering



First, pin down where the problem occurs: testing shows it only appears in connection with SolarMeshWidgetComponent.






CameraVector. This is the world space camera direction. Since virtual reality glasses use two separate in-game cameras, each with a slightly different direction, you won’t get the same result in both eyes.
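To see why a view-dependent term differs per eye, here is a minimal Python sketch. It assumes a simple exponent-based Fresnel approximation, (1 - N·V)^exponent, and a made-up setup (eye separation 0.064, surface point 0.5 units ahead); none of these names or values come from the engine.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def camera_vector(surface_point, eye_position):
    """Unit vector from the shaded point toward the eye."""
    return normalize(tuple(e - p for e, p in zip(eye_position, surface_point)))

def fresnel_term(normal, view, exponent=5.0):
    """Assumed approximation: (1 - N.V)^exponent, clamped at N.V >= 0."""
    n_dot_v = max(dot(normalize(normal), normalize(view)), 0.0)
    return (1.0 - n_dot_v) ** exponent

# Hypothetical scene: one surface point with a tilted normal, two eyes.
point = (0.0, 0.0, 0.5)
normal = normalize((0.6, 0.0, -0.8))
left_eye = (-0.032, 0.0, 0.0)
right_eye = (0.032, 0.0, 0.0)

fresnel_left = fresnel_term(normal, camera_vector(point, left_eye))
fresnel_right = fresnel_term(normal, camera_vector(point, right_eye))
print(fresnel_left, fresnel_right)
```

The two printed values differ because each eye sees the same surface point from a slightly different direction, so any material term built on the camera vector will not match between the two views.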



Inconsistent Fresnel effect:


Inconsistent LOD behavior:






















UE4 rendering pipeline

TranslucentRendering.h: Translucent rendering definitions.

TranslucentRendering.cpp: Translucent rendering implementation.





The latter two functions both process their parameters and then call DrawMesh(). The skeleton of DrawMesh is as follows:



The key point here is the check that only translucent materials are handled: if (IsTranslucentBlendMode(BlendMode))



Next, look at where the functions above might be called from. The last few methods implemented in TranslucentRendering.cpp are declared in DeferredShadingRenderer.h (Scene rendering definitions). Tracing back, we see the following:


At a glance, DeferredShadingRenderer.h (Scene rendering definitions) appears to define the complete set of scene rendering methods.
















  • Scene description
  • Scene traversal and culling
  • Execution of rendering






  • FScene: the scene class
  • FPrimitiveSceneProxy: the geometry class within the scene
  • FPrimitiveSceneInfo: a node in the scene (owns the geometry and state information)



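  • FMaterial: the material interface class; provides queries for material properties (e.g. blend mode) and shader lookup.
  • FMaterialResource: the concrete FMaterial implemented for UMaterial
  • FMaterialRenderProxy: the material object used on the rendering thread; provides access to the FMaterial and to material parameters (e.g. scalar, vector, and texture parameters)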








UE4 handles this differently from the approach above: it classifies geometry into Static Primitives and Dynamic Primitives and processes each kind separately.


  • Static Render Path

    The FScene object holds a number of static draw lists. When a PrimitiveSceneProxy is inserted into the scene, FPrimitiveSceneProxy::DrawStaticElements() is called to collect its FStaticMeshElements data. The corresponding drawing policy instances are then created and placed into the draw lists, sorted by material.


  • Dynamic Render Path



The two render paths do not conflict: an FPrimitiveSceneProxy can implement both DrawStaticElements() and DrawDynamicElements() to support them at the same time, which means the scene proxy has both static FMeshElements and dynamic FMeshElements.
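The idea of keeping a static draw list sorted by material can be illustrated with a small Python sketch. This is a conceptual model only: `MeshElement`, `build_static_draw_list`, and the material names are hypothetical and are not UE4 types or APIs.

```python
from collections import namedtuple

# Hypothetical stand-in for a static mesh element; not a UE4 type.
MeshElement = namedtuple("MeshElement", ["mesh", "material"])

def build_static_draw_list(elements):
    """Return the elements sorted by material so equal materials are adjacent."""
    return sorted(elements, key=lambda element: element.material)

def count_material_switches(draw_list):
    """Count how often the bound material changes while walking the list."""
    switches = 0
    current = None
    for element in draw_list:
        if element.material != current:
            switches += 1
            current = element.material
    return switches

elements = [
    MeshElement("rock_a", "M_Stone"),
    MeshElement("tree_a", "M_Bark"),
    MeshElement("rock_b", "M_Stone"),
    MeshElement("tree_b", "M_Bark"),
]
draw_list = build_static_draw_list(elements)
print(count_material_switches(elements), count_material_switches(draw_list))
```

In submission order the material changes on every element, while the sorted list binds each material once. Because the list is built when the proxy is added to the scene, the sorting cost is not paid every frame.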






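A simple rendering engine just renders the visible geometry (set the render state, bind the GPU shaders and parameters, issue the draw calls). UE4's rendering is more complex and draws in multiple passes. The passes are listed below in order and introduced one by one.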


  1. PASS_0: PrePass/Depth Only Pass

    This pass draws with FDepthDrawingPolicy and writes only depth to the depth buffer, which reduces pixel fill in the later base pass and saves pixel-shader work.


  2. PASS_1: Base pass

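    This pass draws geometry with opaque and masked materials and writes the material attributes to the G-Buffer. It also accumulates the lightmap and sky lighting contributions into the scene color buffer. The related functions are listed below.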


  3. PASS_2: Issue Occlusion Queries / BeginOcclusionTests



  4. PASS_3: ShadowMap(阴影计算)

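    A shadow map is rendered for each light. Lights are also accumulated into the translucency lighting volumes (I do not fully understand this part, so this description may be wrong).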


  5. PASS_4: Lighting(光照计算)


  • Pre-lighting composition lighting stage: composition done before lighting (e.g. deferred decals, SSAO)
  • Render lights: the lighting computation


  6. PASS_5: Draw atmosphere



  7. PASS_6: Draw Fog



  8. PASS_7: Draw translucency


    Translucency is accumulated into an offscreen render target where it has fogging applied per-vertex so it can integrate into the scene. Lit translucency computes final lighting in a single pass to blend correctly.


  9. PASS_8: Post Processing












RayTracing – Adding Reflection and Refraction

The other advantage of ray-tracing is that, by extending the idea of ray propagation, we can very easily simulate effects like reflection and refraction, both of which are handy in simulating glass materials or mirror surfaces. In a 1979 paper entitled “An Improved Illumination Model for Shaded Display”, Turner Whitted was the first to describe how to extend Appel’s ray-tracing algorithm for more advanced rendering. Whitted’s idea extended Appel’s model of shooting rays to incorporate computations for both reflection and refraction.【Ray tracing extends naturally: the same ray-propagation idea makes reflection and refraction easy to simulate.】


In optics, reflection and refraction are well known phenomena. Although a whole later lesson is dedicated to reflection and refraction, we will look quickly at what is needed to simulate them. We will take the example of a glass ball, an object which has both refractive and reflective properties. As long as we know the direction of the ray intersecting the ball, it is easy to compute what happens to it. Both reflection and refraction directions are based on the normal at the point of intersection and the direction of the incoming ray (the primary ray). To compute the refraction direction we also need to specify the index of refraction of the material. Although we said earlier that rays travel on a straight line, we can visualize refraction as the ray being bent. When a photon hits an object of a different medium (and thus a different index of refraction), its direction changes. The science of this will be discussed in more depth later. As long as we remember that these two effects depend on the normal vector and the incoming ray direction, and that refraction depends on the refractive index of the material, we are ready to move on.【Take a glass ball as an example of reflection and refraction: when a ray hits the ball, both the reflected and the refracted directions follow from physical laws, as the figure below shows.】



Similarly, we must also be aware of the fact that an object like a glass ball is reflective and refractive at the same time. We need to compute both for a given point on the surface, but how do we mix them together? Do we take 50% of the reflection result and mix it with 50% of the refraction result? Unfortunately, it is more complicated than that. The mixing of values depends on the angle between the primary ray (or viewing direction) and the normal of the object, and on the index of refraction. Fortunately for us, however, there is an equation that calculates precisely how each should be mixed. This equation is known as the Fresnel equation. To remain concise, all we need to know, for now, is that it exists and it will be useful in the future in determining the mixing values.【How do we mix reflection and refraction? The mixing ratio depends on the angle of the incoming ray and is given by the Fresnel equation.】


So let’s recap. How does the Whitted algorithm work? We shoot a primary ray from the eye and find the closest intersection (if any) with objects in the scene. If the ray hits an object which is not a diffuse or opaque object, we must do extra computational work. To compute the resulting color at that point on, say, the glass ball, you need to compute the reflection color and the refraction color and mix them together. Remember, we do that in three steps. Compute the reflection color, compute the refraction color, and then apply the Fresnel equation.【The algorithm: shoot a ray from the eye to the first object it hits. If that object is not opaque, split the ray into reflection and refraction rays that continue to be traced, and finally blend the resulting colors in proportion. The three steps are described below.】


  1. First we compute the reflection direction. For that we need two items: the normal at the point of intersection and the primary ray’s direction. Once we obtain the reflection direction, we shoot a new ray in that direction. Going back to our old example, let’s say the reflection ray hits the red sphere. Using Appel’s algorithm, we find out how much light reaches that point on the red sphere by shooting a shadow ray to the light. That obtains a color (black if it is shadowed) which is then multiplied by the light intensity and returned to the glass ball’s surface.
  2. Now we do the same for the refraction. Note that, because the ray goes through the glass ball it is said to be a transmission ray (light has traveled from one side of the sphere to the other; it was transmitted). To compute the transmission direction we need the normal at the hit point, the primary ray direction, and the refractive index of the material (in this example it may be something like 1.5 for glass material). With the new direction computed, the refractive ray continues on its course to the other side of the glass ball. There again, because it changes medium, the ray is refracted one more time. As you can see in the adjacent image, the direction of the ray changes when the ray enters and leaves the glass object. Refraction takes place every time there’s a change of medium and the two media, the one the ray exits from and the one it gets in, have a different index of refraction (as you probably know, the refraction index of air is very close to 1 and the refraction index of glass is around 1.5). Refraction has the effect of bending the ray slightly. This process is what makes objects appear shifted when looking through or at objects of different refraction indexes. Let’s imagine now that when the refracted ray leaves the glass ball it hits a green sphere. There again we compute the local illumination at the point of intersection between the green sphere and the refracted ray (by shooting a shadow ray). The color (black if it is shadowed) is then multiplied by the light intensity and returned to the glass ball’s surface.
  3. Lastly, we compute the Fresnel equation. We need the refractive index of the glass ball and the angle between the primary ray and the normal at the hit point. Using a dot product (we will explain that later), the Fresnel equation returns the two mixing values.


Here is some pseudo code to reinforce how it works:
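A minimal Python sketch of the three steps, written for this note rather than taken from the original lesson. It assumes unit-length direction and normal vectors, a ray arriving from air (index 1.0), and the unpolarized Fresnel equations; the placeholder colors at the end are made-up scalars.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(incident, normal):
    """Step 1: mirror the incident direction about the normal, I - 2(N.I)N."""
    d = dot(incident, normal)
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))

def refract(incident, normal, ior):
    """Step 2: Snell's law. Returns None on total internal reflection."""
    cos_i = -dot(incident, normal)
    eta = 1.0 / ior  # air -> medium
    if cos_i < 0.0:  # ray is leaving the medium: flip normal, invert ratio
        cos_i, normal, eta = -cos_i, tuple(-n for n in normal), ior
    k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None
    return tuple(eta * i + (eta * cos_i - math.sqrt(k)) * n
                 for i, n in zip(incident, normal))

def fresnel_reflectance(incident, normal, ior):
    """Step 3: reflected share kr; the transmitted share is 1 - kr."""
    cos_i = max(-1.0, min(1.0, dot(incident, normal)))
    eta_i, eta_t = 1.0, ior
    if cos_i > 0.0:  # leaving the medium
        eta_i, eta_t = eta_t, eta_i
    sin_t = eta_i / eta_t * math.sqrt(max(0.0, 1.0 - cos_i * cos_i))
    if sin_t >= 1.0:
        return 1.0  # total internal reflection
    cos_t = math.sqrt(max(0.0, 1.0 - sin_t * sin_t))
    cos_i = abs(cos_i)
    r_s = (eta_i * cos_i - eta_t * cos_t) / (eta_i * cos_i + eta_t * cos_t)
    r_p = (eta_i * cos_t - eta_t * cos_i) / (eta_i * cos_t + eta_t * cos_i)
    return 0.5 * (r_s * r_s + r_p * r_p)

# Ray hitting glass (ior 1.5) head-on.
incident = (0.0, 0.0, -1.0)
normal = (0.0, 0.0, 1.0)
kr = fresnel_reflectance(incident, normal, 1.5)
reflection_color, refraction_color = 1.0, 0.5  # placeholder scalar colors
color = kr * reflection_color + (1.0 - kr) * refraction_color
print(reflect(incident, normal), refract(incident, normal, 1.5), kr, color)
```

At normal incidence glass reflects about 4% of the light, so the mix is dominated by the refraction color. At grazing angles `kr` approaches 1 and the surface behaves almost like a mirror.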


One last, beautiful thing about this algorithm is that it is recursive (that is also a curse in a way, too!). In the case we have studied so far, the reflection ray hits a red, opaque sphere and the refraction ray hits a green, opaque, and diffuse sphere. However, we are going to imagine that the red and green spheres are glass balls as well. To find the color returned by the reflection and the refraction rays, we would have to follow the same process with the red and the green spheres that we used with the original glass ball. This is a serious drawback of the ray tracing algorithm and can actually be nightmarish in some cases. Imagine that our camera is in a box which has only reflective faces. Theoretically, the rays are trapped and will continue bouncing off of the box’s walls endlessly (or until you stop the simulation). For this reason, we have to set an arbitrary limit that prevents the rays from interacting, and thus recursing, endlessly. Each time a ray is either reflected or refracted its depth is incremented. We simply stop the recursion process when the ray depth is greater than the maximum recursion depth.【Note that the algorithm is recursive; set a sensible termination condition to get sensible results.】
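The depth limit can be sketched in Python with a toy version of the mirror box, where every ray is assumed to hit another mirror. The constants `MAX_DEPTH`, `MIRROR_REFLECTIVITY`, and `EMITTED` are illustrative values chosen for this sketch.

```python
MAX_DEPTH = 5               # arbitrary cap on recursion
BACKGROUND = 0.0            # color returned once the cap is exceeded
MIRROR_REFLECTIVITY = 0.9   # hypothetical: share of light each bounce keeps
EMITTED = 0.1               # hypothetical: light each wall adds on its own

def trace_in_mirror_box(depth=0, stats=None):
    """Follow a ray that always hits another mirror, stopping past MAX_DEPTH."""
    if stats is not None:
        stats["calls"] = stats.get("calls", 0) + 1
    if depth > MAX_DEPTH:
        return BACKGROUND
    # Every hit spawns one reflection ray with an incremented depth.
    reflected = trace_in_mirror_box(depth + 1, stats)
    return EMITTED + MIRROR_REFLECTIVITY * reflected

stats = {}
color = trace_in_mirror_box(stats=stats)
print(color, stats["calls"])
```

Without the `depth > MAX_DEPTH` test the function would never return. With it, the ray is followed for depths 0 through `MAX_DEPTH`, plus one final call that returns the background color.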


























RayTracing – Implementing the Raytracing Algorithm



We have covered everything there is to say! We are now prepared to write our first ray-tracer. You should now be able to guess how the ray-tracing algorithm works.【Now we start implementing the algorithm】


First of all, take a moment to notice that the propagation of light in nature is just a countless number of rays emitted from light sources that bounce around until they hit the surface of our eye. Ray-tracing is, therefore, elegant in the way that it is based directly on what actually happens around us. Apart from the fact that it follows the path of light in the reverse order, it is nothing less than a perfect nature simulator.【Light propagation in nature is just countless rays emitted from light sources, bouncing around until they reach the surface of our eyes.】



The ray-tracing algorithm takes an image made of pixels. For each pixel in the image, it shoots a primary ray into the scene. The direction of that primary ray is obtained by tracing a line from the eye to the center of that pixel. Once we have that primary ray’s direction set, we check every object of the scene to see if it intersects with any of them. In some cases, the primary ray will intersect more than one object. When that happens, we select the object whose intersection point is the closest to the eye. We then shoot a shadow ray from the intersection point to the light (Figure 6, top). If this particular ray does not intersect an object on its way to the light, the hit point is illuminated. If it does intersect with another object, that object casts a shadow on it (Figure 2).【Ray tracing works on the pixels of an image. For each pixel, we shoot a ray from the eye position through the pixel and test it against every object in the scene. Often it intersects several objects, in which case we use the one closest to the eye. We then shoot a shadow ray toward the light: if it reaches the light without hitting another object, the point is lit; otherwise the point lies in the shadow cast by that other object.】



If we repeat this operation for every pixel, we obtain a two-dimensional representation of our three-dimensional scene (Figure 3).【Iterating over the pixels yields the final image】



Here is an implementation of the algorithm in pseudocode:【Pseudocode】
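As a runnable stand-in, here is a minimal Python sketch of the loop described above, written for this note rather than copied from the lesson. It assumes a scene of spheres stored as dicts, an eye at the origin looking down -z, an image plane at z = -1, one point light, and a single brightness value per sphere instead of real shading.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def intersect_sphere(origin, direction, center, radius):
    """Nearest hit distance along a unit-length ray, or None."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-4:  # small epsilon avoids re-hitting the starting surface
            return t
    return None

def closest_hit(origin, direction, spheres):
    """Test every object and keep the intersection closest to the origin."""
    best = None
    for sphere in spheres:
        t = intersect_sphere(origin, direction, sphere["center"], sphere["radius"])
        if t is not None and (best is None or t < best[0]):
            best = (t, sphere)
    return best

def render(width, height, spheres, light, eye=(0.0, 0.0, 0.0)):
    image = []
    for j in range(height):
        row = []
        for i in range(width):
            # Primary ray through the center of pixel (i, j).
            x = 2.0 * (i + 0.5) / width - 1.0
            y = 1.0 - 2.0 * (j + 0.5) / height
            direction = normalize((x, y, -1.0))
            brightness = 0.0
            hit = closest_hit(eye, direction, spheres)
            if hit is not None:
                t, sphere = hit
                point = tuple(e + t * d for e, d in zip(eye, direction))
                # Shadow ray from the hit point toward the light.
                to_light = sub(light, point)
                light_distance = math.sqrt(dot(to_light, to_light))
                blocker = closest_hit(point, normalize(to_light), spheres)
                if blocker is None or blocker[0] > light_distance:
                    brightness = sphere["brightness"]
            row.append(brightness)
        image.append(row)
    return image

big = {"center": (0.0, 0.0, -3.0), "radius": 1.0, "brightness": 1.0}
blocker = {"center": (0.0, 2.5, -1.0), "radius": 0.5, "brightness": 0.5}
light = (0.0, 5.0, 0.0)
lit_image = render(3, 3, [big], light)
shadowed_image = render(3, 3, [big, blocker], light)
```

In the 3x3 test image the center pixel sees the big sphere and is lit. Adding the small sphere between the hit point and the light puts that same pixel in shadow, while the corner pixels miss everything and stay black.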



The beauty of ray-tracing, as one can see, is that it takes just a few lines to code; one could certainly write a basic ray-tracer in 200 lines. Unlike other algorithms, such as a scanline renderer, ray-tracing takes very little effort to implement.【The beauty of ray tracing is that a basic implementation takes only about 200 lines, as shown above】


This technique was first described by Arthur Appel in 1969 in a paper entitled “Some Techniques for Shading Machine Renderings of Solids”. So, if this algorithm is so wonderful why didn’t it replace all the other rendering algorithms? The main reason, at the time (and even today to some extent), was speed. As Appel mentions in his paper:【The technique was first proposed in 1969, but it did not see wide practical use because rendering times were very long】


“This method is very time consuming, usually requiring for useful results several thousands times as much calculation time as a wire frame drawing. About one half of this time is devoted to determining the point to point correspondence of the projection and the scene.”


In other words, it is slow (but as Kajiya, one of the most influential researchers in the history of computer graphics, once said: “ray tracing is not slow – computers are”). It is extremely time consuming to find the intersection between rays and geometry. For decades, the algorithm’s speed has been the main drawback of ray-tracing. However, as computers become faster, it is less and less of an issue. One thing must still be said, though: compared to other techniques, like the z-buffer algorithm, ray-tracing is still much slower. However, today, with fast computers, we can compute a frame that used to take one hour in a few minutes or less. In fact, real-time and interactive ray-tracers are a hot topic.【In other words, it is slow: ray intersection is slow. As hardware improves this matters less and less, yet compared with rasterization it is still very slow. Real-time ray tracing is nonetheless a hot research topic.】


To summarize, it is important to remember (again) that the rendering routine can be looked at as two separate processes. One step determines if a point is visible at a particular pixel (the visibility part), the second shades that point (the shading part). Unfortunately, both of the two steps require expensive and time consuming ray-geometry intersection tests. The algorithm is elegant and powerful but forces us to trade rendering time for accuracy and vice versa. Since Appel published his paper a lot of research has been done to accelerate the ray-object intersection routines. By combining these acceleration schemes with the new technology in computers, it has become easier to use ray-tracing, to the point where it has been used in nearly every production rendering software.【To summarize, ray-traced rendering has two steps: first decide whether an object is visible at a given pixel, then shade that point. Both steps need ray intersection tests, which are very time consuming.】































RayTracing – Raytracing Algorithm in a Nutshell



The phenomenon described by Ibn al-Haytham explains why we see objects. Two interesting remarks can be made based on his observations: firstly, without light we cannot see anything and secondly, without objects in our environment, we cannot see light. If we were to travel in intergalactic space, that is what would typically happen. If there is no matter around us, we cannot see anything but darkness even though photons are potentially moving through that space.【Ibn al-Haytham explained why we can see objects, based on two interesting observations: without light we cannot see anything, and in a world without objects we cannot see light.】



Forward Tracing


If we are trying to simulate the light-object interaction process in a computer generated image, then there is another physical phenomenon which we need to be aware of. Compared to the total number of rays reflected by an object, only a select few of them will ever reach the surface of our eye. Here is an example. Imagine we have created a light source which emits only one single photon at a time. Now let’s examine what happens to that photon. It is emitted from the light source and travels in a straight line path until it hits the surface of our object. Ignoring photon absorption, we can assume the photon is reflected in a random direction. If the photon hits the surface of our eye, we “see” the point where the photon was reflected from (Figure 1).【When simulating lighting, keep in mind that of all the light reflected by an object, only a small part enters the eye; the figure below illustrates this.】



We can now begin to look at the situation in terms of computer graphics. First, we replace our eyes with an image plane composed of pixels. In this case, the photons emitted will hit one of the many pixels on the image plane, increasing the brightness at that point to a value greater than zero. This process is repeated multiple times until all the pixels are adjusted, creating a computer generated image. This technique is called forward ray-tracing because we follow the path of the photon forward from the light source to the observer.【To simulate this process, we first replace the eye with an image. Light leaves the light source, and each time the image receives a photon its brightness increases, until all rays have been followed. This method is called forward ray-tracing.】


However, do you see a potential problem with this approach?【But this method has a problem】


The problem is the following: in our example we assumed that the reflected photon always intersected the surface of the eye. In reality, rays are essentially reflected in every possible direction, each of which has a very, very small probability of actually hitting the eye. We would potentially have to cast zillions of photons from the light source to find only one photon that would strike the eye. In nature this is how it works, as countless photons travel in all directions at the speed of light. In the computer world, simulating the interaction of that many photons with objects in a scene is just not a practical solution for reasons we will now explain.【The problem is that we must cast an enormous number of photons for only a small fraction of them to actually reach the eye and contribute to the image.】
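How small that probability is can be estimated with a short Monte Carlo sketch in Python. It assumes photons leave a surface point in uniformly random directions over the hemisphere around the normal (+z), and that the eye covers a cone of 1 degree half-angle around the normal; both are simplifications chosen for illustration.

```python
import math
import random

def random_hemisphere_direction(rng):
    """Uniformly distributed unit direction on the hemisphere around +z."""
    z = rng.uniform(0.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)

def fraction_reaching_eye(num_photons, eye_half_angle_deg, seed=0):
    """Share of reflected photons that fall inside the cone covered by the eye."""
    rng = random.Random(seed)
    cos_limit = math.cos(math.radians(eye_half_angle_deg))
    hits = sum(
        1 for _ in range(num_photons)
        if random_hemisphere_direction(rng)[2] >= cos_limit
    )
    return hits / num_photons

fraction = fraction_reaching_eye(100_000, eye_half_angle_deg=1.0)
expected = 1.0 - math.cos(math.radians(1.0))  # analytic share of the hemisphere
print(fraction, expected)
```

Under these assumptions only about 0.015% of the photons reach the eye, so nearly all of the intersection work done for the other photons is wasted.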


So you may think: “Do we really need to shoot photons in random directions? Since we know the eye’s position, why not just send the photon in that direction and see which pixel in the image it passes through, if any?” That would certainly be one possible optimization, however we can only use this method for certain types of material. For reasons we will explain in a later lesson on light-matter interaction, directionality is not important for diffuse surfaces. This is because a photon that hits a diffuse surface can be reflected in any direction within the hemisphere centered around the normal at the point of contact. However, if the surface is a mirror, and does not have diffuse characteristics, the ray can only be reflected in a very precise direction: the mirrored direction (something which we will learn how to compute later on). For this type of surface, we cannot decide to artificially change the direction of the photon if it’s actually supposed to follow the mirrored direction, meaning that this solution is not completely satisfactory.【So we wonder how to make photon casting more efficient. One option is to steer the direction artificially, discarding some directions at each reflection or refraction, but this cannot handle objects such as mirrors correctly.】


Even if we do decide to use this method, with a scene made up of diffuse objects only, we would still face one major problem. We can visualize the process of shooting photons from a light into a scene as if you were spraying light rays (or small particles of paint) onto an object’s surface. If the spray is not dense enough, some areas would not be illuminated uniformly.【Another reason against this approach: even for diffuse objects, which make up most of a scene, it does not cut down the amount of computation.】


Imagine that we are trying to paint a teapot by making dots with a white marker pen onto a black sheet of paper (consider every dot to be a photon). As we see in the image below, to begin with only a few photons intersect with the teapot object, leaving many uncovered areas. As we continue to add dots, the density of photons increases until the teapot is “almost” entirely covered with photons, making the object more easily recognisable.【As the figure below shows, when we try to draw a teapot this way, the drawing proceeds as random white dots being added one by one.】


But shooting 1000 photons, or even X times more, will never truly guarantee that the surface of our object will be totally covered with photons. That’s a major drawback of this technique. In other words, we would probably have to let the program run until we decide that it had sprayed enough photons onto the object’s surface to get an accurate representation of it. This implies that we would need to watch the image as it’s being rendered in order to decide when to stop the application. In a production environment, this simply isn’t possible. Plus, as we will see, the most expensive task in a ray-tracer is finding ray-geometry intersections. Creating many photons from the light source is not an issue, but having to find all of their intersections within the scene would be prohibitively expensive.【In practice, however many (finitely many) rays you fire, it is hard to fill in every black hole in the teapot; the process is uncontrollable and expensive.】



Conclusion: Forward ray-tracing (or light tracing because we shoot rays from the light) makes it technically possible to simulate the way light travels in nature on a computer. However, this method, as discussed, is not efficient or practical. In a seminal paper entitled “An Improved Illumination Model for Shaded Display” and published in 1980, Turner Whitted (one of the earliest researchers in computer graphics) wrote:【Forward tracing is one way to simulate light on a computer, but it is not practical. The paper An Improved Illumination Model for Shaded Display says:】


“In an obvious approach to ray tracing, light rays emanating from a source are traced through their paths until they strike the viewer. Since only a few will reach the viewer, this approach is wasteful. In a second approach suggested by Appel, rays are traced in the opposite direction, from the viewer to the objects in the scene”.【Forward tracing is too wasteful; can we instead follow the light's path in reverse?】


We will now look at this other mode that Whitted talks about.


Backward Tracing


Instead of tracing rays from the light source to the receptor (such as our eye), we trace rays backwards from the receptor to the objects. Because this direction is the reverse of what happens in nature, it is fittingly called backward ray-tracing or eye tracing because we shoot rays from the eye position (Figure 2). This method provides a convenient solution to the flaw of forward ray-tracing. Since our simulations cannot be as fast and as perfect as nature, we must compromise and trace a ray from the eye into the scene. If the ray hits an object then we find out how much light it receives by throwing another ray (called a light or shadow ray) from the hit point to the scene’s light. Occasionally this “light ray” is obstructed by another object from the scene, meaning that our original hit point is in a shadow; it doesn’t receive any illumination from the light. For this reason, we don’t name these rays light rays but instead shadow rays. In CG literature, the first ray we shoot from the eye into the scene is called a primary ray, visibility ray, or camera ray.【Now for backward ray tracing, shown in the figure below: rays start at the eye and travel in reverse until they lead back to the light source.】





In computer graphics the concept of shooting rays either from the light or from the eye is called path tracing. The term ray-tracing can also be used, but the concept of path tracing suggests that this method of making computer generated images relies on following the path from the light to the camera (or vice versa). By doing so in a physically realistic way, we can easily simulate optical effects such as caustics or the reflection of light by other surfaces in the scene (indirect illumination). These topics will be discussed in other lessons.【In computer graphics, shooting rays from the light or from the eye is called path tracing. The term ray tracing can also be used, but path tracing emphasizes that the image is produced by following the path from the light to the camera (or vice versa). Done in a physically realistic way, this makes it easy to simulate optical effects such as caustics or light reflected by other surfaces in the scene (indirect illumination). These topics are covered in other lessons.】



























RayTracing – How Does It Work?



To begin this lesson, we will explain how a three-dimensional scene is made into a viewable two-dimensional image. Once we understand that process and what it involves, we will be able to utilize a computer to simulate an “artificial” image by similar methods. We like to think of this section as the theory that more advanced CG is built upon.【This lesson first explains how to obtain a 2D image from a 3D scene】


In the second section of this lesson, we will introduce the ray-tracing algorithm and explain, in a nutshell, how it works. We have received email from various people asking why we are focused on ray-tracing rather than other algorithms. The truth is, we are not. Why did we choose to focus on ray-tracing in this introductory lesson? Simply because this algorithm is the most straightforward way of simulating the physical phenomena that cause objects to be visible. For that reason, we believe ray-tracing is the best choice, among other techniques, when writing a program that creates simple images.【Next we introduce the ray-tracing algorithm, simply because it is the most direct way to simulate the physical phenomena that make objects visible.】


To start, we will lay the foundation with the ray-tracing algorithm. However, as soon as we have covered all the information we need to implement a scanline renderer, for example, we will show how to do that as well.【We start with ray tracing; other approaches such as a scanline renderer are covered once the necessary background is in place】



How Does an Image Get Created?


Although it seems unusual to start with the following statement, the first thing we need to produce an image is a two-dimensional surface (this surface needs to be of some area and cannot be a point). With this in mind, we can visualize a picture as a cut made through a pyramid whose apex is located at the center of our eye and whose height is parallel to our line of sight (remember, in order to see something, we must view along a line that connects to that object). We will call this cut, or slice, mentioned before, the image plane (you can see this image plane as the canvas used by painters). An image plane is a computer graphics concept and we will use it as a two-dimensional surface to project our three-dimensional scene upon. Although it may seem obvious, what we have just described is one of the most fundamental concepts used to create images on a multitude of different apparatuses. For example, an equivalent in photography is the surface of the film (or as just mentioned before, the canvas used by painters).【In computer graphics terms, rendering means using a 2D image to represent a 3D scene】




Perspective Projection


Let’s imagine we want to draw a cube on a blank canvas. The easiest way of describing the projection process is to start by drawing lines from each corner of the three-dimensional cube to the eye. To map out the object’s shape on the canvas, we mark a point where each line intersects with the surface of the image plane. For example, let us say that c0 is a corner of the cube and that it is connected to three other points: c1, c2, and c3. After projecting these four points onto the canvas, we get c0′, c1′, c2′, and c3′. If c0c1 defines an edge, then we draw a line from c0′ to c1′. If c0c2 defines an edge, then we draw a line from c0′ to c2′.【To draw a cube on the image, the simplest method is to project the vertices and then connect the projected vertices with lines】
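The projection of the corners can be sketched in a few lines of Python. The sketch assumes the eye sits at the origin looking down the negative z-axis with the image plane at z = -1; the corner coordinates are made-up values for illustration.

```python
def project(point, image_plane_distance=1.0):
    """Project a 3D point onto the plane z = -image_plane_distance.

    Assumes the eye is at the origin looking down the negative z-axis.
    """
    x, y, z = point
    if z >= 0.0:
        raise ValueError("point must be in front of the eye (negative z)")
    scale = image_plane_distance / -z
    return (x * scale, y * scale)

# Hypothetical cube corner c0 and its three neighbours c1, c2, c3.
c0 = (-1.0, -1.0, -4.0)
c1 = (1.0, -1.0, -4.0)
c2 = (-1.0, 1.0, -4.0)
c3 = (-1.0, -1.0, -6.0)

# Each 3D edge becomes a 2D line between the projected end points.
edges = [(c0, c1), (c0, c2), (c0, c3)]
projected_edges = [(project(a), project(b)) for a, b in edges]
for start, end in projected_edges:
    print(start, "->", end)
```

The division by the depth is what produces perspective: c3 is farther from the eye than c0, so its projection lands closer to the center of the image.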


If we repeat this operation for the remaining edges of the cube, we will end up with a two-dimensional representation of the cube on the canvas. We have then created our first image using perspective projection. If we continually repeat this process for each object in the scene, what we get is an image of the scene as it appears from a particular vantage point. It was only at the beginning of the 15th century that painters started to understand the rules of perspective projection.【Repeat this for the remaining edges and the cube is drawn; repeat it for every object in the scene and the rendering is complete. Painters only began to understand perspective this way in the 15th century.】




Light and Color


Once we know where to draw the outline of the three-dimensional objects on the two-dimensional surface, we can add colors to complete the picture.【With the wireframe drawn, next comes coloring】


To summarize quickly what we have just learned: we can create an image from a three-dimensional scene in a two step process. The first step consists of projecting the shapes of the three-dimensional objects onto the image surface (or image plane). This step requires nothing more than connecting lines from the objects’ features to the eye. An outline is then created by going back and drawing on the canvas where these projection lines intersect the image plane. As you may have noticed, this is a geometric process. The second step consists of adding colors to the picture’s skeleton.【Quick summary: creating an image takes two steps, projection first and then coloring】


An object’s color and brightness, in a scene, is mostly the result of lights interacting with an object’s materials. Light is made up of photons (electromagnetic particles) that have, in other words, an electric component and a magnetic component. They carry energy and oscillate like sound waves as they travel in straight lines. Photons are emitted by a variety of light sources, the most notable example being the sun. If a group of photons hit an object, three things can happen: they can be either absorbed, reflected or transmitted. The percentage of photons reflected, absorbed, and transmitted varies from one material to another and generally dictates how the object appears in the scene. However, the one rule that all materials have in common is that the total number of incoming photons is always the same as the sum of reflected, absorbed and transmitted photons. In other words, if we have 100 photons illuminating a point on the surface of the object, 60 might be absorbed and 40 might be reflected. The total is still 100. In this particular case, we will never tally 70 absorbed and 60 reflected, or 20 absorbed and 50 reflected because the total of transmitted, absorbed and reflected photons has to be 100.【An object's color and brightness result from the combination of its material and the lighting; the detailed explanation is standard optics.】


In science, we only differentiate two types of materials: metals, which are called conductors, and dielectrics. Dielectrics include things such as glass, plastic, wood, water, etc. These materials have the property of being electrical insulators (pure water is an electrical insulator). Note that a dielectric material can either be transparent or opaque. Both the glass balls and the plastic balls in the image below are dielectric materials. In fact, every material is in one way or another transparent to some sort of electromagnetic radiation. X-rays for instance can pass through the body.【Materials fall into conductors (metals) and dielectrics; a dielectric can be either transparent or opaque, and an opaque one blocks light from passing through.】


An object can also be made out of a composite, or a multi-layered, material. For example, one can have an opaque object (let’s say wood for example) with a transparent coat of varnish on top of it (which makes it look both diffuse and shiny at the same time like the colored plastic balls in the image below).【There is also the translucent case, such as skin, which can be treated as a multi-layered material】



Let’s consider the case of opaque and diffuse objects for now. To keep it simple, we will assume that the absorption process is responsible for the object’s color. White light is made up of “red”, “blue”, and “green” photons. If a white light illuminates a red object, the absorption process filters out (or absorbs) the “green” and the “blue” photons. Because the object does not absorb the “red” photons, they are reflected. This is the reason why this object appears red. Now, the reason we see the object at all is because some of the “red” photons reflected by the object travel towards us and strike our eyes. Each point on an illuminated area, or object, radiates (reflects) light rays in every direction. Only one ray from each point strikes the eye perpendicularly and can therefore be seen. Our eyes are made of photoreceptors that convert the light into neural signals. Our brain is then able to use these signals to interpret the different shades and hues (how, we are not exactly sure). This is a very simplistic approach to describe the phenomena involved. Everything is explained in more detail in the lesson on color (which you can find in the section Mathematics and Physics for Computer Graphics).【An example of how lighting works; this is junior-high physics, so no further explanation】



Like the concept of perspective projection, it took a while for humans to understand light. The Greeks developed a theory of vision in which objects are seen by rays of light emanating from the eyes. An Arab scientist, Ibn al-Haytham (c. 965-1039), was the first to explain that we see objects because the sun’s rays of light, streams of tiny particles traveling in straight lines, are reflected from objects into our eyes, forming images (Figure 3). Now let us see how we can simulate nature with a computer!【He was the first to explain that we see objects because of light. Next we look at how to simulate this physical phenomenon on a computer.】
































































SIGGRAPH 15 – The Real-time Volumetric Cloudscapes of Horizon: Zero Dawn


For slides with proper formatting and video/audio use the PPTX version.


The following was presented at SIGGRAPH 2015 as part of the Advances in Real-time rendering Course. http://advances.realtimerendering.com


Authors: Andrew Schneider – Principal FX Artist, Nathan Vos – Principal Tech Programmer



Thank you for coming.


Over the next half hour I am going to be breaking down and explaining the cloud system for Horizon Zero Dawn.

【Next, an introduction to the cloud system】


As Natasha mentioned, my background is in Animated film VFX, with experience programming for voxel systems including clouds.

【The author comes from animated-film VFX and has a background in voxel systems】


This was co-developed between myself and a programmer named Nathan Vos. He could not be here today, but his work is an important part of what we were able to achieve with this.


Horizon was just announced at E3 this year, and this is the first time that we are sharing some of our new tech with the community. What you are seeing here renders in about 2 milliseconds, takes 20 MB of RAM and completely replaces our asset-based cloud solutions in previous games.


Before I dive into our approach and justification for those 2 milliseconds, let me give you a little background to explain why we ended up developing a procedural volumetric system for skies in the first place.

【First, some background on why a procedural volumetric system is used for skies.】


In the past, Guerrilla has been known for the KILLZONE series of games, which are first-person shooters.



FPS usually restrict the player to a predefined track, which means that we could hand place elements like clouds using billboards and highly detailed sky domes to create a heavily art directed sky.



These domes and cards were built in Photoshop by one artist using stock photography. As Time of day was static in the KILLZONE series, we could pre-bake our lighting to one set of images, which kept ram usage and processing low.



By animating these dome shaders we could create some pretty detailed and epic skyscapes for our games.


Horizon is a very different kind of game…

【Horizon is a very different kind of game.】



Horizon trailer



So, from that you could see that we have left the world of Killzone behind.




•Horizon is a vastly open world where you can pretty much go anywhere that you see, including the tops of mountains.【A huge open world you can roam freely, mountain tops included.】

•Since this is a living real world, we simulate the spinning of the earth by having a time of day cycle.【A simulated day/night cycle.】

•Weather is part of the environment so it will be changing and evolving as well.【A weather system.】

•There’s lots of epic scenery: Mountains, forests, plains, and lakes.【Epic scenery: mountains, forests, plains, lakes.】

•Skies are a big part of the landscape of Horizon. They make up half of the screen. Skies are also a very important part of storytelling as well as world building.【The sky matters a lot: it usually fills half of the screen and is an important storytelling backdrop.】



They are used to tell us where we are, when we are, and they can also be used as thematic devices in storytelling.




For Horizon, we want the player to really experience the world we are building. So we decided to try something bold. We prioritized some goals for our clouds.




•Realistic, representing multiple cloud types【Realistically depict many cloud shapes.】

•Integrate with weather【Integrate with the weather.】

•Evolve in some way【Evolve over time in some way.】

•And of course, they needed to be Epic!【Look beautiful.】



Realistic CG clouds are not an easy nut to crack. So, before we tried to solve the whole problem of creating a sky full of them, we thought it would be good to explore different ways to make and light individual cloud assets.

【Realistic CG clouds are a hard nut to crack, so before tackling a whole sky we first explored different ways to build and light individual cloud assets.】



Our earliest successful modeling approach was to use a custom fluid solver to grow clouds. The results were nice, but this was hard for artists to control if they had not had any fluid simulation experience. Guerrilla is a game studio after all.




We ended up modeling clouds from simple shapes,

voxelizing them, and then

running them through our fluid solver

until we got a cloud-like shape.




And then we developed a lighting model that we used to pre-compute primary and secondary scattering,

•I’ll get into our final lighting model a little later, but the result you see here is computed on the CPU in Houdini in 10 seconds.




We explored 3 ways to get these cloud assets into game.



•For the first, we tried to treat our clouds as part of the landscape, literally modeling them as polygons from our fluid simulations and baking the lighting data using spherical harmonics. This only worked for the thick clouds and not wispy ones…




So, we thought we should try to enhance the billboard approach to support multiple orientations and times of day. We succeeded but we found that we couldn’t easily reproduce inter-cloud shadowing. So…




•We tried rendering all of our voxel clouds as one cloud set to produce sky domes that could also blend into the atmosphere over depth. Sort of worked.

【We tried rendering all of the voxel clouds as a single cloud set, producing a sky dome that also blends into the atmosphere over depth.】


•At this point we took a step back to evaluate what didn’t work. None of the solutions made the clouds evolve over time. There was not a good way to make clouds pass overhead. And there was high memory usage and overdraw for all methods.

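【Looking back at these attempts: none of them made the clouds evolve over time, there was no good way to make clouds pass overhead, and all of them had high memory usage and overdraw.】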


•So maybe a traditional asset based approach was not the way to go.




Well, What about voxel clouds?

OK we are crazy we are actually considering voxel clouds now…

As you can imagine this idea was not very popular with the programmers.



Volumetrics are traditionally very expensive

With lots of texture reads

Ray marches

Nested loops



However, there are many proven methods for fast, believable volumetric lighting

There is convincing work to use noise to model clouds. I can refer you to the 2012 Production Volume Rendering course.

Could we solve the expense somehow and benefit from all of the look advantages of volumetrics?




Our first test was to stack up a bunch of polygons in front of the camera and sample 3d Perlin noise with them. While extremely slow, this was promising, but we wanted to represent multiple cloud types, not just these bandy clouds.




So we went into Houdini and generated some tiling 3d textures out of the simulated cloud shapes. Using Houdini’s GL extensions, we built a prototype GL shader to develop a cloud system and lighting model.

【We then used Houdini to generate 3D textures and used Houdini’s GL extensions to develop a cloud system and lighting model.】

【About the Houdini software: https://zh.wikipedia.org/wiki/Houdini】



In the end, with a LOT of hacks, we got very close to mimicking our reference. However, it all fell apart when we put the clouds in motion. It also took 1 second per frame to compute. For me, coming from animated VFX, this was pretty impressive, but my colleagues were still not impressed.



So I thought, Instead of explicitly defining clouds with pre-determined shapes, what if we could develop some good noises at lower resolutions that have the characteristics we like and then find a way to blend between them based on a set of rules. There has been previous work like this but none of it came close to our look goals.




This brings us to the clouds system for horizon. To explain it better I have broken it down into 4 sections: Modeling, Lighting, Rendering and Optimization.

【The cloud system is broken down into four parts: modeling, lighting, rendering, and optimization.】


Before I get into how we modeled the cloud scapes, it would be good to have a basic understanding of what clouds are and how they evolve into different shapes.




Classifying clouds helped us better communicate what we were talking about and define where we would draw them.


The basic cloud types are as follows.【Basic cloud types】

•The strato clouds, including stratus, cumulus and stratocumulus【low-level clouds】

•The alto clouds, which are those bandy or puffy clouds above the strato layer【mid-level clouds】

•And the cirro clouds, those big arcing bands and little puffs in the upper atmosphere.【high-level clouds】

•Finally there is the granddaddy of all cloud types, the cumulonimbus clouds, which go high into the atmosphere.【cumulonimbus, towering up through the layers】

•For comparison, Mount Everest is above 8,000 meters.【for scale: roughly 8,000 m】



After doing research on cloud types, we had a look into the forces that shape them. The best source we had was a book from 1961 by two meteorologists, called “The Clouds” as creatively as research books from the 60’s were titled. What it lacked in charm it made up for with useful empirical results and concepts that help with modeling a cloud system.



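§Density increases at lower temperatures【Density rises as temperature drops.】

§Temperature decreases over altitude【The higher the altitude, the lower the temperature.】

§High densities precipitate as rain or snow【High densities fall as rain or snow.】

§Wind direction varies over altitude【Wind direction differs by altitude.】

§They rise with heat from the earth【Heat from the ground makes clouds rise.】

§Dense regions make round shapes as they rise【Dense regions form round shapes as they rise.】

§Light regions diffuse like fog【Thin regions spread out like fog.】

§Atmospheric turbulence further distorts clouds.【Atmospheric turbulence distorts the clouds further.】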


These are all abstractions that are useful when modeling clouds




Our modeling approach uses ray marching to produce clouds.



We march from the camera and sample noises and a set of gradients to define our cloud shapes using a sampler




In a ray march you use a sampler to…


Build up an alpha channel….

And calculate lighting




There are many examples of real-time volume clouds on the internet. The usual approach involves drawing them in a height zone above the camera using something called fBm, Fractal Brownian Motion. This is done by layering Perlin noises of different frequencies until you get something detailed.

【There are many volumetric cloud examples online. Most draw the clouds in a zone above the camera using fBm, that is, layering Perlin noise until the result looks detailed enough.】




This noise is then usually combined somehow with a gradient to define a change in cloud density over height
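The standard approach described here can be sketched in a few lines. This is only an illustration under stated assumptions: `smooth_noise` is a simple interpolated lattice noise standing in for Perlin noise, and the shape of `height_gradient`, the layer altitudes used as defaults, and the `scale` factor are hypothetical choices, not values from the talk.

```python
import math

def lattice_value(ix, iy, iz):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    h = (ix * 374761393 + iy * 668265263 + iz * 1274126177) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return (h ^ (h >> 16)) / 4294967296.0

def smooth_noise(x, y, z):
    """Trilinearly interpolated lattice noise (a stand-in for Perlin noise)."""
    ix, iy, iz = math.floor(x), math.floor(y), math.floor(z)
    # Smoothstep fade of the fractional part on each axis.
    tx, ty, tz = (f * f * (3.0 - 2.0 * f) for f in (x - ix, y - iy, z - iz))
    total = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((tx if dx else 1.0 - tx) *
                          (ty if dy else 1.0 - ty) *
                          (tz if dz else 1.0 - tz))
                total += weight * lattice_value(ix + dx, iy + dy, iz + dz)
    return total

def fbm(x, y, z, octaves=5):
    """Layer noise octaves: each one doubles the frequency and halves the amplitude."""
    total, norm, amplitude, frequency = 0.0, 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amplitude * smooth_noise(x * frequency, y * frequency, z * frequency)
        norm += amplitude
        amplitude *= 0.5
        frequency *= 2.0
    return total / norm

def height_gradient(h):
    """Density falloff over normalized height h in [0, 1] (hypothetical shape)."""
    if h <= 0.0 or h >= 1.0:
        return 0.0
    return min(h / 0.1, 1.0) * min((1.0 - h) / 0.4, 1.0)

def cloud_density(x, y, z, layer_bottom=1500.0, layer_top=4000.0, scale=0.001):
    """fBm noise multiplied by a height gradient inside the cloud layer."""
    h = (y - layer_bottom) / (layer_top - layer_bottom)
    return fbm(x * scale, y * scale, z * scale) * height_gradient(h)
```

Because the gradient is zero at the bottom and top of the layer, the density fades out at both ends no matter what the noise returns.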




This makes some very nice but very procedural looking clouds.

What’s wrong?

There are no larger governing shapes or visual cues as to what is actually going on here. We don’t feel the implied evolution of the clouds from their shapes.





By contrast, in this photograph we can tell what is going on here. These clouds are rising like puffs of steam from a factory. Notice the round shapes at the tops and wispy shapes at the bottoms.




This fBm approach has some nice wispy shapes, but it lacks those bulges and billows that give a sense of motion. We need to take our shader beyond what you would find on something like Shader Toy.




These billows, as I’ll call them…

…are packed, sometimes taking on a cauliflower shape.

Since Perlin noise alone doesn’t cut it, we developed our own layered noises.

【Clouds often need this cauliflower look, which Perlin noise alone cannot produce, so we developed our own layered noises.】



Worley noise was introduced in 1996 by Steven Worley and is often used for caustics and water effects. If it is inverted as you see here:

It makes tightly packed billow shapes.

We layered it like the standard Perlin fBm approach

【Inverted Worley noise gives these tightly packed billow shapes; we first layered it into octaves.】


Then we used it as an offset to dilate Perlin noise. This allowed us to keep the connectedness of Perlin noise but add some billowy shapes to it.

We referred to this as Perlin-Worley noise.

【It is then used as an offset to dilate the Perlin noise; the result is shown below.】
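A rough sketch of inverted Worley noise and of one way to combine it with a Perlin value follows. The talk does not give the formula at this point, so `perlin_worley` (a remap that raises the floor of the Perlin value to the inverted Worley value) is an assumption, as are the hash constants and the helper names.

```python
import math

def _hash01(cx, cy, cz, k):
    """Deterministic pseudo-random value in [0, 1) for a cell and a channel k."""
    h = ((cx * 73856093) ^ (cy * 19349663) ^ (cz * 83492791) ^ (k * 2654435761)) & 0xFFFFFFFF
    h = ((h ^ (h >> 15)) * 2246822519) & 0xFFFFFFFF
    return (h ^ (h >> 13)) / 4294967296.0

def feature_point(cx, cy, cz):
    """One pseudo-random feature point inside the unit cell (cx, cy, cz)."""
    return (cx + _hash01(cx, cy, cz, 1),
            cy + _hash01(cx, cy, cz, 2),
            cz + _hash01(cx, cy, cz, 3))

def worley(x, y, z):
    """Distance to the nearest feature point, clamped to [0, 1]."""
    cx, cy, cz = math.floor(x), math.floor(y), math.floor(z)
    best = float("inf")
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                fx, fy, fz = feature_point(cx + dx, cy + dy, cz + dz)
                dist = math.sqrt((x - fx) ** 2 + (y - fy) ** 2 + (z - fz) ** 2)
                best = min(best, dist)
    return min(best, 1.0)

def inverted_worley(x, y, z):
    """Inverted Worley noise: 1 at feature points, falling off into round billows."""
    return 1.0 - worley(x, y, z)

def remap(value, lo, hi, new_lo, new_hi):
    """Linearly map value from [lo, hi] to [new_lo, new_hi]."""
    return new_lo + (value - lo) / (hi - lo) * (new_hi - new_lo)

def perlin_worley(perlin_value, inverted_worley_value):
    """Assumed combination: dilate Perlin noise by lifting its floor to the Worley value."""
    return remap(perlin_value, 0.0, 1.0, inverted_worley_value, 1.0)
```

With this reading, the result never drops below the billow shape, so the Perlin noise stays connected while the billows show through.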



In games, it is often best for performance to store noises as tiling 3d textures.

【For performance, games generally use pre-generated tiling 3D noise textures.】


You want to keep texture reads to a minimum

and keep the resolutions as small as possible.

In our case we have compressed our noises to

two 3d textures

and one 2d texture.




The first 3d Texture…


has 4 channels…

it is 128^3 resolution…

The first channel is the Perlin-Worley noise I just described.

The other 3 are Worley noise at increasing frequencies. Like in the standard approach, This 3d texture is used to define the base shape for our clouds.




Our second 3d texture…


has 3 channels…

it is 32^3 resolution…

and uses Worley noise at increasing frequencies. This texture is used to add detail to the base cloud shape defined by the first 3d noise.




Our 2D texture…


has 3 channels…

it is 128^2 resolution…

and uses curl noise, which is non-divergent and is used to fake fluid motion. We use this noise to distort our cloud shapes and add a sense of turbulence.




Recall that the standard solution calls for a height gradient to change the noise signal over altitude. Instead of 1, we use…

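【Recall that the standard approach found online uses a height gradient to change the noise signal over altitude. We do the same, but with three gradients instead of one.】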


3 mathematical presets that represent the major low altitude…

cloud types when we blend between them at the sample position.

We also have a value telling us how much cloud coverage we want to have at the sample position. This is a value between zero and 1.




What we are looking at on the right side of the screen is a view rotated about 30 degrees above the horizon. We will be drawing clouds per the standard approach in a zone above the camera.



First, we build a basic cloud shape by sampling our first 3d texture and multiplying it by our height signal.

【First we build the basic cloud shape by sampling the first 3D texture and multiplying it by the height signal; see the formula in the slides.】


The next step is to multiply the result by the coverage and reduce density at the bottoms of the clouds.




This ensures that the bottoms will be wispy and it increases the presence of clouds in a more natural way. Remember that density increases over altitude. Now that we have our base cloud shape, we add details.




The next step is to…


erode the base cloud shape by subtracting the second 3d texture at the edges of the cloud.

Little tip: if you invert the Worley noise at the base of the clouds you get some nice wispy shapes.

【The second 3D texture erodes the edges of the cloud shape. The tip: inverting the Worley noise at the base of the clouds gives nice wispy shapes.】


We also distort this second noise texture by our 2d curl noise to fake the swirly distortions from atmospheric turbulence as you can see here…

【We also distort the second noise texture with the 2D curl noise to fake the swirling caused by atmospheric turbulence.】
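The modeling steps described so far (base shape times height signal, coverage, thinner bottoms, edge erosion by the detail noise) can be put together as a small sampler. This is a sketch, not the shader from the talk: the noise values are passed in as plain numbers, and the constants `0.15`, `5.0` and `erosion=0.35` are hypothetical tuning values.

```python
def clamp01(value):
    return max(0.0, min(1.0, value))

def sample_density(base_noise, detail_noise, height_signal, coverage,
                   height_fraction, erosion=0.35):
    """Combine the sampled noises into one density value in [0, 1].

    base_noise      -- value from the first (low frequency) 3D texture
    detail_noise    -- value from the second (high frequency) 3D texture
    height_signal   -- blended height gradient for the current cloud type
    coverage        -- cloud coverage at the sample position, 0..1
    height_fraction -- normalized height inside the cloud layer, 0..1
    """
    # 1. Base shape: first 3D texture multiplied by the height signal.
    base = base_noise * height_signal
    # 2. Apply coverage and reduce density at the bottoms of the clouds.
    base *= coverage
    base *= clamp01(height_fraction / 0.15)
    # 3. Near the cloud base use the inverted detail noise (wispy shapes),
    #    higher up use it as is (billowy shapes).
    blend = clamp01(height_fraction * 5.0)
    detail = (1.0 - detail_noise) * (1.0 - blend) + detail_noise * blend
    # 4. Erode only where the base shape is thin, i.e. at the cloud edges.
    edge_weight = 1.0 - clamp01(base)
    return clamp01(base - erosion * detail * edge_weight)

def distort_lookup(position, curl, height_fraction, strength=0.1):
    """Offset the detail-texture lookup position by a 2D curl noise vector."""
    x, y, z = position
    return (x + curl[0] * strength * (1.0 - height_fraction), y,
            z + curl[1] * strength * (1.0 - height_fraction))
```

Weighting the erosion by `edge_weight` keeps dense cloud cores intact while the thin borders get broken up by the detail noise.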



Here’s what it looks like in game. I’m adjusting the coverage signal to make them thicker and then transitioning between the height gradients for cumulus to stratus.

【The in-game result: coverage makes the clouds thicker, and the height gradient switches the cloud type from cumulus to stratus.】


Now that we have decent stationary clouds we need to start working on making them evolve as part of our weather system.




These two controls, cloud coverage and cloud type are a FUNCTION of our weather system.



There is an additional control for Precipitation that we use to draw rain clouds.

【A further control: the precipitation value drives how many rain clouds are drawn.】



Here in this image you can see a little map down in the lower left corner. This represents the weather settings that drive the clouds over our section of world map. The pinkish white pattern you see is the output from our weather system. Red is coverage, Green is precipitation and blue is cloud type.



The weather system modulates these channels with a simulation that progresses during gameplay. The image here has Cumulus rain clouds directly overhead (white) and regular cumulus clouds in the distance. We have controls to bias the simulation to keep things art direct-able in a general sense.




The default condition is a combination of cumulus and stratus clouds. The areas that are more red have less of the blue signal, making them stratus clouds. You can see them in the distance at the center bottom of the image.




The precipitation signal transitions the map from whatever it is to cumulonimbus clouds at 70% coverage
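As an illustration of how the three weather channels could drive the sampler, here is a small sketch. The channel meaning (red coverage, green precipitation, blue cloud type) and the 70% coverage target come from the talk; treating cloud type 1.0 as the tallest preset, and the triangle-shaped weights in `gradient_weights`, are assumptions made for this example.

```python
def lerp(a, b, t):
    return a + (b - a) * t

def clamp01(value):
    return max(0.0, min(1.0, value))

def decode_weather(red, green, blue):
    """Turn one weather-map texel into (coverage, cloud_type)."""
    coverage, precipitation, cloud_type = red, green, blue
    # Precipitation pulls the sample toward rain clouds at 70% coverage.
    coverage = lerp(coverage, 0.7, precipitation)
    # Assumption: cloud_type 1.0 selects the tallest height preset.
    cloud_type = lerp(cloud_type, 1.0, precipitation)
    return coverage, cloud_type

def gradient_weights(cloud_type):
    """Blend weights for three height presets: (stratus, stratocumulus, cumulus)."""
    stratus = 1.0 - clamp01(cloud_type * 2.0)
    stratocumulus = 1.0 - abs(cloud_type - 0.5) * 2.0
    cumulus = clamp01(cloud_type - 0.5) * 2.0
    return stratus, stratocumulus, cumulus
```

The three weights always sum to one, so the blended height signal stays in the same range as each individual preset.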




The precipitation control not only adjusts clouds but it creates rain effects. In this video I am increasing the chance of precipitation gradually to 100%




If we increase the wind speed and make sure that there is a chance of rain, we can get Storm clouds rolling in and starting to drop rain on us. This video is sped up, for effect, btw. Ahhh… Nature Sounds.




We also use our weather system to make sure that clouds at the horizon are always interesting and poke above mountains.



We draw the cloudscapes within a 35,000 meter radius around the player…

and Starting at a distance of 15,000 meters…

we start transitioning to cumulus clouds at around 50% coverage.

【The cloudscape is drawn within a 35,000 m radius around the player; from 15,000 m out it starts transitioning to cumulus clouds at about 50% coverage.】



This ensures that there is always some variety and ‘epicness’ to the clouds on the horizon.

So, as you can see, the weather system produces some nice variation in cloud type and coverage.




In the case of the e3 trailer, We overrode the signals from the weather system with custom textures. You can see the corresponding textures for each shot in the lower left corner. We painted custom skies for each shot in this manner.

【For the E3 trailer the weather signals were replaced with custom-painted maps, shown in the lower left corner of each shot.】



So to sum up our modeling approach…



we follow the standard ray-march/ sampler framework but we build the clouds with two levels of detail

a low frequency cloud base shape and high frequency detail and distortion

Our noises are custom and made from Perlin, Worley and Curl noise

We use a set of presets for each cloud type to control density over height and cloud coverage

These are driven by our weather simulation or by custom textures for use with cut scenes and it is all animated in a given wind direction.



Cloud lighting is a very well researched area in computer graphics. The best results tend to come from high numbers of samples. In games, when you ask what the budget will be for lighting clouds, you might very well be told “Zero”. We decided that we would need to examine the current approximation techniques to reproduce the 3 most important lighting effects for us.

【Cloud lighting is a well-researched area. The best results come from taking many samples, which is very expensive, so real-time rendering in games needs good approximations.】



The directional scattering or luminous quality of clouds…

The silver lining when you look toward the sun through a cloud…

And the dark edges visible on clouds when you look away from the sun.



The first two have standard solutions but the third is something we had to solve ourselves.




When light enters a cloud

The majority of the light rays spend their time refracting off of water droplets and ice inside of the cloud before heading to our eyes.




By the time the light ray finally exits the cloud it could have been out-scattered, absorbed by the cloud, or combined with other light rays in what is called in-scattering.



In film vfx we can afford to spend time gathering light and accurately reproducing this, but in games we have to use approximations. These three behaviors can be thought of as probabilities and there is a Standard way to approximate the result you would get.




Beer’s law states that we can determine the amount of light reaching a point based on the optical thickness of the medium that it travels through. With Beer’s law, we have a basic way to describe the amount of light at a given point in the cloud.


If we substitute energy for transmittance and depth in the cloud for thickness, and draw this out, you can see that energy exponentially decreases over depth. This forms the foundation of our lighting model.

【Beer’s law relates the thickness of the cloud to the loss of light energy; it is the foundation of the lighting model used here.】
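Beer's law as used here is just an exponential falloff of energy with depth. A minimal sketch, where the `extinction` parameter is an added assumption (the talk only speaks of depth):

```python
import math

def beer(depth, extinction=1.0):
    """Fraction of light energy left after travelling `depth` through the cloud."""
    return math.exp(-extinction * depth)
```

A useful property of the exponential is that crossing two slabs one after the other attenuates exactly like crossing one slab of the combined thickness, which is why densities can simply be summed along a ray before the exponential is applied.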



But there is another component contributing to the light energy at a point. It is the probability of light scattering forward or backward in the cloud. This is responsible for the silver lining in clouds, one of our look goals.




In clouds, there is a higher probability of light scattering forward. This is called Anisotropic scattering.

【Light entering a cloud scatters anisotropically.】


In 1941, the Henyey-Greenstein model was developed to help astronomers with light calculations at galactic scales, but today it is used to reliably reproduce Anisotropy in cloud lighting.

【The Henyey-Greenstein model was originally developed for astronomy; here it reproduces the anisotropic brightness of clouds.】



Each time we sample light energy, we multiply it by The Henyey-Greenstein phase function.

【Every time we sample light energy, we multiply it by the Henyey-Greenstein phase function.】



Here you can see the result. On the left is Just the beers law portion of our lighting model. On the right we have applied the Henyey-Greenstein phase function. Notice that the clouds are brighter around the sun on the right.

【Result: the left image uses only Beer’s law, the right image adds the Henyey-Greenstein phase function.】
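The Henyey-Greenstein phase function has a well-known closed form, sketched below. The eccentricity `g` is a free parameter; the value 0.6 in `light_energy` is only a placeholder, not a value from the talk.

```python
import math

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function.

    cos_theta -- cosine of the angle between the light direction and the view ray
    g         -- eccentricity in (-1, 1); g > 0 favours forward scattering
    """
    return (1.0 - g * g) / (4.0 * math.pi * (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5)

def light_energy(depth, cos_theta, g=0.6):
    """Beer's law attenuation multiplied by the phase function."""
    return math.exp(-depth) * henyey_greenstein(cos_theta, g)
```

For positive `g` the function peaks when looking toward the light, which is what brightens the clouds around the sun; for `g = 0` it reduces to uniform scattering, `1 / (4π)`.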



But we are still missing something important, something that is often forgotten. The dark edges on clouds. This is something that is not as well documented with solutions so we had to do a thought experiment to understand what was going on.




Think back to the random walk of a light ray through a cloud.



If we compare a point inside of the cloud to one near the surface, the one inside would receive more in scattered light. In other words, Cloud material, if you want to call it that, is a collector for light. The deeper you are in the surface of a cloud, the more potential there is for gathered light from nearby regions until the light begins to attenuate, that is.



This is extremely pronounced in round formations on clouds, so much so that the crevices appear…



to be lighter than the bulges and edges because they receive a small boost of in-scattered light.

Normally in film, we would take many many samples to gather the contributing light at a point and use a more expensive phase function. You can get this result with brute force. If you were in Magnus Wrenninge’s multiple scattering talk yesterday there was a very good example of how to get this. But in games we have to find a way to approximate this.




A former colleague of mine, Matt Wilson, from Blue Sky, said that there is a similar effect in piles of powdered sugar. So, I’ll refer to this as the powdered sugar look.




Once you understand this effect, you begin to see it everywhere. It cannot be un-seen.

Even in light wispy clouds. The dark gradient is just wider.




The reason we do not see this effect automatically is because our transmittance function is an approximation and doesn’t take it into account.



The surface of the cloud is always going to have the same light energy that it receives. Let’s think of this effect as a statistical probability based on depth.




As we go deeper in the cloud, our potential for in scattering increases and more of it will reach our eye.



If you combine the two functions you get something that describes this effect as well as the traditional approach.

I am still looking for the Beer’s-Powder approximation method in the ACM digital library and I haven’t found anything mentioned with that name yet.
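The transcript does not spell out the combined function, so the sketch below is an assumption: a "powder" term that grows with depth, `1 - exp(-2d)`, multiplied by the Beer's law term and scaled by 2 to bring the peak brightness back up.

```python
import math

def beer(depth):
    """Beer's law: energy falls off exponentially with depth."""
    return math.exp(-depth)

def powder(depth):
    """Assumed in-scattering term: zero at the surface, approaching 1 deeper in."""
    return 1.0 - math.exp(-2.0 * depth)

def beer_powder(depth):
    """Assumed combination of the two terms (the factor 2 is a normalization guess)."""
    return 2.0 * beer(depth) * powder(depth)
```

With this form the energy is zero right at the cloud surface (the dark edge), rises over a short distance, and then follows the usual exponential falloff once the powder term saturates.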




Let’s visually compare the components of our directional lighting model:

The Beer’s law component, which handles the primary scattering,

the powder sugar effect, which produces the dark edges facing the light,

And their combination in our final result.




Here you can see what the Beer’s law and the combined Beer’s law and powder effect look like when viewed from the light source. This is a pretty good approximation of our reference.

【Combining the two into the Beer’s-Powder approximation gives a very good result.】



In game, it adds a lot of realism to the Thicker clouds and helps sell the scale of the scene.




But we have to remember that this is a view dependent effect. We only see it where our view vector approaches the light vector, so the powder function should account for this gradient as well.




Here is a panning camera view that shows this effect increasing as we look away from the sun.




The last part of our lighting model is that we artificially darken the rain clouds by increasing the light absorption where they exist.




So, in review our model has 3 components:


Beer’s Law


our powder sugar effect

And Absorption increasing for rain clouds




I have outlined How our sampler is used to model clouds and how our lighting algorithm simulates the lighting effects associated with them. Now I am going to describe how and where we take samples to build an image. And how we integrate our clouds into atmosphere and our time of day cycle.




The first part of rendering with a ray march is deciding where to start. In our situation, Horizon takes place on Earth and as most of you are aware… the earth ….. Is round.

The gases that make up our atmosphere wrap around the earth and clouds exists in different layers of the atmosphere.




When you are on a “flat” surface such as the ocean, you can clearly see how the curvature of the earth causes clouds to descend into the horizon.




For the purposes of our game we divide the clouds into two types in this spherical atmosphere.


•The low altitude volumetric strato class clouds between 1500 and 4000 meters…

•and the high altitude 2D alto and cirro class clouds above 4000 meters. The upper level clouds are not very thick, so this is a good area to reduce the expense of the shader by making them scrolling textures instead of multiple samples in the ray march.





By ray marching through a spherical atmosphere we can

ensure that clouds properly descend into the horizon.

It also means we can force the scale of the scene by shrinking the radius of the atmosphere.
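To find where a march through a spherical atmosphere starts and ends, one can intersect the view ray with two spherical shells. The sketch below assumes a camera below the cloud layer and a real-world planet radius; the talk notes that the radius can be shrunk to force the scale, so `planet_radius` is left as a parameter.

```python
import math

EARTH_RADIUS = 6_371_000.0  # metres; an assumed default, not a value from the talk

def distance_to_shell(camera_altitude, shell_altitude, elevation_deg,
                      planet_radius=EARTH_RADIUS):
    """Distance along a view ray to a spherical shell above the camera.

    elevation_deg is the angle of the ray above the local horizon
    (90 = straight up, 0 = toward the horizon).
    """
    r0 = planet_radius + camera_altitude   # camera distance from planet centre
    shell = planet_radius + shell_altitude
    b = r0 * math.sin(math.radians(elevation_deg))
    # Solve |origin + t * direction|^2 = shell^2 for the positive root t.
    return -b + math.sqrt(b * b - r0 * r0 + shell * shell)

def march_segment(elevation_deg, camera_altitude=0.0,
                  layer_bottom=1500.0, layer_top=4000.0):
    """Start and end distance of the ray inside the volumetric cloud layer."""
    return (distance_to_shell(camera_altitude, layer_bottom, elevation_deg),
            distance_to_shell(camera_altitude, layer_top, elevation_deg))
```

Looking straight up, the segment is just the layer thickness; toward the horizon it becomes far longer, which is why the march needs more potential samples there.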




In our situation we do not want to do any work or any expensive work where we don’t need to. So instead of sampling every point along the ray, we use our sampler’s two levels of detail as a way to do cheaper work until we actually hit a cloud.




Recall that the sampler has a low detail noise that makes a basic cloud shape

And a high detail noise that adds the realistic detail we need.

The high detail noise is always applied as an erosion from the edge of the base cloud shape.




This means that we only need to do the high detail noise and all of its associated instructions where the low detail sample returns a non zero result.

This has the effect of producing an isosurface that surrounds the area where our cloud could be.




So, when we take samples through the atmosphere, we do these cheaper samples at a larger step size until we hit a cloud isosurface. Then we switch to full samples with the high detail noise and all of its associated instructions. To make sure that we do not miss any high res samples, we always take a step backward before switching to high detail samples.




Once the alpha of the image reaches 1 we don’t need to keep sampling so we stop the march early.




If we don’t reach an alpha of one we have another optimization.

After several consecutive samples that return zero density, we switch back to the cheap march behavior until we hit something again or reach the top of the cloud layer.




Because of the fact that the ray length increases as we look toward the horizon, we start with

an initial potential 64 samples and end with a potential 128 at the horizon. I say potential because of the optimizations which can cause the march to exit early. And we really hope they do.

This is how we take the samples to build up the alpha channel of our image. To calculate light intensity we need to take more samples.
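The march described above (cheap samples at a larger step, one step back on a hit, full samples inside the cloud, back to cheap after several empty samples, early exit when opaque) can be sketched along a single ray parameter `t`. The two density callbacks, the step sizes, `zero_limit` and the `alpha_cutoff` of 0.99 are assumptions for illustration; front-to-back compositing never reaches exactly 1, hence the cutoff.

```python
import math

def ray_march_alpha(cheap_density, full_density, t_start, t_end, fine_step,
                    coarse_factor=2.0, zero_limit=6, alpha_cutoff=0.99):
    """Return (alpha, number_of_full_samples) for one ray."""
    t, alpha = t_start, 0.0
    in_cloud, zeros, full_samples = False, 0, 0
    while t < t_end and alpha < alpha_cutoff:
        if not in_cloud:
            if cheap_density(t) > 0.0:
                # Hit the cloud isosurface: step back so no detail is missed,
                # then continue with full samples at the fine step size.
                t = max(t_start, t - fine_step * coarse_factor)
                in_cloud, zeros = True, 0
            else:
                t += fine_step * coarse_factor
        else:
            density = full_density(t)
            full_samples += 1
            if density > 0.0:
                alpha += (1.0 - alpha) * (1.0 - math.exp(-density * fine_step))
                zeros = 0
            else:
                zeros += 1
                if zeros >= zero_limit:
                    in_cloud = False  # several empty samples: back to cheap mode
            t += fine_step
    return alpha, full_samples

# Demo: a slab of cloud between t = 10 and t = 20, and a completely clear ray.
slab = lambda t: 0.5 if 10.0 <= t <= 20.0 else 0.0
cloud_alpha, cloud_samples = ray_march_alpha(slab, slab, 0.0, 50.0, 0.5)
clear_alpha, clear_samples = ray_march_alpha(lambda t: 0.0, lambda t: 0.0, 0.0, 50.0, 0.5)
```

Note that `zero_limit` has to be larger than `coarse_factor`, otherwise a region where the cheap sample is non-zero but the full sample is empty would make the march step back as far as it advances.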




Normally what you do in a ray march like this is to take samples toward the light source, plug the sum into your lighting equation and then attenuate this value using the alpha channel until you hopefully exit the march early because your alpha has reached 1.




In our approach, we sample 6 times in a cone toward the sun. This smooths the banding we would normally get with 6 samples and weights our lighting function with neighboring density values, which creates a nice ambient effect. The last sample is placed far away from the rest in order to capture shadows cast by distant clouds.




Here you can see what our clouds look like with just alpha samples, with our 5 cone samples for lighting, and with the long distance cone sample.

To improve performance of these light samples, we switched to sampling the cheap version of our shader once the alpha of the image reached 0.3. This made the shader 2x faster.
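A sketch of the six light samples follows: five spread out in a widening cone toward the sun and one placed far away, with the cheap sampler taking over once the image alpha has reached 0.3. The kernel directions, the `spread` factor and the `far_steps` distance are hypothetical; only the sample counts and the 0.3 threshold come from the talk.

```python
# Hypothetical fixed unit vectors used to spread the samples inside the cone.
CONE_KERNEL = [(0.6, 0.0, 0.8), (-0.8, 0.6, 0.0), (0.0, -0.6, 0.8),
               (0.8, 0.0, -0.6), (-0.6, 0.8, 0.0)]

def light_sample_points(position, to_sun, step, spread=0.5, far_steps=18):
    """Five cone samples with growing radius plus one distant sample."""
    points = []
    for i, offset in enumerate(CONE_KERNEL, start=1):
        radius = spread * step * i
        points.append(tuple(p + d * step * i + o * radius
                            for p, d, o in zip(position, to_sun, offset)))
    # Sixth sample: far along the light direction, to catch distant shadows.
    points.append(tuple(p + d * step * far_steps for p, d in zip(position, to_sun)))
    return points

def density_toward_sun(position, to_sun, step, image_alpha,
                       full_density, cheap_density):
    """Sum density along the cone; the result feeds the lighting equation as depth."""
    density = cheap_density if image_alpha >= 0.3 else full_density
    return sum(density(p) * step
               for p in light_sample_points(position, to_sun, step))
```

The returned sum plays the role of the depth term in the Beer's law part of the lighting model.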




The lighting samples replace the lower case d, or depth, in the Beer’s law portion of our lighting model. This energy value is then attenuated by the depth of the sample in the cloud to produce the image as per the standard volumetric ray-marching approach.

【Energy formula: the light samples replace the depth term in the Beer’s law part of the model.】



The last step of our ray march was to sample the 2d cloud textures for the high altitude clouds




These were a collection of the various types of cirrus and alto clouds that were tiling and scrolling at different speeds and directions above the volumetric clouds.




In reality light rays of different frequencies are mixing in a cloud producing very beautiful color effects. Since we live in a world of approximations, we had to base cloud colors on some logical assumptions.

We color our clouds based on the following model:



Ambient sky contribution increases over height

Direct lighting would be dominated by the sun color

Atmosphere would occlude clouds over depth.


We add up our ambient and direct components and attenuate to the atmosphere color based on the depth channel.
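The coloring model above amounts to a sum followed by a blend. A minimal sketch with RGB tuples, where all parameter names are illustrative and the linear ramps are assumptions:

```python
def lerp3(a, b, t):
    """Component-wise linear interpolation between two RGB tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def cloud_color(ambient_sky, sun_color, atmosphere_color,
                height_fraction, light_energy, depth_fraction):
    """Ambient grows with height, direct light uses the sun color,
    and the result fades to the atmosphere color over depth."""
    ambient = tuple(c * height_fraction for c in ambient_sky)
    direct = tuple(c * light_energy for c in sun_color)
    lit = tuple(a + d for a, d in zip(ambient, direct))
    return lerp3(lit, atmosphere_color, depth_fraction)
```

Because every input is evaluated per frame, changing the time of day only changes the three input colors; nothing has to be pre-baked.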




Now, you can change the time of day in the game and the lighting and colors update automatically. This means no pre-baking and our unique memory usage for the entire sky is limited to the cost of 2 3d textures and 1 2d texture instead of dozens of billboards or sky domes.




To sum up what makes our rendering approach unique:



Sampler does “cheap” work unless it is potentially in a cloud

64-128 potential march samples, 6 light samples per march in a cone, when we are potentially in a cloud.

Light samples switch from full to cheap at a certain depth




The approach that I have described so far costs around 20 milliseconds.

(pause for laughter)

Which means it is pretty but, it is not fast enough to be included in our game. My co-developer and mentor on this, Nathan Vos, Had the idea that…




Every frame we could use a quarter res buffer to update 1 out of 16 pixels for each 4×4 pixel block within our final image.

We reproject the previous frame to ensure we have something persistent.




…and where we could not reproject, like the edge of the screen, We substitute the result from one of the low res buffers.

Nathan’s idea made the shader 10x faster or more when we render this at half res and use filters to upscale it.

It is pretty much the whole reason we are able to put this in our game. Because of this our target performance is around 2 milliseconds, most of that coming from the number of instructions.
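The update scheme can be illustrated without any actual reprojection: each frame a quarter-resolution buffer refreshes one pixel of every 4×4 block, so after 16 frames every pixel has been touched. The order in `UPDATE_ORDER` is a hypothetical pattern, and pixels that are not refreshed simply keep their old value here, standing in for the reprojected previous frame.

```python
# Hypothetical order in which the 16 positions of a 4x4 block get refreshed.
UPDATE_ORDER = [0, 8, 2, 10, 12, 4, 14, 6, 3, 11, 1, 9, 15, 7, 13, 5]

def block_offset(frame_index):
    """(x, y) position inside each 4x4 block that is refreshed this frame."""
    slot = UPDATE_ORDER[frame_index % 16]
    return slot % 4, slot // 4

def update_full_res(full, quarter, frame_index):
    """Scatter a quarter-resolution result into the full-resolution image."""
    ox, oy = block_offset(frame_index)
    for by, row in enumerate(quarter):
        for bx, value in enumerate(row):
            full[by * 4 + oy][bx * 4 + ox] = value

def refreshed_after(frames, width=8, height=8):
    """Count how many full-res pixels have been written after `frames` frames."""
    full = [[None] * width for _ in range(height)]
    for frame in range(frames):
        quarter = [[frame] * (width // 4) for _ in range(height // 4)]
        update_full_res(full, quarter, frame)
    return sum(value is not None for row in full for value in row)
```

Spreading consecutive updates across the block, rather than walking it row by row, is a common choice because it keeps the stale pixels evenly distributed.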




In review we feel that

We have largely achieved our initial goals. This is still a work in progress as there is still time left in the production cycle so we hope to improve performance and direct-ability a bit more. We’re also still working on our atmospheric model and weather system and we will be sharing more about this work in the future on our website and at future conferences.



All of this was captured on a PlayStation 4

And this solution was written in PSSL and C++



A number of sources were utilized in the development of this system. I have listed them here.

I would like to thank my co-developer, Nathan Vos, most of all.



Also some other Guerrillas..

Elco – weather system and general help with transition to games

Michal – supervising the shader development with me and Nathan

Jan Bart – for keeping us on target with our look goals

Marijn – for allowing me the time in the FX budget to work on this and for his guidance

Maarten van der Gaag – for some optimization ideas

Felix van den Bergh – for slaving away at making polygon clouds and voxel clouds in the early days

Vlad Lapotin – for his work testing out spherical harmonics

And to Hermen Hulst, manager of Guerrilla, for hiring me and for allowing us the resources and time to properly solve this problem for real-time.



Are there any questions?



Peace out.


































SIGGRAPH 15 – Learning from Failure: a Survey of Promising, Unconventional and Mostly Abandoned Renderers for ‘Dreams PS4’, a Geometrically Dense, Painterly UGC Game


this talk is about showing you some of the approaches that we tried and failed to make stick for our project. if you’re looking for something to take away, hopefully it could be inspiration or some points are places to start, where we left off. I also just think it’s interesting to hear about failures, and the lessons learnt along the way. it’s a classic story of the random walk of R&D…




spoiler section!


this is where we’re headed if you didn’t see it at e3 {e3 trailer}



back to the beginning


it all began with @antonalog doing an experiment with move controllers, and a DX11 based marching cubes implementation.






here he is! this was on PC, using playstation move controllers. the idea was to record a series of add & subtraction using platonic shapes with simple distance field functions





we use (R to L) cubic strokes(笔触), cylinders, cones, cuboids, ellipsoids, triangular prisms, donuts, biscuits, markoids*, pyramids.

(markoids are named for our own mark z who loves them; they’re super ellipsoids with variable power for x,y,z)


here’s the field for the primitives…


we called each primitive an ‘edit’,

we support a simple list, not tree, of CSG edits (no scene tree is used).

and models are made up of anything from 1 to 100,000 edits

with add, subtract or ‘color’ only, along with…


soft blend, which is effectively soft-max and soft-min functions.



here’s the field for the hard blend.


and the soft. I’ll talk more about the function for this in a bit. note how nicely defined and distance-like it is, everywhere!


[timelapse of dad’s head, with randomised colours] he’s 8,274 edits.

(an aside: MM artists Kareem, Jon B and Francis spent a LONG time developing artistic techniques like the ‘chiselled’ look you see above, with half made early versions of this tech. It’s their artistry which convinced us to carry on down this path. It can’t be overstated how important it is when making new kinds of tools, to actually try to use them in order to improve them. Thanks guys!).




the compound SDF function was stored in 83^3 fp16 volume texture blocks, incrementally updated as new edits arrived. each block was independently meshed using marching cubes on the compute shader;

at the time this was a pretty advanced use of CS (as evidenced by frequent compiler bugs/driver crashes) – many of the problems stemmed from issues with generating index buffers dynamically on the GPU.

the tech was based on histopyramids, which is a stream compaction technique where you count the number of verts/indices each cell needs, iteratively halve the resolution building cumulative ‘summed area’ tables, then push the totals back up to full resolution, which gives you a nice way to lookup for each cell where in the target VB/IB its verts should go. there’s lots of material online, just google it.
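A minimal 1D sketch of the histopyramid idea, written in Python rather than as a compute shader. The function names and the power-of-two cell count are assumptions for illustration only; the prototype described here ran over 3D cells on the GPU.

```python
def build_histopyramid(counts):
    """Build the pyramid bottom-up: each level sums pairs from the level below.

    counts[i] is how many vertices cell i wants to emit.
    Assumes len(counts) is a power of two.
    """
    levels = [list(counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def cell_offsets(levels):
    """Push totals back down: find where each cell's output starts in the buffer."""
    offsets = [0]  # the single root node starts at offset 0
    for level in reversed(levels[:-1]):
        finer = []
        for parent, base in enumerate(offsets):
            left_count = level[2 * parent]
            finer.append(base)               # left child starts where its parent starts
            finer.append(base + left_count)  # right child starts after the left child
        offsets = finer
    return offsets


levels = build_histopyramid([0, 2, 1, 0, 3, 0, 0, 1])
total_vertices = levels[-1][0]
offsets = cell_offsets(levels)
```

The top of the pyramid gives the total buffer size to allocate, and every cell can then write its vertices at its own offset without any synchronisation.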



the core idea of lists of simple SDF elements is still how all sculptures are made in dreams, and is the longest living thread. this was the opposite of a failure! it was our first pillar in the game.



Anton worked with Kareem, our art director, to get some pretty cool gestural UI going too; there’s minimal UI intrusion so artists can get into flow state. I think he was planning to implement classic z-brush style pull/smear/bend modifications of the field – which is probably what some of you may have thought we did first- but luckily he didn’t. Why? welllll………..




some early animation tests were done around this time to see what could be achieved – whether with purely with semi- or fully rigid pieces, or some other technique. The results were varied in quality and all over the place in art style – we didn’t know what we wanted to do, or what was possible; so we imagined lots of futures:



rigid-ish pieces (low resolution FFD deformer over rigid pieces):



competing with that was the idea of animating the edits themselves. the results were quite compelling –


this was an offline render using 3DS Max’s blob mode to emulate soft blends. but it shows the effect.

【the soft blend effect, emulated in 3DS Max】


this was in Anton’s PC prototype, re-evaluating and re-meshing every frame in realtime.

【every frame needs re-evaluation and re-meshing】


and there was a visual high bar, which everyone loved, inspired by the work of legendary claymation animator & film maker jan svankmajer



here we made stop motion by scrubbing through the edit history, time lapse style (just like the earlier dad’s head). and on a more complex head model… pretty expensive to re-evaluate every frame though!




however to achieve this, the SDF would need to be re-evaluated every frame. in the first pc prototype, we had effectively added each edit one at a time to a volume texture – it was great for incremental edits, but terrible for loading and animation. the goal of dreams is for UGC to be minimal size to download, so we can’t store the SDF fields themselves anyway – we need a fast evaluator!

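【but to achieve this, the SDF has to be re-evaluated every frame. in our first prototype, adding edits one at a time to a volume texture was great for incremental edits but terrible for loading and animation. we need a fast evaluator】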



Nevertheless, a plan was forming! the idea was this

{‘csg’ edit list => CS of doom => per object voxels => meshing? => per object poly model => scene graph render! profit!}



Before getting to rendering, I’d like to talk about the CS of doom, or evaluator as we call it. The full pipeline from edit list to renderable data is 40+ compute shaders in a long pipeline, but the “CS of doom” are a few 3000+ instruction shaders chained together that make the sparse SDF output. fun to debug on early PS4 hardware!

【first up: the CS (compute shader) of doom, i.e. the evaluator, which was a serious performance challenge】


here are some actual stats on dispatch counts for a model called crystal’s dad to be converted from an edit list to a point cloud and a filtered brick tree:

eval dispatch count: 60

sweep dispatch count: 91

points dispatch count: 459

bricker dispatch count: 73




We had limited the set of edits to exclude domain deformation or any non-local effects like blur (much to the chagrin of z-brush experienced artists), and our CSG trees were entirely right leaning, meaning they were a simple list. Simple is good!

so in *theory* we had an embarrassingly parallel problem on our hands. take a large list of 100k edits, evaluate them at every point in a ~1000^3 grid, mesh the result, voila! one object!

【baseline problem: 100k edits × a 1000^3 grid, with every grid point evaluated against every edit】



alas, that’s 100 trillion evaluations (10^5 edits × 10^9 grid points), which is too many.



anton wrote the first hierarchical prototype, which consisted of starting with a very coarse voxel grid, say 4x4x4


【revision one: use a hierarchical voxel grid】



building a list of edits that could possibly overlap each voxel, and then iteratively refining the voxels by splitting them and shortening the lists for each.




empty cells and full cells are marked early in the tree; cells near the boundary are split recursively to a resolution limit. (the diagram shows a split in 2×2, but we actually split by 4x4x4 in one go, which fits GCN’s 64 wide wavefronts and lets us make coherent scalar branches on primitive type etc) the decision of when to split a given cell, and when not to, is really tricky.
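A toy sketch of the hierarchical refinement described above, in Python. It is not the real evaluator: it assumes add-only sphere edits with hard blend, L2 distances, and a 2x2x2 split instead of 4x4x4, and every name in it is hypothetical.

```python
import math


def sphere_sdf(p, centre, radius):
    """Signed L2 distance from point p to a sphere (negative inside)."""
    return math.dist(p, centre) - radius


def refine(centre, half, edits, depth, max_depth, leaves):
    """Recursively split a cubic cell, shortening the edit list as we go."""
    reach = half * math.sqrt(3.0)  # distance from the cell centre to its corners
    nearby = []
    for sphere_centre, radius in edits:
        d = sphere_sdf(centre, sphere_centre, radius)
        if d < -reach:
            return  # the cell lies wholly inside this sphere: 'full', stop early
        if d <= reach:
            nearby.append((sphere_centre, radius))  # its surface may cross the cell
    if not nearby:
        return  # no edit can reach this cell: 'empty', stop early
    if depth == max_depth:
        leaves.append((centre, half, len(nearby)))
        return
    q = half / 2.0
    for dx in (-q, q):
        for dy in (-q, q):
            for dz in (-q, q):
                child = (centre[0] + dx, centre[1] + dy, centre[2] + dz)
                refine(child, q, nearby, depth + 1, max_depth, leaves)


# One sphere at the origin plus one far outside the domain.
edits = [((0.0, 0.0, 0.0), 0.5), ((10.0, 10.0, 10.0), 0.1)]
leaves = []
refine((0.0, 0.0, 0.0), 1.0, edits, 0, 3, leaves)
```

The far-away sphere is dropped at the root, so none of the leaf cells ever test against it. That is the whole point of the per-cell lists; soft blend makes the reach test considerably harder than shown here.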



if you err on the ‘too little split’ side, you get gaps in the model. most of the rendering backends we were trying required at least 1 to 1.5 voxels of valid data on each side of the mesh.

if you err on the ‘too much split’ side, you can easily get pathological cases where the evaluator ends up doing orders of magnitude too much work.


Also, the splits must be completely seamless. The quality constraints are much, much more stringent than what you’d need for something like sphere tracing.


Both Anton and I had a crack at various heuristic evaluators, but neither was perfect. And it was made worse by the fact that even some of our base primitives were pretty hard to compute ‘good’ distances for!




an aside on norms. everyone defaults to the L2 distance (ie sqrt(x^2+y^2+z^2)) because it’s the length we’re used to.



the L2 norm for boxes and spheres is easy. but the ellipsoid… not so much. Most of the public attempts at ‘closest point on an ellipsoid’ are either slow, unstable in corner cases, or both. Anton spent a LONG time advancing the state of the art, but it was a hard, hard battle.

【distance metric: L2 norm, sqrt(x^2 + y^2 + z^2)】


Ellipsoid: https://www.shadertoy.com/view/ldsGWX

Spline: https://www.shadertoy.com/view/XssGWl



luckily, anton noticed that for many primitives, the max norm was simpler and faster to evaluate.


Insight from “Efficient Max-Norm Distance Computation and Reliable Voxelization” http://gamma.cs.unc.edu/RECONS/maxnorm.pdf


  • Many non-uniform primitives have much simpler distance fields under max norm, usually just have to solve some quadratics!
  • Need to be careful when changing basis as max norm is not rotation-invariant, but a valid distance field is just a scaling factor away


So evaluator works in max norm i.e. d = max(|x|,|y|,|z|). The shape of something distance ‘d’ away from a central origin in max norm is a cube, which nicely matches the shape of nodes in our hierarchy. 🙂

【distance metric: max norm, simple and fast】
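A small Python sketch of the max norm. The box distance function is an illustrative assumption, not something taken from the talk; it is included because an axis-aligned box is one primitive whose max-norm distance is exact and trivial.

```python
def max_norm(v):
    """Max norm of a vector: d = max(|x|, |y|, |z|)."""
    return max(abs(c) for c in v)


def box_distance_max_norm(p, half_extents):
    """Signed max-norm distance from p to an origin-centred, axis-aligned box.

    Negative inside, positive outside. Only valid in the box's own axes,
    since the max norm is not rotation-invariant.
    """
    return max(abs(c) - h for c, h in zip(p, half_extents))


# Every point on the surface of a cube of half-size 1 is at max-norm distance 1
# from the origin: corners and face centres alike.
corner = max_norm((1.0, -1.0, 1.0))
face_centre = max_norm((1.0, 0.0, 0.0))
```

Because the set of points at distance d is a cube, a node of the hierarchy with half-size h is exactly the max-norm ball of radius h around its centre, which keeps the per-cell bounds tight.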




Soft blend breaks ALL THE CULLING, key points:

– Soft min/max needs to revert to hard min/max once distance fields are sufficiently far apart (otherwise you can never cull either side)

  • Ours is for some radius r: soft_min(a, b, r) { float e = max(r - abs(a - b), 0); return min(a, b) - e*e*0.25/r; }, credit to Dave Smith @ media molecule
  • Has no effect once abs(a - b) > r 【the two fields are too far apart to blend】
  • Need to consider the amount of ‘future soft blend’ when culling, as soft blend increases the range at which primitives can influence the final surface (skipping over lots of implementation details!) 【account for the extra range of influence that soft blend adds】
  • Because our distance fields are good quality, we can use interval arithmetic for additional culling (skipping over lots of implementation details!) 【the range of influence is bounded using distances】
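The soft_min above, transcribed into Python so it can be poked at. soft_max is an assumption here, derived by symmetry (negate, soft_min, negate); the talk only gives the min.

```python
def soft_min(a, b, r):
    """Soft minimum of two distances a and b with blend radius r (r > 0)."""
    e = max(r - abs(a - b), 0.0)
    return min(a, b) - e * e * 0.25 / r


def soft_max(a, b, r):
    """Soft maximum, assumed to mirror soft_min (used for subtraction)."""
    return -soft_min(-a, -b, r)


# Where the two fields are equal, the blend pulls the surface out by r/4.
blended = soft_min(0.0, 0.0, 1.0)

# Once the fields differ by more than r, it is exactly the hard min again,
# which is what makes culling possible at all.
far_apart = soft_min(0.0, 2.0, 1.0)
```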



this is a visualisation of the number of edits affecting each voxel; you can see that the soft blend increases the work over a quite large area.




however, compared to the earlier, less rigorous evaluators, simon’s interval-arithmetic and careful-maxnorm-bounds was a tour-de-force of maths/engineering/long dependent compute shader chains/compiler bug battling.



thanks for saving the evaluator sjb!



STATS! for some test models, you can see a range of edits (‘elements’) from 600 – 53000 (the worst is around 120k, but thats atypical); this evaluates to between 1m and 10m surface voxels (+-1.5 of surface),



… the culling rates compared to brute force are well over 99%. we get 10m – 100m voxels evaluated per second on a ps4, from a model with tens of thousands of edits.



this is one of those models… (crystals dad, 8274 edits, 5.2m voxels)



…and this is a visualisation of the number of edits that touch the leaf voxels




moar (head40) (22k edits, 2.4m voxels)

note the colouring is per block, so the voxel res is much higher than the apparent color res in this debug view



the meshes output from the blob prototype, as it was called, were generally quite dense – 2m quads at least for a large sphere, and more as the thing got more crinkly. In addition, we wanted to render scenes consisting of, at very least, a ‘cloud’ of rigidly oriented blob-meshes.


at this point anton and I started investigating different approaches. anton looked into adaptive variants of marching cubes, such as dual marching cubes, various octree schemes, and so on. let’s call this engine – including the original histopyramids marching cubes – engine 1: the polygon edition.





here are some notes from the man himself about his investigations into SDF polygonalization



【marching cubes: meshes too dense, mushy edges, slivers, and asymmetrical output that is awkward to implement on the GPU】

Marching cubes: Well it works but the meshes are dense and the edges are mushy and there are slivers and the output makes for asymmetrical code in a GPU implementation.



I dont know if you can tell but that’s the wireframe!

oh no




Dual Contouring: Hey this is easy on GPU. Oh but it’s kind of hard to keep sharp edges sharp and smooth things smooth and it doesn’t really align to features for edge flow either.



‘Dual Contouring of Hermite Data’

Ju, Losasso, Schaefer and Warren



note the wiggly edge on the bottom left of the cuboid – really hard to tune the hard/soft heuristics when making animated deathstars.




more complex model….



the DC mesh is still quite dense in this version, but at least it preserves edges.



however it shows problems: most obviously, holes in the rotor due to errors in the evaluator we used at this stage (heuristic culling -> makes mistakes on soft blend; pre simon eval!) – also occasionally what should be a straight edge ends up wobbly because it can’t decide if this should be smooth or straight. VERY tricky to tune in the general case for UGC.





ALSO! Oh no, there are self intersections! This makes the lighting look glitched – fix em:



‘Intersection-free Contouring on An Octree Grid’

Tao Ju, Tushar Udeshi




Oh no, now it’s not necessarily manifold, fix that.



Manifold Dual Contouring

Scott Schaefer, Tao Ju, Joe Warren




Oh no, it’s self intersecting again. Maybe marching cubes wasn’t so bad after all… and LOD is still hard (many completely impractical papers).



the ability to accumulate to an ‘append buffer’ via DS_ORDERED_COUNT *where the results are magically in deterministic order based on wavefront dispatch index* is a magical and wonderful feature of GCN. it turns this…



(non deterministic vertex/index order on output from a mesher, cache thrashing hell:)




into this – hilbert ordered dual contouring! so much better on your (vertex) caches.

we use ordered append in a few places. it’s a nice tool to know exists!

【Hilbert-ordered dual contouring: very useful】



back to the story! the answer to Isla’s question is,




no, I do not like polygons.



I mean, they are actually pretty much the best representation of a hard 2D surface embedded in 3D, especially when you consider all the transistors and brain cells dedicated to them.



but… they are also very hard to get right automatically (without a human artist in the loop), and make my head hurt. My safe place is voxels and grids and filterable representations.



Plus, I have a real thing for noise, grain, ‘texture’ (in the non texture-mapping sense), and I loved the idea of a high resolution volumetric representation being at the heart of dreams. it’s what we are evaluating, after all. why not try rendering it directly? what could possibly go wrong?



so while anton was researching DC/MC/…, I was investigating alternatives.



there was something about the artefacts of marching cubes meshes that bugged me.

I really loved the detailed sculpts, where polys were down to a single pixel and the lower res / adaptive res stuff struggled in some key cases.

so, I started looking into… other techniques.





since the beginning of the project, I had been obsessed by this paper:


‘Volumetric Billboards’ by Philippe Decaudin, Fabrice Neyret.


it’s the spiritual precursor to gigavoxels, SVOs, and their even more recent work on prefiltered voxels. I became convinced around this time that there was huge visual differentiation to be had, in having a renderer based not on hard surfaces, but on clouds of prefiltered, possibly gassy looking, models. and our SDF based evaluator, interpreting the distances around 0 as opacities, seemed perfect. this paper still makes me excited looking at it. look at the geometric density, the soft anti-aliased look, the prefiltered LODs. it all fitted!



the paper contributed a simple LOD filtering scheme based on compositing ‘over’ along each axis in turn, and taking the highest opacity of the three cardinal directions. this is the spiritual precursor to ‘anisotropic’ voxels used in SVO. I love seeing the lineage of ideas in published work. ANYWAY.
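A rough Python sketch of that LOD filter for a single 2x2x2 block of opacities, to make the ‘over along each axis, then max’ idea concrete. Averaging the four rays per axis is an assumption made here to get one value per direction; the paper may combine them differently.

```python
def over(front, back):
    """Standard 'over' compositing of two opacities."""
    return front + (1.0 - front) * back


def downsample_opacity(block):
    """Reduce a 2x2x2 block of opacities, indexed block[x][y][z], to one value."""

    def along(axis):
        # Composite the two voxels along 'axis' for each of the 4 parallel rays,
        # then average the rays (the averaging is an assumption of this sketch).
        total = 0.0
        for u in range(2):
            for v in range(2):
                if axis == 0:
                    a0, a1 = block[0][u][v], block[1][u][v]
                elif axis == 1:
                    a0, a1 = block[u][0][v], block[u][1][v]
                else:
                    a0, a1 = block[u][v][0], block[u][v][1]
                total += over(a0, a1)
        return total / 4.0

    # Keep the most opaque of the three cardinal viewing directions.
    return max(along(0), along(1), along(2))


# A thin opaque slab filling the x = 0 layer of the block.
slab = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
slab_opacity = downsample_opacity(slab)
```

A plain average of the eight voxels would give the slab 50% opacity and let light leak through it at coarse LODs; taking the max over directions keeps it fully opaque when seen face-on.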




the rendering was simple too: you take each rigid object, slice it screen-aligned along exponentially spaced z slices, and composite front to back or back to front. it’s a scatter-based, painters algorithm style volume renderer. they exploit the rasterizer to handle sparse scenes with overlapping objects. they also are pre-filtered and can handle transparent & volumetric effects. this is quite rare – unique? – among published techniques. it’s tantalising. I think a great looking game could be made using this technique.



I have a small contribution – they spend a lot of the paper talking about a complex Geometry shader to clip the slices to the relevant object bounds. I wish it was still 2008 so I could go back in time and tell them you don’t need it! 😉 well, complex GS sucks. so even though I’m 7 years late I’m going to tell you anyway 😉




to slice an object bounded by this cube…



pick the object axis closest to the view direction, and consider the 4 edges of the cube along this axis.



generate the slices as simple quads with the corners constrained to these 4 edges,



some parts of the slice quads will fall outside the box. that’s what the GS was there for! but with this setup, we can use existing HW:



just enable two user clipping planes for the front and back of the object. the hardware clipping unit does all the hard work for you.
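A sketch of the slice setup in Python, assuming an origin-centred cube in object space and a view direction already transformed into that space. The function name and parameters are hypothetical.

```python
def slice_quad(view_dir, half, t):
    """Corners of one view-aligned slice through a cube of half-size 'half'.

    view_dir: unit view direction in object space.
    t: offset of the slice plane, i.e. the plane dot(view_dir, p) = t.
    Returns (axis, corners). The corners sit on the 4 cube edges that run
    along 'axis', extended to infinite lines, so some may lie outside the cube.
    """
    # The object axis closest to the view direction.
    axis = max(range(3), key=lambda i: abs(view_dir[i]))
    u, v = [i for i in range(3) if i != axis]
    corners = []
    for su, sv in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        p = [0.0, 0.0, 0.0]
        p[u] = su * half
        p[v] = sv * half
        # Slide along the edge until we hit the slice plane.
        p[axis] = (t - view_dir[u] * p[u] - view_dir[v] * p[v]) / view_dir[axis]
        corners.append(tuple(p))
    return axis, corners


axis, quad = slice_quad((0.6, 0.0, 0.8), 0.5, 0.0)
```

Whatever part of the quad pokes out beyond p[axis] = ±half is what the two user clip planes remove, so no geometry shader is needed.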




ANYWAY. this idea of volumetric billboards stuck with me. and I still love it.


fast forward a few years, and the french were once again rocking it.


Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre (note: neyret is the secondary author on VBs) had put out gigavoxels.


this is the next precursor to SVOs. seen through the lens of the earlier VB work, I loved that it kept that pre-filtered look, the geometric density from having a densely sampled field. it layered on top a hierarchical, sparse representation – matching very well the structure of our evaluator. hooray! however it dispensed with the large number of overlapping objects, which makes it less immediately applicable to Dreams/games. But I did implement a quick version of gigavoxels, here are some shots.

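【gigavoxels gives up support for large numbers of overlapping objects, so it cannot be applied to our game directly; the author still implemented a quick gigavoxels version】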



it’s impossible to resist domain repetition when you’re just raytracing a field…




add some lighting as per my earlier siggraph advances talk (2006 was it?), the sort of thing that has since been massively refined e.g. in the shadertoy community (sampling mip mapped/blurred copies of the distance field – a natural operation in gigavoxel land, and effectively cone tracing) I think it has a lovely alabaster look.

however it focussed on a single large field that (eye) rays were traced through, and I needed the kind of scene complexity of the earlier VB paper – a cloud of rigid voxels models.





the idea is to take the brick tree from gigavoxels, but instead of marching rays from the eye, directly choose a ‘cut’ through the tree of bricks based on view distance (to get nice LOD), then rasterise each brick individually. The pixel shader then only has to trace rays from the edge of the bricks to any surface.

【choose the LOD directly from view distance, rasterise each brick individually, and start the ray trace from the edge of each brick】


As an added advantage, the bricks are stored in an atlas, but there is no virtual-texturing style indirection needed in the inner loop (as it is in gigavoxels), because each rastered cube explicitly bounds each individual brick, so we know which bit of the atlas to fetch from at VS level.




here you can see the individual cubes that the VS/PS is shading. each represents an 8x8x8 little block of volume data, giga-voxels style. again: rather than tracing eye rays for the whole screen, we do a hybrid scatter/gather: the rasteriser scatters pixels in roughly the right places (note also that the LOD has been adapted so that the cubes are of constant screen space size, ie lower LOD cut of the brick tree is chosen in the distance) then the pixel shader walks from the surface of the cubes to the SDF surface.

【each cube you see holds an 8x8x8 block of volume data (gigavoxels style). instead of tracing eye rays, this is a hybrid scatter/gather: the rasteriser scatters LOD-sized cubes, then the pixel shader walks from the cube surface to the SDF surface】


also, I could move the vertices of the cubes around using traditional vertex skinning techniques, to get animation and deformation… oh my god it’s going to be amazing!

【moving the cube vertices with traditional vertex skinning gives animation, and it looks very promising!】



(sorry for the bad screenshot – I suck at archiving my work)

It sort of amounts to POM/tiny raymarch inside each 8x8x8 cube, to find the local surface. with odepth to set the zbuffer.

it has the virtue of being very simple to implement.
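A sketch of that tiny per-brick raymarch in Python. Stepping by the sampled distance (sphere tracing) is an assumption here; the talk only says ‘POM/tiny raymarch’. The analytic unit sphere stands in for the 8x8x8 brick of SDF samples a real shader would fetch from the atlas.

```python
import math


def march_to_surface(sdf, origin, direction, max_t, eps=1e-4, max_steps=64):
    """March from the brick's rasterised surface until the SDF surface is hit.

    origin: where the ray enters the brick; direction: unit ray direction;
    max_t: distance at which the ray leaves the brick again.
    Returns the hit distance t, or None if the ray exits without hitting.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t  # hit: a shader would write depth here (odepth)
        t += dist     # safe step: the surface is at least 'dist' away
        if t > max_t:
            break     # left the brick: a shader would discard the pixel
    return None


def unit_sphere(p):
    """Stand-in SDF: a unit sphere at the origin."""
    return math.sqrt(sum(c * c for c in p)) - 1.0


hit = march_to_surface(unit_sphere, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0), 4.0)
miss = march_to_surface(unit_sphere, (2.0, 0.0, -2.0), (0.0, 0.0, 1.0), 4.0)
```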




Because of that simplicity, this technique actually ended up being the main engine a lot of the artists used for a couple of years; you’ll see a couple more shots later. So while the ‘bricks’ engine, as it was known, went into heavy use, I really wanted more.




I wasn’t happy! why not? I also wanted to keep that pre-filtered look from Volumetric Billboards. I felt that if we pursued just hard z buffered surfaces, we might as well just do polys, or at least, the means didn’t lead to a visual result that was different enough. so I started a long journey into OIT.

【I still wasn’t happy: I wanted to keep the pre-filtered look of Volumetric Billboards. if we only pursued hard z-buffered surfaces, we might as well use polygons】



I immediately found that slicing every cube into 8-16 tiny slices, ie pure ‘VB’, was going to burn way too much fill rate.

so I tried a hybrid where: when the PS marched the 8x8x8 bricks, I had it output a list of fuzzy ‘partial alpha’ voxels, as well as outputting z when it hit full opacity. then all I had to do was composite the gigantic number (10s of millions) of accumulated fuzzy samples onto the screen… in depth sorted order. Hmm

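【I quickly found that slicing every cube into 8-16 tiny slices, i.e. pure ‘VB’, burns far too much fill rate. hence the hybrid: while the PS marches the 8x8x8 bricks it outputs a list of fuzzy ‘partial alpha’ voxels, plus z once it hits full opacity. what remains is compositing the accumulated fuzzy samples onto the screen in depth-sorted order】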



so it was ‘just’ a matter of figuring out how to composite all the non-solid voxels. I had various ground truth images, and I was particularly excited about objects overlapping each other with really creamy falloff

  • e.g. between the blue arch and the grey arch thats just the two overlapping and the ‘fuzz’ around them smoothly cross-intersecting.

【the question here is how to composite all the non-solid voxels, so that overlapping objects fall off smoothly into each other】



and pre filtering is great for good LOD! this visualizes the pre-filtered mips of dad’s head, where I’ve added a random beard to him as actual geometry in the SDF.

【pre-filtering is well suited to LOD】



and here’s what it looks like rendered.




but getting from the too-slow ground truth to something consistently fast-enough was very, very hard.


prefiltering is beautiful, but it generates a lot of fuzz, everywhere. the sheer number of non-opaque pixels was getting high – easily 32x 1080p

【prefiltering generates a lot of fuzz】

I spent over a year trying everything – per pixel atomic bubble sort, front k approximations, depth peeling..


one thing I didn’t try because I didn’t think of it and it hadn’t been published yet, was McGuire style approximate commutative OIT. however it won’t work in its vanilla form

  • it turns out the particular case of a very ‘tight’ fuzz around objects is very unforgiving of artefacts
  • for example, if adjacent pixels in space or time made different approximations (eg discarded or merged different layers), you get really objectionable visible artefacts.

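【one thing I neither thought of nor could have read about at the time is McGuire-style approximate commutative OIT, though it probably would not have worked here either】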



it’s even worse because the depth complexity changes drastically, over 2 orders of magnitude, between pixels that hit a hard back and ‘edge on’ pixels that spend literally hundreds of voxels skating through fuzz. this is morally the same problem that a lot of sphere tracing approaches have, where edge pixels are waaaay harder than surface pixels.



I did have some interesting CS load balancing experiments, based on wavefronts peeling off 8 layers at a time, and re-circulating pixels for extra passes that needed it: a kind of compute shader depth peel, but with load balancing as its goal.




here’s a simpler case. fine when your sort/merge algo has enough layers. but if we limit it to fewer blended voxels than necessary…




I couldn’t avoid ugly artefacts.


in the end, the ‘hard’ no-fuzz/no-oit shader was what went over the fence to the designers, who proceeded to work with dreams with a ‘hard’ look while I flailed in OIT land.




see what I mean about failure?

and this is over the period of about 2 years, at this point




I think this is a really cool technique, it’s another one we discarded but I think it has some legs for some other project.

I call it the refinement renderer.




there are very few screenshots of this as it didn’t live long, but its interestingly odd. have this sort of image in your mind for the next few slides. note the lovely pre-filtered AA, the soft direct lighting (shadows but no shadow maps!). but this one is pure compute, no rasterised mini cubes.

the idea is to go back to the gigavoxels approach of tracing eye rays through fuzz directly… but find a way to make it work for scenes made out of a large number of independently moving objects. I think if you squint a bit this technique shares some elements in common with what Daniel Wright is going to present in the context of shadows; however since this focuses on primary-ray rendering, I’m not going to steal any of his thunder! phew.




a bit of terminology – we call post projection voxels – that is, little pieces of view frustum – ‘froxels’ as opposed to square voxels. The term originated at the sony WWS ATG group, I believe.

if you look at a ray marcher like many of the ones on shadertoy, like iq’s famous cloud renderer, you can think of the ray steps as stepping through ‘froxels’.




https://www.shadertoy.com/view/XslGRr – clouds by iq

typically you want to step the ray exponentially so that you spend less time sampling in the distance.


Intuitively you want to have ‘as square as possible’ voxels, that is, your step size should be proportional to the inverse of the projected side length, which is 1/(1/z), or z. so you can integrate and you get slices spaced as t=exp(A*i) for some constant A (slice index i), or alternatively write it iteratively as t+=K*t at each step for some constant K.




the only problem with this is that near the eye, as t goes to 0, you get infinitely small froxel slices. oh dear. if you look at iq’s cloud example, you see this line:




t += max(0.1,0.02*t);

which is basically saying, let’s have even slicing up close then switch to exponential after a while.
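The two slicing schemes side by side as a Python sketch. The function names are hypothetical; the constants 0.1 and 0.02 are the ones from iq’s line above.

```python
import math


def exponential_slices(t0, k, count):
    """Pure exponential slicing: t += K*t at each step."""
    slices, t = [], t0
    for _ in range(count):
        slices.append(t)
        t += k * t
    return slices


def closed_form_slice(t0, k, i):
    """The same slice written as t = t0 * exp(A*i), with A = ln(1 + K)."""
    return t0 * math.exp(math.log(1.0 + k) * i)


def clamped_slices(t0, count, min_step=0.1, k=0.02):
    """iq-style slicing: even steps up close, exponential further out."""
    slices, t = [], t0
    for _ in range(count):
        slices.append(t)
        t += max(min_step, k * t)
    return slices


near = clamped_slices(0.0, 4)   # even 0.1 steps: 0.02*t is still below 0.1
far = clamped_slices(10.0, 2)   # exponential regime: the step is 0.02*t
```

With these constants the switch from even to exponential stepping happens at t = 5, where 0.02*t first exceeds 0.1. Pure exponential slicing started at t = 0 would never advance at all.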

I’ve seen this empirically used a few times. here’s an interesting (?) insight. what would real life do? real life doesn’t have pinhole cameras.




so, consider a thin lens DOF model for a second. what if you tuned your froxel sampling rate not just for projected pixel size, but for projected bokeh size? the projected bokeh radius is proportional to (z-f)/z, so we want A(z-f)/z + 1/z where A is the size in pixels of your bokeh at infinity. (the +1/z is the size of single ‘sharp’ pixel, i.e. the footprint of your AA filter)



if you put this together, you can actually compute two exponential slicing rates – one for in front of the focal plane, and one for behind.

at the focal plane, it’s the same step rate you would have used before, but in the distance it’s a little sparser, and near to the camera it’s WAY faster. extra amusingly, if you work through the maths, if you set A to be 1 pixel, then the constant in the ‘foreground’ exponential goes to 0 and it turns out that linear slicing is exactly what you want. so the empirical ‘even step size’ that iq uses, is exactly justified if you had a thin lens camera model with aperture such that bokeh-at-infinity is 1pixel across on top of your AA. neat! for a wider aperture, you can step faster than linear.
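One way to read that argument, as a Python sketch. This is a reconstruction, not something spelled out in the talk: it assumes the screen-space footprint of a point at depth z is 1 + A*|z-f|/z pixels (one sharp pixel plus the bokeh), and that the slice thickness should match that footprint scaled back to world space, i.e. z + A*|z-f|.

```python
def froxel_step(z, focal, bokeh_px, k):
    """Slice thickness at depth z for a thin-lens camera (reconstructed model).

    focal: distance f to the focal plane.
    bokeh_px: A, the bokeh size in pixels at infinity.
    k: overall scale, the same constant K as in t += K*t for a pinhole camera.
    """
    return k * (z + bokeh_px * abs(z - focal))


# Pinhole camera (A = 0): the familiar exponential slicing, step = K*z.
pinhole = froxel_step(2.0, 5.0, 0.0, 0.02)

# A = 1 pixel, in front of the focal plane: z + (f - z) = f, so the step
# no longer depends on z. That is linear slicing.
front_a = froxel_step(1.0, 5.0, 1.0, 0.02)
front_b = froxel_step(3.0, 5.0, 1.0, 0.02)
```

In front of the focal plane the step is K*((1 - A)*z + A*f) and behind it K*((1 + A)*z - A*f), which gives the two slicing rates, and the z term in the foreground vanishes when A = 1. Under these assumptions, A greater than 1 makes the foreground step grow towards the camera, which fits the ‘faster than linear’ remark.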





how does this relate to rendering lots of objects?

the idea I had was to borrow from the way the evaluator works. you start by dividing your frustum into coarse froxels. I chose 64th res, that is about 32×16 in x and y, with 32-64 in z depending on the far z and the DOF aperture. (blurrier dof = fewer slices needed, as in previous slides).

then you do a simple frustum vs object intersection test, and build a list per froxel of which objects touch it.



【now, how to render lots of objects: first divide the frustum into coarse froxels, then run intersection tests to build, per froxel, a list of the objects that touch it】



then, you recursively subdivide your froxels!

for each froxel, in a compute shader you split them into 8 children. as soon as your froxel size matches the size of gigavoxel prefiltered voxels, you sample the sparse octree of the object (instead of just using OBBs) to further cull your lists.




as you get finer and finer, the lists get shorter as the object’s shape is more accurately represented. it’s exactly like the evaluator, except this time we have whole objects stored as gigavoxel trees of bricks (instead of platonic SDF elements in the evaluator), we don’t support soft blend, and our domain is over froxels, not voxels.




for the first few steps, I split every froxel in parallel using dense 3d volume textures to store pointers into flat tables of per froxel lists. however at the step that refines from 1/16th res to 8th res (128x64x128 -> 256x128x256) the dense pointer roots get too expensive so I switch to a 2d representation, where every pixel has a single list of objects, sorted by z.

the nice thing is that everything is already sorted coming out of the dense version, so this is really just gluing together a bunch of small lists into one long list per screen pixel.

each refine step is still conceptually splitting froxels into 8, but each pixel is processed by one thread, serially, front to back.

that also means you can truncate the list when you get to solid – perfect, hierarchical occlusion culling!.
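A sketch of the per-pixel, front-to-back walk with truncation at solid, in Python. The sample layout (z, alpha, value) and the single-channel colour are assumptions for brevity; in the renderer described here the lists arrive already sorted, so the sort exists only to keep the example self-contained.

```python
def composite_front_to_back(samples):
    """Composite one pixel's samples front to back, stopping once opaque.

    samples: list of (z, alpha, value) tuples in any order.
    Returns (colour, number_of_samples_actually_used).
    """
    colour, transmittance, used = 0.0, 1.0, 0
    for z, alpha, value in sorted(samples):
        colour += transmittance * alpha * value
        transmittance *= 1.0 - alpha
        used += 1
        if transmittance <= 0.0:
            break  # hit solid: everything behind is occluded, truncate the list
    return colour, used


pixel = [(3.0, 1.0, 0.9), (1.0, 0.5, 0.2), (2.0, 1.0, 0.6)]
colour, used = composite_front_to_back(pixel)
```

The sample at z = 3 is never touched, because the solid sample at z = 2 ends the walk.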


the results were pretty

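【at first we used dense 3D volume textures to hold the per-froxel list pointers, but that got too expensive, so we switched to a 2D representation: every pixel has a single list of objects, sorted by z】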




and the pre-filtered look is really special.

Look how yummy the overlap of the meshes is! Really soft, and there’s no ‘post’ AA there. It’s all prefiltered.



so I did a bit of work on lighting; a kind of 3d extension of my siggraph 2006 advances talk.

【next up is lighting: a kind of 3D extension of my SIGGRAPH 2006 advances talk】



imagine this setup. this is basically going to be like LPV with a voxelized scene, except we use froxels instead of voxels, and we propagate one light at a time in such a way that we can smear light from one side of the frustum to another in a single frame, with nice quality soft shadows. ‘LPV for direct lights, with good shadows’, if you


【this setup is basically a voxelized scene with froxels in place of voxels, so the basic approach is much the same as with voxels】



imagine a single channel dense froxel grid at low resolution, I think I used 256x128x256 with 8 bits per froxel. We will have one of those for the ‘density’ of the scene – defined everywhere inside the camera frustum.

– As a side effect of the refinement process I write that ‘density’ volume out, more or less for free. Now we are also going to have one extra volume texture for each ‘hero’ light. (I did tests with 4 lights).

STOP PRESS – as far as I can tell from the brilliant morning session by frostbite guys, they have a better idea than the technique I present on the next few slides. They start from the same place -a dense froxel map of ‘density’, as above, but they resample it for each light into a per light 32^3 volume, in light-space. then they can smear density directly in light space. This is better than what I do over the next few slides, I think. See their talk for more!




To wipe the light around, you set the single froxel where the light is to ‘1’ and kick a compute shader in 4 froxel thick ‘shells’ radiating out from that central light froxel.

(with a sync between each shell). Each thread is a froxel in the shell, and reads (up to) 4 trilinear taps from the density volume, effectively a short raycast towards the light.

Each shell reads from the last shell, so it’s sort of a ‘wipe’ through the whole frustum.
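A heavily simplified 2D sketch of the shell wipe, in Python. It assumes one-cell-thick shells, a single nearest-neighbour tap per cell, and a square grid rather than froxels; the version described here uses 4-froxel-thick shells and up to 4 trilinear taps. All names are hypothetical.

```python
def propagate_light(density, light):
    """Wipe light outwards from 'light' through a 2D density grid.

    density[y][x] in [0, 1] is the opacity of each cell; light is (x, y).
    Returns a grid of light intensity reaching each cell.
    """
    height, width = len(density), len(density[0])
    lx, ly = light
    lit = [[0.0] * width for _ in range(height)]
    lit[ly][lx] = 1.0
    rings = max(lx, width - 1 - lx, ly, height - 1 - ly)
    for ring in range(1, rings + 1):  # one shell at a time, with a sync between
        for y in range(height):
            for x in range(width):
                if max(abs(x - lx), abs(y - ly)) != ring:
                    continue  # this cell is not on the current shell
                # A short step back towards the light lands on the previous shell.
                scale = (ring - 1) / ring
                sx = lx + round((x - lx) * scale)
                sy = ly + round((y - ly) * scale)
                lit[y][x] = lit[sy][sx] * (1.0 - density[y][x])
    return lit


# A single row with an opaque blocker in the middle and the light on the left.
row = propagate_light([[0.0, 0.0, 1.0, 0.0, 0.0]], (0, 0))
```

Everything behind the blocker ends up dark, because each shell only ever reads from the shell before it.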




here come the shells! each one reads from the last. yes, there are stalls. no, they’re not too bad as you can do 4 lights and pipeline it all.




The repeated feedback causes a pleasant blur in the propagated shadows.

it’s like LPV propagation, except that it’s for a single light so you have no direction confusion, and you can wipe from one side of the screen to the other within a frame, since you process the froxels strictly in order radiating out from the light.

You can jitter the short rays to simulate area lights. You do 4 lights at once, to overlap the syncs, and you do it on an async pipe to mop up space on your compute units so the syncs don’t actually hurt that much. (offscreen lights are very painful to do well and the resolution is brutally low). However the results were pretty, and the ‘lighting’ became simple coherent volume texture lookups.






Look ma! no shadowmaps!

would be super cool for participating media stuff, since we also have the brightness of every light conveniently stored at every froxel in the scene. I didn’t implement it





Ambient occlusion was done by simply generating mip-maps of the density volume and sampling it at positions offset from the surface by the normal, ie a dumb very wide cone trace. (大锥痕迹)

【The ambient occlusion results are good too.】


The geometric detail and antialiasing was nice:




You could also get really nice subsurface effects by cone tracing the light volumes a little and turning down the N.L term:

【The subsurface effects look very good too.】




However, the performance was about 4x lower than what I needed for PS4 (I forget the timings, but it was running at 30 for the scenes above – but only just! For more complex scenes, it just died). The lighting technique and the refinement engine are separate ideas, but they both had too many limitations and performance problems that I didn’t have time to fix.




(ie I still think this technique has legs, but I can’t make it work for this particular game)

in particular, since edge pixels could still get unboundedly ‘deep’, the refinement lists were quite varied in length, I needed to jump through quite a few hoops to keep the GPU well load balanced. I also should have deferred lighting a bit more – I lit at every leaf voxel, which was slow. however everything I tried to reduce (merge etc) led to visible artefacts. what I didn’t try was anything stochastic. I had yet to fall in love with ‘stochastic all the things’…. definitely an avenue to pursue.

We were also struggling with the memory for all the gigavoxel bricks.




The nail in the coffin was actually to do with art direction.



directly rendering the distance field sculptures was leaving very little to the imagination . So it was very hard to create ‘good looking’ sculptures; lots of designers were creating content that basically looked like untextured unreal-engine, or ‘crap’ versions of what traditional poly engines would give you, but slower. It was quite a depressing time because as you can see it’s a promising tech, but it was a tad too slow and not right for this project.


this is the start of 2014. we’re 3 years in, and the engine prototypes have all been rejected, and the art director (rightly) doesn’t think the look of any of them suits the







there was a real growing uneasiness in the studio. I had been working on OIT – refinement and sorting and etc for a LONG time; in the meantime, assets were being made using the ‘hard’ variant of the bricks engine, that simply traced each 8x8x8 rasterised brick for the 0 crossing and output raw pixels which were forward lit. at its best, it produced some lovely looking results (above) – but that was more the art than the engine! It also looked rather like ‘untextured poly engine’ – why were we paying all this runtime cost (memory & time) to render bricks if they just gave us a poly look?




also, there was a growing disparity between what the art department – especially art director kareem and artist jon – were producing as reference/concept work. it was so painterly!


there was one particular showdown with the art director, my great friend kareem, where he kept pointing at an actual oil painting and going ‘I want it to look like this’ and I’d say ‘everyone knows concept art looks like that but the game engine is a re-interpretation of that’ and kareem was like ‘no literally that’. it took HOURS for the penny to drop, for me to overcome my prejudice.




So after talking to the art director and hitting rock bottom in January 2014, he convinced me to go with a splat based engine, intentionally made to look like 3d paint strokes. I have a strong dislike of ‘painterly post fx’ especially 2d ones, so I had resisted this direction for a looooooooooong time.

(btw this is building on the evaluator as the only thing that has survived all this upheaval)

【So in January 2014 we started building a splat based engine, deliberately made to look like 3d paint strokes. A compromise of sorts.】



I had to admit that for our particular application of UGC, it was *brutal* that you saw your exact sculpture crisply rendered, it was really hard to texture & model it using just CSG shapes. (we could have changed the modelling primitives to include texturing or more noise type setups, but the sculpting UI was so loved that it was not movable. The renderer on the other hand was pretty but too slow, so it got the axe instead).



So I went back to the output of the evaluator, poked simon a bit, and instead of using the gigavoxel style bricks, I got point clouds, and had a look at what I could do.

There’s a general lesson in here too – that tech direction and art direction work best when they are both considered, both given space to explore possibilities; but also able to give different perspectives on the right (or wrong) path to take.




So! now the plan is: generate a nice dense point cloud on the surface of our CSG sculpts.

EVERYTHING is going to be a point cloud. the SDF becomes an intermediate representation, we use it to spawn the points at evaluation time, (and also for collision. But thats another talk)




we started from the output of the existing evaluator, which if you remember was hierarchically refining lists of primitives to get close to voxels on the surface of the SDF. as it happens, the last refinement pass is dealing in 4x4x4 blocks of SDF to match GCN wavefronts of 64 threads.




We add one point to the cloud per leaf voxel (remember, thats about 900^3 domain, so for example, a sphere model will become a point cloud with diameter 900 and one point per integer lattice cell that intersects the sphere surface)

【Add one point for each leaf voxel.】


actually we are using a dual grid IIRC so that we look at a 2x2x2 neighbourhood of SDF values and only add points where there is a zero crossing.

So now we have a nice fairly even, dense point cloud. Since the bounding voxel grid is up to around 900^3 voxels -> around 2 million surface voxels -> around 2 million points.
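A minimal sketch of the dual-grid test described above: sample the SDF at lattice corners, look at each 2x2x2 neighbourhood, and emit a point only where the signs straddle zero. The function names, the sphere SDF and the choice of placing the point at the cell centre are assumptions made for illustration; the talk does not say where inside the cell the point goes.

```python
import math

def sphere_sdf(x, y, z, centre, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.dist((x, y, z), centre) - radius

def surface_points(sdf, n):
    """Emit one point per cell whose 2x2x2 corner samples contain a zero crossing."""
    # Sample the SDF once at every lattice corner of an n^3 cell grid.
    corners = [[[sdf(x, y, z) for x in range(n + 1)]
                for y in range(n + 1)] for z in range(n + 1)]
    points = []
    for z in range(n):
        for y in range(n):
            for x in range(n):
                vals = [corners[z + k][y + j][x + i]
                        for k in (0, 1) for j in (0, 1) for i in (0, 1)]
                if min(vals) < 0.0 <= max(vals):  # sign change: the surface cuts this cell
                    points.append((x + 0.5, y + 0.5, z + 0.5))
    return points
```

For example, `surface_points(lambda x, y, z: sphere_sdf(x, y, z, (8.0, 8.0, 8.0), 5.0), 16)` returns a thin shell of points hugging the sphere; empty and fully solid cells contribute nothing.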




The point cloud is sorted into Hilbert order (actually, 4^3 bricks of voxels are in Hilbert order and then the surface voxels inside those bricks are in raster order, but I digress) and cut into clusters of approximately 256 points (occasionally there is a jump in the hilbert brick order so we support partially filled clusters, to keep their bounding boxes tight).

【The point cloud is sorted into Hilbert order, then cut into clusters of roughly 256 points each.】



Each cluster is tightly bounded in space, and we store for each a bounding box, normal bounds. then each point within the cluster is just one dword big, storing bitpacked pos,normal,roughness, and colour in a DXT1 texture. All of which is to say, we now have a point cloud cut into lumps of 256 points with a kind of VQ compression per point. We also compute completely independent cluster sets for each LOD – that is, we generate point clouds and their clusters for a ‘mip pyramid’ going from 900 voxels across, to 450, to 225, etc.

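【Each cluster is tightly bounded in space, and we store its bounding box and normal bounds. Each point in the cluster also stores some data. This makes the point cloud hierarchical; the clusters can also be used for LOD, to compress the data and improve performance.】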



I can’t find many good screenshots but here’s an example of the density, turned down by a factor of 2x to see what’s going on.



my initial tests here were all PS/VS using the PS4 equivalent of glPoint. it wasn’t fast, but it showed the potential. I was using russian roulette to do ‘perfect’ stochastic LOD, targeting a 1 splat to 1 screen pixel rate, or just under.



At this point we embraced TAA *bigtime* and went with ‘stochastic all the things, all the time!’. Our current frame, before TAA, is essentially verging on white noise. It’s terrifying. But I digress!




for rendering, we arranged the clusters for each model into a BVH. we also computed a separate point cloud, clustering and BVH for each mipmap (LOD) of the filtered SDF. to smooth the LOD transitions, we use russian roulette to adapt the number of points in each cluster from 256 smoothly down to 25%, i.e. 256 down to 64 points per cluster, then drop to the next LOD.

simon wrote some amazingly nicely balanced CS splatters that hierarchically culled and refined the precomputed clusters of points, computes bounds on the russian roulette rates, and then packs reduced cluster sets into groups of ~64 splats.
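One way to picture the smooth 256-to-64 decimation is to give every point in a cluster a fixed random rank and keep only the points whose rank falls below the current keep rate. This is a sketch of that idea, not the shipped compute shader: the stratified shuffle, `build_ranks` and `surviving_points` are assumptions introduced here.

```python
import random

def build_ranks(num_points=256, seed=0):
    """Give every point in a cluster a fixed, shuffled rank in [0, 1)."""
    order = list(range(num_points))
    random.Random(seed).shuffle(order)
    return [(rank + 0.5) / num_points for rank in order]

def surviving_points(ranks, keep_rate):
    """Indices of the points still drawn at the given keep rate (0..1)."""
    return [i for i, u in enumerate(ranks) if u < keep_rate]
```

Since the ranks never change, the survivors at 25% are always a subset of the survivors at 50%, so lowering the rate removes points one at a time instead of reshuffling the whole cluster.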




so in this screenshot the color cycling you can see is visualizing the steps through the different degrees of decimation, from <25%, <50%, <75%, then switching to a completely different power of 2 point cloud.




What you see is the ‘tight’ end of our spectrum. i.e. the point clouds are dense enough that you see sub pixel splats everywhere. The artist can also ‘turn down’ the density of points, at which point each point becomes a ‘seed’ for a traditional 2d textured quad splat. Giving you this sort of thing:





We use pure stochastic transparency, that is, we just randomly discard pixels based on the alpha of the splat, and let TAA sort it out. It works great in static scenes.

However the traditional ‘bounding box in color space’ to find valid history pixels starts breaking down horribly with stochastic alpha, and we have yet to fully solve that.

So we are still in fairly noisy/ghosty place. TODO!

We started by rendering the larger strokes – we call them megasplats – as flat quads with the rasterizer. thats what you see here, and in the E3 trailer.




Interestingly, simon tried making a pure CS ‘splatting shader’, that takes the large splats, and instead of rasterizing a quad, we actually precompute a ‘mini point cloud’ for the splat texture, and blast it to the screen using atomics, just like the main point cloud when it’s in ‘microsplat’ (tight) mode.




So now we have a scene made up of a whole cloud of sculpts…




which are point clouds,



and each point is itself, when it gets close enough to the camera, an (LOD adapted) ‘mini’ point cloud – Close up, these mini point clouds representing a single splat get ‘expanded’ to a few thousand points (conversely, In the distance or for ‘tight’ objects, the mini points clouds degenerate to single pixels).

Amusingly, the new CS based splatter beats the rasterizer due to not wasting time on all the alpha=0 pixels. That also means our ‘splats’ need not be planar any more, however, we don’t yet have an art pipe for non-planar splats so for now the artists don’t know this! Wooahaha!




That means that if I were to describe what the current engine is, I’d say it’s a cloud of clouds of point clouds. 🙂

【If I had to describe the engine in one line: it’s a cloud of clouds of point clouds.】



Incidentally, this atomic based approach means you can do some pretty insane things to get DOF like effects: instead of post blurring, this was a quick test where we simply jittered the splats in a screenspace disc based on COC, and again let the TAA sort it all out.

It doesn’t quite look like blur, because it isn’t – its literally the objects exploding a little bit – but it’s cool and has none of the usual occlusion artefacts 🙂



We’ve left it in for now as our only DOF.




I should at this point pause to give you a rough outline of the rendering pipe – it’s totally traditional and simple at the lighting end at least.

We start with 64 bit atomic min (== splat of single pixel point) for each point into 1080p buffer, using lots of subpixel jitter and stochastic alpha. There are a LOT of points to be atomic-min’d! (10s of millions per frame) Then convert that from z+id into traditional 1080 gbuffer, with normal, albedo, roughness, and z. then deferred light that as usual.

Then, hope that TAA can take all the noise away. 😉


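【Each point is splatted with a 64 bit atomic min, using subpixel jitter and stochastic alpha (the process described above); the z+id result is then converted into a traditional gbuffer (with normal, albedo, roughness, and z), lit as usual, and the remaining noise is left for TAA to handle.】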
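The z+id trick can be sketched as packing depth into the high 32 bits and the point id into the low 32 bits, so that a single 64 bit min keeps the nearest splat and carries its id along. The bit layout, the helper names and the use of a Python `min` in place of a GPU atomic are all assumptions for illustration; the talk only says "64 bit atomic min" and "z+id".

```python
import struct

EMPTY = 0xFFFFFFFFFFFFFFFF  # cleared value: farther than any real splat

def pack(depth, point_id):
    """Depth in the high 32 bits, point id in the low 32 bits."""
    # For non-negative floats the IEEE-754 bit pattern orders like the value itself.
    depth_bits = struct.unpack("<I", struct.pack("<f", depth))[0]
    return (depth_bits << 32) | (point_id & 0xFFFFFFFF)

def splat(buffer, width, x, y, depth, point_id):
    """Write one single-pixel splat; min() stands in for the atomic min."""
    i = y * width + x
    buffer[i] = min(buffer[i], pack(depth, point_id))

def resolve(value):
    """Recover (depth, id) from a packed pixel, or None if nothing landed there."""
    if value == EMPTY:
        return None
    depth = struct.unpack("<f", struct.pack("<I", value >> 32))[0]
    return depth, value & 0xFFFFFFFF
```

After all splats are written, a resolve pass can walk the buffer and use each surviving id to fetch normal, albedo and roughness for the gbuffer.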



I’m not going to go into loads of detail about this, since I don’t have time, but actually for now the lighting is pretty vanilla – deferred shading, cascaded shadow map sun.

there are a couple of things worth touching on though.


【Shown here: the result of the pretty vanilla lighting, i.e. deferred shading with a cascaded shadow map sun.】



ISMs: Now we are in loads-of-points land, we did the obvious thing and moved to imperfect shadow maps. We have 4 (3?) cascades for a hero sun light, that we atomicsplat into and then sample pretty traditionally (however, we let the TAA sort out a LOT of the noise since we undersample and undersplat and generally do things quite poorly)

【Shadow evolution: ISM.】


We have a budget of 64 small (128×128) shadowmaps, which we distribute over the local lights in the scene, most of which the artists are tuning as spotlights. They are brute force splatted and sampled, here was simon’s first test, varying their distribution over an area light:




these images were from our first test of using 64 small ISM lights, inspired by the original ISM paper and the ‘ManyLODs’ paper. the 3 images show spreading a number of low quality lights out in an area above the object.



Imperfect Shadow Maps for Efficient Computation of Indirect Illumination

T. Ritschel, T. Grosch, M. H. Kim, H.-P. Seidel, C. Dachsbacher, J. Kautz



ManyLoDs http://perso.telecom-paristech.fr/~boubek/papers/ManyLoDs/

Parallel Many-View Level-of-Detail Selection for Real-Time Global Illumination

Matthias Holländer, Tobias Ritschel, Elmar Eisemann and Tamy Boubekeur



I threw in solid-angle esque equi-angular sampling of participating media for the small local lights. See https://www.shadertoy.com/view/Xdf3zB for example implementation. Just at 1080p with no culling and no speedups, just let TAA merge it. this one will DEFINITELY need some bilateral blur and be put into a separate layer, but for now it’s not:




(just a visualisation classic paraboloid projection on the ISMs)

sorry for the quick programmer art, DEADLINES!




this ‘vanilla’ approach to lighting worked surprisingly well for both the ‘tight’ end… (single pixel splats, which we call microsplats)… as well as

【This ‘vanilla’ approach to lighting works equally well for microsplats and megasplats.】



…the loose end (‘megasplats’).



this was the first time I got specular in the game! two layers of loose splats, the inner layer is tinted red to make it look like traditional oil underpainting. then the specular hi lights from the environment map give a real sense of painterly look. this was the first image I made where I was like ‘ooooh maybe this isn’t going to fail!’

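【This was our first attempt at specular in the game: two layers of loose splats, the inner layer tinted red to imitate traditional oil underpainting, with specular highlights from the environment map giving a genuinely painterly feel.】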



At this point you’ll notice we have painterly sky boxes. I wanted to do all the environment lighting from this. I tried to resurrect my previous LPV tests, then I tried ‘traditional’ Kaplanyan style SH stuff, but it was all too muddy and didn’t give me contact shadows nor did it give me ‘dark under the desk’ type occlusion range.

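【Sky box: we wanted all the environment lighting to come from it and tried several methods, but none were adopted in the end because the results were too muddy and lacked contact shadows.】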


For a while we ran with SSAO only, which got us to here (point clouds give you opportunities to do ridiculous geometrical detail, lol)




the SSAO we started with was based on Morgan McGuire’s awesome alchemy spiral style SSAO, but then I tried just picking a random ray direction from the cosine weighted hemisphere above each point and tracing the z buffer, one ray per pixel (and let the TAA sort it out ;)) and that gave us more believable occlusion, less like dirt in the creases.

【Our SSAO: it started out as Morgan McGuire’s alchemy spiral style SSAO; the ray selection was then changed to get more believable occlusion that looks less like dirt in the creases.】
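Only the ray selection step is sketched here: one cosine weighted direction per pixel. It uses the well-known construction of adding a uniformly random point on the unit sphere to the unit normal and normalising. The function name is invented, the normal is assumed to be unit length, and the z buffer march itself is left out.

```python
import math
import random

def cosine_weighted_direction(normal, rng):
    """Random unit vector distributed proportionally to cos(theta) about a unit `normal`."""
    while True:
        # Uniform point on the unit sphere (uniform z, uniform azimuth).
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(max(0.0, 1.0 - z * z))
        # Offset by the normal; normalising the sum yields the cosine weighting.
        v = (normal[0] + r * math.cos(phi),
             normal[1] + r * math.sin(phi),
             normal[2] + z)
        length = math.sqrt(sum(c * c for c in v))
        if length > 1e-6:  # reject the degenerate case where the offsets cancel
            return tuple(c / length for c in v)
```

With this distribution the cosine term of the occlusion integral is already baked into the sampling, so each ray simply contributes occluded or unoccluded with equal weight.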


From there it was a trivially small step to output either black (occluded) or sky colour (from envmap) and then do a 4×4 stratified dither. here it is without TAA (above).

However this is still just SSAO in the sense that the only occluder is the z buffer.

【SSAO without TAA】


(random perf stat of the atomic_min splatter: this scene shows 28.2M point splats, which takes 4.38ms, so that’s about 6.4 billion single pixel splats per second)




For longer range, I tried voxelizing the scene – since we have point clouds, it was fairly easy to generate a work list with LOD adapted to 4 world cascades, and atomic OR each voxel – (visualised here, you can see the world space slices in the overlay) into a 1 bit per voxel dense cascaded volume texture
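A 1 bit per voxel volume is small enough to sketch directly. The class below packs an n^3 occupancy grid into 32 bit words and marks voxels with an OR; on the GPU that OR would be atomic. `BitVolume` and its methods are hypothetical names, and the cascade selection and LOD work list are not shown.

```python
class BitVolume:
    """Dense n^3 occupancy volume, one bit per voxel, packed into 32-bit words."""

    def __init__(self, n):
        self.n = n
        self.words = [0] * ((n * n * n + 31) // 32)

    def _index(self, x, y, z):
        return (z * self.n + y) * self.n + x

    def mark(self, x, y, z):
        i = self._index(x, y, z)
        self.words[i >> 5] |= 1 << (i & 31)  # stands in for an atomic OR

    def occupied(self, x, y, z):
        i = self._index(x, y, z)
        return bool((self.words[i >> 5] >> (i & 31)) & 1)
```

At 64^3 voxels one cascade fits in 8192 words, i.e. 32 KB, which is why marching such volumes for occlusion is cheap.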




then we hacked the AO shader to start with the z buffer, and then switch to the binary voxelization, moving through coarser and coarser cascades. it’s cone-tracing like, in that I force it to drop to lower cascades (and larger steps), but all the fuzziness is from stochastic sampling rather than prefiltered mip maps. The effect is great for mid range AO – on in the left half, off in the right.


That gets us to more or less where we are today, rough and noisy as hell but extremely simple. I really like the fact you get relatively well defined directional occlusion, which LPV just can’t give you due to excessive diffusion.


【AO details: AO from stochastic sampling of the z buffer looks much better than AO from prefiltered mip maps.】



(at this point we’re in WIP land! like, 2015 time!)

The last test, was to try adding a low resolution world space cascade that is RGB emissive, and then gather light as the sky occlusion rays are marched. The variance is INSANELY high, so it isn’t usable, and this screenshot is WITH taa doing some temporal averaging! But it looks pretty cool. It might be enough for bounce light (rather than direct light, as above), or for extremely large area sources. I don’t know yet. I’m day dreaming about maybe making the emissive volume lower frequency (-> lower variance when gathered with such few samples) by smearing it around with LPV, or at least blurring it. but I haven’t had a chance to investigate.

【Emissive lighting from a low resolution world space cascade.】



Oh wait I have! I just tried bilateral filtering and stratified sampling over 8×8 blocks, it does help a lot.

I think the general principle of z buffer for close, simple bitmask voxelization for further range gather occlusion is so simple that it’s worth a try in almost any engine. Our voxel cascades are IIRC 64^3, and the smallest cascade covers most of the scene, so they’re sort of mine-craft sized voxels or just smaller at the finest scale. (then blockier further out, for the coarser cascades). But the screenspace part captures occlusion nicely for smaller than voxel distances.

【The approach is filtering and blurring: bilateral filtering and stratified sampling over 8×8 blocks.】



another bilateral test pic. WIP 😉



and that’s pretty much where we are today!

as a palate cleanser, here’s some non-testbed, non-programmer art





It feels like we’re still in the middle of it all; we still have active areas of R&D; and as you can see, many avenues didn’t pan out for this particular game. But I hope that you’ve found this journey to be inspiring in some small way. Go forth and render things in odd ways!




The artwork in this presentation is all the work of the brilliant art team at MediaMolecule. Kareem, Jon (E & B!), Francis, Radek to name the most prominent authors of the images in this deck. But thanks all of MM too! Dreams is the product of at least 25 fevered minds at this point.

And of course @sjb3d and @antonalog who did most of the engine implementation, especially of the bits that actually weren’t thrown away 🙂

Any errors or omissions are entirely my own, with apologies.

if you have questions that fit in 140 chars I’ll do my best to answer at @mmalex.










SIGGRAPH 15 – Physically Based and Unified Volumetric Rendering in Frostbite


Sebastien Hillaire – Electronic Arts / Frostbite





  • introduction


Physically based rendering in Frostbite



Volumetric rendering in Frostbite was limited

  • Global distance/height fog
  • Screen space light shafts
  • Particles




Real-life volumetrics

What we hope to achieve are the effects seen in nature: clouds and atmosphere, fog, light scattering, and so on.



  • Related Work




Analytic fog [Wenzel07]

Analytic light scattering [Miles]

Characteristics: fast, not shadowed, only homogeneous media






Screen space light shafts

  • Post process [Mitchell07]
  • Epipolar sampling [Engelhardt10]


  • High quality
  • Sun/sky needs to be visible on screen
  • Only homogeneous media
  • Can go for Epipolar sampling but this won’t save the day




  • Light volumes
    • [Valliant14][Glatzel14][Hillaire14]
  • Emissive volumes [Lagarde13]

This can result in high quality scattering but usually it does not match the participating media of the scene. (This approach is already in common use, but it is handled relatively independently.)




Volumetric fog [Wronski14]

  • Sun and local lights
  • Heterogeneous media

allowing spatially varying participating media and local lights to scatter.

【Spatially varying media take part in scattering; this matches the approach taken by the author here.】

However it did not seem really physically based at the time and some features we wanted were missing.





  • Scope and motivation


Increase visual quality and give more freedom to art direction!


Physically based volumetric rendering

  • Meaningful material parameters
  • Decouple material from lighting
  • Coherent results

We want it to be physically based: this means that participating media materials are decoupled from the light sources (e.g. no scattering colour on the light entities). Media parameters are also a meaningful set of parameters. With this we should get more coherent results that are easier to control and understand.


Unified volumetric interactions

  • Lighting + regular and volumetric shadows
  • Interaction with opaque, transparent and particles

Also, because there are several entities interacting with volumetrics in Frostbite (fog, particles, opaque and transparent surfaces, etc.), we want to unify the way we deal with them so that we do not have X methods for X types of interaction.



This video gives you an overview of what we got from this work: lights that generate scattering according to the participating media, volumetric shadow, local fog volumes, etc.

And I will show you now how we achieve it.





  • Volumetric rendering


  • Single Scattering


As of today we restrict ourselves to single scattering when rendering volumetrics. This is already challenging to get right. (Consider a single ray at a time.)


When light interacts with a surface, it is possible to evaluate the amount of light bounced to the camera by evaluating, for example, a BRDF. But in the presence of participating media, things get more complex.


  1. You have to take into account transmittance when the light is traveling through the media
  2. Then you need to integrate the scattered light along the view ray by taking many samples
  3. For each of these samples, you also need to take into account transmittance to the view point
  4. You also need to integrate the scattered light at each position
  5. And take into account phase function, regular shadow map (opaque objects) and volumetric shadow map (participating media and other volumetric entities)
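The five steps above can be summarised by the usual single scattering integral, written here in the same symbols the later slides use. This is a sketch rather than a formula quoted from the talk: the visibility term V, the per-light radiance L_l and the distance s from the camera to the surface point x_s are names introduced for illustration.

```latex
% Radiance reaching the camera x along direction \omega_o, with the surface at x_s:
L_i(x,\omega_o) = T_r(x,x_s)\,L_s(x_s,\omega_o)
                + \int_{0}^{s} T_r(x,x_t)\,L_{scat}(x_t,\omega_o)\,dt

% Transmittance between two points (Beer-Lambert):
T_r(x,x_t) = \exp\!\left(-\int_{x}^{x_t} \sigma_t(u)\,du\right)

% In-scattered light at x_t: phase function p, shadowing V, light radiance L_l:
L_{scat}(x_t,\omega_o) = \sigma_s(x_t)\sum_{l \in \text{lights}}
                         p(\omega_o,\omega_l)\,V(x_t,l)\,L_l(x_t)
```

Steps 1 and 5 live inside V and L_l, steps 2 and 4 are the integral and the sum over lights, and step 3 is the T_r factor inside the integral.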








  • Clip Space Volumes


Frustum aligned 3D textures [Wronski14]

  • Frustum voxel in world space => Froxel 🙂

As in Wronski, all our volumes are 3D textures that are clip space aligned (such voxels become froxels in world space; credit Alex Evans and Sony ATG 🙂, see ‘Learning from Failure: a Survey of Promising, Unconventional and Mostly Abandoned Renderers for “Dreams PS4”, a Geometrically Dense, Painterly UGC Game’, Advances in Real-Time Rendering course, SIGGRAPH 2015).


Note: Frostbite uses tile-based deferred lighting

  • 16×16 tiles with culled light lists


Align volume tiles on light tiles

  • Reuse per tile culled light list
  • Volume tiles can be smaller (8×8, 4×4, etc.)
  • Careful correction for resolution integer division


This volume is also aligned with our screen light tiles. This is because we are reusing the forward light tile list culling result to accelerate the scattered light evaluation (remember, Frostbite is a tile based deferred lighting engine).


Our volume tiles in screen space can be smaller than the light tiles (which are 16×16 pixels).


By default we use

Depth resolution of 64

8×8 volume tiles


720p requires 160x90x64 (~7 MB per rgbaF16 texture)

1080p requires 240x135x64 (~15 MB per rgbaF16 texture)
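The resolutions and sizes quoted above follow from the defaults (8×8 pixel volume tiles, 64 depth slices, four 16 bit channels per froxel). A small sketch of the arithmetic, where the function name and the choice to round partial tiles up are assumptions made here:

```python
import math

def froxel_volume(width, height, tile=8, depth_slices=64, bytes_per_froxel=8):
    """Return ((tiles_x, tiles_y, slices), size_in_bytes) for one volume texture."""
    # Round up so a partially covered tile at the screen edge still gets froxels.
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    froxels = tiles_x * tiles_y * depth_slices
    return (tiles_x, tiles_y, depth_slices), froxels * bytes_per_froxel
```

`froxel_volume(1280, 720)` gives 160x90x64 froxels and 7,372,800 bytes; `froxel_volume(1920, 1080)` gives 240x135x64 and 16,588,800 bytes, i.e. roughly 7 and 15.8 MiB respectively.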




  • Data flow



This is an overview of our data flow.

We are using clip space volumes to store the data at different stages of our pipeline.


We have material properties which are first voxelised from participating media entities.


Then using the light sources of our scene and this material property volume we can generate scattered light data per froxel. This data can be temporally upsampled to increase the quality. Finally, we have an integration step that prepares the data for rendering.


  1. Participating media material definition (corresponds to the first part of the diagram)


Follow the theory [PBR]

  • Absorption 𝝈𝒂 (m^-1)

Absorption describes the amount of light absorbed by the media over a certain path length.

  • Scattering 𝝈𝒔 (m^-1)

Scattering describes the amount of light scattered over a certain path length.

  • Phase 𝒈

A single-lobe phase function describes how the light bounces on particles (uniformly, forward scattering, etc.). It is based on Henyey-Greenstein (and you can use the Schlick approximation).

  • Emissive 𝝈𝒆 (irradiance·m^-1)

Emissive describes emitted light.

  • Extinction 𝝈𝒕 = 𝝈𝒔 + 𝝈𝒂
  • Albedo 𝛒 = 𝝈𝒔 / 𝝈𝒕
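For reference, here is a sketch of the Henyey-Greenstein phase function and the Schlick approximation mentioned in the list. The convention assumed here is that cos_theta is the cosine between the light's travel direction and the scattered direction, so g > 0 means forward scattering; the fit k = 1.55g - 0.55g^3 is the commonly quoted one and is not taken from the talk.

```python
import math

def henyey_greenstein(cos_theta, g):
    """HG phase function, normalised so it integrates to 1 over the sphere."""
    denom = 1.0 + g * g - 2.0 * g * cos_theta
    return (1.0 - g * g) / (4.0 * math.pi * denom ** 1.5)

def schlick_phase(cos_theta, g):
    """Schlick's cheaper approximation of HG (no fractional power)."""
    k = 1.55 * g - 0.55 * g ** 3
    return (1.0 - k * k) / (4.0 * math.pi * (1.0 - k * cos_theta) ** 2)
```

With g = 0 both reduce to the isotropic value 1 / (4π), and both stay normalised for any g strictly between -1 and 1.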


Artists can author {absorption, scattering} or {albedo, extinction}

  • Train your artists! Important for them to understand their meaning!

As with every physically based component, it is very important for artists to understand them so take the time to educate them.
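The two authoring pairs carry the same information, so converting between them is a direct use of the definitions 𝝈𝒕 = 𝝈𝒔 + 𝝈𝒂 and 𝛒 = 𝝈𝒔 / 𝝈𝒕. A minimal sketch with hypothetical function names, treating each coefficient as a single scalar:

```python
def to_albedo_extinction(absorption, scattering):
    """{absorption, scattering} -> {albedo, extinction}."""
    extinction = absorption + scattering
    # An empty medium has no meaningful albedo; return 0 to avoid dividing by zero.
    albedo = scattering / extinction if extinction > 0.0 else 0.0
    return albedo, extinction

def to_absorption_scattering(albedo, extinction):
    """{albedo, extinction} -> {absorption, scattering}."""
    scattering = albedo * extinction
    return extinction - scattering, scattering
```

For example, absorption 0.25 and scattering 0.75 give albedo 0.75 and extinction 1.0, and converting back returns the original pair.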




Participating Media(PM) sources

  • Depth fog
  • Height fog
  • Local fog volumes
    • With or W/o density textures


Depth/height fog and local fog volumes are entities that can be voxelized. You can see here local fog volumes as plain or with varying density according to a density texture.


【The following explains the data structure and storage.】


Voxelize PM properties into V-Buffer

  • Add Scattering, Emissive and
  • Average Phase g (no multi lobe)
  • Wavelength independent 𝝈𝒕 (for now)


We voxelize them into a V-Buffer analogous to the screen G-Buffer but in a volume (clip space). We basically add all the material parameters together since they are linear, except the phase function, which is averaged. We also only consider a single lobe for now, according to the HG phase function.


We have deliberately chosen to go with wavelength independent extinction to have cheaper volumes (material, lighting, shadows). But it would be very easy to extend if necessary at some point.


Supporting emissive is an advantage for artists: they can position local fog volumes that emit light as scattering would, but that do not match any local light. This can be used for cheap ambient lighting. (Emissive is optional.)




V-Buffer (per froxel data)

  • Scattering R, Scattering G, Scattering B
  • Emissive R, Emissive G, Emissive B, Phase (g)




  2.1 Froxel integration (corresponds to the second part of the diagram)


Per froxel

  • Sample PM properties data
  • Evaluate
    • Scattered light 𝑳𝒔𝒄𝒂𝒕(𝒙𝒕,𝝎𝒐)
    • Extinction


For each froxel, one thread will be in charge of gathering scattered light and extinction.


Extinction is simply copied over from the material. You will see later why this is important for visual quality in the final stage (to use extinction instead of transmittance for energy conservative scattering). Extinction is also linear so it will be better to temporally integrate it instead of the non linear transmittance value. (Linear extinction is enough.)


Scattered light:

  • 1 sample per froxel
  • Integrate all light sources: indirect light + sun + local lights





Indirect light on local fog volume

  • From Frostbite diffuse SH light probe
    • 1 probe at volume centre
    • Integrate w.r.t. phase function as a SH cosine lobe [Wronski14]


Then we integrate the scattered light. One sample per froxel.


We first integrate ambient the same way as Wronski. Frostbite allows us to sample diffuse SH light probes. We use one per local fog volume positioned at their centre.


We also integrate the sun light according to our cascaded shadow maps. We could use exponential shadow maps but we do not as our temporal up-sampling is enough to soften the result.


You can easily notice the heterogeneous nature of the local fog shown here.



Local lights

  • Reuse tiled-lighting code
  • Use forward tile light list post-culling
  • No scattering? skip local lights


We also integrate local lights. And we re-use the tile culling result to only take into account lights visible within each tile.

One good optimisation is to skip it all if you do not have any scattering possible according to your material properties.



  • Regular shadow maps
  • Volumetric shadow maps


Each of these lights can also sample their associated shadow maps. We support regular shadow maps and also volumetric shadow maps (described later).



  2.2 Temporal volumetric integration (corresponds to the second part of the diagram)




One scattering/extinction sample per frame

  • Under sampling with very strong material
  • Aliasing under camera motion
  • Shadows make it worse


As I said, we are only using a single sample per froxel.


Aliasing (see the slides for the two videos below; the aliasing is very obvious)

This can unfortunately result in very strong aliasing for very thick participating media and when integrating the local light contribution.



You can also notice it in the video, as well as very strong aliasing of the shadow coming from the tree.



Solution: temporal integration

To mitigate these issues, we temporally integrate our frame result with that of the previous frame (well known, also used by Karis last year for TAA).


To achieve this:

  • We jitter our samples per frame uniformly along the view ray.
  • The material and scattered light samples are jittered using the same offset (to soften evaluated material and scattered light).
  • We integrate each frame according to an exponential moving average.
  • We ignore the previous result when no history sample is available (out of the previous frustum).


Jittered samples (Halton)

Same offset for all samples along view ray

Jitter scattering AND material samples in sync


Re-project previous scattering/extinction

5% Blend current with previous

Exponential moving average [Karis14]

Out of Frustum: skip history
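The two ingredients listed above can be sketched in a few lines: a Halton sequence to pick the per-frame jitter offset along the view ray, and an exponential moving average to fold the new sample into the reprojected history. The function names are invented, and reading "5%" as the weight of the current frame is an assumption.

```python
def halton(index, base=2):
    """Radical inverse of `index` (1-based) in the given base, in [0, 1)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def temporal_blend(history, current, blend=0.05, history_valid=True):
    """Exponential moving average; fall back to `current` when there is no history."""
    if not history_valid:
        return current  # reprojected position fell outside the previous frustum
    return history + (current - history) * blend
```

Each frame would use `halton(frame_index)` as the shared offset within the depth slice for both the material and the scattered light samples, then blend the result per froxel with `temporal_blend`.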






This is great and promising but there are several issues remaining:


Local fog volume and lights will leave trails when moving

One could use local fog volumes motion stored in a buffer the same way as we do in screenspace for motion blur

But what do we do when two volumes intersect? This is the same problem as deep compositing

For lighting, we could use neighbour colour clamping but this will not solve the problem entirely


This is an exciting and challenging R&D area for the future and I’ll be happy to discuss it with you if you have some ideas 🙂


  3. Final integration



Integrate froxel {scattering, extinction} along view ray

  • Solves {𝑳𝒊(𝒙,𝝎𝒐), 𝑻𝒓(𝒙,𝒙𝒔)} for each froxel at position 𝒙𝒔


We basically accumulate near to far scattering according to transmittance. This will solve the integrated scattered light and transmittance along the view and that for each froxel.



One could use the code sample shown here: accumulate scattering and then transmittance for the next froxel, and this slice by slice. However, that is completely wrong. Indeed there is a dependency on the accumScatteringTransmitance.a value (transmittance). Should we update transmittance or scattering first?





Non energy conservative integration:


You can see here multiple volumes with increasing scattering properties. It is easy to understand that integrating scattering and then transmittance is not energy conservative.



We could reverse the order of operations. You can see that we somewhat get back the correct albedo one would expect, but it is overall too dark, and temporally integrating that is definitely not helping here.



So how to improve this? We know we have one light and one extinction sample.


We can keep the light sample: it is expensive to evaluate and good enough to assume it is constant along the view ray inside each depth slice.


But the single transmittance is completely wrong. The transmittance should in fact be 1 at the near interface of the depth slice and exp(-𝝈𝒕 d) at the far interface of a depth slice of width d.


What we do to solve this is integrate the scattered light analytically, weighted by the transmittance at each point of the view ray within the slice. The analytical integral of a constant scattered light S over the slice depth D, for a single extinction sample mu_t, reduces to (S - S exp(-mu_t D)) / mu_t.

Using this, we finally get a lighting result for scattering that is consistent with our single extinction sample (as you can see in the bottom picture).


  • Single scattered light sample 𝑆=𝑳𝒔𝒄𝒂𝒕(𝒙𝒕,𝝎𝒐) OK
  • Single transmittance sample 𝑻𝒓(𝒙,𝒙𝒔) NOT OK


→ Integrate lighting w.r.t. transmittance over froxel depth D
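The three accumulation orders discussed above can be compared with a small sketch. It assumes scalar radiance, one scattering sample S and one extinction sample mu_t per slice, and a constant slice depth; the function name and the `mode` strings are hypothetical. The two naive modes multiply S by the slice depth, which is an assumption about how the naive version would be written.

```python
import math


def integrate_froxels(scattering, extinction, depth, mode="analytic"):
    """Accumulate scattered light and transmittance near to far.

    scattering[i]: scattered light sample S for slice i (per unit length)
    extinction[i]: extinction sample mu_t for slice i
    depth:         slice depth D (assumed constant here)
    """
    light, trans = 0.0, 1.0
    for s, mu_t in zip(scattering, extinction):
        slice_trans = math.exp(-mu_t * depth)
        if mode == "scatter_first":
            # Naive: add scattering, then update transmittance.
            light += trans * s * depth
            trans *= slice_trans
        elif mode == "transmittance_first":
            # Naive, reversed order: too dark.
            trans *= slice_trans
            light += trans * s * depth
        else:
            # Integral of S * exp(-mu_t x) over [0, D].
            # The max() guards the division; an empty slice has S = 0 anyway.
            s_int = (s - s * slice_trans) / max(mu_t, 1e-6)
            light += trans * s_int
            trans *= slice_trans
    return light, trans
```

For a homogeneous medium with albedo 1 lit by unit radiance (S equal to mu_t), the analytic mode returns light plus transmittance equal to 1, while scattering-first overshoots and transmittance-first undershoots.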



Also improves with volumetric shadows

You can also see that this fixes the light leaking we sometimes noticed for relatively large depth slices and strongly scattering media, even when volumetric shadows are enabled.



Once we have that final integrated buffer, we can apply it on everything in our scene during the sky rendering pass. As it contains scattered light reaching the camera and transmittance, it is easy to apply it as a pre-multiplied colour-alpha on everything.


For efficiency, it is applied per vertex on transparent surfaces, but we are thinking of switching this to per pixel for better quality.


  • {𝑳𝒊(𝒙,𝝎𝒐), 𝑻𝒓(𝒙,𝒙𝒔)} Similar to pre-multiplied color/alpha
  • Applied on opaque surfaces per pixel
  • Evaluated on transparent surfaces per vertex, applied per pixel
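Applying the integrated buffer as a pre-multiplied colour/alpha can be sketched as follows. This is an illustration only: the function name is hypothetical, colours are plain RGB tuples, and transmittance is assumed to be a single scalar.

```python
def apply_volumetrics(scene_color, in_scattering, transmittance):
    """Composite the integrated froxel result over a shaded surface.

    The surface colour is attenuated by the transmittance, then the light
    scattered towards the camera is added, like a pre-multiplied alpha blend
    where alpha = 1 - transmittance.
    """
    return tuple(c * transmittance + s for c, s in zip(scene_color, in_scattering))
```

With a transmittance of 1 and no in-scattering the surface is unchanged; with a transmittance of 0 only the scattered light remains.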




Result validation


Our target is to get physically based results. As such, we have compared our results against the physically based path tracer called Mitsuba. We constrained Mitsuba to single scattering and to use the same exposure, etc. as our example scenes.


Compare results to references from Mitsuba

  • Physically based path tracer
  • Same conditions: single scattering only, exposure, etc.


The first scene I am going to show you is a thick participating media layer with a light first above it and then inside it.



You can see here the Frostbite render on top and the Mitsuba render at the bottom. You can also see the scene with a gradient applied to it. It is easy to see that our result matches; you can also recognize the triangle shape of the scattered light when the point light is within the medium.


This is a difficult case: when the participating media is thick and non-uniform, our discretisation of volumetric shadows and of the material representation leads to some small differences. But overall it matches, and we are happy with these first results and will improve them in the future.



This is another example showing a very good match for an HG phase function with g=0 and g=0.9 (strong forward scattering).
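For reference, the Henyey-Greenstein (HG) phase function used in this comparison can be written as the short sketch below. The function name is hypothetical; the formula is the standard normalised HG form, where g=0 is isotropic and g close to 1 is strongly forward scattering.

```python
import math


def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function, normalised over the sphere.

    cos_theta: cosine of the angle between light and view directions
    g:         anisotropy in (-1, 1)
    """
    denom = 1.0 + g * g - 2.0 * g * cos_theta
    return (1.0 - g * g) / (4.0 * math.pi * denom ** 1.5)
```

With g=0 every direction returns 1/(4π); with g=0.9 most of the energy is concentrated around the forward direction (cos_theta near 1).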





Sun + shadow cascade

14 point lights

  • 2 with regular & volumetric shadows

6 local fog volumes

  • All with density textures


PS4, 900p


| Volume tile resolution |  |  |
| --- | --- | --- |
| PM material voxelization | 0.45 ms | 0.15 ms |
| Light scattering | 2.00 ms | 0.50 ms |
| Final accumulation | 0.40 ms | 0.08 ms |
| Application (fog pass) | +0.1 ms | +0.1 ms |
| Total | 2.95 ms | 0.83 ms |


| Light scattering components | Cost |
| --- | --- |
| Local lights | 1.1 ms |
| + Sun scattering | +0.5 ms |
| + Temporal integration | +0.4 ms |


You can see that the performance varies a lot depending on what you have enabled and the resolution of the clip space volumes.


This shows that it is important to carefully plan for the needs of your game and its different scenes. One could perhaps also bake the scattering of static scenes and use the emissive channel to represent the scattered light, for even faster rendering of complex volumetric lighting.



  • Volumetric shadows


Volumetric shadow maps


We also support volumetric shadow maps (shadow resulting from voxelized volumetric entities in our scene)


To this aim, we went for a simple and fast solution


  • We first define a three-level cascaded clipmap volume that follows and contains the camera.
    • With tweakable per level voxel size and world space snapping
  • This volume contains all our participating media entities, voxelized again within it (required for out-of-view shadow casters; the clip space volume would not be enough)
  • A volumetric shadow map is defined as a 3D texture (assigned to a light) that stores transmittance
    • Transmittance is evaluated by ray marching the extinction volume
    • Projection is chosen as a best fit for the light type (e.g. frustum for spot light)
  • Our volumetric shadow maps are stored into an atlas to only have to bind a single texture (with uv scale and bias) when using them.
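Two of the steps in this list can be illustrated with a minimal sketch: snapping a clipmap level to its voxel grid in world space, and ray marching extinction to fill one column of a volumetric shadow map. Both function names are hypothetical, and a fixed step length and one extinction sample per step are assumed.

```python
import math


def snap_to_voxel_grid(position, voxel_size):
    """Snap a world-space position to the voxel grid of a clipmap level.

    Snapping keeps voxels fixed in world space while the volume follows
    the camera, so voxelized content does not shimmer as the camera moves.
    """
    return tuple(math.floor(c / voxel_size) * voxel_size for c in position)


def march_transmittance(extinction_along_ray, step_length):
    """Ray march extinction samples from the light outwards.

    Returns the transmittance after each step, i.e. the values stored
    along one ray of the 3D shadow map.
    """
    optical_depth = 0.0
    column = []
    for mu_t in extinction_along_ray:
        optical_depth += mu_t * step_length
        column.append(math.exp(-optical_depth))
    return column
```

Transmittance starts at 1 near the light and can only decrease along the ray, since the optical depth is a running sum of non-negative extinction.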



Volumetric shadow maps are entirely part of our shared lighting pipeline and shader code.


Part of our common light shadow system

  • Opaque
  • Particles
  • Participating media


It is sampled for each light that has it enabled and applied on everything in the scene (particles, opaque surfaces, participating media), as visible in this video.




Another bonus is that we also voxelize our particles.


We have tried many voxelization methods. Point voxelization and its blurred version were just too noisy. Our default voxelization method is trilinear. You can see that the shadow is very soft and that there is no visible popping.
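One possible reading of trilinear voxelization is to splat each particle into the eight voxels around its centre, weighted by trilinear weights. The sketch below assumes exactly that; the function name, the sparse dictionary grid, and positions expressed in voxel units are all assumptions made for illustration.

```python
import math


def splat_trilinear(grid, position, value):
    """Distribute value over the 8 voxels surrounding position.

    grid:     dict mapping integer (x, y, z) voxel coordinates to a value
    position: particle centre in voxel units
    value:    quantity to voxelize (e.g. particle extinction)
    """
    base = [math.floor(c) for c in position]
    frac = [c - b for c, b in zip(position, base)]
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((frac[0] if dx else 1.0 - frac[0])
                     * (frac[1] if dy else 1.0 - frac[1])
                     * (frac[2] if dz else 1.0 - frac[2]))
                if w > 0.0:
                    key = (base[0] + dx, base[1] + dy, base[2] + dz)
                    grid[key] = grid.get(key, 0.0) + value * w
    return grid
```

Because the weights vary continuously with the particle position and always sum to 1, a moving particle transfers its contribution smoothly between voxels, which is consistent with the absence of popping.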


We also have a high quality voxelization where all threads write all the voxels contained within the particle sphere. A bit brute force for now but it works when needed.


You can see the result of volumetric shadows from particles onto participating media in the last video.


(See bonus slides for more details)



Quality: PS4


| Ray marching of 32³ volumetric shadow maps | Cost |
| --- | --- |
| Spot light | 0.04 ms |
| Point light | 0.14 ms |

| 1k particles voxelization | Cost |
| --- | --- |
| Default quality | 0.03 ms |
| High quality | 0.25 ms |


Point lights are more expensive than spot lights because spot lights are integrated slice by slice, whereas a full ray trace is done for each voxel of a point light shadow map. We have ideas to fix that in the near future.


Default particle voxelization is definitely cheap for 1K particles.


  • More volumetric rendering in Frostbite


Particle/Sun interaction


  • High quality scattering and self-shadowing for sun/particles interactions
  • Fourier opacity Maps [Jansen10]
  • Used in production now



Our translucent shadows in Frostbite (see [Andersson11]) allow particles to cast shadows on opaque surfaces but not on themselves. This technique also did not support scattering.


We have added that support in Frostbite by using Fourier opacity mapping. This gives us very high quality coloured shadowing and scattering, resulting in sharp silver-lining visual effects, as you can see in these screenshots and the cloud video.
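The idea behind Fourier opacity mapping [Jansen10] can be sketched in a few lines: the absorption along a light ray is projected onto a truncated Fourier series, and transmittance at any depth is reconstructed from the coefficients. The sketch below is a scalar, single-ray illustration under assumptions (depth normalised to [0, 1], particles as (depth, alpha) pairs, hypothetical function names), not the Frostbite implementation.

```python
import math


def fom_coefficients(particles, num_terms=4):
    """Project particle absorption onto Fourier coefficients.

    particles: list of (depth in [0, 1], alpha in [0, 1)) along one light ray
    Returns (a, b), where a[0] is the constant term.
    """
    a = [0.0] * (num_terms + 1)
    b = [0.0] * (num_terms + 1)
    for depth, alpha in particles:
        absorb = -2.0 * math.log(1.0 - alpha)
        for k in range(num_terms + 1):
            a[k] += absorb * math.cos(2.0 * math.pi * k * depth)
            b[k] += absorb * math.sin(2.0 * math.pi * k * depth)
    return a, b


def fom_transmittance(a, b, depth):
    """Reconstruct transmittance at depth from the coefficients."""
    optical_depth = 0.5 * a[0] * depth
    for k in range(1, len(a)):
        w = 2.0 * math.pi * k
        optical_depth += a[k] / w * math.sin(w * depth)
        optical_depth += b[k] / w * (1.0 - math.cos(w * depth))
    return math.exp(-optical_depth)
```

Because the reconstruction is a smooth function of depth, particles can be rendered into the map in any order and then shadow each other, at the cost of some ringing when few coefficients are kept.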


This is a special case for the sun (non-unified), but it was needed to get the extra bit of quality that the sun requires.


Physically-based sky/atmosphere


  • Improved from [Elek09] (Simpler but faster than [Bruneton08])
  • Collaboration between Frostbite, Ghost and DICE teams.
  • In production: Mirror’s Edge Catalyst, Need for Speed and Mass Effect Andromeda



We also added support for physically based sky and atmosphere scattering simulation last year. This was a fruitful collaboration between the Frostbite, Ghost and DICE teams (mainly developed by Edvard Sandberg and Gustav Bodare at Ghost). It is now used in production by many games, such as Mirror’s Edge or Mass Effect Andromeda.


It is an improved version of Elek’s paper which is simpler and faster than Bruneton. I unfortunately have no time to dive into details in this presentation.


But in the comment I have time. Basically, the lighting artist defines the atmosphere properties, and the light scattering and sky rendering automatically adapt to the sun position. When the atmosphere is changed, we need to update our pre-computed lookup tables, and this can be distributed over several frames to limit the evaluation impact on the GPU.


  • Conclusion


Physically-based volumetric rendering framework used for all games powered by Frostbite in the future


Physically based volumetric rendering

  • Participating media material definition
  • Lighting and shadowing interactions


A more unified volumetric rendering system

  • Handles many interactions
    • Participating media, volumetric shadows, particles, opaque surfaces, etc.


Future work


Improved participating media rendering

  • Phase function integral w.r.t. area lights solid angle
  • Inclusion in reflection views
  • Graph based material definition, GPU simulation, Streaming
  • Better temporal integration! Any ideas?
  • Sun volumetric shadow
  • Transparent shadows from transparent surfaces?



  • V-Buffer packing
  • Particles voxelization
  • Volumetric shadow maps generation
  • How to scale to 4k screens efficiently


For further discussions








[Lagarde & de Rousiers 2014] Moving Frostbite to PBR, SIGGRAPH 2014.

[PBR] Physically Based Rendering book, http://www.pbrt.org/.

[Wenzel07] Real-time Atmospheric Effects in Games Revisited, GDC 2007.

[Mitchell07] Volumetric Light Scattering as a Post-Process, GPU Gems 3, 2007.

[Andersson11] Shiny PC Graphics in Battlefield 3, GeForceLan, 2011.

[Engelhardt10] Epipolar Sampling for Shadows and Crepuscular Rays in Participating Media with Single Scattering, I3D 2010.

[Miles] Blog post http://blog.mmacklin.com/tag/fog-volumes/

[Valliant14] Volumetric Light Effects in Killzone Shadow Fall, SIGGRAPH 2014.

[Glatzel14] Volumetric Lighting for Many Lights in Lords of the Fallen, Digital Dragons 2014.

[Hillaire14] Volumetric lights demo

[Lagarde13] Lagarde and Harduin, The art and rendering of Remember Me, GDC 2013.

[Wronski14] Volumetric fog: unified compute shader based solution to atmospheric scattering, SIGGRAPH 2014.

[Karis14] High Quality Temporal Super Sampling, SIGGRAPH 2014.

[Jansen10] Fourier Opacity Mapping, I3D 2010.

[Salvi10] Adaptive Volumetric Shadow Maps, EGSR 2010.

[Elek09] Rendering Parametrizable Planetary Atmospheres with Multiple Scattering in Real-time, CESCG 2009.

[Bruneton08] Precomputed Atmospheric scattering, EGSR 2008.