Advanced Game Tech All

Occlusion Culling – oxel



  1. Hierarchical Z-Buffer Occlusion Culling


Hierarchical Z-Buffer Culling Steps

  1. Bake step – Have your artists prepare occlusion geometry for things in the world that make sense as occluders, buildings, walls…etc. They should all be super cheap to render, boxes/planes. I actually ran across this paper, Geometric Simplification For Efficient Occlusion Culling In Urban Scenes ( ), that sounded like a neat way of automating the process.【生成遮挡体】
  2. CPU – Take all the occlusion meshes and frustum cull them.【遮挡体进入CPU
  3. GPU – Render the remaining occluders to a ‘depth buffer’. The depth buffer should not be full sized, in my code I’m using 512×256. There is a Frostbite paper that mentions using a 256×114-ish sized buffer for a similar solution to do occlusion culling. The ‘depth buffer’ should just be mip 0 in a full mip chain of render targets (not the actual depth buffer).【渲染depth buffer,不需要全尺寸】
  4. GPU – Now downsample the RT containing depth information filling out the entire mipchain. You’ll do this by rendering a fullscreen effect with a pixel shader taking the last level of the mip chain and down sampling it into the next, preserving the highest depth value in a sample group of 4 pixels. For DX11, you can just constrain the shader resource view so that you can both read and render from the same mip chain. For DX9 you’ll have to use StretchRect to copy from a second mip chain, since you can’t sample and render to the same mip chain in DX9. In my code I actually found a more optimized solution, by ping-ponging between 2 mip chains one containing even and the other odd levels, and a single branch in your shader code you can get around the overhead of having to do the StretchRect and just sample from a different mip chain based on the even/odd mip level you need.【对RT降采样获取深度信息】
  5. CPU – Gather all the bounding spheres for everything in your level that could possibly be visible.【收集所有的包围盒】
  6. GPU – DX11 send the list of bounds to a compute shader, which computes the screen space width of the sphere then uses the width to compute the mip level to sample from the HiZ map generated in step 4, such that the sphere covers no more than 2 pixels wide. So large objects in screen space will sample from very high values in the mip chain since they require a coarse view of the world. Whereas small objects in screen space will sample from very low values in the mip chain. In DX9 the process is basically the same, the only difference is that you’ll render a point list of vertices, that instead of a Float3 position are Float4 bounds (xyz = position, w = radius). You’ll also send down a stream of texcoords that will represent x/y pixel values of where to encode the results of the occlusion test for that bound. Instead of a compute shader you’ll process the vertices using a vertex shader, you’ll also need to use the pixel location provided in the texcoord stream to make sure the results of the test are written out to that point in a render target, and in a pixel shader you’ll need to do the sampling to test to see if it’s visible, and output a color like white for culled, black for visible.【将所有的包围盒,以及第四步生成的HiZ Map送入Compute Shader,来比较判断遮挡与否】
  7. CPU – Try to do some work on the CPU after the occluder rendering and culling process is kicked off, for me the entire process took about 0.74 ms of GPU time on a Radeon 5450, with 900 bounds. The overhead of generating the HiZ mip chain and dispatching the culling process is the real bottleneck though, there’s little difference between 900 bounds and 10,000 bounds.occluder rendering and culling 过程中的CPU工作开销】
  8. CPU – Read back the results. DX11 you’re just reading back a buffer output by a compute shader. For DX9 you’ll have to copy the render target rendered to in step 6 containing the checker pattern of black and white pixels and then iterate over the pixels on the CPU to know what is visible and what is hidden.【读取结果】


Hierarchical Z-Buffer Downsampling Code



Hierarchical Z-Buffer Culling Code:












  1. Hierarchical Z-Buffer Occlusion Culling – Shadows


目的就是通过Z-buffer裁剪阴影,见paper:CC Shadow Volumes



  1. Create a HZB from the point of view of the player.【从玩家视点创建HZB
  2. Create a HZB from the point of view of the shadow casting light.【从光源处创建HZB
  3. Cull the shadow casting objects using the lights HZB. 1. If the object is culled, mark it as culled.【裁剪objects
  4. If the object isn’t culled, we need to compute a rough approximation of the extents of shadow volume. You can think of this volume as being the minimum depth points on your bounding box (A-D) and the maximum depth sampled from where those points project onto the light’s HZB (E-H) (Figure 1). Now that you have a bounding volume for the shadow, you can transform that volume into the camera space of the player, and test it against the player’s HZB; just as if the extents were any other object you were attempting to cull with the HZB (Figure 2). If you can’t see the volume, you can write out that the caster is culled, otherwise write it as visible.【如果object没有被裁减,才需要计算shadow volume



  1. Hierarchical Z-Buffer Occlusion Culling – Generating Occlusion Volumes


occlusion volume 特征:

  1. Conservativeness – Doesn’t extend beyond the surface of the mesh
  2. Simplicity – The occlusion volume is made of very few triangles or is fast to render
  3. Volume Conservation – Closely matches the original mesh’s volume
  4. Dynamic – Some games have large movable occluders or destroyable walls


Normal methods of simplifying a mesh such as typical triangle simplification causes problems in both the area of conservativeness and volume conservation. In some cases you can use the physics collision mesh if available. One thing to be aware of is that when the physics volume is larger than the visual mesh it can cause false occlusions leading to things popping into view. Also not every mesh needs a physics volume nor are physics volumes always meshes. Will Vale’s talk from GDC 2011 covered using physics collision meshes pretty thoroughly for HZB-OC, you should check it out.【直接使用碰撞体会导致错误,原因是碰撞体一般都比对应的可见体大】




  1. Find all the voxels completely inside a mesh


The voxelization algorithms I ended up using comes from this paper Fast Parallel Surface and Solid Voxelization on GPUs [Schwarz 2010]. ( ) I used the 3.1 section for surface voxelization and the 4.1 section for solid voxelization. Both are required because a voxel is considered part of the solid volume if the center of the voxel is inside the mesh. We want the entire voxel to be inside the mesh though. So you have to remove the excess voxels from the solid set using the surface set.【方法来源】



My first implementation was in C++ and entirely on the CPU which took about 20 seconds to run on a mesh with ~10,000 triangles and a 503 voxel volume. In the version I’m presenting here I moved to C#, when I did that time went up to about 60 seconds. So I ended up porting a similar version of the CPU implementation to the GPU using OpenCL (Cloo ), on the same data it ran in about 6 seconds on my Nvidia 540M, with no attempt to optimize.CPU,GPU性能差异】


One unfortunate limitation of Schwarz’s solid voxelization algorithm it that it requires watertight meshes. The reason for this is you’re shooting a ray down a column of voxels until you intersect with a triangle. You’re then xor’ing the bit for each voxel signaling that this column of voxel’s is inside or outside. Voxels that are inside a mesh will be Xor’ed an odd number of times, where as outside voxels will be xored an even number of times. So if there is a hole in the mesh, you’d leave a incomplete trail of xor’s in that column, leaving every voxel from the last triangle to the bounding box marked as ‘inside’.【算法的问题】


Two papers I’ve run across that might contain solutions to this problem are An Automatic Hole-Filling Algorithm for Polygon Meshes and Complete Polygonal Scene Voxelization.【问题的解决方法】


  1. Find the voxel at the densest point in the volume



  1. Expand a box from this point until all sides collide with the mesh surface or another box


Once you’ve found the densest voxel you’re going to create a 1^3 voxel sized box at that location. You’ll then proceed to iteratively expand each side of the box in voxel space until you can’t expand any side of the box without entering an empty voxel or another box that has already been placed.【从最密的层向上找到最大的box】


  1. Repeat 2-3 until you’ve consumed X% of the total volume



  1. Filter small or useless boxes (Unimplemented)



  1. Use a Constructive Solid Geometry (CSG) algorithm to merge the boxes you create




  1. Oxel: A Tool for Occluder Generation