SIGGRAPH 15 – Learning from Failure: a Survey of Promising, Unconventional and Mostly Abandoned Renderers for ‘Dreams PS4’, a Geometrically Dense, Painterly UGC Game



 
 

this talk is about showing you some of the approaches that we tried and failed to make stick for our project. if you’re looking for something to take away, hopefully it’s inspiration, or some pointers to places to start where we left off. I also just think it’s interesting to hear about failures, and the lessons learnt along the way. it’s a classic story of the random walk of R&D…

This talk covers the author’s attempts and failures in this area, in the hope that it inspires you.

 
 


 
 

spoiler section!

================

this is where we’re headed if you didn’t see it at e3 {e3 trailer}

https://www.youtube.com/watch?v=4j8Wp-sx5K0

 
 


back to the beginning

=====================

it all began with @antonalog doing an experiment with move controllers, and a DX11 based marching cubes implementation.

 
 

additional links:

http://paulbourke.net/geometry/polygonise/

https://github.com/smistad/GPU-Marching-Cubes

 
 


here he is! this was on PC, using playstation move controllers. the idea was to record a series of add & subtraction using platonic shapes with simple distance field functions

UGC on PC, using PlayStation Move controllers.

Method: record a series of adds & subtractions using platonic shapes with simple distance field functions.

 
 

 
 


we use (R to L) cubic strokes, cylinders, cones, cuboids, ellipsoids, triangular prisms, donuts, biscuits, markoids*, pyramids.

(markoids are named for our own mark z who loves them; they’re super ellipsoids with variable power for x,y,z)

 
 


here’s the field for the primitives…

 
 


we called each primitive an ‘edit’,

we support a simple list, not a tree, of CSG edits. (no scene tree)

and models are made up of anything from 1 to 100,000 edits

with add, subtract or ‘color’ only, along with…

 
 


soft blend, which is effectively soft-max and soft-min functions.

 
 

 
 


here’s the field for the hard blend.

 
 


and the soft. I’ll talk more about the function for this in a bit. note how nicely defined and distance-like it is, everywhere!

 
 


[timelapse of dad’s head, with randomised colours] he’s 8,274 edits.

(an aside: MM artists Kareem, Jon B and Francis spent a LONG time developing artistic techniques like the ‘chiselled’ look you see above, with half-made early versions of this tech. It’s their artistry which convinced us to carry on down this path. It can’t be overstated how important it is, when making new kinds of tools, to actually try to use them in order to improve them. Thanks guys!).

What the author means here: to do good work, first sharpen your tools.

 
 

Anyway:

the compound SDF function was stored in 8^3 fp16 volume texture blocks, incrementally updated as new edits arrived. each block was independently meshed using marching cubes on the compute shader;

at the time this was a pretty advanced use of CS (as evidenced by frequent compiler bugs/driver crashes) – many of the problems stemmed from issues with generating index buffers dynamically on the GPU.

the tech was based on histopyramids, which is a stream compaction technique where you count the number of verts/indices each cell needs, iteratively halve the resolution building cumulative ‘summed area’ tables, then push the totals back up to full resolution, which gives you a nice way to look up, for each cell, where in the target VB/IB its verts should go. there’s lots of material online, just google it.

【Explanation of the approach: VB/IB are generated dynamically, entirely on the GPU.】
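Purely as an illustration (not MM’s shaders), here is a minimal CPU sketch of the stream-compaction idea in C++: each cell reports how many vertices it will emit, a cumulative table is built, and the per-cell totals become output offsets into one shared vertex buffer. The GPU histopyramid builds a pyramid of partial sums and walks it per cell; this flat exclusive scan is the 1D equivalent, with made-up counts.

#include <cstdio>
#include <vector>

int main() {
    // hypothetical per-cell vertex counts produced by a marching-cubes pass
    std::vector<int> vertCount = {0, 3, 0, 6, 3, 0, 9};

    // exclusive prefix sum: offset[i] = sum of counts before cell i
    std::vector<int> offset(vertCount.size());
    int total = 0;
    for (size_t i = 0; i < vertCount.size(); ++i) {
        offset[i] = total;
        total += vertCount[i];
    }

    // each cell now knows where in the target vertex buffer to write
    for (size_t i = 0; i < vertCount.size(); ++i)
        if (vertCount[i] > 0)
            printf("cell %zu writes %d verts at VB offset %d\n", i, vertCount[i], offset[i]);
    printf("total verts to allocate: %d\n", total);
    return 0;
}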

 
 


the core idea of lists of simple SDF elements is still how all sculptures are made in dreams, and is the longest-living thread. this was the opposite of a failure! it was our first pillar of the game.

【The first, and most successful, pillar of the game.】

 
 

Anton worked with Kareem, our art director, to get some pretty cool gestural UI going too; there’s minimal UI intrusion so artists can get into flow state. I think he was planning to implement classic z-brush style pull/smear/bend modifications of the field – which is probably what some of you may have thought we did first- but luckily he didn’t. Why? welllll………..

【At first the plan was to mimic ZBrush-style pull/smear/bend operations wholesale; in the end it didn’t go that way.】

 
 


 
 

some early animation tests were done around this time to see what could be achieved – whether purely with semi- or fully rigid pieces, or some other technique. The results were varied in quality and all over the place in art style – we didn’t know what we wanted to do, or what was possible; so we imagined lots of futures:

【At first we didn’t know what we wanted, so we didn’t know which technique to use; we imagined many possible futures.】

 
 


rigid-ish pieces (low resolution FFD deformer over rigid pieces):

【rigid-ish pieces】

 
 


competing with that was the idea of animating the edits themselves. the results were quite compelling –

 
 


this was an offline render using 3DS Max’s blob mode to emulate soft blends. but it shows the effect.

【Soft blends emulated with 3ds Max’s blob mode.】

 
 


this was in Anton’s PC prototype, re-evaluating and re-meshing every frame in realtime.

【Every frame needs re-evaluation & re-meshing.】

 
 


and there was a visual high bar, which everyone loved, inspired by the work of legendary claymation animator & film maker jan svankmajer

【Inspired by claymation.】

 
 


here we made stop motion by scrubbing through the edit history, time lapse style (just like the earlier dad’s head). and on a more complex head model… pretty expensive to re-evaluate every frame though!

【Re-evaluating every frame of an animation is expensive.】

 
 


 
 

however to achieve this, the SDF would need to be re-evaluated every frame. in the first pc prototype, we had effectively added each edit one at a time to a volume texture – it was great for incremental edits, but terrible for loading and animation. the goal of dreams is for UGC to be minimal size to download, so we can’t store the SDF fields themselves anyway – we need a fast evaluator!

【But to achieve this, the SDF must be re-evaluated every frame. In the first prototype we could add edits to the volume texture incrementally, which was great for editing but terrible for loading and animation. Since UGC must stay small to download, we can’t store the SDF fields themselves anyway – we need a fast evaluator.】

 
 


 
 

Nevertheless, a plan was forming! the idea was this:

{‘csg’ edit list => CS of doom => per object voxels => meshing? => per object poly model => scene graph render! profit!}

【The plan.】

 
 

Before getting to rendering, I’d like to talk about the CS of doom, or evaluator as we call it. The full pipeline from edit list to renderable data is 40+ compute shaders in a long pipeline, but the “CS of doom” is a few 3000+ instruction shaders chained together that make the sparse SDF output. fun to debug on early PS4 hardware!

【First, a look at the ‘CS of doom’ – the evaluator compute shaders; heavy, and fun to debug.】

 
 

here are some actual stats on dispatch counts for a model called crystal’s dad to be converted from an edit list to a point cloud and a filtered brick tree:

eval dispatch count: 60

sweep dispatch count: 91

points dispatch count: 459

bricker dispatch count: 73

【Dispatch counts per stage.】

 
 


 
 

We had limited the set of edits to exclude domain deformation or any non-local effects like blur (much to the chagrin of z-brush experienced artists), and our CSG trees were entirely right leaning, meaning they were a simple list. Simple is good!

so in *theory* we had an embarrassingly parallel problem on our hands. take a large list of 100k edits, evaluate them at every point in a ~1000^3 grid, mesh the result, voila! one object!

【The naive version: 100K edits evaluated brute force over a ~1000^3 grid.】

 
 


 
 

alas, that’s 100 billion evaluations, which is too many.

 
 


 
 

anton wrote the first hierarchical prototype, which consisted of starting with a very coarse voxel grid, say 4x4x4

{slide}

【First improvement: a hierarchical voxel grid.】

 
 


 
 

building a list of edits that could possibly overlap each voxel, and then iteratively refining the voxels by splitting them and shortening the lists for each.

【How the hierarchical refinement works.】

 
 


 
 

empty cells and full cells are marked early in the tree; cells near the boundary are split recursively to a resolution limit. (the diagram shows a split in 2×2, but we actually split by 4x4x4 in one go, which fits GCN’s 64 wide wavefronts and lets us make coherent scalar branches on primitive type etc) the decision of when to split a given cell, and when not to, is really tricky.

【Deciding when to subdivide a cell is genuinely hard.】
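To make the shape of the problem concrete, here is a hedged C++ sketch of the hierarchical idea (my own reconstruction, not MM’s evaluator): each cell keeps only the edits that could influence it, empty subtrees are culled, and only cells near edits are split. Sphere edits, an L2 bound and 2x2x2 splits are simplifications; the real evaluator uses max-norm bounds, soft-blend padding and 4x4x4 splits to match 64-wide wavefronts.

#include <cmath>
#include <cstdio>
#include <vector>

struct Edit { float x, y, z, r; };                 // a sphere 'add' edit
struct Cell { float cx, cy, cz, half; };           // axis-aligned cube

static float distToCell(const Edit& e, const Cell& c) {
    // distance from the edit centre to the cell, per axis, then combined
    float dx = std::fmax(std::fabs(e.x - c.cx) - c.half, 0.f);
    float dy = std::fmax(std::fabs(e.y - c.cy) - c.half, 0.f);
    float dz = std::fmax(std::fabs(e.z - c.cz) - c.half, 0.f);
    return std::sqrt(dx*dx + dy*dy + dz*dz);
}

static int refine(const Cell& c, const std::vector<Edit>& edits, int depth) {
    std::vector<Edit> touching;
    for (const Edit& e : edits)
        if (distToCell(e, c) <= e.r)               // could this edit reach the cell?
            touching.push_back(e);
    if (touching.empty()) return 0;                // empty space: cull the whole subtree
    if (depth == 0) return 1;                      // leaf near an edit: would evaluate the SDF here
    int leaves = 0;
    float h = c.half * 0.5f;                       // split into 2x2x2 children (real code: 4x4x4)
    for (int i = 0; i < 8; ++i)
        leaves += refine({c.cx + ((i & 1) ? h : -h),
                          c.cy + ((i & 2) ? h : -h),
                          c.cz + ((i & 4) ? h : -h), h}, touching, depth - 1);
    return leaves;
}

int main() {
    std::vector<Edit> edits = {{0, 0, 0, 2.f}, {3, 0, 0, 1.f}};
    printf("active leaf cells: %d\n", refine({0, 0, 0, 8.f}, edits, 4));
    return 0;
}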

 
 

if you err on the ‘too little split’ side, you get gaps in the model. most of the rendering backends we were trying required at least 1 to 1.5 voxels of valid data on each side of the mesh.

if you err on the ‘too much split’ side, you can easily get pathological cases where the evaluator ends up doing orders of magnitude too much work.

 
 

Also, the splits must be completely seamless. The quality constraints are much, much more stringent than what you’d need for something like sphere tracing.

 
 

Both Anton and I had a crack at various heuristic evaluators, but neither was perfect. And it was made worse by the fact that even some of our base primitives were pretty hard to compute ‘good’ distances for!

【Both the implementation difficulty and the theoretical flaws of this approach were real; neither attempt was perfect.】

 
 


 
 

an aside on norms. everyone defaults to the L2 distance (ie sqrt(x^2+y^2+z^2)) because it’s the length we’re used to.

 
 


 
 

the L2 norm for boxes and spheres is easy. but the ellipsoid… not so much. Most of the public attempts at ‘closest point on an ellipsoid’ are either slow, unstable in corner cases, or both. Anton spent a LONG time advancing the state of the art, but it was a hard, hard battle.

【Distance metric: the L2 norm, sqrt(x^2 + y^2 + z^2).】

 
 

Ellipsoid: https://www.shadertoy.com/view/ldsGWX

Spline: https://www.shadertoy.com/view/XssGWl

 
 


 
 

luckily, anton noticed that for many primitives, the max norm was simpler and faster to evaluate.

 
 

Insight from “Efficient Max-Norm Distance Computation and Reliable Voxelization” http://gamma.cs.unc.edu/RECONS/maxnorm.pdf

 
 

  • Many non-uniform primitives have much simpler distance fields under max norm, usually just have to solve some quadratics!
  • Need to be careful when changing basis as max norm is not rotation-invariant, but a valid distance field is just a scaling factor away

 
 

So the evaluator works in the max norm, i.e. d = max(|x|,|y|,|z|). The shape of something distance ‘d’ away from a central origin in max norm is a cube, which nicely matches the shape of nodes in our hierarchy. 🙂

【Distance metric: the max norm – simple and fast.】

【The metric here bounds the range over which an edit (including its soft blend) can influence a cell.】
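As a small illustration of why the max norm is convenient here (my own example, not the evaluator’s code): the max-norm distance from a point to an axis-aligned box is just a per-axis expression combined with max, and the set of points within distance d of a cell is itself a cube, so culling an edit against a cubic cell stays trivial.

#include <cmath>
#include <cstdio>

// max-norm distance from point p to a box centred at c with half-extents h
float maxNormToBox(const float p[3], const float c[3], const float h[3]) {
    float d = 0.f;
    for (int i = 0; i < 3; ++i) {
        float di = std::fabs(p[i] - c[i]) - h[i];   // per-axis overlap test
        d = std::fmax(d, di);                       // max norm combines with max, not a sqrt
    }
    return d;   // <= 0 means the point is inside the box
}

int main() {
    float p[3] = {3.f, 0.5f, -1.f}, c[3] = {0.f, 0.f, 0.f}, h[3] = {1.f, 1.f, 1.f};
    printf("max-norm distance: %.2f\n", maxNormToBox(p, c, h)); // prints 2.00
    return 0;
}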

 
 


 
 

Soft blend breaks ALL THE CULLING, key points:

– Soft min/max needs to revert to hard min/max once distance fields are sufficiently far apart (otherwise you can never cull either side)

  • Ours is, for some radius r: soft_min(a, b, r) { float e = max(r - abs(a - b), 0); return min(a, b) - e*e*0.25/r; }, credit to Dave Smith @ media molecule (see the sketch below)
  • Has no effect once abs(a - b) > r 【i.e. once the two fields are no longer in contact】
  • Need to consider the amount of ‘future soft blend’ when culling, as soft blend increases the range at which primitives can influence the final surface (skipping over lots of implementation details!) 【account for the soft blend’s extended influence range when culling】
  • Because our distance fields are good quality, we can use interval arithmetic for additional culling (skipping over lots of implementation details!) 【the influence range is bounded using the distances themselves】
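For reference, here is the soft_min above transcribed into a small runnable C++ test (the slide’s version is shader code; the soft_max companion and the test values are my own additions). Note how it returns plain min(a,b) once the two fields are further apart than r, which is exactly the property the culling relies on.

#include <algorithm>
#include <cmath>
#include <cstdio>

float soft_min(float a, float b, float r) {
    float e = std::max(r - std::fabs(a - b), 0.0f);  // blend zone: only where |a-b| < r
    return std::min(a, b) - e * e * 0.25f / r;       // quadratic easing, credit Dave Smith @ MM
}

// soft_max (used for subtraction) follows by symmetry
float soft_max(float a, float b, float r) { return -soft_min(-a, -b, r); }

int main() {
    // two distance values drifting apart: the soft blend fades back to the hard min
    for (float b = 0.0f; b <= 2.0f; b += 0.5f)
        printf("a=1.0 b=%.1f  soft_min=%.3f  hard_min=%.3f\n",
               b, soft_min(1.0f, b, 1.0f), std::min(1.0f, b));
    return 0;
}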

 
 


 
 

this is a visualisation of the number of edits affecting each voxel; you can see that the soft blend increases the work over a quite large area.

 
 

【Next: how well does the culling work?】

 
 

however, compared to the earlier, less rigorous evaluators, simon’s interval-arithmetic and careful-maxnorm-bounds version was a tour-de-force of maths/engineering/long dependent compute shader chains/compiler bug battling.

 
 


 
 

thanks for saving the evaluator sjb!

 
 


 
 

STATS! for some test models, you can see a range of edits (‘elements’) from 600 – 53000 (the worst is around 120k, but that’s atypical); this evaluates to between 1m and 10m surface voxels (+-1.5 of surface),

 
 


 
 

… the culling rates compared to brute force are well over 99%. we get 10m – 100m voxels evaluated per second on a ps4, from a model with tens of thousands of edits.

 
 


 
 

this is one of those models… (crystal’s dad, 8274 edits, 5.2m voxels)

 
 


 
 

…and this is a visualisation of the number of edits that touch the leaf voxels

 
 


 
 


 
 

moar (head40) (22k edits, 2.4m voxels)

note the colouring is per block, so the voxel res is much higher than the apparent color res in this debug view

 
 


 
 

the meshes output from the blob prototype, as it was called, were generally quite dense – 2m quads at least for a large sphere, and more as the thing got more crinkly. In addition, we wanted to render scenes consisting of, at the very least, a ‘cloud’ of rigidly oriented blob-meshes.

 
 

at this point anton and I started investigating different approaches. anton looked into adaptive variants of marching cubes, such as dual marching cubes, various octree schemes, and so on. let’s call this engine – including the original histopyramids marching cubes – engine 1: the polygon edition.

 
 

From here on: the different mesh generation approaches tried for the engine.

 
 


 
 

here are some notes from the man himself about the investigations into SDF polygonalization

 
 


 
 

【Marching cubes: meshes too dense, mushy edges, slivers, and the output makes for asymmetrical code in a GPU implementation.】

Marching cubes: Well it works but the meshes are dense and the edges are mushy and there are slivers and the output makes for asymmetrical code in a GPU implementation.

 
 


 
 

I don’t know if you can tell but that’s the wireframe!

oh no

 
 


 
 

【Dual contouring: easy on the GPU, but hard to keep sharp edges sharp and smooth things smooth, and it doesn’t align to features for edge flow.】

Dual Contouring: Hey this is easy on GPU. Oh but it’s kind of hard to keep sharp edges sharp and smooth things smooth and it doesn’t really align to features for edge flow either.

 
 

http://www.frankpetterson.com/publications/dualcontour/dualcontour.pdf

‘Dual Contouring of Hermite Data’

Ju, Losasso, Schaefer and Warren

 
 


 
 

note the wiggly edge on the bottom left of the cuboid – really hard to tune the hard/soft heuristics when making animated deathstars.

【During animation, edges can’t reliably stay straight; they wobble.】

 
 


 
 

more complex model….

 
 


 
 

the DC mesh is still quite dense in this version, but at least it preserves edges.

【The mesh is just as dense in this example, but at least edges are preserved.】

 
 

however it shows problems: most obviously, holes in the rotor due to errors in the evaluator we used at this stage (heuristic culling -> makes mistakes on soft blend; pre-simon eval!) – also occasionally what should be a straight edge ends up wobbly because it can’t decide if this should be smooth or straight. VERY tricky to tune in the general case for UGC.

【But it also shows problems: erroneous holes, and straight edges that come out wobbly.】

 
 


 
 

【Fixing self-intersections.】

ALSO! Oh no, there are self intersections! This makes the lighting look glitched – fix em:

 
 

http://www.cs.wustl.edu/~taoju/research/interfree_paper_final.pdf

‘Intersection-free Contouring on An Octree Grid’

Tao Ju, Tushar Udeshi

 
 


 
 

【Fixing the not-necessarily-manifold output.】

Oh no, now it’s not necessarily manifold, fix that.

 
 

http://faculty.cs.tamu.edu/schaefer/research/dualsimp_tvcg.pdf

Manifold Dual Contouring

Scott Schaefer, Tao Ju, Joe Warren

 
 


 
 

【Perhaps some personal bias here – maybe marching cubes wasn’t so bad after all.】

Oh no, it’s self intersecting again. Maybe marching cubes wasn’t so bad after all… and LOD is still hard (many completely impractical papers).

 
 


 
 

the ability to accumulate to an ‘append buffer’ via DS_ORDERED_COUNT *where the results are magically in deterministic order based on wavefront dispatch index* is…

…a magical and wonderful feature of GCN. it turns this…

【(This part wasn’t entirely clear to the annotator.)】

 
 


(non-deterministic vertex/index order on output from a mesher – cache thrashing hell:)

【Non-deterministic VB/IB output order from the mesher.】

 
 


 
 

into this – hilbert ordered dual contouring! so much better on your (vertex) caches.

we use ordered append in a few places. it’s a nice tool to know exists!

【Hilbert-ordered DC output – very useful.】

 
 


 
 

back to the story! the answer to Isla’s question is,

【“Do you like polygons?”】

 
 


 
 

no, I do not like polygons.

【No.】

 
 

I mean, they are actually pretty much the best representation of a hard 2D surface embedded in 3D, especially when you consider all the transistors and brain cells dedicated to them.

【Polygons are the best way to represent a hard 2D surface embedded in 3D.】

 
 

but… they are also very hard to get right automatically (without a human artist in the loop), and make my head hurt. My safe place is voxels and grids and filterable representations.

【But good polygon meshes are very hard to generate automatically.】

 
 

Plus, I have a real thing for noise, grain, ‘texture’ (in the non texture-mapping sense), and I loved the idea of a high resolution volumetric representation being at the heart of dreams. it’s what we are evaluating, after all. why not try rendering it directly? what could possibly go wrong?

【So why not try rendering the volumetric representation directly?】

 
 

so while anton was researching DC/MC/…, I was investigating alternatives.

 
 


 
 

there was something about the artefacts of marching cubes meshes that bugged me.

I really loved the detailed sculpts, where polys were down to a single pixel, and the lower res / adaptive res stuff struggled in some key cases.

so, I started looking into… other techniques.

【The author loved the high-res detail of sculpts where polygons shrink to a single pixel; lower/adaptive resolution struggled in key cases.】

 
 


 
 

【Volumetric billboards – exactly the kind of thing the author was after.】

since the beginning of the project, I had been obsessed by this paper:

http://phildec.users.sourceforge.net/Research/VolumetricBillboards.php

by Philippe Decaudin, Fabrice Neyret.

 
 

it’s the spiritual precursor to gigavoxels, SVOs, and their even more recent work on prefiltered voxels. I became convinced around this time that there was huge visual differentiation to be had, in having a renderer based not on hard surfaces, but on clouds of prefiltered, possibly gassy looking, models. and our SDF based evaluator, interpreting the distances around 0 as opacities, seemed perfect. this paper still makes me excited looking at it. look at the geometric density, the soft anti-aliased look, the prefiltered LODs. it all fitted!

 
 


 
 

the paper contributed a simple LOD filtering scheme based on compositing ‘over’ along each axis in turn, and taking the highest opacity of the three cardinal directions. this is the spiritual precursor to ‘anisotropic’ voxels used in SVO. I love seeing the lineage of ideas in published work. ANYWAY.

【The paper also contributes a simple LOD filtering scheme, which was very appealing here.】

 
 


 
 

the rendering was simple too: you take each rigid object, slice it screen-aligned along exponentially spaced z slices, and composite front to back or back to front. it’s a scatter-based, painters algorithm style volume renderer. they exploit the rasterizer to handle sparse scenes with overlapping objects. they also are pre-filtered and can handle transparent & volumetric effects. this is quite rare – unique? – among published techniques. it’s tantalising. I think a great looking game could be made using this technique.

【Rendering is simple too: slice each rigid object along screen-aligned, exponentially spaced z slices and composite with a painter’s algorithm. The rasterizer handles sparse scenes of overlapping objects, and prefiltering handles transparency and volumetric effects. A great-looking game could be made with this.】

 
 

I have a small contribution – they spend a lot of the paper talking about a complex Geometry shader to clip the slices to the relevant object bounds. I wish it was still 2008 so I could go back in time and tell them you don’t need it! 😉 well, complex GS sucks. so even though I’m 7 years late I’m going to tell you anyway 😉

【The original authors spend much of the paper on a complex geometry shader that clips slices to the object bounds; that was 2008 – it can be done more simply now, as follows.】

 
 


 
 

to slice an object bounded by this cube…

 
 


 
 

pick the object axis closest to the view direction, and consider the 4 edges of the cube along this axis.

 
 


 
 

generate the slices as simple quads with the corners constrained to these 4 edges,

 
 


 
 

some parts of the slice quads will fall outside the box. that’s what the GS was there for! but with this setup, we can use existing HW:

 
 


 
 

just enable two user clipping planes for the front and back of the object. the hardware clipping unit does all the hard work for you.

【The hardware clipping unit now does the work.】
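Here is a hedged sketch of that setup (my reconstruction, simplified by assuming the view direction coincides with the chosen object axis): each slice is a quad whose corners slide along the 4 box edges that run along the dominant axis; the parts of each quad outside the box are removed by the two user clip planes (object front/back), so no geometry shader is needed.

#include <cstdio>

struct V3 { float x, y, z; };
static V3 lerp(V3 a, V3 b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

int main() {
    // unit cube; assume z is the axis most aligned with the view direction:
    // the 4 edges along z run from the back face to the front face
    V3 e0[4] = {{0,0,0},{1,0,0},{1,1,0},{0,1,0}};   // edge start points (back face)
    V3 e1[4] = {{0,0,1},{1,0,1},{1,1,1},{0,1,1}};   // edge end points (front face)

    const int numSlices = 4;
    for (int s = 0; s < numSlices; ++s) {
        float t = (s + 0.5f) / numSlices;           // slice position along the axis
        printf("slice %d:", s);
        for (int c = 0; c < 4; ++c) {               // quad corner constrained to edge c
            V3 p = lerp(e0[c], e1[c], t);
            printf(" (%.2f,%.2f,%.2f)", p.x, p.y, p.z);
        }
        printf("\n");
        // in the real renderer these quads are drawn with two user clip planes
        // enabled (object front and back), letting the HW clipper do the trimming
    }
    return 0;
}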

 
 


 
 

ANYWAY. this idea of volumetric billboards stuck with me. and I still love it.

 
 

fast forward a few years, and the french were once again rocking it.

http://maverick.inria.fr/Members/Cyril.Crassin/

Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre (note: neyret is the secondary author on VBs) had put out gigavoxels.

 
 

this is the next precursor to SVOs. seen through the lens of the earlier VB work, I loved that it kept that pre-filtered look, the geometric density from having a densely sampled field. it layered on top a hierarchical, sparse representation – matching very well the structure of our evaluator. hooray! however it dispensed with the large number of overlapping objects, which makes it less immediately applicable to Dreams/games. But I did implement a quick version of gigavoxels, here are some shots.

【The original targets a single large field rather than a scene of many overlapping objects, so it can’t be applied to our game directly; hence a quick gigavoxels implementation to try it out.】

 
 


 
 

it’s impossible to resist domain repetition when you’re just raytracing a field…

【(The actual game doesn’t use this kind of domain repetition.)】

 
 


 
 

add some lighting as per my earlier siggraph advances talk (2006 was it?), the sort of thing that has since been massively refined e.g. in the shadertoy community (sampling mip mapped/blurred copies of the distance field – a natural operation in gigavoxel land, and effectively cone tracing). I think it has a lovely alabaster look.

however it focussed on a single large field that (eye) rays were traced through, and I needed the kind of scene complexity of the earlier VB paper – a cloud of rigid voxel models.

【Lighting here comes from sampling blurred/mip-mapped copies of the field – effectively cone tracing – as in the earlier talk.】

 
 

 
 


 
 

the idea is to take the brick tree from gigavoxels, but instead of marching rays from the eye, directly choose a ‘cut’ through the tree of bricks based on view distance (to get nice LOD), then rasterise each brick individually. The pixel shader then only has to trace rays from the edge of the bricks to any surface.

【Choose the LOD cut directly from view distance, rasterise each brick individually, and start the ray march from the brick’s boundary.】

 
 

As an added advantage, the bricks are stored in an atlas, but there is no virtual-texturing style indirection needed in the inner loop (as there is in gigavoxels), because each rasterised cube explicitly bounds each individual brick, so we know which bit of the atlas to fetch from at VS level.

【Atlas handling: each cube is explicitly bound to its own brick, so we already know where in the atlas to fetch from.】
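Below is a hedged sketch of the per-brick march the pixel shader performs, written as CPU C++ for one brick so it runs standalone; the brick contents, nearest-neighbour sampling and step counts are stand-ins for the real trilinear atlas fetches, and the names are mine.

#include <cmath>
#include <cstdio>

static const int N = 8;
float brick[N][N][N];                       // one 8x8x8 block of signed distances

float sampleBrick(float x, float y, float z) {   // nearest-sample stand-in for a trilinear atlas fetch
    int i = (int)std::fmin(std::fmax(x, 0.f), N - 1.f);
    int j = (int)std::fmin(std::fmax(y, 0.f), N - 1.f);
    int k = (int)std::fmin(std::fmax(z, 0.f), N - 1.f);
    return brick[k][j][i];
}

int main() {
    // fill the brick with the SDF of a sphere centred in the block
    for (int k = 0; k < N; ++k) for (int j = 0; j < N; ++j) for (int i = 0; i < N; ++i)
        brick[k][j][i] = std::sqrt((i-3.5f)*(i-3.5f) + (j-3.5f)*(j-3.5f) + (k-3.5f)*(k-3.5f)) - 2.5f;

    // ray entering the rasterised cube at one face, heading through it (brick-local units)
    float px = 0.f, py = 3.5f, pz = 3.5f, dx = 1.f, dy = 0.f, dz = 0.f;
    for (int step = 0; step < 32; ++step) {
        float d = sampleBrick(px, py, pz);
        if (d < 0.f) { printf("hit surface at x=%.2f (would write oDepth here)\n", px); return 0; }
        float adv = std::fmax(d, 0.25f);     // sphere-trace style step, clamped to a minimum
        px += dx * adv; py += dy * adv; pz += dz * adv;
        if (px >= N) break;                  // left the brick: this pixel would be discarded
    }
    printf("no hit in this brick\n");
    return 0;
}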

 
 


 
 

here you can see the individual cubes that the VS/PS is shading. each represents an 8x8x8 little block of volume data, gigavoxels style. again: rather than tracing eye rays for the whole screen, we do a hybrid scatter/gather: the rasteriser scatters pixels in roughly the right places (note also that the LOD has been adapted so that the cubes are of constant screen space size, ie a lower LOD cut of the brick tree is chosen in the distance), then the pixel shader walks from the surface of the cubes to the SDF surface.

【Each visible cube holds an 8x8x8 block of volume data, gigavoxels style. Rather than tracing eye rays for the whole screen, it’s a hybrid scatter/gather: the rasterizer scatters LOD-adapted cubes of roughly constant screen-space size, then the pixel shader walks from the cube surface to the SDF surface.】

 
 

also, I could move the vertices of the cubes around using traditional vertex skinning techniques, to get animation and deformation… oh my god it’s going to be amazing!

【Traditional vertex skinning can move the cube vertices for animation and deformation – it works great!】

 
 


 
 

(sorry for the bad screenshot – I suck at archiving my work)

It sort of amounts to POM/tiny raymarch inside each 8x8x8 cube, to find the local surface, with oDepth to set the z-buffer.

it has the virtue of being very simple to implement.

【A tiny raymarch inside each cube, writing depth out to the z-buffer.】

 
 


 
 

Because of that simplicity, this technique actually ended up being the main engine a lot of the artists used for a couple of years; you’ll see a couple more shots later. So while the ‘bricks’ engine, as it was known, went into heavy use, I really wanted more.

【Because it was simple and usable, this technique became the workhorse for a couple of years.】

 
 


 
 

I wasn’t happy! why not? I also wanted to keep that pre-filtered look from Volumetric Billboards. I felt that if we pursued just hard z-buffered surfaces, we might as well just do polys, or at least, the means didn’t lead to a visual result that was different enough. so I started a long journey into OIT.

【I still wasn’t happy: I wanted to keep the prefiltered Volumetric Billboards look. If all we wanted was hard z-buffered surfaces, we might as well just use polygons.】

 
 


 
 

I immediately found that slicing every cube into 8-16 tiny slices, ie pure ‘VB’, was going to burn way too much fill rate.

so I tried a hybrid: when the PS marched the 8x8x8 bricks, I had it output a list of fuzzy ‘partial alpha’ voxels, as well as outputting z when it hit full opacity. then all I had to do was composite the gigantic number (10s of millions) of accumulated fuzzy samples onto the screen… in depth sorted order. Hmm

【Slicing every cube into 8-16 tiny slices (pure ‘VB’) burns far too much fill rate. So, a hybrid: while marching the 8x8x8 bricks, the PS outputs a list of fuzzy ‘partial alpha’ voxels, and writes z when it hits full opacity. Then ‘all’ that remains is compositing the accumulated fuzzy samples in depth-sorted order.】

 
 


 
 

so it was ‘just’ a matter of figuring out how to composite all the non-solid voxels. I had various ground truth images, and I was particularly excited about objects overlapping each other with really creamy falloff

  • e.g. between the blue arch and the grey arch that’s just the two overlapping and the ‘fuzz’ around them smoothly cross-intersecting.

【The problem: how to composite all the non-solid voxels. Overlapping objects should fall off smoothly into one another, as shown here.】

 
 


 
 

and pre filtering is great for good LOD! this visualizes the pre-filtered mips of dad’s head, where I’ve added a random beard to him as actual geometry in the SDF.

【Prefiltering is great for LOD.】

 
 


 
 

and here’s what it looks like rendered.

【The rendered result.】

 
 


 
 

but getting from the too-slow ground truth to something consistently fast-enough was very, very hard.

【Making something that is fundamentally slow run consistently fast is very, very hard.】

prefiltering is beautiful, but it generates a lot of fuzz, everywhere. the sheer number of non-opaque pixels was getting high – easily 32x 1080p

【Prefiltering generates a lot of fuzz, everywhere.】

I spent over a year trying everything – per pixel atomic bubble sort, front-k approximations, depth peeling..

【Over a year was spent trying things like per-pixel atomic bubble sorts, front-k approximations and depth peeling.】

one thing I didn’t try, because I didn’t think of it and it hadn’t been published yet, was McGuire style approximate commutative OIT. however it won’t work in its vanilla form

  • it turns out the particular case of a very ‘tight’ fuzz around objects is very unforgiving of artefacts
  • for example, if adjacent pixels in space or time made different approximations (eg discarded or merged different layers), you get really objectionable visible artefacts.

【One thing not tried (it hadn’t been published yet) was McGuire-style approximate commutative OIT, but it probably wouldn’t work here either.】

 
 


 
 

it’s even worse because the depth complexity changes drastically, over 2 orders of magnitude, between pixels that hit a hard back and ‘edge on’ pixels that spend literally hundreds of voxels skating through fuzz. this is morally the same problem that a lot of sphere tracing approaches have, where edge pixels are waaaay harder than surface pixels.

【It gets even worse where depth complexity varies hugely, e.g. edge-on pixels that skate through hundreds of voxels of fuzz.】

 
 

I did have some interesting CS load balancing experiments, based on wavefronts peeling off 8 layers at a time, and re-circulating pixels that needed extra passes – a kind of compute shader depth peel, but with load balancing as its goal.

 
 


 
 

【With enough layers in the sort/merge, it looks fine…】

here’s a simpler case. fine when your sort/merge algo has enough layers. but if we limit it to fewer blended voxels than necessary…

 
 


 
 

【…but if we limit blending to fewer voxels than necessary, artefacts appear.】

I couldn’t avoid ugly artefacts.

 
 

in the end, the ‘hard’ no-fuzz/no-oit shader was what went over the fence to the designers, who proceeded to work on dreams with a ‘hard’ look while I flailed in OIT land.

【In the end, the ‘hard’ no-fuzz/no-OIT shader is what went to the designers, while the OIT work continued.】

 
 


 
 

see what I mean about failure?

and this is over the period of about 2 years, at this point

【We’re about two years in at this point.】

 
 


 
 

I think this is a really cool technique; it’s another one we discarded, but I think it has some legs for some other project.

I call it the refinement renderer.

【The refinement renderer: abandoned in the end, but perhaps useful for other projects.】

 
 


 
 

there are very few screenshots of this as it didn’t live long, but it’s interestingly odd. have this sort of image in your mind for the next few slides. note the lovely pre-filtered AA, the soft direct lighting (shadows but no shadow maps!). but this one is pure compute, no rasterised mini cubes.

the idea is to go back to the gigavoxels approach of tracing eye rays through fuzz directly… but find a way to make it work for scenes made out of a large number of independently moving objects. I think if you squint a bit this technique shares some elements in common with what Daniel Wright is going to present in the context of shadows; however since this focuses on primary-ray rendering, I’m not going to steal any of his thunder! phew.

【The idea: return to the gigavoxels approach of tracing eye rays through fuzz, but make it work for many independently moving objects.】

 
 


 
 

a bit of terminology – we call post-projection voxels – that is, little pieces of view frustum – ‘froxels’, as opposed to square voxels. The term originated at the sony WWS ATG group, I believe.

if you look at a ray marcher like many of the ones on shadertoy, like iq’s famous cloud renderer, you can think of the ray steps as stepping through ‘froxels’.

【Definition of froxels: the small cells you get by subdividing the view frustum.】

 
 


 
 

https://www.shadertoy.com/view/XslGRr – clouds by iq

typically you want to step the ray exponentially so that you spend less time sampling in the distance.

 
 

Intuitively you want to have ‘as square as possible’ voxels, that is, your step size should be proportional to the inverse of the projected side length, which is 1/(1/z), or z. so you can integrate and you get slices spaced as t=exp(A*i) for some constant A (slice index i), or alternatively write it iteratively as t+=K*t at each step for some constant K.

【Intuitively you want roughly cube-shaped froxels, which means the step size should grow proportionally with z – i.e. exponential slicing.】
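A tiny sketch of the two equivalent forms just described – the closed form t = exp(A*i) and the iterative t += K*t – with arbitrary constants, just to show they produce the same spacing:

#include <cmath>
#include <cstdio>

int main() {
    const float A = 0.1f;                 // slice spacing constant
    const float K = std::exp(A) - 1.0f;   // iterative form: t += K*t gives the same spacing
    const float t0 = 1.0f;                // start depth (can't start at 0, see the next slide)

    float t = t0;
    for (int i = 0; i < 6; ++i) {
        float closed = t0 * std::exp(A * i);
        printf("slice %d: iterative t=%.4f  closed form=%.4f\n", i, t, closed);
        t += K * t;                       // step size proportional to t, i.e. to z
    }
    return 0;
}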

 
 


 
 

the only problem with this is that near the eye, as t goes to 0, you get infinitely small froxel slices. oh dear. if you look at iq’s cloud example, you see this line:

【The problem: the slices become infinitely small as you approach the eye. iq’s workaround is below:】

 
 


 
 

t += max(0.1,0.02*t);

which is basically saying, let’s have even slicing up close then switch to exponential after a while.

I’ve seen this empirically used a few times. here’s an interesting (?) insight. what would real life do? real cameras don’t have pinhole apertures.

【The fix is simply to clamp to a minimum step size.】

 
 


 
 

so, consider a thin lens DOF model for a second. what if you tuned your froxel sampling rate not just for projected pixel size, but for projected bokeh size? the projected bokeh radius is proportional to (z-f)/z, so we want A(z-f)/z + 1/z, where A is the size in pixels of your bokeh at infinity. (the +1/z is the size of a single ‘sharp’ pixel, i.e. the footprint of your AA filter)

【Consider a thin-lens camera instead of a pinhole, i.e. account for bokeh blur.】

 
 

if you put this together, you can actually compute two exponential slicing rates – one for in front of the focal plane, and one for behind.

at the focal plane, it’s the same step rate you would have used before, but in the distance it’s a little sparser, and near to the camera it’s WAY faster. extra amusingly, if you work through the maths, if you set A to be 1 pixel, then the constant in the ‘foreground’ exponential goes to 0 and it turns out that linear slicing is exactly what you want. so the empirical ‘even step size’ that iq uses is exactly justified if you had a thin lens camera model with an aperture such that bokeh-at-infinity is 1 pixel across on top of your AA. neat! for a wider aperture, you can step faster than linear.

【Putting the two together gives two exponential slicing rates, one in front of the focal plane and one behind: densest around the focal plane, sparser towards the camera and into the distance.】
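The following is a hedged sketch of one way to read that idea (my interpretation, not the exact derivation): make the step at depth z proportional to the projected footprint there – bokeh plus one ‘sharp’ AA pixel – so slices bunch up near the focal plane and thin out towards the camera and the distance. The focal plane f, bokeh-at-infinity A and base step are arbitrary illustration constants.

#include <cmath>
#include <cstdio>

int main() {
    const float f = 10.0f;    // focal plane depth
    const float A = 4.0f;     // bokeh size at infinity, in pixels
    const float base = 0.05f; // step per pixel of footprint (tuning constant)

    float t = 0.5f;           // current depth along the eye ray
    for (int i = 0; i < 20 && t < 100.0f; ++i) {
        // projected footprint in pixels at depth t: bokeh term + one AA pixel
        float footprintPixels = A * std::fabs(t - f) / t + 1.0f / t;
        // convert to a world-space step: multiply by the world size of a pixel at depth t (~t)
        float dt = base * footprintPixels * t;    // = base * (A*|t-f| + 1)
        printf("i=%2d  t=%7.3f  step=%.3f\n", i, t, dt);
        t += dt;
    }
    return 0;
}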

 
 


 
 

ANYWAY.

how does this relate to rendering lots of objects?

the idea I had was to borrow from the way the evaluator works. you start by dividing your frustum into coarse froxels. I chose 64th res, that is about 32×16 in x and y, with 32-64 in z depending on the far z and the DOF aperture. (blurrier dof = fewer slices needed, as in previous slides).

then you do a simple frustum vs object intersection test, and build a list per froxel of which objects touch it.

{pic}

 
 

【Now, how to render many objects: first divide the frustum into coarse froxels, then do a frustum-vs-object test to build a per-froxel list of which objects touch it.】

 
 


 
 

then, you recursively subdivide your froxels!

for each froxel, in a compute shader you split them into 8 children. as soon as your froxel size matches the size of gigavoxel prefiltered voxels, you sample the sparse octree of the object (instead of just using OBBs) to further cull your lists.

【Then split each froxel into 8 children (octree-style); sampling the object’s sparse octree further shortens the per-froxel lists.】

 
 


 
 

as you get finer and finer, the lists get shorter as the object’s shape is more accurately represented. it’s exactly like the evaluator, except this time we have whole objects stored as gigavoxel trees of bricks (instead of platonic SDF elements in the evaluator), we don’t support soft blend, and our domain is over froxels, not voxels.

【The finer the subdivision, the more accurately object shapes are represented; here whole objects are stored as gigavoxel brick trees, and the domain is froxels rather than voxels.】

 
 


 
 

for the first few steps, I split every froxel in parallel, using dense 3d volume textures to store pointers into flat tables of per-froxel lists. however at the step that refines from 1/16th res to 1/8th res (128x64x128 -> 256x128x256) the dense pointer roots get too expensive, so I switch to a 2d representation, where every pixel has a single list of objects, sorted by z.

the nice thing is that everything is already sorted coming out of the dense version, so this is really just gluing together a bunch of small lists into one long list per screen pixel.

each refine step is still conceptually splitting froxels into 8, but each pixel is processed by one thread, serially, front to back.

that also means you can truncate the list when you get to solid – perfect, hierarchical occlusion culling!

SHOW ME THE PICTURES! OK

the results were pretty

【The first steps use dense 3D volume textures for the per-froxel list pointers, but that becomes too expensive, so it switches to a 2D representation: one object list per pixel, sorted by z, which also gives hierarchical occlusion culling for free.】

 
 

 
 


 
 

and the pre-filtered look is really special.

Look how yummy the overlap of the meshes is! Really soft, and there’s no ‘post’ AA there. It’s all prefiltered.

【Overlapping regions look great this way.】

 
 

so I did a bit of work on lighting; a kind of 3d extension of my siggraph 2006 advances talk.

【Next, lighting: a kind of 3D extension of the SIGGRAPH 2006 advances talk.】

 
 


 
 

imagine this setup. this is basically going to be like LPV with a voxelized scene, except we use froxels instead of voxels, and we propagate one light at a time in such a way that we can smear light from one side of the frustum to another in a single frame, with nice quality soft shadows. ‘LPV for direct lights, with good shadows’, if you will.

【The scene is essentially voxelized, just with froxels instead of voxels, so the basic machinery is similar to LPV.】

 
 


 
 

imagine a single channel dense froxel grid at low resolution, I think I used 256x128x256 with 8 bits per froxel. We will have one of those for the ‘density’ of the scene – defined everywhere inside the camera frustum.

– As a side effect of the refinement process I write that ‘density’ volume out, more or less for free. Now we are also going to have one extra volume texture for each ‘hero’ light. (I did tests with 4 lights).

STOP PRESS – as far as I can tell from the brilliant morning session by the frostbite guys, they have a better idea than the technique I present on the next few slides. They start from the same place – a dense froxel map of ‘density’, as above – but they resample it for each light into a per-light 32^3 volume, in light-space. then they can smear density directly in light space. This is better than what I do over the next few slides, I think. See their talk for more!

【A low-resolution density volume covering just the camera frustum, plus one extra volume per ‘hero’ light.】

 
 


 
 

To wipe the light around, you set the single froxel where the light is to ‘1’ and kick a compute shader in 4-froxel-thick ‘shells’ radiating out from that central light froxel

(with a sync between each shell). Each thread is a froxel in the shell, and reads (up to) 4 trilinear taps from the density volume, effectively a short raycast towards the light.

Each shell reads from the last shell, so it’s sort of a ‘wipe’ through the whole frustum.

【Propagate outward from the light’s froxel, shell by shell, until every froxel in the frustum has been reached.】
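Here is a hedged 2D CPU sketch of that ‘wipe’ (my reconstruction, one light): cells are processed in shells of increasing distance from the light, and each cell reads the already-computed transmittance a short step towards the light, then attenuates it by its local density. The real version is a 3D compute shader with 4-froxel shells, trilinear taps and jittered rays; grid size and densities here are made up.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const int W = 16, H = 16, LX = 8, LY = 8;        // grid size and light cell
    std::vector<float> density(W * H, 0.02f);        // mostly thin medium...
    for (int y = 4; y < 7; ++y) density[y * W + 11] = 0.9f;   // ...plus a dense blocker

    std::vector<float> light(W * H, 0.f);
    light[LY * W + LX] = 1.f;                         // seed the light's own froxel

    for (int shell = 1; shell < W; ++shell)           // expanding shells around the light
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                if (std::max(std::abs(x - LX), std::abs(y - LY)) != shell) continue;
                // short 'raycast' towards the light: read the previous shell's result
                float dx = (float)(LX - x), dy = (float)(LY - y);
                float len = std::sqrt(dx * dx + dy * dy);
                int sx = (int)std::lround(x + dx / len), sy = (int)std::lround(y + dy / len);
                light[y * W + x] = light[sy * W + sx] * std::exp(-density[y * W + x]);
            }

    for (int y = 0; y < H; ++y) {                     // crude ASCII dump: '#' = mostly lit
        for (int x = 0; x < W; ++x) putchar(light[y * W + x] > 0.5f ? '#' : '.');
        putchar('\n');
    }
    return 0;
}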

 
 


 
 

here come the shells! each one reads from the last. yes, there are stalls. no, they’re not too bad as you can do 4 lights and pipeline it all.

 
 


 
 


 
 

The repeated feedback causes a pleasant blur in the propagated shadows.

it’s like LPV propagation, except that it’s for a single light so you have no direction confusion, and you can wipe from one side of the screen to the other within a frame, since you process the froxels strictly in order radiating out from the light.

You can jitter the short rays to simulate area lights. You do 4 lights at once, to overlap the syncs, and you do it on an async pipe to mop up space on your compute units so the syncs don’t actually hurt that much. (offscreen lights are very painful to do well and the resolution is brutally low). However the results were pretty, and the ‘lighting’ became simple coherent volume texture lookups.

PICS PLZ:

【The repeated feedback gives a pleasant blur in the propagated shadows; it’s like LPV, but per light, so there’s no directional confusion.】

【(LPV = Light Propagation Volumes, worth looking up.)】

 
 


 
 

Look ma! no shadowmaps!

it would be super cool for participating media stuff, since we also have the brightness of every light conveniently stored at every froxel in the scene. I didn’t implement it though….

【The result: soft shadows with no shadow maps.】

 
 


 
 

Ambient occlusion was done by simply generating mip-maps of the density volume and sampling it at positions offset from the surface by the normal, ie a dumb, very wide cone trace.

【Ambient occlusion comes almost for free this way, and looks good.】
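A hedged sketch of that AO idea (not the shipped shader): march a few samples along the surface normal, reading ever coarser mips of the density volume so the lookup footprint roughly matches the widening cone, and turn the accumulated density into an occlusion factor. densityMip() is a stand-in for the real volume fetch.

#include <cmath>
#include <cstdio>

// stand-in for sampling mip 'mip' of the density volume at a world position
float densityMip(float x, float y, float z, int mip) {
    // pretend there is a dense slab above y = 1.5 (so upward normals get occluded)
    (void)x; (void)z; (void)mip;
    return (y > 1.5f) ? 0.8f : 0.0f;
}

float ambientOcclusion(const float p[3], const float n[3]) {
    float occlusion = 0.0f, radius = 0.25f;
    for (int mip = 0; mip < 5; ++mip, radius *= 2.0f) {   // wider cone -> coarser mip
        float s[3] = { p[0] + n[0]*radius, p[1] + n[1]*radius, p[2] + n[2]*radius };
        occlusion += densityMip(s[0], s[1], s[2], mip) / (1 << mip);  // fade distant samples
    }
    return std::exp(-2.0f * occlusion);                   // 1 = open sky, 0 = fully occluded
}

int main() {
    float p[3] = {0, 0, 0}, up[3] = {0, 1, 0}, side[3] = {1, 0, 0};
    printf("AO facing up:   %.2f\n", ambientOcclusion(p, up));
    printf("AO facing side: %.2f\n", ambientOcclusion(p, side));
    return 0;
}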

 
 

The geometric detail and antialiasing was nice:

【The geometric detail and antialiasing are also good.】

 
 


 
 

You could also get really nice subsurface effects by cone tracing the light volumes a little and turning down the N.L term:

【Subsurface-like effects also look good.】

 
 


 
 


 
 

However – the performance was about 4x lower than what I needed for PS4 (I forget the timings, but it was running at 30 for the scenes above – but only just! For more complex scenes, it just died). The lighting technique and the refinement engine are separate ideas, but they both had too many limitations and performance problems that I didn’t have time to fix.

【After all of the above, the performance simply wasn’t there.】

 
 


 
 

(ie I still think this technique has legs, but I couldn’t make it work for this particular game)

in particular, since edge pixels could still get unboundedly ‘deep’, the refinement lists were quite varied in length, and I needed to jump through quite a few hoops to keep the GPU well load balanced. I also should have deferred lighting a bit more – I lit at every leaf voxel, which was slow. however everything I tried to reduce that (merging etc) led to visible artefacts. what I didn’t try was anything stochastic. I had yet to fall in love with ‘stochastic all the things’…. definitely an avenue to pursue.

We were also struggling with the memory for all the gigavoxel bricks.

【Not used in the shipped game, but with clear directions for improvement: per-pixel list lengths vary a lot, so GPU load balancing matters, and the memory for the bricks is also a problem.】

 
 


 
 

The nail in the coffin was actually to do with art direction.

【The final nail in the coffin was actually art direction.】

 
 

directly rendering the distance field sculptures was leaving very little to the imagination. So it was very hard to create ‘good looking’ sculptures; lots of designers were creating content that basically looked like untextured unreal-engine, or ‘crap’ versions of what traditional poly engines would give you, but slower. It was quite a depressing time, because as you can see it’s a promising tech, but it was a tad too slow and not right for this project.

TL;DR:

this is the start of 2014. we’re 3 years in, and the engine prototypes have all been rejected, and the art director (rightly) doesn’t think the look of any of them suits the game.

argh.

SO……..

【Hitting the desired look at acceptable performance was a real problem. Three years in, and every engine prototype had been rejected by the art director.】

 
 


 
 

there was a real growing uneasiness in the studio. I had been working on OIT – refinement and sorting etc – for a LONG time; in the meantime, assets were being made using the ‘hard’ variant of the bricks engine, that simply traced each 8x8x8 rasterised brick for the 0 crossing and output raw pixels which were forward lit. at its best, it produced some lovely looking results (above) – but that was more the art than the engine! It also looked rather like an ‘untextured poly engine’ – why were we paying all this runtime cost (memory & time) to render bricks if they just gave us a poly look?

【Growing unease in the studio: at its best the art looked lovely, but the engine wasn’t earning its cost.】

 
 


 
 

also, there was a growing disparity between what the art department – especially art director kareem and artist jon – were producing as reference/concept work, and what the engine was actually rendering. it was so painterly!

【The gap between the art director’s vision and our reality kept growing.】

there was one particular showdown with the art director, my great friend kareem, where he kept pointing at an actual oil painting and going ‘I want it to look like this’ and I’d say ‘everyone knows concept art looks like that but the game engine is a re-interpretation of that’ and kareem was like ‘no, literally that’. it took HOURS for the penny to drop, for me to overcome my prejudice.

【He literally wanted an oil-painting look; it took a long time for me to accept that.】

 
 


 
 

So after talking to the art director and hitting rock bottom in January 2014, he convinced me to go with a splat based engine, intentionally made to look like 3d paint strokes. I have a strong dislike of ‘painterly post fx’ especially 2d ones, so I had resisted this direction for a looooooooooong time.

(btw this is building on the evaluator as the only thing that has survived all this upheaval)

【So in January 2014 we started on a splat-based engine, intentionally made to look like 3D paint strokes – something of a compromise.】

 
 


 
 

I had to admit that for our particular application of UGC, it was *brutal* that you saw your exact sculpture crisply rendered; it was really hard to texture & model it using just CSG shapes. (we could have changed the modelling primitives to include texturing or more noise type setups, but the sculpting UI was so loved that it was immovable. The renderer on the other hand was pretty but too slow, so it got the axe instead).

【Crisply rendering the exact CSG sculpture was brutal for UGC; the beloved sculpting UI stayed, and the renderer got the axe instead.】

 
 

So I went back to the output of the evaluator, poked simon a bit, and instead of using the gigavoxel style bricks, I got point clouds, and had a look at what I could do.

There’s a general lesson in here too – that tech direction and art direction work best when they are both considered, both given space to explore possibilities; but also able to give different perspectives on the right (or wrong) path to take.

【Build on the evaluator’s output. Also a lesson: tech direction and art direction work best when both are considered and both get space to explore, each offering a different perspective on the right (or wrong) path.】

 
 


 
 

So! now the plan is: generate a nice dense point cloud on the surface of our CSG sculpts.

EVERYTHING is going to be a point cloud. the SDF becomes an intermediate representation; we use it to spawn the points at evaluation time (and also for collision. But that’s another talk)

【The plan: generate a dense point cloud on the surface of each sculpt; the SDF becomes an intermediate representation.】

【(SDF – signed distance field – is another technique worth reading up on.)】

 
 

we started from the output of the existing evaluator, which if you remember was hierarchically refining lists of primitives to get close to voxels on the surface of the SDF. as it happens, the last refinement pass is dealing in 4x4x4 blocks of SDF to match GCN wavefronts of 64 threads.

【Hierarchical refinement; the final pass works on 4x4x4 blocks to match 64-thread GCN wavefronts.】

 
 


 
 

We add one point to the cloud per leaf voxel (remember, that’s about a 900^3 domain, so for example, a sphere model will become a point cloud with diameter 900 and one point per integer lattice cell that intersects the sphere surface)

【Add one point per surface leaf voxel.】

 
 

actually we are using a dual grid IIRC so that we look at a 2x2x2 neighbourhood of SDF values and only add points where there is a zero crossing.

So now we have a nice fairly even, dense point cloud. Since the bounding voxel grid is up to around 900^3 voxels -> around 2 million surface voxels -> around 2 million points.

【The result is a dense, fairly even point cloud on the sculpt’s surface – around 2 million points.】

 
 


 
 

The point cloud is sorted into Hilbert order (actually, 4^3 bricks of voxels are in Hilbert order and then the surface voxels inside those bricks are in raster order, but I digress) and cut into clusters of approximately 256 points (occasionally there is a jump in the hilbert brick order so we support partially filled clusters, to keep their bounding boxes tight).

【The point cloud is sorted into Hilbert order, then cut into clusters of roughly 256 points.】

 
 


 
 

Each cluster is tightly bounded in space, and we store for each a bounding box and normal bounds. then each point within the cluster is just one dword big, storing bitpacked pos, normal, roughness, and colour in a DXT1 texture. All of which is to say, we now have a point cloud cut into lumps of 256 points with a kind of VQ compression per point. We also compute completely independent cluster sets for each LOD – that is, we generate point clouds and their clusters for a ‘mip pyramid’ going from 900 voxels across, to 450, to 225, etc.

【Each cluster stores a bounding box and normal bounds; each point is a single bitpacked dword, with colour in a DXT1 texture. Independent cluster sets are built per LOD, giving both compression and LOD.】
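Purely to show the flavour of ‘one dword per point’, here is an invented bitpacking sketch: the talk doesn’t give the real layout, so the 8+8+8 bits of cluster-local position, 6 bits of quantised normal and 2 bits of roughness below are a hypothetical split of my own.

#include <cstdint>
#include <cstdio>

uint32_t packPoint(float px, float py, float pz,   // position in [0,1) within the cluster bbox
                   int normalIndex,                // 0..63, index into a small normal table
                   int roughness) {                // 0..3
    uint32_t qx = (uint32_t)(px * 255.0f);
    uint32_t qy = (uint32_t)(py * 255.0f);
    uint32_t qz = (uint32_t)(pz * 255.0f);
    return qx | (qy << 8) | (qz << 16) | ((uint32_t)(normalIndex & 63) << 24) |
           ((uint32_t)(roughness & 3) << 30);
}

void unpackPoint(uint32_t p, float& px, float& py, float& pz, int& n, int& r) {
    px = (p & 255) / 255.0f;
    py = ((p >> 8) & 255) / 255.0f;
    pz = ((p >> 16) & 255) / 255.0f;
    n = (p >> 24) & 63;
    r = (p >> 30) & 3;
}

int main() {
    uint32_t packed = packPoint(0.25f, 0.5f, 0.75f, 17, 2);
    float x, y, z; int n, r;
    unpackPoint(packed, x, y, z, n, r);
    printf("packed=0x%08x -> pos(%.2f,%.2f,%.2f) normal %d roughness %d\n", packed, x, y, z, n, r);
    return 0;
}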

 
 


 
 

I can’t find many good screenshots but here’s an example of the density, turned down by a factor of 2x to see what’s going on.

【An example of the point density.】

 
 

my initial tests here were all PS/VS using the PS4 equivalent of glPoint. it wasn’t fast, but it showed the potential. I was using russian roulette to do ‘perfect’ stochastic LOD, targeting a 1 splat to 1 screen pixel rate, or just under.

【The first attempts weren’t fast enough; russian roulette is the stochastic point-discard mechanism used for LOD.】
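A hedged sketch of russian-roulette LOD for one cluster of points: to hit a target splat rate, each point survives with some probability ‘keep’, decided by a stable per-point hash so the selection doesn’t flicker from frame to frame. The hash and the 25% lower bound (256 -> 64 points per cluster before dropping a LOD, as described later) mirror the talk; the details are my own.

#include <cstdint>
#include <cstdio>

// cheap deterministic per-point hash -> [0,1)
float hash01(uint32_t i) {
    i ^= i >> 16; i *= 0x7feb352dU; i ^= i >> 15; i *= 0x846ca68bU; i ^= i >> 16;
    return (i & 0xffffff) / 16777216.0f;
}

int main() {
    const int clusterSize = 256;
    // 'keep' would come from the ratio of desired on-screen splats to stored points;
    // it is clamped to [0.25, 1]: below 25% we switch to the next (coarser) LOD instead
    for (float keep = 1.0f; keep >= 0.25f; keep *= 0.5f) {
        int survivors = 0;
        for (int i = 0; i < clusterSize; ++i)
            if (hash01((uint32_t)i) < keep) ++survivors;   // russian roulette test
        printf("keep=%.2f -> %d of %d points drawn\n", keep, survivors, clusterSize);
    }
    return 0;
}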

 
 

At this point we embraced TAA *bigtime* and went with ‘stochastic all the things, all the time!’. Our current frame, before TAA, is essentially verging on white noise. It’s terrifying. But I digress!

【A key point: we leaned hard on TAA; before TAA the frame is close to white noise, which would otherwise be unacceptable.】

 
 


 
 

for rendering, we arranged the clusters for each model into a BVH. we also computed a separate point cloud, clustering and BVH for each mipmap (LOD) of the filtered SDF. to smooth the LOD transitions, we use russian roulette to adapt the number of points in each cluster from 256 smoothly down to 25%, i.e. 256 down to 64 points per cluster, then drop to the next LOD.

simon wrote some amazingly nicely balanced CS splatters that hierarchically cull and refine the precomputed clusters of points, compute bounds on the russian roulette rates, and then pack reduced cluster sets into groups of ~64 splats.

【LOD again: points are thinned down to 25% of a cluster before dropping a level.】

【(SDF and BVH: terms worth a note of their own.)】

 
 

so in this screenshot the color cycling you can see is visualizing the steps through the different degrees of decimation, from <25%, <50%, <75%, then switching to a completely different power-of-2 point cloud;

【Visualizing the different decimation levels.】

 
 


 
 

What you see is the ‘tight’ end of our spectrum. i.e. the point clouds are dense enough that you see sub pixel splats everywhere. The artist can also ‘turn down’ the density of points, at which point each point becomes a ‘seed’ for a traditional 2d textured quad splat. Giving you this sort of thing:

【This is the ‘tight’ end of the spectrum, i.e. sub-pixel splats everywhere.】

 
 


 
 


 
 

We use pure stochastic transparency, that is, we just randomly discard pixels based on the alpha of the splat, and let TAA sort it out. It works great in static scenes.

However the traditional ‘bounding box in color space’ used to find valid history pixels starts breaking down horribly with stochastic alpha, and we have yet to fully solve that.

So we are still in a fairly noisy/ghosty place. TODO!

We started by rendering the larger strokes – we call them megasplats – as flat quads with the rasterizer. that’s what you see here, and in the E3 trailer.

【The effect of stochastic alpha discard.】

 
 


 
 

Interestingly, simon tried making a pure CS ‘splatting shader’ that takes the large splats, and instead of rasterizing a quad, we actually precompute a ‘mini point cloud’ for the splat texture, and blast it to the screen using atomics, just like the main point cloud when it’s in ‘microsplat’ (tight) mode.

【(A further detail; can be skipped.)】

 
 


 
 

So now we have a scene made up of a whole cloud of sculpts…

【At this point the whole scene is drawn as point clouds.】

 
 


 
 

which are point clouds,

 
 


 
 

and each point is itself, when it gets close enough to the camera, an (LOD adapted) ‘mini’ point cloud – close up, these mini point clouds representing a single splat get ‘expanded’ to a few thousand points (conversely, in the distance or for ‘tight’ objects, the mini point clouds degenerate to single pixels).

Amusingly, the new CS based splatter beats the rasterizer due to not wasting time on all the alpha=0 pixels. That also means our ‘splats’ need not be planar any more; however, we don’t yet have an art pipe for non-planar splats, so for now the artists don’t know this! Wooahaha!

【Point-cloud LOD keeps this efficient: how a point is rendered depends on its distance, and no time is wasted on alpha=0 pixels.】

 
 


 
 

That means that if I were to describe what the current engine is, I’d say it’s a cloud of clouds of point clouds. 🙂

【If I had to describe the current engine: it’s a cloud of clouds of point clouds.】

 
 


 
 

Incidentally, this atomic based approach means you can do some pretty insane things to get DOF-like effects: instead of post blurring, this was a quick test where we simply jittered the splats in a screenspace disc based on COC, and again let the TAA sort it all out.

It doesn’t quite look like blur, because it isn’t – it’s literally the objects exploding a little bit – but it’s cool and has none of the usual occlusion artefacts 🙂

【Incidentally, atomic splatting gives a lot of freedom for effects; this jittered depth-of-field works very well.】

 
 

We’ve left it in for now as our only DOF.

【We’ve kept this for now as our only depth-of-field.】

 
 


 
 

I should at this point pause to give you a rough outline of the rendering pipe – it’s totally traditional and simple, at the lighting end at least.

We start with a 64 bit atomic min (== splat of a single pixel point) for each point into a 1080p buffer, using lots of subpixel jitter and stochastic alpha. There are a LOT of points to be atomic-min’d! (10s of millions per frame) Then convert that from z+id into a traditional 1080p gbuffer, with normal, albedo, roughness, and z. then deferred light that as usual.

Then, hope that TAA can take all the noise away. 😉

【A rough outline of the pipeline, which is simple and traditional at the lighting end.】

【Each point is splatted with a 64-bit atomic min (z + id) using subpixel jitter and stochastic alpha, converted into a conventional g-buffer (normal, albedo, roughness, z), deferred lit, and the remaining noise is left to TAA.】
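To make the 64-bit atomic min concrete, here is a hedged CPU sketch (std::atomic standing in for GPU buffer atomics; jitter and stochastic alpha omitted): depth goes in the high bits and the point id in the low bits, so a single atomic min per pixel keeps the nearest point and its id together.

#include <atomic>
#include <cstdint>
#include <cstdio>

static const int W = 4, H = 4;
std::atomic<uint64_t> framebuffer[W * H];

void splat(int x, int y, float depth, uint32_t pointId) {
    // quantise depth to 32 bits; a smaller value means closer to the camera
    uint32_t zbits = (uint32_t)(depth * 4294967295.0);
    uint64_t key = ((uint64_t)zbits << 32) | pointId;        // depth in the high bits wins the min
    uint64_t cur = framebuffer[y * W + x].load();
    while (key < cur &&                                       // emulate atomicMin via a CAS loop
           !framebuffer[y * W + x].compare_exchange_weak(cur, key)) {}
}

int main() {
    for (auto& p : framebuffer) p.store(~0ull);               // clear to 'far'
    splat(1, 1, 0.50f, 7);                                    // two points land on the same pixel:
    splat(1, 1, 0.25f, 9);                                    // the closer one (id 9) must survive
    uint64_t v = framebuffer[1 * W + 1].load();
    printf("pixel(1,1): depth bits=%u  point id=%u\n", (uint32_t)(v >> 32), (uint32_t)v);
    return 0;
}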

 
 


 
 

I’m not going to go into loads of detail about this, since I don’t have time, but actually for now the lighting is pretty vanilla – deferred shading, cascaded shadow map sun.

there are a couple of things worth touching on though.

【No time for much detail on lighting and shadows here.】

【This is the ‘pretty vanilla’ lighting: deferred shading with a cascaded shadow map sun.】

 
 


 
 

ISMs: Now we are in loads-of-points land, we did the obvious thing and moved to imperfect shadow maps. We have 4 (3?) cascades for a hero sun light, that we atomicsplat into and then sample pretty traditionally (however, we let the TAA sort out a LOT of the noise since we undersample and undersplat and generally do things quite poorly)

【Shadows, next step: ISMs (imperfect shadow maps).】

 
 

We have a budget of 64 small (128×128) shadowmaps, which we distribute over the local lights in the scene, most of which the artists are tuning as spotlights. They are brute force splatted and sampled; here are simon’s first tests, varying their distribution over an area light:

【Our shadow map budget, sizes and usage for local lights.】

 
 


 
 

these images were from our first test of using 64 small ISM lights, inspired by the original ISM paper and the ‘ManyLODs’ paper. the 3 images show spreading a number of low quality lights out in an area above the object.

【The two references behind our ISM approach.】

 
 

Imperfect Shadow Maps for Efficient Computation of Indirect Illumination

T. Ritschel, T. Grosch, M. H. Kim, H.-P. Seidel, C. Dachsbacher, J. Kautz

http://resources.mpi-inf.mpg.de/ImperfectShadowMaps/ISM.pdf

 
 

ManyLoDs http://perso.telecom-paristech.fr/~boubek/papers/ManyLoDs/

Parallel Many-View Level-of-Detail Selection for Real-Time Global Illumination

Matthias Holländer, Tobias Ritschel, Elmar Eisemann and Tamy Boubekeur

 
 


 
 

I threw in solid-angle-esque equi-angular sampling of participating media for the small local lights. See https://www.shadertoy.com/view/Xdf3zB for an example implementation. Just at 1080p with no culling and no speedups, just let TAA merge it. this one will DEFINITELY need some bilateral blur and be put into a separate layer, but for now it’s not:

【Sampling the participating media for local lights: equi-angular sampling.】
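For reference, here is the standard equi-angular sampling formulation (my own transcription, along the lines of the linked shadertoy, not the Dreams shader): given a random u in [0,1), it returns a distance along the ray distributed so that samples cluster where the light’s 1/d^2 contribution is largest, plus the matching pdf.

#include <cmath>
#include <cstdio>

struct Sample { float t; float pdf; };

Sample equiAngular(const float rayOrg[3], const float rayDir[3],  // rayDir assumed unit length
                   const float lightPos[3], float tMax, float u) {
    // distance along the ray of the point closest to the light
    float delta = 0.f;
    for (int i = 0; i < 3; ++i) delta += (lightPos[i] - rayOrg[i]) * rayDir[i];
    // perpendicular distance from the light to the ray
    float D2 = 0.f;
    for (int i = 0; i < 3; ++i) {
        float d = rayOrg[i] + delta * rayDir[i] - lightPos[i];
        D2 += d * d;
    }
    float D = std::sqrt(D2);
    // angles subtended by the start and end of the ray segment, as seen from the light
    float thetaA = std::atan2(0.f - delta, D);
    float thetaB = std::atan2(tMax - delta, D);
    // sample an angle uniformly, then map back to a distance along the ray
    float t = D * std::tan(thetaA + u * (thetaB - thetaA));
    float pdf = D / ((thetaB - thetaA) * (D * D + t * t));
    return { delta + t, pdf };
}

int main() {
    float org[3] = {0, 0, 0}, dir[3] = {1, 0, 0}, light[3] = {5, 2, 0};
    for (float u = 0.1f; u < 1.f; u += 0.2f) {
        Sample s = equiAngular(org, dir, light, 20.f, u);
        printf("u=%.1f -> t=%.3f pdf=%.4f\n", u, s.t, s.pdf);
    }
    return 0;
}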

 
 


 
 

(just a visualisation of the classic paraboloid projection used for the ISMs)

sorry for the quick programmer art, DEADLINES!

【A visualisation of the ISMs.】

 
 


 
 

this ‘vanilla’ approach to lighting worked surprisingly well for both the ‘tight’ end… (single pixel splats, which we call microsplats)… as well as

【This ‘vanilla’ lighting approach works equally well for microsplats and megasplats.】

 
 


 
 

…the loose end (‘megasplats’).

 
 


 
 

this was the first time I got specular in the game! two layers of loose splats, the inner layer is tinted red to make it look like traditional oil underpainting. then the specular highlights from the environment map give a real sense of a painterly look. this was the first image I made where I was like ‘ooooh maybe this isn’t going to fail!’

【First specular in the game: two layers of loose splats, the inner tinted red like a traditional oil underpainting, with environment-map specular giving the painterly feel.】

 
 


 
 

At this point you’ll notice we have painterly sky boxes. I wanted to do all the environment lighting from this. I tried to resurrect my previous LPV tests, then I tried ‘traditional’ Kaplanyan style SH stuff, but it was all too muddy and didn’t give me contact shadows, nor did it give me ‘dark under the desk’ type occlusion range.

【Sky boxes: we wanted all environment lighting to come from them. LPV and SH approaches were tried but were too muddy and lacked occlusion range.】

 
 

For a while we ran with SSAO only, which got us to here (point clouds give you opportunities to do ridiculous geometrical detail, lol)

【For a while, SSAO only.】

 
 


 
 

the SSAO we started with was based on Morgan McGuire’s awesome alchemy spiral style SSAO, but then I tried just picking a random ray direction from the cosine weighted hemisphere above each point and tracing the z buffer, one ray per pixel (and let the TAA sort it out ;)) and that gave us more believable occlusion, less like dirt in the creases.

【Our SSAO: started from McGuire’s Alchemy spiral SSAO, then switched to one cosine-weighted z-buffer ray per pixel, which gives more believable occlusion.】

 
 

From there it was a trivially small step to output either black (occluded) or sky colour (from envmap) and then do a 4×4 stratified dither. here it is without TAA (above).

However this is still just SSAO in the sense that the only occluder is the z buffer.

【SSAO without TAA】
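A hedged, heavily simplified sketch of that one-ray-per-pixel z-buffer AO (my own CPU version, with the depth buffer reduced to a 1D heightfield for brevity): shoot a single cosine-weighted ray above the point, march it against the heights, and output a binary sky/occluded answer – the 4x4 stratified dither and TAA are what average those binary answers into smooth AO.

#include <cmath>
#include <cstdio>
#include <cstdlib>

float rand01() { return (float)rand() / (float)RAND_MAX; }

int main() {
    // toy heightfield: flat ground with a tall block on the right
    float height[32];
    for (int x = 0; x < 32; ++x) height[x] = (x >= 20 && x < 24) ? 6.0f : 0.0f;

    srand(42);
    const int px = 16, rays = 2048;                  // evaluate AO at x = 16
    int skyHits = 0;
    for (int i = 0; i < rays; ++i) {
        // cosine-weighted direction above a flat (up-facing) surface (2D version)
        float sinT = 2.0f * rand01() - 1.0f;         // horizontal component
        float cosT = std::sqrt(1.0f - sinT * sinT);  // vertical component (always up)
        float x = (float)px, y = height[px] + 0.01f;
        bool occluded = false;
        for (int s = 0; s < 16 && !occluded; ++s) {  // march the ray across the heightfield
            x += sinT; y += cosT;
            int ix = (int)std::lround(x);
            if (ix < 0 || ix >= 32) break;           // left the buffer: call it sky
            if (height[ix] > y) occluded = true;     // ray dipped below the heightfield
        }
        if (!occluded) ++skyHits;
    }
    printf("AO at x=%d: %.2f (1 = fully open)\n", px, (float)skyHits / rays);
    return 0;
}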

 
 

(random perf stat of the atomic_min splatter: this scene shows 28.2M point splats, which takes 4.38ms, so that’s about 640 million single pixel splats per second)

【Performance stats.】

 
 


 
 

For longer range, I tried voxelizing the scene – since we have point clouds, it was fairly easy to generate a work list with LOD adapted to 4 world cascades, and atomic OR each voxel (visualised here, you can see the world space slices in the overlay) into a 1 bit per voxel dense cascaded volume texture

【For longer range: voxelize the scene from the point clouds into 1-bit-per-voxel cascaded volumes; the point clouds make the LOD easy.】

 
 


 
 

then we hacked the AO shader to start with the z buffer, and then switch to the binary voxelization, moving through coarser and coarser cascades. it’s cone-tracing like, in that I force it to drop to lower cascades (and larger steps), but all the fuzziness is from stochastic sampling rather than prefiltered mip maps. The effect is great for mid range AO – on in the left half, off in the right.

That gets us to more or less where we are today, rough and noisy as hell but extremely simple. I really like the fact you get relatively well defined directional occlusion, which LPV just can’t give you due to excessive diffusion.

【AO detail: start with the z buffer, then switch to the binary voxel cascades; the fuzziness comes from stochastic sampling rather than prefiltered mips, and you get well-defined directional occlusion.】

 
 


 
 

(at this point we’re in WIP land! like, 2015 time!)

The last test was to try adding a low resolution world space cascade that is RGB emissive, and then gather light as the sky occlusion rays are marched. The variance is INSANELY high, so it isn’t usable, and this screenshot is WITH taa doing some temporal averaging! But it looks pretty cool. It might be enough for bounce light (rather than direct light, as above), or for extremely large area sources. I don’t know yet. I’m daydreaming about maybe making the emissive volume lower frequency (-> lower variance when gathered with so few samples) by smearing it around with LPV, or at least blurring it. but I haven’t had a chance to investigate.

【Emissive light gathered from a low-resolution world-space cascade.】

 
 


 
 

Oh wait I have! I just tried bilateral filtering and stratified sampling over 8×8 blocks, it does help a lot.

I think the general principle of z buffer for close, simple bitmask voxelization for further range gather occlusion is so simple that it’s worth a try in almost any engine. Our voxel cascades are IIRC 64^3, and the smallest cascade covers most of the scene, so they’re sort of mine-craft sized voxels or just smaller at the finest scale. (then blockier further out, for the coarser cascades). But the screenspace part captures occlusion nicely for smaller than voxel distances.

【The fix: bilateral filtering and stratified sampling over 8×8 blocks.】

 
 


 
 

another bilateral test pic. WIP 😉

 
 


 
 

and that’s pretty much where we are today!

as a palette cleanser, here’s some non-testbed, non-programmer art

 
 


 
 


 
 


 
 

It feels like we’re still in the middle of it all; we still have active areas of R&D; and as you can see, many avenues didn’t pan out for this particular game. But I hope that you’ve found this journey to be inspiring in some small way. Go forth and render things in odd ways!

【It still feels like we are in the middle of it all.】

 
 


 
 

The artwork in this presentation is all the work of the brilliant art team at MediaMolecule. Kareem, Jon (E & B!), Francis, Radek to name the most prominent authors of the images in this deck. But thanks all of MM too! Dreams is the product of at least 25 fevered minds at this point.

And of course @sjb3d and @antonalog who did most of the engine implementation, especially of the bits that actually weren’t thrown away 🙂

Any errors or omissions are entirely my own, with apologies.

if you have questions that fit in 140 chars I’ll do my best to answer at @mmalex.

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

SIGGRAPH 15 – Physically Based and Unified Volumetric Rendering in Frostbite

Author:

Sebastien Hillaire – Electronic Arts / frostbite

sebastien.hillaire@frostbite.com

https://twitter.com/SebHillaire

 
 

 
 

  • introduction

 
 

Physically based rendering in Frostbite


Physically based rendering gives excellent results!

 
 

Volumetric rendering in Frostbite was limited

  • Global distance/height fog
  • Screen space light shafts
  • Particles

Volumetric rendering in Frostbite used to be limited to these three techniques.


 
 

 
 

Real-life volumetrics


What we want to achieve are these real-world effects: clouds and atmosphere, fog, light scattering, and so on.

 
 

 
 

  • Related Work

 
 

Billboards

 
 

Analytic fog [Wenzel07]

Analytic light scattering [Miles]

Characteristics: fast, not shadowed, only homogeneous media

http://blog.mmacklin.com/2010/05/29/in-scattering-demo/

http://research.microsoft.com/en-us/um/people/johnsny/papers/fogshop-pg.pdf

http://indus3.org/atmospheric-effects-in-games/

 
 


 
 

Screen space light shafts

  • Post process [Mitchell07]
  • Epipolar sampling [Engelhardt10]

Characteristics:

  • High quality
  • Sun/sky needs to be visible on screen
  • Only homogeneous media
  • Can go for Epipolar sampling but this won’t save the day

 
 


 
 

Splatting

  • Light volumes
    • [Valliant14][Glatzel14][Hillaire14]
  • Emissive volumes [Lagarde13]

This can result in high quality scattering, but the splatted light volumes usually do not match the participating media of the rest of the scene. (The approach is common, but it is handled independently of the scene’s media.)


 
 


 
 

 
 

Volumetric fog [Wronski14]

  • Sun and local lights
  • Heterogeneous media

allowing spatially varying participating media and local lights to scatter.

Spatially varying participating media with scattering from local lights – this is close to the approach the author wants.

However it did not seem really physically based at the time and some features we wanted were missing.

The drawback is that it did not seem really physically based.

 
 


 
 

 
 

  • Scope and motivation

 
 

Increase visual quality and give more freedom to art direction!

 
 

Physically based volumetric rendering

  • Meaningful material parameters
  • Decouple material from lighting
  • Coherent results

We want it to be physically based: this means that participating media materials are decoupled from the light sources (e.g. no scattering colour on the light entities). Media parameters are also a meaningful set of parameters. With this we should get more coherent results that are easier to control and understand.

 
 

Unified volumetric interactions

  • Lighting + regular and volumetric shadows
  • Interaction with opaque, transparent and particles

Also, because there are several entities interacting with volumetric in Frostbite (fog, particles, opaque&transparent surfaces, etc). We also want to unify the way we deal with that to not have X methods for X types of interaction.

 
 


 
 

This video gives you an overview of what we got from this work: lights that generate scattering according to the participating media, volumetric shadow, local fog volumes, etc.

And I will show you now how we achieve it.

Results first (see the video in the slides).


 
 

 
 

 
 

  • Volumetric rendering

 
 

  • Single Scattering

 
 

As of today we restrict ourselves to single scattering when rendering volumetrics. This is already challenging to get right.

 
 

When light interacts with a surface, it is possible to evaluate the amount of light bounced towards the camera by evaluating, for example, a BRDF. But in the presence of participating media, things get more complex. (The interaction of light with a participating medium is much more involved.)

 
 

  1. You have to take into account transmittance while the light travels through the medium
  2. Then you need to integrate the scattered light along the view ray by taking many samples
  3. For each of these samples, you also need to take into account transmittance towards the view point
  4. You also need to integrate the scattered light at each sample position
  5. And take into account the phase function, the regular shadow map (opaque objects) and the volumetric shadow map (participating media and other volumetric entities)

 
 

 
 


 
 


 
 

The two integrals in the equation are the scattering integrations explained in points 2 and 4 above.

The sum is over the sampled light sources.
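
The slide’s equation is not reproduced in these notes; a standard form of the single-scattering integral being described (notation assumed, chosen to match the 𝑳/𝑻𝒓/𝝈 symbols used elsewhere in the talk) is:

$$
L_i(\mathbf{x}, \omega_o) = \sum_{\text{lights } l} \int_{0}^{D} T_r(\mathbf{x}, \mathbf{x}_t)\, \sigma_s(\mathbf{x}_t)\, p(\omega_o, \omega_l)\, V_l(\mathbf{x}_t)\, T_r(\mathbf{x}_t, \mathbf{x}_l)\, L_l(\mathbf{x}_t)\, \mathrm{d}t,
\qquad
T_r(\mathbf{a}, \mathbf{b}) = e^{-\int_{\mathbf{a}}^{\mathbf{b}} \sigma_t(\mathbf{s})\, \mathrm{d}s}
$$

The outer integral runs along the view ray (points 2 and 4), the two transmittance terms cover points 1 and 3, and the phase function p together with the shadow terms V_l covers point 5.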

 
 

  • Clip Space Volumes

 
 

Frustum aligned 3D textures [Wronski14]

  • Frustum voxel in world space => “froxel” :)

As in Wronski, all our volumes are 3D textures that are clip space aligned (such voxels become froxels in world space – credit Alex Evans and Sony ATG :) – see ‘Learning from Failure: a Survey of Promising, Unconventional and Mostly Abandoned Renderers for Dreams PS4, a Geometrically Dense, Painterly UGC Game’, Advances in Real-Time Rendering course, SIGGRAPH 2015).

 
 

Note: Frostbite is a tile-based deferred lighting engine

  • 16×16 tiles with culled light lists

 
 

Align volume tiles on light tiles

  • Reuse per tile culled light list
  • Volume tiles can be smaller (8×8, 4×4, etc.)
  • Careful correction for resolution integer division

 
 

This volume is also aligned with our screen light tiles. This is because we are reusing the forward light tile list culling result to accelerate the scattered light evaluation (remember, Frostbite is a tile based deferred lighting engine).

 
 

Our volume tiles in screen space can be smaller than the light tiles (which are 16×16 pixels).

 
 

By default we use

Depth resolution of 64

8×8 volume tiles

 
 

720p requires 160x90x64 (~7mb per rgbaF16 texture)

1080p requires 240x135x64 (~15mb per rgbaF16 texture)

 
 


 
 

 
 

  • Data flow

 
 


 
 

This is an overview of our data flow.

We are using clip space volumes to store the data at different stages of our pipeline.

 
 

We have material properties which are first voxelized from participating media entities.

 
 

Then, using the scene’s light sources and this material property volume, we can generate scattered light data per froxel. This data can be temporally upsampled to increase the quality. Finally, an integration step prepares the data for rendering.

 
 

  1. Participating media material definition (first stage of the data-flow diagram)

 
 

Follow the theory [PBR]

  • Absorption 𝝈𝒂 (m^-1)

Absorption describing the amount of light absorbed by the media over a certain path length

  • Scattering 𝝈𝒔 (m^-1)

Scattering describing the amount of light scattered over a certain path length

  • Phase 𝒈

And a single lobe phase function describing how the light bounces on particles (uniformly, forward scattering, etc.). It is based on Henyey-Greenstein (and you can use the Schlick approximation).

  • Emissive 𝝈𝒆 (irradiance·m^-1)

Emissive describing emitted light

  • Extinction 𝝈𝒕 = 𝝈𝒔 + 𝝈𝒂
  • Albedo 𝛒 = 𝝈𝒔 / 𝝈𝒕

 
 

Artists can author {absorption, scattering} or {albedo, extinction}

  • Train your artists! Important for them to understand their meaning!

As with every physically based component, it is very important for artists to understand them so take the time to educate them.

(Artists need to understand the relevant physics!)

 
 


 
 

Participating Media(PM) sources

  • Depth fog
  • Height fog
  • Local fog volumes
    • With or W/o density textures

 
 

Depth/height fog and local fog volumes are entities that can be voxelized. You can see here local fog volumes either plain or with varying density driven by a density texture.

 
 

The data structures and storage are explained below.

 
 

Voxelize PM properties into V-Buffer

  • Add Scattering, Emissive and
    Extinction
  • Average Phase g (no multi lobe)
  • Wavelength independent 𝝈𝒕 (for now)

 
 

We voxelize them into a V-buffer, analogous to the screen-space G-buffer but stored as a (clip space) volume. We basically add all the material parameters together, since they are linear, except the phase function, which is averaged. We only consider a single lobe for now, based on the HG phase function.

 
 

We have deliberately chosen to go with wavelength-independent extinction to keep the volumes cheap (material, lighting, shadows). But it would be easy to extend if necessary at some point.

 
 

Supporting emissive is an advantage: artists can place a local fog volume that emits light the way scattering would, without a matching local light. This can be used as cheap ambient lighting. (Emissive is optional.)

 
 

 
 


 
 

V-Buffer (per froxel data)

  • Texture 0 (RGBA16F): Scattering R | Scattering G | Scattering B | Extinction
  • Texture 1 (RGBA16F): Emissive R | Emissive G | Emissive B | Phase (g)

 
 

 
 

  2. Froxel integration (second stage of the data-flow diagram)

 
 

Per froxel

  • Sample PM properties data
  • Evaluate
    • Scattered light 𝑳𝒔𝒄𝒂𝒕(𝒙𝒕,𝝎𝒐)
    • Extinction

 
 

For each froxel, one thread will be in charge of gathering scattered light and extinction.

 
 

Extinction is simply copied over from the material. You will see later why this matters for visual quality in the final stage (we use extinction instead of transmittance to get energy-conservative scattering). Extinction is also linear, so it is better to temporally integrate it than the non-linear transmittance value.

 
 

Scattered light:

  • 1 sample per froxel
  • Integrate all light sources: indirect light + sun + local lights

 
 


 
 

Sun/Ambient/Emissive

 
 

Indirect light on local fog volume

  • From Frostbite diffuse SH light probe
    • 1 probe at the volume centre
    • Integrate w.r.t. phase function as a SH cosine lobe [Wronski14]

 
 

Then we integrate the scattered light. One sample per froxel.

 
 

We first integrate ambient the same way as Wronski. Frostbite allows us to sample diffuse SH light probes. We use one per local fog volume positioned at their centre.

 
 

We also integrate the sun light according to our cascaded shadow maps. We could use exponential shadow maps, but we do not, as our temporal up-sampling is enough to soften the result.

 
 

You can easily notice the heterogeneous nature of the local fog shown here.

 
 


 
 

Local lights

  • Reuse tiled-lighting code
  • Use forward tile light list post-culling
  • No scattering? skip local lights

 
 

We also integrate local lights, and we re-use the tile culling result to only take into account lights visible within each tile.

One good optimisation is to skip it all if you do not have any scattering possible according to your material properties.

 
 

Shadows

  • Regular shadow maps
  • Volumetric shadow maps

 
 

Each of these lights can also sample their associated shadow maps. We support regular shadow maps and also volumetric shadow maps (described later).

 
 


 
 

  3. Temporal volumetric integration (still the second stage of the data-flow diagram)

 
 

Problems:

 
 

1 scattering/extinction sample per froxel per frame

  • Under sampling with very strong material
  • Aliasing under camera motion
  • Shadows make it worse

 
 

As I said, we are only using a single sample per froxel.

 
 

Aliasing (see the two videos in the slides – the aliasing is very obvious)

This can unfortunately result in very strong aliasing for very thick participating media and when integrating the local light contribution.

 
 


 
 

You can also notice it in the video, as well as very strong aliasing of the shadow coming from the tree.

 
 


 
 

Solution: temporal integration

To mitigate these issues, we temporally integrate each frame’s result with that of the previous frame (a well-known approach, also used by Karis last year for TAA).

 
 

To achieve this,

we jitter our samples per frame uniformly along the view ray

The material and scattered light samples are jittered using the same offset (to soften evaluated material and scattered light)

Integrate each frame using an exponential moving average

And we ignore previous result in case no history sample is available (out of previous frustum)

 
 

Jittered samples (Halton)

Same offset for all samples along view ray

Jitter scattering AND material samples in sync

 
 

Re-project previous scattering/extinction

5% Blend current with previous

Exponential moving average [Karis14]

Out of Frustum: skip history
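
A minimal sketch of that temporal blend (names are placeholders and the re-projection itself is omitted): blend 5% of the new froxel sample with 95% of the re-projected history, and skip history that fell outside the previous frustum.

struct Froxel { float scatR, scatG, scatB, extinction; };

Froxel temporalIntegrate(const Froxel& current, const Froxel& history,
                         bool historyValid, float blend = 0.05f)
{
    if (!historyValid)              // re-projected position left the previous frustum
        return current;
    Froxel out;                     // exponential moving average [Karis14]
    out.scatR      = blend * current.scatR      + (1.0f - blend) * history.scatR;
    out.scatG      = blend * current.scatG      + (1.0f - blend) * history.scatG;
    out.scatB      = blend * current.scatB      + (1.0f - blend) * history.scatB;
    out.extinction = blend * current.extinction + (1.0f - blend) * history.extinction;
    return out;
}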

 
 


 
 

The improvement is very clear – see the video in the slides.

 
 

Remaining issues:

This is great and promising but there are several issues remaining:

 
 

Local fog volume and lights will leave trails when moving

One could use local fog volumes motion stored in a buffer the same way as we do in screenspace for motion blur

But what do we do when two volumes intersect? This is the same problem as deep compositing

For lighting, we could use neighbour colour clamping but this will not solve the problem entirely

 
 

This is an exciting and challenging R&D area for the future and I’ll be happy to discuss it with you if you have some ideas :)

 
 

  4. Final integration

 
 

Integration

Integrate froxel {scattering, extinction} along view ray

  • Solves {𝑳𝒊(𝒙,𝝎𝒐), 𝑻𝒓(𝒙,𝒙𝒔)} for each froxel at position 𝒙𝒔

 
 

We basically accumulate near to far scattering according to transmittance. This will solve the integrated scattered light and transmittance along the view and that for each froxel.

 
 

Code sample

One could use the code sample shown here: accumulate scattering and then transmittance for the next froxel, slice by slice. However, that is completely wrong. Indeed there is a dependency on the accumScatteringTransmitance.a value (transmittance): should we update transmittance or scattering first?
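
The code listing from the slide is not reproduced in these notes; below is a minimal C++ sketch of the naive per-slice accumulation being described (struct layout and names are hypothetical), which makes the ordering problem visible:

#include <cmath>

struct Float4 { float r, g, b, a; };

// Naive front-to-back accumulation. accum.rgb = in-scattered light reaching the
// camera, accum.a = transmittance along the view ray so far.
void accumulateNaive(const Float4* scatExt, int depthSlices, float sliceDepth,
                     Float4& accum)
{
    accum = {0.0f, 0.0f, 0.0f, 1.0f};
    for (int z = 0; z < depthSlices; ++z) {          // near to far
        const Float4& s = scatExt[z];                // rgb = scattered light S, a = extinction
        accum.r += s.r * accum.a;                    // scattering first...
        accum.g += s.g * accum.a;
        accum.b += s.b * accum.a;
        accum.a *= std::exp(-s.a * sliceDepth);      // ...then transmittance
    }
}

Both updates read accum.a, so the result depends on which of the two is applied first – and, as the following images show, neither order is energy conservative for thick media.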

 
 


 
 

Final

 
 

Non energy conservative integration:

 
 

You can see here multiple volumes with increasing scattering properties. It is easy to understand that integrating scattering and then transmittance is not energy conservative.

 
 


 
 

We could reverse the order of operations. You can see that we somewhat get back the correct albedo one would expect, but it is overall too dark, and temporally integrating that definitely does not help here.

 
 


 
 

So how to improve this? We know we have one light and one extinction sample.

 
 

We can keep the light sample: it is expensive to evaluate, and it is good enough to assume it constant along the view ray inside each depth slice.

 
 

But the single transmittance sample is completely wrong. Within the slice, the transmittance should in fact go from exp(0) = 1 at the near interface of the depth slice down to exp(-𝝈𝒕·d) at the far interface of a slice of width d.

 
 

What we do to solve this is integrate the scattered light analytically with respect to the transmittance at each point of the view-ray range within the slice. One can easily show that the analytical integration of constant scattered light over a definite range, with a single extinction sample, reduces to a simple closed-form expression (see the sketch below).

Using this, we finally get consistent lighting results for scattering with respect to our single extinction sample (as you can see on the bottom picture).
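
A sketch of that analytic, energy-conserving variant (reusing the Float4 helper from the previous sketch; a sketch consistent with the description, not Frostbite’s actual shader). The scattered light S is held constant across the slice and integrated against exp(-𝝈𝒕·x) over the slice depth D, giving S·(1 − exp(-𝝈𝒕·D))/𝝈𝒕:

void accumulateAnalytic(const Float4* scatExt, int depthSlices, float sliceDepth,
                        Float4& accum)
{
    accum = {0.0f, 0.0f, 0.0f, 1.0f};
    for (int z = 0; z < depthSlices; ++z) {
        const Float4& s = scatExt[z];
        float sigmaT = (s.a > 1e-5f) ? s.a : 1e-5f;           // avoid division by zero
        float sliceTrans = std::exp(-sigmaT * sliceDepth);    // transmittance across the slice
        float w = (1.0f - sliceTrans) / sigmaT;               // integral of exp(-sigmaT*x) over [0, D]
        accum.r += s.r * w * accum.a;                         // in-slice scattering, attenuated
        accum.g += s.g * w * accum.a;                         //   by the transmittance up to the slice
        accum.b += s.b * w * accum.a;
        accum.a *= sliceTrans;                                // then attenuate for the next slice
    }
}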

 
 

  • Single scattered light sample 𝑆=𝑳𝒔𝒄𝒂𝒕(𝒙𝒕,𝝎𝒐) OK
  • Single transmittance sample 𝑻𝒓(𝒙,𝒙𝒔) NOT OK

 
 

→ Integrate lighting w.r.t. transmittance over the froxel depth D


 
 


 
 

Also improves with volumetric shadows

You can also see that this fixes the light leaking we noticed sometimes for relatively large depth slices and strongly scattering media even when volumetric shadow are enabled.

 
 


 
 

Once we have that final integrated buffer, we can apply it on everything in our scene during the sky rendering pass. As it contains scattered light reaching the camera and transmittance, it is easy to apply it as a pre-multiplied colour-alpha on everything.

 
 

For efficiency, it is applied per vertex on transparents but we are thinking of switching this to per pixel for better quality.

 
 

  • {𝑳𝒊(𝒙,𝝎𝒐), 𝑻𝒓(𝒙,𝒙𝒔)} Similar to pre-multiplied color/alpha
  • Applied on opaque surfaces per pixel
  • Evaluated on transparent surfaces per vertex, applied per pixel

 
 


 
 

 
 

Result validation

 
 

Our target is to get physically based results. As such, we have compared our results against the physically based path tracer called Mitsuba. We constrained Mitsuba to single scattering and to use the same exposure, etc. as our example scenes.

 
 

Compare results to references from Mitsuba

  • Physically based path tracer
  • Same conditions: single scattering only, exposure, etc.

 
 

The first scene I am going to show you is a thick participating-media layer, first with a light above it and then with the light inside it.

 
 


 
 

You can see here the Frostbite render on top and the Mitsuba render at the bottom. You can also see the scene with a gradient applied to it. It is easy to see that our results match; you can also recognize the triangular shape of the scattered light when the point light is within the medium.

 
 

This is a difficult case, because the participating media is non-uniform and thick, which stresses our discretisation of volumetric shadows and of the material representation. So you can see some small differences. But overall it matches, and we are happy with these first results; we will improve them in the future.

 
 


 
 

This is another example showing a very good match for an HG phase function with g=0 and g=0.9 (strong forward scattering).

 
 


 
 

Performance

 
 

Sun + shadow cascade

14 point lights

  • 2 with regular & volumetric shadows

6 local fog volumes

  • All with density textures

 
 

PS4, 900p

 
 

Volume tile resolution:          8×8        16×16
  • PM material voxelization:    0.45 ms    0.15 ms
  • Light scattering:            2.00 ms    0.50 ms
  • Final accumulation:          0.40 ms    0.08 ms
  • Application (fog pass):      +0.1 ms    +0.1 ms
  • Total:                       2.95 ms    0.83 ms

 
 

Light scattering components (8×8 tiles):
  • Local lights:              1.1 ms
  • + Sun scattering:          +0.5 ms
  • + Temporal integration:    +0.4 ms

 
 

You can see that the performance varies a lot depending on what you have enabled and the resolution of the clip space volumes.

 
 

This shows that it is important to carefully plan the needs of your game and its different scenes. One could also bake the scattering of static scenes and use the emissive channel to represent the scattered light, for an even faster rendering of complex volumetric lighting.

 
 

 
 

  • Volumetric shadows

 
 

Volumetric shadow maps

 
 

We also support volumetric shadow maps (shadow resulting from voxelized volumetric entities in our scene)

 
 

To this aim, we went for a simple and fast solution

 
 

  • We first define a 3-level cascaded clip map volume following and containing the camera
    • With tweakable per-level voxel size and world-space snapping
  • This volume contains all our participating media entities, voxelized again within it (required for out-of-view shadow casters; the clip space volume would not be enough)
  • A volumetric shadow map is defined as a 3D texture (assigned to a light) that stores transmittance
    • Transmittance is evaluated by ray marching the extinction volume (see the sketch after this list)
    • The projection is chosen as a best fit for the light type (e.g. a frustum for a spot light)
  • Our volumetric shadow maps are stored in an atlas so that we only have to bind a single texture (with a UV scale and bias) when using them.
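
A minimal sketch of building one volumetric shadow map by marching the extinction clip map towards the light (all scaffolding here is hypothetical, not Frostbite’s API); each shadow voxel ends up storing transmittance:

#include <cmath>
#include <vector>

struct V3 { float x, y, z; };

// sampleExtinction: looks up sigma_t in the cascaded extinction volume.
// voxelToWorld:     maps a shadow-map voxel coordinate into world space.
void buildVolumetricShadowMap(std::vector<float>& shadow, int res, V3 lightPos,
                              float (*sampleExtinction)(V3),
                              V3 (*voxelToWorld)(int, int, int),
                              int steps = 16)
{
    shadow.assign(res * res * res, 1.0f);
    for (int z = 0; z < res; ++z)
    for (int y = 0; y < res; ++y)
    for (int x = 0; x < res; ++x) {
        V3 p = voxelToWorld(x, y, z);
        V3 d = { lightPos.x - p.x, lightPos.y - p.y, lightPos.z - p.z };
        float opticalDepth = 0.0f;
        for (int i = 0; i < steps; ++i) {            // march from the voxel towards the light
            float t = (i + 0.5f) / steps;
            V3 s = { p.x + d.x * t, p.y + d.y * t, p.z + d.z * t };
            opticalDepth += sampleExtinction(s);
        }
        float stepLen = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z) / steps;
        shadow[(z * res + y) * res + x] = std::exp(-opticalDepth * stepLen);  // transmittance
    }
}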

 
 


 
 

Volumetric shadow maps are entirely part of our shared lighting pipeline and shader code.

 
 

Part of our common light shadow system

  • Opaque
  • Particles
  • Participating media

 
 

It is sampled for each light having it enabled and applied on everything in the scene (particles, opaque surfaces, participating media) as visible on this video.

 
 

(See the result video in the slides.)

 
 

Another bonus is that we also voxelize our particles.

 
 

We have tried many voxelization methods. Point splatting, and a blurred version of it, were just too noisy. Our default voxelization method is trilinear. You can see the shadow is very soft and there is no visible popping.

 
 

We also have a high quality voxelization where all threads write all the voxels contained within the particle sphere. A bit brute force for now but it works when needed.

 
 

You can see the result of volumetric shadows from particle onto participating media in the last video.

 
 

(See bonus slides for more details)

 
 


 
 

Quality: PS4

 
 

Ray marching of 32³ volumetric shadow maps
  • Spot light:     0.04 ms
  • Point light:    0.14 ms

1k particles voxelization
  • Default quality:    0.03 ms
  • High quality:       0.25 ms

 
 

Point lights are more expensive than spot lights because spot lights are integrated slice by slice, whereas a full raytrace is done for each point light shadow voxel. We have ideas to fix that in the near future.

 
 

Default particle voxelization is definitely cheap for 1K particles.

 
 

  • More volumetric rendering in Frostbite

 
 

Particle/Sun interaction

 
 

  • High quality scattering and self-shadowing for sun/particles interactions
  • Fourier opacity Maps [Jansen10]
  • Used in production now

 
 


 
 

Our translucent shadows in Frostbite (see [Andersson11]) allow particles to cast shadows on opaque surfaces but not on themselves. That technique also did not support scattering.

 
 

We have added that support in Frostbite by using Fourier opacity mapping. This gives us very high quality coloured shadowing and scattering, resulting in sharp silver-lining effects, as you can see in these screenshots and the cloud video.

 
 

This is a special, non-unified path for the sun, but it was needed to get that extra bit of quality for this particular case, which requires special attention.

 
 

Physically-based sky/atmosphere

 
 

  • Improved from [Elek09] (Simpler but faster than [Bruneton08])
  • Collaboration between Frostbite, Ghost and DICE teams.
  • In production: Mirror’s Edge Catalyst, Need for Speed and Mass Effect Andromeda

 
 


 
 

We also added support for a physically based sky and atmosphere scattering simulation last year. This was a fruitful collaboration between Frostbite and the Ghost and DICE game teams (mainly developed by Edvard Sandberg and Gustav Bodare at Ghost). It is now used in production by several games such as Mirror’s Edge Catalyst and Mass Effect Andromeda.

 
 

It is an improved version of Elek’s paper which is simpler and faster than Bruneton. I unfortunately have no time to dive into details in this presentation.

 
 

But in the slide comments there is time :). Basically, the lighting artist defines the atmosphere properties, and the light scattering and sky rendering automatically adapt to the sun position. When the atmosphere is changed, we need to update our pre-computed lookup tables, and this can be distributed over several frames to limit the evaluation impact on the GPU.

 
 

  • Conclusion

 
 

Physically-based volumetric rendering framework used for all games powered by Frostbite in the future

 
 

Physically based volumetric rendering

  • Participating media material definition
  • Lighting and shadowing interactions

 
 

A more unified volumetric rendering system

  • Handles many interactions
    • Participating media, volumetric shadows, particles, opaque surfaces, etc.

 
 

Future work

 
 

Improved participating media rendering

  • Phase function integral w.r.t. area lights solid angle
  • Inclusion in reflection views
  • Graph based material definition, GPU simulation, Streaming
  • Better temporal integration! Any ideas?
  • Sun volumetric shadow
  • Transparent shadows from transparent surfaces?

 
 

Optimisations

  • V-Buffer packing
  • Particles voxelization
  • Volumetric shadow maps generation
  • How to scale to 4k screens efficiently

 
 

For further discussions

 
 

sebastien.hillaire@frostbite.com

https://twitter.com/SebHillaire

 
 

 
 

References

 
 

[Lagarde & de Rousiers 2014] Moving Frostbite to PBR, SIGGRAPH 2014.

[PBR] Physically Based Rendering book, http://www.pbrt.org/.

[Wenzel07] Real time atmospheric effects in game revisited, GDC 2007.

[Mitchell07] Volumetric Light Scattering as a Post-Process, GPU Gems 3, 2007.

[Andersson11] Shiny PC Graphics in Battlefield 3, GeForceLan, 2011.

[Engelhardt10] Epipolar Sampling for Shadows and Crepuscular Rays in Participating Media with Single Scattering, I3D 2010.

[Miles] Blog post http://blog.mmacklin.com/tag/fog-volumes/

[Valliant14] Volumetric Light Effects in Killzone Shadow Fall, SIGGRAPH 2014.

[Glatzel14] Volumetric Lighting for Many Lights in Lords of the Fallen, Digital Dragons 2014.

[Hillaire14] Volumetric lights demo

[Lagarde13] Lagarde and Harduin, The art and rendering of Remember Me, GDC 2013.

[Wronski14] Volumetric fog: unified compute shader based solution to atmospheric solution, SIGGRAPH 2014.

[Karis14] High Quality Temporal Super Sampling, SIGGRAPH 2014.

[Jansen10] Fourier Opacity Mapping, I3D 2010.

[Salvi10] Adaptive Volumetric Shadow Maps, ESR 2010.

[Elek09] Rendering Parametrizable Planetary Atmospheres with Multiple Scattering in Real-time, CESCG 2009.

[Bruneton08] Precomputed Atmospheric scattering, EGSR 2008.

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

Voxel House

  • Introduction

 
 

http://www.oskarstalberg.com/game/house/Index.html

 
 


 
 

My projects typically revolve around some central idea that I want to explore. Here, that central idea is a particular content driven approach to modular tilesets that I’ve had on my mind for a while. This project could have been created as a Python script in Maya or a node graph in Houdini. However, since I don’t want my final presentation material to be a dull narrated youtube clip set in a grey-boxed Maya scene, I created an interactive web demo instead. As a tech artist, the width of my skill set is crucial; I’m not a master artist nor a proper coder, but I’ve got a slice of both in me. I’m most comfortable in the very intersection of art and tech; of procedure and craftsmanship. A web demo is the perfect medium to display those skills.

 
 

 
 

  • Figuring out the tiles

 
 

The core concept is this: the tiles are placed in the corners between blocks, not in the centre of the blocks. A tile is defined by the blocks that surround it: a tile touching a single block at the corner would be 1,0,0,0,0,0,0,0; a tile representing a straight wall would be 1,1,1,1,0,0,0,0. (A small indexing sketch follows below.)
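
A tiny illustration of that indexing (the bit order is an assumption; any consistent order works): each of the 8 blocks that can touch a corner contributes one bit, so a tile is identified by an 8-bit value in [0, 255].

#include <cstdint>

// occupied[i] = does block i around this corner exist?
// (e.g. the 4 blocks of the lower layer first, then the 4 of the upper layer)
uint8_t tileIndex(const bool occupied[8])
{
    uint8_t index = 0;
    for (int i = 0; i < 8; ++i)
        if (occupied[i])
            index |= uint8_t(1u << i);
    return index;          // 2^8 = 256 possible configurations
}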

 
 


 
 

Since each corner is surrounded by 8 possible blocks, each of which can be in one of 2 states (existence or non-existence), the number of possible tiles is 2^8 = 256. That is way more tiles than I wanted to model, so I wrote a script to figure out which of these tiles were truly unique and which were just rotations of other tiles. The script told me that I had to model 67 unique tiles – a much more manageable number.

 
 


 
 

I could have excluded flipped versions of other tiles as well, which would have brought the number down even further. However, I decided to keep those so that I could make some asymmetrically tiling features. The drain pipes you see in the concave corners of the building are one example of that.

 
 

 
 

  • Boolean setup in Maya

 
 

Being the tech artist that I am, I often spend more time on my workflow than on my actual work. Even accounting for rotational permutations, this project still involved a large amount of 3D meshes to manually create and keep track of. The modular nature of the project also made it important to continuously see and evaluate the models in their proper context outside of Maya. The export process had to be quick and easy and I decided to write a small python script to help me out.

【There is a huge amount of work here: even with rotations accounted for, there are many combinations, and every junction has to look right. The process has to be quick and easy, so a Python script handles it – in effect, it lets the artist keep checking that the pieces actually work in context.】

 
 

First, the script merges all my meshes into one piece. Second, a bounding box for each tile proceeds to cut out its particular slice of this merged mesh using Maya’s boolean operation. All the cutout pieces inherit the name and transform from their bounding box and are exported together as an fbx.

【Merge all the relevant meshes into one piece, then use Maya boolean operations to cut out each tile via its bounding box – which reaches into the neighbouring blocks.】

 
 

Not only did this make the export process a one-button solution, it also meant that I didn’t have to keep my Maya scene that tidy. It didn’t matter what meshes were named, how they were parented, or whether they were properly merged or not. I adapted my Maya script to allow several variations of the same tile type. My Unity script then chose randomly from that pool of variations where it existed. In the image below, you can see that some of the bounding boxes are bigger than the others. Those are for tiles that have vertices that stretch outside their allotted volume.

 
 


 
 

 
 

  • Ambient Occlusion

 
 

Lighting is crucial to convey 3D shapes and a good sense of space. Due to the technical limitations in the free version of Unity, I didn’t have access to either real time shadows or ssao – nor could I write my own, since free Unity does not allow render targets. The solution was found in the blocky nature of this project. Each block was made to represent a voxel in a 3D texture. While Unity does not allow me to draw render targets on the GPU, it does allow me to manipulate textures from script on the CPU. (This is of course much slower per pixel, but more than fast enough for my purposes.)

Simply sampling that texture in the general direction of the normal gives me a decent ambient occlusion approximation.

 
 

I tried to multiply this AO on top of my unlit color texture, but the result was too dark and boring. I decided on an approach that took advantage on my newly acquired experience in 3D textures: Instead of just making pixels darker, the AO lerps the pixel towards a 3D LUT that makes it bluer and less saturated. The result gives me a great variation in hue without too harsh a variation in value. This lighting model gave me the soft and tranquil feeling I was aiming for in this project.

 
 


 
 

 
 

  • Special Pieces

 
 


 
 

When you launch the demo, it will auto generate a random structure for you. By design, that structure does not contain any loose or suspended blocks.

 
 

I know that a seasoned tool-user will try to break the tool straight away by seeing how it might treat these type of abnormal structures. I decided to show off by making these tiles extra special, displaying features such as arcs, passages, and pillars.

 
 


 
 


 
 

 
 

  • Floating Pieces

 
 

There is nothing in my project preventing a user from creating free-floating chunks, and that’s the way I wanted to keep it. But I also wanted to show the user that I had, indeed, thought about that possibility. My solution to this was to let the freefloating chunks slowly bob up and down. This required me to create a fun little algorithm to figure out in real time which blocks were connected to the base and which weren’t:

 
 

The base blocks each get a logical distance of 0. The other blocks check if any of their neighbours have a shorter logical distance than themselves; if they do, they adopt that value and add 1 to it. Thus, if you disconnect a chunk, there is nothing grounding those blocks to the 0 of the base blocks and their logical distance quickly goes through the roof. That is when they start to bob.
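
A sketch of that relaxation, with the grid represented as adjacency lists (all names and constants here are assumptions, not the demo’s actual code):

#include <algorithm>
#include <vector>

const int INF = 1 << 20;

// dist: per-block logical distance (INF if not yet reached); isBase: grounded blocks.
// neighbours[i] holds the indices of the up-to-6 adjacent blocks of block i.
void relaxLogicalDistances(std::vector<int>& dist, const std::vector<bool>& isBase,
                           const std::vector<std::vector<int>>& neighbours)
{
    for (size_t i = 0; i < dist.size(); ++i) {
        if (isBase[i]) { dist[i] = 0; continue; }
        int best = INF;
        for (int n : neighbours[i])
            best = std::min(best, dist[n]);
        dist[i] = (best >= INF) ? INF : best + 1;   // adopt the shortest neighbour value + 1
    }
}

// Blocks whose distance keeps climbing past some threshold are treated as
// free-floating chunks and start to bob.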

 
 

The slow bobbing of the floating chunks add some nice ambient animation to the scene.

 
 


 
 

 
 

  • Art Choices

 
 

Picking a style is a fun and important part of any project. The style should highlight the features relevant to a particular project. In this project, I wanted a style that would emphasize blockiness and modularity rather than hiding it.

 
 

The clear green lines outline the terraces, the walls are plain and have lines of darker brick marking each floor, the windows are evenly spaced, and the dirt at the bottom is smooth and sedimented in straight lines. Corners are heavily beveled to emphasize that the tiles fit together seamlessly. The terraces are supposed to look like cozy secret spaces where you could enjoy a slow brunch on a quiet Sunday morning. Overall, the piece is peaceful and friendly – a homage to the tranquility of bourgeois life, if you will.

 
 

 
 

  • Animation

 
 

It should be fun and responsive to interact with the piece. I created an animated effect for adding and removing blocks. The effect is a simple combination of a vertex shader that pushes the vertices out along their normals and a pixel shader that breaks up the surface over time. A nice twist is that I was able to use the 3D texture created for the AO to constrain the vertices along the edge of the effect – this is what creates the bulge along the middle seen in the picture.

 
 


 
 


 
 

 
 

 
 

  • Conclusion

 
 

The final result is like a tool, but not quite. It’s an interactive piece of art that runs in your browser. It can be evaluated for its technical aspects, its potential as a level editor tool, its shader work, its execution and finish, or just as a fun thing to play around with. My hope is that it can appeal to developers and laymen alike. In a way, a web demo like this is simply a mischievous way to trick people into looking at your art longer than they otherwise would.

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

Managing Transformations in Hierarchy

  • Introduction

 
 

One of the most fundamental aspects of 3D engine design is management of spatial relationship between objects. The most intuitive way of handling this issue is to organize objects in a tree structure (hierarchy), where each node stores its local transformation, relative to its parent.

 
 

The most common way to define the local transformation is to use a socalled TRS system, where the transformation is composed of translation, rotation, and scale. This system is very easy to use for both programmers using the engine as well as non-technical users like level designers. In this chapter we describe the theory behind such a system.

 
 

One problem with the system is decomposition of a matrix back to TRS. It turns out that this problem is often ill-defined and no robust solution exists. We present an approximate solution that works reasonably well in the majority of cases.

 
 

  • Theory

 
 

Tree structure

Keeping objects in hierarchy is a well-known concept. Every object can have a number of children and only one parent. It can also be convenient to store and manage a list of pointers to the children so that we have fast access to them. The aforementioned structure is in fact a tree.

 
 

Node structure

We assume that a node stores its translation, rotation, and scale (TRS) that are relative to its parent. Therefore, we say these properties are local. When we move an object, we drag all its children with it. If we increase scale of the object, then all of its children will become larger too.

 
 

Example:


 
 

Transformation matrices and TRS

 
 

The relation between a single node’s transformation matrix and its TRS

Local TRS uniquely defines a local transformation matrix M. We transform vector v in the following way:
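
A reconstruction of the (missing) formula, assuming the row-vector convention the chapter otherwise implies:

$$
v' = v\,M = v\,S\,R\,T = (v\,S\,R) + \mathbf{t}
$$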


where S is an arbitrary scale matrix, R is an arbitrary rotation matrix, T is a translation matrix, and t is the translation vector that the matrix T is built from.

 
 

Composing transformation matrices up the hierarchy

To render an object, we need to obtain its global (world) transformation by composing local transformations of all the object’s ancestors up in the hierarchy.

The composition is achieved by simply multiplying local matrices. Given a vector v0, its local matrix M0, and the local matrix M1 of v0’s parent, we can find the global position v2:
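
A reconstruction of the missing composition formula (same row-vector convention as above):

$$
v_2 = v_0\,M_0\,M_1 = v_0\,(S_0 R_0 T_0)(S_1 R_1 T_1)
$$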


Using vector notation for translation, we get
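
$$
v_2 = \big((v_0\,S_0 R_0 + \mathbf{t}_0)\,S_1 R_1\big) + \mathbf{t}_1
$$

(again a reconstruction of the missing equation, with t₀ and t₁ the translation vectors of T₀ and T₁).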


Note that, in general, RS ≠ S′R′: with nonuniform scale, a rotation and a scale cannot simply be swapped, which is why the product above cannot always be folded back into a single TRS.

 
 

Skew Problem

 
 

Problem description:

Applying a nonuniform scale (coming from object A) that follows a local rotation (objects B and C) will cause objects (B and C) to be skewed. Skew can appear during matrices composition but it becomes a problem during the decomposition, as it cannot be expressed within a single TRS node. We give an approximate solution to this issue in Section 3.2.4.


Solution:

Let an object have n ancestors in the hierarchy tree. Let M_1, M_2, · · · , M_n be their local transformation matrices, let M_0 be the local transformation matrix of the considered object, and let M_i = S_i R_i T_i.

M_TRSΣ = M_0 M_1 · · · M_n

M_TRΣ = R_0 T_0 R_1 T_1 · · · R_n T_n

Rotations and translations compose cleanly, so M_TRΣ directly gives the world-space rotation and translation.

M_SΣ = M_TRSΣ (M_TRΣ)^-1

Here the skew and the scale are combined. We use the diagonal elements of M_SΣ to get the scale, and we choose to ignore the rest, which is responsible for the skew.

In other words: the scale is read off this diagonal, the off-diagonal (skew) part is discarded, and the rotation and translation are taken from M_TRΣ above – which is how the skew is avoided.

 
 

Handling a change of parent

 
 

In a 3D engine we often need to modify objects’ parent-children relationship.

When doing so, we usually want to change the local transformation such that the global transformation stays the same. Obviously, that forces us to recompute the local TRS values of the object whose parent is changing.

 
 

To get from the current local space to a new local space (parent changes, global transform stays the same), we first need to find the global transform of the object by going up in the hierarchy to the root node. Having done this we need to go down the hierarchy to which our new parent belongs.

 
 

Let M′0 be the new parent’s local transformation matrix. Let that new parent have n′ ancestors in the hierarchy tree with local transformations M′1, M′2, · · · , M′n′, where M′i = S′iR′iT′i. The new local transformation matrix can thus be found using the following formula:
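
A reconstruction of the missing formula: the new local matrix is the object’s unchanged global transform multiplied by the inverse of the new parent’s global transform,

$$
M_0^{\text{new}} = M_{TRS\Sigma}\,\big(M'_0\,M'_1 \cdots M'_{n'}\big)^{-1}
$$

where M_TRSΣ is the object’s global transformation computed as in the previous section.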


 
 


From this formula we can obtain the new local TRS.

 
 

Alternative Systems

 
 

This part is mainly about how scale is handled, which is what the skew relates to.

Approach: only leaf nodes may store a nonuniform scale (a per-axis x, y, z vector); every other node stores a single uniform scale value. This effectively avoids the skew problem and is simple to implement.

 
 

  • Implementation

 
 

Node structure:


 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

 
 

Oculus Optimizing the Unreal Engine 4 Renderer for VR

https://developer.oculus.com/blog/introducing-the-oculus-unreal-renderer/

 
 

For Farlands, the Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine. It’s also used in Dreamdeck and the Oculus Store version of Showdown. We’re sharing the renderer’s source as a sample to help developers reach higher quality levels and frame rates in their own applications. As of today, you can get it as an Unreal developer from https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr.

【The Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine and is sharing it on GitHub (link above); it is already used in Oculus titles such as Dreamdeck and the Oculus Store version of Showdown.】

 
 

Rendering immersive VR worlds at a solid 90Hz is complex and technically challenging. Creating VR content is, in many ways, unlike making traditional monitor-only content—it brings us a stunning variety of new interactions and experiences, but forces developers to re-think old assumptions and come up with new tricks. The recent wave of VR titles showcase the opportunities and ingenuity of developers.

【Rendering immersive VR worlds at a solid 90 Hz is very challenging. VR rendering is unlike traditional monitor rendering; the new interactions force developers to re-examine old assumptions – techniques that suit screen rendering are not necessarily the right choice for VR, so some choices need to be reconsidered and compared.】

 
 

As we worked, we re-evaluated some of the traditional assumptions made for VR rendering, and developed technology to help us deliver high-fidelity content at 90Hz. Now, we’re sharing some results: an experimental forward renderer for Unreal Engine 4.11.

【Their work was to re-evaluate the value of these established techniques for VR; the experimental results are shared below.】

 
 

We’ve developed the Oculus Unreal Renderer with the specific constraints of VR rendering in mind. It lets us more easily create high-fidelity, high-performance experiences, and we’re eager to share it with all UE4 developers.

【They developed a dedicated renderer for VR content that delivers better performance – see GitHub.】

 
 

Background

 
 

As the team began production on Farlands, we took a moment to reflect on what we learned with the demo experiences we showed at Oculus Connect, GDC, CES, and other events. We used Unreal Engine 4 exclusively to create this content, which provided us with an incredible editing environment and a wealth of advanced rendering features.

【The team built Farlands with Unreal Engine 4 and has already shown the related content at the major conferences, so no further introduction is given here.】

 
 

Unfortunately, the reality of rendering to Rift meant we’d only been able to use a subset of these features. We wanted to examine those we used most often, and see if we could design a stripped-down renderer that would deliver higher performance and greater visual fidelity, all while allowing the team to continue using UE4’s world-class editor and engine. While the Oculus Unreal Renderer is focused on the use cases of Oculus applications, it’s been retrofit into pre-existing projects (including Showdown and Oculus Dreamdeck) without needing major content work. In these cases, it delivered clearer visuals, and freed up enough GPU headroom to enable additional features or increase resolution 15-30%.

【UE4 is great, but for VR applications there is still room for targeted renderer optimizations that improve both performance and image quality.】

 
 


Comparison at high resolution: The Oculus Unreal Renderer runs at 90fps while Unreal’s default deferred renderer is under 60fps.

【At high resolution the Oculus forward renderer stays at 90 fps while Unreal’s default deferred renderer drops below 60 fps.】

 
 

The Trouble With Deferred VR

 
 

【The background for this part – forward vs. deferred rendering – is covered in the ‘Base’ section of these notes.】

 
 

Unreal Engine is known for its advanced rendering feature set and fidelity. So, what was our rationale for changing it for VR? It mostly came down our experiences building VR content, and the differences rendering to a monitor vs Rift.

【UE contains a large feature set; the task is to pick the parts that suit VR rendering.】

 
 

When examining the demos we’d created for Rift, we found most shaders were fairly simple and relied mainly on detailed textures with few lookups and a small amount of arithmetic. When coupled with a deferred renderer, this meant our GBuffer passes were heavily texture-bound—we read from a large number of textures, wrote out to GBuffers, and didn’t do much in between.

【At the high resolutions VR requires, a deferred renderer makes the G-buffer passes heavily texture- and bandwidth-bound.】

 
 

We also used dynamic lighting and shadows sparingly and leaned more heavily on precomputed lighting. In practice, switching to a renderer helped us provide a more limited set of features in a single pass, yielded better GPU utilization, enabled optimization, removed bandwidth overhead, and made it easier for us to hit 90 Hz.

【They use dynamic lighting and shadows sparingly and lean on precomputed lighting instead. In practice, switching renderers limited the feature set to a single pass, improved GPU utilization, enabled optimizations, removed bandwidth overhead, and made it easier to hit 90 Hz.】

 
 

We also wanted to compare hardware accelerated multi-sample anti-aliasing (MSAA) with Unreal’s temporal antialiasing (TAA). TAA works extremely well in monitor-only rendering and is a very good match for deferred rendering, but it causes noticeable artifacts in VR. In particular, it can cause judder and geometric aliasing during head motion. To be clear, this was made worse by some of our own shader and vertex animation tricks. But it’s mostly due to the way VR headsets function.

【They also wanted to compare hardware-accelerated MSAA against Unreal’s TAA.】

【TAA works very well for monitor rendering and pairs nicely with deferred shading, but in VR it causes noticeable artifacts – judder and geometric aliasing during head motion.】

 
 

Compared to a monitor, each Rift pixel covers a larger part of the viewer’s field of view. A typical monitor has over 10 times more pixels per solid angle than a VR headset. Images provided to the Oculus SDK also pass through an additional layer of resampling to compensate for the effects of the headset’s optics. This extra filtering tends to slightly over-smooth the image.

【Compared with a monitor, each Rift pixel covers a much larger part of the field of view, and the Oculus SDK resamples the image again to compensate for the headset optics, which slightly over-smooths it.】

 
 

All these factors together contribute to our desire to preserve as much image detail as possible when rendering. We found MSAA to produce sharper, more detailed images that we preferred.

【All of this argues for preserving as much image detail as possible; MSAA produces the sharper, more detailed image they preferred.】

 
 


Deferred compared with forward. Zoom in to compare.

 
 

A Better Fit With Forward

 
 

Current state-of-the-art rendering often leverages screen-space effects, such as screen-space ambient occlusion (SSAO) and screen-space reflections (SSR). Each of these are well known for their realistic and high-quality visual impact, but they make tradeoffs that aren’t ideal in VR. Operating purely in screen-space can introduce incorrect stereo disparities (differences in the images shown to each eye), which some find uncomfortable. Along with the cost of rendering these effects, this made us more comfortable forgoing support of those features in our use case.

【Current renderers lean on screen-space effects such as SSAO and SSR, but these cannot be used as-is for VR: purely screen-space effects can introduce incorrect stereo disparities.】

 
 

Our decision to implement a forward renderer took all these considerations into account. Critically, forward rendering lets us use MSAA for anti-aliasing, adds arithmetic to our texture-heavy shaders (and removes GBuffer writes), removes expensive full-screen passes that can interfere with asynchronous timewarp, and—in general—gives us a moderate speedup over the more featureful deferred renderer. Switching to a forward renderer has also allowed the easy addition of monoscopic background rendering, which can provide a substantial performance boost for titles with large, complex distant geometry. However, these advantages come with tradeoffs that aren’t right for everyone. Our aim is to share our learnings with VR developers as they continue fighting to make world-class content run at 90Hz.

【They chose a forward renderer designed around these constraints: MSAA, texture-heavy shaders with a bit more ALU, no expensive full-screen passes (which can interfere with asynchronous timewarp), plus monoscopic background rendering – the distant background is rendered once and shared by both eyes, as in the Oculus SDK.】

 
 

Our implementation is based on Ola Olsson’s 2012 HPG paper, Clustered Deferred and Forward Shading. Readers familiar with traditional forward rendering may be concerned about the CPU and GPU overhead of dynamic lights when using such a renderer. Luckily, modern approaches to forward lighting do not require additional draw calls: All geometry and lights are rendered in a single pass (with an optional z-prepass). This is made possible by using a compute shader to pre-calculate which lights influence 3D “clusters” of the scene (subdivisions of each eye’s viewing frustum, yielding a frustum-voxel grid). Using this data, each pixel can cheaply determine a list of lights that has high screen-space coherence, and perform a lighting loop that leverages the efficient branching capability of modern GPUs. This provides accurate culling and efficiently handles smaller numbers of dynamic lights, without the overhead of additional draw calls and render passes.

【This is essentially forward+ (clustered forward shading); see the 2012 paper for details. The idea: a compute pre-pass determines which lights affect each 3D cluster of the view frustum, so each pixel only loops over the short light list for its cluster – i.e. light culling – without extra draw calls or render passes.】
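
A toy sketch of that idea (not Oculus’ or Olsson’s actual code): a compute pre-pass is assumed to have already filled the per-cluster light lists, and the per-pixel loop only walks the short list for its cluster.

#include <cstdint>
#include <vector>

struct Light { float posX, posY, posZ, radius, r, g, b; };

struct ClusterGrid {
    int dimX, dimY, dimZ;                            // e.g. subdivisions of each eye's frustum
    std::vector<std::vector<uint16_t>> lightLists;   // per-cluster culled light indices
};

// Shade one pixel: its cluster is found from screen position and view depth,
// then only the lights culled into that cluster are accumulated.
void shadePixel(const ClusterGrid& grid, const std::vector<Light>& lights,
                int clusterX, int clusterY, int clusterZ,
                float& outR, float& outG, float& outB)
{
    int c = (clusterZ * grid.dimY + clusterY) * grid.dimX + clusterX;
    for (uint16_t li : grid.lightLists[c]) {
        const Light& L = lights[li];
        // ...evaluate attenuation/BRDF here; this sketch just accumulates colour
        outR += L.r; outG += L.g; outB += L.b;
    }
}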

 
 


(Visualization of 3D light grid, illustrating the lighting coherence and culling)

 
 

Beyond the renderer, we’ve modified UE4 to allow for additional GPU and CPU optimizations. The renderer is provided as an unmaintained sample and not an officially-supported SDK, but we’re excited to give projects using Unreal Engine’s world-class engine and editor additional options for rendering their VR worlds.

【They published a UE4 branch with these changes for anyone to try.】

 
 

You can grab it today from our Github repository as an Unreal Developer at https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr. To see it in action, try out Farlands, Dreamdeck, and Showdown.

 
 

 
 

 
 

 
 

 
 

 
 

The Vanishing of Milliseconds: Optimizing the UE4 renderer for Ethan Carter VR

Original article:

https://medium.com/@TheIneQuation/the-vanishing-of-milliseconds-dfe7572d9856#.auamge3rg

 
 


 
 

As a game with very rich visuals, The Vanishing of Ethan Carter (available for the Oculus Rift and Steam VR) has been a difficult case for hitting the VR performance targets. The fact that its graphics workload is somewhat uncommon for Unreal Engine 4 (and, specifically, largely dissimilar to existing UE4 VR demos) did not help. I have described the reasons for that at length in a previous post; the gist of it, however, is that The Vanishing of Ethan Carter’s game world is statically lit in some 95% of areas, with dynamic lights appearing only in small, contained indoor spaces.

【The Vanishing of Ethan Carter, a visually rich VR title, had a very hard time hitting its performance targets. Its rendering workload also differs a lot from the usual UE4 VR pipeline: it relies almost entirely on static lighting, with dynamic lights only in a few small places.】

 
 

Important note: Our (The Astronauts’) work significantly pre-dates Oculus VR’s UE4 renderer. If we had it at our disposal back then, I would probably not have much to do for this port; but as it were, we were on our own. That said, I highly recommend the aforementioned article and code, especially if your game does not match our rendering scenario, and/or if the tricks we used simply do not work for you.

【This work predates the Oculus UE4 renderer, so the team spent a lot of time building their own; the author highly recommends the Oculus article and code above – well worth a read.】

 
 

Although the studied case is a VR title, the optimizations presented are mostly concerned with general rendering and may be successfully applied to other titles; however, they are closely tied to the UE4 and may not translate well to other game engines.

【Although the optimizations were kept as general as possible, optimization is inherently specific: much of it is tied to the actual usage scenario and to UE4.】

 
 

There are Github links in the article. Getting a 404 error does not mean the link is dead — you need to have your Unreal Engine and Github accounts connected  to see UE4 commits.

【Connect your GitHub account to your Unreal Engine account to open the UE4 commit links in this article.】

 
 

 
 

Show me the numbers

 
 

To whet the reader’s appetite, let us compare the graphics profile and timings of a typical frame in the PS4/Redux version to a corresponding one from the state of the VR code on my last day of work at The Astronauts:

 
 

【First, compare an ordinary frame with the optimized results.】

 
 


GPU profiles from the PS4/Redux and VR versions, side by side. Spacing has been added to have the corresponding data line up. Detailed textual log data available as Gists: PS4/Redux and VR version.

 
 


Timing graphs displayed with the STAT UNITGRAPH command, side by side.

 
 

Both profiles were captured using the UE4Editor -game -emulatestereo command line in a Development configuration, on a system with an NVIDIA GTX 770 GPU, at default game quality settings and 1920×1080 resolution (960×1080 per eye). Gameplay code was switched off using the PAUSE console command to avoid it affecting the readouts, since it is out of the scope of this article.

【The hardware and settings the numbers above depend on.】

 
 

As you can (hopefully) tell, the difference is pretty dramatic. While a large part of it has been due to code improvements, I must also honour the art team at The Astronauts — Adam Brya, Michał Kosieradzki, Andrew Poznański, and Kamil Wojciekiewicz have all made a brilliant job of optimizing the game assets!

【The result owes a lot to the art side as well as the code.】

 
 

This dead-simple optimization algorithm that I followed set a theme for the couple of months following the release of Ethan Carter PS4, and became the words to live by:

 
 

  1. Profile a scene from the game.
  2. Identify expensive render passes.
  3. If the feature is not essential for the game, switch it off.
  4. Otherwise, if we can afford the loss in quality, turn its setting down.

 
 

【The optimization loop: profile a scene, identify the expensive render passes, switch off features the game does not need, and if the quality loss is acceptable, turn down or drop the ones it does need.】

 
 

 
 

Hitting the road to VR

 
 

The beginnings of the VR port were humble. I decided to start the transition from the PS4/Redux version with making it easier to test our game in VR mode. As is probably the case with most developers, we did not initially have enough HMDs for everyone in the office, and plugging them in and out all the time was annoying. Thus, I concluded we needed a way to emulate one.

【The early VR tooling was rough and there were not enough headsets for everyone, which makes testing a VR game painful, so the first thing needed was a usable way to emulate one.】

 
 

Turns out that UE4 already has a handy -emulatestereo command line switch. While it works perfectly in game mode, it did not enable that Play in VR button in the editor. I hacked up the FInternalPlayWorldCommandCallbacks::PlayInVR_*() methods to also test for the presence of FFakeStereoRenderingDevice in GEngine->StereoRenderingDevice, apart from just GEngine->HMDDevice. Now, while this does not accurately emulate the rendering workload of a VR HMD, we could at least get a rough, quick feel for stereo rendering performance from within the editor, without running around with a tangle of wires and connectors. And it turned out to be good enough for the most part.

【UE4 already has -emulatestereo, but it did not enable the Play in VR button in the editor. The author patched the FInternalPlayWorldCommandCallbacks::PlayInVR_*() methods to also check for FFakeStereoRenderingDevice, so stereo rendering could be roughly tested in the editor without a headset.】

 
 

While trying it out, Andrew, our lead artist, noticed that game tick time is heavily impacted by having miscellaneous editor windows open. This is most probably the overhead from the editor running a whole lot of Slate UI code. Minimizing all the windows apart from the main frame, and setting the main level editor viewport to immersive mode seemed to alleviate the problem, so I automated the process and added a flag for it to ULevelEditorPlaySettings. And so, the artists could now toggle it from the Editor Preferences window at their own leisure.

【The lead artist noticed that having miscellaneous editor windows open heavily impacts game tick time (Slate UI overhead). Minimizing everything but the main frame and making the level viewport immersive alleviates it, so this was automated behind a flag in ULevelEditorPlaySettings, toggleable from Editor Preferences.】

 
 

These changes, as well as several of the others described in this article, may be viewed in my fork of Unreal Engine on Github (reminder: you need to have your Unreal Engine and Github accounts connected to see UE4 commits).

 
 

 
 

Killing superfluous renderer features

 
 

Digging for information on UE4 in VR, I discovered that Nick Whiting and Nick Donaldson from Epic Games have delivered an interesting presentation at Oculus Connect, which you can see below.

【For UE4-in-VR specifics, the talk below is worth watching.】

 
 

https://www.youtube.com/watch?v=0oM6Xe7fT-8

 
 

Around the 37 minute mark is a slide which in my opinion should not have been a “bonus”, as it contains somewhat weighty information. It made me realize that, by default, Unreal’s renderer does a whole bunch of things which are absolutely unnecessary for our game. I had been intellectually aware of it beforehand, but the profoundness of it was lost on me until that point. Here is the slide in question:

【This slide made the author realize that, by default, UE4’s renderer does a lot of work the game simply does not need; around the 37-minute mark it gives a good starting set of VR-oriented console variables, and the trade-offs have to be weighed per project.】

 
 


 
 

I recommend going over every one of the above console variables in the engine source and seeing which of their values makes most sense in the context of your project. From my experience, their help descriptions are not always accurate or up to date, and they may have hidden side effects. There are also several others that I have found useful and will discuss later on.

【It is strongly recommended to look at each of these console variables in the engine source and see which value makes sense for your project – someone else’s settings are not necessarily right for yours.】

 
 

It was the first pass of optimization, and resulted in the following settings — an excerpt from our DefaultEngine.ini:

【The settings we used, as an excerpt from DefaultEngine.ini:】

 
 

[SystemSettings]

r.TranslucentLightingVolume=0

r.FinishCurrentFrame=0

r.CustomDepth=0

r.HZBOcclusion=0

r.LightShaftDownSampleFactor=4

r.OcclusionQueryLocation=1

[/Script/Engine.RendererSettings]

r.DefaultFeature.AmbientOcclusion=False

r.DefaultFeature.AmbientOcclusionStaticFraction=False

r.EarlyZPass=1

r.EarlyZPassMovable=True

r.BasePassOutputsVelocity=False

 
 

The fastest code is that which does not run

May I remind you that Ethan Carter is a statically lit game; this is why we could get rid of translucent lighting volumes and ambient occlusion (right with its static fraction), as these effects were not adding value to the game. We could also disable the custom depth pass for similar reasons.

【Reminder: Ethan Carter is statically lit, which is why translucent lighting volumes, ambient occlusion (and its static fraction), and the custom depth pass could all be dropped.】

 
 

Trade-offs

On most other occasions, though, the variable value was a result of much trial and error, weighing a feature’s visual impact against performance.

【In other situations each of these values has to be weighed carefully.】

 
 

One such setting is r.FinishCurrentFrame, which, when enabled, effectively creates a CPU/GPU sync point right after dispatching a rendering frame, instead of allowing to queue multiple GPU frames. This contributes to improving motion-to-photon latency at the cost of performance, and seems to have originally been recommended by Epic (see the slide above), but they have backed out of it since (reminder: you need to have your Unreal Engine and Github accounts connected to see UE4 commits). We have disabled it for Ethan Carter VR.

【r.FinishCurrentFrame, when enabled, creates a CPU/GPU sync point right after dispatching each rendering frame instead of letting several GPU frames queue up. It improves motion-to-photon latency at a performance cost; Epic originally recommended it but later backed out, and it is disabled for Ethan Carter VR.】

 
 

The variable r.HZBOcclusion controls the occlusion culling algorithm. Not surprisingly, we have found the simpler, occlusion query-based solution to be more efficient, despite it always being one frame late and displaying mild popping artifacts. So do others.

【r.HZBOcclusion selects the occlusion culling algorithm; the simpler occlusion-query-based path turned out to be more efficient, despite being a frame late and showing mild popping artifacts.】

 
 

Related to that is the r.OcclusionQueryLocation variable, which controls the point in the rendering pipeline at which occlusion queries are dispatched. It allows balancing between more accurate occlusion results (the depth buffer to test against is more complete after the base pass) against CPU stalling (the later the queries are dispatched, the higher the chance of having to wait for query results on the next frame). Ethan Carter VR’s rendering workload was initially CPU-bound (we were observing randomly occurring stalls several milliseconds long), so moving occlusion queries to before base pass was a net performance gain for us, despite slightly increasing the total draw call count (somewhere in the 10–40% region, for our workload).

【r.OcclusionQueryLocation controls where in the rendering pipeline occlusion queries are dispatched. It trades occlusion accuracy (the depth buffer is more complete after the base pass) against CPU stalls (the later the queries go out, the more likely the CPU waits for their results next frame). Our rendering was initially CPU-bound, with random multi-millisecond stalls, so dispatching the queries before the base pass was a net win for us even though it raised the total draw call count by roughly 10-40%.】

 
 


Left eye taking up more than twice the time? That is not normal.

 
 

Have you noticed, in our pre-VR profile data, that the early Z pass takes a disproportionately large amount of time for one eye, compared to the other? This is a tell-tale sign that your game is suffering from inter-frame dependency stalls, and moving occlusion queries around might help you.

【Notice how, in the pre-VR profile above, the early Z pass takes far longer for one eye than for the other. That is a tell-tale sign of inter-frame dependency stalls, and moving the occlusion queries around can help.】

 
 

For the above trick to work, you need r.EarlyZPass enabled. The variable has several different settings (see the code for details); while we shipped the PS4 port with a full Z prepass (r.EarlyZPass=2) in order to have D-buffer decals working, the VR edition makes use of just opaque (and non-masked) occluders (r.EarlyZPass=1), in order to conserve computing power. The rationale was that while we end up issuing more draw calls in the base pass, and pay a bit more penalty for overshading due to the simpler Z buffer, the thinner prepass would make it a net win.

【r.EarlyZPass must be enabled for the trick above to work. We shipped the PS4 port with a full Z prepass (=2) so that D-buffer decals work; the VR edition uses only opaque, non-masked occluders (=1) to save GPU time. The prepass is thinner, at the cost of a few more base-pass draw calls and a bit of extra overshading, which was a net win.】

 
 

We have also settled on bumping r.LightShaftDownSampleFactor even further up, from the default of 2 to 4. This means that our light shaft masks’ resolution is just a quarter of the main render target. Light shafts are very blurry this way, but it did not really hurt the look of the game.

【We raised r.LightShaftDownSampleFactor from the default of 2 to 4, so the light shaft masks are rendered at a quarter of the main render target resolution. The shafts get blurrier, but it did not hurt the look of the game.】

 
 

Finally, I settled on disabling the “new” (at the time) UE 4.8 feature of r.BasePassOutputsVelocity. Comparing its performance against Rolando Caloca’s hack of injecting meshes that utilize world position offset into the velocity pass with previous frame’s timings (which I had previously integrated for the PS4 port to have proper motion blur and anti-aliasing of foliage), I found it simply outperformed the new solution in our workload.

【We also disabled r.BasePassOutputsVelocity, the then-new UE 4.8 feature. The older hack of injecting world-position-offset meshes into the velocity pass (already integrated for the PS4 port to get correct motion blur and anti-aliasing on foliage) simply outperformed it in our workload.】

 
 

 
 

Experiments with shared visibility

 
 

If you are not interested in failures, feel free to skip to the next section (Stereo instancing…).

 
 

Several paragraphs earlier I mentioned stalls in the early Z prepass. You may have also noticed in the profile above that our draw time (i.e. time spent in the render thread) was several milliseconds long. It was a case of a Heisenbug: it never showed up in any external profilers, and I think it has to do with all of them focusing on isolated frames, and not sequences thereof, where inter-frame dependencies rear their heads.

【Note in the profile above that our draw time (time spent on the render thread) was several milliseconds. It was a Heisenbug: it never showed up in external profilers, probably because they look at isolated frames rather than sequences of frames, where inter-frame dependencies show up.】

 
 

Anyway, while I am still not convinced that the suspicious prepass GPU timings and CPU draw timings were connected, I took to the conventional wisdom that games are usually CPU-bound when it comes to rendering. Which is why I took a look at the statistics that UE4 collects and displays, searching for something that could help me deconstruct the draw time. This is the output of STAT INITVIEWS, which shows details of visibility culling performance:

【I was not sure the prepass GPU timings and the CPU draw timings were related, but following the usual wisdom that rendering is CPU-bound, I dug into UE4's built-in stats to break the draw time down; the STAT INITVIEWS output is shown below.】

 
 


Output of STAT INITVIEWS in the PS4/Redux version.

 
 

Whoa, almost 5 ms spent on frustum and occlusion culling! That call count of 2 was quite suggestive: perhaps I could halve this time by sharing the visible object set data between eyes?

【Almost 5 ms on frustum and occlusion culling, and the call count of 2 suggests we might halve it by sharing the visible-set data between the two eyes.】

 
 

To this end, I made several experiments. Some plumbing was required to get the engine not to run the view relevance code for the secondary eye and to use the primary eye's data instead. I added a debug frustum to the FREEZERENDERING command to aid in debugging culling with a joint frustum for both eyes, and improved the DrawDebugFrustum() code to better handle the inverse-Z projection matrices that UE4 uses, as well as to allow a plane set to be the data source. Getting one frustum culling pass to work for both eyes was fairly easy.

【After several experiments, getting a single frustum culling pass to serve both eyes turned out to be feasible.】

 
 

But occlusion culling was not.

【But occlusion culling was not.】

 
 

For performance reasons mentioned previously, we were stuck with the occlusion query-based mechanism (UE4 runs a variant of the original technique). It requires an existing, pre-populated depth buffer to test against. If the buffer does not match the frustum, objects will be incorrectly culled, especially at the edges of the viewport.

【For the performance reasons above we were stuck with the occlusion-query mechanism, which needs an existing, pre-populated depth buffer to test against. If that buffer does not match the frustum, objects get culled incorrectly, especially at the viewport edges.】

 
 

There seemed to be no way to generate a depth buffer that could approximate the depth buffer for a “joint eye”, short of running an additional depth rendering pass, which was definitely not an option. So I scrapped the idea.

【There was no cheap way to generate an approximate "joint eye" depth buffer short of an extra depth pass, so the idea was scrapped.】

 
 

Many months and a bit more experience later, I know now that I could have tried reconstructing the “joint eye” depth buffer via reprojection, possibly weighing in the contributions of eyes according to direction of head movement, or laterality; but it’s all over but the shouting now.

【With more experience, I now think the "joint eye" depth buffer could be reconstructed via reprojection, perhaps weighting each eye's contribution by head movement direction or eye laterality, but that ship has sailed.】

 
 

And at some point, some other optimization — and I must admit I never really cared to find out which one, I just welcomed it — made the problem go away as a side effect, and so it became a moot point:

【At some point another optimization (I admit I never bothered to find out which one, I just welcomed it) made the problem go away as a side effect.】

 
 


Output of STAT INITVIEWS in the VR version.

 
 

 
 

Stereo instancing: not a silver bullet

 
 

Epic have developed the feature of instanced stereo rendering for UE 4.11. We had pre-release access to this code courtesy of Epic and we had been looking forward to testing it out very eagerly.

【Epic developed instanced stereo rendering for UE 4.11, and we had pre-release access to it; we were very eager to try it.】

 
 

It turned out to be a disappointment, though.

【It turned out to be a disappointment, though.】

 
 

First off, the feature was tailored quite specifically to the Bullet Train UE4 VR demo.

【First off, the feature was tailored quite specifically to the Bullet Train demo.】

 
 

https://www.youtube.com/watch?v=DmaxmnPzMWE

 
 

Note that this demo uses dynamic lighting and has zero instanced foliage in it. Our game was quite the opposite. And the instanced foliage would not draw in the right eye. It was not a serious bug; evidently, Epic focused just on the features they needed for the demo, which is perfectly understandable, and the fix was easy.

【Note that the demo uses dynamic lighting and has zero instanced foliage, which is the opposite of our game; instanced foliage would not even draw in the right eye. Not a serious bug: Epic understandably focused on what the demo needed, and the fix was easy.】

 
 

But the worst part was that it actually degraded performance. I do not have that code lying around anymore to make any fresh benchmarks, but from my correspondence with Ryan Vance, the programmer at Epic who prepared a code patch for us (kudos to him for the initiative!):

【Worse, it actually degraded performance. The build is no longer around to re-benchmark, but the numbers below are from the correspondence with Ryan Vance, the Epic programmer who prepared the patch for us.】

 
 

Comparing against a pre-change build reveals a considerable perf hit: on foliage-less scenes (where we’ve already been GPU-bound) we experience a ~0.7 ms gain on the draw thread, but a ~0.5 ms loss on the GPU.

【Compared to a pre-change build, foliage-less scenes (already GPU-bound) gained ~0.7 ms on the draw thread but lost ~0.5 ms on the GPU.】

 
 

Foliage makes everything much, much worse, however (even after fixing it). Stat unit shows a ~1 ms GPU loss with vr.InstancedStereo=0 against baseline, and ~5 ms with vr.InstancedStereo=1!

【Foliage makes it much worse: versus baseline, STAT UNIT showed a ~1 ms GPU loss with vr.InstancedStereo=0 and a ~5 ms loss with vr.InstancedStereo=1.】

 
 

Other UE4 VR developers I have spoken to about this seem to concur. There is also a thread at the Unreal forums with similar complaints. As Ryan points out, this is a CPU optimization, which means trading GPU time for CPU time. I scrapped the feature for Ethan Carter VR — we were already GPU-bound for most of the game by that point.

【Other UE4 VR developers agreed, and there is a forum thread with similar complaints. As Ryan points out, it is a CPU-side optimization: it spends GPU time to save CPU time. We dropped it for Ethan Carter VR, since we were already GPU-bound in most of the game.】

 
 

 
 

The all-seeing eyes

 
 


The problematic opening scene.

 
 

At a point about two-thirds into the development, we had started to benchmark the game regularly, and I was horrified to find that the very opening scene of the game, just after exiting the tunnel, was suffering from poor performance. You could just stand there, looking forward and doing nothing, and we would stay pretty far from VR performance targets. Look away, or take several steps forward, and we were back under budget.

【About two-thirds into development we started benchmarking regularly, and the opening scene right after the tunnel was far over the VR budget even when standing still and looking straight ahead; looking away or walking a few steps forward brought us back under budget.】

 
 

A short investigation using the STAT SCENERENDERING command showed us that primitive counts were quite high (in the 4,000–6,000 region). A quick look around using the FREEZERENDERING command did not turn up any obvious hotspots, though, so I took to the VIS command. The contents of the Z-buffer after pre-pass (but before the base pass!) explained everything.

【STAT SCENERENDERING showed primitive counts of about 4,000-6,000, and a look around with FREEZERENDERING turned up no obvious hotspot, so I used the VIS command; the Z-buffer contents after the prepass (but before the base pass) explained everything.】

 
 


Note the missing ground in the foreground, in the bottom-left visualizer panel.

 
 

At the beginning of the game, the player emerges from a tunnel. This tunnel consists of the wall mesh and a landscape component (i.e. terrain tile) that has a hole in it, which resulted in the entire component (tile) being excluded from the early Z-pass, allowing distant primitives (e.g. from the other side of the lake!) to be visible “through” large swaths of the ground. This was also true of components with traps in them, which are also visible in this scene.

【At the start of the game the player emerges from a tunnel built from a wall mesh plus a landscape component (terrain tile) with a hole in it. The hole caused the entire component to be excluded from the early Z pass, so distant primitives (even ones across the lake) were "visible" through large areas of the ground; the same applied to components containing traps.】

 
 

I simply special-cased landscape components to be rendered as occluders even when they use masked materials (reminder: you need to have your Unreal Engine and Github accounts connected to see UE4 commits). This cut us from several thousand down to a couple hundred draw calls in that scene, depending on the exact camera location.

【Special-casing landscape components so that they still render as occluders even with masked materials brought the draw call count in that scene back down to a sane level.】

 
 

 
 

Fog so thick one might have spread it on bread

 
 

Still not happy with the draw call count, I took to RenderDoc. It has the awesome render overlay feature that helps you quickly identify some frequent problems. In this case, I started clicking through occlusion query dispatch events in the frame tree with the depth test overlay enabled, and a pattern began to emerge.

【Still unhappy with the draw call count, I turned to RenderDoc; its render overlay feature quickly surfaces common problems, and stepping through the occlusion query dispatches with the depth test overlay revealed a pattern.】

 
 


RenderDoc’s depth test overlay. An occlusion query dispatched for an extremely distant, large (about 5,000 x 700 x 400 units) object, showing a positive result (1 pixel is visible).

 
 

Since UE4 dispatches bounding boxes of meshes for occlusion queries, making it somewhat coarse and conservative (i.e. subject to false positives), we were having large meshes pass frustum culling tests, and then occlusion, by having just 1 or 2 pixels of the bounding box visible through thick foliage. Skipping through to the actual meshes in the base pass would reveal all of their pixels failing the depth test anyway.

【UE4 issues occlusion queries against mesh bounding boxes, which is coarse and conservative (prone to false positives): large meshes passed frustum culling and then occlusion with only one or two bounding-box pixels peeking through the thick foliage, and then failed every per-pixel depth test in the base pass anyway.】

 
 


RenderDoc’s depth test overlay in UE4’s base pass. A mesh of decent size (~30k vertices, 50 x 50 x 30 bounding box), distant enough to occupy just 3 pixels (L-shaped formation in the centre). Successful in coarse occlusion testing, but failing the per-pixel depth tests.

 
 

Of course, every now and then, a single pixel would show through the foliage. But even then, I could not help noticing that it would be almost completely washed out by the thick fog that encompasses the forest at the beginning of the game!

【Of course, every now and then a single pixel would show through the foliage, but even then it was almost completely washed out by the thick fog covering the forest at the start of the game.】

 
 

This gave me the idea: why not add another plane to the culling frustum, at the distance where fog opacity approaches 100%?

【Which gave me the idea: why not add another culling plane at the distance where fog opacity approaches 100%?】

 
 

Solving the fog equation for the distance and adding the far cull plane shaved another several hundred draw calls. We had the draw call counts back under control and in line with the rest of the game.

【Solving the fog equation for that distance and adding a far cull plane shaved off another several hundred draw calls; a sketch of the idea follows.】
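As a rough illustration of the idea (not the actual Ethan Carter code), assume a plain exponential distance fog of the form fog = 1 - exp(-density * d); UE4's exponential height fog adds a height falloff term which is ignored here. The far cull plane then falls out of solving that equation for d:

#include <cmath>

// Distance beyond which fog opacity exceeds maxOpacity (e.g. 0.99),
// for a simple exponential distance fog: fog(d) = 1 - exp(-density * d).
// Solving 1 - exp(-density * d) >= maxOpacity for d gives:
float ComputeFogCullDistance(float density, float maxOpacity)
{
    // With no fog (or a nonsensical threshold) there is nothing to cull against.
    if (density <= 0.0f || maxOpacity <= 0.0f || maxOpacity >= 1.0f)
        return INFINITY;
    return -std::log(1.0f - maxOpacity) / density;
}

// The result is pushed into the culling volume as an extra far plane, so anything
// fully swallowed by fog is rejected before occlusion queries are even issued.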

 
 

 
 

Insane LODs

 
 

At some point late in development, AMD’s Matthäus G. Chajdas was having a look at a build of the game and remarked that we are using way too highly tessellated trees in the aforementioned opening scene. He was right: looking up the asset in the editor had revealed that screen sizes of LODs 1+ were set to ridiculous amounts in the single-digit percentage region. In other words, the lower LODs would practically never kick in.

【Late in development, AMD's Matthäus G. Chajdas looked at a build and pointed out that the trees in the opening scene were far too highly tessellated. He was right: the LOD 1+ screen sizes on the asset were set to ridiculous single-digit percentages, so the lower LODs practically never kicked in.】

 
 

When asked why, the artists responded that when using the same mesh asset for hand-planted and instanced foliage, they had the LODs kick in at different distances, and so they used a “compromise” value to compensate.

 
 

Needless to say, I absolutely hate it when artists try to clumsily work around such evident bugs instead of reporting them. I whipped up a test scene, confirmed the bug and started investigating, and it became apparent that instanced foliage does not take instance scaling into account when computing the LOD factors (moreover, it is not even really technically feasible without a major redecoration, since the LOD factor is calculated per foliage type per entire cluster). As a result, all instanced foliage was culled as if it had a scale of 1.0, which usually was not the case for us.

 
 

Fortunately, the scale does not vary much within clusters. Taking advantage of this property, I put together some code for averaging the scale over entire instance clusters, and used that in LOD factor calculations. Far from ideal, but as long as scale variance within the cluster is low, it will work. Problem solved.

 
 

【In short: instanced foliage ignored instance scale when computing LOD factors, and the artists had papered over it with compromise LOD distances. The fix was to average the scale over each instance cluster and use that in the LOD factor calculation, which works as long as the scale variance within a cluster stays low; a rough sketch of the idea follows.】
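A minimal sketch of that workaround, written against made-up types rather than the real UE4 foliage structures (FoliageInstance, MeshBoundsRadius and the screen-size proxy below are illustrative assumptions):

#include <algorithm>
#include <vector>

struct FoliageInstance { float Scale; }; // simplified: one uniform scale per instance

// Average the scale once per cluster instead of per instance; valid as long as
// the scale variance inside the cluster stays low.
float AverageClusterScale(const std::vector<FoliageInstance>& Cluster)
{
    if (Cluster.empty())
        return 1.0f;
    float Sum = 0.0f;
    for (const FoliageInstance& Inst : Cluster)
        Sum += Inst.Scale;
    return Sum / static_cast<float>(Cluster.size());
}

// LOD selection then uses the scaled bounds radius instead of the unscaled one
// when estimating the cluster's projected screen size.
float ScaledLODScreenSize(float MeshBoundsRadius, float DistanceToCamera,
                          const std::vector<FoliageInstance>& Cluster)
{
    const float Radius = MeshBoundsRadius * AverageClusterScale(Cluster);
    return Radius / std::max(DistanceToCamera, 1.0f); // crude screen-size proxy
}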

 
 

 
 

The money shot

 
 

But the most important optimization, the one which I believe put the entire endeavour in the realm of possibility, was the runtime toggling of G-buffers. I must again give Matthäus G. Chajdas credit for suggesting this one; seeing a GPU profile of the game prompted him to ask if we could maybe reduce our G-buffer pixel format to reduce bandwidth saturation. I slapped my forehead, hard. ‘Why, of course, we could actually get rid of all of them!’

【The most important optimization was runtime toggling of the G-buffers: cutting G-buffer traffic reduces bandwidth pressure and wins a lot of GPU time.】

 
 

At this point I must remind you again that Ethan Carter has almost all of its lighting baked and stowed away in lightmap textures. This is probably not true for most UE4 titles.

【Again, remember that Ethan Carter has almost all of its lighting baked into lightmaps, which is probably not true of most UE4 titles.】

 
 

Unreal already has a console variable for that called r.GBuffer, only it requires a restart of the engine and a recompilation of base pass shaders for changes to take effect. I have extended the variable to be an enumeration, assigning the value of 2 to automatic runtime control.

【Unreal already has an r.GBuffer console variable, but changing it normally requires an engine restart and a recompile of the base pass shaders. It was extended into an enumeration, with the value 2 meaning automatic runtime control.】

 
 

This entailed a bunch of small changes all around the engine:

【This required a handful of small changes around the engine, listed below; a sketch of the console-variable side follows the list.】

 
 

  1. Moving light occlusion and gathering to before the base pass.
  2. Having TBasePassPS conditionally define the NO_GBUFFER macro for shaders, instead of the global shader compilation environment.
  3. Creating a new shader map key string.
  4. Finally, adjusting the draw policies to pick the G-buffer/no G-buffer shader variant at runtime.
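A heavily simplified sketch of the console-variable side of that change; only TAutoConsoleVariable and GetValueOnRenderThread are real UE4 API here, while the decision function and its input are illustrative stand-ins for the actual draw-policy plumbing:

// Extended r.GBuffer: 0 = force off, 1 = force on (stock behaviour), 2 = automatic runtime control.
static TAutoConsoleVariable<int32> CVarGBuffer(
    TEXT("r.GBuffer"),
    2,
    TEXT("0: off, 1: on, 2: toggled at runtime depending on whether any G-buffer consumer is active."));

// Hypothetical per-frame decision, evaluated on the render thread before the base pass.
bool ShouldUseGBuffer(bool bAnyGBufferConsumerVisible)
{
    const int32 Mode = CVarGBuffer.GetValueOnRenderThread();
    if (Mode == 0) return false;
    if (Mode == 1) return true;
    return bAnyGBufferConsumerVisible; // Mode == 2: runtime toggle
}

// The chosen value then selects between the G-buffer and NO_GBUFFER base pass shader
// variants at draw time, which is why both sets have to be compiled offline (see the
// permutation-count note below).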

 
 

This change saved us a whopping 2–4 milliseconds per frame, depending on the scene!

【This change saved a whopping 2-4 milliseconds per frame!】

 
 

It does not come free, though — short of some clever caching optimization, it doubles the count of base pass shader permutations, which means significantly longer shader compiling times (offline, thankfully) and some additional disk space consumption. Actual cost depends on your content, but it can easily climb to almost double of the original shader cache size, if your art team is overly generous with materials.

【It is not free, though: it doubles the number of base pass shader permutations, which means noticeably longer (offline) shader compile times and extra disk space; the shader cache can nearly double if the art team is generous with materials.】

 
 

The fly in the ointment

Except of course the G-buffers would keep turning back on all the time. And for reasons that were somewhat unclear to me at first.

【The fly in the ointment: the G-buffers kept turning themselves back on, for reasons that were unclear at first.】

 
 

A quick debugging session revealed that one could easily position themselves in such a way that a point light, hidden away in an indoor scene at the other end of the level, was finding its way into the view frustum. UE4’s pretty naive light culling (simple frustum test, plus a screen area fraction cap) was simply doing a bad job, and we had no way of even knowing which lights they were.

【A quick debugging session showed that a point light hidden in an indoor scene at the other end of the level could easily end up inside the view frustum. UE4's fairly naive light culling (a simple frustum test plus a screen-area fraction cap) was doing a poor job, and we had no way of even knowing which lights were responsible.】

 
 

I quickly whipped up a dirty visualisation in the form of a new STAT command, STAT RELEVANTLIGHTS, that lists all the dynamic lights visible in the last frame. Having instructed the artists on its usage, I could leave it up to them to add manual culling (visibility toggling) via trigger volumes.

【The new STAT command visualises every dynamic light relevant in the last frame, which made it easy for the artists to see the offending lights and toggle them manually.】

 
 


STAT RELEVANTLIGHTS output. Left: scene with fully static lighting. Right: fully dynamic lighting; one point light has shadow casting disabled.

 
 

Now all that was left to optimize was game tick time, but I was confident that Adam Bienias, the lead programmer, would make it. I was free to clean my desk and leave for my new job!

【And with that, the rendering work was done; the author cleaned his desk and left for a new job.】

 
 

 
 

Conclusions

 
 

In hindsight, all of these optimizations appear fairly obvious. I guess I was simply not experienced enough and not comfortable enough with the engine. This project had been a massive crash course in rendering performance on a tight schedule for me, and there are many corners I regret cutting and not fully understanding the issue at hand. The end result appears to be quite decent, however, and I allow myself to be pleased with that. 😉

【In hindsight all of these optimizations look fairly obvious; the project was a crash course in rendering performance on a tight schedule, and the end result is decent enough to be pleased with.】

 
 

It seems to me that renderer optimization for VR is quite akin to regular optimization: profile, make changes, rinse, repeat. Original VR content may be more free in their choice of rendering techniques, but we were constrained by the already developed look and style of the game, so the only safe option was to fine-tune what was already there.

【The optimization workflow is the usual one: profile, make changes, rinse, repeat.】

 
 

I made some failed attempts at sharing object visibility information between eyes, but I am perfectly certain that it is possible. Again, I blame my ignorance and inexperience.

【The attempts at sharing visibility between the eyes failed here, but it should certainly be possible.】

 
 

The problem of the early-Z pass per-eye timing discrepancy and occlusion query stalling calls for better understanding. I wish I had more time to diagnose it, and the knowledge of how to do it, since all the regular methods failed to pin-point it (or even detect it), and I had only started discovering xperf/ETW and GPUView.

【The early-Z pass per-eye timing discrepancy and the occlusion query stalls deserve deeper investigation than there was time for.】

 
 

Runtime toggling of G-buffers is an optimization that should have made it into the PS4 port already, but again — I had lacked the knowledge and experience to devise it. On the other hand, perhaps it is only for the better that we could not take this performance margin for granted.

【Runtime G-buffer toggling should probably have gone into the PS4 port as well; there is more to learn here.】

 
 

 

Using Cubiquity for Unity3D

http://www.cubiquity.net/cubiquity-for-unity3d/1.2/docs/index.html

 

The Cubiquity site provides a full-featured voxel solution, with the core engine cleanly separated from the per-platform plugins, plus detailed usage documentation and a Q&A, which makes it well suited for getting voxel content up and running quickly. This section covers the tool's basic contents and framework, along with some notes from using it.

 

  • Installation

There are currently three ways of obtaining Cubiquity for Unity3D.

Enabling ‘unsafe’ code

Concept:

Cubiquity for Unity3D is built on top of a native-code (C++) library which handles most of the heavy lifting behind the scenes. The meshes it generates need to be passed from the unmanaged native-code world of Cubiquity into the managed-code world of C# and Unity.

The C# language provides (at least) two ways to perform this communication.

The first is via marshalling.

The second approach is to make use of the ‘unsafe’ keyword to enable pointers and direct memory access in our C# code, so that we can directly read from the memory owned by the native code library.

The second approach is faster, cleaner, and is what we recommend.

Setup:

If you do not already have .rsp files in your Assets folder then the easiest approach is to copy the sample files which we provide in the Assets/Cubiquity/Unsafe folder (only copy the .rsp files – corresponding .meta files will be generated automatically).

If you do already have conflicting .rsp files then you will need to manually add the required compiler switches to them. These switches are:

-unsafe -define:CUBIQUITY_USE_UNSAFE

The first of these enables unsafe code for the compiler, and the second tells Cubiquity for Unity3D that this mode is available so that it can use the correct code path.

 

  • Quick Start

  • Creating your first voxel terrain

Cubiquity for Unity3D supports two types of voxel environments.

The first type is a smooth voxel terrain.

(Creation)

Begin by opening the scene ‘EmptySceneWithLighting’ in the Assets/Cubiquity/Examples folder.

Now create a Terrain Volume from within the Unity3D editor by going to the main menu and selecting GameObject -> Create Other -> Terrain Volume.

NTV_1_01

(Editing)

Your new terrain should be automatically selected, and as with any other Unity object you can scale, rotate, and translate it by using the usual gizmos.

To actually edit the terrain you need to select one of the tools from the ‘Terrain Volume (Script)’ component in the inspector.

NTV_1_02

Take some time to experiment with the editing tools which are available. You can choose a tool to apply (sculpt, smooth, etc) by selecting one of the buttons at the top of the inspector, and then choose your desired brush options and/or materials. Left-clicking on the terrain will then apply the tool.

(Collision)

Now let’s try adding a collider so that other objects can interact with the terrain.

To do this we add a ‘Terrain Volume Collider’ component through Add Component -> Scripts -> Cubiquity -> Terrain Volume Collider.

Note that this is different from the ‘MeshCollider’ which is often added to other Unity objects.

To test the collisions we can now import one of the standard Unity character controllers and walk around the terrain in play mode.

Be aware that it can take a few seconds for the terrain to generate after you press the play button, so for this reason you should set your character to start a hundred units or so above the terrain. This way the terrain will have time to load before the character reaches ground level (there are better approaches, but this is fine for quick-start purposes). (In other words, because terrain generation takes time after play starts, place the character high enough that it does not fall through before the terrain has loaded.)

 

  • Creating your first colored cubes volume

The second type is a blocky, colored-cubes terrain.

Cubiquity for Unity3D also supports a second type of voxel environment in which the world is built out of millions of colored cubes.

(Creation)

You can then create a Colored Cubes Volume by going to the main menu and selecting GameObject -> Create Other -> Colored Cubes Volume. The initial volume should look like that shown below:

NTV_1_03

(Editing)

the editing facilities of this are currently very limited, and only allow you to create single cubes at a time by left-clicking on the volume.

Let’s try replacing the volume data with one of the example volumes which come with Cubiquity (i.e. using already-authored volume data).

To do this, select the colored cubes volume, go to ‘Settings’ in the inspector, and click the small circle next to the ‘Volume Data’ field.

From here you can select the ‘VoxeliensColoredCubes’ asset, which will cause it to be used as the source data for the volume (this is a map from our previous game Voxeliens).

NTV_1_04

The asset you have selected is actually just a very thin wrapper around our Cubiquity voxel database file format. If you click on the ‘Volume Data’ field (rather than clicking on the circle, as you did previously) then Unity will show you that the asset actually exists in ‘Assets/Cubiquity/Examples/Basic’.

Modifying the volume at run-time

We will now give a quick demonstration of how the volume can be modified during gameplay.

Go to the ‘Assets/Cubiquity/Examples/SharedAssets/Scripts’ folder and find the ‘ClickToDestroy’ script.

Drag this on to the Colored Cubes Volume in the scene hierarchy to add it as a component.

We will also want a collider for this to work correctly so add one via Add Component -> Scripts -> Cubiquity -> Colored Cubes Volume Collider

(note that a Colored Cubes Volume Collider is different to the Terrain Volume Collider you used in the earlier example)

When you press play you should find you are able to fly around the scene, and that if you left-click on the volume it will create a small explosion which breaks off the cubes in the surrounding area. The separated cubes then fall under gravity and can bounce around the scene. This is simply an example of the kind of functionality we can achieve, and you can learn more by reading the Class List later in this user manual, or by looking at the code in ClickToDestroy.cs.

NTV_1_05

 

  • Main Principles

  • Voxel Engine Concepts

Cubiquity is a voxel engine, which means that it represents its objects as samples on a 3D grid.

In many ways a voxel can be considered the 3D equivalent of a pixel.

Rendering such a 3D grid directly is not trivial on modern graphics cards as they are designed for rendering triangles rather than voxels.

 

  • ‘Cubiquity’ vs. ‘Cubiquity for Unity3D’

‘Cubiquity’, is a native code (i.e. C/C++) library for storing, editing, and rendering voxel worlds.

‘Cubiquity for Unity3D’ is a set of C# scripts which connect Cubiquity to the Unity3D game engine. These scripts allow Unity3D games to create, edit and display Cubiquity volumes.

Calling the Cubiquity Native-Code Library (P/Invoke)

Functions defined in the native-code library can be called from Unity3D scripts using some magic known as P/Invoke. The file ‘CubiquityDLL.cs’ uses this P/Invoke technology to provide thin .NET wrappers around each function which is available in the Cubiquity engine.

Note in particular that it cannot be used with the Unity3D web player, because the web player does not support native code.

The Cubiquity Voxel Database Format (an SQLite database)

The Cubiquity voxel engine stores a volume as a Voxel Database, which is a single file containing all the voxels in the volume.

Internally it is actually an SQLite database and so can be opened with a tool such as SQLite Browser.

 

  • Key Components

we have adopted a component-based model in which the user can add a GameObject to a scene, give it a ‘Volume’ component to make it into a voxel object, and then add VolumeRenderer and VolumeCollider components to control its behaviour.

With this in mind, the structure of a typical volume is as follows:

NTV_1_06

The structure of a typical volume

 

The Volume Component

(reading, modifying, and storing voxel data)

Adding a Volume component to a GameObject makes it into a voxel object.

Volume components also have a custom inspector implemented which allows you to edit the volume in an intuitive way.

Cubiquity for Unity3D wraps voxel databases with a class called VolumeData, and more specifically with its subclasses called TerrainVolumeData and ColoredCubesVolumeData.

 

The VolumeRenderer Component

(displaying the voxel data)

The visual appearance of the volume is controlled by the VolumeRenderer, and if a VolumeRenderer is not present then the volume will not be visible in the scene.

 

The VolumeCollider Component

(collision support for the voxels)

The VolumeCollider component should be attached to a volume if you wish a collision mesh to be generated.

This will allow other objects in the scene (such as rigid bodies) to collide with the volume.

 

 

  • Obtaining Volume Data

Unity is a game engine, so it is usually faster and more convenient to generate voxel data elsewhere and then import it.

  • Importing From External Sources

where did these .vdb files come from?

Note that both Magica Voxel and image slices are only appropriate for importing colored cubes volumes.

Currently there are no methods for creating terrain volumes outside of Cubiquity for Unity3D, but you can still create them procedurally as discussed later.

 

Importing From Magica Voxel

You may wish to model your voxel geometry in an external application such as Magica Voxel, Voxel Shop, or Qubicle Constructor.

Such applications will typically allow more comprehensive editing options than we will ever provide in Cubiquity, and Cubiquity already provides the option to import Magica Voxel files (others will follow in the future).

This is done by the use of the command-line ProcessVDB tool.

To use this tool you should open a command prompt and change to the StreamingAssets\Cubiquity\SDK directory. From here you can run:

ProcessVDB.exe -import -magicavoxel input.vox -coloredcubes output.vdb

You can then copy the resulting .vdb into the StreamingAssets\Cubiquity\VoxelDatabases folder before following the instruction in Creating From An Existing Voxel Database to import it into Unity.

 

Importing From Voxlap

ProcessVDB.exe -import -vxl input.vxl -coloredcubes output.vdb

 

Importing From Image Slices

The same tool can be used to import colored cubes volume from a series of color images representing slices through the volume

(see Assets\Cubiquity\Examples\VolumeData\VoxeliensLevel3 for an example of such a series of slices).

You can call the converter in the same way as before, but providing a path to a folder containing image slices rather than to a Magica Voxel .vox file:

ProcessVDB.exe -import -imageslices /path/to/image/folder -coloredcubes output.vdb

 

 

  • Generating Volume Data Through Scripts

Cubiquity for Unity3D provides a very simple but powerful API for generating volumes through code.

Each volume is essentially just a 3D grid of voxel values, and the API gives you direct access to these through the VolumeData’s GetVoxel(…) and SetVoxel(…) methods.

 

Using a noise function: Evaluating a 3D noise function (such as Perlin noise or Simplex noise) at each point on the grid can generate both natural and surreal environments. Multiple octaves of noise can be combined to add additional detail. Please see the ‘Procedural Generation’ example in the examples folder.

 

Reading an input image: The ‘Maze’ example (see the examples folder) reads a 2D image of a maze and sets the height of voxel columns based of whether the corresponding pixel is black or white. The same principle can be applied to generating a terrain from a heightmap.

 

Building planets from spheres: You can create spheres by computing the distance function from a point, and a cube map can then be used to apply a texture. The ‘Solar System’ example shows how this can be used to create planets.

 

 

  • Duplicating, instancing, and sharing of volume data

  • General considerations

Cubiquity for Unity3D provides full support for having multiple volumes in a scene.

However, there are two main things to keep in mind, which we state here at the beginning.

  1. In many cases you will be better off having a single volume in your scene.
  2. Avoid having multiple VolumeData instance referencing the same voxel database.

so we strongly suggest that you avoid duplicating volume instances.

  • Scenarios

We now look at a few possible scenarios which exemplify the way in which multiple volumes should or should not be used.

 

Modeling a large terrain

It may seem tempting to break a large terrain down into smaller volumes in order to benefit from optimizations such as occlusion culling. However, Cubiquity already implements such optimizations internally, and attempting to use multiple volumes in this scenario will likely lead to a loss of performance.

 

Modeling a city

If you have a city with a number of identical buildings then it might seem desirable to represent the building as a read-only volume and then place multiple instances of it. However, it is much better to have a single volume representing your whole world, and to write the voxel data for the buildings directly into this single main volume in multiple locations.

As well as giving better performance, this will make it easier to update the volume in response to events such as explosions. With multiple volumes you would need to iterate over them to determine which ones are affected by the explosion, but with a single volume this becomes trivial.

The downside of this is that it may be harder to build such a volume, as you need a world-creation process which can build much larger environments.

 

Modeling planets/asteroids

This is a scenario where we believe the use of multiple volumes is appropriate, and an example is included with Cubiquity for Unity3D (the ‘Solar System’ example). In this case we provide a single voxel database which is referenced by two VolumeData assets, both of which have their read-only flags set. These two VolumeData assets are used by two separate volumes representing the Earth and the Moon.

The voxel data is actually a series of concentric spheres with different material identifiers, representing the layers which are present in a typical planet. The actual material and set of textures we apply can differ between the volumes, giving the Earth and the Moon different visual appearances. In play mode it is possible to independently modify each of the bodies, as the changes are stored in a temporary location, but as mentioned previously these changes cannot be written back to the voxel database because it is opened as read-only.

This scenario is appropriate for multiple volumes primarily because we want to have different transforms applied to each (i.e. the Moon orbits the Earth while the Earth orbits the Sun).

 

See the documentation for further details.

 

Additional notes on the code:

 

CubiquityDLL.cs: declares all of the interfaces exposed by the native engine; the higher-level scripts simply build on these functions.

 

Volume.cs: the parent of both volume flavours ("Base class representing behaviour common to all volumes").

It holds the basic data member mData (of the structure defined by VolumeData).

VolumeData (derives from ScriptableObject, Unity's base class for non-GameObject assets): "Base class representing the actual 3D grid of voxel values". It contains the methods for creating, connecting to, and manipulating the voxel database.

The main method of interest here is Update(), which is used to refresh the mesh.

 

ProceduralTerrainVolume.cs

A complete example of procedurally generating a terrain!

 

CreateVoxelDataBase.cs

Mostly preset helper methods for creating voxel objects.

 

TerrainVolumeData.cs

All of the methods for modifying voxel data live here, e.g.:

data.SetVoxel(x, y, z, materialSet);

materialSet: 0-255, where 0 means empty; in the sample program, higher values count as harder material and get the rock texture, while lower values get the grass texture.

 

ClickToCarveTerrainVolume.cs

Update()

Polls the mouse state every frame; on a discrete single click it computes the corresponding ray and its first intersection with the voxel volume.

DestroyVoxels()

Takes the clicked position and sets the material value of every voxel within a given radius around it to 0.

 

SpawnPhysicsObjectsOnMeshSync.cs

Spawns physics objects in the scene that interact with the voxel objects.

 

Setting the texture color

volumeRenderer.material.SetTextureScale(“_Tex0”, new Vector2(0.062f, 0.062f));

Replacing this call with one that sets a texture color should be all that is needed;

volumeRenderer.material is a standard Unity Material object:

http://docs.unity3d.com/ScriptReference/Material.html

 

 

Matchmoving with After Effects and Maya

  1. Create the AE project file, import the footage, and set up a working sequence.
  2. Right-click the footage in the working sequence and choose camera tracking.

20160801_01

  3. Pick reference points in the footage to establish an anchor, then click Create Null and Camera to create the new anchor (null) layer and camera layer.

20160801_02

  4. Anchor (null) layer handling: make sure to add position and scale keyframes to this layer.

20160801_03

  5. Install the AE3D export script:

http://www.motion-graphics-exchange.com/after-effects/AE3D-Export-Maya-Max-and-Lightwave/4bff8b4034916

  6. File -> Scripts, find the installed script and run it.
  7. Select the layers you want to export and export a Maya file with the default settings.
  8. Open the generated .ma file in Maya; the Outliner contains the camera and the anchor positions.

20160801_04

  9. Next, place the virtual objects you want at the anchor points and do the usual pre-render work, such as lighting and animation.

Finally, look through the camera to check the result; make sure to select the correct camera's view.

20160801_05

  10. Render the complete animation sequence as a .tga image sequence.

Note that the sequence must be rendered with the batch renderer, and the render settings should be adjusted beforehand.

20160801_06

20160801_07

  11. Import the rendered image sequence into AE, apply the post-processing, and produce the final video.

Managing Transformations in Hierarchy

  • Introduction

 

One of the most fundamental aspects of 3D engine design is management of spatial relationship between objects. The most intuitive way of handling this issue is to organize objects in a tree structure (hierarchy), where each node stores its local transformation, relative to its parent.

The most common way to define the local transformation is to use a so-called TRS system, where the transformation is composed of translation, rotation, and scale. This system is very easy to use for both programmers using the engine as well as non-technical users like level designers. In this chapter we describe the theory behind such a system.

One problem with the system is decomposition of a matrix back to TRS. It turns out that this problem is often ill-defined and no robust solution exists. We present an approximate solution that works reasonably well in the majority of cases.

 

  • Theory

Tree structure

Keeping objects in hierarchy is a well-known concept. Every object can have a number of children and only one parent.  It can also be convenient to store and manage a list of pointers to the children so that we have fast access to them. The aforementioned structure is in fact a tree.

Node structure

We assume that a node stores its translation, rotation, and scale (TRS) that are relative to its parent. Therefore, we say these properties are local. When we move an object, we drag all its children with it. If we increase scale of the object, then all of its children will become larger too.

Example:

bgt8_1_01

 

Transformation matrices and TRS

The relation between a single node's TRS and its local transformation matrix:

Local TRS uniquely defines a local transformation matrix M. We transform vector v in the following way:

bgt8_1_02

where S is an arbitrary scale matrix, R is an arbitrary rotation matrix, T is a translation matrix, and t is the translation vector that T is built from.

Composing transformation matrices up the hierarchy

To render an object, we need to obtain its global (world) transformation by composing local transformations of all the object’s ancestors up in the hierarchy.

The composition is achieved by simply multiplying local matrices. Given a vector v0, its local matrix M0, and the local matrix M1 of v0’s parent, we can find the global position v2:

bgt8_1_03

Using vector notation for translation, we get

bgt8_1_04

Note that the factors do not commute here: in general RS != S'R', so the rotation and scale parts of different nodes cannot simply be regrouped when composing.

 

Skew Problem

Problem description:

Applying a nonuniform scale (coming from object A) that follows a local rotation (objects B and C) will cause objects (B and C) to be skewed. Skew can appear during matrices composition but it becomes a problem during the decomposition, as it cannot be expressed within a single TRS node. We give an approximate solution to this issue in Section 3.2.4.

bgt8_1_05

Solution:

Let an object have n ancestors in the hierarchy tree. Let M1,M2, · · · ,Mn be their local transformation matrices, M0 be a local transformation matrix of the considered object, and Mi = SiRiTi.

M_TRSΣ = M_0 M_1 ··· M_n

M_TRΣ = R_0 T_0 R_1 T_1 ··· R_n T_n

The rotation-translation parts compose cleanly, so M_TRΣ directly gives the world-space rotation and translation.

M_SΣ = M_TRSΣ (M_TRΣ)^-1

Here we have the skew and the scale combined. We use the diagonal elements of M_SΣ to get the scale, and we choose to ignore the rest, which is responsible for the skew.

Use the diagonal of M_SΣ as the scale and discard the off-diagonal part; combined with M_TRΣ above, this avoids the skew.
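A small sketch of that last step, assuming the composed matrices have already been built with whatever matrix library the engine uses (Matrix4 here is a bare placeholder struct):

struct Matrix4 { float m[4][4]; };   // placeholder; any row-major 4x4 type works
struct Vec3    { float x, y, z; };

// ScaleAndSkew is M_SΣ = M_TRSΣ * inverse(M_TRΣ), computed beforehand.
// Keep only the diagonal as the approximate scale and deliberately ignore the
// off-diagonal elements, which carry the skew this system chooses to drop.
Vec3 ExtractScaleIgnoringSkew(const Matrix4& ScaleAndSkew)
{
    return { ScaleAndSkew.m[0][0], ScaleAndSkew.m[1][1], ScaleAndSkew.m[2][2] };
}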

 

Handling a change of parent

In a 3D engine we often need to modify objects’ parent-children relationship.

we want to change the local transformation such that the global transformation is still the same. Obviously, that forces us to recompute local TRS values of the object whose parent we’re changing.

To get from the current local space to a new local space (parent changes, global transform stays the same), we first need to find the global transform of the object by going up in the hierarchy to the root node. Having done this we need to go down the hierarchy to which our new parent belongs.

Let M’0 be the new parent’s local transformation matrix. Let that new parent have n’ ancestors in the hierarchy tree with local transformations M’1, M’2, · · · , M’n’, where M’i = S’iR’iT’i. The new local transformation matrix can thus be found using the following formula:

bgt8_1_06

bgt8_1_07

From this formula the new local TRS values can be computed.
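A hedged reconstruction of that formula (the figures bgt8_1_06 and bgt8_1_07 are not reproduced): the new local matrix must reproduce the old global transform under the new parent chain, so

\[ M_{0,\mathrm{new}}\,\bigl(M'_0 M'_1 \cdots M'_{n'}\bigr) = M_0 M_1 \cdots M_n \]

\[ M_{0,\mathrm{new}} = M_0 M_1 \cdots M_n \,\bigl(M'_0 M'_1 \cdots M'_{n'}\bigr)^{-1} \]

The new local TRS values are then read back out of M_{0,new} with the same approximate decomposition described in the skew section.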

 

Alternative Systems

This part is mainly about how scale is handled, which is where the skew comes from.

Approach: every node except the leaves stores a uniform scale (a single value rather than a 3-vector); only the last (leaf) node may store a nonuniform x/y/z scale. This effectively sidesteps the skew problem and is simple to implement.

 

  • Implementation

Node structure:

bgt8_1_08

Reducing Texture Memory Usage by 2-channel Color Encoding

Principle:

Put simply, the colors actually used by a typical texture cover only a small part of the gamut, and this property can be exploited to reduce the amount of data needed to represent the texture.

These single-material textures often do not exhibit large color variety and contain a limited range of hues, while using a full range of brightness resulting from highlights and dark (e.g., shadowed), regions within the material surface.

 

The basic approach is to store luminance in one channel and hue/saturation in the other.

The method presented here follows these observations and aims to encode any given texture into two channels: one channel preserving full luminance information and the other one dedicated to hue/saturation encoding.

 

Texture Encoding Algorithm

 

Encoding is a projection from 3D to 2D: find a plane such that the total distance from all of the source texture's texels (as points in color space) to that plane is minimal, which minimizes the error.

Approximating this space with two channels effectively means that we have to find a surface (two-dimensional manifold) embedded within this unit cube that lies as close as possible to the set of texels from the source texture.

bgt_7_01

 

Steps:

1. Re-weight the color space

 

Convert the sRGB values into linear color space.

The R, G and B channels contribute to luminance non-linearly and unequally, so they are given different weights.

These two steps yield a new 3D space in which color values can be combined linearly.

bgt_7_02

The plane is then fitted in this space.

Distance from a point to the plane:

bgt_7_03

Sum of squared distances from all points to the plane:

bgt_7_04

This is computed as described below; see the estimate_image function and the book chapter.
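For reference, the standard forms of the two quantities above, reconstructed from the description (the fitted plane is taken to pass through the origin of the re-weighted color space, i.e. black, with unit normal \(\hat{n}\)):

\[ d_i = \hat{n} \cdot p_i \]

\[ E = \sum_i d_i^2 = \sum_i \bigl(\hat{n} \cdot p_i\bigr)^2 \]

which is exactly the error that estimate_image minimizes by brute force over candidate normals.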

2. Compute the two base color vectors

 

bgt_7_05

As the figure shows, this is straightforward: the two base colors are initialized as bc1 = (0, 1, m) and bc2 = (1, 0, n) and then solved for using the plane information; see the function find_components().

3. Luminance encoding

Formula:

bgt_7_06
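The luminance formula is only given as a figure, but judging from the implementation notes further below (lum = sqrt(c.dot(to_scale))) it appears to have the form

\[ \mathrm{lum} = \sqrt{\vec{c} \cdot \vec{w}} \]

where \(\vec{c}\) is the linear-space color and \(\vec{w}\) holds the per-channel luminance weights, with the square root acting as a cheap gamma-style encoding of the stored value.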

4. Hue/saturation encoding

 

bgt_7_07

Four steps: project the 3D color point onto the plane; take the vector from the origin (0,0,0) to that projected point; express this vector in terms of the two base color vectors; and finally compute the hue/saturation value from those weights using the formula.

 

Decoding Algorithm

Decoding is simple: it also needs the two base colors plus the luminance and blend parameters, and then solves the encoding in reverse.

bgt_7_08

 

  • Implementation:

 

vec3 estimate_image(BITMAP *src) :

Splits the work into the six products rr, gg, bb, rg, rb, gb and first computes their averages over the whole image.

It then brute-forces over a preset range of candidate normals (e.g. n.xyz from 0 to 100)

and returns the normal that minimizes the error formula.

 

void stamp_color_probe(BITMAP *bmp):

This pre-processes the image colors (stamps a color probe into the bitmap).

 

Encoding:

BITMAP *encode_image(BITMAP *src,vec3 n):

Normalize the plane normal and find the two base colors.

Then build a 2D positional coordinate system and the matching color coordinates from these two base colors.

Next, create the output bitmap, and for every output mip pixel:

fetch the RGB value,

apply gamma 2.0,

compute its position and color in the 2D coordinate system,

compute the hue: float hue = -da/(db-da+0.00000001f); (da and db are the color values in the 2D system),

compute the luminance: float lum = sqrt(c.dot(to_scale)),

and encode the result into the two channels, hue and lum.

 

Decoding:

BITMAP *decode_image(BITMAP *src,const vec3 &base_a,const vec3 &base_b):

Initialize the target bitmap, and for each of its pixels:

first read the stored hue and lum,

decode the color: vec3 c = base_a + (base_b-base_a)*hue;

decode its luminance: float clum = 0.2126f*c.x + 0.7152f*c.y + 0.0722f*c.z;

and apply gamma 2.0 to get back to sRGB values.
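Putting the decode description together, a minimal per-texel sketch; vec3 here is a small stand-in struct, the gamma step is the 2.0 approximation used above, and the luminance rescale is my reading of the notes rather than the book's verified code:

#include <cmath>

struct vec3 { float x, y, z; };

// Decode one texel from the stored (hue, lum) pair back to sRGB,
// given the two base colors in linear space.
vec3 decode_texel(float hue, float lum, const vec3& base_a, const vec3& base_b)
{
    // Chromaticity: blend between the two base colors along the fitted plane.
    vec3 c = { base_a.x + (base_b.x - base_a.x) * hue,
               base_a.y + (base_b.y - base_a.y) * hue,
               base_a.z + (base_b.z - base_a.z) * hue };

    // Luminance of the decoded chromaticity (Rec. 709 weights, as in the notes).
    const float clum = 0.2126f * c.x + 0.7152f * c.y + 0.0722f * c.z;

    // Rescale so the final color carries the stored luminance (lum was stored as a square root).
    const float scale = (clum > 0.0f) ? (lum * lum) / clum : 0.0f;
    c = { c.x * scale, c.y * scale, c.z * scale };

    // Back to sRGB with the gamma 2.0 approximation.
    return { std::sqrt(c.x), std::sqrt(c.y), std::sqrt(c.z) };
}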