Category: Advanced Game Tech

Phase-Functioned Neural Networks for Character Control

Theory:

http://theorangeduck.com/page/phase-functioned-neural-networks-character-control

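Video notes:

Speaker background: PhD in machine learning for character animation.

The animation system: the box in the middle is a black box, namely the animation system.

What the user feeds into the animation system is directional input from the controller, such as the direction on the ground plane shown on the slide. These are fairly high-level commands, for example which direction we want to walk in.

The animation system itself is like a huge state machine: different states correspond to different animations, with blending between them. Writing such a system directly in code is complicated and hard to maintain,

so we reconsidered whether a simple algorithm could implement this complex interactive animation system instead.

We want the inputs and outputs of the animation system to be nothing more than sets of parameters.

Looking again at the original complex system: if its inputs and outputs are treated as parameters of an animation model, that is achievable too; it is like picking entries out of a huge database and outputting them.

So the second thing we want is to output the next pose directly.

This can be done. The basic idea is to treat the animation system as a black box that produces the desired output from the given inputs; the method itself is described later.

Input x: Trajectory Positions, Directions, Heights; Previous Joint Positions, Velocities…

Output y: the transform of each joint.

Training needs data, which came from motion capture: takes of about ten minutes each, roughly two hours of unstructured data in total.

Height matching: the captured data assumes the feet are fully in contact with the ground, and training relies on that; we tried a large number of different terrains to obtain the matching data.

Next, how the learning works.

A neural network is really just a function;

for example, this function maps an input to the corresponding output.

Our function's inputs and outputs are shown on the slide,

and they can be quantified as input and output vectors.

The simplest single-layer network, for example, looks like this; the two parameters W and b are what we need to learn.

The inputs are the known x and y,

and the output of learning is the learned W and b, as shown here.

The function we end up with here is called the activation function.

Many different functions can be used here; this one, for example, is a nonlinear blending function.

These two are very similar.

A deep, multi-layer network looks like this;

this example is the formula for a three-layer neural network.

Training works by feeding in inputs, measuring the error of the resulting outputs, and then adjusting the network's parameters; that is the basic idea.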
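
The training loop described above can be shown on the single-layer case. The sketch below fits y = w·x + b to toy data with plain gradient descent on mean squared error; the data, learning rate, and epoch count are made up for illustration, and this is not the PFNN training code, which trains a far larger network on a GPU.

```python
def train_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Feed the inputs through the current model.
        preds = [w * x + b for x in xs]
        # Measure the error via its gradient with respect to w and b.
        grad_w = sum(2.0 * (p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        grad_b = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / n
        # Adjust the parameters a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training pairs generated from y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [3.0 * x + 1.0 for x in xs]
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))  # prints 3.0 1.0
```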

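We used the GPU to save time.

Phase-functioned NN means we use a special kind of neural network whose weights differ for different motions, which avoids blending motions together; see the paper for details.

What we end up with is a simple model that drives the animation, replacing the state machine and blend tree.

Then a demo.

Performance.

Conclusion.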

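First, a full pass through the slides:

Slides from the SIGGRAPH talk

Goal: fast, compact, and expressive character control for games

Final results

Part 1: Background

Where previous work can be improved:

  1. The whole database has to be kept in memory
  2. The data has to be processed manually
  3. Complex acceleration methods are needed

How can neural networks help?

  1. Virtually unlimited data capacity (any motion)
  2. Fast, real-time, and low memory usage

But how do we generate motion?

CNN: learn the relationship between the user's control signal and the character's motion

demo

What is the problem?

Ambiguity: the same input can lead to different character motions

In practice:

  1. Special handling is needed to remove the ambiguity
  2. The whole input trajectory must be provided up front
  3. A multi-layer CNN is still too slow for games

RNN: learn the mapping from one frame to the next

demo

Quality of the RNN results:

  1. Only holds up for 10 seconds
  2. Cannot avoid floating
  3. Cannot avoid ambiguity

Summary of the problems we face:

  1. How do we handle large amounts of data?
  2. How do we resolve ambiguity?
  3. How do we make the generated results look good?

Data capture

Unstructured motion capture: about two hours of data in total, in takes of around ten minutes each. Many tables and chairs were set out to simulate complex terrain, so as to cover as many complex situations as possible.

demo

demo

Terrain fitting

  1. We want terrain data to be learned together with the motion data
  2. But capturing motion and terrain at the same time is cumbersome
  3. So build a database of heightmaps and fit each piece of motion to a patch of heightmap

Example

Parameterization:

  1. The final result works well
  2. The character trajectory is represented as a window
  3. Gait, terrain height, and similar information are added

Neural network

PFNN: a neural network whose weights are generated by a function of the phase

The phase is a scalar in [0, 2π) that indicates where the current pose lies within the locomotion cycle

Diagram: the input is the current frame's pose, the output is the next frame's pose, and the network's parameters come from the phase function

demo

Structure: a feed-forward network with two hidden layers of 512 hidden units each, using the ELU activation function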
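
As a rough illustration of that structure, here is a forward pass through two ELU hidden layers and a linear output layer. The matrices are tiny hand-picked toys standing in for the 512-unit layers; in a PFNN they would be produced by the phase function rather than fixed. The helper names (`elu`, `dense`, `forward`) are illustrative.

```python
import math


def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(W, b, x):
    """Affine layer W @ x + b, with W stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(params, x):
    """Two ELU hidden layers followed by a linear output layer."""
    (W0, b0), (W1, b1), (W2, b2) = params
    h0 = [elu(v) for v in dense(W0, b0, x)]
    h1 = [elu(v) for v in dense(W1, b1, h0)]
    return dense(W2, b2, h1)


# Toy 2 -> 2 -> 2 -> 1 network with identity hidden layers.
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
params = [identity, identity, ([[1.0, 1.0]], [0.0])]
print(forward(params, [1.0, 2.0]))  # prints [3.0]
```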

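The phase function outputs the network weights: a cyclic cubic spline interpolates four control points, each of which is a full set of network weights.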
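
The phase function can be sketched as a cyclic Catmull-Rom (cubic) spline over four control points, which is how I read the cyclic cubic interpolation described above. Each control point here is a flattened list of weights; the function name and toy values are illustrative, not taken from the paper's code.

```python
import math


def phase_function(phase, control_points):
    """Blend four weight sets with a cyclic Catmull-Rom spline indexed by phase."""
    # Map phase in [0, 2*pi) to p in [0, 4): one spline segment per control point.
    p = 4.0 * (phase % (2.0 * math.pi)) / (2.0 * math.pi)
    w = p - math.floor(p)                                     # position inside the segment
    k = [(int(math.floor(p)) + n - 1) % 4 for n in range(4)]  # cyclic neighbor indices
    a0, a1, a2, a3 = (control_points[i] for i in k)
    # Catmull-Rom basis, applied element-wise to every weight.
    return [
        y1
        + w * (0.5 * y2 - 0.5 * y0)
        + w * w * (y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3)
        + w * w * w * (1.5 * y1 - 1.5 * y2 + 0.5 * y3 - 0.5 * y0)
        for y0, y1, y2, y3 in zip(a0, a1, a2, a3)
    ]


# Four toy "weight sets" holding a single weight each.
control_points = [[0.0], [1.0], [2.0], [3.0]]
print(phase_function(math.pi / 2, control_points))  # prints [1.0]
```

At each quarter of the cycle the output equals one control point exactly; in between, all four contribute. Because the output is a fixed linear combination of the control points for a given phase, backpropagation (step 4 of the training algorithm) hands each control point the output gradient scaled by its basis coefficient.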

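Training algorithm:

  1. Feed in the phase to generate the weights
  2. Run the input through the network with those weights to get the output
  3. Measure the output error
  4. Backpropagate through the network and the phase function to update the control points

Results

demo

Conclusion

Precomputing the phase function: evaluating it is relatively expensive for a game, so

  1. Since the phase is bounded to [0, 2π), the function can be precomputed over that range
  2. At runtime, interpolate between the precomputed results
  3. This gives a trade-off between speed and memory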
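
A minimal sketch of that trade-off: sample an expensive cyclic function at a fixed number of phases once, then linearly interpolate between neighboring samples at runtime, wrapping around at 2π. The sample count is a tunable assumption (more samples cost memory but reduce interpolation error), and `fn` is a stand-in for the phase function.

```python
import math


def precompute(fn, samples):
    """Evaluate fn (returning a list of weights) at evenly spaced phases in [0, 2*pi)."""
    step = 2.0 * math.pi / samples
    return [fn(i * step) for i in range(samples)]


def lookup_linear(table, phase):
    """Linearly interpolate a precomputed cyclic table at the given phase."""
    n = len(table)
    pos = (phase % (2.0 * math.pi)) / (2.0 * math.pi) * n
    i0 = int(pos) % n
    i1 = (i0 + 1) % n              # wrap: the last sample blends back into the first
    t = pos - math.floor(pos)
    return [(1.0 - t) * a + t * b for a, b in zip(table[i0], table[i1])]


# Toy "phase function" with a single weight: sin(phase), sampled 4 times.
table = precompute(lambda ph: [math.sin(ph)], 4)
print(lookup_linear(table, math.pi / 4))  # prints [0.5]
```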

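Performance figures

Drawbacks:

  1. Training the model is very time-consuming
  2. Edits and changes made by artists do not give immediate feedback
  3. Results are hard to predict, and when something goes wrong it is hard to tell why

Advantages:

  1. Neural networks handle large amounts of data easily and can produce an enormous variety of results
  2. Semantic decomposition resolves the ambiguity problem
  3. The simple structure and parameterized usage make quality easy to control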

The hands-on part has two steps:

  1. First, see how the demo is implemented
  2. Then, see how the network is handled


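AIAnimation: code walkthrough

After a lot of effort I got it running, but the character could not be controlled. My guess was that the input code was incompatible between Ubuntu and Windows, so one of the main goals of this code analysis was to fix the controls.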

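Code structure

Everything starts in main:

  • First, SDL initialization. Simple DirectMedia Layer (SDL) is a cross-platform development library designed to provide low level access to audio, keyboard, mouse, joystick, and graphics hardware via OpenGL and Direct3D. In practice it is a commonly used cross-platform support library for OpenGL applications.

    This made me realize that the character probably wasn't stuck because of an input-compatibility problem: looking at the code, it simply only supports a gamepad.

    Unfortunately, on Windows it runs very slowly; the frame rate felt like it was below 10 fps.

  • GLEW initialization

  • Resource loading

    • Options defines the settings we can modify.

    • CameraOrbit sets up the camera.

    • LightDirectional sets up the scene lighting.

      This is just a bunch of GL settings.

    • Character: character initialization

      This part defines and loads the character, and it is important!

      The character data is stored in the following four files, annoyingly in binary format:

      vertices, triangles, parent indices, and xforms, one file each.

      The file-reading functions are simple: they read the data straight into the corresponding containers of the character data structure.

      The last piece here is a forward kinematics implementation, which is very simple: each child joint follows its parent.

    • Trajectory: trajectory initialization

      This defines the data structure for the motion trajectory.

    • IK: IK initialization

      This defines the IK data structure.

      A two_joint function is also provided; I'll cover it when it gets used, since its purpose isn't clear yet.

    • Shaders

      The functions here load the shaders and apply them in OpenGL.

    • Heightmap: heightmap initialization

      The main thing to look at here is how the height data is read and stored.

      The height data comes in two files.

      Load() simply reads the floats one by one and stores them in vector<vector<float>> data.

      xxx.txt is used to build data, while xxx_ao.txt is used to build vbo/tbo (positions, colors, and so on) and vbo_data/tbo_data (index data).

    • Areas: area initialization

      This defines the data structure for the areas (the walls, jump areas, and crouch areas used later).

    • PFNN model

      Loading and initialization of the model, starting with its data structure.

      ArrayXf is Eigen's container for a float array. The Load function loads a great many files!

      Looking at the file structure, the data for the pfnn model consists mainly of the network model and the character.

    • Loading the game world

      These are the load_world functions. As far as I can tell they mainly mark up the terrain, so does the program need this terrain markup in order to run?

  • Game loop
    • Input handling

      Only a gamepad is supported for now. SDL includes a cross-platform input module; the details are not covered here.

      In fact, not all interaction is handled here: much of the main control logic is written directly in the rendering code, though it all goes through the SDL interface.

    • Rendering

      This consists of three parts, pre-processing, rendering, and post-processing, covered in turn below.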
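
The forward kinematics step mentioned under Character (each child joint follows its parent) amounts to composing transforms down the hierarchy. A minimal sketch with 4x4 row-major matrices, assuming the parent index array lists every parent before its children and uses -1 for the root; the demo itself does this in C++ with its own matrix types, and the helper names here are illustrative.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def translation(x, y, z):
    """4x4 homogeneous translation matrix (row-major, column-vector convention)."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def forward_kinematics(local_xforms, parents):
    """Compose each joint's local transform with its parent's global transform.

    Assumes parents[i] < i for every non-root joint and parents[root] == -1.
    """
    global_xforms = []
    for i, local in enumerate(local_xforms):
        if parents[i] < 0:
            global_xforms.append(local)
        else:
            global_xforms.append(matmul4(global_xforms[parents[i]], local))
    return global_xforms


# Three joints in a chain, each offset 1 unit up from its parent.
chain = forward_kinematics([translation(0, 1, 0)] * 3, [-1, 0, 1])
print(chain[2][1][3])  # prints 3: the third joint sits 3 units up
```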

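Pre-processing

  • Update camera (driven directly by input)

    The right stick rotates the camera and L/R zoom in and out; these act directly on the camera parameters.

  • Update target direction and velocity (driven directly by input)

    This also responds directly to input: the input determines the user's desired target direction and velocity.

  • Update gait (data pre-processing, step 1)

    The current trajectory parameters are determined from the previous frame's trajectory parameters and the options.

  • Predict future trajectory (data pre-processing, step 2)

    The trajectory parameters from the previous step and the character parameters are blended to produce trajectory_positions_blend.

  • Collisions (data pre-processing, step 3)

    trajectory_positions_blend is adjusted using the walls in areas.

    The values of trajectory_positions_blend are then written back into trajectory.

  • Jumps (data pre-processing, step 4)

    trajectory is adjusted using the jump information in areas.

  • Crouch areas (data pre-processing, step 5)

    trajectory is adjusted using crouch_pos in areas.

  • Walls (data pre-processing, step 6)

    trajectory is adjusted directly using the walls in areas.

  • Trajectory rotation (data pre-processing, step 7)

    Adjusts trajectory->rotations.

  • Trajectory heights (data pre-processing, step 8)

    Adjusts trajectory using the heightmap values.

  • Input trajectory positions and directions (PFNN input, step 1)

    Fills pfnn->Xp from the trajectory.

  • Input trajectory gaits (PFNN input, step 2)

    Fills pfnn->Xp from the trajectory.

  • Input current joint positions, velocities, and rotations (PFNN input, step 3)

    Fills pfnn->Xp from the character's current joint data.

  • Input trajectory heights (PFNN input, step 4)

    Fills pfnn->Xp from the trajectory.

  • Perform Regression 【core step: model prediction】

    The steps above set the PFNN inputs; the predict function additionally takes character->phase as its argument.

  • Timing: measures how long predict takes, for debugging.
  • Build Local Transform (PFNN output)

    Uses the PFNN output to obtain each joint's position, velocity, and rotation.

    The joint data obtained this way is in world space and still has to be converted to local (parent-relative) space.

  • IK

    Applies the joint data obtained above to the character's IK joints, one by one.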
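
The four PFNN-input steps above all write into one flat input vector, pfnn->Xp. The sketch below shows the idea of concatenating the features and then standardizing them; the feature order, the per-point layout, and the mean/std normalization are assumptions for illustration, not the demo's exact layout.

```python
def build_input(traj_pos, traj_dir, traj_gaits, joint_pos, joint_vel, traj_heights):
    """Concatenate one frame's features into a flat input vector (hypothetical layout)."""
    x = []
    for px, pz in traj_pos:          # trajectory positions on the ground plane (x, z)
        x.extend((px, pz))
    for dx, dz in traj_dir:          # trajectory facing directions (x, z)
        x.extend((dx, dz))
    for gait in traj_gaits:          # per-point gait weights, e.g. stand/walk/jog
        x.extend(gait)
    for p, v in zip(joint_pos, joint_vel):
        x.extend(p)                  # previous joint position (x, y, z)
        x.extend(v)                  # previous joint velocity (x, y, z)
    x.extend(traj_heights)           # terrain height under each trajectory point
    return x


def normalize(x, mean, std):
    """Standardize each feature with a per-feature mean and standard deviation."""
    return [(xi - m) / s for xi, m, s in zip(x, mean, std)]


x = build_input(
    traj_pos=[(0.0, 0.0), (1.0, 2.0)],
    traj_dir=[(0.0, 1.0), (0.0, 1.0)],
    traj_gaits=[[1.0, 0.0], [0.0, 1.0]],
    joint_pos=[(0.0, 1.0, 0.0)],
    joint_vel=[(0.0, 0.0, 0.0)],
    traj_heights=[0.1, 0.2],
)
print(len(x))  # prints 20
```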

Rendering pass

  • Render Shadow
  • Render Terrain
  • Render Character
  • Render the Rest
  • Render Crouch Area
  • Render Jump Areas
  • Render Walls
  • Render Trajectory
  • Render Joints
  • UI Elements
  • PFNN Visual
  • Display UI

These steps are all plain OpenGL usage and unrelated to how the AI data is used, so I won't go into them.

Post-processing

  • Update Past Trajectory

    Shifts the trajectory data along.

  • Update Current Trajectory

    Recomputes the trajectory values.

  • Collide with walls

    Collision update for the trajectory.

  • Update Future Trajectory

    Updates the trajectory based on the PFNN output.

  • Update Phase

  • Update Camera
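
Two of the post-processing steps are simple enough to sketch directly: shifting the past half of the trajectory by one sample, and advancing the phase while keeping it inside [0, 2π). This assumes the trajectory is a plain list with the current point at `root_index`, and `delta_phase` stands for whatever phase change the network predicts for the frame (a hypothetical parameter; the demo may scale it further).

```python
import math


def shift_past(points, root_index):
    """Update Past Trajectory: point i takes the value of point i + 1 for i < root_index."""
    for i in range(root_index):
        points[i] = points[i + 1]
    return points


def advance_phase(phase, delta_phase):
    """Update Phase: add the predicted change and wrap into [0, 2*pi)."""
    return (phase + delta_phase) % (2.0 * math.pi)


print(shift_past([0, 1, 2, 3, 4], root_index=2))  # prints [1, 2, 2, 3, 4]
```

The root point itself is then overwritten by Update Current Trajectory.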

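AI4Animation project

Despite many attempts, I could not open its demos completely without errors; none of them ran.

So, using AIAnimation as the reference, let's first look at how this project is put to use.

The key part is the character, so let's start with how the character is set up.

Each of the two demos is built around essentially one important C# file, so let's compare them.

Original corresponds to the SIGGRAPH 2017 work and Adam to the SIGGRAPH 2018 work; we start with 2017.

First, the overall structure.

The second class derives from Editor and creates the hierarchical Animation menu in the editor; the other three parts are handled by the other three classes.

Each of these three objects also forms its own sub-tab menu item.

  • NeuralNetwork: this class does only one thing, letting the user choose the NN model; in other words it handles the interactive UI and its logic and nothing else. Everything the network itself needs lives in the Model class.

    Next, the interface functions.

    These interfaces exist so that multiple NN methods can be supported and added.

    The rest are Tensor-related functions. Tensor wraps Eigen data and all the actual computation is done by Eigen; a set of methods for operating on the related data structures is also provided.

    Finally, the other important part of Model is Parameters; on the Unity side this is mainly loading, reading, and saving them.

  • Controller: handles input, mainly the W/S/A/D/Q/E keys. Another important variable is the Styles array, which presumably records the style (state) weights.
  • Character: drives the skeleton's motion.

The core intermediate data is the Trajectory class: an array of points plus operations on the array and on individual points. Each point carries a lot of data, including various transforms and states.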

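All the core logic lives in the Update function. It should be the same approach as in AIAnimation, so we can compare the two:

  • Everything below runs only if an NN model is present.
  • Update Target Direction / Velocity

    This does:

    TargetDirection = TargetDirection blended with the current direction defined by the trajectory, weighted by TargetBlending.

    TargetVelocity = TargetVelocity blended with the controller's input velocity, weighted by TargetBlending.

  • Update Gait

    For Trajectory.Points[RootPointIndex], each Style value = its current value blended with whether the user has selected that style, weighted by GaitTransition.

  • Predict Future Trajectory

    Predicted position = interpolation between continuing the current step (current position minus previous position) and TargetPosition.

    Predicted style = the current style, carried forward.

    Predicted direction = interpolation between the current direction and TargetDirection.

  • Avoid Collisions

    Makes sure the new positions are valid, i.e. takes collisions into account.

  • Input Trajectory Positions / Directions

    Feeds NN.Model the position and direction of each trajectory point (x and z components only).

  • Input Trajectory Gaits

    Feeds NN.Model the Style array of each trajectory point.

  • Input Previous Bone Positions / Velocities

    Feeds NN.Model the position and velocity of each joint.

  • Input Trajectory Heights

    Feeds NN.Model the height (y component) of each trajectory point.

  • Predict 【run the model】

  • Update Past Trajectory (trajectory points with i < RootPointIndex)

    Updates each Trajectory.Points[i]: point i takes the values of point i+1, so everything shifts back by one point.

  • Update Current Trajectory (the point at RootPointIndex)

    Builds new data for Trajectory.Points[RootPointIndex] from the NN output, setting its position and direction.

  • Update Future Trajectory (trajectory points with RootPointIndex < i < Trajectory.Points.Length)

    Each point's new position = its old position plus an interpolation between the current direction and the distance and direction blended from the model output (several interpolation factors are combined here).

  • Avoid Collisions

    Same as the first Avoid Collisions step.

  • Compute Posture

    Two arrays, positions and rotations, store each joint's transform.

    Each positions[i] = 0.5 × the NN output + 0.5 × where the joint should be if it continued from its previous position along its previous velocity.

    Each Velocities[i] = the NN output.

  • Update Posture

    Each joint's position and rotation are taken directly from the corresponding entries of the arrays above.

  • Map to Character

    Applies the transforms to the character.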
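
Most of the Update steps above are weighted blends. A minimal sketch of two of them, assuming plain 3-component lists: moving the target velocity toward the controller input by TargetBlending, and the Compute Posture blend of an extrapolated joint position with the network's prediction. The function names are hypothetical, and the velocity is assumed to be a per-frame displacement.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation from a (t = 0) to b (t = 1)."""
    return [ai + (bi - ai) * t for ai, bi in zip(a, b)]


def update_target_velocity(target_velocity, input_velocity, target_blending):
    """Update Target Velocity: move toward the controller input by TargetBlending."""
    return lerp(target_velocity, input_velocity, target_blending)


def blend_joint_position(prev_position, prev_velocity, nn_position, weight=0.5):
    """Compute Posture: mix the extrapolated position with the network's prediction."""
    extrapolated = [p + v for p, v in zip(prev_position, prev_velocity)]
    return lerp(extrapolated, nn_position, weight)


print(update_target_velocity([0.0, 0.0, 0.0], [2.0, 0.0, 4.0], 0.5))            # prints [1.0, 0.0, 2.0]
print(blend_joint_position([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]))  # prints [2.0, 0.0, 0.0]
```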

AIAnimation project setup

Glenn – networked physics

https://gafferongames.com/

Glenn Fiedler is the founder of Network Next where he’s hard at work as CEO/CTO. Network Next is creating a new internet for games and e-sports.

First article:

Networked Physics in Virtual Reality

https://gafferongames.com/post/networked_physics_in_virtual_reality/

Introduction

About a year ago, Oculus approached me and offered to sponsor my research. They asked me, effectively: “Hey Glenn, there’s a lot of interest in networked physics in VR. You did a cool talk at GDC. Do you think you could come up with a networked physics sample in VR that we could share with devs? Maybe you could use the touch controllers?”

【About a year ago, Oculus asked me to research networked physics in VR.】

I replied “F*** yes!” cough “Sure. This could be a lot of fun!”. But to keep it real, I insisted on two conditions. One: the source code I developed would be published under a permissive open source licence (for example, BSD) so it would create the most good. Two: when I was finished, I would be able to write an article describing the steps I took to develop the sample.

【I gladly agreed, on two conditions: the code would be open source, and I could publish an article about how it was built.】

Oculus agreed. Welcome to that article! Also, the source for the networked physics sample is here, wherein the code that I wrote is released under a BSD licence. I hope the next generation of programmers can learn from my research into networked physics and create some really cool things. Good luck!

【Hence this article and the open source code; I hope people can use these techniques to build really cool games.】

What are we building?

When I first started discussions with Oculus, we imagined creating something like a table where four players could sit around and interact with physically simulated cubes on the table. For example, throwing, catching and stacking cubes, maybe knocking each other’s stacks over with a swipe of their hand.【Oculus initially imagined four players around a table, interacting physically with a pile of cubes on it.】

But after a few days spent learning Unity and C#, I found myself actually inside the Rift. In VR, scale is so important. When the cubes were small, everything felt much less interesting, but when the cubes were scaled up to around a meter cubed, everything had this really cool sense of scale. You could make these huge stacks of cubes, up to 20 or 30 meters high. This felt really cool!【After some time in VR, I found that scale matters enormously there: small cubes on a table feel much less interesting, but scaled up to about a meter each, the whole world feels really cool.】

It’s impossible to communicate visually what this feels like outside of VR, but it looks something like this…

… where you can select, grab and throw cubes using the touch controller, and any cubes you release from your hand interact with the other cubes in the simulation. You can throw a cube at a stack of cubes and knock them over. You can pick up a cube in each hand and juggle them. You can build a stack of cubes and see how high you can make it go.【The feeling is hard to convey outside VR; in short, you can select, grab, and throw the cubes, and they all interact physically with each other.】

Even though this was a lot of fun, it’s not all rainbows and unicorns. Working with Oculus as a client, I had to define tasks and deliverables before I could actually start the work.【With Oculus as a client, I had to define tasks and deliverables before starting work.】

I suggested the following criteria we would use to define success:

  1. Players should be able to pick up, throw and catch cubes without latency.
  2. Players should be able to stack cubes, and these stacks should be stable (come to rest) and be without visible jitter.
  3. When cubes thrown by any player interact with the simulation, wherever possible, these interactions should be without latency.

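【The three success criteria: players can pick up, throw, and catch cubes without latency; players can stack cubes, and the stacks are stable (come to rest) without visible jitter; and interactions of thrown cubes with the simulation are, wherever possible, without latency.】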

At the same time I created a set of tasks to work in order of greatest risk to least, since this was R&D, there was no guarantee we would actually succeed at what we were trying to do.【I also ordered the tasks from greatest risk to least, because this was R&D and success was not guaranteed.】

Network Models

First up, we had to pick a network model. A network model is basically a strategy, exactly how we are going to hide latency and keep the simulation in sync.【First we need a network model: the strategy for hiding latency and keeping the simulation in sync.】

There are three main network models to choose from:

  1. Deterministic lockstep
  2. Client/server with client-side prediction
  3. Distributed simulation with authority scheme

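【Three models: deterministic lockstep (frame synchronization), client/server with client-side prediction, and distributed simulation with an authority scheme.】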

I was instantly confident of the correct network model: a distributed simulation model where players take over authority of cubes they interact with. But let me share with you my reasoning behind this.【I was confident in distributed simulation, where each player takes authority over the cubes they interact with; the reasoning follows.】

First, I could trivially rule out a deterministic lockstep network model, since the physics engine inside Unity (PhysX) is not deterministic. Furthermore, even if PhysX was deterministic I could still rule it out because of the requirement that player interactions with the simulation be without latency.【Deterministic lockstep is ruled out because Unity's physics engine (PhysX) is not deterministic; and even if it were, it would still be ruled out because player interactions with the simulation must be without latency.】

The reason for this is that to hide latency with deterministic lockstep I needed to maintain two copies of the simulation and predict the authoritative simulation ahead with local inputs prior to render (GGPO style). At 90HZ simulation rate and with up to 250ms of latency to hide, this meant 25 physics simulation steps for each visual render frame. 25X cost is simply not realistic for a CPU intensive physics simulation.【Hiding latency with lockstep means keeping two copies of the simulation and, before each render, predicting the authoritative copy ahead with local inputs (GGPO style). A 25x physics cost is not realistic for a CPU-intensive simulation.】【My reading of the numbers: at 90 Hz one simulation step lasts 1000 / 90 ≈ 11.1 ms, so 250 ms of latency spans about 250 / 11.1 ≈ 22.5 steps, roughly the 25 quoted. Every rendered frame must re-simulate that many steps instead of one, so the physics load grows from 90 steps per second to roughly 2,000 or more.】

This leaves two options: a client/server network model with client-side prediction (perhaps with dedicated server) and a less secure distributed simulation network model.【That leaves client/server with client-side prediction, or distributed simulation.】

Since this was a non-competitive sample, there was little justification to incur the cost of running dedicated servers. Therefore, whether I implemented a client/server model with client-side prediction or distributed simulation model, the security would be effectively the same. The only difference would be if only one of the players in the game could theoretically cheat, or all of them could.【As a non-competitive sample, there was no reason to pay for dedicated servers, so both models would be equally secure; the only difference is whether one player or every player could theoretically cheat.】

For this reason, a distributed simulation model made the most sense. It had effectively the same amount of security, and would not require any expensive rollback and resimulation, since players simply take authority over cubes they interact with and send the state for those cubes to other players.【So distributed simulation was chosen: the same security, and no expensive rollback and re-simulation, because players simply take authority over the cubes they interact with and send those cubes' state to the others.】

Authority Scheme

While it makes intuitive sense that taking authority (acting like the server) for objects you interact can hide latency – since, well if you’re the server, you don’t experience any lag, right? – what’s not immediately obvious is how to resolve conflicts.【Taking authority over the objects you interact with (acting as their server) intuitively hides latency, since the server sees no lag; what is not obvious is how to resolve conflicts.】

What if two players interact with the same stack? What if two players, masked by latency, grab the same cube? In the case of conflict: who wins, who gets corrected, and how is this decided?【Two players may interact with the same stack, or, masked by latency, grab the same cube. Who wins, who gets corrected, and how is that decided?】

My intuition at this point was that because I would be exchanging state for objects rapidly (up to 60 times per-second), that it would be best to implement this as an encoding in the state exchanged between players over my network protocol, rather than as events.【Because object state is exchanged rapidly (up to 60 times per second), it seemed best to encode authority in the state exchanged between players rather than to send it as events.】

I thought about this for a while and came up with two key concepts:

  1. Authority
  2. Ownership

【Two key concepts: authority and ownership.】

Each cube would have authority, either set to default (white), or to whatever color of the player that last interacted with it. If another player interacted with an object, authority would switch and update to that player. I planned to use authority for interactions of thrown objects with the scene. I imagined that a cube thrown by player 2 could take authority over any objects it interacted with, and in turn any objects those objects interacted with, recursively.【Every cube has an authority: the default (white) or the color of the last player who interacted with it. When another player interacts with it, authority switches to that player. Authority handles interactions of thrown objects with the scene: a cube thrown by player 2 takes authority over any object it touches, and recursively over anything those objects touch.】

Ownership was a bit different. Once a cube is owned by a player, no other player could take ownership until that player relinquished ownership. I planned to use ownership for players grabbing cubes, because I didn’t want to make it possible for players to grab cubes out of other player’s hands after they picked them up.【Ownership is different: once a player owns a cube, no one else can take ownership until that player releases it. Ownership is used for grabbing, so players cannot snatch cubes out of each other's hands.】

I had an intuition that I could represent and communicate authority and ownership as state by including two different sequence numbers per-cube as I sent them: an authority sequence, and an ownership sequence number. This intuition ultimately proved correct, but turned out to be much more complicated in implementation than I expected. More on this later.【Authority and ownership can be represented and synchronized as state with two per-cube sequence numbers, an authority sequence and an ownership sequence. This worked in the end, but was more complicated to implement than expected; details later.】
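
The article leaves the concrete authority rules for later, so the following is only a sketch under assumptions: 16-bit per-cube sequence numbers that wrap around, compared with the usual wrap-around test, and a hypothetical rule that a remote authority change is accepted only if its sequence is newer than the local one. Both function names are illustrative.

```python
def sequence_greater_than(s1, s2, bits=16):
    """True if s1 is more recent than s2, allowing the counter to wrap around."""
    half = 1 << (bits - 1)
    return (s1 > s2 and s1 - s2 <= half) or (s1 < s2 and s2 - s1 > half)


def accept_authority_update(local_seq, remote_seq):
    """Hypothetical rule: apply a remote authority change only if it is newer."""
    return sequence_greater_than(remote_seq, local_seq)


print(sequence_greater_than(0, 65535))  # prints True: 0 follows 65535 after wrapping
```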

State Synchronization

Trusting I could implement the authority rules described above, my first task was to prove that synchronizing physics in one direction of flow could actually work with Unity and PhysX. In previous work I had networked simulations built with ODE, so really, I had no idea if it was really possible.【Assuming the authority rules above can be implemented, the first task was to prove that one-directional physics synchronization works with Unity and PhysX; earlier work had used ODE, so this was unknown.】

To find out, I set up a loopback scene in Unity where cubes fall into a pile in front of the player. There are two sets of cubes. The cubes on the left represent the authority side. The cubes on the right represent the non-authority side, which we want to be in sync with the cubes on the left.【A Unity loopback test scene with two sets of cubes: the left set is the authority side, the right set is the non-authority side that should stay in sync with it.】

At the start, without anything in place to keep the cubes in sync, even though both sets of cubes start from the same initial state, they give slightly different end results. You can see this most easily from top-down:【Without any synchronization, both sets start from the same initial state but end up slightly different.】

This happens because PhysX is non-deterministic. Rather than tilting at non-deterministic windmills, I fight non-determinism by grabbing state from the left side (authority) and applying it to the right side (non-authority) 10 times per-second:【PhysX is non-deterministic; instead of trying to make it deterministic, I copy the state from the authority side to the non-authority side 10 times per second.】

The state I grab from each cube looks like this:【Per-cube state: position, rotation, linear velocity, and angular velocity.】

And when I apply this state to the simulation on the right side, I simply snap the position, rotation, linear and angular velocity of each cube to the state captured from the left side.【On the right side, I simply snap each cube's position, rotation, linear velocity, and angular velocity to the state captured on the left.】
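
A minimal sketch of this capture-and-snap loop. `Body` is a hypothetical stand-in for an engine rigid body (a real engine exposes equivalent fields), and the 10 Hz sync is expressed as syncing every ninth frame of an assumed 90 Hz simulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeState:
    position: tuple          # (x, y, z)
    rotation: tuple          # quaternion (x, y, z, w)
    linear_velocity: tuple   # (x, y, z)
    angular_velocity: tuple  # (x, y, z)


@dataclass
class Body:
    """Hypothetical stand-in for a rigid body in the physics engine."""
    position: tuple
    rotation: tuple
    linear_velocity: tuple
    angular_velocity: tuple


def capture(body):
    """Grab the state from the authority side."""
    return CubeState(body.position, body.rotation, body.linear_velocity, body.angular_velocity)


def apply(body, state):
    """Snap the non-authority body to the captured state."""
    body.position = state.position
    body.rotation = state.rotation
    body.linear_velocity = state.linear_velocity
    body.angular_velocity = state.angular_velocity


def should_sync(frame, sim_hz=90, sync_hz=10):
    """Sync 10 times per second on a 90 Hz simulation: every 9th frame."""
    return frame % (sim_hz // sync_hz) == 0


authority = Body((0.0, 5.0, 0.0), (0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
remote = Body((0.1, 4.9, 0.0), (0.0, 0.0, 0.0, 1.0), (0.9, 0.0, 0.0), (0.0, 0.0, 0.0))
apply(remote, capture(authority))
print(capture(remote) == capture(authority))  # prints True
```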

This simple change is enough to keep the left and right simulations in sync. PhysX doesn’t even diverge enough in the 1/10th of a second between updates to show any noticeable pops.【This is enough to keep both sides in sync; in the 1/10 s between updates PhysX does not diverge enough to cause visible pops.】

This proves that a state synchronization based approach for networking can work with PhysX. (Sigh of relief). The only problem of course, is that sending uncompressed physics state uses way too much bandwidth…【So state synchronization works with PhysX; the only problem is that uncompressed physics state uses far too much bandwidth.】

Bandwidth Optimization

To make sure the networked physics sample is playable over the internet, I needed to get bandwidth under control.【Next, bring bandwidth under control so the sample is playable over the internet.】

The easiest gain I found was to simply encode the state for at rest cubes more efficiently. For example, instead of repeatedly sending (0,0,0) for linear velocity and (0,0,0) for angular velocity for at rest cubes, I send just one bit:【The easiest win: encode at-rest cubes more efficiently by sending a single bit instead of zero linear and angular velocities.】

This is a lossless technique because it doesn’t change the state sent over the network in any way. It’s also extremely effective, since statistically speaking, most of the time the majority of cubes are at rest.【This is lossless, since the state itself is unchanged, and very effective, since most cubes are at rest most of the time.】
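
A minimal sketch of the at-rest bit using a toy bit writer and reader. The bit widths, the offset used to store signed values, and all names are hypothetical; velocities are assumed to be already quantized to small signed integers.

```python
VEL_BITS = 16                  # hypothetical bits per quantized velocity component
OFFSET = 1 << (VEL_BITS - 1)   # shifts signed values into the unsigned range


class BitWriter:
    def __init__(self):
        self.bits = []

    def write(self, value, num_bits):
        """Append value as num_bits bits, most significant bit first."""
        for i in reversed(range(num_bits)):
            self.bits.append((value >> i) & 1)


class BitReader:
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def read(self, num_bits):
        value = 0
        for _ in range(num_bits):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def write_velocities(writer, at_rest, linear, angular):
    """One flag bit; velocities are only written for cubes that are moving."""
    writer.write(1 if at_rest else 0, 1)
    if not at_rest:
        for v in tuple(linear) + tuple(angular):   # already-quantized signed ints
            writer.write(v + OFFSET, VEL_BITS)


def read_velocities(reader):
    if reader.read(1):
        return (0, 0, 0), (0, 0, 0)
    vals = [reader.read(VEL_BITS) - OFFSET for _ in range(6)]
    return tuple(vals[:3]), tuple(vals[3:])


w = BitWriter()
write_velocities(w, True, (0, 0, 0), (0, 0, 0))
print(len(w.bits))  # prints 1
```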

To optimize bandwidth further we need to use lossy techniques. For example, we can reduce the precision of the physics state sent over the network by bounding position in some min/max range and quantizing it to a resolution of 1/1000th of a centimeter and sending that quantized position as an integer value in some known range. The same basic approach can be used for linear and angular velocity. For rotation I used the smallest three representation of a quaternion.【Further savings require lossy techniques: bound position to a min/max range, quantize it to 1/1000 cm, and send it as an integer within a known range. Linear and angular velocity work the same way, and rotation uses the "smallest three" quaternion representation.】
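
A minimal sketch of both lossy steps, under assumptions: a hypothetical position range of ±256 m quantized at 1/1000 cm (1e-5 m), and the smallest-three scheme, which drops the largest-magnitude quaternion component (sending its index instead) and rebuilds it from the unit-length constraint. Flipping the sign so the dropped component is positive is safe because q and -q encode the same rotation. The three remaining components lie in [-1/√2, 1/√2], so they can be quantized with the same `quantize` helper.

```python
import math


def quantize(value, lo, hi, resolution):
    """Clamp to [lo, hi] and convert to an integer number of resolution steps."""
    value = min(max(value, lo), hi)
    return round((value - lo) / resolution)


def dequantize(q, lo, resolution):
    return lo + q * resolution


def bits_needed(lo, hi, resolution):
    """Bits required to send any quantized value in [lo, hi]."""
    return math.ceil(math.log2((hi - lo) / resolution + 1))


def smallest_three(q):
    """Encode a unit quaternion as (index of largest |component|, the other three)."""
    largest = max(range(4), key=lambda i: abs(q[i]))
    sign = -1.0 if q[largest] < 0 else 1.0   # q and -q are the same rotation
    rest = [sign * q[i] for i in range(4) if i != largest]
    return largest, rest


def from_smallest_three(largest, rest):
    """Rebuild the dropped component from the unit-length constraint."""
    missing = math.sqrt(max(0.0, 1.0 - sum(c * c for c in rest)))
    out = list(rest)
    out.insert(largest, missing)
    return out


# Hypothetical bounds: +/-256 m at 1/1000 cm (1e-5 m) resolution.
print(bits_needed(-256.0, 256.0, 1e-5))  # prints 26
```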

But while this saves bandwidth, it also adds risk. My concern was that if we are networking a stack of cubes (for example, 10 or 20 cubes placed on top of each other), maybe the quantization would create errors that add jitter to that stack. Perhaps it would even cause the stack to become unstable, but in a particularly annoying and hard to debug way, where the stack looks fine for you, and is only unstable in the remote view (eg. the non-authority simulation), where another player is watching what you do.【This saves bandwidth but adds risk: with a stack of 10 or 20 cubes, quantization errors might add jitter or even destabilize the stack, in a hard-to-debug way, since it looks fine locally and is only unstable in the remote (non-authority) view.】

The best solution to this problem that I found was to quantize the state on both sides. This means that before each physics simulation step, I capture and quantize the physics state exactly the same way as when it’s sent over the network, then I apply this quantized state back to the local simulation.【The fix: quantize on both sides. Before every physics step, capture and quantize the state exactly as for sending, then apply the quantized state back to the local simulation.】

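Now the extrapolation from quantized state on the non-authority side exactly matches the authority simulation, minimizing jitter in large stacks. At least, in theory.【Now extrapolation from the quantized state on the non-authority side matches the authority simulation exactly, minimizing jitter in large stacks, at least in theory.】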

Coming To Rest

But quantizing the physics state created some very interesting side-effects!【Quantizing the physics state had some interesting side effects.】

  1. PhysX doesn’t really like you forcing the state of each rigid body at the start of every frame and makes sure you know by taking up a bunch of CPU.【Forcing every rigid body's state at the start of each frame costs PhysX a lot of CPU.】
  2. Quantization adds error to position which PhysX tries very hard to correct, snapping cubes immediately out of penetration with huge pops!【Quantization adds position error that PhysX tries hard to correct, pushing cubes out of penetration with huge pops.】
  3. Rotations can’t be represented exactly either, again causing penetration. Interestingly in this case, cubes can get stuck in a feedback loop where they slide across the floor!【Rotations can't be represented exactly either, which also causes penetration; cubes can get stuck in a feedback loop and slide across the floor.】
  4. Although cubes in large stacks seem to be at rest, close inspection in the editor reveals that they are actually jittering by tiny amounts, as cubes are quantized just above surface and falling towards it.【Cubes in large stacks look at rest, but the editor shows they jitter slightly, because they are quantized just above the surface and keep falling towards it.】

There’s not much I could do about the PhysX CPU usage, but the solution I found for the depenetration was to set maxDepenetrationVelocity on each rigid body, limiting the velocity that cubes are pushed apart with. I found that one meter per-second works very well.【Little can be done about PhysX's CPU usage, but depenetration is tamed by setting maxDepenetrationVelocity on each rigid body to limit how fast cubes are pushed apart; one meter per second works well.】

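Getting cubes to come to rest reliably was much harder. The solution I found was to disable the PhysX at rest calculation entirely and replace it with a ring-buffer of positions and rotations per-cube. If a cube has not moved or rotated significantly in the last 16 frames, I force it to rest. Boom. Perfectly stable stacks with quantization.【Reliable resting was much harder. The fix was to disable PhysX's own at-rest calculation (not PhysX itself) and replace it with a per-cube ring buffer of positions and rotations: if a cube has not moved or rotated significantly in the last 16 frames, it is forced to rest. The result is perfectly stable stacks with quantization.】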
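
A minimal sketch of that rest detector, assuming one instance per cube, fed once per frame: the ring buffer is a fixed-length deque, and the cube is forced to rest when every recorded pose stays within a threshold of the oldest one. The thresholds and class name are hypothetical, and comparing quaternion components directly ignores that q and -q are the same rotation.

```python
from collections import deque


class RestDetector:
    """Per-cube ring buffer of recent poses used to force a cube to rest."""

    def __init__(self, frames=16, pos_eps=1e-3, rot_eps=1e-3):
        self.history = deque(maxlen=frames)   # oldest entries drop off automatically
        self.pos_eps = pos_eps                # hypothetical thresholds
        self.rot_eps = rot_eps

    def update(self, position, rotation):
        """Record this frame's pose; return True if the cube should be forced to rest."""
        self.history.append((tuple(position), tuple(rotation)))
        if len(self.history) < self.history.maxlen:
            return False
        p0, r0 = self.history[0]
        for p, r in self.history:
            if max(abs(a - b) for a, b in zip(p, p0)) > self.pos_eps:
                return False
            if max(abs(a - b) for a, b in zip(r, r0)) > self.rot_eps:
                return False
        return True


detector = RestDetector()
results = [detector.update((0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0)) for _ in range(16)]
print(results[14], results[15])  # prints False True
```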

Now this might seem like a hack, but short of actually getting in the PhysX source code and rewriting the PhysX solver and at rest calculations, which I’m certainly not qualified to do, I didn’t see any other option. I’m happy to be proven wrong though, so if you find a better way to do this, please let me know 🙂【This may look like a hack, but the only alternative would be rewriting the PhysX solver and at-rest calculations; if you know a better way, please say so.】

Priority Accumulator

The next big bandwidth optimization I did was to send only a subset of cubes in each packet. This gave me fine control over the amount of bandwidth sent, by setting a maximum packet size and sending only the set of updates that fit in each packet.【Next: send only a subset of cubes in each packet. A maximum packet size plus sending only the updates that fit gives fine control over bandwidth.】

Here’s how it works in practice:【How it works:】

  1. Each cube has a priority factor which is calculated each frame. Higher values are more likely to be sent. Negative values mean “don’t send this cube”.【Each cube's priority factor is recomputed every frame; higher values are more likely to be sent, and negative values mean "don't send".】
  2. If the priority factor is positive, it’s added to the priority accumulator value for that cube. This value persists between simulation updates such that the priority accumulator increases each frame, so cubes with higher priority rise faster than cubes with low priority.【A positive factor is added to the cube's accumulator, which persists between updates (it is not reset each frame), so higher-priority cubes rise faster.】
  3. Negative priority factors clear the priority accumulator to -1.0.【A negative factor resets the accumulator to -1.0.】
  4. When a packet is sent, cubes are sorted in order of highest priority accumulator to lowest. The first n cubes become the set of cubes to potentially include in the packet. Objects with negative priority accumulator values are excluded.【When sending, cubes are sorted by accumulator from highest to lowest; the first n are candidates for the packet, and cubes with negative accumulators are excluded.】
  5. The packet is written and cubes are serialized to the packet in order of importance. Not all state updates will necessarily fit in the packet, since cube updates have a variable encoding depending on their current state (at rest vs. not at rest and so on). Therefore, packet serialization returns a flag per-cube indicating whether it was included in the packet.【Cubes are serialized in order of importance. Not every update necessarily fits, because the encoded size varies with the cube's state (at rest or not, and so on), so serialization returns a per-cube flag saying whether it was included.】
  6. Priority accumulator values for cubes sent in the packet are cleared to 0.0, giving other cubes a fair chance to be included in the next packet.【Accumulators of the cubes that were sent are reset to 0.0, giving other cubes a fair chance in the next packet.】
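
The steps above can be sketched as follows. Priority factors, per-cube encoded sizes, and the packet budget are inputs supplied by the caller (hypothetical values here), and `fill_packet` keeps trying lower-priority candidates after one does not fit, which is one reasonable reading of the per-cube "was it included" flag.

```python
def update_accumulators(factors, accumulators):
    """Steps 1-3: add positive factors, clear to -1.0 on negative ones."""
    for i, factor in enumerate(factors):
        if factor < 0:
            accumulators[i] = -1.0
        else:
            accumulators[i] += factor


def candidates(accumulators, n):
    """Step 4: the n highest non-negative accumulators, highest first."""
    order = [i for i, a in enumerate(accumulators) if a >= 0]
    order.sort(key=lambda i: accumulators[i], reverse=True)
    return order[:n]


def fill_packet(order, encoded_bits, max_bits):
    """Step 5: serialize in priority order; return the cubes that fit."""
    used, included = 0, []
    for i in order:
        if used + encoded_bits[i] <= max_bits:
            used += encoded_bits[i]
            included.append(i)
    return included


def mark_sent(included, accumulators):
    """Step 6: reset sent cubes so others get a fair chance next packet."""
    for i in included:
        accumulators[i] = 0.0


acc = [0.0, 0.0, 0.0, 0.0]
update_accumulators([1.0, 2.0, -1.0, 0.5], acc)
order = candidates(acc, 3)                                # [1, 0, 3]
sent = fill_packet(order, [40, 50, 60, 30], max_bits=80)
mark_sent(sent, acc)
print(sent, acc)  # prints [1, 3] [1.0, 0.0, -1.0, 0.0]
```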

For this demo I found some value in boosting priority for cubes recently involved in high energy collisions, since high energy collision was the largest source of divergence due to non-deterministic results. I also boosted priority for cubes recently thrown by players.【High-energy collisions were the largest source of divergence from non-determinism, so cubes recently involved in them got a priority boost, as did cubes recently thrown by players.】

Somewhat counter-intuitively, reducing priority for at rest cubes gave bad results. My theory is that since the simulation runs on both sides, at rest cubes would get slightly out of sync and not be corrected quickly enough, causing divergence when other cubes collided with them.【Counter-intuitively, lowering the priority of at-rest cubes gave bad results. My guess: both sides simulate, so at-rest cubes drift slightly out of sync and are not corrected quickly enough, causing divergence when other cubes hit them.】

Delta Compression

Even with all the techniques so far, it still wasn’t optimized enough. With four players I really wanted to get the cost per-player down under 256kbps, so the entire simulation could fit into 1mbps for the host.【Still not enough: with four players, I wanted each player under 256 kbps so the whole simulation fits in about 1 Mbps at the host (4 × 256 kbps = 1024 kbps).】

I had one last trick remaining: delta compression.【One last trick: delta compression.】

First person shooters often implement delta compression by compressing the entire state of the world relative to a previous state. In this technique, a previous complete world state or ‘snapshot’ acts as the baseline, and a set of differences, or delta, between the baseline and the current snapshot is generated and sent down to the client.【First-person shooters often delta-compress the entire world state against an earlier complete snapshot: that snapshot is the baseline, and only the differences between it and the current snapshot are sent to the client.】

This technique is (relatively) easy to implement because the state for all objects are included in each snapshot, thus all the server needs to do is track the most recent snapshot received by each client, and generate deltas from that snapshot to the current.【This is relatively easy because every snapshot contains every object: the server only tracks the latest snapshot each client has received and generates the delta from it to the current one.】

However, when a priority accumulator is used, packets don’t contain updates for all objects and delta encoding becomes more complicated. Now the server (or authority-side) can’t simply encode cubes relative to a previous snapshot number. Instead, the baseline must be specified per-cube, so the receiver knows which state each cube is encoded relative to.【With a priority accumulator, packets do not contain every object, so delta encoding gets harder: the baseline must be specified per cube, so the receiver knows what each cube is encoded against.】

The supporting systems and data structures are also much more complicated:【The supporting systems and data structures are also much more complex:】

  1. A reliability system is required that can report back to the sender which packets were received, not just the most recently received snapshot.【The receiver must be able to tell the sender which packets arrived, not just the latest snapshot.】
  2. The sender needs to track the states included in each packet sent, so it can map packet level acks to sent states and update the most recently acked state per-cube. The next time a cube is sent, its delta is encoded relative to this state as a baseline.【The sender records which cube states went into each packet, so a packet-level ack can be mapped back to those states and each cube's most recently acked state updated; the next time the cube is sent, its delta is encoded against that state.】(In other words, packet-level acks are what identify each cube's baseline.)
  3. The receiver needs to store a ring-buffer of received states per-cube, so it can reconstruct the current cube state from a delta by looking up the baseline in this ring-buffer.【The receiver keeps a per-cube ring buffer of received states, looks up the baseline there, and reconstructs the current state from the delta.】
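
A minimal sketch of these supporting structures, assuming the reliability layer already reports acks by packet sequence number: the sender remembers which cube states each packet carried and promotes them to per-cube baselines when that packet is acked, and the receiver keeps a per-cube ring buffer indexed by sequence number. States are opaque values, the class names are illustrative, and sequence wrap-around is ignored for brevity.

```python
class Sender:
    def __init__(self):
        self.sent = {}    # packet sequence -> {cube_id: state sent in that packet}
        self.acked = {}   # cube_id -> (sequence, state) of its newest acked state

    def on_send(self, seq, states):
        self.sent[seq] = dict(states)

    def on_ack(self, seq):
        """Map a packet-level ack back to the cube states that packet carried."""
        for cube_id, state in self.sent.pop(seq, {}).items():
            prev = self.acked.get(cube_id)
            if prev is None or seq > prev[0]:
                self.acked[cube_id] = (seq, state)

    def baseline(self, cube_id):
        """(sequence, state) to delta-encode against, or None to send in full."""
        return self.acked.get(cube_id)


class Receiver:
    def __init__(self, size=256):
        self.size = size
        self.buffers = {}  # cube_id -> ring buffer of (sequence, state)

    def store(self, cube_id, seq, state):
        buf = self.buffers.setdefault(cube_id, [None] * self.size)
        buf[seq % self.size] = (seq, state)

    def lookup(self, cube_id, seq):
        """Return the stored baseline for this sequence, or None if overwritten."""
        buf = self.buffers.get(cube_id)
        entry = buf[seq % self.size] if buf else None
        return entry[1] if entry and entry[0] == seq else None


s = Sender()
s.on_send(1, {7: "A"})
s.on_send(2, {7: "B"})
s.on_ack(2)
s.on_ack(1)            # a late ack for an older packet does not replace the baseline
print(s.baseline(7))   # prints (2, 'B')
```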

But ultimately, it’s worth the extra complexity, because this system combines the flexibility of being able to dynamically adjust bandwidth usage, with the orders of magnitude bandwidth improvement you get from delta encoding.【The complexity is worth it: bandwidth can be adjusted dynamically, and delta encoding brings an order-of-magnitude improvement.】

Delta Encoding

Now that I have the supporting structures in place, I actually have to encode the difference of a cube relative to a previous baseline state. How is this done?【With the supporting structures in place, how is the difference from a baseline actually encoded?】

The simplest way is to encode cubes that haven’t changed from the baseline value as just one bit: not changed. This is also the easiest gain you’ll ever see, because at any time most cubes are at rest, and therefore aren’t changing state.【Simplest: one bit meaning "not changed". This is a big win, since at any time most cubes are at rest and not changing.】

A more advanced strategy is to encode the difference between the current and baseline values, aiming to encode small differences with fewer bits. For example, delta position could be (-1,+2,+5) from baseline. I found this works well for linear values, but breaks down for deltas of the smallest three quaternion representation, as the largest component of a quaternion is often different between the baseline and current rotation.【A more advanced strategy encodes the difference between current and baseline values, so that small differences take fewer bits, e.g. a position delta of (-1, +2, +5). This works well for linear values but breaks down for the smallest-three quaternion encoding, because the largest component often differs between the baseline and the current rotation.】
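A sketch of "small differences with fewer bits" for linear values follows. This is one possible layout, not the one the sample uses. Each component's signed delta is zigzag-mapped to an unsigned value, then written as a 1-bit size flag followed by either 4 or 17 bits. The bit widths are assumptions, with 17 bits covering any delta between 16-bit values.

```python
def write_bits(bits, value, count):
    for i in reversed(range(count)):
        bits.append((value >> i) & 1)


def read_bits(bits, pos, count):
    value = 0
    for i in range(count):
        value = (value << 1) | bits[pos + i]
    return value, pos + count


def zigzag(d):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * d if d >= 0 else -2 * d - 1


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def encode_delta(bits, current, baseline, small_bits=4, large_bits=17):
    """Per component: flag 0 + small_bits for small deltas, else flag 1 + large_bits."""
    for c, b in zip(current, baseline):
        z = zigzag(c - b)
        if z < (1 << small_bits):
            bits.append(0)
            write_bits(bits, z, small_bits)
        else:
            bits.append(1)
            write_bits(bits, z, large_bits)


def decode_delta(bits, baseline, small_bits=4, large_bits=17):
    pos, out = 0, []
    for b in baseline:
        large = bits[pos]
        pos += 1
        z, pos = read_bits(bits, pos, large_bits if large else small_bits)
        out.append(b + unzigzag(z))
    return tuple(out)
```

The (-1, +2, +5) delta from the text costs 15 bits here, versus 48 bits for three full 16-bit components.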

 
 

Furthermore, while encoding the difference gives some gains, it didn’t provide the order of magnitude improvement I was hoping for. In a desperate, last hope, I came up with a delta encoding strategy that included prediction. In this approach, I predict the current state from the baseline assuming the cube is moving ballistically under acceleration due to gravity.【Difference encoding helped, but not by an order of magnitude. As a last resort the author added prediction: starting from the baseline, predict the current state by assuming the cube moves ballistically under gravity.】

 
 

Prediction was complicated by the fact that the predictor must be written in fixed point, because floating point calculations are not necessarily guaranteed to be deterministic. But after a few days of tweaking and experimentation, I was able to write a ballistic predictor for position, linear and angular velocity that matched the PhysX integrator within quantize resolution about 90% of the time.【The predictor had to be written in fixed point, because floating-point results are not guaranteed to be deterministic across machines. After a few days of tuning, the ballistic predictor for position, linear velocity, and angular velocity matched the PhysX integrator to within the quantization resolution about 90% of the time.】
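A small integer-only predictor shows why fixed point matters. This is not the author's predictor and does not reproduce PhysX's integrator. It is a minimal sketch that assumes a Q16.16 format, a 60 Hz step, gravity along -y, and semi-implicit Euler, and it leaves out angular velocity.

```python
FRAC_BITS = 16              # hypothetical Q16.16 fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(x):
    return int(round(x * ONE))


def fixed_mul(a, b):
    # integer product rescaled by an arithmetic shift (rounds toward -infinity)
    return (a * b) >> FRAC_BITS


GRAVITY_Y = to_fixed(-9.8)  # m/s^2, y axis up (assumption)
DT = to_fixed(1.0 / 60.0)   # assumed 60 Hz physics step


def predict_ballistic(position, velocity, steps):
    """Advance a cube `steps` frames under gravity only, using integer math only.

    Semi-implicit Euler: update velocity first, then position with the new velocity.
    Angular velocity, damping and contacts are deliberately ignored in this sketch.
    """
    x, y, z = position
    vx, vy, vz = velocity
    dvy = fixed_mul(GRAVITY_Y, DT)  # velocity change per step
    for _ in range(steps):
        vy += dvy
        x += fixed_mul(vx, DT)
        y += fixed_mul(vy, DT)
        z += fixed_mul(vz, DT)
    return (x, y, z), (vx, vy, vz)
```

Every operation is an integer multiply, add, or shift, so two machines running the same code get bit-identical predictions. In C or C++ the products would need 64-bit intermediates; Python integers do not overflow.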

 
 

These lucky cubes get encoded with another bit: perfect prediction, leading to another order of magnitude improvement. For cases where the prediction doesn’t match exactly, I encoded small error offset relative to the prediction.【Correctly predicted cubes cost one more bit ("perfect prediction"), another order-of-magnitude gain. When the prediction is not exact, a small error offset relative to the prediction is encoded.】
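The two cases can be sketched as a bit layout. The 6-bit signed offset and the error-out fallback are assumptions for illustration, not the sample's actual format.

```python
def encode_against_prediction(current, predicted, offset_bits=6):
    """Hypothetical layout: flag 1 = perfect prediction; flag 0 + signed offsets otherwise.

    Returns a list of bits. Raises if an error does not fit in `offset_bits`; a real
    encoder would fall back to another encoding in that case (not shown).
    """
    if current == predicted:
        return [1]
    bits = [0]
    limit = 1 << (offset_bits - 1)
    mask = (1 << offset_bits) - 1
    for c, p in zip(current, predicted):
        error = c - p
        if not -limit <= error < limit:
            raise ValueError("prediction error too large for a small offset")
        u = error & mask  # two's complement in offset_bits bits
        bits.extend((u >> i) & 1 for i in reversed(range(offset_bits)))
    return bits
```

A perfectly predicted cube costs 1 bit. A near miss on a 3-component position costs 1 + 3 × 6 = 19 bits.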

 
 

In the time I had to spend, I was not able to get a good predictor for rotation. I blame this on the smallest three representation, which is highly numerically unstable, especially in fixed point. In the future, I would not use the smallest three representation for quantized rotations.【No good rotation predictor was found in the time available. The author blames the smallest-three representation, which is numerically unstable, especially in fixed point, and would not use it for quantized rotations again.】

 
 

It was also painfully obvious while encoding differences and error offsets that using a bitpacker was not the best way to read and write these quantities. I’m certain that something like a range coder or arithmetic compressor that can represent fractional bits, and dynamically adjust its model to the differences would give much better results, but I was already within my bandwidth budget at this point and couldn’t justify any further noodling 🙂【While encoding differences and error offsets, it became obvious that a bitpacker is not the best way to read and write these values. A range coder or arithmetic compressor, which can represent fractional bits and adapt its model to the differences, would do much better. The bandwidth budget was already met, so the author stopped there 🙂】

 
 

 
 

Synchronizing Avatars

 
 

After several months of work, I had made the following progress:【After several months of work, the progress was:】

 
 

  • Proof that state synchronization works with Unity and PhysX【state synchronization shown to work with Unity and PhysX】
  • Stable stacks in the remote view while quantizing state on both sides【stacks stay stable in the remote view with state quantized on both sides】
  • Bandwidth reduced to the point where all four players can fit in 1mbps【bandwidth for all four players fits within 1 Mbps】

 
 

The next thing I needed to implement was interaction with the simulation via the touch controllers. This part was a lot of fun, and was my favorite part of the project 🙂【Next: interaction through the Touch controllers, the author's favorite part of the project.】

 
 

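I hope you enjoy these interactions. There was a lot of experimentation and tuning to make simple things like picking up, throwing, passing from hand to hand feel good, even crazy adjustments to ensure throwing worked great, while placing objects on top of high stacks could still be done with high accuracy.【Making picking up, throwing, and hand-to-hand passing feel good took a lot of experimentation and tuning, including unusual adjustments so that throwing worked well while objects could still be placed precisely on top of high stacks.】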

 
 

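But when it comes to networking, in this case the game code doesn’t count. All the networking cares about is that avatars are represented by a head and two hands driven by the tracked headset and touch controller positions and orientations.【For networking, the interaction code itself does not matter. All that matters is that an avatar is a head and two hands, driven by the tracked positions and orientations of the headset and Touch controllers.】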

 
 

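To synchronize this I captured the position and orientation of the avatar components in FixedUpdate along the rest of the physics state, and applied this state to the avatar components in the remote view.【The avatar components' positions and orientations were captured in FixedUpdate along with the rest of the physics state, and applied to the avatar components in the remote view.】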

 
 

But when I first tried this it looked absolutely awful. Why?【The first attempt looked terrible. Why?】

 
 

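After a bunch of debugging I worked out that the avatar state was sampled from the touch hardware at render framerate in Update, and was applied on the other machine at FixedUpdate, causing jitter because the avatar sample time didn’t line up with the current time in the remote view.【Debugging showed the cause. Avatar state was sampled from the Touch hardware at render frame rate in Update, but applied on the other machine in FixedUpdate. The sample times did not line up with the remote view's current time, which caused jitter.】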

 
 

To fix this I stored the difference between physics and render time when sampling avatar state, and included this in the avatar state in each packet. Then I added a jitter buffer with 100ms delay to received packets, solving network jitter from time variance in packet delivery and enabling interpolation between avatar states to reconstruct a sample at the correct time.【Fix: when sampling avatar state, store the difference between physics time and render time, and include it in each packet's avatar state. Then delay received packets through a 100 ms jitter buffer. This absorbs variation in packet delivery time and allows interpolating between avatar states to reconstruct a sample at the correct time.】
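A minimal jitter-buffer sketch follows. It assumes that each packet's sample time is the sender's physics time plus the stored render-minus-physics offset. Values are single floats for brevity, whereas a real avatar would interpolate positions and slerp rotations. Class and function names are hypothetical, and old samples are never pruned here.

```python
import bisect


class JitterBuffer:
    """Plays avatar samples back `delay` seconds late and interpolates between them."""

    def __init__(self, delay=0.1):  # 100 ms, as in the text
        self.delay = delay
        self.times = []   # kept sorted, so out-of-order packets are handled
        self.values = []

    def add(self, sample_time, value):
        i = bisect.bisect(self.times, sample_time)
        self.times.insert(i, sample_time)
        self.values.insert(i, value)

    def sample(self, now):
        if not self.times:
            return None
        t = now - self.delay
        i = bisect.bisect(self.times, t)  # times[i-1] <= t < times[i]
        if i == 0:
            return self.values[0]         # too early: clamp to the oldest sample
        if i == len(self.times):
            return self.values[-1]        # buffer ran dry: hold the newest sample
        t0, t1 = self.times[i - 1], self.times[i]
        alpha = (t - t0) / (t1 - t0)
        return self.values[i - 1] + alpha * (self.values[i] - self.values[i - 1])


def avatar_sample_time(physics_time, render_minus_physics):
    """Reconstruct when the hardware was sampled, from the offset sent in each packet."""
    return physics_time + render_minus_physics
```

Playing back 100 ms late means that a packet which arrives somewhat late still lands in the buffer before it is needed. The price is a constant 100 ms of added latency on remote avatars.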

 
 

To synchronize cubes held by avatars, while a cube is parented to an avatar’s hand, I set the cube’s priority factor to -1, stopping it from being sent with regular physics state updates. While a cube is attached to a hand, I include its id and relative position and rotation as part of the avatar state. In the remote view, cubes are attached to the avatar hand when the first avatar state arrives with that cube parented to it, and detached when regular physics state updates resume, corresponding to the cube being thrown or released.【To sync held cubes: while a cube is parented to an avatar's hand, its priority factor is set to -1, so it is not sent with regular physics updates. Instead, its id and its position and rotation relative to the hand go into the avatar state. On the remote side, the cube is attached to the hand when the first avatar state listing it arrives. It is detached when regular physics updates resume, that is, when it is thrown or released.】
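How a -1 priority factor keeps held cubes out of regular updates can be sketched with a priority accumulator. The accumulator scheme, the names, and the per-packet cube limit below are assumptions for illustration, not taken from this section.

```python
HELD = -1.0  # priority factor used while a cube is parented to a hand


def select_cubes_to_send(accumulators, priority_factors, max_cubes):
    """Choose one packet's worth of cubes with a priority accumulator (assumed scheme).

    Cubes with a negative factor are held in a hand: they never accumulate priority
    and are never chosen, because their state travels inside the avatar state instead.
    """
    for cube_id, factor in priority_factors.items():
        if factor < 0:
            accumulators[cube_id] = 0.0
        else:
            accumulators[cube_id] = accumulators.get(cube_id, 0.0) + factor
    candidates = [c for c, f in priority_factors.items() if f >= 0]
    candidates.sort(key=lambda c: accumulators[c], reverse=True)
    chosen = candidates[:max_cubes]
    for cube_id in chosen:
        accumulators[cube_id] = 0.0  # sent in this packet: start accumulating again
    return chosen
```

Because a held cube's accumulator is reset every frame, it re-enters the normal rotation with no backlog once it is released and its factor returns to a non-negative value.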

 
 

 
 

Bidirectional Flow

 
 

Now that I had player interaction with the scene working with the touch controllers, it was time to start thinking about how the second player can interact with the scene as well.【With controller interaction working, the next step is letting the second player interact with the scene too.】

 
 

To do this without going insane switching between two headsets all the time (!!!), I extended my Unity test scene to be able to switch between the context of player one (left) and player two (right).【To avoid constantly swapping between two headsets during testing, the Unity test scene was extended to switch between player one (left) and player two (right).】

 
 

I called the first player the "host" and the second player the "guest". In this model, the host is the "real" simulation, and by default synchronizes all cubes to the guest player, but as the guest interacts with the world, it takes authority over these objects and sends state for them back to the host player.【The first player is the "host" and the second the "guest". The host runs the "real" simulation and by default synchronizes all cubes to the guest. As the guest interacts with the world, it takes authority over those objects and sends their state back to the host.】

 
 

To make this work without inducing obvious conflicts the host and guest both check the local state of cubes before taking authority and ownership. For example, the host won’t take ownership over a cube already under ownership of the guest, and vice versa, while authority is allowed to be taken, to let players throw cubes at somebody else’s stack and knock it over while it’s being built.【To avoid obvious conflicts, host and guest both check a cube's local state before taking authority or ownership. Neither side takes ownership of a cube the other already owns. Authority may still be taken, though, so a player can throw cubes at someone else's stack and knock it over while it is being built.】

 
 

Generalizing further to four players, in the networked physics sample, all packets flow through the host player, making the host the arbiter. In effect, rather than being truly peer-to-peer, a topology is chosen that all guests in the game communicate only with the host player. This lets the host decide which updates to accept, and which updates to ignore and subsequently correct.【With four players, all packets flow through the host, which acts as arbiter. Rather than true peer-to-peer, every guest talks only to the host. The host decides which updates to accept and which to ignore and later correct.】

 
 

To apply these corrections I needed some way for the host to override guests and say, no, you don’t have authority/ownership over this cube, and you should accept this update. I also needed some way for the host to determine ordering for guest interactions with the world, so if one client experiences a burst of lag and delivers a bunch of packets late, these packets won’t take precedence over more recent actions from other guests.【Corrections require two things. First, the host must be able to override a guest ("you don't have authority/ownership of this cube; accept this update"). Second, it must be able to order guests' interactions, so that a burst of late packets from a lagging client does not take precedence over more recent actions by other guests.】

 
 

As per my hunch earlier, this was achieved with two sequence numbers per-cube:【As suspected earlier, this is done with two sequence numbers per cube:】

  1. Authority sequence
  2. Ownership sequence

 
 

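These sequence numbers are sent along with each state update and included in avatar state when cubes are held by players. They are used by the host to determine if it should accept an update from guests, and by guests to determine if the state update from the server is more recent and should be accepted, even when that guest thinks it has authority or ownership over a cube.【These sequence numbers are sent with every state update, and included in the avatar state for cubes that players hold. The host uses them to decide whether to accept a guest's update. A guest uses them to decide whether the host's update is more recent and must be accepted, even if the guest believes it has authority or ownership of that cube.】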

 
 

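Authority sequence increments each time a player takes authority over a cube and when a cube under authority of a player comes to rest. When a cube has authority on a guest machine, it holds authority on that machine until it receives confirmation from the host before returning to default authority. This ensures that the final at rest state for cubes under guest authority are committed back to the host, even under significant packet loss.【The authority sequence increments when a player takes authority over a cube, and again when a cube under a player's authority comes to rest. A guest keeps authority over a cube until the host confirms; only then does the cube return to default authority. This ensures that the final at-rest state of cubes under guest authority is committed to the host even under heavy packet loss (the guest keeps sending that state until the host confirms it).】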

 
 

Ownership sequence increments each time a player grabs a cube. Ownership is stronger than authority, such that an increase in ownership sequence wins over an increase in authority sequence number. For example, if a player interacts with a cube just before another player grabs it, the player who grabbed it wins.【The ownership sequence increments each time a player grabs a cube. Ownership is stronger than authority: an increase in the ownership sequence beats an increase in the authority sequence. If one player touches a cube just before another grabs it, the one who grabbed it wins.】
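The "ownership beats authority" rule can be sketched as a comparison. Sequence numbers are assumed to be 16-bit and to wrap, so the comparison uses the common half-range trick. The exact acceptance rule below, including accepting ties, is a simplified assumption and not the sample's code.

```python
SEQ_BITS = 16  # assumed width of the per-cube sequence numbers


def sequence_greater_than(s1, s2, bits=SEQ_BITS):
    """True if s1 is more recent than s2, treating the numbers as wrapping around."""
    half = 1 << (bits - 1)
    return (s1 > s2 and s1 - s2 <= half) or (s1 < s2 and s2 - s1 > half)


def next_sequence(s, bits=SEQ_BITS):
    return (s + 1) % (1 << bits)


def should_accept(local_ownership, local_authority, remote_ownership, remote_authority):
    """Hypothetical acceptance rule: ownership is compared first and always dominates;
    authority decides only when the ownership sequences are equal (ties accepted)."""
    if remote_ownership != local_ownership:
        return sequence_greater_than(remote_ownership, local_ownership)
    return remote_authority == local_authority or sequence_greater_than(
        remote_authority, local_authority)
```

A grab (ownership +1) therefore overrides any number of earlier authority changes by another player, which matches the "the player who grabbed it wins" example.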

 
 

In my experience working on this demo I found these rules to be sufficient to resolve conflicts, while letting host and guest players interact with the world lag free. Conflicts requiring corrections are rare in practice even under significant latency, and when they do occur, the simulation quickly converges to a consistent state.【In this demo these rules were enough to resolve conflicts, while host and guests interact with the world without lag. Conflicts needing correction are rare even under significant latency, and when they occur the simulation quickly converges to a consistent state.】

 
 

 
 

Conclusion

 
 

High quality networked physics with stable stacks of cubes is possible with Unity and PhysX using a distributed simulation network model.【With a distributed simulation network model, Unity and PhysX can deliver high-quality networked physics with stable cube stacks.】

 
 

This approach is best used for cooperative experiences only, as it does not provide the security of a server-authoritative network model with dedicated servers and client-side prediction.【Best suited to cooperative experiences only: it lacks the security of a server-authoritative model with dedicated servers and client-side prediction.】

 
 

Thanks to Oculus for sponsoring my work and making this research possible!【Thanks to Oculus for sponsoring the work and making this research possible.】

 
 

The source code for the networked physics sample can be downloaded here.

GPU Gems – Animation in the “Dawn” Demo

4.1 Introduction

 
 

“Dawn” is a demonstration that was created by NVIDIA Corporation to introduce the GeForce FX product line and illustrate how a high-level language (such as HLSL or Cg) could be used to create a realistic human character. The vertex shaders deform a high-resolution mesh through indexed skinning and morph targets, and they provide setup for the lighting model used in the fragment shaders. The skin and wing fragment shaders offer both range and detail that could not have been achieved before the introduction of advanced programmable graphics hardware. See Figure 4-1.

【Dawn is NVIDIA's demonstration of using a high-level shading language (HLSL/Cg) for a realistic human character. Vertex shaders handle skinning and morph targets; fragment shaders handle lighting.】

 
 


Figure 4-1 A Screen Capture of the Real-Time Dawn

 
 

This chapter discusses how programmable graphics hardware was used to accelerate the animation of the Dawn character in the demo.

【This chapter covers using programmable graphics hardware to accelerate Dawn's character animation.】

 
 

 
 

 
 

4.2 Mesh Animation

 
 

Traditionally, mesh animation has been prohibitively expensive for complex meshes because it was performed on the CPU, which was already burdened with physical simulation, artificial intelligence, and other computations required by today’s applications. Newer graphics hardware has replaced the traditional fixed-function pipeline with programmable vertex and fragment shaders, and it can now alleviate some of that burden from the CPU.

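【Mesh animation used to be very expensive because the per-vertex work ran on the CPU, which also carries many other workloads. The programmable vertex and fragment shaders of newer hardware can take over part of that work.】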

 
 

Sometimes it is still necessary to perform such operations on the CPU. Many stencil-based shadow volume techniques must traverse the transformed mesh in order to find the silhouette edges, and the generation of the dynamic shadow frustum is often best done on the CPU (see Chapter 9, “Efficient Shadow Volume Rendering”). In scenes where the character is drawn multiple times per frame into shadow buffers, glow buffers, and other such temporary surfaces, it may be better to perform the deformations on the CPU if the application becomes vertex-limited. Deciding whether to perform mesh deformations on the CPU or on the GPU should be done on a per-application or even on a per-object basis.

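【Sometimes deformation still belongs on the CPU. Stencil shadow volumes must traverse the transformed mesh to find silhouette edges, and the shadow frustum is usually built on the CPU. If a character is drawn many times per frame (shadow buffers, glow buffers, other temporary surfaces) and the application becomes vertex-limited, CPU deformation may be better. Decide per application, or even per object.】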

 
 

The modeling, texturing, and animation of the Dawn character were done primarily in Alias Systems’ Maya package. We therefore based our mesh animation methods on the tool set the software provides. We have since created a similar demo (“Dusk,” used to launch the GeForce FX 5900) in discreet’s 3ds max package, using the same techniques; these methods are common to a variety of modeling packages and not tied to any single workflow. The methods used in these two demos are (indexed) skinning, where vertices are influenced by a weighted array of matrices, and weighted morph targets, used to drive the emotions on Dawn’s face.

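【Dawn was modeled, textured, and animated in Maya; the follow-up "Dusk" demo used 3ds max with the same techniques. Both use indexed skinning plus weighted morph targets for the facial expressions.】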

 
 

 
 

4.3 Morph Targets

 
 

Using morph targets is a common way to represent complex mesh deformation, and the NVIDIA demo team has created a variety of demos using this technique. The “Zoltar” demo and the “Yeah! The Movie” demo (content provided by Spellcraft Studio) started with 30 mesh interpolants per second, then removed mesh keys based on an accumulated error scheme. This allowed us to reduce the file size and the memory footprint—up to two-thirds of the original keys could be removed with little to no visible artifacts. In this type of mesh interpolation, there are only two interpolants active at any given time, and they are animated sequentially.

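【Morph targets are a common way to represent complex mesh deformation, and NVIDIA has built many demos with them. "Zoltar" and "Yeah! The Movie" started with 30 mesh keys per second and removed keys by accumulated error. Up to two-thirds of the keys could be dropped with little visible difference. Only two keys are active at a time, played sequentially.】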

 
 

Alternatively, morph targets can be used in parallel. Dawn is a standard example of how this approach can be useful. Beginning with a neutral head (27,000 triangles), our artist created 50 copies of that head and modeled them into an array of morph targets, as shown in Figure 4-2. Approximately 30 of those heads corresponded to emotions (such as happy, sad, thoughtful, and so on), and 20 more were modifiers (such as left eyebrow up, right eyebrow up, smirk, and so on). In this style of animation, the morph target weights will probably not add to 1, because you may have (0.8 * happy + 1.0 * ear_wiggle), for example—Dawn is a fairy, after all.

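【Morph targets can also be applied in parallel. Dawn's neutral head has 27,000 triangles, and the artist made 50 copies of it as morph targets: about 30 emotions and 20 modifiers, some shown below. The result is a weighted sum of several targets, and the weights need not add to 1.】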

 
 


Figure 4-2 Emotional Blend Targets (Blend Shapes)

 
 

Although such complex emotional faces could have been made entirely of blends of more elemental modifiers, our artist found it more intuitive to model the face in the pose he desired, because it is hard to model an element such as an eyebrow creasing, without seeing how the eyes, cheeks, and mouth work together. This combination also helps with hardware register limitations, described later.

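【The emotional faces could have been built from elemental modifiers alone, but the artist found it more intuitive to model whole poses, since a detail like a creased eyebrow is hard to judge without seeing the eyes, cheeks, and mouth. Whole poses also help with the hardware register limit described later.】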

 
 

 
 

4.3.1 Morph Targets in a High-Level Language

 
 

Luckily, the implementation of morph targets in HLSL or Cg is simple. Assuming that vertexIn is our structure containing per-vertex data, applying morph targets in a linear or serial fashion is easy:

【Morph targets are simple to implement in a shading language. Serial (key-to-key) interpolation looks like this:】

 
 

float4 position = (1.0f - interp) * vertexIn.prevPositionKey + interp * vertexIn.nextPositionKey;

 
 

In this code, interp is a constant input parameter in the shader, but prevPositionKey and nextPositionKey are the positions at the prior time and next time, respectively. When applying morph targets in parallel, we find the spatial difference between the morph target and the neutral pose, which results in a difference vector. We then weight that difference vector by a scalar. The result is that a weight of 1.0 will apply the per-vertex offsets to achieve that morph target, but each morph target can be applied separately. The application of each morph target is just a single “multiply-add” instruction:

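【interp is a constant shader input; prevPositionKey/nextPositionKey are the positions at the previous and next keys. For parallel targets, each target is stored as a difference from the neutral pose and added with its weight, one multiply-add per target:】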

 
 

// vertexIn.positionDiffN = position morph target N - neutralPosition

 
 

float4 position = neutralPosition;

position += weight0 * vertexIn.positionDiff0;

position += weight1 * vertexIn.positionDiff1;

position += weight2 * vertexIn.positionDiff2;
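The conclusion notes that these inner loops are just as easy on the CPU. The same parallel blend can be written there as a minimal Python sketch. It assumes positions are plain (x, y, z) tuples and that each target is stored as per-vertex differences from the neutral pose, as in the shader lines above.

```python
def apply_morph_targets(neutral, diffs, weights):
    """Parallel morph targets: neutral + sum_k weight_k * diff_k, per vertex.

    neutral: list of (x, y, z); diffs: one list of per-vertex differences per target.
    The weights need not sum to 1 (e.g. 0.8 * happy + 1.0 * ear_wiggle).
    """
    result = [list(v) for v in neutral]
    for diff, w in zip(diffs, weights):
        if w == 0.0:
            continue  # an inactive target contributes nothing
        for out, d in zip(result, diff):
            for axis in range(3):
                out[axis] += w * d[axis]  # the shader's single multiply-add
    return [tuple(v) for v in result]
```

For example, a one-vertex mesh with targets (1, 0, 0) and (0, 2, 0) at weights 0.8 and 1.0 ends up at (0.8, 2.0, 0.0).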

 
 

 
 

4.3.2 Morph Target Implementation

 
 

We wanted our morph targets to influence both the vertex position and the basis (that is, the normal, binormal, and tangent) so that they might influence the lighting performed in the fragment shader. At first it would seem that one would just execute the previous lines for position, normal, binormal, and tangent, but it is easy to run out of vertex input registers. When we wrote the “Dawn” and “Dusk” demos, the GPU could map a maximum of 16 per-vertex input attributes. The mesh must begin with the neutral position, normal, binormal, texture coordinate, bone weights, and bone indices (described later), leaving 10 inputs open for morph targets. We might have mapped the tangent as well, but we opted to take the cross product of the normal and binormal in order to save one extra input.

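【Morph targets should affect both position and the tangent basis, so that they influence fragment-shader lighting. The limit is vertex input registers: the GPU allowed 16 per-vertex attributes. Neutral position, normal, binormal, texture coordinate, bone weights, and bone indices use 6, leaving 10 for morph targets. The tangent was not stored; it is recovered as the cross product of normal and binormal, saving one input.】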

 
 

Because each difference vector takes one input, we might have 10 blend shapes that influence position, five blend shapes that influence position and normal, three position-normal-binormal blend shapes, or two position-normal-binormal-tangent blend shapes. We ultimately chose to have our vertex shader apply five blend shapes that modified the position and normal. The vertex shader would then orthonormalize the neutral tangent against the new normal (that is, subtract the collinear elements of the new normal from the neutral tangent and then normalize) and take the cross product for the binormal. Orthonormalization is a reasonable approximation for meshes that do not twist around the surface normal:

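【Each difference vector takes one input, so 10 slots allow 10 position-only targets, 5 position+normal, 3 position+normal+binormal, or 2 with all four. Dawn uses five position+normal targets. The neutral tangent is then orthonormalized against the new normal, and the binormal comes from a cross product:】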

 
 

// assumes normal is the post-morph-target result

// normalize only needed if not performed in fragment shader

 
 

float3 tangent = vertexIn.neutralTangent - dot(vertexIn.neutralTangent, normal) * normal;

tangent = normalize(tangent);
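The orthonormalize-then-cross step can be checked on the CPU. This is a Python sketch assuming plain 3-tuples. The handedness binormal = cross(normal, tangent) is an assumption, since the chapter does not state its convention.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def rebuild_basis(neutral_tangent, morphed_normal):
    """Gram-Schmidt the neutral tangent against the morphed normal, then derive the
    binormal. Undefined if the tangent is parallel to the normal (zero-length result)."""
    n = normalize(morphed_normal)
    t = tuple(ti - dot(neutral_tangent, n) * ni for ti, ni in zip(neutral_tangent, n))
    t = normalize(t)
    b = cross(n, t)  # handedness convention is an assumption
    return t, b
```

For example, with tangent (1, 0, 0) and normal (0.6, 0.8, 0), the rebuilt tangent is (0.8, -0.6, 0), perpendicular to the normal.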

 
 

Thus, we had a data set with 50 morph targets, but only five could be active (that is, with weight greater than 0) at any given time. We did not wish to burden the CPU with copying data into the mesh every time a different blend shape became active, so we allocated a mesh with vertex channels for neutralPosition, neutralNormal, neutralBinormal, textureCoord, and 50 * (positionDiff, normalDiff). On a per-frame basis, we merely changed the names of the vertex input attributes so that those that should be active became the valid inputs and those that were inactive were ignored. For each frame, we would find those five position and normal pairs and map those into the vertex shader, allowing all other vertex data to go unused.

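【There are 50 morph targets, but at most five may be active (weight > 0) at once. To avoid having the CPU copy data into the mesh whenever the active set changes, the mesh holds vertex channels for neutralPosition, neutralNormal, neutralBinormal, textureCoord, and 50 × (positionDiff, normalDiff). Each frame only the vertex input attribute mapping changes: the five active position/normal pairs are bound as shader inputs, and all other channels are ignored.】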
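Here is a sketch of the per-frame selection, assuming the animation system supplies all 50 weights every frame. How the chosen indices are then bound to vertex input attributes depends on the graphics API and is not shown. The function name is hypothetical.

```python
MAX_ACTIVE = 5  # position+normal targets that fit in the vertex inputs described above


def choose_active_targets(weights, max_active=MAX_ACTIVE):
    """Return exactly `max_active` (target_index, weight) slots for this frame.

    Active means weight > 0. Unused slots get weight 0.0 so the shader always reads
    `max_active` inputs; which channel an unused slot points at does not matter.
    """
    active = [(i, w) for i, w in enumerate(weights) if w > 0.0]
    if len(active) > max_active:
        raise ValueError(f"{len(active)} blend shapes active, only {max_active} fit")
    return active + [(0, 0.0)] * (max_active - len(active))
```

Raising on overflow mirrors the content constraint described next: the animator had to stay within five active blend shapes.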

 
 

Note that the .w components of the positionDiff and normalDiff were not really storing any useful interpolants. We took advantage of this fact and stored a scalar self-occlusion term in the .w of the neutralNormal and the occlusion difference in each of the normal targets. When extracting the resulting normal, we just used the .xyz modifier to the register, which allowed us to compute a dynamic occlusion term that changed based on whether Dawn’s eyes and mouth were open or closed, without any additional instructions. This provided for a soft shadow used in the lighting of her skin (as described in detail in Chapter 3, “Skin in the ‘Dawn’ Demo”).

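【The .w components of positionDiff/normalDiff carry no useful data. So a self-occlusion term is stored in neutralNormal.w, and an occlusion difference in each normal target's .w. Reading the normal through the .xyz swizzle ignores that channel, while the blended .w yields a dynamic occlusion term (eyes and mouth open or closed) at no extra instruction cost. This term provides soft shadowing for the skin lighting.】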

 
 

On the content-creation side, our animator had no difficulty remaining within the limit of five active blend shapes, because he primarily animated between three or so emotional faces and then added the elemental modifiers for complexity. We separated the head mesh from the rest of the body mesh because we did not want the added work of doing the math or storing the zero difference that, say, the happy face would apply to Dawn’s elbow. The result remained seamless—despite the fact that the head was doing morph targets and skinning while the body was doing just skinning—because the outermost vertices of the face mesh were untouched by any of the emotional blend shapes. They were still modified by the skinning described next, but the weights were identical to the matching vertices in the body mesh. This ensured that no visible artifact resulted.

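【On the content side, five active targets were plenty: the animator mostly blended about three emotional faces and added modifiers. The head mesh was split from the body so that the body does not pay for morph math or store zero differences. The seam stays invisible because no blend shape moves the outermost face vertices, and their skin weights match the corresponding body vertices.】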

 
 

 
 

4.4 Skinning

 
 

Skinning is a method of mesh deformation in which each vertex of that mesh is assigned an array of matrices that act upon it along with weights (that should add up to 1.0) that describe how bound to that matrix the vertex should be. For example, vertices on the bicep may be acted upon only by the shoulder joint, but a vertex on the elbow may be 50 percent shoulder joint and 50 percent elbow joint, becoming 100 percent elbow joint for vertices beyond the curve of the elbow.

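【Skinning deforms a mesh with a skeleton. Each vertex has a set of joint matrices and weights (summing to 1) that say how strongly it follows each joint, e.g. 50% shoulder and 50% elbow at the elbow crease.】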

 
 

Preparing a mesh for skinning usually involves creating a neutral state for the mesh, called a bind pose. This pose keeps the arms and legs somewhat separated and avoids creases as much as possible, as shown in Figure 4-3. First, we create a transform hierarchy that matches this mesh, and then we assign matrix influences based on distance—usually with the help of animation tools, which can do this reasonably well. Almost always, the result must be massaged to handle problems around shoulders, elbows, hips, and the like. This skeleton can then be animated through a variety of techniques. We used a combination of key-frame animation, inverse kinematics, and motion capture, as supported in our content-creation tool.

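【Skinning starts from a bind pose: one neutral pose with the limbs apart and few creases, not a set of keyframes. A matching transform hierarchy is built, influences are assigned by distance (usually with tool help), and problem areas such as shoulders, elbows, and hips are then fixed by hand. The skeleton is animated with keyframes, IK, and motion capture.】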

 
 


Figure 4-3 Dawn’s Bind Pose

 
 

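A skinned vertex is the weighted summation of that vertex being put through its active joints, or:

【The final vertex position is the weighted sum of the vertex transformed by each influencing joint. The matrices are products because each joint's transform is concatenated down the bone hierarchy.】

$$\mathbf{v}' = \sum_{i=1}^{n} w_i \, \mathbf{M}_i \mathbf{B}_i^{-1} \mathbf{v}, \qquad \sum_{i=1}^{n} w_i = 1$$

where $\mathbf{v}$ is the vertex in the bind pose, $\mathbf{B}_i$ is the world matrix of joint $i$ in the bind pose, $\mathbf{M}_i$ is its current world matrix, and $w_i$ is the vertex's weight for that joint ($n = 4$ in Dawn).

Conceptually, this equation takes the vertex from its neutral position into a weighted model space and back into world space for each matrix and then blends the results. The concatenated $\mathbf{M}_i \mathbf{B}_i^{-1}$ matrices are stored as constant parameters, and the matrix indices and weights are passed as vertex properties. The application of four-bone skinning looks like this:

【For each matrix, the vertex goes from the bind pose into that joint's space and back out to world space, and the results are blended. Implementation:】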

 
 

float4 skin(float4x4 bones[98],
            float4 position,        // bind-pose vertex position (w = 1)
            float4 boneWeights0,
            float4 boneIndices0)
{
    // each bones[i] already holds the concatenated current * inverse-bind-pose matrix
    float4 result = boneWeights0.x * mul(bones[(int)boneIndices0.x], position);
    result += boneWeights0.y * mul(bones[(int)boneIndices0.y], position);
    result += boneWeights0.z * mul(bones[(int)boneIndices0.z], position);
    result += boneWeights0.w * mul(bones[(int)boneIndices0.w], position);
    return result;
}

 
 

In the “Dawn” demo, we drive a mesh of more than 180,000 triangles with a skeleton of 98 bones. We found that four matrices per vertex was more than enough to drive the body and head, so each vertex had to have four bone indices and four bone weights stored as vertex input attributes (the last two of the 16 xyzw vertex registers mentioned in Section 4.3.2). We sorted bone weights and bone indices so that we could rewrite the vertex shader to artificially truncate the number of bones acting on the vertex if we required higher vertex performance. Note that if you do this, you must also rescale the active bone weights so that they continue to add up to 1.

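【Dawn's mesh has over 180,000 triangles and 98 bones. Four bones per vertex were more than enough, stored as four indices and four weights (the last two of the 16 vertex registers). Weights and indices are sorted, so the shader can drop the smallest influences for speed. The remaining weights must then be rescaled to sum to 1.】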
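Here is a sketch of truncating influences and rescaling the survivors. The chapter sorts weights offline so that the shader can simply stop early; this Python helper (a hypothetical name) does the sort and the renormalization in one place to show the arithmetic.

```python
def truncate_influences(indices, weights, max_bones):
    """Keep the `max_bones` largest weights and rescale them to sum to 1."""
    pairs = sorted(zip(weights, indices), reverse=True)[:max_bones]
    total = sum(w for w, _ in pairs)
    return [i for _, i in pairs], [w / total for w, _ in pairs]
```

For example, weights 0.5, 0.3, 0.15, 0.05 truncated to two bones become 0.625 and 0.375. Without the rescale they would sum to 0.8, and the skinned vertex would be pulled toward the origin.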

 
 

4.4.1 Accumulated Matrix Skinning

 
 

When skinning, one must apply the matrix and its bind pose inverse not only to the position, but also to the normal, binormal, and tangent for lighting to be correct. If your hierarchy cannot assume that scales are the same across x, y, and z, then you must apply the inverse transpose of this concatenated matrix. If scales are uniform, then the inverse is the transpose, so the matrix remains unchanged. Nonuniform scales create problems in a variety of areas, so our engine does not permit them.

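【The matrix and its inverse bind pose must be applied to the normal, binormal, and tangent as well as the position, or lighting will be wrong. With non-uniform scale you need the inverse transpose of the concatenated matrix. With uniform scale s, the inverse transpose differs from the matrix only by a factor of 1/s², so the matrix can be used directly and the vectors renormalized. That is why the engine forbids non-uniform scale.】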

 
 

If we call the skin function from the previous code, we must call mul for each matrix for each vertex property. In current hardware, multiplying a point by a matrix is implemented as four dot products and three adds, and vector-multiply is three dot products and two adds. Thus, four-bone skinning of position, normal, binormal, and tangent results in:

【Cost of four-bone skinning of these attributes: per bone, one point multiply plus three vector multiplies; times four bones, this is 88 instructions.】

$$4 \times \big[(4 + 3) + 3 \times (3 + 2)\big] = 4 \times 22 = 88 \text{ instructions}$$
 
 

An unintuitive technique that creates the sum of the weighted matrices can be trivially implemented in HLSL or Cg as follows:

【The weighted matrices can be summed first instead (counterintuitive, but trivial to write):】

 
 

float4x4 accumulate_skin(float4x4 bones[98],

 

float4 boneWeights0,

float4 boneIndices0)

{

float4x4 result = boneWeights0.x * bones[boneIndices0.x];

result = result + boneWeights0.y * bones[boneIndices0.y];

result = result + boneWeights0.z * bones[boneIndices0.z];

result = result + boneWeights0.w * bones[boneIndices0.w];

return result;

}
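Matrix-vector multiplication is linear, so blending the matrices first and then transforming gives the same result as transforming by each bone and blending. Here is a small Python check of that equivalence, using hypothetical row-major 4x4 matrices applied to column vectors to match mul(bones[i], position).

```python
def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin_per_bone(bones, indices, weights, v):
    """Transform by each bone, then blend: one mul per bone per attribute."""
    out = [0.0] * 4
    for i, w in zip(indices, weights):
        out = [o + w * x for o, x in zip(out, mat_vec(bones[i], v))]
    return out


def accumulate_matrix(bones, indices, weights):
    """Blend the matrices once; afterwards each attribute needs a single mul."""
    acc = [[0.0] * 4 for _ in range(4)]
    for i, w in zip(indices, weights):
        for r in range(4):
            for c in range(4):
                acc[r][c] += w * bones[i][r][c]
    return acc
```

The saving comes from reuse. The blended matrix is built once per vertex and then applied to position, normal, binormal, and tangent.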

 
 

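Although this technique does burn instructions to build the accumulated matrix (16 multiplies and 12 adds), it now takes only a single matrix multiply to skin a point or vector. Skinning the same properties as before costs:

【Summing the matrices first lowers the total instruction count.】

$$(16 + 12) + \big[(4 + 3) + 3 \times (3 + 2)\big] = 28 + 22 = 50 \text{ instructions}$$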
 
 

 
 

4.5 Conclusion

 
 

It is almost always beneficial to offload mesh animation from the CPU and take advantage of the programmable vertex pipeline offered by modern graphics hardware. Having seen the implementation of skinning and morph targets using shaders, however, it is clear that the inner loops are quite easy to implement using Streaming SIMD Extensions (SSE) instructions and the like, and that in those few cases where it is desirable to remain on the CPU, these same techniques work well.

 
 

In the case of the “Dawn” demo, morph targets were used to drive only the expressions on the head. If we had had more time, we would have used morph targets all over the body to solve problems with simple skinning. Even a well-skinned mesh has the problem that elbows, knees, and other joints lose volume when rotated. This is because the mesh bends but the joint does not get “fatter” to compensate for the pressing of flesh against flesh. A morph target or other mesh deformation applied either before or after the skinning step could provide this soft, fleshy deformation and create a more realistic result. We have done some work on reproducing the variety of mesh deformers provided in digital content-creation tools, and we look forward to applying them in the future.

【Filler; not translated.】

 
 

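【Not much here is worth memorizing technically. The main contribution is showing that NVIDIA's GPUs were strong enough to run a skinned character with this much computation. An avatar this complex has limited practical use. The GPU skinning optimization saves less than 50% even in theory, and the real-world gain is probably smaller still.】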

 
 

SIGGRAPH 15 – The Real-time Volumetric Cloudscapes of Horizon: Zero Dawn


 
 

For slides with proper formatting and video/audio use the PPTX version.

 
 

The following was presented at SIGGRAPH 2015 as part of the Advances in Real-time rendering Course. http://advances.realtimerendering.com

 
 

Authors: Andrew Schneider, Principal FX Artist; Nathan Vos, Principal Tech Programmer

 
 


 
 

Thank you for coming.

 
 

Over the next half hour I am going to be breaking down and explaining the cloud system for Horizon Zero Dawn.

【Overview of the Horizon Zero Dawn cloud system.】

 
 

As Natasha mentioned, my background is in Animated film VFX, with experience programming for voxel systems including clouds.

【The speaker's background is VFX for animated films, including programming voxel systems such as clouds.】

 
 

This was co-developed between myself and a programmer named Nathan Vos. He could not be here today, but his work is an important part of what we were able to achieve with this.

 
 

Horizon was just announced at E3 this year, and this is the first time that we are sharing some of our new tech with the community. What you are seeing here renders in about 2 milliseconds, takes 20 MB of RAM and completely replaces our asset-based cloud solutions in previous games.

 
 

Before I dive into our approach and justification for those 2 milliseconds, let me give you a little background to explain why we ended up developing a procedural volumetric system for skies in the first place.

【First, background on why a procedural volumetric sky system was developed at all.】

 
 

In the past, Guerrilla has been known for the KILLZONE series of games, which are first-person shooters.

 
 


 
 

FPS usually restrict the player to a predefined track, which means that we could hand place elements like clouds using billboards and highly detailed sky domes to create a heavily art directed sky.

【FPS games usually keep the player on a predefined track, so hand-placed billboard clouds and detailed sky domes were enough.】

 
 

These domes and cards were built in Photoshop by one artist using stock photography. As time of day was static in the KILLZONE series, we could pre-bake our lighting to one set of images, which kept RAM usage and processing low.

【The skies in the demo shown above were made this way.】

 
 

By animating these dome shaders we could create some pretty detailed and epic skyscapes for our games.

 
 

Horizon is a very different kind of game…

【Horizon is a very different kind of game.】

 
 


 
 

Horizon trailer

 
 


 
 

So, from that you could see that we have left the world of Killzone behind.

【The Killzone way of building worlds was left behind.】

 
 

【Horizon's key features】

•Horizon is a vastly open world where you can pretty much go anywhere that you see, including the tops of mountains.【A huge open world; you can go almost anywhere you can see, including mountaintops.】

•Since this is a living real world, we simulate the spinning of the earth by having a time of day cycle.【A day-night cycle simulates the earth's rotation.】

•Weather is part of the environment so it will be changing and evolving as well.【Dynamic weather.】

•There’s lots of epic scenery: Mountains, forests, plains, and lakes.【Epic scenery: mountains, forests, plains, lakes.】

•Skies are a big part of the landscape of Horizon. They make up half of the screen. Skies are also a very important part of storytelling as well as world building.【The sky matters a great deal: it usually fills half the screen and is an important backdrop for storytelling and world building.】

 
 


 
 

They are used to tell us where we are, when we are, and they can also be used as thematic devices in storytelling.

【Skies tell you where and when you are, and can serve as thematic devices in the story.】

 
 


 
 

For Horizon, we want the player to really experience the world we are building. So we decided to try something bold. We prioritized some goals for our clouds.

【We want players to truly experience the world we are building, so we decided to try something bold and set the following goals for our clouds.】

 
 

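•Art direct-able【Artists can control them directly】

•Realistic, representing multiple cloud types【Realistically represent multiple cloud types】

•Integrate with weather【Integrate with the weather】

•Evolve in some way【Change over time】

•And of course, they needed to be Epic!【Look stunning】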

 
 


 
 

Realistic CG clouds are not an easy nut to crack. So, before we tried to solve the whole problem of creating a sky full of them, we thought it would be good to explore different ways to make and light individual cloud assets.

【Realistic CG clouds are a tough nut to crack, so before tackling the whole sky we first explored ways to model and light individual cloud assets.】

 
 


 
 

Our earliest successful modeling approach was to use a custom fluid solver to grow clouds. The results were nice, but this was hard for artists to control if they had not had any fluid simulation experience. Guerrilla is a game studio after all.

【Fluid simulation: good results, but hard for artists without simulation experience to control.】

 
 


 
 

We ended up modeling clouds from simple shapes,

voxelizing them and then…

running them through our fluid solver…

until we got a cloud-like shape.

【We ended up with a cloud model; it looks like a spaceship.】

 
 


 
 

And then we developed a lighting model that we used to pre-compute primary and secondary scattering.

•I’ll get into our final lighting model a little later, but the result you see here is computed on the CPU in Houdini in 10 seconds.

【Then we developed a lighting model that pre-computes primary and secondary scattering; the result shown took 10 seconds to compute on the CPU in Houdini.】

 
 


 
 

We explored 3 ways to get these cloud assets into game.

【We explored three ways to get these clouds into the game.】

 
 

•For the first, we tried to treat our clouds as part of the landscape, literally modeling them as polygons from our fluid simulations and baking the lighting data using spherical harmonics. This only worked for the thick clouds and not wispy ones…

【First approach, polygon meshes with baked spherical-harmonics lighting: only works for thick clouds, not wispy ones.】

 
 


 
 

So, we thought we should try to enhance the billboard approach to support multiple orientations and times of day. We succeeded, but we found that we couldn’t easily reproduce inter-cloud shadowing. So…

【Second approach, billboards: inter-cloud shadowing could not easily be reproduced.】

 
 


 
 

•We tried rendering all of our voxel clouds as one cloud set to produce sky domes that could also blend into the atmosphere over depth. Sort of worked.

【Third approach: render all the voxel clouds as one set into sky domes that blend into the atmosphere over depth.】

 
 

•At this point we took a step back to evaluate what didn’t work. None of the solutions made the clouds evolve over time. There was not a good way to make clouds pass overhead. And there was high memory usage and overdraw for all methods.

【Stepping back: none of these let clouds evolve over time or pass overhead convincingly, and all had high memory usage and overdraw.】

 
 

•So maybe a traditional asset based approach was not the way to go.

【So a traditional asset-based approach was not the way to go.】

 
 


 
 

Well, What about voxel clouds?

OK we are crazy we are actually considering voxel clouds now…

As you can imagine this idea was not very popular with the programmers.

【Next we went "crazy" and considered voxel clouds, an idea programmers disliked for the following reasons:】

 
 

Volumetrics are traditionally very expensive

With lots of texture reads

Ray marches

Nested loops

【Reasons】

 
 

However, there are many proven methods for fast, believable volumetric lighting

There is convincing work using noise to model clouds. I can refer you to the 2012 Production Volume Rendering course.

Could we solve the expense somehow and benefit from all of the look advantages of volumetrics?

【However, volumetrics can look stunning and there are proven ways to make them faster; the question was whether we could get the cost within budget.】

 
 


 
 

Our first test was to stack up a bunch of polygons in front of the camera and sample 3D Perlin noise with them. While extremely slow, this was promising, but we wanted to represent multiple cloud types, not just these bandy clouds.

【Our first test stacked polygons in front of the camera and sampled 3D Perlin noise with them: extremely slow, and it only produced banded clouds rather than multiple cloud types.】

 
 


 
 

So we went into Houdini and generated some tiling 3d textures out of the simulated cloud shapes. Using Houdini’s GL extensions, we built a prototype GL shader to develop a cloud system and lighting model.

【We then generated tiling 3D textures in Houdini and used Houdini's GL extensions to prototype a cloud system and lighting model.】

【About Houdini: https://zh.wikipedia.org/wiki/Houdini】

 
 


 
 

In the end, with a LOT of hacks, we got very close to mimicking our reference. However, it all fell apart when we put the clouds in motion. It also took 1 second per frame to compute. For me, coming from animated VFX, this was pretty impressive, but my colleagues were still not impressed.

【With many hacks the result came very close to the reference, but it fell apart once the clouds moved and took one second per frame to render, which was still not good enough.】

 
 

So I thought, Instead of explicitly defining clouds with pre-determined shapes, what if we could develop some good noises at lower resolutions that have the characteristics we like and then find a way to blend between them based on a set of rules. There has been previous work like this but none of it came close to our look goals.

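【So instead of combining explicitly predefined cloud shapes, we generate low-resolution noises with the characteristics we want and blend between them according to a set of rules.】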

 
 


 
 

This brings us to the cloud system for Horizon. To explain it better I have broken it down into 4 sections: Modeling, Lighting, Rendering and Optimization.

【The cloud system breaks down into four parts: modeling, lighting, rendering and optimization.】

 
 

Before I get into how we modeled the cloud scapes, it would be good to have a basic understanding of what clouds are and how they evolve into different shapes.

【First, a look at what clouds are in nature and how they take on different shapes.】

 
 


 
 

Classifying clouds helped us better communicate what we were talking about and define where we would draw them.

【Classification tells us where to draw which kind of cloud】

The basic cloud types are as follows.【Basic cloud types】

•The strato clouds, including stratus, cumulus and stratocumulus【Low-level (strato) clouds】

•The alto clouds, which are those bandy or puffy clouds above the strato layer【Mid-level (alto) clouds】

•And the cirro clouds, those big arcing bands and little puffs in the upper atmosphere.【High-level (cirro) clouds】

•Finally there is the granddaddy of all cloud types, the cumulonimbus clouds, which go high into the atmosphere.【Cumulonimbus: extends high into the atmosphere】

•For comparison, Mount Everest is above 8,000 meters.【For comparison: Mount Everest is over 8,000 m】

 
 


 
 

After doing research on cloud types, we had a look into the forces that shape them. The best source we had was a 1961 book by two meteorologists, called “The Clouds”, as creatively as research books from the ’60s were titled. What it lacked in charm it made up for with useful empirical results and concepts that help with modeling a cloud system.

【After classifying clouds we studied the forces that shape them. The main source was a 1961 book by two meteorologists, whose empirical results and concepts are useful for modeling a cloud system.】

 
 

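§Density increases at lower temperatures【Density increases as temperature drops】

§Temperature decreases over altitude【Temperature drops as altitude rises】

§High densities precipitate as rain or snow【High densities fall as rain or snow】

§Wind direction varies over altitude【Wind direction changes with altitude】

§They rise with heat from the earth【Heat from the ground makes clouds rise】

§Dense regions make round shapes as they rise【Dense regions form round shapes as they rise】

§Light regions diffuse like fog【Low-density regions diffuse like fog】

§Atmospheric turbulence further distorts clouds.【Atmospheric turbulence distorts clouds further】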

 
 

These are all abstractions that are useful when modeling clouds.

【These abstractions are very useful when building a cloud system.】

 
 


 
 

Our modeling approach uses ray marching to produce clouds.

【Our modeling approach uses ray marching to produce clouds.】

 
 

We march from the camera and sample noises and a set of gradients to define our cloud shapes using a sampler

【Along each ray from the camera, a sampler reads noises and gradients to define the cloud shapes.】

 
 


 
 

In a ray march you use a sampler to…

 
 

Build up an alpha channel….

And calculate lighting

【The sampler is used to build up the alpha channel and to calculate lighting.】

 
 


 
 

There are many examples of real-time volume clouds on the internet. The usual approach involves drawing them in a height zone above the camera using something called fBm, fractal Brownian motion. This is done by layering Perlin noises of different frequencies until you get something detailed.

【Most real-time volumetric cloud examples online draw clouds in a height zone above the camera with fBm: Perlin noise layered at several frequencies until the result is detailed enough.】

 
 

(pause)

 
 

This noise is then usually combined somehow with a gradient to define a change in cloud density over height

【This noise is usually combined with a height gradient so that cloud density changes with altitude.】

 
 


 
 

This makes some very nice but very procedural looking clouds.

What’s wrong?

There are no larger governing shapes or visual cues as to what is actually going on here. We don’t feel the implied evolution of the clouds from their shapes.

【The result looks very procedural.】

【There are no larger governing shapes or visual cues, so we cannot read from their shapes how the clouds are evolving.】

 
 


 
 

By contrast, in this photograph we can tell what is going on here. These clouds are rising like puffs of steam from a factory. Notice the round shapes at the tops and wispy shapes at the bottoms.

【In the photo, by contrast, we can tell how the clouds are moving; note the difference between their tops and bottoms.】

 
 


 
 

This fBm approach has some nice wispy shapes, but it lacks those bulges and billows that give a sense of motion. We need to take our shader beyond what you would find on something like Shader Toy.

【fBm gives nice wispy shapes but lacks the bulges and billows that convey motion; that is what we wanted to solve.】

 
 


 
 

These billows, as I’ll call them…

…are packed, sometimes taking on a cauliflower shape.

Since Perlin noise alone doesn’t cut it, we developed our own layered noises.

【Billows are tightly packed, sometimes cauliflower-shaped; Perlin noise alone cannot produce them, so we developed our own layered noises.】

 
 


 
 

Worley noise was introduced in 1996 by Steven Worley and is often used for caustics and water effects. If it is inverted as you see here:

It makes tightly packed billow shapes.

We layered it like the standard Perlin fBm approach

【Inverted Worley noise gives tightly packed, billowy shapes; we layered it like Perlin fBm.】

 
 

Then we used it as an offset to dilate Perlin noise. This allowed us to keep the connectedness of Perlin noise but add some billowy shapes to it.

We referred to this as Perlin-Worley noise.

【Then we used it as an offset to dilate Perlin noise, producing Perlin-Worley noise.】
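A sketch of the two Worley-based steps under stated assumptions: one jittered feature point per unit cell, tiling omitted, hypothetical octave weights, and a `perlin_worley` that "dilates" a Perlin value by remapping it into the range [worley, 1]. The talk does not give the exact dilation formula, so this remap is only one plausible reading.

```python
import math


def _hash3(ix: int, iy: int, iz: int) -> tuple:
    """Three deterministic pseudo-random values in [0, 1) for a lattice cell."""
    out = []
    for k in range(3):
        h = (ix * 73856093 ^ iy * 19349663 ^ iz * 83492791 ^ k * 2654435761) & 0xFFFFFFFF
        h = ((h ^ (h >> 15)) * 2246822519) & 0xFFFFFFFF
        out.append((h ^ (h >> 13)) / 2**32)
    return tuple(out)


def feature_point(ix: int, iy: int, iz: int) -> tuple:
    """One jittered feature point per unit cell."""
    jx, jy, jz = _hash3(ix, iy, iz)
    return (ix + jx, iy + jy, iz + jz)


def worley_f1(x: float, y: float, z: float) -> float:
    """Distance to the nearest feature point, searching the 3x3x3 neighbouring cells."""
    cx, cy, cz = math.floor(x), math.floor(y), math.floor(z)
    best = float("inf")
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                best = min(best, math.dist((x, y, z), feature_point(cx + dx, cy + dy, cz + dz)))
    return best


def inverted_worley(x: float, y: float, z: float) -> float:
    # 1 - F1 turns the cell pattern into packed, rounded billows.
    return 1.0 - min(worley_f1(x, y, z), 1.0)


def worley_fbm(x: float, y: float, z: float) -> float:
    # Three octaves layered like Perlin fBm; the weights are hypothetical and sum to 1.
    return (0.625 * inverted_worley(x, y, z)
            + 0.25 * inverted_worley(2 * x, 2 * y, 2 * z)
            + 0.125 * inverted_worley(4 * x, 4 * y, 4 * z))


def remap(v: float, lo: float, hi: float, new_lo: float, new_hi: float) -> float:
    return new_lo + (v - lo) * (new_hi - new_lo) / (hi - lo)


def perlin_worley(perlin: float, worley: float) -> float:
    """Dilate a [0, 1] Perlin value by Worley: map it into [worley, 1]."""
    return remap(perlin, 0.0, 1.0, worley, 1.0)
```

Because the result is never smaller than either input, Perlin's connected structure survives while the Worley billows push the shape outward.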

 
 


 
 

In games, it is often best for performance to store noises as tiling 3d textures.

【For performance, games usually store noise as pre-generated tiling 3D textures.】

 
 

You want to keep texture reads to a minimum…

and keep the resolutions as small as possible.

In our case we have compressed our noises to…

two 3D textures…

and one 2D texture.

【Keep texture reads and resolutions as small as will do the job.】

 
 


 
 

The first 3d Texture…

 
 

has 4 channels…

it is 128^3 resolution…

The first channel is the Perlin-Worley noise I just described.

The other 3 are Worley noise at increasing frequencies. As in the standard approach, this 3D texture is used to define the base shape for our clouds.

【First 3D texture: base cloud shape】

 
 


 
 

Our second 3d texture…

 
 

has 3 channels…

it is 32^3 resolution…

and uses Worley noise at increasing frequencies. This texture is used to add detail to the base cloud shape defined by the first 3d noise.

【Second 3D texture: lower resolution, adds detail to the base shape】

 
 


 
 

Our 2D texture…

 
 

has 3 channels…

it is 128^2 resolution…

and uses curl noise, which is non-divergent and is used to fake fluid motion. We use this noise to distort our cloud shapes and add a sense of turbulence.

【2D texture: curl noise for distortion and turbulence】

 
 


 
 

Recall that the standard solution calls for a height gradient to change the noise signal over altitude. Instead of one, we use…

【Recall that the standard approach uses one height gradient to change the noise over altitude; we use several instead:】

 
 

3 mathematical presets that represent the major low-altitude…

cloud types when we blend between them at the sample position.

We also have a value telling us how much cloud coverage we want to have at the sample position. This is a value between zero and 1.

【Three mathematical presets represent the major low-altitude cloud types and are blended at each sample position; a coverage value between 0 and 1 sets how much cloud there is at that position.】
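A sketch of blending three height presets by a cloud-type value. The trapezoid shape, its breakpoints, and the mapping 0 = stratus, 0.5 = cumulus, 1 = cumulonimbus are all hypothetical; the talk only says the presets are mathematical and blended at the sample position.

```python
def lerp(a: float, b: float, t: float) -> float:
    return a * (1.0 - t) + b * t


def trapezoid(h: float, a: float, b: float, c: float, d: float) -> float:
    """0 below a, ramps up to 1 at b, stays 1 until c, ramps down to 0 at d."""
    if h <= a or h >= d:
        return 0.0
    if h < b:
        return (h - a) / (b - a)
    if h <= c:
        return 1.0
    return (d - h) / (d - c)


# Hypothetical presets over normalised height in the cloud layer (0 = bottom, 1 = top).
PRESETS = {
    "stratus": (0.0, 0.05, 0.1, 0.2),
    "cumulus": (0.0, 0.1, 0.4, 0.6),
    "cumulonimbus": (0.0, 0.1, 0.8, 1.0),
}


def height_gradient(h: float, cloud_type: float) -> float:
    """Blend the presets; cloud_type 0 = stratus, 0.5 = cumulus, 1 = cumulonimbus (assumed)."""
    s = trapezoid(h, *PRESETS["stratus"])
    cu = trapezoid(h, *PRESETS["cumulus"])
    cb = trapezoid(h, *PRESETS["cumulonimbus"])
    if cloud_type <= 0.5:
        return lerp(s, cu, cloud_type / 0.5)
    return lerp(cu, cb, (cloud_type - 0.5) / 0.5)
```

Blending the preset outputs, rather than their parameters, keeps every intermediate cloud type a smooth mix of its two neighbours.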

 
 


 
 

What we are looking at on the right side of the screen is a view rotated about 30 degrees above the horizon. We will be drawing clouds per the standard approach in a zone above the camera.

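【The view on the right is tilted about 30 degrees above the horizon; clouds are drawn in a zone above the camera, as in the standard approach.】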

 
 

First, we build a basic cloud shape by sampling our first 3D texture and multiplying it by our height signal.

【First, build the basic cloud shape by sampling the first 3D texture and multiplying it by the height signal (see the formula on the slide).】

 
 

The next step is to multiply the result by the coverage and reduce density at the bottoms of the clouds.

【Then multiply by coverage and reduce density at the cloud bottoms.】

 
 


 
 

This ensures that the bottoms will be wispy and it increases the presence of clouds in a more natural way. Remember that density increases over altitude. Now that we have our base cloud shape, we add details.

【This gives a natural-looking base shape; next we add detail.】
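The two multiplications just described, as a sketch. The inputs are assumed to be already sampled (shape noise, blended height preset, coverage); the linear bottom fade and its 0.15 threshold are hypothetical stand-ins for "reduce density at the bottoms of the clouds".

```python
def base_density(shape_noise: float, height_grad: float, coverage: float,
                 h: float, bottom_fade: float = 0.15) -> float:
    """
    shape_noise: sample of the 128^3 shape texture in [0, 1]
    height_grad: blended height-preset value at this height
    coverage:    weather-map coverage in [0, 1]
    h:           normalised height inside the cloud layer (0 = bottom)
    """
    density = shape_noise * height_grad   # base shape modulated by the height signal
    density *= coverage                   # how much cloud the weather wants here
    density *= min(h / bottom_fade, 1.0)  # hypothetical linear fade near the cloud base
    return density
```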

 
 


 
 

The next step is to…

 
 

erode the base cloud shape by subtracting the second 3d texture at the edges of the cloud.

Little tip: if you invert the Worley noise at the base of the clouds, you get some nice wispy shapes.

【Erode the base shape by subtracting the second 3D texture at the cloud edges; tip: inverting the Worley noise at the cloud base gives wispy shapes.】

 
 

We also distort this second noise texture with our 2D curl noise to fake the swirly distortions from atmospheric turbulence, as you can see here…

【The 2D curl noise distorts the detail noise to fake turbulence in the atmosphere.】
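A sketch of the erosion step and the curl-noise offset. The remap-style erosion (which removes thin edges while barely touching dense interiors), the 20% inversion band, `strength=0.35`, and applying the 2D curl vector to the x and z axes are all assumptions; the talk only says the detail noise is subtracted at the edges, inverted near the base, and distorted by curl noise.

```python
def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def lerp(a: float, b: float, t: float) -> float:
    return a * (1.0 - t) + b * t


def curl_offset(p: tuple, curl: tuple, amount: float) -> tuple:
    """Shift the detail-noise lookup position by a 2D curl-noise vector."""
    x, y, z = p
    cx, cz = curl
    return (x + cx * amount, y, z + cz * amount)


def erode(base: float, detail: float, h: float, strength: float = 0.35) -> float:
    """
    Erode the base density with the detail noise.
    Near the cloud base (h < 0.2) the detail noise is blended toward its inverse,
    following the 'wispy bottoms' tip.
    """
    wispy = clamp01(1.0 - h / 0.2)                 # 1 at the very bottom, 0 above 20%
    d = lerp(detail, 1.0 - detail, wispy) * strength
    if d >= 1.0:
        return 0.0
    # Thin edges (base <= d) vanish; dense interiors stay close to their base value.
    return clamp01((base - d) / (1.0 - d))
```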

 
 


 
 

Here’s what it looks like in game. I’m adjusting the coverage signal to make them thicker and then transitioning between the height gradients from cumulus to stratus.

【In game: coverage makes the clouds thicker or thinner, and the height gradient switches the cloud type.】

 
 

Now that we have decent stationary clouds we need to start working on making them evolve as part of our weather system.

【With decent static clouds in place, the next step is to make them evolve as part of the weather system.】

 
 


 
 

These two controls, cloud coverage and cloud type, are a FUNCTION of our weather system.

【Cloud coverage and cloud type are both driven by the weather system.】

 
 

There is an additional control for Precipitation that we use to draw rain clouds.

【An additional precipitation control is used to draw rain clouds.】

 
 


 
 

Here in this image you can see a little map down in the lower left corner. This represents the weather settings that drive the clouds over our section of world map. The pinkish white pattern you see is the output from our weather system. Red is coverage, Green is precipitation and blue is cloud type.

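【The small map at the lower left shows the weather settings driving the clouds over this part of the world map. The pinkish-white pattern is the weather system's output: red is coverage, green is precipitation, blue is cloud type.】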

 
 

The weather system modulates these channels with a simulation that progresses during gameplay. The image here has Cumulus rain clouds directly overhead (white) and regular cumulus clouds in the distance. We have controls to bias the simulation to keep things art direct-able in a general sense.

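【The weather simulation changes these channels during gameplay; overhead are cumulus rain clouds, in the distance regular cumulus. Bias controls keep it art-directable.】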

 
 


 
 

The default condition is a combination of cumulus and stratus clouds. The areas that are more red have less of the blue signal, making them stratus clouds. You can see them in the distance at the center bottom of the image.

【The default is a mix of cumulus and stratus; redder areas have less blue signal and become stratus.】

 
 


 
 

The precipitation signal transitions the map from whatever it is to cumulonimbus clouds at 70% coverage.

【The precipitation signal transitions any area toward cumulonimbus clouds at 70% coverage.】
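Decoding a weather-map texel and applying the precipitation rule, as a sketch. The 8-bit channel storage, the `Weather` type and the position of cumulonimbus on the cloud-type axis are hypothetical; the channel assignment and the 70% coverage target come from the talk.

```python
from dataclasses import dataclass

RAIN_COVERAGE = 0.7      # from the talk: rain drives coverage toward 70%
CUMULONIMBUS_TYPE = 1.0  # assumed position of cumulonimbus on the cloud-type axis


@dataclass
class Weather:
    coverage: float       # red channel
    precipitation: float  # green channel
    cloud_type: float     # blue channel


def lerp(a: float, b: float, t: float) -> float:
    return a * (1.0 - t) + b * t


def decode_texel(r: int, g: int, b: int) -> Weather:
    """Unpack an 8-bit RGB weather-map texel into [0, 1] controls."""
    return Weather(r / 255.0, g / 255.0, b / 255.0)


def apply_precipitation(w: Weather) -> Weather:
    """Push coverage and cloud type toward rain clouds as precipitation rises."""
    p = w.precipitation
    return Weather(lerp(w.coverage, RAIN_COVERAGE, p), p,
                   lerp(w.cloud_type, CUMULONIMBUS_TYPE, p))
```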

 
 


 
 

The precipitation control not only adjusts clouds but it creates rain effects. In this video I am increasing the chance of precipitation gradually to 100%

【Precipitation also triggers rain effects.】

 
 


 
 

If we increase the wind speed and make sure that there is a chance of rain, we can get Storm clouds rolling in and starting to drop rain on us. This video is sped up, for effect, btw. Ahhh… Nature Sounds.

【Higher wind speed plus a chance of rain gives storm clouds rolling in.】

 
 


 
 

We also use our weather system to make sure that clouds on the horizon are always interesting and poke above mountains.

【The weather system also ensures there are always interesting clouds on the horizon, poking above the mountains.】

 
 

We draw the cloudscapes within a 35,000 meter radius around the player…

and starting at a distance of 15,000 meters…

we start transitioning to cumulus clouds at around 50% coverage.

【Cloudscapes are drawn within a 35,000 m radius of the player; from 15,000 m outward they transition to cumulus at about 50% coverage.】
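The horizon rule as a sketch, with the three distances and the coverage value taken from the talk. The linear ramp between 15,000 m and 35,000 m and the cloud-type value used for cumulus are assumptions.

```python
CLOUD_RADIUS_M = 35_000.0   # clouds are drawn inside this radius (from the talk)
HORIZON_START_M = 15_000.0  # the transition begins here (from the talk)
HORIZON_COVERAGE = 0.5      # "around 50% coverage" (from the talk)
CUMULUS_TYPE = 0.5          # assumed position of cumulus on the cloud-type axis


def lerp(a: float, b: float, t: float) -> float:
    return a * (1.0 - t) + b * t


def horizon_weather(distance_m: float, coverage: float, cloud_type: float):
    """Return (coverage, cloud_type) after the horizon override, or None outside the radius."""
    if distance_m > CLOUD_RADIUS_M:
        return None
    t = (distance_m - HORIZON_START_M) / (CLOUD_RADIUS_M - HORIZON_START_M)
    t = max(0.0, min(1.0, t))  # linear ramp (assumption)
    return lerp(coverage, HORIZON_COVERAGE, t), lerp(cloud_type, CUMULUS_TYPE, t)
```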

 
 


 
 

This ensures that there is always some variety and ‘epicness’ to the clouds on the horizon.

So, as you can see, the weather system produces some nice variation in cloud type and coverage.

【This ensures there are always varied, natural-looking clouds on the horizon.】

 
 


 
 

In the case of the e3 trailer, We overrode the signals from the weather system with custom textures. You can see the corresponding textures for each shot in the lower left corner. We painted custom skies for each shot in this manner.

【For the E3 trailer, custom painted textures (lower left) replaced the weather signals for each shot.】

 
 


 
 

So to sum up our modeling approach…

【Summary of our modeling approach】

 
 

We follow the standard ray-march/sampler framework, but we build the clouds with two levels of detail:

a low-frequency base cloud shape, and high-frequency detail and distortion.

Our noises are custom and made from Perlin, Worley and curl noise.

We use a set of presets for each cloud type to control density over height, plus cloud coverage.

These are driven by our weather simulation, or by custom textures for use with cut scenes, and it is all animated in a given wind direction.

 
 


 
 

Cloud lighting is a very well researched area in computer graphics. The best results tend to come from high numbers of samples. In games, when you ask what the budget will be for lighting clouds, you might very well be told “Zero”. We decided that we would need to examine the current approximation techniques to reproduce the 3 most important lighting effects for us.

【Cloud lighting is well researched and gives great results with many samples, but that cost is far too high for real time, so we need good approximations.】

 
 


 
 

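The directional scattering or luminous quality of clouds…

The silver lining when you look toward the sun through a cloud…

And the dark edges visible on clouds when you look away from the sun.

【The three effects: the directional scattering (luminous) quality of clouds, the silver lining when looking toward the sun through a cloud, and the dark edges seen when looking away from the sun.】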

 
 

The first two have standard solutions but the third is something we had to solve ourselves.

【The first two have standard solutions; the third we had to solve ourselves.】

 
 


 
 

When light enters a cloud

The majority of the light rays spend their time refracting off of water droplets and ice inside of the cloud before heading to our eyes.

【Most light entering a cloud spends its time bouncing off water droplets and ice before reaching our eyes.】

 
 

(pause)

By the time the light ray finally exits the cloud, it could have been out-scattered, absorbed by the cloud, or combined with other light rays in what is called in-scattering.

【The light that finally exits has lost energy to out-scattering and absorption and gained energy from other rays through in-scattering.】

 
 

In film vfx we can afford to spend time gathering light and accurately reproducing this, but in games we have to use approximations. These three behaviors can be thought of as probabilities and there is a Standard way to approximate the result you would get.

【Film VFX computes this directly; games have to approximate it, as follows.】

 
 


 
 

Beer’s law states that we can determine the amount of light reaching a point based on the optical thickness of the medium that it travels through. With Beer’s law, we have a basic way to describe the amount of light at a given point in the cloud.

 
 

If we substitute energy for transmittance and depth in the cloud for thickness, and draw this out, you can see that energy exponentially decreases over depth. This forms the foundation of our lighting model.

【Beer’s law: energy falls off exponentially with depth in the cloud; this is the foundation of our lighting model.】
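Written out as a worked equation (standard Beer-Lambert form; the unit extinction coefficient behind the sample values is an assumption for illustration):

```latex
T(d) = \exp\!\Big(-\int_0^{d} \sigma_t(s)\,ds\Big) = e^{-\sigma_t d} \qquad (\sigma_t \text{ constant})

E(d) = e^{-d}, \qquad E(0) = 1, \quad E(1) \approx 0.368, \quad E(2) \approx 0.135
```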

 
 


 
 

But there is another component contributing to the light energy at a point: the probability of light scattering forward or backward in the cloud. This is responsible for the silver lining in clouds, one of our look goals.

【The other factor is the probability of light scattering forward or backward in the cloud, which produces the silver lining.】

 
 


 
 

In clouds, there is a higher probability of light scattering forward. This is called Anisotropic scattering.

【Light in clouds scatters anisotropically, with a forward bias.】

 
 

In 1941, the Henyey-Greenstein model was developed to help astronomers with light calculations at galactic scales, but today it is used to reliably reproduce Anisotropy in cloud lighting.

【Henyey-Greenstein model: originally developed for astronomy, used here for anisotropic cloud lighting.】

 
 


 
 

Each time we sample light energy, we multiply it by the Henyey-Greenstein phase function.

【Each light-energy sample is multiplied by the Henyey-Greenstein phase function.】
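A direct Python transcription of the Henyey-Greenstein phase function, plus the per-sample weighting just described. The default `g = 0.2` and the cosine convention in the docstring are assumptions, not values from the talk.

```python
import math


def henyey_greenstein(cos_theta: float, g: float) -> float:
    """
    Henyey-Greenstein phase function, normalised over the sphere.
    cos_theta: cosine between the light's travel direction and the direction
               from the sample toward the eye.
    g:         anisotropy in (-1, 1); g > 0 favours forward scattering.
    """
    g2 = g * g
    return (1.0 - g2) / (4.0 * math.pi * (1.0 + g2 - 2.0 * g * cos_theta) ** 1.5)


def scattered_energy(light_energy: float, cos_theta: float, g: float = 0.2) -> float:
    # Each light-energy sample is weighted by the phase function (g is a placeholder).
    return light_energy * henyey_greenstein(cos_theta, g)
```

With `g > 0`, looking toward the sun (cos_theta near 1) gives the bright forward lobe seen around the sun in the comparison below.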

 
 


 
 

Here you can see the result. On the left is just the Beer’s law portion of our lighting model. On the right we have applied the Henyey-Greenstein phase function. Notice that the clouds are brighter around the sun on the right.

【Left: Beer’s law only. Right: with the Henyey-Greenstein phase function; the clouds are brighter around the sun.】

 
 


 
 

But we are still missing something important, something that is often forgotten. The dark edges on clouds. This is something that is not as well documented with solutions so we had to do a thought experiment to understand what was going on.

【Still missing: the dark edges on clouds. Solutions are poorly documented, so we worked through a thought experiment.】

 
 


 
 

Think back to the random walk of a light ray through a cloud.

【Consider the random walk of a light ray through a cloud.】

 
 

If we compare a point inside of the cloud to one near the surface, the one inside would receive more in-scattered light. In other words, cloud material, if you want to call it that, is a collector for light. The deeper you are below the surface of a cloud, the more potential there is for gathered light from nearby regions, until the light begins to attenuate, that is.

【A point deeper inside receives more in-scattered light: cloud material collects light, and the deeper you go the more it gathers, until attenuation takes over.】

 
 

This is extremely pronounced in round formations on clouds, so much so that the crevices appear…

【This is what causes the dark edges: at bulges and edges there is little cloud depth to gather in-scattered light.】

 
 

to be lighter than the bulges and edges because they receive a small boost of in-scattered light.

Normally in film, we would take many many samples to gather the contributing light at a point and use a more expensive phase function. You can get this result with brute force. If you were in Magnus Wrenninge’s multiple scattering talk yesterday there was a very good example of how to get this. But in games we have to find a way to approximate this.

【Film brute-forces this with many samples; games have to approximate it.】

 
 


 
 

A former colleague of mine, Matt Wilson, from Blue Sky, said that there is a similar effect in piles of powdered sugar. So, I’ll refer to this as the powdered sugar look.

【A pile of powdered sugar shows the same effect for the same reason.】

 
 


 
 

Once you understand this effect, you begin to see it everywhere. It cannot be un-seen.

Even in light wispy clouds. The dark gradient is just wider.

【Once you understand this effect, you see it everywhere.】

 
 


 
 

The reason we do not see this effect automatically is because our transmittance function is an approximation and doesn’t take it into account.

【Our transmittance function is an approximation that ignores this, so the effect does not appear automatically.】

 
 

The surface of the cloud is always going to have the same light energy that it receives. Let’s think of this effect as a statistical probability based on depth.

【We treat the effect as a statistical probability based on depth.】

 
 


 
 

As we go deeper in the cloud, our potential for in scattering increases and more of it will reach our eye.

【The deeper in the cloud, the greater the potential in-scattering and the more of it reaches the eye.】

 
 

If you combine the two functions you get something that describes this…

【What happens if we combine the two functions?】

 
 

…effect as well as the traditional approach.

I am still looking for the Beer’s-Powder approximation method in the ACM digital library and I haven’t found anything mentioned with that name yet.

【The author has not found this combination published under that name, so it appears to be new.】
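The transcript does not spell out the combined formula. A commonly quoted form of the Beer's-Powder term from this talk's slides is E = 2·e^(-d)·(1 - e^(-2d)); the sketch below uses it, so treat the exact constants as an assumption to check against the slides.

```python
import math


def beer(d: float) -> float:
    # Beer's law: energy falls off exponentially with depth.
    return math.exp(-d)


def powder(d: float) -> float:
    # Near zero at the surface (little in-scattering yet), approaches 1 deeper in.
    return 1.0 - math.exp(-2.0 * d)


def beer_powder(d: float) -> float:
    """Combined term E = 2 * e^(-d) * (1 - e^(-2d))."""
    return 2.0 * beer(d) * powder(d)
```

The product is dark at the surface (the dark edges), peaks at d = ln(3)/2 ≈ 0.55, and tends to twice the plain Beer's law value deeper in the cloud.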

 
 


 
 

Let’s visually compare the components of our directional lighting model:

The Beer’s law component, which handles the primary scattering…

The powdered sugar effect, which produces the dark edges facing the light…

And their combination in our final result.

【Comparing the components: Beer’s law, the powder effect, and their combination】

 
 


 
 

Here you can see what Beer’s law and the combined Beer’s law and powder effect look like when viewed from the light source. This is a pretty good approximation of our reference.

【Combining Beer’s law and the powder effect gives a good approximation of the reference.】

 
 


 
 

In game, it adds a lot of realism to the thicker clouds and helps sell the scale of the scene.

【In game it adds realism to thicker clouds and helps sell the scale.】

 
 


 
 

But we have to remember that this is a view-dependent effect. We only see it where our view vector approaches the light vector, so the powder function should account for this gradient as well.

【This is view dependent: it appears only where the view vector approaches the light vector, so the powder term must account for that angle too.】
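One way to make the powder term view dependent, as a sketch: fade it in with the cosine between the view vector and the light vector (taken here as the direction the sunlight travels, so the effect peaks when looking away from the sun). The linear remap and the fallback to plain Beer's law are assumptions.

```python
import math


def lerp(a: float, b: float, t: float) -> float:
    return a * (1.0 - t) + b * t


def directional_energy(d: float, cos_view_light: float) -> float:
    """
    Beer's law with a view-dependent powder term.
    cos_view_light: cosine between the view vector and the light's travel direction;
    the powder effect is at full strength at 1 and absent at -1.
    """
    beer = math.exp(-d)
    powder = 1.0 - math.exp(-2.0 * d)
    w = max(0.0, min(1.0, 0.5 * cos_view_light + 0.5))
    return beer * lerp(1.0, 2.0 * powder, w)
```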

 
 


 
 

Here is a panning camera view that shows this effect increasing as we look away from the sun.

【A panning shot shows the effect growing as we look away from the sun.】

 
 


 
 

The last part of our lighting model is that we artificially darken the rain clouds by increasing the light absorption where they exist.

【Finally, rain clouds are darkened by increasing light absorption where they exist.】

 
 


 
 

So, in review, our model has 4 components:

【Summary: the model has four components】

Beer’s Law

Henyey-Greenstein

our powder sugar effect

And absorption increasing for rain clouds

 
 

 
 


 
 

I have outlined how our sampler is used to model clouds and how our lighting algorithm simulates the lighting effects associated with them. Now I am going to describe how and where we take samples to build an image, and how we integrate our clouds into the atmosphere and our time of day cycle.

【Next: where samples are taken to build the image, and how clouds are integrated with the atmosphere and time of day.】

 
 


 
 

The first part of rendering with a ray march is deciding where to start. In our situation, Horizon takes place on Earth and, as most of you are aware… the earth… is round.

The gases that make up our atmosphere wrap around the earth and clouds exist in different layers of the atmosphere.

【The earth is round: the atmosphere wraps around it and clouds exist in different layers of it.】

 
 


 
 

When you are on a “flat” surface such as the ocean, you can clearly see how the curvature of the earth causes clouds to descend into the horizon.

【On a "flat" surface such as the ocean, you can clearly see the earth's curvature make clouds descend into the horizon.】

 
 


 
 

For the purposes of our game we divide the clouds into two types in this spherical atmosphere.

 
 

•The low-altitude volumetric strato class clouds between 1,500 and 4,000 meters…

•and the high-altitude 2D alto and cirro class clouds above 4,000 meters. The upper-level clouds are not very thick, so this is a good area to reduce the expense of the shader by making them scrolling textures instead of multiple samples in the ray march.

【The game splits clouds into two layers by altitude: volumetric low clouds from 1,500 to 4,000 m, and high clouds above 4,000 m rendered cheaply as 2D scrolling textures.】

 
 


 
 

By ray marching through a spherical atmosphere we can…

ensure that clouds properly descend into the horizon.

It also means we can force the scale of the scene by shrinking the radius of the atmosphere.

【Ray marching a spherical atmosphere makes clouds descend into the horizon properly, and shrinking the atmosphere radius lets us force the scale of the scene.】
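A sketch of finding where a view ray enters and leaves the spherical cloud shell, assuming the camera sits below the 1,500 m cloud base and the planet is centred at the origin. A ray toward the zenith reaches the layer after 1,500 m, while a horizontal ray reaches it roughly 138 km away on an Earth-sized planet, which is why distant clouds sink into the horizon; a smaller `planet_radius` shortens that distance.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; the talk shrinks this to force scale
CLOUD_BOTTOM_M = 1_500.0      # from the talk
CLOUD_TOP_M = 4_000.0         # from the talk


def ray_exit_sphere(origin: tuple, direction: tuple, radius: float) -> float:
    """Far intersection distance of a ray that starts inside a sphere centred at the origin."""
    b = sum(o * d for o, d in zip(origin, direction))  # dot(o, d), with d normalised
    c = sum(o * o for o in origin) - radius * radius    # negative when inside the sphere
    return -b + math.sqrt(b * b - c)


def cloud_layer_span(view_dir: tuple, camera_height: float = 0.0,
                     planet_radius: float = EARTH_RADIUS_M) -> tuple:
    """Distances along a normalised view ray where it enters and leaves the cloud shell."""
    origin = (0.0, planet_radius + camera_height, 0.0)
    start = ray_exit_sphere(origin, view_dir, planet_radius + CLOUD_BOTTOM_M)
    end = ray_exit_sphere(origin, view_dir, planet_radius + CLOUD_TOP_M)
    return start, end
```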

 
 


 
 

In our situation we do not want to do any work, or any expensive work, where we don’t need to. So instead of sampling every point along the ray at full cost, we use our sampler’s two levels of detail as a way to do cheaper work until we actually hit a cloud.

【To avoid wasted work, we do not sample every point along the ray at full cost; we use the sampler's two levels of detail to do cheap work until we hit a cloud.】

 
 


 
 

Recall that the sampler has a low-detail noise that makes a basic cloud shape

and a high-detail noise that adds the realistic detail we need.

The high-detail noise is always applied as an erosion from the edge of the base cloud shape.

【Low-detail noise gives the base shape; high-detail noise adds realism and is always applied as erosion from the edges of the base shape.】

 
 


 
 

This means that we only need to do the high-detail noise and all of its associated instructions where the low-detail sample returns a non-zero result.

This has the effect of producing an isosurface that surrounds the area where our clouds could be.

【So high-detail work is only needed where the low-detail sample is non-zero; that region forms an isosurface around where clouds can be, which saves a lot of computation.】

 
 


 
 

So, when we take samples through the atmosphere, we do these cheaper samples at a larger step size until we hit a cloud isosurface. Then we switch to full samples with the high detail noise and all of its associated instructions. To make sure that we do not miss any high res samples, we always take a step backward before switching to high detail samples.

【March with cheap samples at a larger step until hitting the isosurface, then switch to full samples; always step back once first so no detail is missed.】

 
 


 
 

Once the alpha of the image reaches 1 we don’t need to keep sampling so we stop the march early.

【Once alpha reaches 1, stop marching.】

 
 


 
 

If we don’t reach an alpha of 1, we have another optimization.

After several consecutive samples that return zero density, we switch back to the cheap march behavior until we hit something again or reach the top of the cloud layer.

【If alpha never reaches 1: after several consecutive zero-density samples, switch back to the cheap march until we hit something again or reach the top of the cloud layer.】
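A CPU-side sketch of the march logic described above, reduced to one dimension along the ray. The 2x coarse step, the six-sample zero run, the 0.99 "opaque" threshold (alpha only approaches 1 asymptotically) and the exponential alpha accumulation are assumptions; the talk describes the behaviour, not these constants.

```python
import math
from typing import Callable, Tuple


def march(cheap: Callable[[float], float], full: Callable[[float], float],
          t_start: float, t_end: float, step: float,
          zero_run_limit: int = 6, opaque: float = 0.99) -> Tuple[float, int, int]:
    """
    Two-level ray march along a ray parameter t.
    cheap(t): low-detail density (zero outside the cloud isosurface)
    full(t):  full density including high-detail erosion
    Returns (alpha, cheap_sample_count, full_sample_count).
    """
    alpha, t = 0.0, t_start
    detailed, zero_run = False, 0
    n_cheap = n_full = 0
    while t < t_end and alpha < opaque:            # early exit once nearly opaque
        if not detailed:
            n_cheap += 1
            if cheap(t) > 0.0:
                # Hit the isosurface: step back one coarse step, then sample in detail.
                detailed, zero_run = True, 0
                t = max(t - 2.0 * step, t_start)
                continue
            t += 2.0 * step                        # coarse step (2x is an assumption)
        else:
            n_full += 1
            density = full(t)
            if density > 0.0:
                zero_run = 0
                alpha += (1.0 - alpha) * (1.0 - math.exp(-density * step))
            else:
                zero_run += 1
                if zero_run >= zero_run_limit:     # empty again: back to the cheap march
                    detailed = False
            t += step
    return alpha, n_cheap, n_full
```

For a dense slab starting at t = 100, the march takes 51 cheap samples to reach it, steps back, and exits after 3 full samples once alpha passes the threshold.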

 
 


 
 

Because the ray length increases as we look toward the horizon, we start with

an initial potential 64 samples and end with a potential 128 at the horizon. I say potential because of the optimizations, which can cause the march to exit early. And we really hope they do.

This is how we take the samples to build up the alpha channel of our image. To calculate light intensity we need to take more samples.

【Because rays through the spherical layer get longer toward the horizon, the potential sample count ranges from 64 to 128.】
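A sketch of choosing the potential sample count from the view direction. The 64 and 128 endpoints are from the talk; interpolating linearly in the up component of the view vector is an assumption.

```python
def march_sample_count(view_up: float, zenith_samples: int = 64,
                       horizon_samples: int = 128) -> int:
    """
    Potential number of march samples for a view ray.
    view_up: y component of the normalised view direction (1 = straight up, 0 = horizon).
    """
    t = max(0.0, min(1.0, view_up))
    return round(horizon_samples * (1.0 - t) + zenith_samples * t)
```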

 
 


 
 

Normally what you do in a ray march like this is to take samples toward the light source, plug the sum into your lighting equation and then attenuate this value using the alpha channel until you hopefully exit the march early because your alpha has reached 1.

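【Normally you sample toward the light, plug the sum into the lighting equation, and attenuate it by alpha, hopefully exiting early once alpha reaches 1.】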

 
 


 
 

In our approach, we sample 6 times in a cone toward the sun. This smooths the banding we would normally get with 6 samples and weights our lighting function with neighboring density values, which creates a nice ambient effect. The last sample is placed far away from the rest in order to capture shadows cast by distant clouds.

【We take 6 light samples in a cone toward the sun: this smooths banding and gives a nice ambient effect, and the distant last sample catches shadows from faraway clouds.】
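A sketch of generating the six light-sample positions. The spiral kernel, `cone_spread` and `far_multiplier` are hypothetical (the talk does not give the kernel), and `light_energy` simply feeds the summed densities into Beer's law in place of the depth d, as the talk describes a little later.

```python
import math


def _add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def cone_light_samples(p, light_dir, step, cone_spread=0.3, far_multiplier=15.0):
    """
    Six light-sample positions from point p toward the sun: five spiral around the
    light direction inside a widening cone; the sixth sits far along the light
    direction to catch shadows from distant clouds.
    """
    L = _normalize(light_dir)
    helper = (0.0, 0.0, 1.0) if abs(L[2]) < 0.9 else (1.0, 0.0, 0.0)
    u = _normalize(_cross(L, helper))
    v = _cross(L, u)  # unit length, perpendicular to both L and u
    samples = []
    for i in range(5):
        along = step * (i + 1)
        angle = i * 2.0 * math.pi / 5.0
        radius = cone_spread * along  # the cone widens with distance
        offset = _add(_scale(u, radius * math.cos(angle)), _scale(v, radius * math.sin(angle)))
        samples.append(_add(_add(p, _scale(L, along)), offset))
    samples.append(_add(p, _scale(L, step * far_multiplier)))
    return samples


def light_energy(densities, step):
    """Beer's law on the summed light-sample densities (standing in for depth d)."""
    return math.exp(-sum(densities) * step)
```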

 
 


 
 

Here you can see what our clouds look like with just alpha samples, with our 5 cone samples for lighting, and with the long-distance cone sample.

To improve performance of these light samples, we switched to sampling the cheap version of our shader once the alpha of the image reached 0.3; this made the shader 2x faster.

【Shown: alpha samples only, then five cone light samples, then the long-distance sample. Switching the light samples to the cheap shader once image alpha reaches 0.3 made the shader 2x faster.】

 
 


 
 

The lighting samples replace the lowercase d, or depth, in the Beer’s law portion of our lighting model. This energy value is then attenuated by the depth of the sample in the cloud to produce the image, as per the standard volumetric ray-marching approach.

【The light samples replace the depth d in the Beer’s law term; the energy is then attenuated by the sample's depth in the cloud, as in standard volumetric ray marching.】

 
 


 
 

The last step of our ray march was to sample the 2d cloud textures for the high altitude clouds

【The last step of the march samples the 2D textures for the high-altitude clouds.】

 
 


 
 

These were a collection of the various types of cirrus and alto clouds that were tiling and scrolling at different speeds and directions above the volumetric clouds.

【A collection of tiling cirrus and alto textures scrolling at different speeds and directions above the volumetric clouds.】

 
 


 
 

In reality light rays of different frequencies are mixing in a cloud producing very beautiful color effects. Since we live in a world of approximations, we had to base cloud colors on some logical assumptions.

We color our clouds based on the following model:

【In reality, light of different frequencies mixes in clouds to produce beautiful colors; we approximate this with the following logical model:】

 
 

Ambient sky contribution increases over height

Direct lighting would be dominated by the sun color

Atmosphere would occlude clouds over depth.

 
 

We add up our ambient and direct components and attenuate to the atmosphere color based on the depth channel.

【Ambient and direct components are added, then attenuated toward the atmosphere color based on depth.】
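The color model as a sketch, following the add-then-attenuate order above. Scaling ambient linearly with height, the exponential fog curve and `fog_distance_m` are assumptions; the talk only states the three rules.

```python
import math


def lerp3(a, b, t):
    return tuple(x * (1.0 - t) + y * t for x, y in zip(a, b))


def cloud_color(light_energy, h, depth_m, sun_rgb, ambient_rgb, atmosphere_rgb,
                fog_distance_m=40_000.0):
    """
    light_energy: directional term from the lighting model
    h:            normalised height in the cloud layer
    depth_m:      distance from the camera to the cloud sample
    """
    ambient = tuple(c * h for c in ambient_rgb)         # ambient sky increases over height
    direct = tuple(c * light_energy for c in sun_rgb)   # direct light dominated by sun color
    lit = tuple(a + d for a, d in zip(ambient, direct))
    fog = 1.0 - math.exp(-depth_m / fog_distance_m)    # atmosphere occludes over depth
    return lerp3(lit, atmosphere_rgb, fog)
```

Because every input is evaluated per frame, changing the sun or sky colors with the time of day updates the clouds without any pre-baking.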

 
 


 
 

Now you can change the time of day in the game and the lighting and colors update automatically. This means no pre-baking, and our unique memory usage for the entire sky is limited to the cost of two 3D textures and one 2D texture instead of dozens of billboards or sky domes.

【Lighting and colors now follow the time of day automatically, with no pre-baking; the whole sky needs only two 3D textures and one 2D texture.】

 
 


 
 

To sum up what makes our rendering approach unique:

【Summary of what makes the rendering approach unique】

 
 

Sampler does “cheap” work unless it is potentially in a cloud

64-128 potential march samples, 6 light samples per march in a cone, when we are potentially in a cloud.

Light samples switch from full to cheap at a certain depth

 
 

 
 


 
 

The approach that I have described so far costs around 20 milliseconds.

(pause for laughter)

Which means it is pretty, but it is not fast enough to be included in our game. My co-developer and mentor on this, Nathan Vos, had the idea that…

【At this point the clouds cost about 20 ms per frame, too slow for the game.】

 
 


 
 

Every frame we could use a quarter-res buffer to update 1 out of 16 pixels for each 4×4 pixel block within our final image.

We reproject the previous frame to ensure we have something persistent.

【Split the final image into 4×4 pixel blocks and update one pixel per block each frame; the other 15 are reprojected from the previous frame.】
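A sketch of the update schedule. The talk does not say in which order the 16 pixels of a block are refreshed; a 4x4 Bayer ordering is an assumption, and the reprojection itself is left out (the history image is assumed to be reprojected already).

```python
# 4x4 ordered-dither (Bayer) matrix: entry = frame slot in which that pixel is refreshed.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

# (x, y) offset inside each 4x4 block, indexed by frame % 16.
ORDER = [None] * 16
for y, row in enumerate(BAYER_4X4):
    for x, slot in enumerate(row):
        ORDER[slot] = (x, y)


def update_frame(history, fresh, frame):
    """
    history: full-res image (list of rows), already reprojected from the last frame
    fresh:   quarter-res buffer (width/4 x height/4) of newly ray-marched pixels
    Writes one new pixel per 4x4 block; the other 15 keep their reprojected values.
    """
    ox, oy = ORDER[frame % 16]
    for by, row in enumerate(fresh):
        for bx, value in enumerate(row):
            history[by * 4 + oy][bx * 4 + ox] = value
    return history
```

Every pixel is re-traced once every 16 frames, so each frame only pays for 1/16 of the full-resolution ray march.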

 
 


 
 

…and where we could not reproject, like the edge of the screen, we substitute the result from one of the low-res buffers.

Nathan’s idea made the shader 10x faster or more when we render this at half res and use filters to upscale it.

It is pretty much the whole reason we are able to put this in our game. Because of this, our target performance is around 2 milliseconds, most of that coming from the number of instructions.

【That is fast enough: the target is about 2 ms.】

 
 


 
 

In review we feel that

We have largely achieved our initial goals. This is still a work in progress as there is still time left in the production cycle so we hope to improve performance and direct-ability a bit more. We’re also still working on our atmospheric model and weather system and we will be sharing more about this work in the future on our website and at future conferences.

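【Looking back, the goals were largely achieved; work continues on performance, art-directability, the atmosphere model and the weather system, with more to be shared later.】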

 
 

All of this was captured on a PlayStation 4

And this solution was written in PSSL and C++

 
 


 
 

A number of sources were utilized in the development of this system. I have listed them here.

I would like to thank my co-developer, Nathan Vos, most of all

【The sources used are listed on the slide.】

 
 

Also some other Guerrillas:

Elco – weather system and general help with the transition to games

Michal – supervising the shader development with me and Nathan

Jan Bart – for keeping us on target with our look goals

Marijn – for allowing me the time in the FX budget to work on this and for his guidance

Maarten van der Gaag for some optimization ideas

Felix van den Bergh for slaving away at making polygon clouds and voxel clouds in the early days

Vlad Lapotin, for his work testing out spherical harmonics

And to Hermen Hulst, manager of Guerrilla, for hiring me and for allowing us the resources and time to properly solve this problem for real-time.

 
 


 
 

Are there any questions?

 
 


 
 

Peace out.


SIGGRAPH 15 – Learning from Failure: a Survey of Promising, Unconventional and Mostly Abandoned Renderers for ‘Dreams PS4’, a Geometrically Dense, Painterly UGC Game


 
 

this talk is about showing you some of the approaches that we tried and failed to make stick for our project. if you’re looking for something to take away, hopefully it could be inspiration, or some points or places to start, where we left off. I also just think it’s interesting to hear about failures, and the lessons learnt along the way. it’s a classic story of the random walk of R&D…

This talk covers the author's attempts and failures in this area; hopefully it offers some inspiration.

 
 


 
 

spoiler section!

================

this is where we’re headed, if you didn’t see it at e3 {e3 trailer}

https://www.youtube.com/watch?v=4j8Wp-sx5K0

 
 


back to the beginning

=====================

it all began with @antonalog doing an experiment with move controllers, and a DX11 based marching cubes implementation.

 
 

Extra links:

http://paulbourke.net/geometry/polygonise/

https://github.com/smistad/GPU-Marching-Cubes

 
 


here he is! this was on PC, using playstation move controllers. the idea was to record a series of add & subtraction using platonic shapes with simple distance field functions

UGC on PC, using PlayStation Move controllers

Method: the idea was to record a series of add & subtraction using platonic shapes with simple distance field functions

 
 

 
 


we use (R to L) cubic strokes, cylinders, cones, cuboids, ellipsoids, triangular prisms, donuts, biscuits, markoids*, pyramids.

(markoids are named for our own mark z who loves them; they’re super ellipsoids with variable power for x,y,z)

 
 


here’s the field for the primitives…

 
 


we called each primitive an ‘edit’,

we support a simple list of CSG edits, not a tree (no scene tree is used).

and models are made up of anything from 1 to 100,000 edits

with add, subtract or ‘color’ only, along with…

 
 


soft blend, which is effectively soft-max and soft-min functions.
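Below is a minimal sketch of evaluating such a flat edit list. It assumes Euclidean sphere and box primitives and hard blends only; the soft blend function appears later. `sd_sphere`, `sd_box` and `evaluate` are illustrative names, not Media Molecule's code. Because the list is applied strictly in order, it behaves like a right-leaning CSG tree.

```python
import math

def sd_sphere(p, center, radius):
    """Euclidean signed distance to a sphere."""
    return math.dist(p, center) - radius

def sd_box(p, center, half):
    """Euclidean signed distance to an axis-aligned box with half-extents `half`."""
    q = [abs(p[i] - center[i]) - half[i] for i in range(3)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
    inside = min(max(q), 0.0)
    return outside + inside

def evaluate(edits, p):
    """Evaluate a flat (right-leaning) list of edits at point p, in order.
    Each edit is (op, distance_function, params)."""
    d = math.inf  # start from empty space
    for op, shape, params in edits:
        di = shape(p, *params)
        if op == "add":
            d = min(d, di)       # hard union
        elif op == "subtract":
            d = max(d, -di)      # hard difference
        elif op != "color":      # color edits leave the distance unchanged
            raise ValueError(f"unknown op {op!r}")
    return d
```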

 
 

 
 


here’s the field for the hard blend.

 
 


and the soft. I’ll talk more about the function for this in a bit. note how nicely defined and distance-like it is, everywhere!

 
 


[timelapse of dad’s head, with randomised colours] he’s 8,274 edits.

(an aside: MM artists Kareem, Jon B and Francis spent a LONG time developing artistic techniques like the ‘chiselled’ look you see above, with half-made early versions of this tech. It’s their artistry which convinced us to carry on down this path. It can’t be overstated how important it is, when making new kinds of tools, to actually try to use them in order to improve them. Thanks guys!).

The author's point here: to do a good job, you must first sharpen your tools.

 
 

Anyway:

the compound SDF function was stored in 83^3 fp16 volume texture blocks, incrementally updated as new edits arrived. each block was independently meshed using marching cubes on the compute shader;

at the time this was a pretty advanced use of CS (as evidenced by frequent compiler bugs/driver crashes) – many of the problems stemmed from issues with generating index buffers dynamically on the GPU.

the tech was based on histopyramids, which is a stream compaction technique where you count the number of verts/indices each cell needs, iteratively halve the resolution building cumulative ‘summed area’ tables, then push the totals back up to full resolution, which gives you a nice way to look up, for each cell, where in the target VB/IB its verts should go. there’s lots of material online, just google it.

【How it works: the VB/IB are generated dynamically on the GPU.】
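Below is a minimal 1D sketch of the histopyramid idea. It assumes the per-cell output counts are already known. The GPU version is 3D and halves each axis per level. Here, pairs of cells are summed, and each output element finds its source cell by walking down from the top level. `build_histopyramid` and `locate` are hypothetical names.

```python
def build_histopyramid(counts):
    """Level 0 holds per-cell output counts (padded to a power of two);
    each higher level sums pairs of the level below, up to a single total."""
    size = 1
    while size < len(counts):
        size *= 2
    levels = [list(counts) + [0] * (size - len(counts))]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([below[i] + below[i + 1] for i in range(0, len(below), 2)])
    return levels

def locate(levels, k):
    """Return (cell, rank) for output element k: the cell that produces it,
    and k's position within that cell's outputs."""
    if not 0 <= k < levels[-1][0]:
        raise IndexError(k)
    idx = 0
    for level in reversed(levels[:-1]):  # from just below the top to level 0
        left = level[2 * idx]
        if k < left:
            idx = 2 * idx
        else:
            k -= left
            idx = 2 * idx + 1
    return idx, k
```

On the GPU, one thread per output vertex or index can run the `locate` walk independently. That is what lets each output slot be filled without a serial append.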

 
 


the core idea of lists of simple SDF elements is still how all sculptures are made in dreams, and is the longest-living thread. this was the opposite of a failure! it was our first pillar in the game.

【The first, and most successful, pillar of the game.】

 
 

Anton worked with Kareem, our art director, to get some pretty cool gestural UI going too; there’s minimal UI intrusion so artists can get into flow state. I think he was planning to implement classic z-brush style pull/smear/bend modifications of the field – which is probably what some of you may have thought we did first- but luckily he didn’t. Why? welllll………..

【The first plan was to fully emulate ZBrush-style operations; in the end, that is not what happened.】

 
 


 
 

some early animation tests were done around this time to see what could be achieved – whether purely with semi- or fully rigid pieces, or some other technique. The results were varied in quality and all over the place in art style – we didn’t know what we wanted to do, or what was possible; so we imagined lots of futures:

【At first we didn't know what we wanted, so we didn't know which technology to use; we imagined many possibilities.】

 
 


rigid-ish pieces (low resolution FFD deformer over rigid pieces):

【Rigid pieces】

 
 


competing with that was the idea of animating the edits themselves. the results were quite compelling –

 
 


this was an offline render using 3DS Max’s blob mode to emulate soft blends. but it shows the effect.

【Soft-blend effect in 3ds Max】

 
 


this was in Anton’s PC prototype, re-evaluating and re-meshing every frame in realtime.

【Re-evaluation and re-meshing every frame】

 
 


and there was a visual high bar, which everyone loved, inspired by the work of legendary claymation animator & film maker jan svankmajer

【Inspired by claymation】

 
 


here we made stop motion by scrubbing through the edit history, time lapse style (just like the earlier dad’s head). and on a more complex head model… pretty expensive to re-evaluate every frame though!

【Re-evaluating for every frame of motion is expensive】

 
 


 
 

however to achieve this, the SDF would need to be re-evaluated every frame. in the first pc prototype, we had effectively added each edit one at a time to a volume texture – it was great for incremental edits, but terrible for loading and animation. the goal of dreams is for UGC to be minimal size to download, so we can’t store the SDF fields themselves anyway – we need a fast evaluator!

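【To get this effect, the SDF has to be re-evaluated every frame. The first PC prototype added edits one at a time to a volume texture. That was great for incremental edits but terrible for loading and animation. UGC must also be small to download, so the SDF fields can't be stored anyway: we need a fast evaluator.】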

 
 


 
 

Nevertheless, a plan was forming! the idea was this

{‘csg’ edit list => CS of doom => per object voxels => meshing? => per object poly model => scene graph render! profit!}

【The plan】

 
 

Before getting to rendering, I’d like to talk about the CS of doom, or evaluator as we call it. The full pipeline from edit list to renderable data is 40+ compute shaders in a long pipeline, but the “CS of doom” are a few 3000+ instruction shaders chained together that make the sparse SDF output. fun to debug on early PS4 hardware!

【First, the ‘CS of doom’ (compute shaders of doom), i.e. the evaluator: a chain of very long compute shaders that produce the sparse SDF】

 
 

here are some actual stats on dispatch counts for a model called crystal’s dad to be converted from an edit list to a point cloud and a filtered brick tree:

eval dispatch count: 60

sweep dispatch count: 91

points dispatch count: 459

bricker dispatch count: 73

【Dispatch counts compared】

 
 


 
 

We had limited the set of edits to exclude domain deformation or any non-local effects like blur (much to the chagrin of z-brush experienced artists), and our CSG trees were entirely right leaning, meaning they were a simple list. Simple is good!

so in *theory* we had an embarrassingly parallel problem on our hands. take a large list of 100k edits, evaluate them at every point in a ~1000^3 grid, mesh the result, voila! one object!

【The naive version: 100K edits × a 1000^3 grid, every edit evaluated at every point = 10^5 × 10^9 = 10^14 evaluations】

 
 


 
 

alas, that’s 10^14 (a hundred trillion) evaluations, which is far too many.

 
 


 
 

anton wrote the first hierarchical prototype, which consisted of starting with a very coarse voxel grid, say 4x4x4

{slide}

【First revision: a hierarchical voxel grid】

 
 


 
 

building a list of edits that could possibly overlap each voxel, and then iteratively refining the voxels by splitting them and shortening the lists for each.

【How the hierarchy works】

 
 


 
 

empty cells and full cells are marked early in the tree; cells near the boundary are split recursively to a resolution limit. (the diagram shows a split in 2×2, but we actually split by 4x4x4 in one go, which fits GCN’s 64 wide wavefronts and lets us make coherent scalar branches on primitive type etc) the decision to split a given cell, and when not to, is really tricky.

【Deciding when to split a cell, and when not to, is very hard】
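Below is a minimal sketch of the hierarchical culling loop under simplifying assumptions. It handles union-only edits, uses axis-aligned box primitives measured in the max norm (discussed further below), and splits cells 2×2×2 rather than the 4×4×4 the talk describes. The max-norm box distance changes by at most `half` across a cube cell of half-size `half`. A single evaluation at the cell center therefore decides whether an edit can be dropped or the cell marked full. `box_maxnorm` and `refine` are hypothetical names.

```python
def box_maxnorm(p, center, half):
    """Signed max-norm (Chebyshev) distance to an axis-aligned box."""
    return max(abs(p[i] - center[i]) - half[i] for i in range(3))

def refine(edits, center, half, leaf_half, out):
    """Union-only sketch of hierarchical edit culling. A cell is the cube of
    half-size `half` around `center`. Since box_maxnorm is 1-Lipschitz in the
    max norm, the distance anywhere in the cell is within `half` of its value
    at the center. Appends (kind, center, half, edit_list) tuples to `out`."""
    kept = []
    for e in edits:
        d = box_maxnorm(center, *e)
        if d < -half:                    # whole cell inside this edit
            out.append(("full", center, half, []))
            return
        if d <= half:                    # surface of e may cross the cell
            kept.append(e)
        # otherwise e cannot reach this cell, so it is dropped from the list
    if not kept:
        out.append(("empty", center, half, []))
        return
    if half <= leaf_half:
        out.append(("surface", center, half, kept))
        return
    h = half / 2
    for dx in (-h, h):
        for dy in (-h, h):
            for dz in (-h, h):
                child = (center[0] + dx, center[1] + dy, center[2] + dz)
                refine(kept, child, h, leaf_half, out)
```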

 
 

if you err on the ‘too little split’ side, you get gaps in the model. most of the rendering backends we were trying required at least 1 to 1.5 voxels of valid data on each side of the mesh.

if you err on the ‘too much split’ side, you can easily get pathological cases where the evaluator ends up doing orders of magnitude too much work.

 
 

Also, the splits must be completely seamless. The quality constraints are much, much more stringent than what you’d need for something like sphere tracing.

 
 

Both Anton and I had a crack at various heuristic evaluators, but neither was perfect. And it was made worse by the fact that even some of our base primitives were pretty hard to compute ‘good’ distances for!

【These approaches were hard to implement and had theoretical flaws; neither was perfect】

 
 


 
 

an aside on norms. everyone defaults to the L2 distance (ie sqrt(x^2+y^2+z^2)) because it’s the length we’re used to.

 
 


 
 

the L2 norm for boxes and spheres is easy. but the ellipsoid… not so much. Most of the public attempts at ‘closest point on an ellipsoid’ are either slow, unstable in corner cases, or both. Anton spent a LONG time advancing the state of the art, but it was a hard, hard battle.

【Distance metric: L2 norm, sqrt(x^2 + y^2 + z^2)】

 
 

Ellipsoid: https://www.shadertoy.com/view/ldsGWX

Spline: https://www.shadertoy.com/view/XssGWl

 
 


 
 

luckily, anton noticed that for many primitives, the max norm was simpler and faster to evaluate.

 
 

Insight from “Efficient Max-Norm Distance Computation and Reliable Voxelization” http://gamma.cs.unc.edu/RECONS/maxnorm.pdf

 
 

  • Many non-uniform primitives have much simpler distance fields under max norm, usually just have to solve some quadratics!
  • Need to be careful when changing basis as max norm is not rotation-invariant, but a valid distance field is just a scaling factor away

 
 

So evaluator works in max norm i.e. d = max(|x|,|y|,|z|). The shape of something distance ‘d’ away from a central origin in max norm is a cube, which nicely matches the shape of nodes in our hierarchy. 🙂

【Distance metric: max norm, simple and fast】

【Here, the distance metric is what bounds the range that a soft blend can affect】
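Below is a small check of how the two norms relate, for an axis-aligned box. `box_l2` is the well-known Euclidean box SDF and `box_maxnorm` is its max-norm counterpart; both names are illustrative. Outside the box, the max-norm distance never exceeds the Euclidean distance and is at most a factor of √3 below it. Inside the box, the two agree. This is one concrete reading of "a valid distance field is just a scaling factor away".

```python
import math

def box_l2(p, center, half):
    """Euclidean signed distance to an axis-aligned box."""
    q = [abs(p[i] - center[i]) - half[i] for i in range(3)]
    return math.sqrt(sum(max(c, 0.0) ** 2 for c in q)) + min(max(q), 0.0)

def box_maxnorm(p, center, half):
    """Max-norm signed distance to the same box: d = max_i(|p_i - c_i| - h_i)."""
    return max(abs(p[i] - center[i]) - half[i] for i in range(3))
```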

 
 


 
 

Soft blend breaks ALL THE CULLING, key points:

– Soft min/max needs to revert to hard min/max once distance fields are sufficiently far apart (otherwise you can never cull either side)

  • Ours is for some radius r: soft_min(a, b, r) { float e = max(r - abs(a - b), 0); return min(a, b) - e*e*0.25/r; }, credit to Dave Smith @ media molecule 【r is the blend radius】
  • Has no effect once abs(a - b) > r 【the two fields are too far apart to interact】
  • Need to consider the amount of ‘future soft blend’ when culling, as soft blend increases the range at which primitives can influence the final surface (skipping over lots of implementation details!) 【culling must account for the extended reach of soft blends】
  • Because our distance fields are good quality, we can use interval arithmetic for additional culling (skipping over lots of implementation details!) 【good distance fields make interval-arithmetic culling possible】
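Below is the soft minimum above, transcribed into Python. `soft_max` is an assumption: it is the usual dual, -soft_min(-a, -b, r), and the talk does not show it. The tests check the two properties that matter for culling. First, the blend has no effect once |a - b| >= r. Second, it lowers the field by at most r/4, which gives one way to bound how far a soft blend can move the surface.

```python
def soft_min(a, b, r):
    """Soft minimum from the talk (credited to Dave Smith, Media Molecule)."""
    e = max(r - abs(a - b), 0.0)
    return min(a, b) - e * e * 0.25 / r

def soft_max(a, b, r):
    """Assumed dual for soft subtraction: -soft_min(-a, -b, r)."""
    return -soft_min(-a, -b, r)
```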

 
 


 
 

this is a visualisation of the number of edits affecting each voxel; you can see that the soft blend increases the work over a quite large area.

 
 

【Next: culling efficiency】

 
 

however, compared to the earlier, less rigorous evaluators, simon’s interval-arithmetic and careful-maxnorm-bounds was a tour-de-force of maths/engineering/long dependent compute shader chains/compiler bug battling.

 
 


 
 

thanks for saving the evaluator sjb!

 
 


 
 

STATS! for some test models, you can see a range of edits (‘elements’) from 600 – 53000 (the worst is around 120k, but that’s atypical); this evaluates to between 1m and 10m surface voxels (+-1.5 of surface),

 
 


 
 

… the culling rates compared to brute force are well over 99%. we get 10m – 100m voxels evaluated per second on a ps4, from a model with tens of thousands of edits.

 
 


 
 

this is one of those models… (crystals dad, 8274 edits, 5.2m voxels)

 
 


 
 

…and this is a visualisation of the number of edits that touch the leaf voxels

 
 


 
 


 
 

moar (head40) (22k edits, 2.4m voxels)

note the colouring is per block, so the voxel res is much higher than the apparent color res in this debug view

 
 


 
 

the meshes output from the blob prototype, as it was called, were generally quite dense – 2m quads at least for a large sphere, and more as the thing got more crinkly. In addition, we wanted to render scenes consisting of, at very least, a ‘cloud’ of rigidly oriented blob-meshes.

 
 

at this point anton and I started investigating different approaches. anton looked into adaptive variants of marching cubes, such as dual marching cubes, various octree schemes, and so on. let’s call this engine – including the original histopyramids marching cubes – engine 1: the polygon edition.

 
 

From here on, the talk covers how the engine generated meshes.

 
 


 
 

here are some notes from the man himself about the investigations into SDF polygonalization

 
 


 
 

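【Marching cubes: the meshes are too dense, edges are mushy, there are slivers, and the output leads to asymmetrical code in a GPU implementation】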

Marching cubes: Well it works but the meshes are dense and the edges are mushy and there are slivers and the output makes for asymmetrical code in a GPU implementation.

 
 


 
 

I don’t know if you can tell, but that’s the wireframe!

oh no

 
 


 
 

【DC: easy to implement on the GPU, but hard to keep sharp edges sharp and smooth things smooth, and it does not align to features for edge flow】

Dual Contouring: Hey this is easy on GPU. Oh but it’s kind of hard to keep sharp edges sharp and smooth things smooth, and it doesn’t really align to features for edge flow either.

 
 

http://www.frankpetterson.com/publications/dualcontour/dualcontour.pdf

‘Dual Contouring of Hermite Data’

Ju, Losasso, Schaefer and Warren

 
 


 
 

note the wiggly edge on the bottom left of the cuboid – really hard to tune the hard/soft heuristics when making animated deathstars.

【During animation, edges are not guaranteed to stay correct; they wiggle】

 
 


 
 

more complex model….

 
 


 
 

the DC mesh is still quite dense in this version, but at least it preserves edges.

【In this example the mesh is just as dense, but at least edges are preserved】

 
 

however it shows problems: most obviously, holes in the rotor due to errors in the evaluator we used at this stage (heuristic culling -> makes mistakes on soft blend; pre simon eval!) – also occasionally what should be a straight edge ends up wobbly because it can’t decide if this should be smooth or straight. VERY tricky to tune in the general case for UGC.

【But it also shows problems: spurious holes, and edges that should be straight become wobbly】

 
 


 
 

【Fixing self-intersections】

ALSO! Oh no, there are self intersections! This makes the lighting look glitched – fix em:

 
 

http://www.cs.wustl.edu/~taoju/research/interfree_paper_final.pdf

‘Intersection-free Contouring on An Octree Grid’

Tao Ju, Tushar Udeshi

 
 


 
 

【Fixing non-manifold output】

Oh no, now it’s not necessarily manifold, fix that.

 
 

http://faculty.cs.tamu.edu/schaefer/research/dualsimp_tvcg.pdf

Manifold Dual Contouring

Scott Schaefer, Tao Ju, Joe Warren

 
 


 
 

【Perhaps partly a personal view: maybe MC was not so bad after all】

Oh no, it’s self intersecting again. Maybe marching cubes wasn’t so bad after all… and LOD is still hard (many completely impractical papers).

 
 


 
 

the ability to accumulate to an ‘append buffer’ via DS_ORDERED_COUNT * where the results are magically in deterministic order based on wavefront dispatch index* is …

magical and wonderful feature of GCN. it turns this…

【I didn't fully understand this part】

 
 


(non deterministic vertex/index order on output from a mesher, cache thrashing hell:)

【The mesher's VB/IB output order is non-deterministic】

 
 


 
 

into this – hilbert ordered dual contouring! so much better on your (vertex) caches.

we use ordered append in a few places. it’s a nice tool to know exists!

【Hilbert-ordered DC: much better for vertex caches】
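Below is a minimal 2D sketch of Hilbert ordering. It uses the standard `xy2d` conversion, as given on Wikipedia's Hilbert curve page. Sorting cells by their Hilbert index keeps consecutive cells adjacent, which is what makes the emitted vertices cache-friendly. The 3D curve a mesher would use is not shown, and neither is the GCN ordered-append mechanism that preserves the order on the GPU.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Hilbert index of cell (x, y) in an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d
```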

 
 


 
 

back to the story! the answer to Isla’s question is,

【Do you like polygons?】

 
 


 
 

no, I do not like polygons.

【No】

 
 

I mean, they are actually pretty much the best representation of a hard 2D surface embedded in 3D, especially when you consider all the transistors and brain cells dedicated to them.

【Polygons are pretty much the best representation of a hard 2D surface embedded in 3D】

 
 

but… they are also very hard to get right automatically (without a human artist in the loop), and make my head hurt. My safe place is voxels and grids and filterable representations.

【But good polygons are very hard to generate automatically】

 
 

Plus, I have a real thing for noise, grain, ‘texture’ (in the non texture-mapping sense), and I loved the idea of a high resolution volumetric representation being at the heart of dreams. it’s what we are evaluating, after all. why not try rendering it directly? what could possibly go wrong?

【So why not render the volume directly?】

 
 

so while anton was researching DC/MC/…, I was investigating alternatives.

 
 


 
 

there was something about the artefacts of marching cubes meshes that bugged me.

I really loved the detailed sculpts, where polys were down to a single pixel, and the lower res / adaptive res stuff struggled in some key cases.

so, I started looking into… other techniques.

【The author particularly loves high-resolution detail, where each polygon covers only a single pixel】

 
 


 
 

【Volumetric billboards: a particularly good fit for what the author wanted】

since the beginning of the project, I had been obsessed by this paper:

http://phildec.users.sourceforge.net/Research/VolumetricBillboards.php

by Philippe Decaudin, Fabrice Neyret.

 
 

it’s the spiritual precursor to gigavoxels, SVOs, and their even more recent work on prefiltered voxels. I became convinced around this time that there was huge visual differentiation to be had, in having a renderer based not on hard surfaces, but on clouds of prefiltered, possibly gassy looking, models. and our SDF based evaluator, interpreting the distances around 0 as opacities, seemed perfect. this paper still makes me excited looking at it. look at the geometric density, the soft anti-aliased look, the prefiltered LODs. it all fitted!

 
 


 
 

the paper contributed a simple LOD filtering scheme based on compositing ‘over’ along each axis in turn, and taking the highest opacity of the three cardinal directions. this is the spiritual precursor to ‘anisotropic’ voxels used in SVO. I love seeing the lineage of ideas in published work. ANYWAY.

【The paper also contributes an LOD filtering scheme, which was particularly useful to the author】
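Below is a minimal sketch of that filtering rule for one 2×2×2 block of opacities. How the four rays along each axis are combined is an assumption: this sketch takes a plain average, and the paper's exact weighting may differ. `over`, `_sample` and `downsample_2x2x2` are hypothetical names.

```python
def over(front, back):
    """Opacity after compositing `front` over `back`."""
    return front + back * (1.0 - front)

def _sample(a, axis, w, u, v):
    """Child opacity at position w along `axis`, and (u, v) across it."""
    idx = [u, v]
    idx.insert(axis, w)
    x, y, z = idx
    return a[x][y][z]

def downsample_2x2x2(a):
    """a[x][y][z] (indices 0/1) holds the 8 child opacities of one parent voxel."""
    per_axis = []
    for axis in range(3):
        rays = [over(_sample(a, axis, 0, u, v), _sample(a, axis, 1, u, v))
                for u in (0, 1) for v in (0, 1)]
        per_axis.append(sum(rays) / 4.0)  # assumed: box-average the 4 rays
    return max(per_axis)  # keep the most opaque of the 3 cardinal directions
```

A thin opaque sheet stays fully opaque when seen head-on (1.0). Averaging all eight children would instead have made it half transparent (0.5).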

 
 


 
 

the rendering was simple too: you take each rigid object, slice it screen-aligned along exponentially spaced z slices, and composite front to back or back to front. it’s a scatter-based, painters algorithm style volume renderer. they exploit the rasterizer to handle sparse scenes with overlapping objects. they also are pre-filtered and can handle transparent & volumetric effects. this is quite rare – unique? – among published techniques. it’s tantalising. I think a great looking game could be made using this technique.

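【The rendering is simple too: slice each object into exponentially spaced, screen-aligned z slices and composite them in order, painter's-algorithm style. The rasterizer handles sparse scenes with overlapping objects, and prefiltering handles transparency and volumetric effects. The author thinks a great-looking game could be made with this technique.】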

 
 

I have a small contribution – they spend a lot of the paper talking about a complex Geometry shader to clip the slices to the relevant object bounds. I wish it was still 2008 so I could go back in time and tell them you don’t need it! 😉 well, complex GS sucks. so even though I’m 7 years late I’m going to tell you anyway 😉

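【The original authors spend much of the paper on a complex geometry shader that clips the slices to the object bounds. That was 2008; in 2015 it can be done much more simply, as described next】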

 
 


 
 

to slice an object bounded by this cube…

 
 


 
 

pick the object axis closest to the view direction, and consider the 4 edges of the cube along this axis.

 
 


 
 

generate the slices as simple quads with the corners constrained to these 4 edges,

 
 


 
 

some parts of the slice quads will fall outside the box. that’s what the GS was there for! but with this setup, we can use existing HW:

 
 


 
 

just enable two user clipping planes for the front and back of the object. the hardware clipping unit does all the hard work for you.

【The clipping hardware can now do this】
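Below is a minimal sketch of that slice setup for an axis-aligned box in object space. It assumes a view direction `view_dir` and a slice plane dot(view_dir, p) = t. Each corner is found by sliding along one of the four box edges parallel to the chosen axis until it hits the plane. The returned `(lo, hi)` pair stands in for the two user clip planes. `slice_quad` is a hypothetical name.

```python
def slice_quad(center, half, view_dir, t):
    """Corners of the slice plane dot(view_dir, p) = t, constrained to the 4
    box edges parallel to the box axis most aligned with the view direction.
    Returns (axis, corners, (lo, hi)); lo/hi are the two clip planes along
    that axis, and the hardware clips whatever sticks out beyond them."""
    k = max(range(3), key=lambda i: abs(view_dir[i]))
    i, j = [a for a in range(3) if a != k]
    corners = []
    for si, sj in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        p = list(center)
        p[i] += si * half[i]
        p[j] += sj * half[j]
        p[k] = 0.0
        # Solve dot(view_dir, p) = t for the free coordinate along axis k.
        p[k] = (t - sum(view_dir[m] * p[m] for m in range(3))) / view_dir[k]
        corners.append(tuple(p))
    return k, corners, (center[k] - half[k], center[k] + half[k])
```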

 
 


 
 

ANYWAY. this idea of volumetric billboards stuck with me. and I still love it.

 
 

fast forward a few years, and the french were once again rocking it.

http://maverick.inria.fr/Members/Cyril.Crassin/

Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre (note: neyret is the secondary author on VBs) had put out gigavoxels.

 
 

this is the next precursor to SVOs. seen through the lens of the earlier VB work, I loved that it kept that pre-filtered look, the geometric density from having a densely sampled field. it layered on top a hierarchical, sparse representation – matching very well the structure of our evaluator. hooray! however it dispensed with the large number of overlapping objects, which makes it less immediately applicable to Dreams/games. But I did implement a quick version of gigavoxels, here are some shots.

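【Gigavoxels dropped the large number of overlapping objects, so it is not immediately applicable to our game. Still, the author implemented a quick version of it】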

 
 


 
 

it’s impossible to resist domain repetition when you’re just raytracing a field…

【When you are just ray tracing a single field, domain repetition is irresistible】

 
 


 
 

add some lighting as per my earlier siggraph advances talk (2006 was it?), the sort of thing that has since been massively refined e.g. in the shadertoy community (sampling mip mapped/blurred copies of the distance field – a natural operation in gigavoxel land, and effectively cone tracing) I think it has a lovely alabaster look.

however it focussed on a single large field that (eye) rays were traced through, and I needed the kind of scene complexity of the earlier VB paper – a cloud of rigid voxels models.

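【Lighting as in the author's earlier talk (sampling blurred copies of the distance field, effectively cone tracing). But gigavoxels traces eye rays through one large field, whereas we need a cloud of rigid voxel models, as in the VB paper】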

 
 

 
 


 
 

the idea is to take the brick tree from gigavoxels, but instead of marching rays from the eye, directly choose a ‘cut’ through the tree of bricks based on view distance (to get nice LOD), then rasterise each brick individually. The pixel shader then only has to trace rays from the edge of the bricks to any surface.

【Choose the LOD cut directly from view distance, rasterize each brick independently, and start the ray march at the brick's boundary】

 
 

As an added advantage, the bricks are stored in an atlas, but there is no virtual-texturing style indirection needed in the inner loop (as it is in gigavoxels), because each rastered cube explicitly bounds each individual brick, so we know which bit of the atlas to fetch from at VS level.

【Texture lookup: each rasterized cube explicitly bounds one brick, so we know which part of the atlas to fetch from】

 
 


 
 

here you can see the individual cubes that the VS/PS is shading. each represents an 8x8x8 little block of volume data, giga-voxels style. again: rather than tracing eye rays for the whole screen, we do a hybrid scatter/gather: the rasteriser scatters pixels in roughly the right places (note also that the LOD has been adapted so that the cubes are of constant screen space size, ie lower LOD cut of the brick tree is chosen in the distance) then the pixel shader walks from the surface of the cubes to the SDF surface.

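【Each cube you see is an 8×8×8 block of volume data, gigavoxels style. Instead of tracing eye rays for the whole screen, we use a hybrid scatter/gather. The rasterizer scatters LOD-adapted cubes of roughly constant screen-space size, then the pixel shader walks from the cube surface to the SDF surface】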
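Below is a minimal sketch of choosing the LOD level so that bricks keep a roughly constant projected size. It assumes an octree whose brick size halves per level and a simple pinhole projection. `root_size`, `focal_px`, `target_px` and `lod_level` are hypothetical. Applying the function per node, with each node's own distance, gives a view-dependent cut through the tree.

```python
import math

def lod_level(root_size, distance, focal_px, target_px, max_level):
    """Octree level whose bricks project to at most `target_px` pixels,
    assuming brick size = root_size / 2**level and a pinhole projection."""
    projected_root = root_size * focal_px / distance
    if projected_root <= target_px:
        return 0
    level = math.ceil(math.log2(projected_root / target_px))
    return min(level, max_level)
```

Doubling the distance drops the level by one, so distant objects are drawn with coarser bricks of the same screen size.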

 
 

also, I could move the vertices of the cubes around using traditional vertex skinning techniques, to get animation and deformation… oh my god it’s going to be amazing!

【The cube vertices can be moved with traditional vertex skinning for animation and deformation; it was going to be amazing!】

 
 


 
 

(sorry for the bad screenshot – I suck at archiving my work)

It sort of amounts to POM/tiny raymarch inside each 8x8x8 cube, to find the local surface, with oDepth (output depth) to set the z-buffer.

it has the virtue of being very simple to implement.

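【Effectively a tiny POM-style raymarch inside each 8×8×8 cube finds the local surface, and the output depth is written to the z-buffer; very simple to implement】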
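Below is a minimal CPU sketch of the per-brick march. It assumes an analytic SDF in place of the 8×8×8 brick texture and a unit-length ray direction. The ray starts where it enters the brick (slab test) and sphere-traces until it hits the surface. It returns the depth a pixel shader would write to the z-buffer, or `None` so the pixel can be discarded. `ray_box` and `march_brick` are hypothetical names.

```python
import math

def ray_box(origin, direction, lo, hi):
    """Slab test: return (t_enter, t_exit) or None if the ray misses the box."""
    t0, t1 = -math.inf, math.inf
    for i in range(3):
        if direction[i] == 0.0:
            if not lo[i] <= origin[i] <= hi[i]:
                return None
            continue
        a = (lo[i] - origin[i]) / direction[i]
        b = (hi[i] - origin[i]) / direction[i]
        t0, t1 = max(t0, min(a, b)), min(t1, max(a, b))
    if t0 > t1 or t1 < 0.0:
        return None
    return max(t0, 0.0), t1

def march_brick(origin, direction, lo, hi, sdf, eps=1e-4, max_steps=64):
    """Sphere-trace from the brick entry point. Return hit t (the value that
    would become output depth) or None if the ray leaves the brick first."""
    span = ray_box(origin, direction, lo, hi)
    if span is None:
        return None
    t, t_exit = span
    for _ in range(max_steps):
        p = [origin[i] + t * direction[i] for i in range(3)]
        d = sdf(p)
        if d < eps:
            return t
        t += d
        if t > t_exit:
            return None
    return None
```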

 
 


 
 

Because of that simplicity, this technique actually ended up being the main engine a lot of the artists used for a couple of years; you’ll see a couple more shots later. So while the ‘bricks’ engine, as it was known, went into heavy use, I really wanted more.

【Because it was simple and easy to use, this technique became the artists' main engine for a couple of years】

 
 


 
 

I wasn’t happy! why not? I also wanted to keep that pre-filtered look from Volumetric Billboards. I felt that if we pursued just hard z buffered surfaces, we might as well just do polys, or at least, the means didn’t lead to a visual result that was different enough. so I started a long journey into OIT.

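【Still not satisfied: the author also wanted the prefiltered look of Volumetric Billboards. With only hard z-buffered surfaces, we might as well just use polygons, so he set out into OIT】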

 
 


 
 

I immediately found that slicing every cube into 8-16 tiny slices, ie pure ‘VB’, was going to burn way too much fill rate.

so I tried a hybrid where: when the PS marched the 8x8x8 bricks, I had it output a list of fuzzy ‘partial alpha’ voxels, as well as outputting z when it hit full opacity. then all I had to do was composite the gigantic (10s of millions) of accumulated fuzzy samples onto the screen… in depth sorted order. Hmm

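【Slicing every cube into 8-16 tiny slices (pure "VB") would burn far too much fill rate. So the author tried a hybrid. While marching the 8×8×8 bricks, the PS outputs a list of fuzzy partial-alpha voxels, and outputs z when it reaches full opacity. The tens of millions of accumulated fuzzy samples must then be composited onto the screen in depth-sorted order.】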
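Below is a minimal sketch of compositing one pixel's partial-alpha samples. It assumes scalar color and a full sort, i.e. the expensive ground truth. `max_layers` is a hypothetical knob that keeps only the nearest samples, a crude stand-in for the fixed-budget approximations discussed below. `composite_front_to_back` is likewise a hypothetical name.

```python
def composite_front_to_back(fragments, max_layers=None, opaque_cutoff=0.999):
    """fragments: (depth, color, alpha) samples for one pixel, in any order.
    Sorts by depth and accumulates front to back with the 'over' operator.
    max_layers optionally keeps only the nearest N samples."""
    layers = sorted(fragments, key=lambda f: f[0])
    if max_layers is not None:
        layers = layers[:max_layers]
    color, alpha = 0.0, 0.0
    for _depth, c, a in layers:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= opaque_cutoff:  # effectively solid: nothing behind shows
            break
    return color, alpha
```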

 
 


 
 

so it was ‘just’ a matter of figuring out how to composite all the non-solid voxels. I had various ground truth images, and I was particularly excited about objects overlapping each other with really creamy falloff

  • e.g. between the blue arch and the grey arch, that’s just the two overlapping and the ‘fuzz’ around them smoothly cross-intersecting.

【The question is how to composite all the non-solid voxels so that overlapping objects blend with a smooth falloff; the following slides are about this】

 
 


 
 

and pre filtering is great for good LOD! this visualizes the pre-filtered mips of dad’s head, where I’ve added a random beard to him as actual geometry in the SDF.

【Prefiltering is great for LOD】

 
 


 
 

and here’s what it looks like rendered.

【The rendered result】

 
 


 
 

but getting from the too-slow ground truth to something consistently fast-enough was very, very hard.

【Getting from the too-slow ground truth to something consistently fast enough is very, very hard】

prefiltering is beautiful, but it generates a lot of fuzz, everywhere. the sheer number of non-opaque pixels was getting high – easily 32x 1080p

【Prefiltering generates fuzz everywhere】

I spent over a year trying everything – per pixel atomic bubble sort, front k approximations, depth peeling..

【The author spent over a year trying everything: per-pixel atomic bubble sort, front-k approximations, depth peeling】

one thing I didn’t try, because I didn’t think of it and it hadn’t been published yet, was McGuire style approximate commutative OIT. however it won’t work in its vanilla form

  • it turns out the particular case of a very ‘tight’ fuzz around objects is very unforgiving of artefacts
  • for example, if adjacent pixels in space or time made different approximations (eg discarded or merged different layers), you get really objectionable visible artefacts.

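【McGuire-style approximate commutative OIT was not tried, because it had not been published yet. It would not work in its vanilla form anyway: a tight fuzz around objects is very unforgiving of artifacts】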

 
 


 
 

it’s even worse because the depth complexity changes drastically, over 2 orders of magnitude, between pixels that hit a hard back and ‘edge on’ pixels that spend literally hundreds of voxels skating through fuzz. this is morally the same problem that a lot of sphere tracing approaches have, where edge pixels are waaaay harder than surface pixels.

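【Worse, depth complexity varies by two orders of magnitude between pixels that hit a hard surface and edge-on pixels that skate through hundreds of fuzzy voxels】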

 
 

I did have some interesting CS load balancing experiments, based on wavefronts peeling off 8 layers at a time and re-circulating the pixels that needed extra passes: a kind of compute shader depth peel, but with load balancing as its goal.

 
 


 
 

【With enough sort/merge layers, the result is fine】

here’s a simpler case. fine when your sort/merge algo has enough layers. but if we limit it to fewer blended voxels than necessary…

 
 


 
 

【With fewer blended voxels than necessary, the artifacts cannot be avoided】

I couldn’t avoid ugly artefacts.

 
 

in the end, the ‘hard’ no-fuzz/no-oit shader was what went over the fence to the designers, who proceeded to work with dreams with a ‘hard’ look while I flailed in OIT land.

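【In the end, the "hard" no-fuzz, no-OIT shader went to the designers. They worked on Dreams with a "hard" look while the author struggled with OIT】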

 
 


 
 

see what I mean about failure?

and this is over the period of about 2 years, at this point

【By this point, about two years had gone by】

 
 


 
 

I think this is a really cool technique; it’s another one we discarded, but I think it has some legs for some other project.

I call it the refinement renderer.

【The refinement renderer: discarded in the end, but it may be useful for other projects】

 
 


 
 

there are very few screenshots of this as it didn’t live long, but it’s interestingly odd. have this sort of image in your mind for the next few slides. note the lovely pre-filtered AA, the soft direct lighting (shadows but no shadow maps!). but this one is pure compute, no rasterised mini cubes.

the idea is to go back to the gigavoxels approach of tracing eye rays through fuzz directly… but find a way to make it work for scenes made out of a large number of independently moving objects. I think if you squint a bit this technique shares some elements in common with what Daniel Wright is going to present in the context of shadows; however since this focuses on primary-ray rendering, I’m not going to steal any of his thunder! phew.

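【The idea: return to gigavoxels-style eye-ray tracing through fuzz, but make it work for scenes made of many independently moving objects】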

 
 


 
 

a bit of terminology – we call post projection voxels – that is, little pieces of view frustum – ‘froxels’, as opposed to square voxels. The term originated at the sony WWS ATG group, I believe.

if you look at a ray marcher like many of the ones on shadertoy , like iq’s famous cloud renderer, you can think of the ray steps as stepping through ‘froxels’.

【Froxels: small cells of the subdivided view frustum, as opposed to cubic voxels】

 
 


 
 

https://www.shadertoy.com/view/XslGRr – clouds by iq

typically you want to step the ray exponentially so that you spend less time sampling in the distance.

 
 

Intuitively you want to have ‘as square as possible’ voxels; that is, your step size should be proportional to the inverse of the projected side length, which is 1/(1/z), i.e. z. so you can integrate and you get slices spaced as t=exp(A*i) for some constant A (slice index i), or alternatively write it iteratively as t+=K*t at each step for some constant K.

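【Intuitively you want froxels that are as close to cubes as possible, so the step size should be proportional to view depth z, which gives exponentially spaced slices】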
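Below is a minimal sketch of exponential slicing, assuming a near depth `t_near` and a growth constant `k`. The iterative form t += k*t gives t_i = t_near·(1 + k)^i, which is the exp(A·i) spacing above with A = ln(1 + k). The `t_near = 0` guard illustrates the problem raised in the next paragraph: the recurrence never leaves zero. `exponential_slices` is a hypothetical name.

```python
def exponential_slices(t_near, t_far, k):
    """Froxel slice depths with step proportional to depth (t += k * t), so
    froxels stay roughly cube-shaped: t_i = t_near * (1 + k) ** i."""
    if t_near <= 0.0:
        # t += k * t never moves away from t = 0: the degenerate case at the eye.
        raise ValueError("t_near must be > 0 for purely exponential slicing")
    ts = [t_near]
    while ts[-1] < t_far:
        ts.append(ts[-1] * (1.0 + k))
    return ts
```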

 
 


 
 

the only problem with this is that near the eye, as t goes to 0, you get infinitely small froxel slices. oh dear. if you look at iq’s cloud example, you see this line:

【The problem: near the eye, as t goes to 0, the slices become infinitely thin. iq handles it like this:】

 
 


 
 

t += max(0.1,0.02*t);

which is basically saying, let’s have even slicing up close then switch to exponential after a while.

I’ve seen this empirically used a few times. here’s an interesting (?) insight. what would real life do? they don’t have pinhole cameras.

【The fix is to clamp the step to a minimum size】
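Below is a minimal sketch of iq's stepping rule with his constants. The switch from uniform to exponential steps happens where 0.02·t = 0.1, i.e. at t = 5. `iq_steps` is a hypothetical name.

```python
def iq_steps(t_far, min_step=0.1, k=0.02):
    """t += max(min_step, k * t): uniform steps near the eye, exponential
    steps once k * t exceeds min_step, i.e. beyond t = min_step / k."""
    ts = [0.0]
    while ts[-1] < t_far:
        t = ts[-1]
        ts.append(t + max(min_step, k * t))
    return ts
```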

 
 


 
 

so, consider a thin lens DOF model for a second. what if you tuned your froxel sampling rate not just for projected pixel size, but for projected bokeh size. the projected bokeh radius is proportional to (z-f)/z, so we want A(z-f)/z + 1/z where A is the size in pixels of your bokeh at infinity. (the +1/z is the size of single ‘sharp’ pixel, i.e. the footprint of your AA filter)

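【Consider a thin-lens camera with depth of field instead of a pinhole camera, and tune the sampling rate to the projected bokeh size】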

 
 

if you put this together, you can actually compute two exponential slicing rates – one for in front of the focal plane, and one for behind.

at the focal plane, it’s the same step rate you would have used before, but in the distance it’s a little sparser, and near to the camera it’s WAY faster. extra amusingly, if you work through the maths, if you set A to be 1 pixel, then the constant in the ‘foreground’ exponential goes to 0 and it turns out that linear slicing is exactly what you want. so the empirical ‘even step size’ that iq uses is exactly justified if you had a thin lens camera model with aperture such that bokeh-at-infinity is 1 pixel across on top of your AA. neat! for a wider aperture, you can step faster than linear.

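【Combining both gives two exponential slicing rates, one in front of the focal plane and one behind it. At the focal plane, the step matches the pinhole case. Behind it, the slices are a little sparser; near the camera, they can be much coarser. With a 1-pixel bokeh at infinity, uniform (linear) slicing in front of the focal plane is exactly right.】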
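Below is a minimal sketch under an assumed thin-lens model. The circle of confusion at depth t is A·|t - f|/t pixels, where A is the bokeh size at infinity, plus one pixel for the AA footprint. The world-space footprint, and therefore the step, is then proportional to t·(1 + A·|t - f|/t) = t + A·|t - f|. This reproduces the claims above:

- With A = 1, the step in front of the focal plane is the constant k·f.
- At the focal plane, the step matches the pinhole step k·t.
- Behind the focal plane, the steps grow faster than in the pinhole case.

`dof_slices` is a hypothetical name.

```python
def dof_slices(t_near, t_far, focus, bokeh_px, k):
    """Slice depths for an assumed thin-lens camera. The world-space size of
    one pixel's footprint plus its bokeh at depth t is proportional to
    t + bokeh_px * |t - focus|, so the step is k times that."""
    ts = [t_near]
    while ts[-1] < t_far:
        t = ts[-1]
        step = k * (t + bokeh_px * abs(t - focus))
        if step <= 0.0:
            raise ValueError("zero step: use t_near > 0 or bokeh_px > 0")
        ts.append(t + step)
    return ts
```

Unlike pure exponential slicing, this starts cleanly at t = 0 whenever the aperture is non-zero, because the bokeh term keeps the first step positive.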

 
 


 
 

ANYWAY.

how does this relate to rendering lots of objects?

the idea I had was to borrow from the way the evaluator works. you start by dividing your frustum into coarse froxels. I chose 64th res, that is about 32×16 in x and y, with 32-64 in z depending on the far z and the DOF aperture. (blurrier dof = fewer slices needed, as in previous slides).

then you do a simple frustum vs object intersection test, and build a list per froxel of which objects touch it.

{pic}

 
 

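【Now, rendering lots of objects: first divide the frustum into coarse froxels. Then run a simple frustum-vs-object intersection test to build, for each froxel, a list of the objects that touch it.】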

then, you recursively subdivide your froxels!

for each froxel, in a compute shader you split them into 8 children. as soon as your froxel size matches the size of gigavoxel prefiltered voxels, you sample the sparse octree of the object (instead of just using OBBs) to further cull your lists.

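【Then each froxel is recursively split into 8 children (octree-style). Once froxels are as small as the prefiltered gigavoxel voxels, the objects' sparse octrees are sampled (instead of their OBBs) to cull the lists further.】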

as you get finer and finer, the lists get shorter as the object’s shape is more accurately represented. it’s exactly like the evaluator, except this time we have whole objects stored as gigavoxel trees of bricks (instead of platonic SDF elements in the evaluator), we don’t support soft blend, and our domain is over froxels, not voxels.

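【The finer the subdivision, the more accurately each object's shape is represented and the shorter the lists become. It works like the evaluator, with three differences: whole objects are stored as gigavoxel brick trees, there is no soft blend, and the domain is froxels rather than voxels.】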

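The recursive split can be sketched on the CPU as below. This is an illustration under simplifying assumptions, not the talk's compute shader. The root cell is a unit cube in normalized froxel coordinates, and objects are bounded by spheres (the talk uses OBBs and then the objects' sparse gigavoxel octrees). `Sphere`, `sphere_touches_box` and `refine` are hypothetical names. Each cell culls its parent's list and then splits into 8 children until the target depth is reached.

```python
from dataclasses import dataclass

@dataclass
class Sphere:
    """Hypothetical stand-in for an object's bounds."""
    center: tuple
    radius: float

def sphere_touches_box(s, lo, hi):
    # squared distance from the sphere centre to the axis-aligned cell
    d2 = 0.0
    for c, l, h in zip(s.center, lo, hi):
        if c < l:
            d2 += (l - c) ** 2
        elif c > h:
            d2 += (c - h) ** 2
    return d2 <= s.radius ** 2

def refine(lo, hi, candidates, spheres, levels, leaves):
    """Cull the parent's list against this cell, then split into 8 children."""
    hits = [i for i in candidates if sphere_touches_box(spheres[i], lo, hi)]
    if not hits:
        return  # nothing touches this cell: drop it
    if levels == 0:
        leaves.append((lo, hi, hits))
        return
    mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
    for octant in range(8):
        child_lo = tuple(mid[a] if (octant >> a) & 1 else lo[a] for a in range(3))
        child_hi = tuple(hi[a] if (octant >> a) & 1 else mid[a] for a in range(3))
        refine(child_lo, child_hi, hits, spheres, levels - 1, leaves)

spheres = [Sphere((0.25, 0.25, 0.25), 0.1), Sphere((0.75, 0.75, 0.75), 0.1)]
coarse, fine = [], []
refine((0, 0, 0), (1, 1, 1), [0, 1], spheres, 0, coarse)
refine((0, 0, 0), (1, 1, 1), [0, 1], spheres, 2, fine)
```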
for the first few steps, I split every froxel in parallel using dense 3d volume textures to store pointers into flat tables of per froxel lists. however at the step that refines from 1/16th res to 8th res (128x64x128 -> 256x128x256) the dense pointer roots get too expensive so I switch to a 2d representation, where every pixel has a single list of objects, sorted by z.

the nice thing is that everything is already sorted coming out of the dense version, so this is really just gluing together a bunch of small lists into one long list per screen pixel.

each refine step is still conceptually splitting froxels into 8, but each pixel is processed by one thread, serially, front to back.

that also means you can truncate the list when you get to solid – perfect, hierarchical occlusion culling!

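The per-pixel gluing and early-out can be sketched as a k-way merge of lists that are already sorted by z. This is a minimal Python illustration. `merge_and_truncate` is a hypothetical name, and so is the `is_solid` predicate, which stands in for "this froxel is fully inside solid material".

```python
import heapq

def merge_and_truncate(child_lists, is_solid):
    """child_lists: per-child (z, object_id) lists, each already sorted front to back.
    Glue them into one sorted list per pixel and stop at the first solid hit."""
    merged = []
    for z, obj in heapq.merge(*child_lists):
        merged.append((z, obj))
        if is_solid(obj):
            break  # everything behind is hidden: occlusion culling for free
    return merged

lists = [[(1.0, "rock"), (5.0, "tree")], [(2.0, "fog"), (3.0, "wall"), (4.0, "lamp")]]
result = merge_and_truncate(lists, is_solid=lambda obj: obj == "wall")
```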
SHOW ME THE PICTURES! OK

the results were pretty

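【The first few refinement steps use dense 3D volume textures that store pointers into per-froxel lists. At the step from 1/16th to 1/8th resolution these get too expensive, so the method switches to a 2D representation in which every pixel has a single list of objects sorted by z. One thread then refines each pixel serially, front to back.】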

and the pre-filtered look is really special.

Look how yummy the overlap of the meshes is! Really soft, and there’s no ‘post’ AA there. It’s all prefiltered.

【Overlapping meshes blend very softly, with no post-process AA: everything is prefiltered.】

so I did a bit of work on lighting; a kind of 3d extension of my siggraph 2006 advances talk.

【Next, lighting: a kind of 3D extension of the author's SIGGRAPH 2006 Advances talk.】

imagine this setup. this is basically going to be like LPV with a voxelized scene, except we use froxels instead of voxels, and we propagate one light at a time in such a way that we can smear light from one side of the frustum to another in a single frame, with nice quality soft shadows. ‘LPV for direct lights, with good shadows’, if you will.

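【This is essentially LPV over a voxelized scene, but with froxels instead of voxels. One light is propagated at a time, so light can be smeared across the whole frustum within a single frame, with good soft shadows.】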

imagine a single channel dense froxel grid at low resolution, I think I used 256x128x256 with 8 bits per froxel. We will have one of those for the ‘density’ of the scene – defined everywhere inside the camera frustum.

– As a side effect of the refinement process I write that ‘density’ volume out, more or less for free. Now we are also going to have one extra volume texture for each ‘hero’ light. (I did tests with 4 lights).

STOP PRESS – as far as I can tell from the brilliant morning session by the frostbite guys, they have a better idea than the technique I present on the next few slides. They start from the same place: a dense froxel map of ‘density’, as above, but they resample it for each light into a per light 32^3 volume, in light-space. then they can smear density directly in light space. This is better than what I do over the next few slides, I think. See their talk for more!

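【Use a low-resolution, single-channel (8-bit) dense froxel grid that stores the scene's ‘density’ and is defined only inside the camera frustum. The refinement process writes it out for free. Each ‘hero’ light gets one extra volume texture on top of that.】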

To wipe the light around, you set the single froxel where the light is to ‘1’ and kick a compute shader in 4 froxel thick ‘shells’ radiating out from that central light froxel (with a sync between each shell). Each thread is a froxel in the shell, and reads (up to) 4 trilinear taps from the density volume, effectively a short raycast towards the light.

Each shell reads from the last shell, so it’s sort of a ‘wipe’ through the whole frustum.

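【Light is propagated outward from the light's froxel one shell at a time. Each shell reads the previous one, until every froxel in the frustum has been swept.】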

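Below is a 2D, CPU-side sketch of the shell sweep, with some simplifications. Shells are 1 cell thick rather than 4, and each cell takes a single nearest tap toward the light instead of up to 4 trilinear taps. `density` holds optical depth per cell. Every tap points one shell inward, so each cell only reads values finished in the previous pass. That is why one dispatch per shell, with a sync in between, is enough. `propagate_light` is a hypothetical name.

```python
import math

def propagate_light(density, light):
    """Sweep transmittance outward from the light cell, one square 'shell' per pass.
    density[y][x] is optical depth per cell; light = (lx, ly)."""
    h, w = len(density), len(density[0])
    lx, ly = light
    trans = [[0.0] * w for _ in range(h)]
    trans[ly][lx] = 1.0
    max_r = max(lx, w - 1 - lx, ly, h - 1 - ly)
    for r in range(1, max_r + 1):  # on the GPU: one dispatch per shell, sync in between
        for y in range(max(0, ly - r), min(h, ly + r + 1)):
            for x in range(max(0, lx - r), min(w, lx + r + 1)):
                dx, dy = x - lx, y - ly
                if max(abs(dx), abs(dy)) != r:
                    continue  # not on this shell
                # one tap toward the light; that cell is on shell r - 1, already finished
                px = x - ((dx > 0) - (dx < 0))
                py = y - ((dy > 0) - (dy < 0))
                trans[y][x] = trans[py][px] * math.exp(-density[y][x])
    return trans

grid = [[5.0 if x == 3 else 0.0 for x in range(5)] for _ in range(5)]  # a dense wall at x = 3
trans_map = propagate_light(grid, (0, 2))
```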
here come the shells! each one reads from the last. yes, there are stalls. no, they’re not too bad as you can do 4 lights and pipeline it all.

The repeated feedback causes a pleasant blur in the propagated shadows.

it’s like LPV propagation, except that it’s for a single light so you have no direction confusion, and you can wipe from one side of the screen to the other within a frame, since you process the froxels strictly in order radiating out from the light.

You can jitter the short rays to simulate area lights. You do 4 lights at once, to overlap the syncs, and you do it on an async pipe to mop up space on your compute units so the syncs don’t actually hurt that much. (offscreen lights are very painful to do well and the resolution is brutally low). However the results were pretty, and the ‘lighting’ became simple coherent volume texture lookups.

PICS PLZ:

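【The repeated feedback gives the propagated shadows a pleasant blur. It is like LPV propagation, but because each light is handled separately there is no directional confusion.】

【Note: LPV = Light Propagation Volumes.】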

Look ma! no shadowmaps!

would be super cool for participating media stuff, since we also have the brightness of every light conveniently stored at every froxel in the scene. I didn’t implement it though….

【Result: soft shadows with no shadow maps at all.】

Ambient occlusion was done by simply generating mip-maps of the density volume and sampling it at positions offset from the surface by the normal, ie a dumb very wide cone trace.

【Ambient occlusion works well too.】

The geometric detail and antialiasing was nice:

【Geometric detail and antialiasing are also good.】

You could also get really nice subsurface effects by cone tracing the light volumes a little and turning down the N.L term:

【Subsurface effects also look good.】

However, the performance was about 4x lower than what I needed for PS4 (I forget the timings, but it was running at 30 for the scenes above, but only just! For more complex scenes, it just died). The lighting technique and the refinement engine are separate ideas, but they both had too many limitations and performance problems that I didn’t have time to fix.

【For all of the above, performance was about 4x short of what PS4 needed, so none of it could be used as is.】

(ie I still think this technique has legs, but I can’t make it work for this particular game)

in particular, since edge pixels could still get unboundedly ‘deep’, the refinement lists were quite varied in length, I needed to jump through quite a few hoops to keep the GPU well load balanced. I also should have deferred lighting a bit more – I lit at every leaf voxel, which was slow. however everything I tried to reduce (merge etc) led to visible artefacts. what I didn’t try was anything stochastic. I had yet to fall in love with ‘stochastic all the things’…. definitely an avenue to pursue.

We were also struggling with the memory for all the gigavoxel bricks.

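【Not used in this game, though the author still thinks the technique has legs. The problems: per-pixel list lengths varied widely, which made GPU load balancing hard. Lighting every leaf voxel was slow, stochastic approaches were never tried, and memory for the gigavoxel bricks was a struggle.】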

The nail in the coffin was actually to do with art direction.

【What finally killed it was art direction.】

directly rendering the distance field sculptures was leaving very little to the imagination. So it was very hard to create ‘good looking’ sculptures; lots of designers were creating content that basically looked like untextured unreal-engine, or ‘crap’ versions of what traditional poly engines would give you, but slower. It was quite a depressing time because as you can see it’s a promising tech, but it was a tad too slow and not right for this project.

TL;DR:

this is the start of 2014. we’re 3 years in, and the engine prototypes have all been rejected, and the art director (rightly) doesn’t think the look of any of them suits the game.

argh.

SO……..

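【By the start of 2014, three years in, every engine prototype had been rejected. The art director (rightly) felt that none of their looks suited the game.】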

there was a real growing uneasiness in the studio. I had been working on OIT – refinement and sorting and etc for a LONG time; in the meantime, assets were being made using the ‘hard’ variant of the bricks engine, that simply traced each 8x8x8 rasterised brick for the 0 crossing and output raw pixels which were forward lit. at its best, it produced some lovely looking results (above) – but that was more the art than the engine! It also looked rather like ‘untextured poly engine’ – why were we paying all this runtime cost (memory & time) to render bricks if they just gave us a poly look?

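【Unease was growing in the studio. Assets were being made with the ‘hard’ brick renderer. It could look lovely, but that was down to the art rather than the engine, and the result looked like an untextured polygon engine, so why pay the extra memory and time?】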

also, there was a growing disparity between what the engine rendered and what the art department – especially art director kareem and artist jon – were producing as reference/concept work. it was so painterly!

【The gap between the art director's painterly vision and what the engine produced kept growing.】

there was one particular showdown with the art director, my great friend kareem, where he kept pointing at an actual oil painting and going ‘I want it to look like this’ and I’d say ‘everyone knows concept art looks like that but the game engine is a re-interpretation of that’ and kareem was like ‘no literally that’. it took HOURS for the penny to drop, for me to overcome my prejudice.

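【The art director kept pointing at an actual oil painting and insisting the game should look literally like that, not like an engine's reinterpretation of concept art. It took hours for the author to let go of that prejudice.】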

So after talking to the art director and hitting rock bottom in January 2014, he convinced me to go with a splat based engine, intentionally made to look like 3d paint strokes. I have a strong dislike of ‘painterly post fx’ especially 2d ones, so I had resisted this direction for a looooooooooong time.

(btw this is building on the evaluator as the only thing that has survived all this upheaval)

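【So in January 2014 the team switched to a splat-based engine, intentionally made to look like 3D paint strokes. The author had long resisted that direction because he dislikes painterly post effects. The evaluator was the only part to survive the upheaval.】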

I had to admit that for our particular application of UGC, it was *brutal* that you saw your exact sculpture crisply rendered; it was really hard to texture & model it using just CSG shapes. (we could have changed the modelling primitives to include texturing or more noise type setups, but the sculpting UI was so loved that it was not movable. The renderer on the other hand was pretty but too slow, so it got the axe instead).

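【For user-generated content, rendering every sculpture exactly and crisply was unforgiving, because it was hard to texture and model with CSG shapes alone. The sculpting UI was too well loved to change, so the slow renderer was what got cut.】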

So I went back to the output of the evaluator, poked simon a bit, and instead of using the gigavoxel style bricks, I got point clouds, and had a look at what I could do.

There’s a general lesson in here too – that tech direction and art direction work best when they are both considered, both given space to explore possibilities; but also able to give different perspectives on the right (or wrong) path to take.

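【The team went back to the evaluator's output, this time producing point clouds instead of gigavoxel bricks. A general lesson: tech direction and art direction work best when both get space to explore, and each can offer a different perspective on which path is right (or wrong).】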

So! now the plan is: generate a nice dense point cloud on the surface of our CSG sculpts.

EVERYTHING is going to be a point cloud. the SDF becomes an intermediate representation, we use it to spawn the points at evaluation time, (and also for collision. But that’s another talk)

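【The new plan: generate a dense point cloud on the surface of the CSG sculpts. The SDF becomes an intermediate representation used to spawn the points.】

【Note: SDF = signed distance field.】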

we started from the output of the existing evaluator, which if you remember was hierarchically refining lists of primitives to get close to voxels on the surface of the SDF. as it happens, the last refinement pass is dealing in 4x4x4 blocks of SDF to match GCN wavefronts of 64 threads.

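【The evaluator refines lists of primitives hierarchically. Its last pass works on 4×4×4 blocks of SDF, matching the 64-thread GCN wavefront.】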

We add one point to the cloud per leaf voxel (remember, that’s about 900^3 domain, so for example, a sphere model will become a point cloud with diameter 900 and one point per integer lattice cell that intersects the sphere surface)

【One point is added per leaf voxel.】

actually we are using a dual grid IIRC so that we look at a 2x2x2 neighbourhood of SDF values and only add points where there is a zero crossing.

So now we have a nice fairly even, dense point cloud. Since the bounding voxel grid is up to around 900^3 voxels -> around 2 million surface voxels -> around 2 million points.

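【With a dual grid, a point is emitted only where the 2×2×2 neighbourhood of SDF values has a zero crossing. The result is a fairly even, dense surface point cloud, up to roughly 2 million points for a 900^3 domain.】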

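The dual-grid rule can be sketched for a sphere SDF: emit one point per cell whose 2×2×2 corner SDF values straddle zero. Python, the small 32^3 grid and the name `surface_points` are our assumptions. The talk runs this in the evaluator on the GPU at up to about 900^3.

```python
import math
from itertools import product

def surface_points(n, radius):
    """Sample a sphere SDF on an (n+1)^3 lattice centred on the origin and emit one point
    per cell (dual grid) whose 8 corner values change sign, i.e. the surface crosses it."""
    half = n / 2
    sdf = {
        (i, j, k): math.sqrt((i - half) ** 2 + (j - half) ** 2 + (k - half) ** 2) - radius
        for i, j, k in product(range(n + 1), repeat=3)
    }
    points = []
    for i, j, k in product(range(n), repeat=3):
        corners = [sdf[(i + a, j + b, k + c)] for a, b, c in product((0, 1), repeat=3)]
        if min(corners) <= 0.0 < max(corners):  # zero crossing inside this cell
            points.append((i + 0.5 - half, j + 0.5 - half, k + 0.5 - half))
    return points

pts = surface_points(32, 10.0)
```

The number of emitted cells scales with surface area (n²) rather than volume (n³). That is why a 900^3 domain yields millions of points rather than hundreds of millions.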
The point cloud is sorted into Hilbert order (actually, 4^3 bricks of voxels are in Hilbert order and then the surface voxels inside those bricks are in raster order, but I digress) and cut into clusters of approximately 256 points (occasionally there is a jump in the hilbert brick order so we support partially filled clusters, to keep their bounding boxes tight).

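【The point cloud is sorted into Hilbert order and cut into clusters of about 256 points. Partially filled clusters are allowed so that bounding boxes stay tight.】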

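The clustering step can be sketched as "sort along a space-filling curve, then cut into runs of 256". This sketch swaps in Morton (Z-order) codes for the talk's Hilbert order because Morton keys are much simpler to compute. Hilbert order has better locality, since it has no long jumps between consecutive cells. Two further details of the talk are not modelled: Hilbert order is applied between 4^3 bricks with raster order inside them, and partially filled clusters are allowed. `part1by2`, `morton3` and `build_clusters` are our names.

```python
from itertools import product

def part1by2(v):
    """Spread the low 10 bits of v so that two zero bits separate consecutive bits."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v

def morton3(x, y, z):
    """Interleave x, y, z (each < 1024, enough for a 900^3 domain) into one Z-order key."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)

def build_clusters(points, cluster_size=256):
    ordered = sorted(points, key=lambda p: morton3(*p))
    clusters = []
    for start in range(0, len(ordered), cluster_size):
        chunk = ordered[start:start + cluster_size]
        lo = tuple(min(p[a] for p in chunk) for a in range(3))
        hi = tuple(max(p[a] for p in chunk) for a in range(3))
        clusters.append((lo, hi, chunk))  # bounding box + the points themselves
    return clusters

clusters = build_clusters(list(product(range(16), repeat=3)))
```

On a full 16^3 block every 256-point cluster comes out as a compact 8×8×4 box.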
Each cluster is tightly bounded in space, and we store for each a bounding box, normal bounds. then each point within the cluster is just one dword big, storing bitpacked pos,normal,roughness, and colour in a DXT1 texture. All of which is to say, we now have a point cloud cut into lumps of 256 points with a kind of VQ compression per point. We also compute completely independent cluster sets for each LOD – that is, we generate point clouds and their clusters for a ‘mip pyramid’ going from 900 voxels across, to 450, to 225, etc.

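【Each cluster is tightly bounded. A bounding box and normal bounds are stored per cluster, and each point is a single bit-packed dword, a kind of VQ compression. Completely independent cluster sets are built for each LOD of a ‘mip pyramid’ (900 voxels across, then 450, 225, and so on).】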

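The talk does not give the bit layout of the per-point dword, so the following Python sketch uses a hypothetical 32-bit layout to show the idea. It packs a 3 × 6-bit position relative to the cluster's bounding box, a 2 × 5-bit octahedral-encoded normal, and a 4-bit roughness. Colour is left out because the talk stores it in a DXT1 texture. All names and bit widths here are assumptions.

```python
import math

def oct_encode(n):
    """Octahedral mapping of a unit normal to (u, v) in [-1, 1]^2."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s
    if z < 0:  # fold the lower hemisphere over the diagonals
        u, v = (1 - abs(v)) * math.copysign(1, u), (1 - abs(u)) * math.copysign(1, v)
    return u, v

def oct_decode(u, v):
    z = 1 - abs(u) - abs(v)
    if z < 0:
        u, v = (1 - abs(v)) * math.copysign(1, u), (1 - abs(u)) * math.copysign(1, v)
    length = math.sqrt(u * u + v * v + z * z)
    return u / length, v / length, z / length

def quantize(x, bits):
    """Map x in [0, 1] to an integer in [0, 2^bits - 1]."""
    top = (1 << bits) - 1
    return min(top, max(0, round(x * top)))

def pack_point(pos01, normal, roughness):
    """Hypothetical layout: 3 x 6-bit position (relative to the cluster box),
    2 x 5-bit octahedral normal, 4-bit roughness = 32 bits."""
    px, py, pz = (quantize(c, 6) for c in pos01)
    u, v = oct_encode(normal)
    nu, nv = quantize(u * 0.5 + 0.5, 5), quantize(v * 0.5 + 0.5, 5)
    r = quantize(roughness, 4)
    return px | (py << 6) | (pz << 12) | (nu << 18) | (nv << 23) | (r << 28)

def unpack_point(word):
    pos = tuple(((word >> s) & 63) / 63 for s in (0, 6, 12))
    u = ((word >> 18) & 31) / 31 * 2 - 1
    v = ((word >> 23) & 31) / 31 * 2 - 1
    return pos, oct_decode(u, v), ((word >> 28) & 15) / 15

word = pack_point((0.5, 0.25, 1.0), (0.0, 0.0, 1.0), 0.6)
pos, normal, rough = unpack_point(word)
```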
I can’t find many good screenshots but here’s an example of the density, turned down by a factor of 2x to see what’s going on.

【An example of the point density (reduced 2x for visibility).】

my initial tests here were all PS/VS using the PS4 equivalent of glPoint. it wasn’t fast, but it showed the potential. I was using russian roulette to do ‘perfect’ stochastic LOD, targeting a 1 splat to 1 screen pixel rate, or just under.

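【The first tests drew points through PS/VS, which was not fast but was promising. Russian roulette (randomly dropping points) gave ‘perfect’ stochastic LOD, targeting one splat per screen pixel or just under.】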

At this point we embraced TAA *bigtime* and went with ‘stochastic all the things, all the time!’. Our current frame, before TAA, is essentially verging on white noise. It’s terrifying. But I digress!

【A key decision: heavy reliance on TAA (‘stochastic all the things’). Before TAA the frame is close to white noise.】

for rendering, we arranged the clusters for each model into a BVH. we also computed a separate point cloud, clustering and BVH for each mipmap (LOD) of the filtered SDF. to smooth the LOD transitions, we use russian roulette to adapt the number of points in each cluster from 256 smoothly down to 25%, i.e. 256 down to 64 points per cluster, then drop to the next LOD.

simon wrote some amazingly nicely balanced CS splatters that hierarchically culled and refined the precomputed clusters of points, computes bounds on the russian roulette rates, and then packs reduced cluster sets into groups of ~64 splats.

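【LOD in more detail: each LOD has its own point cloud, clustering and BVH. Russian roulette smoothly thins each cluster from 256 down to 64 points (25%) before switching to the next LOD.】

【Note: BVH = bounding volume hierarchy.】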

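The following sketches continuous LOD selection with Russian roulette. The assumptions are ours, not the talk's. Each coarser LOD holds 1/4 of the points of the previous one, because halving the resolution of a surface quarters its area in cells. The target point count is already known, for example about one splat per covered pixel. Each point carries a fixed random number, so lowering the keep fraction only ever removes points. `choose_lod` and `roulette` are hypothetical names.

```python
import math
import random

def choose_lod(target_points, full_points, num_lods=8):
    """Pick (lod, keep_fraction) so that full_points / 4**lod * keep_fraction ~ target_points.
    keep_fraction stays in [0.25, 1]: a 256-point cluster thins to 64 points,
    then the next LOD (with 4x fewer points) takes over at full density."""
    ratio = target_points / full_points
    if ratio >= 1.0:
        return 0, 1.0
    lod = min(num_lods - 1, int(math.floor(math.log(1.0 / ratio, 4))))
    keep = min(1.0, max(0.25, ratio * 4 ** lod))
    return lod, keep

def roulette(point_rands, keep):
    """Keep a point iff its fixed random number is below keep, so lowering keep
    only ever removes points (nothing pops back in)."""
    return [i for i, u in enumerate(point_rands) if u < keep]

rng = random.Random(1234)
cluster_rands = [rng.random() for _ in range(256)]  # one stable random value per point
```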
so in this screenshot the color cycling you can see is visualizing the steps through the different degrees of decimation, from <25%, <50%, <75%, then switching to a completely different power of 2 point cloud;

【Visualization of the different decimation levels.】

What you see is the ‘tight’ end of our spectrum. i.e. the point clouds are dense enough that you see sub pixel splats everywhere. The artist can also ‘turn down’ the density of points, at which point each point becomes a ‘seed’ for a traditional 2d textured quad splat. Giving you this sort of thing:

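【This is the ‘tight’ end of the spectrum, where the points are dense enough for sub-pixel splats everywhere. Artists can also turn the density down, and each point then becomes the seed of a traditional 2D textured quad splat.】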

We use pure stochastic transparency, that is, we just randomly discard pixels based on the alpha of the splat, and let TAA sort it out. It works great in static scenes.

However the traditional ‘bounding box in color space’ to find valid history pixels starts breaking down horribly with stochastic alpha, and we have yet to fully solve that.

So we are still in fairly noisy/ghosty place. TODO!

We started by rendering the larger strokes – we call them megasplats – as flat quads with the rasterizer. that’s what you see here, and in the E3 trailer.

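【Stochastic transparency: pixels are discarded at random according to the splat's alpha, and TAA resolves the result. This works well for static scenes, but TAA's colour-space bounding box for validating history breaks down, so the image is still noisy and ghosty.】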

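Here is stochastic transparency in miniature. Each frame a splat's pixel survives with probability equal to its alpha, and averaging over frames recovers alpha blending. In this Python sketch a plain running mean stands in for TAA. That is a big simplification: real TAA uses an exponential history with neighbourhood clamping, which is exactly the part the text says breaks down. `stochastic_alpha` is a hypothetical name.

```python
import random

def stochastic_alpha(splat_color, alpha, background, frames, seed=1):
    """Each frame, keep the splat's pixel with probability alpha, else discard it;
    a plain mean over frames stands in for TAA's accumulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(frames):
        total += splat_color if rng.random() < alpha else background
    return total / frames
```

Over many frames the mean approaches alpha * colour + (1 - alpha) * background, the same result as conventional alpha blending, without any sorting.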
Interestingly, simon tried making a pure CS ‘splatting shader’, that takes the large splats, and instead of rasterizing a quad, we actually precompute a ‘mini point cloud’ for the splat texture, and blast it to the screen using atomics, just like the main point cloud when it’s in ‘microsplat’ (tight) mode.

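【A pure compute-shader splatter: instead of rasterizing a quad, each large splat's texture is precomputed as a mini point cloud and written to the screen with atomics, like the main point cloud in microsplat mode.】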

So now we have a scene made up of a whole cloud of sculpts…

【At this point the whole scene is made of point clouds.】

which are point clouds,

 
 


 
 

and each point is itself, when it gets close enough to the camera, an (LOD adapted) ‘mini’ point cloud – Close up, these mini point clouds representing a single splat get ‘expanded’ to a few thousand points (conversely, In the distance or for ‘tight’ objects, the mini points clouds degenerate to single pixels).

Amusingly, the new CS based splatter beats the rasterizer due to not wasting time on all the alpha=0 pixels. That also means our ‘splats’ need not be planar any more, however, we don’t yet have an art pipe for non-planar splats so for now the artists don’t know this! Wooahaha!

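【Close to the camera, each point expands into an LOD-adapted mini point cloud of a few thousand points; far away it degenerates to a single pixel. The CS splatter beats the rasterizer because it wastes no time on alpha = 0 pixels, which also means splats no longer need to be planar.】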

That means that if I were to describe what the current engine is, I’d say it’s a cloud of clouds of point clouds. 🙂

【In short, the engine is a cloud of clouds of point clouds.】

Incidentally, this atomic based approach means you can do some pretty insane things to get DOF like effects: instead of post blurring, this was a quick test where we simply jittered the splats in a screenspace disc based on COC, and again let the TAA sort it all out.

It doesn’t quite look like blur, because it isn’t – its literally the objects exploding a little bit – but it’s cool and has none of the usual occlusion artefacts 🙂

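【The atomic-based approach also allows unusual tricks. For depth of field, splats are jittered within a screen-space disc sized by the circle of confusion, and TAA resolves the result. It is not really blur (the objects literally explode a little), but it has none of the usual occlusion artefacts.】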

We’ve left it in for now as our only DOF.

【For now this is the game's only depth-of-field effect.】

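The jittered-splat DOF can be sketched as follows. The circle-of-confusion model is an assumed thin-lens approximation: the radius is ∝ |z − f|/z and reaches `bokeh_px` at infinity. `coc_radius_px` and `jitter_splat` are hypothetical names. Taking the square root of the random radius makes the offsets uniform over the disc's area rather than clustered at its centre.

```python
import math
import random

def coc_radius_px(z, focus, bokeh_px):
    """Assumed thin-lens circle of confusion in pixels; bokeh_px is its size at infinity."""
    return bokeh_px * abs(z - focus) / z

def jitter_splat(x, y, z, focus, bokeh_px, rng):
    """Offset a splat to a uniformly random position inside its CoC disc; TAA averages frames."""
    r = coc_radius_px(z, focus, bokeh_px) * math.sqrt(rng.random())  # sqrt -> uniform by area
    theta = 2.0 * math.pi * rng.random()
    return x + r * math.cos(theta), y + r * math.sin(theta)

rng = random.Random(7)
samples = [jitter_splat(100.0, 50.0, 30.0, 10.0, 8.0, rng) for _ in range(1000)]
```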
I should at this point pause to give you a rough outline of the rendering pipe – it’s totally traditional and simple at the lighting end at least.

We start with 64 bit atomic min (== splat of single pixel point) for each point into 1080p buffer, using lots of subpixel jitter and stochastic alpha. There are a LOT of points to be atomic-min’d! (10s of millions per frame) Then convert that from z+id into traditional 1080 gbuffer, with normal, albedo, roughness, and z. then deferred light that as usual.

Then, hope that TAA can take all the noise away. 😉

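【An outline of the rendering pipeline, which is simple and traditional, at least at the lighting end.】

【Each point is splatted with a 64-bit atomic min into a 1080p buffer, with sub-pixel jitter and stochastic alpha. The z+id result is then converted into a traditional G-buffer (normal, albedo, roughness, z) and deferred lighting is applied. TAA is left to remove the noise.】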

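The 64-bit atomic-min visibility buffer works because the packed key sorts by depth first. Below is a minimal Python emulation. It assumes non-negative depths where smaller means nearer (a reversed-Z buffer would need the comparison flipped), with depth in the high 32 bits and the point id in the low 32 bits. `pack_depth_id`, `splat` and `resolve` are hypothetical names.

```python
import struct

EMPTY = 0xFFFFFFFFFFFFFFFF  # cleared value: farther than any real key

def pack_depth_id(depth, point_id):
    """Depth bits in the high 32 bits, point id in the low 32 bits. For non-negative
    floats, IEEE-754 bit patterns order like unsigned ints, so the smallest key
    belongs to the nearest point."""
    depth_bits = struct.unpack("<I", struct.pack("<f", depth))[0]
    return (depth_bits << 32) | (point_id & 0xFFFFFFFF)

def splat(buffer, pixel, depth, point_id):
    """Emulates a 64-bit atomic min into the visibility buffer (a dict here)."""
    key = pack_depth_id(depth, point_id)
    if key < buffer.get(pixel, EMPTY):
        buffer[pixel] = key

def resolve(key):
    """Unpack the winning (depth, id); a later pass would fetch normal, albedo, roughness by id."""
    depth = struct.unpack("<f", struct.pack("<I", key >> 32))[0]
    return depth, key & 0xFFFFFFFF

vis, reordered = {}, {}
for depth, point_id in [(5.0, 1), (2.5, 2), (7.0, 3)]:
    splat(vis, (0, 0), depth, point_id)
for depth, point_id in [(7.0, 3), (2.5, 2), (5.0, 1)]:
    splat(reordered, (0, 0), depth, point_id)
```

Because min is order-independent, the result does not depend on the order in which threads arrive. That is what lets tens of millions of points be splatted without sorting.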
I’m not going to go into loads of detail about this, since I don’t have time, but actually for now the lighting is pretty vanilla – deferred shading, cascaded shadow map sun.

there are a couple of things worth touching on though.

【No time for much detail on lighting here.】

【Shown: the ‘vanilla’ lighting, i.e. deferred shading plus a cascaded shadow map for the sun.】

ISMs: Now we are in loads-of-points land, we did the obvious thing and moved to imperfect shadow maps. We have 4 (3?) cascades for a hero sun light, that we atomicsplat into and then sample pretty traditionally (however, we let the TAA sort out a LOT of the noise since we undersample and undersplat and generally do things quite poorly)

【Shadows moved to imperfect shadow maps (ISMs), with the sun's cascades splatted with atomics.】

We have a budget of 64 small (128×128) shadowmaps, which we distribute over the local lights in the scene, most of which the artists are tuning as spotlights. They are brute force splatted and sampled; here was simon’s first test, varying their distribution over an area light:

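【Local-light shadows: a budget of 64 small 128×128 shadow maps distributed over the local lights, brute-force splatted and sampled.】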

these images were from our first test of using 64 small ISM lights, inspired by the original ISM paper and the ‘ManyLODs’ paper. the 3 images show spreading a number of low quality lights out in an area above the object.

【The two papers that inspired these tests:】

Imperfect Shadow Maps for Efficient Computation of Indirect Illumination

T. Ritschel, T. Grosch, M. H. Kim, H.-P. Seidel, C. Dachsbacher, J. Kautz

http://resources.mpi-inf.mpg.de/ImperfectShadowMaps/ISM.pdf

 
 

ManyLoDs http://perso.telecom-paristech.fr/~boubek/papers/ManyLoDs/

Parallel Many-View Level-of-Detail Selection for Real-Time Global Illumination

Matthias Holländer, Tobias Ritschel, Elmar Eisemann and Tamy Boubekeur

 
 


 
 

I threw in solid-angle esque equi-angular sampling of participating media for the small local lights. See https://www.shadertoy.com/view/Xdf3zB for example implementation. Just at 1080p with no culling and no speedups, just let TAA merge it. this one will DEFINITELY need some bilateral blur and be put into a separate layer, but for now it’s not:

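【Participating media for the small local lights uses equi-angular sampling at full 1080p, with no culling, merged by TAA.】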

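Equi-angular sampling places samples along the view ray uniformly in angle as seen from a point light, which concentrates them near the light. The sketch below follows the standard formulation from the literature (commonly attributed to Kulla and Fajardo), not code from the talk or the linked shadertoy. `equiangular_sample` is a hypothetical name, and the ray direction is assumed to be unit length.

```python
import math

def equiangular_sample(origin, direction, light, t_min, t_max, u):
    """Equi-angular sampling of a point on the ray segment [t_min, t_max] for a point light.
    Returns (t, pdf), with pdf per unit distance along the ray."""
    delta = sum((l - o) * d for l, o, d in zip(light, origin, direction))  # closest approach
    closest = [o + delta * d for o, d in zip(origin, direction)]
    dist = max(math.dist(light, closest), 1e-6)  # light-to-ray distance, guarded against 0
    theta_a = math.atan2(t_min - delta, dist)
    theta_b = math.atan2(t_max - delta, dist)
    t = delta + dist * math.tan(theta_a + u * (theta_b - theta_a))
    pdf = dist / ((theta_b - theta_a) * (dist * dist + (t - delta) ** 2))
    return t, pdf

ray = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 5.0), 0.0, 10.0)  # origin, dir, light, t range
t, pdf = equiangular_sample(*ray, 0.5)
```

Averaging 1/pdf over uniformly spaced u recovers the segment length (here 10). This is a quick way to check that the pdf is consistent with the mapping from u to t.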
(just a visualisation: classic paraboloid projection on the ISMs)

sorry for the quick programmer art, DEADLINES!

【ISM results.】

this ‘vanilla’ approach to lighting worked surprisingly well for both the ‘tight’ end… (single pixel splats, which we call microsplats)… as well as

【The ‘vanilla’ lighting works equally well for microsplats (the tight end) and megasplats (the loose end).】

…the loose end (‘megasplats’).

 
 


 
 

this was the first time I got specular in the game! two layers of loose splats, the inner layer is tinted red to make it look like traditional oil underpainting. then the specular highlights from the environment map give a real sense of painterly look. this was the first image I made where I was like ‘ooooh maybe this isn’t going to fail!’

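【The first time specular appeared in the game: two layers of loose splats, the inner one tinted red like a traditional oil underpainting. Environment-map specular highlights on top give a real painterly feel.】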

At this point you’ll notice we have painterly sky boxes. I wanted to do all the environment lighting from this. I tried to resurrect my previous LPV tests, then I tried ‘traditional’ Kaplanyan style SH stuff, but it was all too muddy and didn’t give me contact shadows nor did it give me ‘dark under the desk’ type occlusion range.

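【The aim was to drive all environment lighting from the painterly sky boxes. However, LPV and Kaplanyan-style SH were too muddy and gave neither contact shadows nor ‘dark under the desk’ occlusion.】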

For a while we ran with SSAO only, which got us to here (point clouds give you opportunities to do ridiculous geometrical detail, lol)

【For a while, SSAO was the only ambient occlusion.】

the SSAO we started with was based on Morgan McGuire’s awesome alchemy spiral style SSAO, but then I tried just picking a random ray direction from the cosine weighted hemisphere above each point and tracing the z buffer, one ray per pixel (and let the TAA sort it out ;)) and that gave us more believable occlusion, less like dirt in the creases.

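【The SSAO started from Morgan McGuire's Alchemy spiral SSAO. Switching to one random cosine-weighted ray per pixel, traced against the z-buffer with TAA cleaning up, gave more believable occlusion that looks less like dirt in the creases.】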

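A random ray from the cosine-weighted hemisphere can be picked by sampling the unit disc uniformly and projecting up onto the hemisphere (Malley's method), then rotating into a tangent frame around the normal. Below is a Python sketch; the z-buffer trace itself is not shown. `orthonormal_basis` and `cosine_hemisphere` are hypothetical names. The frame construction follows Duff et al., "Building an Orthonormal Basis, Revisited".

```python
import math
import random

def orthonormal_basis(n):
    """Tangent frame around unit normal n (branchless construction of Duff et al. 2017)."""
    x, y, z = n
    sign = math.copysign(1.0, z)
    a = -1.0 / (sign + z)
    b = x * y * a
    t1 = (1.0 + sign * x * x * a, sign * b, -sign * x)
    t2 = (b, sign + y * y * a, -y)
    return t1, t2

def cosine_hemisphere(n, u1, u2):
    """Direction above the surface with pdf cos(theta) / pi: sample the unit disc
    uniformly, then lift the point onto the hemisphere."""
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    lx, ly, lz = r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1))
    t1, t2 = orthonormal_basis(n)
    return tuple(lx * a + ly * b + lz * c for a, b, c in zip(t1, t2, n))

rng = random.Random(3)
normal = (0.0, 0.6, 0.8)
dirs = [cosine_hemisphere(normal, rng.random(), rng.random()) for _ in range(20000)]
```

For this distribution the expected cos θ is 2/3, which makes a handy sanity check.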
From there it was a trivially small step to output either black (occluded) or sky colour (from envmap) and then do a 4×4 stratified dither. here it is without TAA (above).

However this is still just SSAO in the sense that the only occluder is the z buffer.

【SSAO without TAA】

 
 

(random perf stat of the atomic_min splatter: this scene shows 28.2M point splats, which takes 4.38ms, so that’s about 6.4 billion single pixel splats per second)

【Performance: 28.2M point splats in 4.38 ms.】

For longer range, I tried voxelizing the scene – since we have point clouds, it was fairly easy to generate a work list with LOD adapted to 4 world cascades, and atomic OR each voxel – (visualised here, you can see the world space slices in the overlay) into a 1 bit per voxel dense cascaded volume texture

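【For longer range, the scene is voxelized. Because everything is point clouds, it is easy to build a work list LOD-adapted to 4 world-space cascades and atomically OR each point into a dense, 1-bit-per-voxel cascaded volume texture.】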

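A CPU sketch of the 1-bit cascaded voxelization, under these assumptions: cascades are centred on a given point and each doubles the extent of the previous one. Every point is marked in every cascade (the talk instead builds an LOD-adapted work list per cascade). A Python `bytearray` stands in for the GPU buffer that is updated with atomic OR. `BitVolume` and `make_cascades` are hypothetical names. The 64^3 resolution matches the cascade size the text quotes later.

```python
import math

class BitVolume:
    """res^3 voxels, 1 bit each, covering a cube of side res * voxel_size from origin."""
    def __init__(self, res, origin, voxel_size):
        self.res, self.origin, self.voxel_size = res, origin, voxel_size
        self.bits = bytearray(res ** 3 // 8)

    def _index(self, p):
        ijk = [math.floor((c - o) / self.voxel_size) for c, o in zip(p, self.origin)]
        if any(i < 0 or i >= self.res for i in ijk):
            return None  # outside this cascade
        return (ijk[2] * self.res + ijk[1]) * self.res + ijk[0]

    def mark(self, p):
        """Stands in for an atomic OR of one bit into a uint32 buffer on the GPU."""
        i = self._index(p)
        if i is not None:
            self.bits[i >> 3] |= 1 << (i & 7)

    def occupied(self, p):
        i = self._index(p)
        return i is not None and bool(self.bits[i >> 3] & (1 << (i & 7)))

def make_cascades(center, finest_extent, count=4, res=64):
    """Cascades centred on one point, each twice the extent (and voxel size) of the last."""
    cascades = []
    for c in range(count):
        extent = finest_extent * 2 ** c
        origin = tuple(x - extent / 2 for x in center)
        cascades.append(BitVolume(res, origin, extent / res))
    return cascades

cascades = make_cascades((0.0, 0.0, 0.0), 16.0)
for point in [(1.0, 2.0, 3.0), (-50.0, 0.0, 0.0)]:
    for cascade in cascades:
        cascade.mark(point)
```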
then we hacked the AO shader to start with the z buffer, and then switch to the binary voxelization, moving through coarser and coarser cascades. it’s cone-tracing like, in that I force it to drop to lower cascades (and larger steps), but all the fuzziness is from stochastic sampling rather than prefiltered mip maps. The effect is great for mid range AO – on in the left half, off in the right.

 
 

That gets us to more or less where we are today, rough and noisy as hell but extremely simple. I really like the fact you get relatively well defined directional occlusion, which LPV just can’t give you due to excessive diffusion.

 
 

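【AO details: trace the z-buffer first, then continue through the binary voxel cascades with progressively larger steps. The softness comes from stochastic sampling rather than prefiltered mip maps, and the result has well-defined directional occlusion that LPV cannot give.】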

(at this point we’re in WIP land! like, 2015 time!)

The last test, was to try adding a low resolution world space cascade that is RGB emissive, and then gather light as the sky occlusion rays are marched. The variance is INSANELY high, so it isn’t usable, and this screenshot is WITH taa doing some temporal averaging! But it looks pretty cool. It might be enough for bounce light (rather than direct light, as above), or for extremely large area sources. I don’t know yet. I’m day dreaming about maybe making the emissive volume lower frequency (-> lower variance when gathered with such few samples) by smearing it around with LPV, or at least blurring it. but I haven’t had a chance to investigate.

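【An experiment: gather light from a low-resolution, world-space RGB emissive cascade while marching the sky-occlusion rays. Even with TAA the variance is far too high to use as is.】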

Oh wait I have! I just tried bilateral filtering and stratified sampling over 8×8 blocks, it does help a lot.

I think the general principle of z buffer for close, simple bitmask voxelization for further range gather occlusion is so simple that it’s worth a try in almost any engine. Our voxel cascades are IIRC 64^3, and the smallest cascade covers most of the scene, so they’re sort of mine-craft sized voxels or just smaller at the finest scale. (then blockier further out, for the coarser cascades). But the screenspace part captures occlusion nicely for smaller than voxel distances.

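【Bilateral filtering plus stratified sampling over 8×8 blocks helps a lot. The general principle is to use the z-buffer for close range and a simple bitmask voxelization for longer-range occlusion.】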

another bilateral test pic. WIP 😉

 
 


 
 

and that’s pretty much where we are today!

as a palette cleanser, here’s some non-testbed, non-programmer art

 
 


 
 


 
 


 
 

It feels like we’re still in the middle of it all; we still have active areas of R&D; and as you can see, many avenues didn’t pan out for this particular game. But I hope that you’ve found this journey to be inspiring in some small way. Go forth and render things in odd ways!

【The work is still in progress, with active areas of R&D.】

The artwork in this presentation is all the work of the brilliant art team at MediaMolecule. Kareem, Jon (E & B!), Francis, Radek to name the most prominent authors of the images in this deck. But thanks all of MM too! Dreams is the product of at least 25 fevered minds at this point.

And of course @sjb3d and @antonalog who did most of the engine implementation, especially of the bits that actually weren’t thrown away 🙂

Any errors or omissions are entirely my own, with apologies.

if you have questions that fit in 140 chars I’ll do my best to answer at @mmalex.


SIGGRAPH 15 – Physically Based and Unified Volumetric Rendering in Frostbite

Author:

Sebastien Hillaire – Electronic Arts / frostbite

sebastien.hillaire@frostbite.com

https://twitter.com/SebHillaire

 
 

 
 

  • introduction

 
 

Physically based rendering in Frostbite


Physically based rendering in Frostbite already gives very good results!

 
 

Volumetric rendering in Frostbite was limited

  • Global distance/height fog
  • Screen space light shafts
  • Particles

Volumetric rendering in Frostbite was limited to these three techniques.


 
 

 
 

Real-life volumetrics


The goal is to reproduce phenomena like these from nature: clouds and atmosphere, fog, light scattering, and so on.

 
 

 
 

  • Related Work

 
 

Billboards

 
 

Analytic fog [Wenzel07]

Analytic light scattering [Miles]

Characteristics: fast, not shadowed, only homogeneous media

http://blog.mmacklin.com/2010/05/29/in-scattering-demo/

http://research.microsoft.com/en-us/um/people/johnsny/papers/fogshop-pg.pdf

http://indus3.org/atmospheric-effects-in-games/

 
 


 
 

Screen space light shafts

  • Post process [Mitchell07]
  • Epipolar sampling [Engelhardt10]

Characteristics

  • High quality
  • Sun/sky needs to be visible on screen
  • Only homogeneous media
  • Can go for Epipolar sampling but this won’t save the day

 
 


 
 

Splatting

  • Light volumes
    • [Valliant14][Glatzel14][Hillaire14]
  • Emissive volumes [Lagarde13]

This can result in high quality scattering but usually it does not match the participating media of the scene. (The approach is already common, but these volumes are handled independently of the rest of the scene.)


 
 


 
 

 
 

Volumetric fog [Wronski14]

  • Sun and local lights
  • Heterogeneous media

allowing spatially varying participating media and local lights to scatter.

This is the same direction as the approach presented here.

However it did not seem really physically based at the time and some features we wanted were missing.

Drawback: it was not really physically based.

 
 


 
 

 
 

  • Scope and motivation

 
 

Increase visual quality and give more freedom to art direction!

 
 

Physically based volumetric rendering

  • Meaningful material parameters
  • Decouple material from lighting
  • Coherent results

We want it to be physically based: this means that participating media materials are decoupled from the light sources (e.g. no scattering colour on the light entities). Media parameters are also a meaningful set of parameters. With this we should get more coherent results that are easier to control and understand.

 
 

Unified volumetric interactions

  • Lighting + regular and volumetric shadows
  • Interaction with opaque, transparent and particles

Also, because there are several entities interacting with volumetrics in Frostbite (fog, particles, opaque & transparent surfaces, etc.), we want to unify the way we deal with them, so as not to have X methods for X types of interaction.

 
 


 
 

This video gives you an overview of what we got from this work: lights that generate scattering according to the participating media, volumetric shadow, local fog volumes, etc.

And I will show you now how we achieve it.

(Results first; see the video in the slides.)


 
 

 
 

 
 

  • Volumetric rendering

 
 

  • Single Scattering

 
 

As of today we restrict ourselves to single scattering when rendering volumetrics. This is already challenging to get right.

 
 

When light interacts with a surface, it is possible to evaluate the amount of light bounced to the camera by evaluating, for example, a BRDF. But in the presence of participating media, things get more complex.

 
 

  1. You have to take into account transmittance when the light is traveling through the media
  2. Then you need to integrate the scattered light along the view ray by taking many samples
  3. For each of these samples, you also need to take into account transmittance to the view point
  4. You also need to integrate the scattered light at each position
  5. And take into account the phase function, the regular shadow map (opaque objects) and the volumetric shadow map (participating media and other volumetric entities)

 
 

 
 


 
 


 
 

The two integrals in the equation are the scattering integrations described in steps 2 and 4 above.

The sum is over the sampled lights.

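For reference, single scattering is commonly written as below. The symbols are ours and may not match the slide exactly. x is the view point, x_s the first surface hit along the view direction ω, and x_t = x + tω a point on the view ray. σ_s and σ_t are the scattering and extinction coefficients, p is the phase function, and Vis combines the regular shadow map with the volumetric shadow (transmittance toward the light).

```latex
L_i(x,\omega) = T_r(x, x_s)\,L_s(x_s,-\omega)
  + \int_{0}^{\lVert x_s - x \rVert} T_r(x, x_t)\,\sigma_s(x_t)\,L_{scat}(x_t,-\omega)\,dt

T_r(x_a, x_b) = \exp\left(-\int_{x_a}^{x_b} \sigma_t(s)\,ds\right)

L_{scat}(x,\omega) = \sum_{c=1}^{N_{lights}} p(\omega, l_c)\,Vis(x, l_c)\,L_c(x, l_c),
\qquad Vis(x, l_c) = \mathrm{shadowMap}(x, l_c)\,\mathrm{volShadow}(x, l_c)
```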
 
 

  • Clip Space Volumes

 
 

Frustum aligned 3D textures [Wronski14]

  • Frustum voxel in world space => Froxel :)

As in Wronski, all our volumes are 3D textures that are clip space aligned (such voxels become Froxels in world space; credit Alex Evans and Sony ATG :), see “Learning from Failure: a Survey of Promising, Unconventional and Mostly Abandoned Renderers for ‘Dreams PS4’, a Geometrically Dense, Painterly UGC Game”, Advances in Real-Time Rendering course, SIGGRAPH 2015).

 
 

Note: Frostbite uses tile-based deferred lighting

  • 16×16 tiles with culled light lists

 
 

Align volume tiles on light tiles

  • Reuse per tile culled light list
  • Volume tiles can be smaller (8×8, 4×4, etc.)
  • Careful correction for resolution integer division

 
 

This volume is also aligned with our screen light tiles. This is because we are reusing the forward light tile list culling result to accelerate the scattered light evaluation (remember, Frostbite is a tile based deferred lighting engine).

 
 

Our volume tiles in screen space can be smaller than the light tiles (which are 16×16 pixels).

 
 

By default we use

Depth resolution of 64

8×8 volume tiles

 
 

720p requires 160x90x64 (~7mb per rgbaF16 texture)

1080p requires 240x135x64 (~15mb per rgbaF16 texture)

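The volume sizes quoted above follow from simple arithmetic, sketched here in Python (the helper names are ours). RGBA16F is 4 channels × 2 bytes = 8 bytes per texel. `light_tile_for` shows one plausible reading of "careful correction for resolution integer division". It maps an 8×8 volume tile to the 16×16 light tile whose culled list it reuses, which matters when the resolution is not a multiple of 16 (1080 / 16 = 67.5).

```python
import math

def froxel_volume(width, height, tile=8, depth_slices=64, bytes_per_texel=8):
    """Froxel grid dimensions and the size in bytes of one RGBA16F volume texture."""
    dims = (math.ceil(width / tile), math.ceil(height / tile), depth_slices)
    return dims, dims[0] * dims[1] * dims[2] * bytes_per_texel

def light_tile_for(volume_tile, tile=8, light_tile=16):
    """Index of the 16x16 light tile whose culled light list an 8x8 volume tile reuses."""
    return (volume_tile * tile) // light_tile
```

This gives 7,372,800 bytes (about 7.0 MiB) at 720p and 16,588,800 bytes (about 15.8 MiB) at 1080p. At 1080p the last row of volume tiles (row 134) falls into the partial 68th row of light tiles (index 67).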
 
 


 
 

 
 

  • Data flow

 
 


 
 

This is an overview of our data flow.

We are using clip space volumes to store the data at different stages of our pipeline.

 
 

We have material properties which are first voxelised from participating media entities.

 
 

Then, using the light sources of our scene and this material property volume, we can generate scattered light data per froxel. This data can be temporally upsampled to increase the quality. Finally, an integration step prepares the data for rendering.

 
 

  1. Participating media material definition (the first part of the diagram)

 
 

Follow the theory [PBR]

  • Absorption 𝝈𝒂 (m^-1)

Absorption describes the amount of light absorbed by the medium over a certain path length.

  • Scattering 𝝈𝒔 (m^-1)

Scattering describes the amount of light scattered over a certain path length.

  • Phase 𝒈

A single-lobe phase function describes how light bounces off particles (uniformly, forward scattering, etc.). It is based on Henyey-Greenstein (the Schlick approximation can also be used).

  • Emissive 𝝈𝒆 (irradiance·m^-1)

Emissive describes emitted light.

  • Extinction 𝝈𝒕 = 𝝈𝒔 + 𝝈𝒂
  • Albedo 𝛒 = 𝝈𝒔 / 𝝈𝒕
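The sketch below shows the single-lobe Henyey-Greenstein phase function and Schlick's cheaper approximation of it (k = 1.55g − 0.55g³). It also includes a numerical check that both integrate to 1 over the sphere. The angle convention is an assumption: cos_theta is taken between the light's propagation direction and the scattered direction, so g > 0 means forward scattering.

```python
import math


def henyey_greenstein(cos_theta, g):
    """HG phase function; g in (-1, 1), g > 0 favours forward scattering."""
    denom = 1.0 + g * g - 2.0 * g * cos_theta
    return (1.0 - g * g) / (4.0 * math.pi * denom ** 1.5)


def schlick_phase(cos_theta, g):
    """Schlick's approximation of HG: avoids the power 1.5 in the denominator."""
    k = 1.55 * g - 0.55 * g ** 3
    denom = 1.0 - k * cos_theta
    return (1.0 - k * k) / (4.0 * math.pi * denom * denom)


def integrate_over_sphere(phase, g, n=20000):
    """Midpoint rule for 2*pi * integral_{-1}^{1} p(mu) dmu (should be 1)."""
    step = 2.0 / n
    total = sum(phase(-1.0 + (i + 0.5) * step, g) for i in range(n))
    return 2.0 * math.pi * total * step


if __name__ == "__main__":
    for g in (0.0, 0.5, 0.9):
        print(g, henyey_greenstein(1.0, g), schlick_phase(1.0, g))
```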

 
 

Artists can author {absorption, scattering} or {albedo, extinction}

  • Train your artists! Important for them to understand their meaning!

As with every physically based component, it is very important for artists to understand them so take the time to educate them.

(Artists need to understand the underlying physics!)
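The two authoring pairs mentioned above, {absorption, scattering} and {albedo, extinction}, carry the same information. Below is a minimal conversion sketch using the definitions 𝝈𝒕 = 𝝈𝒔 + 𝝈𝒂 and 𝛒 = 𝝈𝒔 / 𝝈𝒕. It is scalar for brevity; in practice it would apply per RGB channel. The `Medium` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Medium:
    absorption: float  # sigma_a, m^-1
    scattering: float  # sigma_s, m^-1

    @property
    def extinction(self) -> float:
        """sigma_t = sigma_s + sigma_a"""
        return self.absorption + self.scattering

    @property
    def albedo(self) -> float:
        """rho = sigma_s / sigma_t (0 for an empty medium)"""
        t = self.extinction
        return self.scattering / t if t > 0.0 else 0.0


def from_albedo_extinction(albedo: float, extinction: float) -> Medium:
    """Convert the {albedo, extinction} authoring pair to {absorption, scattering}."""
    if not 0.0 <= albedo <= 1.0:
        raise ValueError("albedo must be in [0, 1]")
    if extinction < 0.0:
        raise ValueError("extinction must be non-negative")
    scattering = albedo * extinction
    return Medium(absorption=extinction - scattering, scattering=scattering)


if __name__ == "__main__":
    m = from_albedo_extinction(0.8, 0.5)
    print(m, m.extinction, m.albedo)
```

Authoring {albedo, extinction} separates "how dense" from "what colour", which is often easier for artists. Both pairs describe the same medium.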


Participating media (PM) sources

  • Depth fog
  • Height fog
  • Local fog volumes
    • With or without density textures

 
 

Depth/height fog and local fog volumes are entities that can be voxelized. Here you can see local fog volumes that are either plain or have varying density according to a density texture.

 
 

The following explains the data structures and storage.

 
 

Voxelize PM properties into V-Buffer

  • Add Scattering, Emissive and
    Extinction
  • Average Phase g (no multi lobe)
  • Wavelength independent 𝝈𝒕 (for now)

 
 

We voxelize them into a V-Buffer, analogous to the screen-space G-Buffer but in volume (clip space). We basically add all the material parameters together since they are linear, except the phase function, which is averaged. We also only consider a single lobe for now, following the HG phase function.

 
 

We have deliberately chosen wavelength-independent extinction to have cheaper volumes (material, lighting, shadows). It would be very easy to extend this if necessary at some point.

 
 

Supporting emissive lets artists place local fog volumes that emit light as scattering would, without matching any local light. This can be used for cheap ambient lighting. (Emissive is optional.)
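The voxelization rule above can be sketched as follows. The linear quantities (RGB scattering, scalar extinction, RGB emissive) of every entity overlapping a froxel are summed, and the phase g is averaged. The result is packed into the two RGBA16F texels of the V-Buffer layout. The `FogSample` type and the unweighted average of g are assumptions; the notes only say that g is averaged.

```python
from dataclasses import dataclass

RGB = tuple[float, float, float]


@dataclass
class FogSample:
    """Material of one participating-media entity evaluated at a froxel centre."""
    scattering: RGB      # sigma_s per channel, m^-1
    extinction: float    # sigma_t, wavelength independent, m^-1
    emissive: RGB
    phase_g: float       # HG anisotropy of this entity


def voxelize_froxel(samples):
    """Combine all entities overlapping one froxel into two V-Buffer texels.

    texel0 = (scattering R, G, B, extinction), texel1 = (emissive R, G, B, g).
    Linear terms are summed; g is averaged (unweighted here, an assumption).
    """
    if not samples:
        return (0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)
    scattering = [sum(s.scattering[c] for s in samples) for c in range(3)]
    extinction = sum(s.extinction for s in samples)
    emissive = [sum(s.emissive[c] for s in samples) for c in range(3)]
    g = sum(s.phase_g for s in samples) / len(samples)
    texel0 = (scattering[0], scattering[1], scattering[2], extinction)
    texel1 = (emissive[0], emissive[1], emissive[2], g)
    return texel0, texel1


if __name__ == "__main__":
    fog = FogSample((0.1, 0.2, 0.3), 0.5, (0.0, 0.0, 0.0), 0.2)
    glow = FogSample((0.1, 0.1, 0.1), 0.25, (1.0, 0.0, 0.0), 0.6)
    print(voxelize_froxel([fog, glow]))
```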


V-Buffer (per-froxel data)

| R | G | B | A | Format |
|---|---|---|---|---|
| Scattering R | Scattering G | Scattering B | Extinction | RGBA16F |
| Emissive R | Emissive G | Emissive B | Phase (g) | RGBA16F |


  2.1 Froxel integration (the second part of the diagram)

 
 

Per froxel

  • Sample PM properties data
  • Evaluate
    • Scattered light 𝑳𝒔𝒄𝒂𝒕(𝒙𝒕,𝝎𝒐)
    • Extinction

 
 

For each froxel, one thread will be in charge of gathering scattered light and extinction.

 
 

Extinction is simply copied over from the material. You will see later why this matters for visual quality in the final stage: we use extinction instead of transmittance for energy-conservative scattering. Extinction is also linear, so it is better to temporally integrate it than the non-linear transmittance value. (Storing linear extinction is sufficient.)
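Here is a small numerical illustration of why the linear quantity is the one to average. Along a ray, optical depth is linear in extinction, so averaging extinction samples from inside a froxel and exponentiating once recovers the segment's transmittance. Averaging per-sample transmittances instead overestimates it, because exp is convex. The two-sample setup below is a made-up example, not data from the talk.

```python
import math


def transmittance_from_mean_extinction(extinction_samples, depth):
    """Average the (linear) extinction samples, then convert once."""
    mean_extinction = sum(extinction_samples) / len(extinction_samples)
    return math.exp(-mean_extinction * depth)


def mean_of_transmittances(extinction_samples, depth):
    """Convert each sample to transmittance first, then average (non-linear)."""
    return sum(math.exp(-s * depth) for s in extinction_samples) / len(extinction_samples)


if __name__ == "__main__":
    # A 1 m froxel: thin fog (0.1 m^-1) in its near half, dense fog (4 m^-1) in
    # its far half; two jittered frames happen to land one sample in each half.
    samples = [0.1, 4.0]
    exact = math.exp(-(0.1 * 0.5 + 4.0 * 0.5))
    print(exact,
          transmittance_from_mean_extinction(samples, 1.0),
          mean_of_transmittances(samples, 1.0))
```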

 
 

Scattered light:

  • 1 sample per froxel
  • Integrate all light sources: indirect light + sun + local lights


Sun/Ambient/Emissive

 
 

Indirect light on local fog volume

  • From Frostbite diffuse SH light probe
    • 1 probe at volume centre
    • Integrate w.r.t. phase function as a SH cosine lobe [Wronski14]

 
 

Then we integrate the scattered light. One sample per froxel.

 
 

We first integrate ambient the same way as Wronski. Frostbite allows us to sample diffuse SH light probes. We use one per local fog volume positioned at their centre.

 
 

We also integrate the sun light according to our cascaded shadow maps. We could use exponential shadow maps, but we do not, as our temporal upsampling is enough to soften the result.

 
 

You can easily notice the heterogeneous nature of the local fog shown here.


Local lights

  • Reuse tiled-lighting code
  • Use forward tile light list post-culling
  • No scattering? skip local lights

 
 

We also integrate local lights, and we reuse the tile culling result to only take into account the lights visible within each tile.

One good optimisation is to skip it all if you do not have any scattering possible according to your material properties.
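The per-froxel light loop described above is sketched below. There is one sample per froxel. It loops over the tile's culled light list, weights each light by the phase function and its shadow visibility, and returns early when the material cannot scatter. The `PointLight` type, the inverse-square falloff and the `visibility` callback (standing in for regular/volumetric shadow-map lookups) are hypothetical simplifications, not Frostbite's code.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def henyey_greenstein(cos_theta: float, g: float) -> float:
    denom = 1.0 + g * g - 2.0 * g * cos_theta
    return (1.0 - g * g) / (4.0 * math.pi * denom ** 1.5)


@dataclass
class PointLight:
    position: Vec3
    intensity: float  # hypothetical: radiant intensity with inverse-square falloff


def froxel_in_scattering(pos, view_dir, sigma_s, g, tile_lights,
                         visibility=lambda light, p: 1.0):
    """Scattered light of one froxel: sigma_s * sum(phase * shadow * light).

    view_dir: unit vector from the camera towards the froxel.
    tile_lights: lights that survived culling for this froxel's light tile.
    """
    if sigma_s <= 0.0:
        return 0.0  # no scattering possible: skip all local lights
    total = 0.0
    for light in tile_lights:
        to_light = sub(light.position, pos)
        dist2 = dot(to_light, to_light)
        if dist2 == 0.0:
            continue
        inv_dist = 1.0 / math.sqrt(dist2)
        l = (to_light[0] * inv_dist, to_light[1] * inv_dist, to_light[2] * inv_dist)
        # Light travels along -l and leaves towards the camera along -view_dir,
        # so the scattering-angle cosine is dot(-l, -view_dir) = dot(l, view_dir).
        cos_theta = dot(l, view_dir)
        total += (henyey_greenstein(cos_theta, g) * visibility(light, pos)
                  * light.intensity / dist2)
    return sigma_s * total
```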

 
 

Shadows

  • Regular shadow maps
  • Volumetric shadow maps

 
 

Each of these lights can also sample their associated shadow maps. We support regular shadow maps and also volumetric shadow maps (described later).


  2.2 Temporal volumetric integration (also the second part of the diagram)

 
 

Problem:

 
 

One scattering/extinction sample per froxel per frame

  • Undersampling with very strong material
  • Aliasing under camera motion
  • Shadows make it worse

 
 

As I said, we are only using a single sample per froxel.

 
 

Aliasing (see the two videos in the slides; the aliasing is very obvious)

This can unfortunately result in very strong aliasing for very thick participating media and when integrating the local light contribution.


You can also notice it in the video, as well as very strong aliasing of the shadow coming from the tree.


Solution: temporal integration

To mitigate these issues, we temporally integrate our frame result with that of the previous frame (a well-known technique, also used by Karis last year for TAA).

 
 

To achieve this:

  • We jitter our samples per frame uniformly along the view ray
  • The material and scattered light samples are jittered using the same offset (to soften the evaluated material and scattered light)
  • We integrate each frame according to an exponential moving average
  • We ignore the previous result when no history sample is available (out of the previous frustum)

 
 

Jittered samples (Halton)

  • Same offset for all samples along view ray
  • Jitter scattering AND material samples in sync

Re-project previous scattering/extinction

  • 5% blend of current with previous
  • Exponential moving average [Karis14]
  • Out of frustum: skip history
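The jitter-and-blend steps listed above can be sketched as follows. A base-2 Halton sequence gives one depth offset per frame, shared by every froxel and by both the material and scattering samples. The exponential moving average keeps 95% history and falls back to the current value when the reprojected history is unavailable. The 8-frame sequence length is an assumption, and the reprojection itself (which needs the previous view-projection matrix) is left out.

```python
def halton(index: int, base: int) -> float:
    """Radical inverse of `index` in `base`: 1 -> 1/2, 2 -> 1/4, 3 -> 3/4 for base 2."""
    result, f, i = 0.0, 1.0, index
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result


def frame_jitter(frame: int, sequence_length: int = 8) -> float:
    """Offset in [0, 1) within each depth slice, the same for the whole volume
    and for both material and scattering samples (sequence length hypothetical)."""
    return halton(frame % sequence_length + 1, 2)


def temporal_blend(current, history, history_valid, alpha=0.05):
    """Exponential moving average of (scattering RGB, extinction) values.

    history_valid is False when the reprojected sample lies outside the
    previous frustum; the history is then skipped.
    """
    if not history_valid:
        return tuple(current)
    return tuple(h + alpha * (c - h) for c, h in zip(current, history))


if __name__ == "__main__":
    print([frame_jitter(f) for f in range(4)])  # [0.5, 0.25, 0.75, 0.125]
```

A small alpha smooths aliasing well but makes history converge slowly. That slow convergence is behind the trails on moving fog volumes and lights discussed below.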


The improvement is obvious; see the video in the slides.

 
 

Remaining issues:

This is great and promising but there are several issues remaining:

 
 

Local fog volumes and lights will leave trails when moving

One could store local fog volume motion in a buffer, the same way as we do in screen space for motion blur

But what do we do when two volumes intersect? This is the same problem as deep compositing

For lighting, we could use neighbour colour clamping but this will not solve the problem entirely

 
 

This is an exciting and challenging R&D area for the future, and I’ll be happy to discuss it with you if you have some ideas.

 
 

  3. Final integration

 
 

Integration

Integrate froxel {scattering, extinction} along view ray

  • Solves {𝑳𝒊(𝒙,𝝎𝒐), 𝑻𝒓(𝒙,𝒙𝒔)} for each froxel at position 𝒙𝒔

 
 

We basically accumulate scattering from near to far according to transmittance. This solves for the integrated scattered light and transmittance along the view ray, for each froxel.

 
 

Code sample (on the slide)

One could use the code sample shown on the slide: accumulate scattering and then transmittance for the next froxel, slice by slice. However, that is completely wrong. Indeed, there is a dependency on the accumScatteringTransmitance.a value (transmittance). Should we update transmittance or scattering first?
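The slide's code is not reproduced in these notes. Below is a scalar sketch of the two possible orderings, where each slice supplies s = σs · L_scat (assumed constant in the slice) and its extinction. For a purely scattering slice (albedo 1) under unit in-scattered radiance, the physically correct answer is 1 − exp(−σt · D). Neither ordering reaches it, which is the problem described here.

```python
import math


def integrate_scattering_first(slices, depth):
    """Accumulate scattering with the current transmittance, then update it."""
    scattered, transmittance = 0.0, 1.0
    for s, sigma_t in slices:          # s = sigma_s * L_scat for this slice
        scattered += transmittance * s * depth
        transmittance *= math.exp(-sigma_t * depth)
    return scattered, transmittance


def integrate_transmittance_first(slices, depth):
    """Update transmittance first, then accumulate scattering with it."""
    scattered, transmittance = 0.0, 1.0
    for s, sigma_t in slices:
        transmittance *= math.exp(-sigma_t * depth)
        scattered += transmittance * s * depth
    return scattered, transmittance


if __name__ == "__main__":
    # One 1 m slice of a purely scattering medium (sigma_s = sigma_t = 10 m^-1)
    # lit by unit in-scattered radiance; physically 1 - exp(-10) ~ 1 comes back.
    dense = [(10.0, 10.0)]
    print(integrate_scattering_first(dense, 1.0)[0])     # far too bright
    print(integrate_transmittance_first(dense, 1.0)[0])  # far too dark
    print(1.0 - math.exp(-10.0))
```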


Final

 
 

Non-energy-conservative integration:

 
 

You can see here multiple volumes with increasing scattering properties. It is easy to understand that integrating scattering and then transmittance is not energy conservative.


We could reverse the order of operations. You can see that we somewhat get back the correct albedo one would expect, but the result is overall too dark, and temporally integrating it is definitely not helping here.


So how to improve this? We know we have one light and one extinction sample.

 
 

We can keep the light sample: it is expensive to evaluate, and it is good enough to assume it is constant along the view ray inside each depth slice.

 
 

But the single transmittance is completely wrong. The transmittance should in fact be 1 at the near interface of the depth slice and exp(-mu_t d) at the far interface of a depth slice of width d.

 
 

To solve this, we integrate scattered light analytically according to the transmittance at each point of the view ray range within the slice. For constant scattered light S over a slice of depth D and a single extinction sample σt, the analytical integral reduces to ∫₀^D e^(−σt·x) S dx = (S − S·e^(−σt·D)) / σt.

Using this, we finally get consistent lighting result for scattering and this with respect to our single extinction sample (as you can see on the bottom picture).

 
 

  • Single scattered light sample 𝑆=𝑳𝒔𝒄𝒂𝒕(𝒙𝒕,𝝎𝒐) OK
  • Single transmittance sample 𝑻𝒓(𝒙,𝒙𝒔) NOT OK

 
 

=> Integrate lighting w.r.t. transmittance over froxel depth D
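A scalar sketch of the energy-conserving accumulation follows. Each slice's constant scattered light s = σs · L_scat is integrated analytically against the transmittance inside the slice, (s − s·e^(−σt·D)) / σt, before being weighted by the transmittance accumulated so far. The slice representation is an assumption of this sketch. For a purely scattering medium under unit in-scattered radiance, the scattered result equals 1 − T exactly, so no energy is created or lost.

```python
import math


def integrate_energy_conserving(slices, depth):
    """Near-to-far accumulation with per-slice analytic integration.

    slices: (s, sigma_t) pairs, s = sigma_s * L_scat, constant in the slice.
    Returns (scattered light reaching the camera, transmittance).
    """
    scattered, transmittance = 0.0, 1.0
    for s, sigma_t in slices:
        slice_transmittance = math.exp(-sigma_t * depth)
        if sigma_t > 0.0:
            # integral_0^D exp(-sigma_t x) s dx = (s - s exp(-sigma_t D)) / sigma_t
            s_int = (s - s * slice_transmittance) / sigma_t
        else:
            s_int = s * depth  # limit as sigma_t -> 0
        scattered += transmittance * s_int
        transmittance *= slice_transmittance
    return scattered, transmittance


if __name__ == "__main__":
    # Purely scattering medium (albedo 1) under unit in-scattered radiance:
    # everything that is not transmitted must come back as scattered light.
    for slices, depth in (([(10.0, 10.0)], 1.0), ([(2.0, 2.0)] * 8, 0.5)):
        light, t = integrate_energy_conserving(slices, depth)
        print(f"scattered={light:.5f}  1-T={1.0 - t:.5f}")
```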


Also improves with volumetric shadows

You can also see that this fixes the light leaking we sometimes noticed for relatively large depth slices and strongly scattering media, even when volumetric shadows are enabled.


Once we have that final integrated buffer, we can apply it on everything in our scene during the sky rendering pass. As it contains scattered light reaching the camera and transmittance, it is easy to apply it as a pre-multiplied colour-alpha on everything.

 
 

For efficiency, it is applied per vertex on transparents but we are thinking of switching this to per pixel for better quality.

 
 

  • {𝑳𝒊(𝒙,𝝎𝒐), 𝑻𝒓(𝒙,𝒙𝒔)} Similar to pre-multiplied color/alpha
  • Applied on opaque surfaces per pixel
  • Evaluated on transparent surfaces per vertex, applied per pixel
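Applying the integrated buffer works like compositing premultiplied colour/alpha: result = surface · T_r + L_i. For transparents, {L_i, T_r} is fetched once per vertex and interpolated across the triangle. The sketch below shows both steps. The function names and the barycentric interpolation are illustrative assumptions; the froxel fetch itself (a filtered 3D texture lookup) is omitted.

```python
def apply_fog(surface_rgb, in_scattered_rgb, transmittance):
    """Composite the integrated volume sample {L_i, T_r} over a shaded surface,
    like premultiplied colour/alpha: result = surface * T_r + L_i."""
    return tuple(s * transmittance + l for s, l in zip(surface_rgb, in_scattered_rgb))


def interpolate_vertex_fog(vertex_fog, barycentrics):
    """Transparent path: {L_i, T_r} evaluated per vertex, interpolated per pixel.

    vertex_fog: three ((r, g, b), transmittance) tuples, one per vertex.
    """
    rgb = tuple(
        sum(w * fog[0][c] for w, fog in zip(barycentrics, vertex_fog)) for c in range(3)
    )
    t = sum(w * fog[1] for w, fog in zip(barycentrics, vertex_fog))
    return rgb, t


if __name__ == "__main__":
    fog_rgb, fog_t = interpolate_vertex_fog(
        [((0.2, 0.2, 0.3), 0.5), ((0.1, 0.1, 0.2), 0.7), ((0.0, 0.0, 0.1), 0.9)],
        (1 / 3, 1 / 3, 1 / 3))
    print(apply_fog((0.8, 0.4, 0.2), fog_rgb, fog_t))
```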


Result validation

 
 

Our target is to get physically based results. As such, we have compared our results against the physically based path tracer called Mitsuba. We constrained Mitsuba to single scattering and to use the same exposure, etc. as our example scenes.

 
 

Compare results to references from Mitsuba

  • Physically based path tracer
  • Same conditions: single scattering only, exposure, etc.

 
 

The first scene I am going to show you is a thick participating media layer with a light above it and then inside it.


You can see here the Frostbite render on top and the Mitsuba render at the bottom. You can also see the scene with a gradient applied to it. It is easy to see that our result matches; you can also recognize the triangle shape of scattered light when the point light is within the medium.

 
 

This is a difficult case when the participating media is non-uniform and thick, due to our discretisation of volumetric shadows and material representation, so you can see some small differences. But overall it matches; we are happy with these first results and will improve them in the future.


This is another example showing a very good match for an HG phase function with g=0 and g=0.9 (strong forward scattering).


Performance

 
 

  • Sun + shadow cascade
  • 14 point lights
    • 2 with regular & volumetric shadows
  • 6 local fog volumes
    • All with density textures

PS4, 900p

| Volume tile resolution | 8×8 | 16×16 |
|---|---|---|
| PM material voxelization | 0.45 ms | 0.15 ms |
| Light scattering | 2.00 ms | 0.50 ms |
| Final accumulation | 0.40 ms | 0.08 ms |
| Application (fog pass) | +0.1 ms | +0.1 ms |
| Total | 2.95 ms | 0.83 ms |

Light scattering components (8×8):

| Component | Cost |
|---|---|
| Local lights | 1.1 ms |
| + Sun scattering | +0.5 ms |
| + Temporal integration | +0.4 ms |

 
 

You can see that the performance varies a lot depending on what you have enabled and the resolution of the clip space volumes.

 
 

This shows that it will be important to carefully plan the needs of your game and its different scenes. Maybe one could also bake the scattering of static scenes and use the emissive channel to represent the scattered light, for even faster rendering of complex volumetric lighting.


  • Volumetric shadows

 
 

Volumetric shadow maps

 
 

We also support volumetric shadow maps (shadow resulting from voxelized volumetric entities in our scene)

 
 

To this aim, we went for a simple and fast solution

 
 

  • We first define a 3-level cascaded clipmap volume following and containing the camera.
    • With tweakable per-level voxel size and world space snapping
  • This volume contains all our participating media entities, voxelized again within it (required for out-of-view shadow casters; the clip space volume would not be enough)
  • A volumetric shadow map is defined as a 3D texture (assigned to a light) that stores transmittance
    • Transmittance is evaluated by ray marching the extinction volume
    • Projection is chosen as a best fit for the light type (e.g. frustum for spot light)
  • Our volumetric shadow maps are stored into an atlas to only have to bind a single texture (with uv scale and bias) when using them.
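Each volumetric shadow map texel stores transmittance obtained by ray marching the extinction volume, as described above. The sketch below marches one ray through a small extinction grid using midpoint steps and nearest-voxel lookups; samples outside the grid count as empty. Grid layout, step count and lookup filtering are assumptions. The real system marches the 3-level clipmap and writes the results into an atlas.

```python
import math


def march_transmittance(extinction, origin, direction, length, voxel_size=1.0, steps=64):
    """Transmittance exp(-integral of sigma_t) along one ray of a shadow map texel.

    extinction: nested list indexed [z][y][x], values in m^-1.
    origin, direction: ray start and unit direction, in the grid's world units.
    """
    step = length / steps
    optical_depth = 0.0
    for i in range(steps):
        t = (i + 0.5) * step  # midpoint of this step
        p = [origin[a] + direction[a] * t for a in range(3)]
        x, y, z = (int(math.floor(c / voxel_size)) for c in p)
        if (0 <= z < len(extinction) and 0 <= y < len(extinction[0])
                and 0 <= x < len(extinction[0][0])):
            optical_depth += extinction[z][y][x] * step
    return math.exp(-optical_depth)


if __name__ == "__main__":
    fog = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]  # 4 m cube, 1 m^-1
    print(march_transmittance(fog, (0.0, 2.5, 2.5), (1.0, 0.0, 0.0), 8.0))
```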


Volumetric shadow maps are entirely part of our shared lighting pipeline and shader code.

 
 

Part of our common light shadow system

  • Opaque
  • Particles
  • Participating media

 
 

It is sampled for each light having it enabled and applied on everything in the scene (particles, opaque surfaces, participating media) as visible on this video.

 
 

(See the effect video in the slides.)

 
 

Another bonus is that we also voxelize our particles.

 
 

We have tried many voxelization methods. Point voxelization and its blurred version were just too noisy. Our default voxelization method is trilinear. You can see the shadow is very soft and there is no visible popping.
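Trilinear voxelization can be sketched as splatting each particle's extinction into the 8 voxels around it with trilinear weights. The weights vary continuously as the particle moves, which is why this avoids popping. Voxel centres at integer coordinates and a sparse dict-based grid are assumptions made for brevity. A GPU version would use atomic adds into a 3D texture.

```python
import math
from collections import defaultdict


def splat_trilinear(grid, position, extinction):
    """Distribute a particle's extinction over the 8 surrounding voxels.

    grid: mapping (x, y, z) -> accumulated extinction; voxel centres are
    assumed at integer coordinates. The weights sum to 1, so the total
    extinction deposited always equals `extinction`.
    """
    base = [math.floor(c) for c in position]
    frac = [c - b for c, b in zip(position, base)]
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((frac[0] if dx else 1.0 - frac[0])
                     * (frac[1] if dy else 1.0 - frac[1])
                     * (frac[2] if dz else 1.0 - frac[2]))
                if w > 0.0:
                    grid[(base[0] + dx, base[1] + dy, base[2] + dz)] += w * extinction


if __name__ == "__main__":
    voxels = defaultdict(float)
    splat_trilinear(voxels, (1.5, 2.0, 3.0), 2.0)
    print(dict(voxels))
```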

 
 

We also have a high quality voxelization where all threads write all the voxels contained within the particle sphere. A bit brute force for now but it works when needed.

 
 

You can see the result of volumetric shadows from particle onto participating media in the last video.

 
 

(See bonus slides for more details)


Quality: PS4

 
 

Ray marching of 32³ volumetric shadow maps:

  • Spot light: 0.04 ms
  • Point light: 0.14 ms

1k particles voxelization:

  • Default quality: 0.03 ms
  • High quality: 0.25 ms

 
 

Point lights are more expensive than spot lights because spot lights are integrated slice by slice, whereas a full ray trace is done for each point light shadow voxel. We have ideas to fix that in the near future.

 
 

Default particle voxelization is definitely cheap for 1K particles.

 
 

  • More volumetric rendering in Frostbite

 
 

Particle/Sun interaction

 
 

  • High quality scattering and self-shadowing for sun/particles interactions
  • Fourier opacity Maps [Jansen10]
  • Used in production now


Our translucent shadows in Frostbite (see [Andersson11]) allow particles to cast shadows on opaque surfaces but not on themselves. This technique also did not support scattering.

 
 

We have added that support in Frostbite by using Fourier opacity mapping. This allows us to have very high quality coloured shadowing and scattering, resulting in sharp silver-lining visual effects, as you can see in these screenshots and the cloud video.
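Fourier opacity mapping is only named here. As a rough illustration of the idea from [Jansen10], the sketch stores the absorption density along each light ray as a few Fourier coefficients and reconstructs transmittance at any depth in closed form. Treating each particle as a thin layer with optical thickness τ (for example −ln(1 − α) for opacity α), normalizing depth to [0, 1] and clamping ringing are assumptions of this sketch, not details from the talk.

```python
import math


def fom_coefficients(particles, num_terms=4):
    """Fourier coefficients a[k], b[k] of the absorption density on one light ray.

    particles: (depth in [0, 1] along the light ray, optical thickness tau).
    Each particle is a thin layer, i.e. a delta function of weight tau.
    """
    a = [0.0] * (num_terms + 1)
    b = [0.0] * (num_terms + 1)
    for depth, tau in particles:
        for k in range(num_terms + 1):
            a[k] += 2.0 * tau * math.cos(2.0 * math.pi * k * depth)
            b[k] += 2.0 * tau * math.sin(2.0 * math.pi * k * depth)
    return a, b


def fom_transmittance(a, b, d):
    """Transmittance at normalized depth d from the integrated Fourier series."""
    optical_depth = a[0] * d / 2.0
    for k in range(1, len(a)):
        w = 2.0 * math.pi * k
        optical_depth += a[k] / w * math.sin(w * d) + b[k] / w * (1.0 - math.cos(w * d))
    return math.exp(-max(optical_depth, 0.0))  # clamp ringing that would give T > 1


if __name__ == "__main__":
    a, b = fom_coefficients([(0.3, 0.5), (0.6, 1.0)])
    print([round(fom_transmittance(a, b, d / 10), 3) for d in range(11)])
```

Only a handful of coefficients per texel are needed, which keeps memory fixed regardless of particle count. The price is smooth, slightly ringing transmittance curves.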

 
 

This is a special, non-unified path for the sun, but it was needed to get that extra bit of quality for the sun, which requires special attention.

 
 

Physically-based sky/atmosphere

 
 

  • Improved from [Elek09] (Simpler but faster than [Bruneton08])
  • Collaboration between Frostbite, Ghost and DICE teams.
  • In production: Mirror’s Edge Catalyst, Need for Speed and Mass Effect Andromeda


We also added support for physically based sky and atmosphere scattering simulation last year. This was a fruitful collaboration between Frostbite and the Ghost and DICE game teams (mainly developed by Edvard Sandberg and Gustav Bodare at Ghost). It is now used in production by many games such as Mirror’s Edge or Mass Effect Andromeda.

 
 

It is an improved version of Elek’s paper which is simpler and faster than Bruneton. I unfortunately have no time to dive into details in this presentation.

 
 

But in these notes I have time. Basically, the lighting artist defines the atmosphere properties, and the light scattering and sky rendering automatically adapt to the sun position. When the atmosphere is changed, we need to update our precomputed lookup tables, and this can be distributed over several frames to limit the evaluation impact on the GPU.

 
 

  • Conclusion

 
 

Physically-based volumetric rendering framework used for all games powered by Frostbite in the future

 
 

Physically based volumetric rendering

  • Participating media material definition
  • Lighting and shadowing interactions

 
 

A more unified volumetric rendering system

  • Handles many interactions
    • Participating media, volumetric shadows, particles, opaque surfaces, etc.

 
 

Future work

 
 

Improved participating media rendering

  • Phase function integral w.r.t. area lights solid angle
  • Inclusion in reflection views
  • Graph based material definition, GPU simulation, Streaming
  • Better temporal integration! Any ideas?
  • Sun volumetric shadow
  • Transparent shadows from transparent surfaces?

 
 

Optimisations

  • V-Buffer packing
  • Particles voxelization
  • Volumetric shadow maps generation
  • How to scale to 4k screens efficiently

 
 

For further discussions

 
 

sebastien.hillaire@frostbite.com

https://twitter.com/SebHillaire


References

 
 

[Lagarde & de Rousiers 2014] Moving Frostbite to PBR, SIGGRAPH 2014.

[PBR] Physically Based Rendering book, http://www.pbrt.org/.

[Wenzel07] Real-time atmospheric effects in games revisited, GDC 2007.

[Mitchell07] Volumetric Light Scattering as a Post-Process, GPU Gems 3, 2007.

[Andersson11] Shiny PC Graphics in Battlefield 3, GeForceLan, 2011.

[Engelhardt10] Epipolar Sampling for Shadows and Crepuscular Rays in Participating Media with Single Scattering, I3D 2010.

[Miles] Blog post http://blog.mmacklin.com/tag/fog-volumes/

[Valliant14] Volumetric Light Effects in Killzone Shadow Fall, SIGGRAPH 2014.

[Glatzel14] Volumetric Lighting for Many Lights in Lords of the Fallen, Digital Dragons 2014.

[Hillaire14] Volumetric lights demo

[Lagarde13] Lagarde and Harduin, The art and rendering of Remember Me, GDC 2013.

[Wronski14] Volumetric fog: unified compute shader based solution to atmospheric scattering, SIGGRAPH 2014.

[Karis14] High Quality Temporal Super Sampling, SIGGRAPH 2014.

[Jansen10] Fourier Opacity Mapping, I3D 2010.

[Salvi10] Adaptive Volumetric Shadow Maps, EGSR 2010.

[Elek09] Rendering Parametrizable Planetary Atmospheres with Multiple Scattering in Real-time, CESCG 2009.

[Bruneton08] Precomputed Atmospheric scattering, EGSR 2008.


Voxel House

  • Introduction

 
 

http://www.oskarstalberg.com/game/house/Index.html


My projects typically revolve around some central idea that I want to explore. Here, that central idea is a particular content-driven approach to modular tilesets that I’ve had on my mind for a while. This project could have been created as a Python script in Maya or a node graph in Houdini. However, since I don’t want my final presentation material to be a dull narrated YouTube clip set in a grey-boxed Maya scene, I created an interactive web demo instead. As a tech artist, the width of my skill set is crucial; I’m not a master artist nor a proper coder, but I’ve got a slice of both in me. I’m most comfortable in the very intersection of art and tech; of procedure and craftsmanship. A web demo is the perfect medium to display those skills.


  • Figuring out the tiles

 
 

The core concept is this: the tiles are placed in the corners between blocks, not in the center of the blocks. The tiles are defined by the blocks that surround them: a tile adjacent to one block in the corner would be 1,0,0,0,0,0,0,0; a tile representing a straight wall would be 1,1,1,1,0,0,0,0.


Since each corner is surrounded by 8 possible blocks, each of which can be in one of 2 states (existence or non-existence), the number of possible tiles is 2^8 = 256. That is way more tiles than I wanted to model, so I wrote a script to figure out which of these tiles were truly unique, and which were just rotations of other tiles. The script told me that I had to model 67 unique tiles – a much more manageable number.
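The author's script is not shown. Here is a minimal sketch of the same kind of check. It encodes the 8 blocks around a corner as an 8-bit mask, rotates it about the vertical axis, and keeps the smallest mask of each rotation class as its representative. Under these assumptions (only the four rotations about the up axis, with the empty and fully enclosed corners counted), the sketch finds 70 classes. The 67 quoted above therefore reflects rules the source does not spell out, so treat the bit layout and rotation set as hypothetical.

```python
def bit(x, y, z):
    """Bit index of one of the 8 blocks around a corner (z is up)."""
    return x + 2 * y + 4 * z


def rotate_about_up(mask):
    """Rotate a corner configuration 90 degrees about the vertical axis."""
    out = 0
    for z in (0, 1):
        for y in (0, 1):
            for x in (0, 1):
                if (mask >> bit(x, y, z)) & 1:
                    out |= 1 << bit(1 - y, x, z)
    return out


def canonical(mask):
    """Smallest mask among the four rotations: one representative per tile."""
    best, m = mask, mask
    for _ in range(3):
        m = rotate_about_up(m)
        best = min(best, m)
    return best


unique_tiles = sorted({canonical(m) for m in range(256)})

if __name__ == "__main__":
    print(len(unique_tiles))
```

At runtime, the same `canonical` lookup also tells you which modelled tile to place and how many quarter turns to rotate it.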


I could have excluded flipped versions of other tiles as well, which would have brought the number down even further. However, I decided to keep those so that I could make some asymmetrically tiling features. The drain pipes you see in concave corners of the building are one example of that.


  • Boolean setup in Maya

 
 

Being the tech artist that I am, I often spend more time on my workflow than on my actual work. Even accounting for rotational permutations, this project still involved a large number of 3D meshes to manually create and keep track of. The modular nature of the project also made it important to continuously see and evaluate the models in their proper context outside of Maya. The export process had to be quick and easy, so I decided to write a small Python script to help me out.

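[There is a huge amount of work here: even with rotations there are still many combinations, and the goal is for every connection to look good. The process has to be fast and easy, which the Python script handles. As I understand it, the script's role is to check that the art produced is actually usable.]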

 
 

First, the script merges all my meshes into one piece. Second, a bounding box for each tile proceeds to cut out its particular slice of this merged mesh using Maya’s boolean operation. All the cutout pieces inherit the name and transform from their bounding box and are exported together as an fbx.

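[Merge all the relevant meshes into one piece, then use Maya's boolean operation to cut out each tile with its bounding box, which may extend into the neighbouring tiles.]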

 
 

Not only did this make the export process a one-button solution, it also meant that I didn’t have to keep my Maya scene that tidy. It didn’t matter what meshes were named, how they were parented or whether they were properly merged or not. I adapted my Maya script to allow several variations of the same tile type. My Unity script then chose randomly from that pool of variation where it existed. In the image below, you can see that some of the bounding boxes are bigger than the others. Those are for tiles that have vertices that stretch outside their allotted volume.


  • Ambient Occlusion

 
 

Lighting is crucial to convey 3D shapes and a good sense of space. Due to the technical limitations in the free version of Unity, I didn’t have access to either real-time shadows or SSAO – nor could I write my own, since free Unity does not allow render targets. The solution was found in the blocky nature of this project. Each block was made to represent a voxel in a 3D texture. While Unity does not allow me to draw render targets on the GPU, it does allow me to manipulate textures from script on the CPU. (This is of course much slower per pixel, but more than fast enough for my purposes.)

Simply sampling that 3D texture at a point offset in the general direction of the surface normal gives me a decent ambient occlusion approximation.
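The AO trick above amounts to a single filtered lookup into the block occupancy texture, taken a little way out along the normal. In the sketch, a set of filled voxels stands in for the 3D texture, and a trilinear weight mimics the GPU's filtering. Voxel centres at integer coordinates and the 0.5 offset are assumptions, not values from the project.

```python
import math


def filtered_occupancy(filled, p):
    """Trilinearly filtered occupancy of a voxel set at point p.

    filled: set of integer (x, y, z) voxels; voxel centres are assumed at
    integer coordinates, mimicking a filtered 3D texture fetch.
    """
    base = [math.floor(c) for c in p]
    frac = [c - b for c, b in zip(p, base)]
    total = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                if (base[0] + dx, base[1] + dy, base[2] + dz) in filled:
                    total += ((frac[0] if dx else 1.0 - frac[0])
                              * (frac[1] if dy else 1.0 - frac[1])
                              * (frac[2] if dz else 1.0 - frac[2]))
    return total


def voxel_ao(filled, position, normal, offset=0.5):
    """Ambient visibility from one sample offset along the normal.

    Returns 1 for fully open and 0 for fully occluded.
    """
    sample = [p + n * offset for p, n in zip(position, normal)]
    return 1.0 - filtered_occupancy(filled, sample)


if __name__ == "__main__":
    blocks = {(0, 0, 0), (1, 1, 0)}
    print(voxel_ao(blocks, (0.5, 0.5, 0.0), (0.0, 1.0, 0.0)))
```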

 
 

I tried to multiply this AO on top of my unlit color texture, but the result was too dark and boring. I decided on an approach that took advantage of my newly acquired experience with 3D textures: instead of just making pixels darker, the AO lerps the pixel towards a 3D LUT that makes it bluer and less saturated. The result gives me a great variation in hue without too harsh a variation in value. This lighting model gave me the soft and tranquil feeling I was aiming for in this project.
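The shading idea is to lerp toward a graded colour instead of multiplying by AO. The sketch replaces the 3D LUT with a hypothetical `cool_grade` function that desaturates and tints blue; the real project used an authored LUT texture. The saturation and tint values are made up.

```python
def cool_grade(rgb, saturation=0.5, tint=(0.85, 0.9, 1.1)):
    """Hypothetical stand-in for the 3D LUT: desaturate, then shift towards blue."""
    r, g, b = rgb
    luma = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return tuple(min(1.0, (luma + (c - luma) * saturation) * k) for c, k in zip(rgb, tint))


def shade(albedo_rgb, ao_visibility):
    """Lerp the unlit colour towards its graded version as occlusion increases,
    instead of multiplying it darker. ao_visibility: 1 = open, 0 = occluded."""
    occlusion = 1.0 - ao_visibility
    graded = cool_grade(albedo_rgb)
    return tuple(a + (gr - a) * occlusion for a, gr in zip(albedo_rgb, graded))


if __name__ == "__main__":
    for ao in (1.0, 0.5, 0.0):
        print(ao, shade((0.8, 0.6, 0.4), ao))
```

Occluded areas shift in hue and saturation more than in brightness, which matches the "variation in hue without too harsh a variation in value" described above.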


  • Special Pieces


When you launch the demo, it will auto generate a random structure for you. By design, that structure does not contain any loose or suspended blocks.

 
 

I know that a seasoned tool-user will try to break the tool straight away by seeing how it treats these types of abnormal structures. I decided to show off by making these tiles extra special, displaying features such as arcs, passages, and pillars.


  • Floating Pieces

 
 

There is nothing in my project preventing a user from creating free-floating chunks, and that’s the way I wanted to keep it. But I also wanted to show the user that I had, indeed, thought about that possibility. My solution was to let the free-floating chunks slowly bob up and down. This required me to create a fun little algorithm to figure out in real time which blocks were connected to the base and which weren’t:

 
 

The base blocks each get a logical distance of 0. The other blocks check if any of their neighbors have a shorter logical distance than themselves; if they do, they adopt that value and add 1 to it. Thus, if you disconnect a chunk, there will be nothing grounding those blocks to the 0 of the base blocks, and their logical distance will quickly go through the roof. That is when they start to bob.
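A minimal sketch of the logical-distance update follows. It assumes each non-base block recomputes its distance as the smallest neighbouring distance plus 1 on every update. That is what makes a cut-off chunk keep counting upward, since it never sees a 0. The 6-connected neighbourhood, the synchronous update and the `is_floating` threshold are assumptions. A block grounded by a real path can never need a distance larger than the number of blocks.

```python
def neighbours(block):
    """6-connected neighbours of a block at integer (x, y, z)."""
    x, y, z = block
    return [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
            (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]


def update_logical_distances(blocks, dist, is_base):
    """One synchronous update: base blocks are 0, others min(neighbour) + 1."""
    new = {}
    for b in blocks:
        if is_base(b):
            new[b] = 0
        else:
            nbrs = [dist[n] for n in neighbours(b) if n in blocks]
            new[b] = (min(nbrs) if nbrs else dist[b]) + 1
    return new


def is_floating(distance, block_count):
    """No real path to the base is longer than the number of blocks."""
    return distance > block_count


if __name__ == "__main__":
    blocks = {(0, 0, 0), (0, 1, 0), (0, 2, 0), (5, 5, 5), (5, 6, 5)}
    dist = {b: 0 for b in blocks}
    for _ in range(20):  # e.g. one update per frame
        dist = update_logical_distances(blocks, dist, lambda b: b[1] == 0)
    print({b: (d, is_floating(d, len(blocks))) for b, d in sorted(dist.items())})
```

Because each frame only does a local neighbour check, the cost stays small, and a chunk starts bobbing a few frames after it is cut off.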

 
 

The slow bobbing of the floating chunks adds some nice ambient animation to the scene.


  • Art Choices

 
 

Picking a style is a fun and important part of any project. The style should highlight the features relevant to a particular project. In this project, I wanted a style that would emphasize blockiness and modularity rather than hiding it.

 
 

The clear green lines outline the terraces, the walls are plain and have lines of darker brick marking each floor, the windows are evenly spaced, and the dirt at the bottom is smooth and sedimented in straight lines. Corners are heavily beveled to emphasize that the tiles fit together seamlessly. The terraces are supposed to look like cozy secret spaces where you could enjoy a slow brunch on a quiet Sunday morning. Overall, the piece is peaceful and friendly – a homage to the tranquility of bourgeois life, if you will.


  • Animation

 
 

It should be fun and responsive to interact with the piece. I created an animated effect for adding and removing blocks. The effect is a simple combination of a vertex shader that pushes the vertices out along their normals and a pixel shader that breaks up the surface over time. A nice twist is that I was able to use the 3D texture created for the AO to constrain the vertices along the edge of the effect – this is what creates the bulge along the middle seen in the picture.


  • Conclusion

 
 

The final result is like a tool, but not. It’s an interactive piece of art that runs in your browser. It can be evaluated for its technical aspects, its potential as a level editor tool, its shader work, its execution and finish, or just as a fun thing to play around with. My hope is that it can appeal to developers and laymen alike. In a way, a web demo like this is simply a mischievous way to trick people into looking at your art longer than they otherwise would.