Tag: Advanced Game Tech

Phase-Functioned Neural Networks for Character Control

Theory:

 
 

http://theorangeduck.com/page/phase-functioned-neural-networks-character-control

 
 

Video content:

Speaker background: a PhD graduate specializing in machine learning for character animation.

The animation system is the black box in the middle of the pipeline.

The user feeds the animation system directional input — a 2D direction as shown in the figure. These are fairly high-level commands, such as which way we want the character to walk.

The animation system itself is like a giant state machine: different states correspond to different animations, with blending between them. Coding such a complex system directly is difficult and hard to maintain,

so we reconsider: can a simple algorithm implement this complex interactive animation system?

We want the animation system's input and output to become nothing more than a set of parameters.

Looking back at the original complex animation system: if we treat its input and output as parameters of an animation model, this is achievable — it is like picking entries out of a giant database.

So the second thing we want is to output the next pose directly.

Can it be done? Yes — the basic idea is to treat the animation system as a black box: given the inputs, we get the outputs we want. The concrete method comes later.

Input x: trajectory positions, directions, heights; previous joint positions, velocities…

Output y: the transform of each joint.

To train we first need data. We used motion capture: segments of about ten minutes each, roughly two hours of unstructured data in total.

On the height-matching problem: the captured data assumes the feet are perfectly in contact with the ground, and we train on that assumption; we fitted the motions to a large variety of different terrains to obtain the terrain-related data.

Now, how do we learn?

A neural network is really just a function:

for example, a function that maps each input to the corresponding output.

Our function's inputs and outputs are as shown in the figure,

and they can be quantified as vectors.

The simplest single-layer NN looks like this; the two parameters w and b are what we need to learn.

The training input is the known x and y;

the training output is the learned result, w and b, as shown here.

The function we finally obtain is the behavior function.

Various functions can be involved here — this one, for example, is a nonlinear blending (activation) function.

These two are quite similar.

A deep, multi-layer network behaves like this;

this example is the formula for a three-layer neural network.

Training works like this: feed in an input, measure the error of the result it produces, then adjust the network's parameters — that is the basic loop.
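As a toy illustration of that loop (not the paper's code — the data, learning rate, and target function here are invented), a single-layer model y = w·x + b fitted by gradient descent:

```python
# Fit y = 2x + 1 with a single linear layer: forward pass, measure
# the error, adjust the parameters, repeat. All values are illustrative.
xs = [i / 10 - 1.0 for i in range(21)]   # known inputs
ys = [2.0 * x + 1.0 for x in xs]         # known targets

w, b = 0.0, 0.0                          # the parameters to learn
lr = 0.1
for _ in range(2000):
    dw = db = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y            # measure the error
        dw += err * x                    # accumulate gradients
        db += err
    w -= lr * dw / len(xs)               # adjust the parameters
    b -= lr * db / len(xs)
# w ≈ 2.0, b ≈ 1.0
```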

We used GPU computation to save time.

"Phase-functioned NN" means we use a special kind of network that applies different network weights to different motions, avoiding the blending of distinct motions. See the paper for details.

This is what we end up with: a simple animation-driving model that replaces the state machine and blend tree.

Demo.

Performance.

Conclusions.

 
 

First, a full pass through the slides:

The SIGGRAPH presentation slides.

Goal: fast, compact, expressive character control in games.

Final results demo.

Part 1: background.

Shortcomings of prior work:

  1. The entire database must be kept in memory.
  2. The data must be processed by hand.
  3. Complex acceleration structures are required.

What can a neural network bring?

  1. Virtually unlimited data capacity (arbitrary motions).
  2. Fast, real-time, with low memory usage.

But how do we generate the motion?

CNN: learn the relationship between the user's control signals and the character's motion.

demo

What is the problem?

Ambiguity: the same input can produce different character motions.

In practice:

  1. Ambiguity needs special handling to resolve.
  2. The entire input trajectory must be provided up front.
  3. A multi-layer CNN is still too slow for games.

RNN: learn the frame-to-frame mapping, from the previous frame to the next.

demo

RNN result quality:

  1. Only holds up for about 10 seconds.
  2. Floating (drift) cannot be avoided.
  3. Ambiguity cannot be avoided.

The problems we face, summarized:

  1. How do we handle large amounts of data?
  2. How do we resolve ambiguity?
  3. How do we make the generated results look good?

Data capture

Unstructured motion capture: about two hours of data in total, in segments of roughly ten minutes each. Many tables and chairs were laid out to imitate complex terrain, so that the data covers as many situations as possible.

demo

demo

Terrain fitting

  1. We want terrain data to be learned together with the motion data.
  2. But capturing motion and terrain at the same time is cumbersome.
  3. So: build a database of height maps, then fit each motion segment to a patch of some height map.

Example.

Parameterization:

  1. The final effect is good.
  2. The character trajectory uses a sliding window.
  3. Gait, terrain height, and similar information are added.

The neural network

PFNN: a neural network whose weights are produced by a phase function.

The phase is a scalar in [0, 2π] indicating where the current pose sits in the locomotion cycle.

Diagram: the input is the current frame's pose, the output is the next frame's pose, and the network's parameters come from the phase function.

demo

Architecture: a feed-forward NN with two hidden layers of 512 hidden units each, with ELU activations.

The phase function's output is the network's weights: a cyclic cubic interpolation over four control points, each control point being a full set of network weights.
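A minimal sketch of that cyclic cubic (Catmull-Rom) interpolation. Each "set of weights" is reduced to a short list of floats, and the function name and layout are invented for illustration, not taken from the paper's code:

```python
import math

def phase_weights(phase, control_points):
    # The phase in [0, 2pi) picks a position on a closed loop through the
    # control points (4 in the paper); each control point is a full set of
    # network weights, interpolated component-wise with Catmull-Rom.
    k = len(control_points)
    t = (phase / (2 * math.pi)) * k          # position on the loop
    i1 = int(t) % k
    i0, i2, i3 = (i1 - 1) % k, (i1 + 1) % k, (i1 + 2) % k
    w = t - int(t)                           # fractional part within segment
    out = []
    for a, b, c, d in zip(*[control_points[i] for i in (i0, i1, i2, i3)]):
        # standard Catmull-Rom basis, evaluated per weight component
        out.append(
            b
            + 0.5 * w * (c - a)
            + w * w * (a - 2.5 * b + 2.0 * c - 0.5 * d)
            + w * w * w * (1.5 * (b - c) + 0.5 * (d - a))
        )
    return out
```

At phase values that land exactly on a control point, the spline passes through that control point's weights, which is the defining property used here.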

Training algorithm:

  1. Feed in the phase to generate the weights.
  2. Run the network with those weights on the input to get the output.
  3. Measure the output error.
  4. Backpropagate through the network and the phase function to update the control points' values.

Results

demo

Conclusions

Phase-function precomputation: evaluating the phase function at runtime is too expensive for a game, so:

  1. The phase is confined to [0, 2π], so the function can be precomputed over that range.
  2. At runtime, interpolate between the precomputed results.
  3. This strikes a balance between speed and memory.
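That precompute-then-interpolate trade-off can be sketched as follows. The sample count, the toy "phase function", and all names are illustrative assumptions:

```python
import math

N = 50  # number of precomputed samples over [0, 2pi) — a memory/speed knob

def precompute(phase_function):
    # sample the expensive phase function once, ahead of time
    return [phase_function(2 * math.pi * i / N) for i in range(N)]

def lookup_lerp(table, phase):
    # runtime: linearly interpolate between the two neighbouring samples
    t = (phase / (2 * math.pi)) * N
    i = int(t) % N
    j = (i + 1) % N                      # cyclic neighbour
    w = t - int(t)
    return [(1 - w) * a + w * b for a, b in zip(table[i], table[j])]

# toy stand-in for the phase function: "weights" are just [sin(p), cos(p)]
table = precompute(lambda p: [math.sin(p), math.cos(p)])
approx = lookup_lerp(table, 0.1)
```

Increasing N spends more memory on the table and shrinks the interpolation error, which is exactly the balance point 3 describes.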

Performance numbers.

Drawbacks:

  1. Training the model takes a very long time.
  2. Artists get no immediate feedback when they edit or fix the data.
  3. Results are hard to predict, and when something goes wrong it is hard to know why.

Strengths:

  1. A neural network easily digests huge amounts of data and can produce an enormous variety of results.
  2. The semantic decomposition solves the ambiguity problem.
  3. The simple structure and parameterized usage make quality easy to control.

 
 

 
 

The practical part, in two steps:

  1. First, see how the demo is implemented.
  2. Then, see how the network side is handled.

AIAnimation: code analysis

After much struggle it finally runs, but it cannot be controlled. My guess was that the Ubuntu and Windows input code are incompatible; one of the main goals of this code analysis is to replace the input handling:

 
 

Code structure

Everything starts from main():

  • First, SDL initialization. Simple DirectMedia Layer (SDL) is a cross-platform development library designed to provide low level access to audio, keyboard, mouse, joystick, and graphics hardware via OpenGL and Direct3D. In short, a commonly used cross-platform OpenGL support library.

    At this point I suspected the problem was not input incompatibility after all — and indeed, the code only supports gamepads.

    Unfortunately, it runs very slowly on Windows; the frame rate feels below 10 fps!

  • GLEW initialization

  • Resource loading

    • Options defines the settings we are allowed to modify:

    • CameraOrbit is the camera's initial setup.

    • LightDirectional is the scene lighting setup.

      Just a pile of GL settings.

    • Character is the character setup.

      This is where the character is defined and loaded — this part is important!

      The character data is stored across the following four files, annoyingly in binary:

      vertices, triangles, parent relationships, and xform information — one file each.

      The file-reading functions are simple, placing the data straight into the corresponding containers. The character's data structure is:

      This part ends with a forward-kinematics implementation. It is simple: each child joint follows its parent.
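That child-follows-parent step can be sketched in a few lines — 2D angles instead of full transforms, purely for brevity, and all names invented rather than taken from the project's code:

```python
import math

def forward_kinematics(parents, local_offsets, local_angles):
    """parents[i] is the parent joint index (-1 for the root);
    joints must be ordered so a parent precedes its children."""
    world_pos = []
    world_ang = []
    for i, p in enumerate(parents):
        if p < 0:                          # root joint: local == world
            world_ang.append(local_angles[i])
            world_pos.append(local_offsets[i])
        else:
            # child inherits the parent's world transform, then applies
            # its own local offset (rotated) and local rotation
            world_ang.append(world_ang[p] + local_angles[i])
            c, s = math.cos(world_ang[p]), math.sin(world_ang[p])
            ox, oy = local_offsets[i]
            world_pos.append((world_pos[p][0] + c * ox - s * oy,
                              world_pos[p][1] + s * ox + c * oy))
    return world_pos
```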

    • Trajectory is the path setup.

      Defines the motion trajectory's data structure, shown below:

    • IK setup.

      The IK data structure is shown below:

      It also provides a two_joint function; we'll cover it when it comes up, since its purpose isn't obvious yet.

    • Shaders

      Functions that load the shaders and hook them into OpenGL.

    • Heightmap setup.

      Mainly worth a look for how the height data is read and stored.

      Sample height data files — two files per terrain:

       
       

       
       

      Load() reads the floats one by one into vector<vector<float>> data:

      xxx.txt is used to generate data; xxx_ao.txt generates the vbo/tbo (positions, colors, etc.) and vbo_data/tbo_data (index information).

    • Areas setup.

      Its data structure:

    • The PFNN model

      Model loading and initialization; first its data structure:

      ArrayXf is Eigen's type for storing a float array. Under Load() are the files it loads — lots and lots of files!

      From the file layout shown above, the data shipped with the pfnn model is essentially the network model plus the character.

    • Loading the game world

      The load_world functions; so far they mainly do terrain annotation — so the program needs the terrain marked up in order to run?

  • The game loop
    • Input handling

      Currently gamepad-only. SDL provides a cross-platform input module; details omitted, see the figure below.

      But not all interaction lives here: many of the main controls are written directly into the rendering code, though still through the SDL API.

    • Rendering

      Three parts — pre-processing, rendering, and post-processing — which we take in turn.

 
 

Pre-processing

  • Update the camera (driven directly by input)

    The right stick rotates the camera, L/R zooms in and out, applied directly to the camera parameters.

  • Update the target direction and speed (driven directly by input)

    Also responds directly to input: the buttons determine the user's desired target direction and speed.

  • Update the gait (algorithm pre-processing, step 1)

    The current trajectory parameters are determined from the previous frame's trajectory parameters and the options.

  • Predict the future trajectory (step 2)

    The trajectory parameters from the previous step and the character parameters are blended into the trajectory_positions_blend object.

  • Collision handling (step 3)

    trajectory_positions_blend is adjusted using the walls information in areas.

    Here trajectory_positions_blend is also written back into trajectory.

  • Jumps (step 4)

    trajectory is adjusted using the jump information in areas.

  • Crouch areas (step 5)

    trajectory is adjusted using the crouch_pos information in areas.

  • Walls (step 6)

    trajectory is adjusted directly using the walls information in areas.

  • Trajectory rotation (step 7)

    trajectory->rotations is updated.

  • Trajectory height (step 8)

    trajectory is adjusted using the heightmap values.

  • Input: trajectory positions and directions (PFNN input, part 1)

    Trajectory information fills pfnn->Xp.

  • Input: trajectory gaits (PFNN input, part 2)

    Trajectory information fills pfnn->Xp.

  • Input: current joint positions, velocities and rotations (PFNN input, part 3)

    Joint information fills pfnn->Xp.

  • Input: trajectory heights (PFNN input, part 4)

    Trajectory information fills pfnn->Xp.

  • Perform Regression [the core step: model predict]

    The steps above set up the PFNN's inputs; what still needs to be set is the argument passed to predict(), which is character->phase.

  • Timing: measures how long predict() takes, for debugging.
  • Build Local Transform (PFNN output)

    Uses the PFNN's output to obtain each joint's position/velocity/rotation.

    One extra step is needed: the joint data above is in world space and must be converted to local space.

  • IK

    Applies the joint data obtained above to the character's IK joints, one by one.

 
 

Rendering

  • Render Shadow
  • Render Terrain
  • Render Character
  • Render the Rest
  • Render Crouch Area
  • Render Jump Areas
  • Render Walls
  • Render Trajectory
  • Render Joints
  • UI Elements
  • PFNN Visual
  • Display UI

This is all plain OpenGL usage, unrelated to the AI data, so I won't belabor it.

 
 

Post-processing

  • Update Past Trajectory

    The trajectory data is shifted along.

  • Update Current Trajectory

    The trajectory values are recomputed.

  • Collide with walls

    The trajectory's collision update.

  • Update Future Trajectory

    The trajectory is updated from the PFNN results.

  • Update Phase

  • Update Camera

The AI4Animation project

After many attempts I still could not get any of its demos to open fully without errors and actually run.

So, building on AIAnimation, let's work out how this project is meant to be used.

 
 

The focus is the character, so let's look at how the character is constructed:

Above is the setup of the two demos; each revolves around one important C# file, which we will compare.

Original corresponds to the SIGGRAPH 2017 work and Adam to SIGGRAPH 2018; we start with 2017.

 
 

First, the overall structure:

The second class inherits from Editor; its job is to create the layered "Animation" menu in the editor, while the other three parts are each implemented by a separate class.

Those three objects also produce the three sub-tab menus shown in the figure above.


  • The NeuralNetwork class does exactly one thing: let the user choose an NN model. That is, it handles the UI and its logic, nothing more. Everything the NN actually needs lives in the Model class. Below is the data structure Model stores:

    Then the interface functions:

    These interfaces exist for compatibility with, and extension to, multiple NN methods.

    The rest is Tensor-related. Tensor wraps Eigen data — the actual computation is all done by Eigen — and a pile of operations tying these data structures together is provided here.

    The last important piece of Model is Parameters; on the Unity side this mainly means the load, read, and save methods.

  • The Controller class handles input, mainly WSADQE. It also has one important variable, the Styles array, which appears to record the per-style weights.
  • The Character class drives the skeleton.

The core intermediate data is the Trajectory class: an array of data points, plus operations on the array and on individual points. Each point carries rich data — various transforms, states, and so on:

 
 

All of the core usage is in the Update function. The approach should be exactly the same as in AIAnimation; we can compare them:

  • Everything below runs only if an NN model is present.
  • Update Target Direction / Velocity

    What happens here:

    TargetDirection = blend of the current TargetDirection and the current direction defined by the Trajectory, weighted by TargetBlending.

    TargetVelocity = blend of the current TargetVelocity and the Controller's input velocity, weighted by TargetBlending.

  • Update Gait

    Each Style value of Trajectory.Points[RootPointIndex] = blend of its current value and whether the user selected that Style, weighted by GaitTransition.

  • Predict Future Trajectory

    Predicted positions = interpolation between the continuation of (current minus previous) positions and the TargetPosition.

    Predicted style = continuation of the current style.

    Predicted direction = interpolation between the current direction and TargetDirection.

  • Avoid Collisions

    Ensures the new positions are valid, i.e. takes collisions into account.

  • Input Trajectory Positions / Directions

    Feeds NN.Model each Trajectory point's position and direction (x- and z-axis values).

  • Input Trajectory Gaits

    Feeds NN.Model each Trajectory point's Style array.

  • Input Previous Bone Positions / Velocities

    Feeds NN.Model each joint's position and velocity.

  • Input Trajectory Heights

    Feeds NN.Model each Trajectory point's height (y-axis value).

  • Predict [run the model]

  • Update Past Trajectory (points with i < RootPointIndex)

    Each Trajectory.Points[i] takes point i+1's values (i.e. everything shifts back one point).

  • Update Current Trajectory (the point at RootPointIndex)

    A new Trajectory.Points[RootPointIndex] is built from the NN's output; its position and direction are set.

  • Update Future Trajectory (points with RootPointIndex+1 < i < Trajectory.Points.Length)

    Each point's new position = its old position + an interpolation between the current direction and the distance and direction blended from the model's output (several weighting factors are considered here).

  • Avoid Collisions

    Same as the earlier Avoid Collisions step.

  • Compute Posture

    Two arrays, positions and rotations, store each joint's transform;

    each positions[i] = NN output * 0.5 + (the position this joint would reach by continuing from its previous position along its previous direction) * 0.5;

    each Velocities[i] = the NN output.

  • Update Posture

    Each joint's position and rotation are taken directly from the corresponding entries of the arrays above.

  • Map to Character

    The transforms are applied to the character.
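The Compute Posture blend above can be sketched as follows. The function name and the flat 3-tuple layout are illustrative, not the project's actual API:

```python
# Each joint's new position is an even mix of the network's prediction
# and the position extrapolated from the previous frame's motion.
def compute_posture(prev_positions, prev_velocities, nn_positions, blend=0.5):
    out = []
    for (px, py, pz), (vx, vy, vz), (nx, ny, nz) in zip(
            prev_positions, prev_velocities, nn_positions):
        # where the joint "should" be if it kept moving as before
        ex, ey, ez = px + vx, py + vy, pz + vz
        out.append((blend * nx + (1 - blend) * ex,
                    blend * ny + (1 - blend) * ey,
                    blend * nz + (1 - blend) * ez))
    return out
```

With blend = 0.5 this is exactly the "NN output * 0.5 + extrapolated position * 0.5" rule from the notes; the extrapolated half damps frame-to-frame jitter in the raw network output.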

AIAnimation project setup

Glenn – networked physics

https://gafferongames.com/

 
 

Glenn Fiedler is the founder of Network Next where he’s hard at work as CEO/CTO. Network Next is creating a new internet for games and e-sports.

 
 

Article 1:

Networked Physics in Virtual Reality

https://gafferongames.com/post/networked_physics_in_virtual_reality/

 
 

Introduction

 
 

About a year ago, Oculus approached me and offered to sponsor my research. They asked me, effectively: “Hey Glenn, there’s a lot of interest in networked physics in VR. You did a cool talk at GDC. Do you think could come up with a networked physics sample in VR that we could share with devs? Maybe you could use the touch controllers?”

【A year ago, Oculus approached me about doing research on networked physics in VR.】

 
 

I replied “F*** yes!” cough “Sure. This could be a lot of fun!”. But to keep it real, I insisted on two conditions. One: the source code I developed would be published under a permissive open source licence (for example, BSD) so it would create the most good. Two: when I was finished, I would be able to write an article describing the steps I took to develop the sample.

【I happily agreed, on the conditions that the results be open-sourced and that I could write publicly about them.】

 
 

Oculus agreed. Welcome to that article! Also, the source for the networked physics sample is here, wherein the code that I wrote is released under a BSD licence. I hope the next generation of programmers can learn from my research into networked physics and create some really cool things. Good luck!

【Hence this article and the source code; I hope people can use these techniques to build really cool games.】

 
 

What are we building?

 
 

When I first started discussions with Oculus, we imagined creating something like a table where four players could sit around and interact with physically simulated cubes on the table. For example, throwing, catching and stacking cubes, maybe knocking each other’s stacks over with a swipe of their hand.【Oculus envisioned four players around a table, physically interacting with a pile of cubes on it.】

 
 

But after a few days spent learning Unity and C#, I found myself actually inside the Rift. In VR, scale is so important. When the cubes were small, everything felt much less interesting, but when the cubes were scaled up to around a meter cubed, everything had this really cool sense of scale. You could make these huge stacks of cubes, up to 20 or 30 meters high. This felt really cool!【After some time in VR I found that scale matters enormously there: a table of small cubes feels dull, but scale the cubes up to about a metre each and the whole world feels really cool.】

 
 

It’s impossible to communicate visually what this feels like outside of VR, but it looks something like this…

 
 

 
 

… where you can select, grab and throw cubes using the touch controller, and any cubes you release from your hand interact with the other cubes in the simulation. You can throw a cube at a stack of cubes and knock them over. You can pick up a cube in each hand and juggle them. You can build a stack of cubes and see how high you can make it go.【This is hard to convey outside VR; it looks like the figure — cubes you can physically interact with.】

 
 

Even though this was a lot of fun, it’s not all rainbows and unicorns. Working with Oculus as a client, I had to define tasks and deliverables before I could actually start the work.【Working with Oculus as a client, I had to define tasks and deliverables before starting.】

 
 

I suggested the following criteria we would use to define success:

  1. Players should be able to pick up, throw and catch cubes without latency.
  2. Players should be able to stack cubes, and these stacks should be stable (come to rest) and be without visible jitter.
  3. When cubes thrown by any player interact with the simulation, wherever possible, these interactions should be without latency.

【Three success criteria: players can pick up, throw and catch cubes with no perceived latency; players can stack cubes and the stacks stay stable (come to rest) without visible jitter; and wherever possible, interactions of thrown cubes with the simulation are latency-free.】
 

At the same time I created a set of tasks to work in order of greatest risk to least, since this was R&D, there was no guarantee we would actually succeed at what we were trying to do.【At the same time I ordered the tasks from highest risk to lowest: this was R&D, with no guarantee of success.】

 
 

 
 

Network Models

 
 

First up, we had to pick a network model. A network model is basically a strategy, exactly how we are going to hide latency and keep the simulation in sync.【First we need to pick a network model: a strategy for hiding latency and keeping the simulation in sync.】

 
 

There are three main network models to choose from:

  1. Deterministic lockstep
  2. Client/server with client-side prediction
  3. Distributed simulation with authority scheme

【Three models: deterministic lockstep, client/server with client-side prediction, and distributed simulation with an authority scheme.】

 
 

I was instantly confident of the correct network model: a distributed simulation model where players take over authority of cubes they interact with. But let me share with you my reasoning behind this.【I was immediately confident of the right model: distributed simulation, where each player takes authority over the cubes they interact with. Here is the reasoning.】

 
 

First, I could trivially rule out a deterministic lockstep network model, since the physics engine inside Unity (PhysX) is not deterministic. Furthermore, even if PhysX was deterministic I could still rule it out because of the requirement that player interactions with the simulation be without latency.【Deterministic lockstep is ruled out because Unity's physics engine (PhysX) is not deterministic — and even if it were, the requirement that player interactions be latency-free rules it out anyway.】

 
 

The reason for this is that to hide latency with deterministic lockstep I needed to maintain two copies of the simulation and predict the authoritative simulation ahead with local inputs prior to render (GGPO style). At 90HZ simulation rate and with up to 250ms of latency to hide, this meant 25 physics simulation steps for each visual render frame. 25X cost is simply not realistic for a CPU intensive physics simulation.【Hiding latency under lockstep means keeping two copies of the simulation and predicting the authoritative one ahead with local inputs before rendering (GGPO style). At 90 Hz with up to 250 ms of latency to hide, that is roughly 25 physics steps per rendered frame — a 25x cost, unrealistic for CPU-heavy physics.】【My note: 250 ms at a 90 Hz simulation rate is 0.25 × 90 ≈ 22–23 simulation steps, hence the ~25 steps of prediction per rendered frame; the takeaway is simply that re-simulating that many physics steps every frame is far too expensive for the CPU.】

 
 

This leaves two options: a client/server network model with client-side prediction (perhaps with dedicated server) and a less secure distributed simulation network model.【That leaves the two remaining options.】

 
 

Since this was a non-competitive sample, there was little justification to incur the cost of running dedicated servers. Therefore, whether I implemented a client/server model with client-side prediction or distributed simulation model, the security would be effectively the same. The only difference would be if only one of the players in the game could theoretically cheat, or all of them could.【As a non-competitive sample, there is no justification for paying for dedicated servers for fairness and security; either way, security is effectively the same.】

 
 

For this reason, a distributed simulation model made the most sense. It had effectively the same amount of security, and would not require any expensive rollback and resimulation, since players simply take authority over cubes they interact with and send the state for those cubes to other players.【So: distributed simulation. Effectively the same security, and no expensive rollback and re-simulation, since players simply take authority over the cubes they touch and send those cubes' state to the other players.】

 
 

 
 

Authority Scheme

 
 

While it makes intuitive sense that taking authority (acting like the server) for objects you interact can hide latency – since, well if you’re the server, you don’t experience any lag, right? – what’s not immediately obvious is how to resolve conflicts.【Intuitively, taking authority (acting as the server) over objects you interact with hides latency — if you are the server, you see no lag — but how to resolve conflicts is far less obvious.】

 
 

What if two players interact with the same stack? What if two players, masked by latency, grab the same cube? In the case of conflict: who wins, who gets corrected, and how is this decided?【In a conflict — same stack, or the same cube grabbed under cover of latency — who wins, who gets corrected, and how is that decided?】

 
 

My intuition at this point was that because I would be exchanging state for objects rapidly (up to 60 times per-second), that it would be best to implement this as an encoding in the state exchanged between players over my network protocol, rather than as events.【Since object state is exchanged rapidly (up to 60 times per second), my intuition was to encode this in the exchanged state itself, rather than as events.】

 
 

I thought about this for a while and came up with two key concepts:

  1. Authority
  2. Ownership

【Two key concepts: Authority and Ownership.】

 
 

Each cube would have authority, either set to default (white), or to whatever color of the player that last interacted with it. If another player interacted with an object, authority would switch and update to that player. I planned to use authority for interactions of thrown objects with the scene. I imagined that a cube thrown by player 2 could take authority over any objects it interacted with, and in turn any objects those objects interacted with, recursively.【Every cube has authority: the default (white), or the colour of the last player to interact with it; when another player interacts, authority switches to them. Authority is for thrown objects: a cube thrown by player 2 takes authority over anything it touches, and recursively over anything those objects touch.】

 
 

Ownership was a bit different. Once a cube is owned by a player, no other player could take ownership until that player relinquished ownership. I planned to use ownership for players grabbing cubes, because I didn’t want to make it possible for players to grab cubes out of other player’s hands after they picked them up.【Ownership is different: once a player owns a cube, nobody else can take ownership until it is relinquished. Ownership is for grabbing, since players shouldn't be able to snatch cubes out of each other's hands.】

 
 

I had an intuition that I could represent and communicate authority and ownership as state by including two different sequence numbers per-cube as I sent them: an authority sequence, and an ownership sequence number. This intuition ultimately proved correct, but turned out to be much more complicated in implementation than I expected. More on this later.【Authority and ownership can be represented and synchronized via two per-cube sequence numbers — an authority sequence and an ownership sequence. This proved correct, but was much harder to implement than expected; details later.】

 
 

 
 

State Synchronization

 
 

Trusting I could implement the authority rules described above, my first task was to prove that synchronizing physics in one direction of flow could actually work with Unity and PhysX. In previous work I had networked simulations built with ODE, so really, I had no idea if it was really possible.【To prove the authority rules could work, my first task was to show that one-directional physics synchronization is possible at all with Unity and PhysX.】

 
 

To find out, I set up a loopback scene in Unity where cubes fall into a pile in front of the player. There are two sets of cubes. The cubes on the left represent the authority side. The cubes on the right represent the non-authority side, which we want to be in sync with the cubes on the left.【A Unity test scene with two sets of cubes: the left is the authority side, the right the non-authority side that should stay in sync with it.】

 
 

 
 

At the start, without anything in place to keep the cubes in sync, even though both sets of cubes start from the same initial state, they give slightly different end results. You can see this most easily from top-down:【Both sides start from identical initial state but end up slightly different, as the figure below shows.】

 
 

 
 

This happens because PhysX is non-deterministic. Rather than tilting at non-deterministic windmills, I fight non-determinism by grabbing state from the left side (authority) and applying it to the right side (non-authority) 10 times per-second:【Because PhysX is non-deterministic, I fight it by grabbing state from the left (authority) side and applying it to the right side 10 times per second.】

 
 

 
 

The state I grab from each cube looks like this:【The state synchronized for each cube is composed as follows:】

 
 

 
 

And when I apply this state to the simulation on the right side, I simply snap the position, rotation, linear and angular velocity of each cube to the state captured from the left side.【When applying the state on the right, I simply snap each cube's position, rotation, and linear and angular velocity to the values captured on the left.】
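As a sketch of that snapping step — a hypothetical reconstruction, since the actual struct from the article is not reproduced in these notes:

```python
from dataclasses import dataclass

@dataclass
class CubeState:
    # the per-cube state the article describes capturing and sending
    position: tuple           # (x, y, z)
    rotation: tuple           # quaternion (x, y, z, w)
    linear_velocity: tuple
    angular_velocity: tuple

def snap(target_cube, state):
    # non-authority side: overwrite the local cube with the authority state
    target_cube["position"] = state.position
    target_cube["rotation"] = state.rotation
    target_cube["linear_velocity"] = state.linear_velocity
    target_cube["angular_velocity"] = state.angular_velocity
```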

 
 

This simple change is enough to keep the left and right simulations in sync. PhysX doesn’t even diverge enough in the 1/10th of a second between updates to show any noticeable pops.【This simple change keeps the two simulations in sync; PhysX doesn't diverge noticeably in the 1/10th of a second between updates.】

 
 

 
 

This proves that a state synchronization based approach for networking can work with PhysX. (Sigh of relief). The only problem of course, is that sending uncompressed physics state uses way too much bandwidth…【This proves state synchronization can work with PhysX. The only problem: sending uncompressed physics state uses far too much bandwidth…】

 
 

 
 

Bandwidth Optimization

 
 

To make sure the networked physics sample is playable over the internet, I needed to get bandwidth under control.【Next: get bandwidth under control.】

 
 

The easiest gain I found was to simply encode the state for at rest cubes more efficiently. For example, instead of repeatedly sending (0,0,0) for linear velocity and (0,0,0) for angular velocity for at rest cubes, I send just one bit:【The easiest win: encode at-rest cubes more efficiently. Instead of repeatedly sending (0,0,0) linear and angular velocities for resting cubes, send a single bit.】

 
 

 
 

This is a lossless technique because it doesn’t change the state sent over the network in any way. It’s also extremely effective, since statistically speaking, most of the time the majority of cubes are at rest.【This is lossless — it doesn't change the transmitted state at all — and extremely effective, since statistically most cubes are at rest most of the time.】

 
 

To optimize bandwidth further we need to use lossy techniques. For example, we can reduce the precision of the physics state sent over the network by bounding position in some min/max range and quantizing it to a resolution of 1/1000th of a centimeter and sending that quantized position as an integer value in some known range. The same basic approach can be used for linear and angular velocity. For rotation I used the smallest three representation of a quaternion.【To go further we need lossy techniques: bound position to a min/max range, quantize to a resolution of 1/1000th of a centimetre, and send it as an integer in a known range. The same approach works for linear and angular velocity; for rotation I used the smallest-three representation of a quaternion.】
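The bounded quantization described here might look like this. The position range is an assumption for the demo; the resolution is the 1/1000th of a centimetre (1e-5 m) from the text:

```python
RES = 0.00001              # metres per step: 1/1000th of a centimetre
POS_MIN, POS_MAX = -64.0, 64.0   # assumed bound on one position component

def quantize(x):
    # clamp into the known range, then store as a non-negative integer
    x = max(POS_MIN, min(POS_MAX, x))
    return round((x - POS_MIN) / RES)

def dequantize(q):
    return POS_MIN + q * RES
```

The round trip loses at most half a resolution step, which is the controlled, lossy part of the scheme.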

 
 

But while this saves bandwidth, it also adds risk. My concern was that if we are networking a stack of cubes (for example, 10 or 20 cubes placed on top of each other), maybe the quantization would create errors that add jitter to that stack. Perhaps it would even cause the stack to become unstable, but in a particularly annoying and hard to debug way, where the stack looks fine for you, and is only unstable in the remote view (eg. the non-authority simulation), where another player is watching what you do.【This saves bandwidth but adds risk: for a networked stack of 10 or 20 cubes, quantization error might add jitter or even make the stack unstable — and in the most annoying, hard-to-debug way, where it looks fine to you and is only unstable in the remote (non-authority) view.】

 
 

The best solution to this problem that I found was to quantize the state on both sides. This means that before each physics simulation step, I capture and quantize the physics state exactly the same way as when it’s sent over the network, then I apply this quantized state back to the local simulation.【The best solution I found: quantize the state on both sides. Before each physics step, capture and quantize the state exactly as it would be sent over the network, then apply that quantized state back to the local simulation.】

 
 

Now the extrapolation from quantized state on the non-authority side exactly matches the authority simulation, minimizing jitter in large stacks. At least, in theory.【Now extrapolation from quantized state on the non-authority side exactly matches the authority simulation, minimizing jitter in large stacks — in theory.】

 
 

 
 

Coming To Rest

 
 

But quantizing the physics state created some very interesting side-effects!【But quantizing the physics state created some very interesting side effects!】

 
 

  1. PhysX doesn’t really like you forcing the state of each rigid body at the start of every frame and makes sure you know by taking up a bunch of CPU.【PhysX really doesn't like having every rigid body's state forced at the start of every frame, and lets you know by eating a lot of CPU.】
  2. Quantization adds error to position which PhysX tries very hard to correct, snapping cubes immediately out of penetration with huge pops!【Quantization adds position error that PhysX tries hard to correct, snapping cubes out of penetration with huge pops.】
  3. Rotations can’t be represented exactly either, again causing penetration. Interestingly in this case, cubes can get stuck in a feedback loop where they slide across the floor!【Rotations can't be represented exactly either, again causing penetration; interestingly, cubes can get stuck in a feedback loop and slide across the floor.】
  4. Although cubes in large stacks seem to be at rest, close inspection in the editor reveals that they are actually jittering by tiny amounts, as cubes are quantized just above surface and falling towards it.【Although large stacks look at rest, close inspection in the editor shows tiny jitter, as cubes are quantized just above the surface and keep falling towards it.】

 
 

There’s not much I could do about the PhysX CPU usage, but the solution I found for the depenetration was to set maxDepenetrationVelocity on each rigid body, limiting the velocity that cubes are pushed apart with. I found that one meter per-second works very well.【I couldn't do much about the CPU usage, but for depenetration the fix was setting maxDepenetrationVelocity on each rigid body to limit how fast cubes get pushed apart; one metre per second works very well.】

 
 

Getting cubes to come to rest reliably was much harder. The solution I found was to disable the PhysX at rest calculation entirely and replace it with a ring-buffer of positions and rotations per-cube. If a cube has not moved or rotated significantly in the last 16 frames, I force it to rest. Boom. Perfectly stable stacks with quantization.【Getting cubes to rest reliably was much harder. My solution: disable PhysX's at-rest calculation entirely and replace it with a per-cube ring buffer of positions and rotations; if a cube hasn't moved or rotated significantly in the last 16 frames, force it to rest. Perfectly stable stacks under quantization.】
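The ring-buffer rest test can be sketched as follows. The 16-frame window is from the text; the thresholds and class name are invented:

```python
from collections import deque

class RestDetector:
    def __init__(self, frames=16, pos_eps=0.001, rot_eps=0.001):
        # fixed-size history of (position, rotation) samples per cube
        self.history = deque(maxlen=frames)
        self.pos_eps, self.rot_eps = pos_eps, rot_eps

    def update(self, position, rotation):
        self.history.append((position, rotation))
        if len(self.history) < self.history.maxlen:
            return False                      # not enough samples yet
        p0, r0 = self.history[0]
        # at rest only if no sample in the window moved or rotated
        # significantly relative to the oldest sample
        return all(
            max(abs(a - b) for a, b in zip(p, p0)) < self.pos_eps and
            max(abs(a - b) for a, b in zip(r, r0)) < self.rot_eps
            for p, r in self.history)
    # caller: when update(...) returns True, zero the velocities and
    # put the body to sleep, bypassing the engine's own rest test
```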

 
 

Now this might seem like a hack, but short of actually getting in the PhysX source code and rewriting the PhysX solver and at rest calculations, which I’m certainly not qualified to do, I didn’t see any other option. I’m happy to be proven wrong though, so if you find a better way to do this, please let me know 🙂【This may look like a hack, but short of rewriting the PhysX solver and at-rest calculation — which I'm not qualified to do — I saw no alternative. If you find a better way, please tell me.】

 
 

 
 

Priority Accumulator

 
 

The next big bandwidth optimization I did was to send only a subset of cubes in each packet. This gave me fine control over the amount of bandwidth sent, by setting a maximum packet size and sending only the set of updates that fit in each packet.【The next big win: send only a subset of cubes in each packet. This gives fine control over bandwidth, by fixing a maximum packet size and sending only the updates that fit (priority-based packetization).】

 
 

Here’s how it works in practice:

 
 

  1. Each cube has a priority factor which is calculated each frame. Higher values are more likely to be sent. Negative values mean “don’t send this cube”.【Each cube has a per-frame priority factor; higher values are more likely to be sent, negative means "don't send this cube".】
  2. If the priority factor is positive, it’s added to the priority accumulator value for that cube. This value persists between simulation updates such that the priority accumulator increases each frame, so cubes with higher priority rise faster than cubes with low priority.【A positive factor is added to the cube's priority accumulator, which persists across updates (it is not cleared each frame), so higher-priority cubes rise faster.】
  3. Negative priority factors clear the priority accumulator to -1.0.【A negative factor clears the accumulator to -1.0.】
  4. When a packet is sent, cubes are sorted in order of highest priority accumulator to lowest. The first n cubes become the set of cubes to potentially include in the packet. Objects with negative priority accumulator values are excluded.【Cubes are sorted by accumulator, highest first; the first n become the candidate set, and negative-accumulator objects are excluded.】
  5. The packet is written and cubes are serialized to the packet in order of importance. Not all state updates will necessarily fit in the packet, since cube updates have a variable encoding depending on their current state (at rest vs. not at rest and so on). Therefore, packet serialization returns a flag per-cube indicating whether it was included in the packet.【Cubes are serialized in order of importance; not every update fits, since encoding size varies with state, so serialization returns a per-cube flag saying whether it made it in.】
  6. Priority accumulator values for cubes sent in the packet are cleared to 0.0, giving other cubes a fair chance to be included in the next packet.【Accumulators of sent cubes are cleared to 0.0, giving other cubes a fair chance at the next packet.】
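Steps 1–6 can be sketched as follows (the function name, factors, and packet capacity are illustrative; real packets are bounded by bytes, not by a fixed cube count):

```python
def select_cubes_for_packet(accumulators, priority_factors, max_in_packet):
    # steps 1-3: integrate each frame's priority factor into the
    # persistent accumulator; negative factors mean "don't send"
    for i, f in enumerate(priority_factors):
        if f < 0.0:
            accumulators[i] = -1.0
        else:
            accumulators[i] += f
    # step 4: highest accumulated priority first, negatives excluded
    order = sorted((i for i, a in enumerate(accumulators) if a >= 0.0),
                   key=lambda i: -accumulators[i])
    sent = order[:max_in_packet]
    # step 6: sent cubes start again from zero
    for i in sent:
        accumulators[i] = 0.0
    return sent
```

Because unsent cubes keep their accumulated value, a low-priority cube that keeps missing the cut eventually outranks frequently-sent cubes — that is what makes the scheme fair over time.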

 
 

For this demo I found some value in boosting priority for cubes recently involved in high energy collisions, since high energy collision was the largest source of divergence due to non-deterministic results. I also boosted priority for cubes recently thrown by players.【Boosting priority for cubes recently in high-energy collisions helped, since those are the biggest source of non-deterministic divergence; I also boosted cubes recently thrown by players.】

 
 

Somewhat counter-intuitively, reducing priority for at rest cubes gave bad results. My theory is that since the simulation runs on both sides, at rest cubes would get slightly out of sync and not be corrected quickly enough, causing divergence when other cubes collided with them.【Counter-intuitively, reducing priority for at-rest cubes gave bad results. My theory: since both sides simulate, resting cubes drift slightly out of sync, and if not corrected quickly enough, collisions with them cause divergence.】

 
 

 
 

Delta Compression

 
 

Even with all the techniques so far, it still wasn’t optimized enough. With four players I really wanted to get the cost per-player down under 256kbps, so the entire simulation could fit into 1mbps for the host.【Still not enough: with four players I wanted under 256 kbps per player, so the whole simulation fits into 1 mbps for the host.】

 
 

I had one last trick remaining: delta compression.【One last trick: delta compression.】

 
 

First person shooters often implement delta compression by compressing the entire state of the world relative to a previous state. In this technique, a previous complete world state or ‘snapshot’ acts as the baseline, and a set of differences, or delta, between the baseline and the current snapshot is generated and sent down to the client.【FPS games often delta-compress the whole world state against a previous state: a previous complete snapshot serves as the baseline, and the differences between it and the current snapshot are generated and sent to the client.】

 
 

This technique is (relatively) easy to implement because the state for all objects are included in each snapshot, thus all the server needs to do is track the most recent snapshot received by each client, and generate deltas from that snapshot to the current.【This is (relatively) easy because every snapshot contains all objects: the server just tracks each client's most recent snapshot and generates deltas from it.】

 
 

However, when a priority accumulator is used, packets don’t contain updates for all objects and delta encoding becomes more complicated. Now the server (or authority-side) can’t simply encode cubes relative to a previous snapshot number. Instead, the baseline must be specified per-cube, so the receiver knows which state each cube is encoded relative to.【With a priority accumulator, packets don't contain all objects, so delta encoding gets harder: the sender can't encode against a single snapshot number; the baseline must be specified per cube, so the receiver knows which state each cube is encoded against.】

 
 

The supporting systems and data structures are also much more complicated:【The supporting systems and data structures are also much more complicated:】

  1. A reliability system is required that can report back to the sender which packets were received, not just the most recently received snapshot.【The system must report back to the sender which packets were received, not just the latest snapshot.】
  2. The sender needs to track the states included in each packet sent, so it can map packet level acks to sent states and update the most recently acked state per-cube. The next time a cube is sent, its delta is encoded relative to this state as a baseline.【The sender tracks the states included in each packet sent, maps packet-level acks back to those states, and updates the most recently acked state per cube; the next time a cube is sent, its delta is encoded against that state as the baseline. (In other words, the packet id ties acks to sent states, and that is what identifies the encoding baseline.)】
  3. The receiver needs to store a ring-buffer of received states per-cube, so it can reconstruct the current cube state from a delta by looking up the baseline in this ring-buffer.【The receiver keeps a ring buffer of received states per cube, so it can look up the baseline there and reconstruct the current state from a delta.】

 
 

But ultimately, it’s worth the extra complexity, because this system combines the flexibility of being able to dynamically adjust bandwidth usage, with the orders of magnitude bandwidth improvement you get from delta encoding.【Ultimately the extra complexity is worth it: the system combines dynamic bandwidth adjustment with the order-of-magnitude savings of delta encoding.】

 
 

 
 

Delta Encoding

 
 

Now that I have the supporting structures in place, I actually have to encode the difference of a cube relative to a previous baseline state. How is this done?【That covers the delta-compression machinery; now, how is the actual difference encoding done?】

 
 

The simplest way is to encode cubes that haven’t changed from the baseline value as just one bit: not changed. This is also the easiest gain you’ll ever see, because at any time most cubes are at rest, and therefore aren’t changing state.【Simplest: one bit marking "not changed". This alone is a huge win, since at any moment most cubes are at rest and not changing state.】

 
 

A more advanced strategy is to encode the difference between the current and baseline values, aiming to encode small differences with fewer bits. For example, delta position could be (-1,+2,+5) from baseline. I found this works well for linear values, but breaks down for deltas of the smallest three quaternion representation, as the largest component of a quaternion is often different between the baseline and current rotation.【More advanced: encode the difference between current and baseline values, spending fewer bits on small differences — e.g. a delta position of (-1,+2,+5) from the baseline. This works well for linear values but breaks down for smallest-three quaternion deltas, since the largest component often differs between the baseline and current rotation.】

 
 

Furthermore, while encoding the difference gives some gains, it didn’t provide the order of magnitude improvement I was hoping for. In a desperate, last hope, I came up with a delta encoding strategy that included prediction. In this approach, I predict the current state from the baseline assuming the cube is moving ballistically under acceleration due to gravity.【此外,虽然差异编码带来了一些收益,但达不到我期望的数量级提升。
最后我想出了一种包含预测的增量编码策略:假设立方体只受重力加速度做抛物运动,从基准状态预测出当前状态。】
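The idea can be sketched as follows. This float version is only illustrative, since (as the next paragraph explains) the real predictor must run in fixed point to stay deterministic; the gravity constant and the integration scheme are my assumptions:

```python
GRAVITY = -9.8  # m/s^2, assumed

def predict_ballistic(baseline_pos, baseline_vel, dt):
    """Predict position/velocity dt seconds after the baseline, assuming
    the cube moves ballistically under gravity alone (semi-implicit Euler)."""
    px, py, pz = baseline_pos
    vx, vy, vz = baseline_vel
    vy += GRAVITY * dt
    return (px + vx * dt, py + vy * dt, pz + vz * dt), (vx, vy, vz)
```

If the quantized prediction matches the quantized current state, the cube can be sent as a single "perfect prediction" bit; otherwise only the small error offset from the prediction needs encoding.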

 
 

Prediction was complicated by the fact that the predictor must be written in fixed point, because floating point calculations are not necessarily guaranteed to be deterministic. But after a few days of tweaking and experimentation, I was able to write a ballistic predictor for position, linear and angular velocity that matched the PhysX integrator within quantize resolution about 90% of the time.【麻烦的是预测器必须用定点数实现,因为浮点运算不保证确定性。
但经过几天的调整和实验,我写出了一个针对位置、线速度和角速度的弹道预测器,约 90% 的情况下能在量化精度内与 PhysX 积分器的结果吻合。】

 
 

These lucky cubes get encoded with another bit: perfect prediction, leading to another order of magnitude improvement. For cases where the prediction doesn’t match exactly, I encoded small error offset relative to the prediction.【这些幸运的 cube 只需再用一个 bit 标记“预测完全命中”,又带来一个数量级的提升;
预测不完全吻合时,则编码一个相对预测值的小误差偏移量。】

 
 

In the time I had to spend, I was not able to get a good predictor for rotation. I blame this on the smallest three representation, which is highly numerically unstable, especially in fixed point. In the future, I would not use the smallest three representation for quantized rotations.【在有限的时间里,我没能为旋转做出好的预测器。我认为问题出在 smallest-three 表示上,它在数值上非常不稳定,尤其是用定点数时。将来我不会再用 smallest-three 表示来量化旋转。】

 
 

It was also painfully obvious while encoding differences and error offsets that using a bitpacker was not the best way to read and write these quantities. I’m certain that something like a range coder or arithmetic compressor that can represent fractional bits, and dynamically adjust its model to the differences would give much better results, but I was already within my bandwidth budget at this point and couldn’t justify any further noodling 🙂【在编码差异和误差偏移时也能明显感到,bitpacker 并不是读写这些数值的最佳方式。
我确信换用能表示小数位、并能根据差异动态调整模型的 range coder 或算术压缩器会得到更好的结果,但此时我已经达到了带宽预算,没有必要再折腾了 🙂】

 
 

 
 

Synchronizing Avatars

 
 

After several months of work, I had made the following progress:【经过几个月的工作,我取得了以下进展:】

 
 

  • Proof that state synchronization works with Unity and PhysX【证明了状态同步在 Unity 和 PhysX 上可行】
  • Stable stacks in the remote view while quantizing state on both sides【两端都做状态量化的情况下,远端视图中的堆叠依然稳定】
  • Bandwidth reduced to the point where all four players can fit in 1mbps【带宽降到四个玩家合计可以塞进 1mbps】

 
 

The next thing I needed to implement was interaction with the simulation via the touch controllers. This part was a lot of fun, and was my favorite part of the project 🙂【我需要实现的下一件事是通过Touch控制器交互,这部分我非常喜欢且非常有趣】

 
 

I hope you enjoy these interactions. There was a lot of experimentation and tuning to make simple things like picking up, throwing, passing from hand to hand feel good, even crazy adjustments to ensure throwing worked great, while placing objects on top of high stacks could still be done with high accuracy.【我希望你喜欢这些交互,这里我们为了保证拾取,投掷,传球等动作效果好,做了疯狂的实验和调整】

 
 

But when it comes to networking, in this case the game code doesn’t count. All the networking cares about is that avatars are represented by a head and two hands driven by the tracked headset and touch controller positions and orientations.【但对网络同步来说,这部分游戏逻辑并不重要:网络层只关心 avatar 由一个头和两只手表示,而它们由头显和 Touch 手柄追踪到的位置和朝向驱动。】

 
 

To synchronize this I captured the position and orientation of the avatar components in FixedUpdate along the rest of the physics state, and applied this state to the avatar components in the remote view.【我获取了avatar组件的位置和旋转量来同步传输和应用给远端的avatar。】

 
 

But when I first tried this it looked absolutely awful. Why?【但是当我第一次尝试这个时,它看起来非常糟糕。
为什么?】

 
 

After a bunch of debugging I worked out that the avatar state was sampled from the touch hardware at render framerate in Update, and was applied on the other machine at FixedUpdate, causing jitter because the avatar sample time didn’t line up with the current time in the remote view.【debug 后发现,avatar 状态是在 Update 里按渲染帧率从 Touch 硬件采样的,而在另一台机器上却是在 FixedUpdate 里应用的;采样时间和远端视图的当前时间对不上,因此产生抖动。】

 
 

To fix this I stored the difference between physics and render time when sampling avatar state, and included this in the avatar state in each packet. Then I added a jitter buffer with 100ms delay to received packets, solving network jitter from time variance in packet delivery and enabling interpolation between avatar states to reconstruct a sample at the correct time.【为了解决这个问题,我在采样 avatar 状态时记录物理时间与渲染时间之差,并把它放进每个数据包的 avatar 状态里;
然后给接收端加了一个 100ms 延迟的抖动缓冲,既消除了数据包到达时间差异造成的网络抖动,也使得可以在 avatar 状态之间插值,重建出正确时刻的采样。】
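A minimal sketch of that jitter buffer, with avatar states reduced to single scalars (the real one interpolates positions and orientations); the structure is assumed, not taken from the demo source:

```python
JITTER_DELAY = 0.1  # seconds of added delay to absorb delivery-time variance

class JitterBuffer:
    def __init__(self):
        self.entries = []  # (send_time, state), kept sorted by send_time

    def add(self, send_time, state):
        self.entries.append((send_time, state))
        self.entries.sort(key=lambda e: e[0])

    def sample(self, current_time):
        t = current_time - JITTER_DELAY  # play back slightly in the past
        prev = next_ = None
        for time, state in self.entries:
            if time <= t:
                prev = (time, state)
            elif next_ is None:
                next_ = (time, state)
        if prev is None or next_ is None:
            return prev[1] if prev else None  # hold last state at the edges
        t0, s0 = prev
        t1, s1 = next_
        alpha = (t - t0) / (t1 - t0)
        return s0 + (s1 - s0) * alpha  # lerp between the two nearest states
```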

 
 

To synchronize cubes held by avatars, while a cube is parented to an avatar’s hand, I set the cube’s priority factor to -1, stopping it from being sent with regular physics state updates. While a cube is attached to a hand, I include its id and relative position and rotation as part of the avatar state. In the remote view, cubes are attached to the avatar hand when the first avatar state arrives with that cube parented to it, and detached when regular physics state updates resume, corresponding to the cube being thrown or released.【为了同步被 avatar 持有的 cube:当 cube 挂在 avatar 手上时,我把它的优先级因子设为 -1,阻止它随常规物理状态更新发送,并把它的 id、相对位置和旋转作为 avatar 状态的一部分。在远端,第一个带着该 cube 的 avatar 状态到达时把 cube 挂到手上;常规物理状态更新恢复时(对应 cube 被扔出或松手)再分离。】

 
 

 
 

Bidirectional Flow

 
 

Now that I had player interaction with the scene working with the touch controllers, it was time to start thinking about how the second player can interact with the scene as well.【现在我已经可以使用控制器与场景进行交互了,现在开始考虑第二个玩家如何与场景互动。】

 
 

To do this without going insane switching between two headsets all the time (!!!), I extended my Unity test scene to be able to switch between the context of player one (left) and player two (right).【要做到这一点,为了测试的时候无需一直在两个头盔之间进行疯狂的切换,我扩展了Unity测试场景,以便能够在玩家1(左)和玩家2(右)之间切换。】

 
 

I called the first player the “host” and the second player the “guest”. In this model, the host is the “real” simulation, and by default synchronizes all cubes to the guest player, but as the guest interacts with the world, it takes authority over these objects and sends state for them back to the host player.【我把第一个玩家称为”host”,第二个玩家称为”guest”。
在这个模型中,host是”真实”模拟,默认情况下所有立方体都会同步到guest玩家,但是随着guest与世界的交互,它需要对这些对象进行控制,并将状态发送回host玩家。】

 
 

To make this work without inducing obvious conflicts the host and guest both check the local state of cubes before taking authority and ownership. For example, the host won’t take ownership over a cube already under ownership of the guest, and vice versa, while authority is allowed to be taken, to let players throw cubes at somebody else’s stack and knock it over while it’s being built.【为了避免明显的冲突,host 和 guest 在取得 authority 和 ownership 之前都会先检查 cube 的本地状态。例如,host 不会对已被 guest 拥有(ownership)的 cube 取得所有权,反之亦然;但 authority 是允许被抢走的,这样玩家才能朝别人正在搭的塔扔方块把它撞倒。】

 
 

Generalizing further to four players, in the networked physics sample, all packets flow through the host player, making the host the arbiter. In effect, rather than being truly peer-to-peer, a topology is chosen that all guests in the game communicate only with the host player. This lets the host decide which updates to accept, and which updates to ignore and subsequently correct.【进一步推广到四个玩家,在网络物理示例中,所有数据包都会流经host玩家,让host仲裁。
这实际上不是真正的点对点,而是选择游戏中的所有guest仅与host玩家通信的拓扑方式。 这让host可以决定接受哪些更新以及忽略哪些更新并随后进行更正。】

 
 

To apply these corrections I needed some way for the host to override guests and say, no, you don’t have authority/ownership over this cube, and you should accept this update. I also needed some way for the host to determine ordering for guest interactions with the world, so if one client experiences a burst of lag and delivers a bunch of packets late, these packets won’t take precedence over more recent actions from other guests.【要应用这些更正,我需要某种方式让host覆盖guest的结果。我还需要一些方法让host确定这些guests与整个世界互动的顺序。因此如果一个客户端经历了一段时间的延迟并延迟发送大量数据包,这些数据包的优先级将低于来自其他客户的数据包。】

 
 

As per my hunch earlier, this was achieved with two sequence numbers per-cube:【正如我此前的预感,这通过每个 cube 的两个序号实现:】

  1. Authority sequence
  2. Ownership sequence

 
 

These sequence numbers are sent along with each state update and included in avatar state when cubes are held by players. They are used by the host to determine if it should accept an update from guests, and by guests to determine if the state update from the server is more recent and should be accepted, even when that guest thinks it has authority or ownership over a cube.【这些序号随每次状态更新一起发送,玩家手持 cube 时也包含在 avatar 状态里。
host 用它们判断是否接受来自 guest 的更新;guest 则用它们判断来自 host 的更新是否更新,是否应当接受,即使该 guest 认为自己对这个 cube 拥有 authority 或 ownership。】

 
 

Authority sequence increments each time a player takes authority over a cube and when a cube under authority of a player comes to rest. When a cube has authority on a guest machine, it holds authority on that machine until it receives confirmation from the host before returning to default authority. This ensures that the final at rest state for cubes under guest authority are committed back to the host, even under significant packet loss.【每当玩家取得某个 cube 的 authority,以及该 cube 在玩家 authority 下静止下来时,authority 序号都会递增。
guest 机器上取得 authority 的 cube 会一直保持该 authority,直到收到 host 的确认才回到默认状态;这保证了即使严重丢包,guest 权限下 cube 的最终静止状态也会提交回 host。】

 
 

Ownership sequence increments each time a player grabs a cube. Ownership is stronger than authority, such that an increase in ownership sequence wins over an increase in authority sequence number. For example, if a player interacts with a cube just before another player grabs it, the player who grabbed it wins.【每次玩家抓起 cube,ownership 序号递增。
ownership 强于 authority:ownership 序号的增加压过 authority 序号的增加。例如一个玩家刚碰到 cube、另一个玩家随即抓住它,抓住它的玩家获胜。】
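Put together, the acceptance rule might look like this. This is an illustrative sketch only: wrap-around-safe sequence comparison and the host/guest asymmetry described above are omitted:

```python
def should_accept_update(local_ownership, local_authority,
                         remote_ownership, remote_authority):
    # Ownership is stronger than authority: any ownership change is decided
    # first, even if the local authority sequence is ahead.
    if remote_ownership != local_ownership:
        return remote_ownership > local_ownership
    # Same ownership: fall back to comparing authority sequences.
    return remote_authority > local_authority
```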

 
 

In my experience working on this demo I found these rules to be sufficient to resolve conflicts, while letting host and guest players interact with the world lag free. Conflicts requiring corrections are rare in practice even under significant latency, and when they do occur, the simulation quickly converges to a consistent state.【根据我在这个demo中的经验,我发现这些规则足以解决冲突,同时让host和guest与世界进行互动。
即使在严重的等待时间下,需要更正的冲突在实践中也很少见,而且当它们确实发生时,仿真会迅速收敛到一致的状态。】

 
 

 
 

Conclusion

 
 

High quality networked physics with stable stacks of cubes is possible with Unity and PhysX using a distributed simulation network model.【用 Unity 和 PhysX 配合分布式仿真网络模型,可以实现带稳定立方体堆叠的高质量网络物理。】

 
 

This approach is best used for cooperative experiences only, as it does not provide the security of a server-authoritative network model with dedicated servers and client-side prediction.【这种方法最适合仅用于协作体验,因为它不能提供具有专用服务器和客户端预测的服务器权威网络模型的安全性。(也就是安全性是不够的)】

 
 

Thanks to Oculus for sponsoring my work and making this research possible!【感谢Oculus赞助我的工作并使这项研究成为可能!】

 
 

The source code for the networked physics sample can be downloaded here.

 
 

GPU Gems – Animation in the “Dawn” Demo

4.1 Introduction

 
 

“Dawn” is a demonstration(示范) that was created by NVIDIA Corporation to introduce the GeForce FX product line and illustrate how a high-level language (such as HLSL or Cg) could be used to create a realistic human character. The vertex shaders deform a high-resolution mesh through indexed skinning and morph targets, and they provide setup for the lighting model used in the fragment shaders. The skin and wing fragment shaders offer both range and detail that could not have been achieved before the introduction of advanced programmable graphics hardware. See Figure 4-1.

【Dawn 是 NVIDIA 为发布 GeForce FX 并展示 HLSL/Cg 可以塑造逼真人物角色而做的 demo:vertex shader 通过索引蒙皮和 morph target 变形高精度网格,并为 fragment shader 的光照做准备;皮肤和翅膀的 fragment shader 达到了可编程硬件出现之前无法企及的表现范围和细节。】

 
 


Figure 4-1 A Screen Capture of the Real-Time Dawn

 
 

This chapter discusses how programmable graphics hardware was used to accelerate the animation of the Dawn character in the demo.

【这一章讲的就是如何用可编程图形硬件来加速 Dawn 的角色动画。】

 
 

 
 

 
 

4.2 Mesh Animation

 
 

Traditionally, mesh animation has been prohibitively expensive for complex meshes because it was performed on the CPU, which was already burdened with physical simulation, artificial intelligence, and other computations required by today’s applications. Newer graphics hardware has replaced the traditional fixed-function pipeline with programmable vertex and fragment shaders, and it can now alleviate some of that burden from the CPU.

【传统上复杂网格的动画开销过高,因为它在 CPU 上执行,而 CPU 已被物理模拟、AI 等计算压得很重;新硬件的可编程 vertex/fragment shader 可以分担这部分负担。】

 
 

Sometimes it is still necessary to perform such operations on the CPU. Many stencil-based shadow volume techniques must traverse the transformed mesh in order to find the silhouette edges, and the generation of the dynamic shadow frustum is often best done on the CPU (see Chapter 9, “Efficient Shadow Volume Rendering”). In scenes where the character is drawn multiple times per frame into shadow buffers, glow buffers, and other such temporary surfaces, it may be better to perform the deformations on the CPU if the application becomes vertex-limited. Deciding whether to perform mesh deformations on the CPU or on the GPU should be done on a per-application or even on a per-object basis.

【有时仍需要在 CPU 上做这些变形:许多基于模板缓冲的体阴影技术要遍历变形后的网格寻找轮廓边,动态阴影视锥的生成也往往更适合放在 CPU 上;当角色每帧要多次渲染进阴影、辉光等临时缓冲、应用受顶点数限制时,在 CPU 上变形可能更好。应按应用甚至按物体来决定网格变形放在 CPU 还是 GPU。】

 
 

The modeling, texturing, and animation of the Dawn character were done primarily in Alias Systems’ Maya package. We therefore based our mesh animation methods on the tool set the software provides. We have since created a similar demo (“Dusk,” used to launch the GeForce FX 5900) in discreet’s 3ds max package, using the same techniques; these methods are common to a variety of modeling packages and not tied to any single workflow. The methods used in these two demos are (indexed) skinning, where vertices are influenced by a weighted array of matrices, and weighted morph targets, used to drive the emotions on Dawn’s face.

【Dawn 的建模、贴图和动画主要在 Maya 中完成,同样的方法也用于 3ds max 里制作的 “Dusk” demo,并不绑定某个工作流。两个 demo 用的方法是索引蒙皮(顶点受一组带权矩阵影响)和加权 morph target(驱动 Dawn 的面部表情)。】

 
 

 
 

4.3 Morph Targets

 
 

Using morph targets is a common way to represent complex mesh deformation, and the NVIDIA demo team has created a variety of demos using this technique. The “Zoltar” demo and the “Yeah! The Movie” demo (content provided by Spellcraft Studio) started with 30 mesh interpolants per second, then removed mesh keys based on an accumulated error scheme. This allowed us to reduce the file size and the memory footprint—up to two-thirds of the original keys could be removed with little to no visible artifacts. In this type of mesh interpolation, there are only two interpolants active at any given time, and they are animated sequentially.

【morphing 常用于网格变形,nvidia也做过很多相关demo。】

 
 

Alternatively, morph targets can be used in parallel. Dawn is a standard example of how this approach can be useful. Beginning with a neutral head (27,000 triangles), our artist created 50 copies of that head and modeled them into an array of morph targets, as shown in Figure 4-2. Approximately 30 of those heads corresponded to emotions (such as happy, sad, thoughtful, and so on), and 20 more were modifiers (such as left eyebrow up, right eyebrow up, smirk, and so on). In this style of animation, the morph target weights will probably not add to 1, because you may have (0.8 * happy + 1.0 * ear_wiggle), for example—Dawn is a fairy, after all.

【另外,morph target 也可以并行使用,Dawn 就是典型例子:从一个 27000 三角形的中性头开始,美术复制了 50 份并改成一组 morph target,其中约 30 个对应情绪(开心、悲伤等),另外 20 个是修饰(左眉上挑、坏笑等)。这种用法下权重之和不一定为 1,比如 0.8*happy + 1.0*ear_wiggle。】

 
 


Figure 4-2 Emotional Blend Targets (Blend Shapes)

 
 

Although such complex emotional faces could have been made entirely of blends of more elemental modifiers, our artist found it more intuitive to model the face in the pose he desired, because it is hard to model an element such as an eyebrow creasing, without seeing how the eyes, cheeks, and mouth work together. This combination also helps with hardware register limitations, described later.

【虽然这些复杂的表情脸完全可以由更基础的修饰目标混合出来,但美术觉得直接按想要的姿态建模更直观,因为不看眼睛、脸颊和嘴的整体配合,很难单独做好“皱眉”这类元素;这种组合方式也有助于应对后文讲的硬件寄存器数量限制。】

 
 

 
 

4.3.1 Morph Targets in a High-Level Language

 
 

Luckily, the implementation of morph targets in HLSL or Cg is simple. Assuming that vertexIn is our structure containing per-vertex data, applying morph targets in a linear or serial fashion is easy:

【幸运的是,用 HLSL 或 Cg 实现 morph target 很简单。先看串行(按前后两个时间关键姿态插值)的做法:】

 
 

float4 position = (1.0f - interp) * vertexIn.prevPositionKey + interp * vertexIn.nextPositionKey;

 
 

In this code, interp is a constant input parameter in the shader, but prevPositionKey and nextPositionKey are the positions at the prior time and next time, respectively. When applying morph targets in parallel, we find the spatial difference between the morph target and the neutral pose, which results in a difference vector. We then weight that difference vector by a scalar. The result is that a weight of 1.0 will apply the per-vertex offsets to achieve that morph target, but each morph target can be applied separately. The application of each morph target is just a single “multiply-add” instruction:

【interp 是 shader 的常量输入,prevPositionKey/nextPositionKey 是前后时刻的位置。并行应用多个 morph target 时,先求每个目标与中性姿态的差向量,再按权重把差向量叠加到中性位置上,每个目标只需一条 multiply-add 指令:】

 
 

// vertexIn.positionDiffN = position morph target N - neutralPosition

 
 

float4 position = neutralPosition;

position += weight0 * vertexIn.positionDiff0;

position += weight1 * vertexIn.positionDiff1;

position += weight2 * vertexIn.positionDiff2;

 
 

 
 

4.3.2 Morph Target Implementation

 
 

We wanted our morph targets to influence both the vertex position and the basis (that is, the normal, binormal, and tangent) so that they might influence the lighting performed in the fragment shader. At first it would seem that one would just execute the previous lines for position, normal, binormal, and tangent, but it is easy to run out of vertex input registers. When we wrote the “Dawn” and “Dusk” demos, the GPU could map a maximum of 16 per-vertex input attributes. The mesh must begin with the neutral position, normal, binormal, texture coordinate, bone weights, and bone indices (described later), leaving 10 inputs open for morph targets. We might have mapped the tangent as well, but we opted to take the cross product of the normal and binormal in order to save one extra input.

【我们希望 morph target 同时影响顶点位置和基(normal、binormal、tangent),从而影响 fragment shader 里的光照计算。要注意当时 GPU 每顶点最多映射 16 个输入属性:除去中性位置、法线、副法线、纹理坐标、骨骼权重和骨骼索引,剩下 10 个输入留给 morph target;tangent 可由 normal 和 binormal 叉乘得到,省掉一个输入。】

 
 

Because each difference vector takes one input, we might have 10 blend shapes that influence position, five blend shapes that influence position and normal, three position-normal-binormal blend shapes, or two position-normal-binormal-tangent blend shapes. We ultimately chose to have our vertex shader apply five blend shapes that modified the position and normal. The vertex shader would then orthonormalize the neutral tangent against the new normal (that is, subtract the collinear elements of the new normal from the neutral tangent and then normalize) and take the cross product for the binormal. Orthonormalization is a reasonable approximation for meshes that do not twist around the surface normal:

【每个差向量占一个输入,因此可以是 10 个只影响位置的 blend shape,或 5 个影响位置和法线的,3 个影响位置、法线、副法线的,以此类推。我们最终选择 5 个同时修改位置和法线的 blend shape,然后在 vertex shader 里把中性 tangent 对新法线正交归一化,再叉乘得到 binormal:】

 
 

// assumes normal is the post-morph-target result

// normalize only needed if not performed in fragment shader

 
 

float3 tangent = vertexIn.neutralTangent - dot(vertexIn.neutralTangent, normal) * normal;

tangent = normalize(tangent);

 
 

Thus, we had a data set with 50 morph targets, but only five could be active (that is, with weight greater than 0) at any given time. We did not wish to burden the CPU with copying data into the mesh every time a different blend shape became active, so we allocated a mesh with vertex channels for neutralPosition, neutralNormal, neutralBinormal, textureCoord, and 50 * (positionDiff, NormalDiff). On a per-frame basis, we merely changed the names of the vertex input attributes so that those that should be active became the valid inputs and those that were inactive were ignored. For each frame, we would find those five position and normal pairs and map those into the vertex shader, allowing all other vertex data to go unused.

【因此数据集里有 50 个 morph target,但同一时刻最多激活 5 个(权重大于 0)。我们不想每次换 blend shape 都让 CPU 往网格里拷数据,于是为网格分配了 neutralPosition、neutralNormal、neutralBinormal、textureCoord 以及 50 组 (positionDiff, NormalDiff) 的顶点通道;每帧只是改顶点输入属性的名字,把当前激活的 5 组映射进 vertex shader,其余顶点数据不参与。】

 
 

Note that the .w components of the positionDiff and normalDiff were not really storing any useful interpolants. We took advantage of this fact and stored a scalar self-occlusion term in the .w of the neutralNormal and the occlusion difference in each of the normal targets. When extracting the resulting normal, we just used the .xyz modifier to the register, which allowed us to compute a dynamic occlusion term that changed based on whether Dawn’s eyes and mouth were open or closed, without any additional instructions. This provided for a soft shadow used in the lighting of her skin (as described in detail in Chapter 3, “Skin in the ‘Dawn’ Demo”).

【positionDiff/normalDiff 的 .w 分量在插值中用不到,于是我们把自遮蔽项存在 neutralNormal 的 .w 里,把遮蔽差存在各 target 法线的 .w 里;取结果法线时只用 .xyz,不增加任何指令就得到了随眼睛和嘴开合变化的动态遮蔽项,用作皮肤光照的软阴影。】

 
 

On the content-creation side, our animator had no difficulty remaining within the limit of five active blend shapes, because he primarily animated between three or so emotional faces and then added the elemental modifiers for complexity. We separated the head mesh from the rest of the body mesh because we did not want the added work of doing the math or storing the zero difference that, say, the happy face would apply to Dawn’s elbow. The result remained seamless—despite the fact that the head was doing morph targets and skinning while the body was doing just skinning—because the outermost vertices of the face mesh were untouched by any of the emotional blend shapes. They were still modified by the skinning described next, but the weights were identical to the matching vertices in the body mesh. This ensured that no visible artifact resulted.

【内容制作上,动画师同一时刻 5 个激活的 blend shape 完全够用:他主要在三个左右的表情脸之间过渡,再叠加基础修饰。我们把头部网格和身体网格分开,免得为“开心的脸对手肘的零差值”做无谓的计算和存储;由于脸部网格最外圈顶点不受任何表情 blend shape 影响,且其蒙皮权重与身体对应顶点一致,头(morph+蒙皮)和身体(仅蒙皮)之间不会出现可见接缝。】

 
 

 
 

4.4 Skinning

 
 

Skinning is a method of mesh deformation in which each vertex of that mesh is assigned an array of matrices that act upon it along with weights (that should add up to 1.0) that describe how bound to that matrix the vertex should be. For example, vertices on the bicep may be acted upon only by the shoulder joint, but a vertex on the elbow may be 50 percent shoulder joint and 50 percent elbow joint, becoming 100 percent elbow joint for vertices beyond the curve of the elbow.

【蒙皮就是为网格每个顶点指定一组作用于它的矩阵和权重(权重和为 1),顶点新位置由这些矩阵作用结果按权重混合得到;例如肘部顶点可能 50% 受肩关节、50% 受肘关节影响。】

 
 

Preparing a mesh for skinning usually involves creating a neutral state for the mesh, called a bind pose. This pose keeps the arms and legs somewhat separated and avoids creases as much as possible, as shown in Figure 4-3. First, we create a transform hierarchy that matches this mesh, and then we assign matrix influences based on distance—usually with the help of animation tools, which can do this reasonably well. Almost always, the result must be massaged to handle problems around shoulders, elbows, hips, and the like. This skeleton can then be animated through a variety of techniques. We used a combination of key-frame animation, inverse kinematics, and motion capture, as supported in our content-creation tool.

【蒙皮前先为网格准备一个中性状态,称为 bind pose:四肢适当张开、尽量避免褶皱。先建立与网格匹配的骨骼层级,再(通常借助动画工具)按距离分配矩阵影响,肩、肘、髋等处往往还要手动修整;骨骼随后用关键帧动画、IK 和动作捕捉组合驱动。】

 
 


Figure 4-3 Dawn’s Bind Pose

 
 

A skinned vertex is the weighted summation of that vertex being put through its active joints, or:

【公式描述:vertex最终位置由joint的加权乘结果得到,存在矩阵乘法是因为骨骼间的继承关系。】

 
 

v' = Σᵢ wᵢ · Mᵢ · Bᵢ⁻¹ · v   (wᵢ: weight of joint i, Mᵢ: current transform of joint i, Bᵢ⁻¹: inverse bind-pose matrix of joint i)
 
 

Conceptually, this equation takes the vertex from its neutral position into a weighted model space and back into world space for each matrix and then blends the results. The concatenated Mᵢ·Bᵢ⁻¹ matrices are stored as constant parameters, and the matrix indices and weights are passed as vertex properties. The application of four-bone skinning looks like this:

【概念上,这个公式把顶点从中性位置变换进各关节的模型空间再变回世界空间,最后按权重混合;拼接好的矩阵作为常量参数存储,矩阵索引和权重作为顶点属性传入。四骨骼蒙皮的实现如下:】

 
 

float4 skin(float4x4 bones[98],
            float4 position,
            float4 boneWeights0,
            float4 boneIndices0)
{
    float4 result = boneWeights0.x * mul(bones[boneIndices0.x], position);
    result = result + boneWeights0.y * mul(bones[boneIndices0.y], position);
    result = result + boneWeights0.z * mul(bones[boneIndices0.z], position);
    result = result + boneWeights0.w * mul(bones[boneIndices0.w], position);
    return result;
}

 
 

In the “Dawn” demo, we drive a mesh of more than 180,000 triangles with a skeleton of 98 bones. We found that four matrices per vertex was more than enough to drive the body and head, so each vertex had to have four bone indices and four bone weights stored as vertex input attributes (the last two of the 16 xyzw vertex registers mentioned in Section 4.3.2). We sorted bone weights and bone indices so that we could rewrite the vertex shader to artificially truncate the number of bones acting on the vertex if we required higher vertex performance. Note that if you do this, you must also rescale the active bone weights so that they continue to add up to 1.

【Dawn 的网格超过 18 万三角形,骨骼有 98 根。我们发现每顶点 4 个矩阵足以驱动身体和头部,因此每顶点存 4 个骨骼索引和 4 个权重(即 4.3.2 节 16 个顶点寄存器中的最后两个)。我们把权重和索引按大小排序,这样需要更高顶点性能时可以改写 shader 人为截断作用骨骼数;注意截断后必须把剩余权重重新归一化,使其和仍为 1。】
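The truncation-plus-rescale step mentioned above might look like this (Python for illustration; in the demo it was done by sorting the vertex data offline and rewriting the vertex shader):

```python
def truncate_bone_weights(weights, keep):
    """Keep the `keep` largest skinning weights and renormalize so the
    remaining weights still sum to 1."""
    kept = sorted(weights, reverse=True)[:keep]
    total = sum(kept)
    return [w / total for w in kept]
```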

 
 

4.4.1 Accumulated Matrix Skinning

 
 

When skinning, one must apply the matrix and its bind pose inverse not only to the position, but also to the normal, binormal, and tangent for lighting to be correct. If your hierarchy cannot assume that scales are the same across x, y, and z, then you must apply the inverse transpose of this concatenated matrix. If scales are uniform, then the inverse is the transpose, so the matrix remains unchanged. Nonuniform scales create problems in a variety of areas, so our engine does not permit them.

【蒙皮时不仅要对位置应用矩阵及其 bind pose 逆矩阵,对 normal、binormal、tangent 也要如此,光照才正确。若层级中 xyz 缩放不一致,必须改用该拼接矩阵的逆转置;缩放一致时逆转置就等于原矩阵。非均匀缩放会带来各种问题,我们的引擎直接禁止它。】

 
 

If we call the skin function from the previous code, we must call mul for each matrix for each vertex property. In current hardware, multiplying a point by a matrix is implemented as four dot products and three adds, and vector-multiply is three dot products and two adds. Thus, four-bone skinning of position, normal, binormal, and tangent results in:

【统计四骨骼驱动这些量的指令数:点乘矩阵是 4 个点积加 3 次加法,向量乘矩阵是 3 个点积加 2 次加法,4 个矩阵合计 88 条指令。】

 
 

4 matrices × [1 position × (4 dot products + 3 adds) + 3 vectors × (3 dot products + 2 adds)] = 4 × (7 + 15) = 88 instructions
 
 

An unintuitive technique that creates the sum of the weighted matrices can be trivially implemented in HLSL or Cg as follows:

【一个不太直观的技巧是先把加权矩阵累加成一个矩阵,在 HLSL/Cg 里可以很简单地实现:】

 
 

float4x4 accumulate_skin(float4x4 bones[98],
                         float4 boneWeights0,
                         float4 boneIndices0)
{
    float4x4 result = boneWeights0.x * bones[boneIndices0.x];
    result = result + boneWeights0.y * bones[boneIndices0.y];
    result = result + boneWeights0.z * bones[boneIndices0.z];
    result = result + boneWeights0.w * bones[boneIndices0.w];
    return result;
}

 
 

Although this technique does burn instructions to build the accumulated matrix (16 multiplies and 12 adds), it now takes only a single matrix multiply to skin a point or vector. Skinning the same properties as before costs:

【这样虽然多花 28 条指令(16 乘 12 加)累加矩阵,但之后每个点或向量只需一次矩阵乘,总指令数反而更少:】

 
 

(16 multiplies + 12 adds) + 1 position × 7 + 3 vectors × 5 = 28 + 22 = 50 instructions
 
 

 
 

4.5 Conclusion

 
 

It is almost always beneficial to offload mesh animation from the CPU and take advantage of the programmable vertex pipeline offered by modern graphics hardware. Having seen the implementation of skinning and morph targets using shaders, however, it is clear that the inner loops are quite easy to implement using Streaming SIMD Extensions (SSE) instructions and the like, and that in those few cases where it is desirable to remain on the CPU, these same techniques work well.

 
 

In the case of the “Dawn” demo, morph targets were used to drive only the expressions on the head. If we had had more time, we would have used morph targets all over the body to solve problems with simple skinning. Even a well-skinned mesh has the problem that elbows, knees, and other joints lose volume when rotated. This is because the mesh bends but the joint does not get “fatter” to compensate for the pressing of flesh against flesh. A morph target or other mesh deformation applied either before or after the skinning step could provide this soft, fleshy deformation and create a more realistic result. We have done some work on reproducing the variety of mesh deformers provided in digital content-creation tools, and we look forward to applying them in the future.

【废话不翻译了。】

 
 

【这里没有很值得让人记住的技术点,最主要的贡献在于N的显卡的能力强大到如此大计算量的蒙皮人物也能跑的起来,如此复杂的avatar实际应用价值有限,GPU蒙皮的优化方案的效果理论上都达不到50%的优化,实际效果应该更加不如人意。】

 
 

4.6 References

 
 

Alias Systems. Maya 5.0 Devkit. <installation_directory>/devkit/animEngine/

 
 

Alias Systems. Maya 5.0 Documentation.

 
 

Eberly, David H. 2001. 3D Game Engine Design, pp. 356–358. Academic Press.

 
 

Gritz, Larry, Tony Apodaca, Matt Pharr, Dan Goldman, Hayden Landis, Guido Quaroni, and Rob Bredow. 2002. “RenderMan in Production.” Course 16, SIGGRAPH 2002.

 
 

Hagland, Torgeir. 2000. “A Fast and Simple Skinning Technique.” In Game Programming Gems, edited by Mark DeLoura. Charles River Media.

 
 

Voxel House

  • Introduction

 
 

http://www.oskarstalberg.com/game/house/Index.html

 
 


 
 

My projects typically revolve(围绕) around some central idea that I want to explore. Here, that central idea is a particular content driven approach to modular tilesets that I’ve had on my mind for a while. This project could have been created as a Python script in Maya or a node graph in Houdini. However, since I don’t want my final presentation material to be a dull narrated youtube clip set in a grey-boxed Maya scene, I created an interactive web demo instead. As a tech artist, the width of my skill set is crucial; I’m not a master artist nor a proper coder, but I’ve got a slice of both in me. I’m most comfortable in the very intersection of art and tech; of procedure and craftsmanship. A web demo is the perfect medium to display those skills.

 
 

 
 

  • Figuring out the tiles

 
 

The core concept is this: the tiles are placed in the corners between blocks, not in the center of the blocks. The tiles are defined by the blocks that surround them: a tile adjacent to one block in the corner would be 1,0,0,0,0,0,0,0; a tile representing a straight wall would be 1,1,1,1,0,0,0,0.

 
 


 
 

Since each corner is surrounded by 8 possible blocks, each of which can be of the 2 possible states of existence or non-existence, the number of possible tiles are 2^8= 256. That is way more blocks than I want to model, so I wrote a script to figure out which of these tiles were truly unique, and which tiles were just rotations of other tiles. The script told me that I had to model 67 unique tiles – a much more manageable number.
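A deduplication script of this kind can be sketched as below. The corner ordering (bottom ring then top ring) and merging only 90° rotations about the vertical axis are my assumptions; the exact class count depends on which symmetries are merged (Stålberg also weighed flips, as the next paragraph notes), so this sketch's count need not match the 67 quoted above.

```python
def rotate(config):
    """Rotate a corner configuration 90 degrees about the vertical axis.
    config = (b0, b1, b2, b3, t0, t1, t2, t3): bottom ring, then top ring."""
    b0, b1, b2, b3, t0, t1, t2, t3 = config
    return (b3, b0, b1, b2, t3, t0, t1, t2)

def canonical(config):
    """Pick the lexicographically smallest of the four rotations."""
    best = config
    for _ in range(3):
        config = rotate(config)
        best = min(best, config)
    return best

# enumerate all 2^8 = 256 configurations and merge rotational duplicates
unique_tiles = {canonical(tuple((i >> k) & 1 for k in range(8)))
                for i in range(256)}
```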

 
 


 
 

I could have excluded flipped version of other tiles as well, which would have brought the number down even further. However, I decided to keep those so that I could make some asymmetrically tiling features. The drain pipes you see in concave corners of the building is one example of that.

 
 

 
 

  • Boolean setup in Maya

 
 

Being the tech artist that I am, I often spend more time on my workflow than on my actual work. Even accounting for rotational permutations(排列), this project still involved a large amount of 3D meshes to manually create and keep track of. The modular nature of the project also made it important to continuously see and evaluate the models in their proper context outside of Maya. The export process had to be quick and easy and I decided to write a small python script to help me out.

【这里有巨大的工作量,即使可以旋转,依然有大量的组合,目标是使得各个连接都可以有很好的效果。这个过程要足够的快速和容易,使用python脚本解决。理解就是脚本的作用就是来验证美术做出来的效果是ok可用的。】

 
 

First, the script merges all my meshes into one piece. Second, a bounding box for each tile proceeds to cut out its particular slice of this merged mesh using Maya’s boolean operation. All the cutout pieces inherit the name and transform from their bounding box and are exported together as an fbx.

【脚本先把我所有的网格合并成一整块;然后每个 tile 的包围盒用 Maya 的布尔运算从合并网格中切出属于自己的那一块。切出的部分继承各自包围盒的名字和变换,最后一起导出为 fbx。】

 
 

Not only did this make the export process a one-button solution, it also meant that I didn’t have to keep my Maya scene that tidy. It didn’t matter what meshes were named, how they were parented or whether they were properly merged or not. I adapted my Maya script to allow several variations of the same tile type. My Unity script then chose randomly from that pool of variation where it existed. In the image below, you can see that some of the bounding boxes are bigger than the others. Those are for tiles that have vertices that stretch outside their allotted volume.

 
 


 
 

 
 

  • Ambient Occlusion

 
 

Lighting is crucial to convey 3D shapes and a good sense of space. Due to the technical limitations in the free version of Unity, I didn’t have access to either real time shadows or ssao – nor could I write my own, since free Unity does not allow render targets. The solution was found in the blocky nature of this project. Each block was made to represent a voxel in a 3D texture. While Unity does not allow me to draw render targets on the GPU, it does allow me to manipulate textures from script on the CPU. (This is of course much slower per pixel, but more than fast enough for my purposes.)

Simply sampling the 3D texture in the general direction of the normal gives me a decent ambient occlusion approximation.
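The idea can be sketched on the CPU (a minimal illustration under my own assumptions, not the project's code): store block occupancy in a grid and probe a few cells along the surface normal; the fraction of empty probes approximates how open the point is.

```python
def sample_ao(occupancy, pos, normal, steps=3):
    """Approximate AO at a surface cell by probing occupancy along the
    normal direction. occupancy is a set of filled (x, y, z) cells;
    pos is the cell being shaded; normal is an axis-aligned unit step."""
    hits = 0
    for i in range(1, steps + 1):
        probe = tuple(p + n * i for p, n in zip(pos, normal))
        if probe in occupancy:
            hits += 1
    # 1.0 = fully open, 0.0 = fully occluded
    return 1.0 - hits / steps

# A single block: probing upward from its top face finds nothing,
# probing upward from below it finds the block.
filled = {(0, 0, 0)}
print(sample_ao(filled, (0, 1, 0), (0, 1, 0)))   # 1.0
print(sample_ao(filled, (0, -1, 0), (0, 1, 0)))  # ~0.667
```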

 
 

I tried to multiply this AO on top of my unlit color texture, but the result was too dark and boring. I decided on an approach that took advantage of my newly acquired experience in 3D textures: Instead of just making pixels darker, the AO lerps the pixel towards a 3D LUT that makes it bluer and less saturated. The result gives me a great variation in hue without too harsh a variation in value. This lighting model gave me the soft and tranquil feeling I was aiming for in this project.
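The tinting step can be sketched like this (illustrative only; a single bluish target color stands in for the 3D LUT, and the tint values are my own invention):

```python
def apply_ao_tint(color, ao, tint=(0.55, 0.65, 0.85)):
    """Lerp the texel toward a bluish, desaturated tint as occlusion
    grows, instead of multiplying it toward black. color and tint are
    RGB in [0, 1]; ao is 1.0 in the open, 0.0 when fully occluded."""
    k = 1.0 - ao  # occlusion amount drives the lerp
    return tuple(c + (t - c) * k for c, t in zip(color, tint))

print(apply_ao_tint((1.0, 0.2, 0.2), ao=1.0))  # unchanged: (1.0, 0.2, 0.2)
print(apply_ao_tint((1.0, 0.2, 0.2), ao=0.0))  # fully tinted, ~(0.55, 0.65, 0.85)
```

The hue shifts while the value stays in a pleasant range, which is the point of lerping toward a color rather than toward black.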

 
 


 
 

 
 

  • Special Pieces

 
 


 
 

When you launch the demo, it will auto generate a random structure for you. By design, that structure does not contain any loose or suspended blocks.

 
 

I know that a seasoned tool-user will try to break the tool straight away by seeing how it might treat these types of abnormal structures. I decided to show off by making these tiles extra special, displaying features such as arcs, passages, and pillars.

 
 


 
 


 
 

 
 

  • Floating Pieces

 
 

There is nothing in my project preventing a user from creating free-floating chunks, and that’s the way I wanted to keep it. But I also wanted to show the user that I had, indeed, thought about that possibility. My solution to this was to let the free-floating chunks slowly bob up and down. This required me to create a fun little algorithm to figure out in real time which blocks were connected to the base and which weren’t:

 
 

The base blocks each get a logical distance of 0. The other blocks check whether any of their neighbors have a shorter logical distance than their own; if so, they adopt that value and add 1 to it. Thus, if you disconnect a chunk, there is nothing grounding those blocks to the 0 of the base blocks, and their logical distance will quickly go through the roof. That is when they start to bob.
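The incremental rule above (adopt the smallest neighbor distance plus one) converges to shortest-path distances from the base. A one-shot BFS sketch of the same idea, assuming blocks live on an integer grid (illustrative, not the project's incremental code):

```python
from collections import deque

def logical_distances(blocks, base):
    """BFS from the base blocks: every block connected to the base gets
    its shortest-path distance; disconnected blocks stay at None, which
    is what the bobbing check looks for."""
    dist = {b: None for b in blocks}
    queue = deque()
    for b in base:
        if b in dist:
            dist[b] = 0
            queue.append(b)
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
            n = (x + dx, y + dy, z + dz)
            if n in dist and dist[n] is None:
                dist[n] = dist[(x, y, z)] + 1
                queue.append(n)
    return dist

blocks = {(0, 0, 0), (0, 1, 0), (0, 5, 0)}  # a short tower plus one stray block
d = logical_distances(blocks, base={(0, 0, 0)})
floating = [b for b, v in d.items() if v is None]
print(floating)  # [(0, 5, 0)] -- this chunk starts to bob
```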

 
 

The slow bobbing of the floating chunks adds some nice ambient animation to the scene.

 
 


 
 

 
 

  • Art Choices

 
 

Picking a style is a fun and important part of any project. The style should highlight the features relevant to a particular project. In this project, I wanted a style that would emphasize blockiness and modularity rather than hiding it.

 
 

The clear green lines outline the terraces, the walls are plain and have lines of darker brick marking each floor, the windows are evenly spaced, and the dirt at the bottom is smooth and sedimented in straight lines. Corners are heavily beveled to emphasize that the tiles fit together seamlessly. The terraces are supposed to look like cozy secret spaces where you could enjoy a slow brunch on a quiet Sunday morning. Overall, the piece is peaceful and friendly – a homage to the tranquility of bourgeois life, if you will.

 
 

 
 

  • Animation

 
 

It should be fun and responsive to interact with the piece. I created an animated effect for adding and removing blocks. The effect is a simple combination of a vertex shader that pushes the vertices out along their normals and a pixel shader that breaks up the surface over time. A nice twist is that I was able to use the 3D texture created for the AO to constrain the vertices along the edge of the effect – this is what creates the bulge along the middle seen in the picture.

 
 


 
 


 
 

 
 

 
 

  • Conclusion

 
 

The final result is like a tool, but not. It’s an interactive piece of art that runs in your browser. It can be evaluated for its technical aspects, its potential as a level editor tool, its shader work, its execution and finish, or just as a fun thing to play around with. My hope is that it can appeal to developers and laymen alike. In a way, a web demo like this is simply a mischievous way to trick people into looking at your art longer than they otherwise would.

 
 


Managing Transformations in Hierarchy

  • Introduction

 

One of the most fundamental aspects of 3D engine design is the management of spatial relationships between objects. The most intuitive way of handling this issue is to organize objects in a tree structure (hierarchy), where each node stores its local transformation, relative to its parent.

The most common way to define the local transformation is to use a so-called TRS system, where the transformation is composed of translation, rotation, and scale. This system is very easy to use for both programmers using the engine and non-technical users like level designers. In this chapter we describe the theory behind such a system.

One problem with the system is decomposition of a matrix back to TRS. It turns out that this problem is often ill-defined and no robust solution exists. We present an approximate solution that works reasonably well in the majority of cases.

 

  • Theory

Tree structure

Keeping objects in hierarchy is a well-known concept. Every object can have a number of children and only one parent.  It can also be convenient to store and manage a list of pointers to the children so that we have fast access to them. The aforementioned structure is in fact a tree.

Node structure

We assume that a node stores its translation, rotation, and scale (TRS) that are relative to its parent. Therefore, we say these properties are local. When we move an object, we drag all its children with it. If we increase scale of the object, then all of its children will become larger too.

Example:

bgt8_1_01

 

Transformation matrices and TRS

The relationship between a single node's transformation matrix and its TRS

Local TRS uniquely defines a local transformation matrix M. We transform vector v in the following way:

bgt8_1_02

where S is an arbitrary scale matrix, R is an arbitrary rotation matrix, T is a translation matrix, and t is the translation vector that T is built from.
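The local transform M = S·R·T can be sketched with numpy (illustrative only; row-vector convention v' = v·M, so the translation sits in the bottom row):

```python
import numpy as np

def trs_matrix(t, angle_z, s):
    """Build M = S @ R @ T as 4x4 homogeneous matrices,
    row-vector convention: v' = v @ M."""
    S = np.diag([s[0], s[1], s[2], 1.0])
    c, si = np.cos(angle_z), np.sin(angle_z)
    R = np.array([[ c,  si, 0, 0],
                  [-si, c,  0, 0],
                  [ 0,  0,  1, 0],
                  [ 0,  0,  0, 1]])
    T = np.eye(4)
    T[3, :3] = t  # translation in the bottom row
    return S @ R @ T

M = trs_matrix(t=(1, 2, 3), angle_z=0.0, s=(2, 2, 2))
v = np.array([1.0, 0.0, 0.0, 1.0])
print(v @ M)  # [3. 2. 3. 1.] -- scaled by 2, then translated by (1, 2, 3)
```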

Composing transformation matrices through the hierarchy

To render an object, we need to obtain its global (world) transformation by composing local transformations of all the object’s ancestors up in the hierarchy.

The composition is achieved by simply multiplying local matrices. Given a vector v0, its local matrix M0, and the local matrix M1 of v0’s parent, we can find the global position v2:

bgt8_1_03

Using vector notation for translation, we get

bgt8_1_04

Note that in general

RS != S'R'

— a rotation and a nonuniform scale do not commute, even if we allow a different scale S' and rotation R'.

 

Skew Problem

Problem:

Applying a nonuniform scale (coming from object A) that follows a local rotation (objects B and C) will cause objects (B and C) to be skewed. Skew can appear during matrices composition but it becomes a problem during the decomposition, as it cannot be expressed within a single TRS node. We give an approximate solution to this issue in Section 3.2.4.

bgt8_1_05

Solution:

Let an object have n ancestors in the hierarchy tree. Let M_1, M_2, ..., M_n be their local transformation matrices, let M_0 be the local transformation matrix of the object itself, and let M_i = S_i R_i T_i.

M_TRSΣ = M_0 M_1 ··· M_n

M_TRΣ = R_0 T_0 R_1 T_1 ··· R_n T_n

The TR components compose cleanly, so this gives the correct world-space TR.

M_SΣ = M_TRSΣ (M_TRΣ)^-1

Here we have the skew and the scale combined. We use the diagonal elements of M_SΣ to get the scale, and we choose to ignore the rest, which is responsible for the skew.

The scale is taken from the diagonal computed here; the remaining (skew) terms are discarded and the TR obtained above is used, so the result is free of skew.
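A sketch of this approximation (numpy, row-vector convention, hypothetical helper names): compose the full chain and the rotation-translation-only chain, derive the combined scale/skew matrix, and keep only its diagonal.

```python
import numpy as np

def decompose_no_skew(nodes):
    """nodes: (S, R, T) 4x4 matrix triples, ordered from the object up
    to the root. Returns (M_TR, scale): the cleanly composed rotation/
    translation part and the diagonal-only scale estimate."""
    M_trs = np.eye(4)
    M_tr = np.eye(4)
    for S, R, T in nodes:
        M_trs = M_trs @ (S @ R @ T)
        M_tr = M_tr @ (R @ T)
    # Combined scale+skew: M_S = M_TRS @ M_TR^-1.
    # Keep the diagonal (scale), discard the off-diagonal skew terms.
    M_s = M_trs @ np.linalg.inv(M_tr)
    return M_tr, np.diag(M_s)[:3].copy()

# With a single node the estimate is exact and recovers the input scale.
S = np.diag([2.0, 3.0, 4.0, 1.0])
R = np.eye(4)
T = np.eye(4); T[3, :3] = (5.0, 0.0, 0.0)
M_tr, scale = decompose_no_skew([(S, R, T)])
print(scale)  # [2. 3. 4.]
```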

 

Handling parent changes

In a 3D engine we often need to modify objects’ parent-children relationship.

We want to change the local transformation such that the global transformation stays the same. Obviously, that forces us to recompute the local TRS values of the object whose parent we are changing.

To get from the current local space to a new local space (the parent changes, the global transform stays the same), we first need to find the global transform of the object by going up in the hierarchy to the root node. Having done this, we go back down the hierarchy chain that our new parent belongs to.

Let M'_0 be the new parent's local transformation matrix. Let that new parent have n' ancestors in the hierarchy tree with local transformations M'_1, M'_2, ..., M'_n', where M'_i = S'_i R'_i T'_i. The new local transformation matrix can thus be found using the following formula:

bgt8_1_06

bgt8_1_07

This formula yields the new local TRS.
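The reparenting step can be sketched as follows (numpy, row-vector convention; a minimal illustration, not the book's code): compute the object's global matrix via its old chain, then multiply by the inverse of the new parent's global matrix to get the new local matrix.

```python
import numpy as np

def reparent_local(old_chain, new_parent_chain):
    """old_chain: local matrices from the object up to the root (object
    first). new_parent_chain: the new parent's local matrices up to the
    root. Returns the object's new local matrix such that its global
    transform is unchanged."""
    M_global = np.eye(4)
    for M in old_chain:
        M_global = M_global @ M
    P_global = np.eye(4)
    for M in new_parent_chain:
        P_global = P_global @ M
    # global = new_local @ P_global  =>  new_local = global @ P_global^-1
    return M_global @ np.linalg.inv(P_global)

def translation(t):
    M = np.eye(4)
    M[3, :3] = t
    return M

# Object translated (1,0,0) under a parent translated (0,2,0); the new
# parent is translated (0,0,5), so the new local translation is (1,2,-5).
old_chain = [translation((1.0, 0.0, 0.0)), translation((0.0, 2.0, 0.0))]
M_new = reparent_local(old_chain, [translation((0.0, 0.0, 5.0))])
print(M_new[3, :3])  # [ 1.  2. -5.]
```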

 

Alternative Systems

This part mainly concerns scale handling, which is what causes skew.

Approach: only leaf nodes store a nonuniform scale (a 3D vector with distinct x, y, z values); every other node stores a uniform scale (a single scalar, not a vector). This effectively avoids the skew problem and is simple to implement.

 

  • Implementation

Node structure:

bgt8_1_08

Reducing Texture Memory Usage by 2-channel Color Encoding

Principle:

In short, the color gamut actually used by a typical texture is quite small, and this property can be exploited to reduce the amount of data needed to represent the texture.

These single-material textures often do not exhibit large color variety and contain a limited range of hues, while using a full range of brightness resulting from highlights and dark (e.g., shadowed) regions within the material surface.

 

The basic approach is to store luminance and hue/saturation.

The method presented here follows these observations and aims to encode any given texture into two channels: one channel preserving full luminance information and the other one dedicated to hue/saturation encoding.

 

Texture Encoding Algorithm

 

Encoding is a 3D-to-2D mapping: find a plane such that the total distance from all 3D texel points to that plane is minimized, which minimizes the encoding error.

Approximating this space with two channels effectively means that we have to find a surface (two-dimensional manifold) embedded within this unit cube that lies as close as possible to the set of texels from the source texture.

bgt_7_01

 

Steps:

1. Re-estimate the color space

 

Convert sRGB values to linear color space.

R, G, and B contribute to luminance nonlinearly and unequally, so the channels are given different weights.

These two steps yield new 3D coordinates on which linear operations are valid.

bgt_7_02

The plane is then computed in this space.

Distance from a point to the plane:

bgt_7_03

Sum of squared distances from all points to the plane:

bgt_7_04

Computed as follows; see the estimate_image function and the book.
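The brute-force search described in the implementation notes below can be sketched like this (illustrative Python, not the book's C++; it assumes a plane through the origin with normal n, so Σ(n·p)²/|n|² expands into the six second moments rr, gg, bb, rg, rb, gb, and the search grid here differs from the book's):

```python
import itertools

def plane_error(n, m):
    # Sum of squared point-plane distances, expanded into the six
    # precomputed second moments of the texel set.
    nx, ny, nz = n
    num = (nx*nx*m['rr'] + ny*ny*m['gg'] + nz*nz*m['bb']
           + 2*nx*ny*m['rg'] + 2*nx*nz*m['rb'] + 2*ny*nz*m['gb'])
    return num / (nx*nx + ny*ny + nz*nz)

def estimate_normal(points, steps=20):
    # Accumulate the six moments once, then brute-force candidate normals.
    m = {'rr': 0.0, 'gg': 0.0, 'bb': 0.0, 'rg': 0.0, 'rb': 0.0, 'gb': 0.0}
    for r, g, b in points:
        m['rr'] += r*r; m['gg'] += g*g; m['bb'] += b*b
        m['rg'] += r*g; m['rb'] += r*b; m['gb'] += g*b
    best, best_err = None, float('inf')
    rng = [i / steps for i in range(-steps, steps + 1)]
    for n in itertools.product(rng, repeat=3):
        if n == (0.0, 0.0, 0.0):
            continue
        err = plane_error(n, m)
        if err < best_err:
            best, best_err = n, err
    return best

# Points lying in the plane z = 0 should recover a normal along z.
pts = [(0.1, 0.2, 0.0), (0.5, 0.9, 0.0), (0.3, 0.4, 0.0)]
n = estimate_normal(pts)
print(n)
```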

2. Compute the two base color vectors

 

bgt_7_05

This is straightforward, as shown: the base colors bc1 = (0, 1, m) and bc2 = (1, 0, n) are initialized and then solved for using the plane information. See the find_components() function.

3. Luminance encoding

 

Formula:

bgt_7_06

4. Hue/saturation encoding

 

bgt_7_07

Four steps: first project the 3D point onto the plane, which gives a vector from the origin (0, 0, 0) to the projected point; compute the two base color vectors; then express the projected vector in terms of those two base color vectors. Finally, the hue/saturation value is obtained from the formula.

 

Decoding Algorithm

This part is simple: given the two base colors and the luminance/blend parameters, invert the encoding to recover the color.

bgt_7_08

 

  • Implementation:

 

vec3 estimate_image(BITMAP *src) :

The error is split into six components: rr, gg, bb, rg, rb, gb. First, compute the image-wide mean of each of the six components.

Then brute-force over a preset range of normal values (e.g., n.xyz from 0 to 100)

and pick the normal that minimizes the error formula.

 

void stamp_color_probe(BITMAP *bmp):

This is image color preprocessing.

 

Encoding:

BITMAP *encode_image(BITMAP *src,vec3 n):

Normalize the plane normal and find the two base colors.

Then build a 2D position coordinate system and a corresponding color coordinate system from these two base colors.

Next, create the output bitmap, and for every pixel of each output mipmap:

get the RGB value,

apply gamma 2.0,

compute the position and color in the 2D coordinate system,

compute the hue: float hue = -da/(db-da+0.00000001f); (da and db are the color values in the 2D coordinate system),

compute the luminance: float lum = sqrt(c.dot(to_scale)),

and encode the result as the two components hue and lum.

 

Decoding:

BITMAP *decode_image(BITMAP *src,const vec3 &base_a,const vec3 &base_b):

Initialize the target bitmap; then for each of its pixels:

read the stored hue and lum,

decode the color: vec3 c = base_a + (base_b-base_a)*hue;

decode the luminance: float clum = 0.2126f*c.x + 0.7152f*c.y + 0.0722f*c.z;

and apply gamma 2.0 to return to sRGB values.
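Putting the two halves together, a minimal Python sketch of an encode/decode round trip (my own simplification: hue is the blend factor along the base_a..base_b segment matched by chromaticity, gamma handling is omitted, and base_a/base_b stand for the two base colors found earlier):

```python
def lum(c):
    # Rec. 709 luma weights, as in the decoder above.
    return 0.2126 * c[0] + 0.7152 * c[1] + 0.0722 * c[2]

def encode(c, base_a, base_b):
    """Encode a linear-space color into (hue, lum). hue is the blend
    factor along base_a..base_b whose chromaticity best matches c."""
    s = sum(c) or 1e-9  # normalize chromaticity: brightness goes to lum
    cn = [x / s for x in c]
    an = [x / sum(base_a) for x in base_a]
    bn = [x / sum(base_b) for x in base_b]
    d = [b_i - a_i for a_i, b_i in zip(an, bn)]
    num = sum((c_i - a_i) * d_i for c_i, a_i, d_i in zip(cn, an, d))
    den = sum(d_i * d_i for d_i in d) or 1e-9
    hue = max(0.0, min(1.0, num / den))
    return hue, lum(c)

def decode(hue, l, base_a, base_b):
    """Rebuild the color: blend the bases, then rescale to the stored
    luminance (the clum step from the notes)."""
    c = [a + (b - a) * hue for a, b in zip(base_a, base_b)]
    cl = lum(c) or 1e-9
    return [x * l / cl for x in c]

A, B = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
h, l = encode([0.75, 0.25, 0.0], A, B)
print(decode(h, l, A, B))  # a color on the base segment survives the round trip
```

Colors that lie exactly on the base-color segment round-trip losslessly; everything else is projected onto the segment, which is the lossy part of the scheme.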