Tag: Performance

Advanced VR Rendering Performance

Author:

Alex Vlachos

Valve (Alex@ValveSoftware.com)


This is the second half of a two-part talk; the first half was given at GDC 2015:

Video and slides from last year are free online: http://www.gdcvault.com/play/1021771/Advanced-VR

The goal throughout: get the best possible VR rendering performance while preserving quality.

The talk is split into four parts:


Multi-GPU for VR

This part is about using GPU hardware, specifically multiple GPUs, to raise performance.

First, a recap of the hidden-area mesh from last year's talk: pixels of the final render target that can never be seen through the lenses are simply never rendered.


bgt_6_1

bgt_6_2

bgt_6_3

First, consider doing all the work on a single GPU.

A single GPU can schedule the stereo work in several ways; sequential rendering (one eye after the other) is the example used here.

bgt_6_4

bgt_6_5

The figure above shows the single-GPU workflow for one frame. Note that the shadow buffer is rendered once and shared by both eyes.

Next, consider rendering with several GPUs at once.

The multi-GPU APIs from AMD and NVIDIA are broadly similar; the important capabilities are the following (a sketch of how they fit together follows the list):

  • Broadcast draw calls, with an affinity mask selecting which GPUs execute them
  • Per-GPU shader constant buffers, each of which can be set independently
  • Render-target transfers between GPUs, optionally asynchronous (especially useful: the destination GPU is not interrupted while it is working)
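A minimal C++ sketch of a two-GPU stereo frame using those three capabilities. The real vendor APIs (NVIDIA's VR SLI via NvAPI, AMD's LiquidVR) differ in detail, so `AffinityContext` and its methods are illustrative placeholders, not a real API:

```cpp
#include <cstdint>

// Illustrative placeholder interface; real affinity APIs expose equivalent
// operations under different names.
enum GpuMask : uint32_t { GPU0 = 1u << 0, GPU1 = 1u << 1, ALL_GPUS = GPU0 | GPU1 };

struct AffinityContext {
    void SetGpuRenderMask(uint32_t mask);                            // broadcast mask
    void SetConstantBufferData(uint32_t gpu, const void* data, size_t size);
    void TransferResourceAsync(uint32_t srcGpu, uint32_t dstGpu, void* target);
};

struct EyeConstants { float viewProj[16]; };
void RenderShadowBuffers(AffinityContext*);          // assumed engine helpers
void RenderScene(AffinityContext*);
void SubmitBothEyes(void* leftTarget, void* rightTarget);  // to the VR compositor

void RenderStereoFrame(AffinityContext* ctx,
                       const EyeConstants& leftEye, const EyeConstants& rightEye,
                       void* leftTarget, void* rightTarget)
{
    // Shadow work cannot be split, so every GPU renders its own copy.
    ctx->SetGpuRenderMask(ALL_GPUS);
    RenderShadowBuffers(ctx);

    // Per-GPU constant buffers: the same broadcast draw calls render the
    // left eye on GPU0 and the right eye on GPU1.
    ctx->SetConstantBufferData(GPU0, &leftEye,  sizeof(leftEye));
    ctx->SetConstantBufferData(GPU1, &rightEye, sizeof(rightEye));
    RenderScene(ctx);

    // Asynchronously move GPU1's eye image to the master GPU (GPU0 keeps
    // working), then submit both eyes to the VR system from GPU0.
    ctx->TransferResourceAsync(GPU1, GPU0, rightTarget);
    ctx->SetGpuRenderMask(GPU0);
    SubmitBothEyes(leftTarget, rightTarget);
}
```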

With two GPUs:

bgt_6_6

bgt_6_7


  • Each GPU renders one eye
  • Each GPU still renders its own shadow buffer
  • The second GPU's image is transferred to the master GPU, which submits both eyes to the VR system
  • In practice this yields roughly a 30-35% performance gain

With four GPUs:

bgt_6_8

bgt_6_9

  • Each GPU renders only half of one eye's image
  • But every GPU still computes the shadow buffers independently
  • Pixel-shader cost per GPU drops to 1/4, but vertex-shader work is duplicated on every GPU, so that cost stays at 1x (all relative to a single GPU)
  • Coordinating more GPUs in the driver also adds CPU cost

Note also that because cross-GPU transfers can be asynchronous, there are several ways to schedule moving the secondary GPUs' results to the master GPU. Of the three schedules below, the third has the least final waiting and is the one used.

bgt_6_10


At this point a pattern emerges: going from one GPU to two gives a large performance win, but adding more GPUs brings diminishing returns. As the splittable cost (mainly pixel-shader work) is divided ever more finely, the bottleneck becomes the cost that cannot be split (shadow-buffer and vertex-shader work), i.e. the work every GPU duplicates.

The figure below compares frame times for different GPU counts at the same workload.

bgt_6_11


Turned around, though, the strength of multi-GPU is that it can buy a higher final image quality (by making the pixel-shader cost very high).

The figure below compares the image quality achievable with different GPU counts at the same frame time.

bgt_6_12


Fixed Foveated Rendering & Radial Density Masking

This part exploits the optical properties of VR headsets to improve rendering performance.

After projection, the distribution of pixel density across the render target is the opposite of what we want:

  • Projection matrix: the center of the image receives fewer samples per degree than the periphery
  • VR optics: the center of the image is where you look and where the lens is sharpest

The result is over-rendering of the periphery.
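A one-line derivation (mine, not from the slides) makes the mismatch concrete: a planar projection places a view direction at angle \(\theta\) off the view axis at screen coordinate \(x = \tan\theta\), so the pixels spent per unit of viewing angle grow toward the edges:

```latex
x(\theta) = \tan\theta
\quad\Rightarrow\quad
\frac{dx}{d\theta} = \sec^2\theta ,
\qquad
\left.\frac{dx}{d\theta}\right|_{0^\circ} = 1 ,
\qquad
\left.\frac{dx}{d\theta}\right|_{45^\circ} = 2
```

At 45 degrees off-axis the projection spends twice as many pixels per degree as at the center, exactly where the optics resolve least.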

An illustration of over-rendering:

bgt_6_13

bgt_6_14

Optimization: Fixed Foveated Rendering

Render according to the template below: keep the center of the image at full size and reduce the number of pixels rendered toward the periphery.

bgt_6_15

bgt_6_16

bgt_6_17


Multi-GPU rendering is recommended with this mode.

Using NVIDIA’s “Multi-Resolution Shading” we gain an additional ~5-10% GPU perf with less CPU overhead (see “GameWorks VR”, Nathan Reed, SIGGRAPH 2015)

From here the authors push the idea further:

Radial Density Masking

In the peripheral region, render only a checkerboard of 2×2-pixel quads, cutting the number of shaded pixels there in half.

Skip rendering a checker pattern of 2×2 pixel quads to match current GPU architectures

bgt_6_18


Then fill in the skipped pixels with a reconstruction filter:

bgt_6_19

Figure: the reconstruction filter. Left: average the 2 neighbors / average across the diagonal, i.e. the 3×3 kernel (1/16, 1/8, 1/16; 1/8, 1/4, 1/8; 1/16, 1/8, 1/16). Right: optimized bilinear samples, with weights near to far 0.375, 0.375, 0.125, 0.125 (alternate set: 0.5, 0.28125, 0.09375, 0.09375, 0.03125).

The left side is the straightforward kernel; the right side is an optimized set of bilinear samples derived from it.

To summarize the steps:

First render the checkerboard of 2×2-pixel quads, then apply the reconstruction filter to fill in the pixels that were skipped. (A CPU sketch of both steps follows.)
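A CPU reference sketch of both steps, assuming a simple radial threshold for where the periphery begins; the real implementation runs as a stencil/early-discard pass plus a reconstruction shader:

```cpp
#include <cmath>
#include <vector>

struct Image { int w, h; std::vector<float> rgb; };  // 3 floats per pixel

// Step 1 (as a predicate): which pixels belong to skipped 2x2 quads.
// 'radius' (pixels from image center) marks where the periphery begins.
bool QuadIsSkipped(int x, int y, int cx, int cy, float radius)
{
    int qx = x / 2, qy = y / 2;                  // 2x2 pixel quad coordinates
    bool checker = ((qx + qy) & 1) != 0;         // checkerboard of quads
    float dx = float(x - cx), dy = float(y - cy);
    return checker && std::sqrt(dx * dx + dy * dy) > radius;
}

// Step 2: fill skipped pixels from rendered neighbors using the slide's
// 3x3 kernel, renormalized over the neighbors that actually have data.
void Reconstruct(Image& img, int cx, int cy, float radius)
{
    static const float k[3][3] = { {1/16.f, 1/8.f, 1/16.f},
                                   {1/8.f,  1/4.f, 1/8.f },
                                   {1/16.f, 1/8.f, 1/16.f} };
    for (int y = 0; y < img.h; ++y)
    for (int x = 0; x < img.w; ++x) {
        if (!QuadIsSkipped(x, y, cx, cy, radius)) continue;
        float sum[3] = {0, 0, 0}, wsum = 0;
        for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i) {
            int nx = x + i, ny = y + j;
            if (nx < 0 || ny < 0 || nx >= img.w || ny >= img.h) continue;
            if (QuadIsSkipped(nx, ny, cx, cy, radius)) continue;  // no data there
            float w = k[j + 1][i + 1];
            for (int c = 0; c < 3; ++c)
                sum[c] += w * img.rgb[3 * (ny * img.w + nx) + c];
            wsum += w;
        }
        if (wsum > 0)
            for (int c = 0; c < 3; ++c)
                img.rgb[3 * (y * img.w + x) + c] = sum[c] / wsum;
    }
}
```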

In the Aperture Robot Repair demo this saved 5-15% of GPU time. It is especially effective on low-end GPUs.


Reprojection

If your engine cannot hit the target frame rate, the VR system should reproject the previous frame to produce the current one.

Reprojection comes in two forms:

  • Rotation-only reprojection
  • Position & rotation reprojection

A crucial caveat: treat this kind of reprojection as the last safety net for frame rate. Use it only when the GPU is below your application's minimum performance spec.

Rotation-Only Reprojection

Averaging two consecutive frames at corresponding pixel positions shows what judder looks like:

bgt_6_20


Judder has many sources, including camera translation, animation, object motion, and so on.

One large source of judder here is simulating the camera rotation inaccurately.

First, rotation reprojection should pivot around each eye, not around the center of the head; otherwise the reprojected rotation does not match the rotation the user perceives.

bgt_6_21

Second, the interpupillary distance matters: if the assumed eye separation, and hence the rotation radius, does not match the wearer's, the reprojected result again disagrees with what the user feels.

bgt_6_22

All in all, though, rotation-only reprojection is good enough in practice, certainly compared to dropping frames.

Positional Reprojection

This is still an unsolved problem:

  • Traditional rendering keeps only a single depth per pixel, which makes reprojecting translucency a challenge, particle systems above all
  • With MSAA, the depth samples and the already-resolved colors no longer correspond one-to-one, so reprojection can bleed colors across edges
  • User movement uncovers geometry that was never rendered, and the hole-filling it requires is a challenge of its own

Asynchronous Reprojection

A new idea the author proposes: the ideal safety net.

It requires the GPU to support reasonably fine-grained preemption. Current GPUs can in theory be preempted between draw calls; in practice it depends on what the hardware actually provides.

The big problem with asynchrony is that there is no guarantee the reprojection finishes within the vsync interval, and if it misses, it is worthless.

If an application wants to rely on asynchronous timewarp, it must pay attention to preemption granularity:

“You can split up the screen into tiles and run the post processing on each tile in a separate draw call. That way, you provide the opportunity for async timewarp to come in and preempt in between those draws if it needs to.” – “VR Direct”, Nathan Reed, GDC 2015
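A minimal D3D11 sketch of that idea: one scissored draw per tile instead of one full-screen draw, so a preemption point exists between tiles. It assumes scissor testing is enabled in the bound rasterizer state and the full-screen post-processing pipeline is already set up:

```cpp
#include <d3d11.h>

// Run a full-screen post-processing pass as tilesX * tilesY small draws.
// Each Draw() covers one tile via the scissor rect, giving async timewarp
// a chance to preempt between draws.
void RunPostProcessTiled(ID3D11DeviceContext* ctx, UINT width, UINT height,
                         UINT tilesX, UINT tilesY)
{
    for (UINT ty = 0; ty < tilesY; ++ty)
    {
        for (UINT tx = 0; tx < tilesX; ++tx)
        {
            D3D11_RECT r;
            r.left   = (LONG)( tx      * width  / tilesX);
            r.right  = (LONG)((tx + 1) * width  / tilesX);
            r.top    = (LONG)( ty      * height / tilesY);
            r.bottom = (LONG)((ty + 1) * height / tilesY);
            ctx->RSSetScissorRects(1, &r);  // restrict this draw to one tile
            ctx->Draw(3, 0);                // full-screen triangle, clipped to tile
        }
    }
}
```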

Interleaved Reprojection

Older GPUs have no preemption and therefore cannot do asynchronous reprojection, so a fallback is needed.

If the system cannot run always-on asynchronous reprojection, the OpenVR API offers every-other-frame rotation-only reprojection. In this mode the application gets about 18 ms to render each frame. It is a good trade for holding frame rate (a one-line sketch of enabling it follows the quote):

“In our experience, ATW should run at a fixed fraction of the game frame rate. For example, at 90Hz refresh rate, we should either hit 90Hz or fall down to the half-rate of 45Hz with ATW. This will result in image doubling, but the relative positions of the double images on the retina will be stable. Rendering at an intermediate rate, such as 65Hz, will result in a constantly changing number and position of the images on the retina, which is a worse artifact.” –“Asynchronous Timewarp Examined”, Michael Antonov, Oculus blog, March, 2015
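Enabling that fallback from the application side is tiny in OpenVR; as far as I know the relevant call is `IVRCompositor::ForceInterleavedReprojectionOn`, so verify against your openvr.h:

```cpp
#include <openvr.h>

// Sketch: ask the compositor for every-other-frame rotation-only
// reprojection, the fallback for GPUs without preemption.
void SetInterleavedReprojection(bool enable)
{
    vr::IVRCompositor* compositor = vr::VRCompositor();
    if (compositor)
        compositor->ForceInterleavedReprojectionOn(enable);
}
```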


Adaptive Quality

Holding frame rate is genuinely hard. Compared with traditional games, VR adds:

  • Fine-grained user control of the camera
  • New models of interaction between the user and the game world

The authors say that getting Robot Repair to frame rate was the hardest, most draining part of the whole project: fine-tuning content and rendering so the user hits 90 fps from every viewing angle and interaction was the most painful work.

Adaptive quality means adjusting rendering quality on the fly, according to what the GPU can currently sustain, to protect frame rate:

  • Goal #1: Reduce the chances of dropping frames and reprojecting
  • Goal #2: Increase quality when there are idle GPU cycles

First, the rendering settings that can be adjusted on the fly:

  • Rendering resolution / viewport
  • MSAA sample count / anti-aliasing algorithm
  • Fixed Foveated Rendering (see part 2)
  • Radial Density Masking (see part 2)
  • Etc.

And some settings should not be adjusted, because the switch is too visible:

  • Shadows
  • Visual effects, e.g. specular

The quality ladder the authors use, as an example:

bgt_6_23

Quality level | MSAA | Resolution scale | Render resolution
+6            | 8x   | 1.4              | 2116x2352
+5            | 8x   | 1.3              | 1965x2184
+4            | 8x   | 1.2              | 1814x2016
+3            | 8x   | 1.1              | 1663x1848
+2            | 8x   | 1.0              | 1512x1680
+1            | 4x   | 1.1              | 1663x1848
 0 (default)  | 4x   | 1.0              | 1512x1680
-1            | 4x   | 0.9              | 1360x1512
-2            | 4x   | 0.81             | 1224x1360
-3            | 4x   | 0.73             | 1102x1224
-4            | 4x   | 0.65             | 992x1102 (+ Radial Density Masking)

The talk plays a video of the system switching between quality levels; the bar at the top shows the current level.

bgt_6_24


The crux of adaptive quality is measuring the GPU workload.

The VR system's own GPU work varies from frame to frame: lens distortion, chromatic aberration, chaperone bounds, overlays, etc.

What we need is the timing of the whole VR system's frame. OpenVR exposes a total GPU timer that covers all GPU work (a sketch of querying it follows the figure):

bgt_6_25 (the timer starts with application rendering and ends after VR system rendering, between the two vsyncs)
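A minimal sketch of reading that timer through OpenVR's frame-timing API. Field names are as I recall them from openvr.h, so verify against your SDK version:

```cpp
#include <openvr.h>

// Read last frame's total GPU time (application + VR system work) in
// milliseconds, or a negative value on failure.
float GetLastFrameGpuMs()
{
    vr::Compositor_FrameTiming timing = {};
    timing.m_nSize = sizeof(vr::Compositor_FrameTiming);  // required by the API
    if (vr::VRCompositor() &&
        vr::VRCompositor()->GetFrameTiming(&timing, 0 /* most recent frame */))
    {
        return timing.m_flTotalRenderGpuMs;  // app + compositor GPU milliseconds
    }
    return -1.0f;
}
```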

GPU timers lag behind:

  • A GPU query returns the result of a previous, already-completed frame
  • The one or two frames already in the pipeline can no longer be changed

The figure below shows the pipeline: a frame spans several vsync intervals from CPU work to GPU work, so anything that entered CPU processing before the moment you change quality will still render at the old settings. That is the second point above: your change only shows up one or two frames later.

The first point simply means that a query issued before the current frame has been completed returns the previous frame's result.

bgt_6_26

The authors' adaptive logic in detail: three rules.

Goal: keep GPU utilization between 70% and 90%.

Above 90%: immediately drop two quality levels.

Below 70%: raise one quality level.

Predicted to hit 85% while rising linearly: drop two quality levels. Prediction is needed because the measurements are about two frames stale (see the sketch below).

Leaving ~10% of the GPU idle is genuinely useful: it absorbs GPU demand from other processes and sudden system requests.

That is why each frame should now be rendered within 10 ms, not the 11.11 ms quoted last year.
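A sketch of the three rules as a controller. The thresholds are from the talk; the linear-trend prediction and the helpers (`Clamp`, `ApplyQualityLevel`) are my assumptions about how to express it:

```cpp
// Assumed helper that applies a row of the quality table above
// (resolution scale, MSAA, Radial Density Masking, ...).
void ApplyQualityLevel(int level);

static int Clamp(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

struct AdaptiveQuality
{
    int   level     = 0;        // -4 (lowest) .. +6 (highest), 0 = default
    float prevGpuMs = 0.0f;

    void Update(float gpuMs)    // fed each frame, e.g. by GetLastFrameGpuMs()
    {
        const float kVsyncMs = 11.11f;               // 90 Hz budget
        float util  = gpuMs / kVsyncMs;              // GPU utilization
        float slope = gpuMs - prevGpuMs;             // per-frame trend
        // Measurements are ~2 frames stale, so extrapolate 2 frames ahead.
        float predictedUtil = (gpuMs + 2.0f * slope) / kVsyncMs;
        prevGpuMs = gpuMs;

        if (util > 0.90f)                            level -= 2;  // over budget
        else if (slope > 0 && predictedUtil > 0.85f) level -= 2;  // rising fast
        else if (util < 0.70f)                       level += 1;  // idle headroom

        level = Clamp(level, -4, +6);
        ApplyQualityLevel(level);
    }
};
```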

One more problem to watch: when the resolution scalar drops too far, text becomes hard to read. So when GPU performance is very poor, we recommend enabling the Interleaved Reprojection Hint (see the snippet earlier) to hold frame rate instead.

Accordingly, the Aperture Robot Repair demo ships two Adaptive Quality ladders:

bgt_6_27

Option A:
+6: 8xMSAA, 1.4x res
+5: 8xMSAA, 1.3x res
+4: 8xMSAA, 1.2x res
+3: 8xMSAA, 1.1x res
+2: 8xMSAA, 1.0x res
+1: 4xMSAA, 1.1x res
 0: 4xMSAA, 1.0x res (Default)
-1: 4xMSAA, 0.9x res
-2: 4xMSAA, 0.81x res
-3: 4xMSAA, 0.73x res
-4: 4xMSAA, 0.65x res, Radial Density Masking

Option B (Text-friendly):
+6 through 0: same as Option A
-1: 4xMSAA, 0.9x res
-2: 4xMSAA, 0.81x res
-3: 4xMSAA, 0.81x res, Interleaved Reprojection Hint

【Note: rewatch this part of the video.】

Another thing to watch is GPU memory; it is one of the inputs when choosing Aperture's render-target sizes.

bgt_6_28

Scalar | MSAA | Resolution | GPU memory, 1 eye (color + depth + resolve) | 2 eyes
2.0    | 8x   | 3024x3360  | 698 MB | 1,396 MB
2.0    | 4x   | 3024x3360  | 388 MB | 776 MB
1.4    | 8x   | 2116x2352  | 342 MB | 684 MB
1.2    | 8x   | 1814x2016  | 251 MB | 502 MB
1.0    | 8x   | 1512x1680  | 174 MB | 348 MB
1.1    | 4x   | 1663x1848  | 117 MB | 234 MB
1.0    | 4x   | 1512x1680  | 97 MB  | 194 MB
0.81   | 4x   | 1224x1360  | 64 MB  | 128 MB

Aperture allocates both a 1.4 8xMSAA and a 1.1 4xMSAA render target per eye, for a total of 342 MB + 117 MB = 459 MB per eye (918 MB for two eyes)! So we use sequential rendering to share one set of render targets between the eyes, and we limit the resolution scalar to 1.4x on 4 GB GPUs.

bgt_6_29

For a 2.0 resolution scalar, we require 698 MB + 117 MB = 815 MB per eye.
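The table's numbers are consistent with color plus depth stored at MSAA sample rate, plus a single-sample resolve target. A quick check (my arithmetic, assuming 4 bytes per color sample, 4 per depth sample, 4 for the resolve, and decimal megabytes; the small remainder is presumably alignment and padding):

```cpp
#include <cstdio>

// Per-eye render-target memory: MSAA color + MSAA depth + 1x resolve.
double RenderTargetMB(int w, int h, int msaaSamples)
{
    double pixels = double(w) * double(h);
    double bytes  = pixels * msaaSamples * (4.0 + 4.0)  // color + depth samples
                  + pixels * 4.0;                       // single-sample resolve
    return bytes / 1.0e6;
}

int main()
{
    printf("1.4x 8xMSAA: %.0f MB\n", RenderTargetMB(2116, 2352, 8)); // ~338 (table: 342)
    printf("1.1x 4xMSAA: %.0f MB\n", RenderTargetMB(1663, 1848, 4)); // ~111 (table: 117)
    printf("2.0x 8xMSAA: %.0f MB\n", RenderTargetMB(3024, 3360, 8)); // ~691 (table: 698)
}
```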

Valve’s Unity Rendering Plugin

Valve uses a custom rendering plugin in Unity; it will soon be released free of charge, with source.

The plugin is a single-pass forward renderer (because we want 4xMSAA and 8xMSAA), supporting up to 18 dynamic shadowing lights and Adaptive Quality

Decoupling CPU and GPU performance

The precondition is that your render thread is autonomous.

If the CPU has not finished preparing the next frame in time, the render thread takes the previous frame's GPU work, patches it with the newest HMD pose and the current dynamic-resolution settings, and resubmits it to the GPU (sketched below).
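A sketch of the render-thread structure this implies. The loop shape and the helpers are assumptions, not Valve's actual plugin code; `WaitGetPoses` is the real OpenVR call that blocks until the next frame's pose is available:

```cpp
#include <openvr.h>

struct FramePacket { /* culled draw lists, object transforms, etc. */ };

// Assumed helpers: a queue fed by the game thread, plus functions that
// patch and submit previously recorded GPU work.
bool TryPopFrameFromGameThread(FramePacket* out);
void UpdateViewMatrices(FramePacket*, const vr::TrackedDevicePose_t* poses);
void ApplyQualityLevel(int level);             // dynamic resolution, MSAA, ...
void SubmitToGpuAndCompositor(const FramePacket&);
extern int g_currentQualityLevel;

void RenderThreadLoop()
{
    FramePacket lastPacket;
    for (;;)
    {
        // Blocks until the compositor releases the frame; always returns
        // the freshest HMD pose.
        vr::TrackedDevicePose_t poses[vr::k_unMaxTrackedDeviceCount];
        vr::VRCompositor()->WaitGetPoses(poses, vr::k_unMaxTrackedDeviceCount,
                                         nullptr, 0);

        FramePacket fresh;
        if (TryPopFrameFromGameThread(&fresh))
            lastPacket = fresh;   // normal path: a new CPU frame arrived in time
        // else: game thread is late, so reuse last frame's draw data below.

        UpdateViewMatrices(&lastPacket, poses);    // patch in the fresh pose
        ApplyQualityLevel(g_currentQualityLevel);  // adaptive quality still applies
        SubmitToGpuAndCompositor(lastPacket);
    }
}
```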

【Note: rewatch the video for the part on removing animation judder.】

Then you can plan to run your CPU at 1/2 or 1/3 of the GPU frame rate, either to run more complex simulation or to support lower-end CPUs


Summary

  • Multi-GPU support should be in all VR engines (at least 2 GPUs)
  • Fixed Foveated Rendering and Radial Density Masking are solutions that help counteract the optics vs projection matrix battle
  • Adaptive Quality scales fidelity up and down while leaving 10% of the GPU available for other processes. Do not rely on reprojection to hit framerate on your min spec!
  • Valve VR Rendering Plugin for Unity will ship free soon
  • Think about how your engine can decouple CPU and GPU performance with resubmission on your render thread

Optimizing the Unreal Engine 4 Renderer for VR

https://developer.oculus.com/blog/introducing-the-oculus-unreal-renderer/

 

For Farlands, the Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine. It’s also used in Dreamdeck and the Oculus Store version of Showdown. We’re sharing the renderer’s source as a sample to help developers reach higher quality levels and frame rates in their own applications. As of today, you can get it as an Unreal developer from https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr.

【The Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine and are sharing the source on GitHub (link above); it is already used in Dreamdeck and other Oculus apps.】

 

Rendering immersive VR worlds at a solid 90Hz is complex and technically challenging. Creating VR content is, in many ways, unlike making traditional monitor-only content—it brings us a stunning variety of new interactions and experiences, but forces developers to re-think old assumptions and come up with new tricks. The recent wave of VR titles showcase the opportunities and ingenuity of developers.

【Rendering an immersive VR world at a solid frame rate is very challenging. VR content is not like traditional monitor-only content; the new interactions change a great deal. For rendering, that means re-examining old technical choices: techniques that suit monitor rendering do not necessarily suit VR, so some of those comparisons are revisited here.】

 

As we worked, we re-evaluated some of the traditional assumptions made for VR rendering, and developed technology to help us deliver high-fidelity content at 90Hz. Now, we’re sharing some results: an experimental forward renderer for Unreal Engine 4.11.

【Their work re-evaluates what these old assumptions are worth for VR; what follows are some of the results.】

 

We’ve developed the Oculus Unreal Renderer with the specific constraints of VR rendering in mind. It lets us more easily create high-fidelity, high-performance experiences, and we’re eager to share it with all UE4 developers.

【They developed a dedicated VR renderer that delivers higher performance; see GitHub.】

 

Background

 

As the team began production on Farlands, we took a moment to reflect on what we learned with the demo experiences we showed at Oculus Connect, GDC, CES, and other events. We used Unreal Engine 4 exclusively to create this content, which provided us with an incredible editing environment and a wealth of advanced rendering features.

【The team built Farlands with Unreal Engine 4; the demo content has already been shown at the major events, so no detailed introduction here.】

 

Unfortunately, the reality of rendering to Rift meant we’d only been able to use a subset of these features. We wanted to examine those we used most often, and see if we could design a stripped-down renderer that would deliver higher performance and greater visual fidelity, all while allowing the team to continue using UE4’s world-class editor and engine. While the Oculus Unreal Renderer is focused on the use cases of Oculus applications, it’s been retrofit into pre-existing projects (including Showdown and Oculus Dreamdeck) without needing major content work. In these cases, it delivered clearer visuals, and freed up enough GPU headroom to enable additional features or increase resolution 15-30%.

【UE4 is excellent, but for VR applications its rendering still leaves room for targeted optimization that raises efficiency and image quality.】

bgt_5_1

Comparison at high resolution: The Oculus Unreal Renderer runs at 90fps while Unreal’s default deferred renderer is under 60fps.

【The Oculus forward renderer is dramatically faster here than Unreal's default deferred renderer.】

 

The Trouble With Deferred VR

 

【For background on forward vs. deferred rendering, see the write-up in my Base notes.】

 

Unreal Engine is known for its advanced rendering feature set and fidelity. So, what was our rationale for changing it for VR? It mostly came down our experiences building VR content, and the differences rendering to a monitor vs Rift.

【UE itself has a huge feature set; the job is to select the subset appropriate for VR rendering.】

 

When examining the demos we’d created for Rift, we found most shaders were fairly simple and relied mainly on detailed textures with few lookups and a small amount of arithmetic. When coupled with a deferred renderer, this meant our GBuffer passes were heavily texture-bound—we read from a large number of textures, wrote out to GBuffers, and didn’t do much in between.

【At VR resolutions, a deferred renderer's GBuffer passes put extreme demands on bandwidth.】

 

We also used dynamic lighting and shadows sparingly and leaned more heavily on precomputed lighting. In practice, switching to a forward renderer helped us provide a more limited set of features in a single pass, yielded better GPU utilization, enabled optimization, removed bandwidth overhead, and made it easier for us to hit 90 Hz.

【They use dynamic lighting and shadows sparingly, leaning on precomputed lighting instead. The single-pass renderer offers a more limited feature set, but yields better GPU utilization, removes bandwidth overhead, and makes 90 Hz much easier to hit.】

 

We also wanted to compare hardware accelerated multi-sample anti-aliasing (MSAA) with Unreal’s temporal antialiasing (TAA). TAA works extremely well in monitor-only rendering and is a very good match for deferred rendering, but it causes noticeable artifacts in VR. In particular, it can cause judder and geometric aliasing during head motion. To be clear, this was made worse by some of our own shader and vertex animation tricks. But it’s mostly due to the way VR headsets function.

【They also wanted to compare hardware-accelerated MSAA with Unreal's TAA.】

【TAA looks very good on a monitor and pairs well with deferred rendering, but in VR it produces noticeable artifacts: judder and geometric aliasing during head motion.】

 

Compared to a monitor, each Rift pixel covers a larger part of the viewer’s field of view. A typical monitor has over 10 times more pixels per solid angle than a VR headset. Images provided to the Oculus SDK also pass through an additional layer of resampling to compensate for the effects of the headset’s optics. This extra filtering tends to slightly over-smooth the image.

【Compared with a monitor, each headset pixel covers a much larger part of the field of view. Images handed to the Oculus SDK also go through an extra resampling pass to compensate for the headset optics, which slightly over-smooths the final result.】

 

All these factors together contribute to our desire to preserve as much image detail as possible when rendering. We found MSAA to produce sharper, more detailed images that we preferred.

【All of this argues for preserving as much image detail as possible; they found MSAA produced the sharper, more detailed images they preferred.】

bgt_5_2

Deferred compared with forward. Zoom in to compare.

 

A Better Fit With Forward

 

Current state-of-the-art rendering often leverages screen-space effects, such as screen-space ambient occlusion (SSAO) and screen-space reflections (SSR). Each of these are well known for their realistic and high-quality visual impact, but they make tradeoffs that aren’t ideal in VR. Operating purely in screen-space can introduce incorrect stereo disparities (differences in the images shown to each eye), which some find uncomfortable. Along with the cost of rendering these effects, this made us more comfortable forgoing support of those features in our use case.

【Modern renderers lean on screen-space effects such as SSAO and SSR for realism, but these cannot be adopted directly for VR: they introduce incorrect stereo disparities and carry real rendering cost.】

 

Our decision to implement a forward renderer took all these considerations into account. Critically, forward rendering lets us use MSAA for anti-aliasing, adds arithmetic to our texture-heavy shaders (and removes GBuffer writes), removes expensive full-screen passes that can interfere with asynchronous timewarp, and—in general—gives us a moderate speedup over the more featureful deferred renderer. Switching to a forward renderer has also allowed the easy addition of monoscopic background rendering, which can provide a substantial performance boost for titles with large, complex distant geometry. However, these advantages come with tradeoffs that aren’t right for everyone. Our aim is to share our learnings with VR developers as they continue fighting to make world-class content run at 90Hz.

【They chose a forward renderer with all of the above factored in: MSAA, more arithmetic in texture-heavy shaders, no GBuffer writes, no expensive full-screen passes (which would interfere with asynchronous timewarp), plus easy monoscopic background rendering (the distant background is rendered once and shared by both eyes; the Oculus SDK supports this).】

 

Our implementation is based on Ola Olsson’s 2012 HPG paper, Clustered Deferred and Forward Shading. Readers familiar with traditional forward rendering may be concerned about the CPU and GPU overhead of dynamic lights when using such a renderer. Luckily, modern approaches to forward lighting do not require additional draw calls: All geometry and lights are rendered in a single pass (with an optional z-prepass). This is made possible by using a compute shader to pre-calculate which lights influence 3D “clusters” of the scene (subdivisions of each eye’s viewing frustum, yielding a frustum-voxel grid). Using this data, each pixel can cheaply determine a list of lights that has high screen-space coherence, and perform a lighting loop that leverages the efficient branching capability of modern GPUs. This provides accurate culling and efficiently handles smaller numbers of dynamic lights, without the overhead of additional draw calls and render passes.

【The implementation is the clustered-forward ("forward+" style) approach; see Olsson et al. 2012 for the details, and my comparison of the three rendering paths for the basics. The principle: a precomputation pass selects, per 3D cluster of the view frustum, the lights that can affect it, and shading then considers only those lights, i.e. light culling without extra draw calls or passes. A CPU-side sketch follows.】
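A CPU-side sketch of the clustered light assignment. The real renderer does this in a compute shader over each eye's frustum-voxel grid; the grid dimensions and structures here are illustrative:

```cpp
#include <cstdint>
#include <vector>

// A point light in view space, with its radius of influence.
struct PointLight { float viewPos[3]; float radius; };

// View-space bounding box of one frustum cluster (precomputed elsewhere).
struct AABB { float mn[3], mx[3]; };

constexpr int kClustersX = 16, kClustersY = 8, kClustersZ = 24;

struct ClusterGrid
{
    // Per-cluster light index lists; a GPU version would use one flat
    // index buffer plus per-cluster offset/count pairs.
    std::vector<uint16_t> lights[kClustersX * kClustersY * kClustersZ];
};

// Standard squared-distance sphere-vs-AABB test.
bool SphereIntersectsAABB(const PointLight& l, const AABB& b)
{
    float d2 = 0;
    for (int i = 0; i < 3; ++i) {
        float v = l.viewPos[i];
        if (v < b.mn[i]) d2 += (b.mn[i] - v) * (b.mn[i] - v);
        if (v > b.mx[i]) d2 += (v - b.mx[i]) * (v - b.mx[i]);
    }
    return d2 <= l.radius * l.radius;
}

// Assign each light to every cluster it can touch. The pixel shader then
// computes its cluster index and loops over only that short light list.
// clusterBounds must hold one AABB per cluster in the grid.
void AssignLights(const std::vector<PointLight>& lights,
                  const std::vector<AABB>& clusterBounds,
                  ClusterGrid& grid)
{
    for (size_t c = 0; c < clusterBounds.size(); ++c) {
        grid.lights[c].clear();
        for (size_t i = 0; i < lights.size(); ++i)
            if (SphereIntersectsAABB(lights[i], clusterBounds[c]))
                grid.lights[c].push_back(static_cast<uint16_t>(i));
    }
}
```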

bgt_5_3

(Visualization of 3D light grid, illustrating the lighting coherence and culling)

 

Beyond the renderer, we’ve modified UE4 to allow for additional GPU and CPU optimizations. The renderer is provided as an unmaintained sample and not an officially-supported SDK, but we’re excited to give projects using Unreal Engine’s world-class engine and editor additional options for rendering their VR worlds.

【They also modified UE4 itself for further CPU and GPU optimizations; it ships as an unmaintained sample rather than a supported SDK, but it is worth trying.】

 

You can grab it today from our Github repository as an Unreal Developer at https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr. To see it in action, try out Farlands, Dreamdeck, and Showdown.