Category: VR Rendering

In VR, UI sort order differs between the left and right eye

 
 

First, narrow down where the problem lives: testing showed it only occurs in connection with SolarMeshWidgetComponent.

 
 

Next hypothesis: could the two eyes be handling depth inconsistently? Has anything like that been seen before?

Possibility 1:

https://answers.unrealengine.com/questions/88032/bug-fresnel-on-vr-oculus-lr-eye-different.html

https://forums.unrealengine.com/showthread.php?45945-Translucent-Water-Material-Displacement-Banding-Issue

CameraVector. This is the world-space camera direction. Since virtual reality headsets use two separate in-game cameras, each with a slightly different direction, you won’t get the same result.

 
 

Similar left/right-eye mismatches have been reported elsewhere:

Inconsistent Fresnel:

https://answers.unrealengine.com/questions/88032/bug-fresnel-on-vr-oculus-lr-eye-different.html

Inconsistent LODs:

https://answers.unrealengine.com/questions/584083/eyes-sometimes-show-different-lods-in-vr.html

 
 

Solution: manually sort by the distance from each actor to the camera.
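A minimal sketch of that idea (SortUIByCameraDistance is a hypothetical helper; it assumes UPrimitiveComponent::SetTranslucentSortPriority() is available, as in recent 4.x versions): each frame, sort the translucent UI components by camera distance and assign explicit sort priorities, so both eyes agree on the draw order instead of each resolving ties from its own eye position.

#include "Kismet/GameplayStatics.h"
#include "Components/PrimitiveComponent.h"

void SortUIByCameraDistance(UWorld* World, TArray<UPrimitiveComponent*>& UIComponents)
{
    APlayerCameraManager* Cam = UGameplayStatics::GetPlayerCameraManager(World, 0);
    if (!Cam)
    {
        return;
    }
    const FVector CamLoc = Cam->GetCameraLocation();

    // Sort back to front: farther components first.
    UIComponents.Sort([CamLoc](const UPrimitiveComponent& A, const UPrimitiveComponent& B)
    {
        return FVector::DistSquared(A.GetComponentLocation(), CamLoc)
             > FVector::DistSquared(B.GetComponentLocation(), CamLoc);
    });

    for (int32 Index = 0; Index < UIComponents.Num(); ++Index)
    {
        // Explicit priorities make the translucency order identical for both eyes.
        UIComponents[Index]->SetTranslucentSortPriority(Index);
    }
}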

Getting Started with VR in Unreal Engine 4

http://www.tomlooman.com/getting-started-with-vr/

 
 

 
 

This guide is for anyone who is looking to get into developing for Virtual Reality projects in Unreal Engine 4. Covering Blueprint, C++, performance considerations and how to set up your VR kits for UE4.

 
 

I highly recommend using the latest release of Unreal Engine 4 as VR is still being improved greatly with each new release.

 
 

A few good places to reference are the official Oculus forums, the official VR documentation pages and the Unreal Engine VR Subforums.

 
 

If you are looking for VR Templates to get you started right away, go to my GitHub Repository. These templates are currently WIP for both C++ and Blueprint with a variety of camera and motion controller features and include the performance optimizations we will discuss in this guide. The official VR Template is currently being developed and will be released as soon as possible!

 
 

 
 

 
 

Setup your VR Device

 
 

For this guide I will assume you have successfully installed your head-mounted display of choice (Visit Oculus Rift Setup or HTC Vive Pre Setup in case you did not). In case you are having difficulties getting your Vive to work, I found this Troubleshooting guide to be helpful.

 
 

Unreal Engine 4 supports all the major devices and you don’t need to go through any hassle to set up your game project for VR. Just make sure that the correct plugins are loaded for your HMD under Edit > Plugins. There are some performance considerations to take into account; we’re covering these later in the guide.

【UE4 already supports all the mainstream VR devices; they generally work out of the box, and if not, check whether the device's plugin is enabled under Edit > Plugins.】

 
 


 
 

Before you launch the editor make sure your VR software is running, in the case of the HTC Vive this is the SteamVR app.

【Make sure the VR software is running before launching the editor.】

 
 

 
 

 
 

Launching VR Preview

 
 

Testing out your VR set is very straightforward: simply select “VR Preview” from the Play drop-down button. By default the head tracking will work right away without any changes to your existing project or template. I will go into more detail on how to add additional features such as motion controller setup and origin resetting etc. later on in this guide.

【Just hit VR Preview and you can see the result.】

 
 


 
 

 
 

 
 

VR Best Practices

 
 

VR is still a relatively unexplored area, and we are learning new things with every day of development. Both Oculus and Epic Games have set up a Best Practices Guide that I recommend you read through and keep in the back of your head while developing games for VR.

【Recommended: read the vendors' Best Practices Guides first.】

 
 

 
 

 
 

Using VR in Blueprint

 
 

Using VR in Blueprint is very straightforward and you don’t need a lot of set up to get yourself going.

 
 

You will need a Camera Component and optionally one or two Motion Controller Components. By default your Camera is already set up for HMD support; if you wish to disable rotation changes from the HMD you can disable “Lock to HMD” in the Component’s properties. For more information on the Motion Controllers you can jump down in this guide or immediately jump to the official documentation page on how to Setup Motion Controllers.

【The camera is driven by the HMD by default, though this can be turned off ("Lock to HMD"); motion controllers can likewise simply be added as components.】

 
 

Here is a (non-exhaustive) list of the available nodes in Blueprint:

 
 


 
 

To reset your HMD position and/or orientation (With optional Yaw offset):

【Reset HMD position/orientation】

 
 


 
 

To selectively enable features when using VR you can easily check whether your HMD is enabled:

【Checking whether VR is enabled】
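The same two nodes are also reachable from C++ through the HMD function library (a sketch; the include path varies slightly between engine versions):

#include "Kismet/HeadMountedDisplayFunctionLibrary.h"

void ConfigureForVR()
{
    // Only enable VR-specific behaviour when an HMD is actually connected and enabled.
    if (UHeadMountedDisplayFunctionLibrary::IsHeadMountedDisplayEnabled())
    {
        // Recenter the HMD's position and orientation, with an optional yaw offset in degrees.
        UHeadMountedDisplayFunctionLibrary::ResetOrientationAndPosition(0.f);
    }
}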

 
 


 
 

 
 

 
 

SteamVR Chaperone

 
 

The Chaperone component is specific to SteamVR and has easy access to the soft bounds. The soft bounds are represented as an array of Vectors centered around the calibrated HMD’s Origin (0,0,0). The Z component of the Vectors is always zero. You can add this component like any other ActorComponent to your Blueprint as seen below.

 
 


 
 

 
 

 
 

USteamVRChaperoneComponent

 
 

To use the chaperone in C++ open up your ProjectName.Build.cs and add the "SteamVR" module to the PrivateDependencyModuleNames array. See below for a sample.

【Enable the SteamVR module in C++ to use the SteamVR Chaperone.】

 
 

using UnrealBuildTool;

public class VRFirstPerson : ModuleRules
{
    public VRFirstPerson(TargetInfo Target)
    {
        PublicDependencyModuleNames.AddRange(new string[] { "Core", "CoreUObject", "Engine", "InputCore" });

        /* VR Required Modules */
        PrivateDependencyModuleNames.AddRange(new string[] { "HeadMountedDisplay", "SteamVR" });
    }
}
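With the module in place, the component can be created like any other ActorComponent. A sketch (AMyVRPawn and its members are hypothetical; GetBounds() is the component's accessor for the soft bounds described above):

#include "SteamVRChaperoneComponent.h"

AMyVRPawn::AMyVRPawn()
{
    // Add the chaperone like any other ActorComponent.
    Chaperone = CreateDefaultSubobject<USteamVRChaperoneComponent>(TEXT("Chaperone"));
}

void AMyVRPawn::DumpSoftBounds()
{
    // Soft bounds are vectors around the calibrated origin; Z is always zero.
    for (const FVector& Corner : Chaperone->GetBounds())
    {
        UE_LOG(LogTemp, Log, TEXT("Soft bound corner: %s"), *Corner.ToString());
    }
}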

 
 

 
 

 
 

Setup Motion Controllers

 
 

The official documentation has a good introduction on Motion Controller setup and input handling, so if your VR Device supports motion controllers I recommend following along with the documentation. For a practical example check out my VR Templates on GitHub.

【Both the official docs and my templates contain good examples to learn from directly.】

 
 

If you’re having trouble aligning your Motion Controllers with the Camera, simply use a SceneComponent as “VROrigin”; this is especially helpful when the root component has an undesirable pivot, like the CapsuleComponent in a Character Blueprint.

【Recommended: group the VR-related components under a dedicated SceneComponent, structured as in the image below.】
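A sketch of that hierarchy in C++ (AVRCharacter and the member names are hypothetical and assumed declared in the header; written against UE 4.12+ where SetupAttachment() exists):

#include "Camera/CameraComponent.h"
#include "MotionControllerComponent.h"

AVRCharacter::AVRCharacter()
{
    // Plain SceneComponent as a clean pivot under the capsule root.
    VROrigin = CreateDefaultSubobject<USceneComponent>(TEXT("VROrigin"));
    VROrigin->SetupAttachment(RootComponent);

    // Camera and both controllers hang off the same origin, so they stay aligned.
    Camera = CreateDefaultSubobject<UCameraComponent>(TEXT("Camera"));
    Camera->SetupAttachment(VROrigin);

    LeftController = CreateDefaultSubobject<UMotionControllerComponent>(TEXT("LeftController"));
    LeftController->SetupAttachment(VROrigin);
    LeftController->Hand = EControllerHand::Left;   // "MotionSource" in newer engine versions

    RightController = CreateDefaultSubobject<UMotionControllerComponent>(TEXT("RightController"));
    RightController->SetupAttachment(VROrigin);
    RightController->Hand = EControllerHand::Right;
}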

 
 


 
 

 
 

 
 

 
 

Using VR in C++

 
 

As of 4.11 not all functionality is exposed to Blueprint; if you are looking to do more advanced custom setups you might need to dig into C++ to adjust a few settings. Check out IHeadMountedDisplay.h for a look at the available functions. Certain plugins add additional features like SteamVRChaperoneComponent but are specific to a single device.

【UE4 does not expose everything to Blueprint, so advanced features still need C++; IHeadMountedDisplay.h shows what is available.】

 
 

 
 

 
 

Required Modules & Includes

 
 

If you wish to access the HMD features through C++ you need to include the "HeadMountedDisplay" module in your ProjectName.Build.cs file, which you can find in your Visual Studio solution explorer. Here is an example of the build file from the VRFirstPerson project.

【As above, import the relevant module first; below is the example from the VR first-person project.】

 
 

using UnrealBuildTool;

public class VRFirstPerson : ModuleRules
{
    public VRFirstPerson(TargetInfo Target)
    {
        PublicDependencyModuleNames.AddRange(new string[] { "Core", "CoreUObject", "Engine", "InputCore" });

        /* VR Module */
        PrivateDependencyModuleNames.AddRange(new string[] { "HeadMountedDisplay" });

        // …
    }
}

 

To use HMD features or the motion controller component, make sure you include the following header files.

【…and the relevant header files:】

 
 

/* VR Includes */

#include "HeadMountedDisplay.h"

#include "MotionControllerComponent.h"

 
 

 
 

 
 

 
 

Performance Considerations

 
 

For the whole VR experience to look smooth, your game needs to run at 75 Hz (Oculus DK2) or even 90 Hz (HTC Vive and Oculus CV1), depending on your device. To see your current framerate, type “stat fps” or “stat unit” (for a more detailed breakdown) into the console when running the game.

 
 

 
 

 
 

CPU Profiling

 
 

Your game might be CPU or GPU bound, to find out you need to measure (a quick way is to use “stat unit”). With the complexity of current gen games and engines it’s near impossible to make good guesses on what’s bottlenecking your performance so use the tools at your disposal! Bob Tellez wrote a blog post on CPU Profiling with Unreal Engine 4 and it’s a good place to get started.

【See this for CPU-side profiling of UE4 VR.】

 
 

 
 

 
 

GPU Profiling

 
 

To capture a single frame with GPU timings press Ctrl+Shift+, or type in “profilegpu” in the console. This command dumps accurate timings of the GPU, you will find that certain processes are a heavy burden on the framerate (Ambient Occlusion is one common example) when using VR.

【Ctrl+Shift+, dumps single-frame GPU timings, which you can use to analyze performance problems.】

 
 


 
 

The GPU Profiling & Performance and Profiling docs are a good place to learn about profiling your game.

【Related docs】

 
 

While profiling you might stumble on other costly features depending on your scene and project. One example is the Translucent Lighting Volume: you may not need it, but even when unused it adds a static cost to your scene. Check out this AnswerHub post by Daniel Wright for more info on how to disable this feature. All that is left for you to do is measure and test; there is no single configuration that is perfect for all projects.

【Some unused rendering features can be switched off; see above for how.】

 
 

The developers from FATED came up with a great list of tips in their quest for optimized VR. A few examples they mention are to disable HZB Occlusion Culling (r.HZBOcclusion 0), Motion Blur (r.DefaultFeature.MotionBlur=False) and Lens Flares (r.DefaultFeature.LensFlare=False). The commands do not persist through multiple sessions, so you should add (or search and replace) them in your /Config/DefaultEngine.ini config file, although most of these settings are available through Edit > Project Settings… > Rendering.

【See these tips for which VR-irrelevant rendering settings to switch off.】
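For example, those three settings could look like this in /Config/DefaultEngine.ini (a sketch: the DefaultFeature.* switches belong under the renderer settings section, while plain cvars can go under [SystemSettings]):

[SystemSettings]
r.HZBOcclusion=0

[/Script/Engine.RendererSettings]
r.DefaultFeature.MotionBlur=False
r.DefaultFeature.LensFlare=False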

 
 

Another great optimization to consider is the Instanced Stereo Rendering, we’ll talk about that next.

 
 

 
 

 
 

Instanced Stereo Rendering

 
 

The latest 4.11 release introduces Instanced Stereo Rendering; check the video below for a comparison of how that works.

【A key new rendering capability in this release.】

 
 

https://www.youtube.com/watch?v=uTUwKC7GXjo

(“Basically, we’re utilizing hardware instancing to draw both eyes simultaneously with a single draw call and pass through the render loop. This cuts down render thread CPU time significantly and also improves GPU performance. Bullet Train was seeing ~15 – 20% CPU improvement on the render thread and ~7 – 10% improvement on the GPU.” – Ryan Vance.)

 
 

To enable this feature in 4.11 and above, go to your Project Settings and look for “Instanced Stereo” under the Rendering category.

【See the image below for how to enable/disable the feature.】

 
 


 
 

 
 

 
 

Disable Heavy Post-Processing

 
 

Certain post-processing effects are very costly in VR, like Ambient Occlusion. Others may even become an annoyance in VR, like Lens Flares, as they can break your immersion of being present in the scene and instead make it feel like looking through a camera. These are easy examples to start with to see how they affect your game and performance.

【Many expensive post-processing effects need to be disabled.】

 
 

To disable post processing features on a project level, go to Edit > Project Settings > Rendering. You can do the same thing in post-processing volumes. Keep in mind that post-processing volumes can override the project-wide settings specified below.

【How: see the image below.】

 
 


 
 

 
 

 
 

Reduce Scene Complexity

 
 

With current gen hardware it’s really difficult to stay at your 90 fps target. You may need to revisit your previous traditional constraints and look at your scene complexity like dynamic shadows, atmospheric smoke effects and polycount of meshes.

【Reducing scene complexity is one of the most effective performance levers.】

 
 

It’s important to minimize overdraw to keep performance at a maximum. Lots of translucent surfaces and/or particle effects can easily cause your framerate to tank. To visualize the current shader complexity / overdraw press Alt+8 in your viewport (Alt+4 to return to default view). Look at the bottom picture from the Elemental Demo to get an idea of how much the atmospheric effects can impact your framerate (green = good, red = bad, white hot = extremely bad at about 2000 shader instructions per pixel)

【Alt+4: default view; Alt+8: visualize shader complexity / overdraw. Green means cheap.】

 
 

Dynamic shadows and lights have a huge impact on performance too. Bake as much lighting as you can to keep the per-frame cost as low as possible.

【Dynamic lighting is a performance concern too.】

 
 


 
 

 
 

 
 

 
 

List of Rendering Commands

 
 

The excellent talk by Nick Whiting and Nick Donaldson contains a list of render commands to use for GPU optimization in VR. You can find the list below. I recommend watching their talk regardless as it contains great info on the basics of Virtual Reality in general.

【This talk includes a list of render commands for VR GPU optimization; strongly recommended viewing.】

 
 

To test out these commands hit ~ (Tilde) to open the command console. Once you’ve settled on a command to be included for your project, you can add it to your configuration in /Config/DefaultEngine.ini under [/Script/Engine.RendererSettings]. Tip: Check if the command exists in the list before adding it yourself.

【You can add these settings to your project configuration as described above.】

 
 

r.SeparateTranslucency=0

r.HZBOcclusion=0

r.FinishCurrentFrame=1

r.MotionBlurQuality=0

r.PostProcessAAQuality=3

r.BloomQuality=1

r.EyeAdaptionQuality=0

r.AmbientOcclusionLevels=0

r.DepthOfFieldQuality=0

r.SceneColorFormat=2

r.TranslucentLightingVolume=0

r.TranslucencyVolumeBlur=0

r.TranslucencyLightingVolumeDim=4

r.MaxAnisotropy=8

r.LensFlareQuality=0

r.SceneColorFringeQuality=0

r.FastBlurThreshold=0

r.SSR.MaxRoughness=0

r.SSR.Quality=0

r.rhicmdbypass=0

r.TiledReflectionEnvironmentMinimumCount=10

 
 

 
 

 
 

Troubleshooting

 
 

  • Vive Specific (Non-PRE editions): Once you launch the editor, SteamVR may state “Not Ready”. This means something may be overlapping and preventing the Compositor screen from running at more than 60 FPS, causing jittering and motion sickness. More information and a workaround for this issue can be found on this AnswerHub Thread! The next iteration of Vive devices (Vive PRE) no longer has this issue, as they moved to direct mode for the displays; make sure you have updated your graphics drivers to support direct mode.

References

     
     

  • Official Documentation Main Page
  • VR Cheat Sheet
  • Unreal Engine VR Subforums
  • Unreal Engine VR Playlist

Hopefully this guide has helped you get started with your Virtual Reality project!

If you have a question or feel that I missed something important, let me know by leaving a reply below! To stay in touch, follow me on Twitter!

Oculus: Optimizing the Unreal Engine 4 Renderer for VR

https://developer.oculus.com/blog/introducing-the-oculus-unreal-renderer/

 
 

For Farlands, the Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine. It’s also used in Dreamdeck and the Oculus Store version of Showdown. We’re sharing the renderer’s source as a sample to help developers reach higher quality levels and frame rates in their own applications. As of today, you can get it as an Unreal developer from https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr.

【The Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine and shared the source on GitHub; it is already used in Dreamdeck and other Oculus apps.】

 
 

Rendering immersive VR worlds at a solid 90Hz is complex and technically challenging. Creating VR content is, in many ways, unlike making traditional monitor-only content—it brings us a stunning variety of new interactions and experiences, but forces developers to re-think old assumptions and come up with new tricks. The recent wave of VR titles showcase the opportunities and ingenuity of developers.

【Rendering an immersive VR world at a solid frame rate is very challenging. VR content is not like traditional monitor rendering: the new interactions change many things, and for rendering this means re-examining old technical choices, because techniques that suit screen rendering do not necessarily still suit VR.】

 
 

As we worked, we re-evaluated some of the traditional assumptions made for VR rendering, and developed technology to help us deliver high-fidelity content at 90Hz. Now, we’re sharing some results: an experimental forward renderer for Unreal Engine 4.11.

【Their work was to re-evaluate the value of these established techniques for VR; the following shares some of the results.】

 
 

We’ve developed the Oculus Unreal Renderer with the specific constraints of VR rendering in mind. It lets us more easily create high-fidelity, high-performance experiences, and we’re eager to share it with all UE4 developers.

【They developed a dedicated VR content renderer that delivers more efficient results; see GitHub.】

 
 

Background

 
 

As the team began production on Farlands, we took a moment to reflect on what we learned with the demo experiences we showed at Oculus Connect, GDC, CES, and other events. We used Unreal Engine 4 exclusively to create this content, which provided us with an incredible editing environment and a wealth of advanced rendering features.

【The team built Farlands entirely with Unreal Engine 4; the content has been shown at the big events, so no detailed introduction here.】

 
 

Unfortunately, the reality of rendering to Rift meant we’d only been able to use a subset of these features. We wanted to examine those we used most often, and see if we could design a stripped-down renderer that would deliver higher performance and greater visual fidelity, all while allowing the team to continue using UE4’s world-class editor and engine. While the Oculus Unreal Renderer is focused on the use cases of Oculus applications, it’s been retrofit into pre-existing projects (including Showdown and Oculus Dreamdeck) without needing major content work. In these cases, it delivered clearer visuals, and freed up enough GPU headroom to enable additional features or increase resolution 15-30%.

【UE4 is great, but for VR applications there is room for targeted renderer optimization to raise performance and image quality.】

 
 


Comparison at high resolution: The Oculus Unreal Renderer runs at 90fps while Unreal’s default deferred renderer is under 60fps.

【Oculus' forward renderer holds 90 fps where Unreal's default deferred renderer drops below 60.】

 
 

The Trouble With Deferred VR

 
 

【For the background on forward vs. deferred rendering, see the write-up in the Base notes.】

 
 

Unreal Engine is known for its advanced rendering feature set and fidelity. So, what was our rationale for changing it for VR? It mostly came down to our experiences building VR content, and the differences between rendering to a monitor and to the Rift.

【UE contains an enormous feature set; the job is to pick what actually fits VR rendering.】

 
 

When examining the demos we’d created for Rift, we found most shaders were fairly simple and relied mainly on detailed textures with few lookups and a small amount of arithmetic. When coupled with a deferred renderer, this meant our GBuffer passes were heavily texture-bound—we read from a large number of textures, wrote out to GBuffers, and didn’t do much in between.

【With VR's higher resolution demands, deferred rendering's G-buffer traffic puts extreme pressure on bandwidth.】

 
 

We also used dynamic lighting and shadows sparingly and leaned more heavily on precomputed lighting. In practice, switching to a forward renderer that provides a more limited set of features in a single pass yielded better GPU utilization, enabled optimization, removed bandwidth overhead, and made it easier for us to hit 90 Hz.

【They used dynamic lights and shadows as little as possible, leaning on precomputed lighting instead. The restricted single-pass feature set of their renderer gave better GPU utilization, removed bandwidth overhead, and ultimately helped hold frame rate.】

 
 

We also wanted to compare hardware accelerated multi-sample anti-aliasing (MSAA) with Unreal’s temporal antialiasing (TAA). TAA works extremely well in monitor-only rendering and is a very good match for deferred rendering, but it causes noticeable artifacts in VR. In particular, it can cause judder and geometric aliasing during head motion. To be clear, this was made worse by some of our own shader and vertex animation tricks. But it’s mostly due to the way VR headsets function.

【They also wanted to compare hardware-accelerated MSAA against Unreal's TAA.】

【TAA works extremely well on monitors and pairs nicely with deferred rendering, but in VR it produces clearly noticeable artifacts: judder and geometric aliasing during head motion.】

 
 

Compared to a monitor, each Rift pixel covers a larger part of the viewer’s field of view. A typical monitor has over 10 times more pixels per solid angle than a VR headset. Images provided to the Oculus SDK also pass through an additional layer of resampling to compensate for the effects of the headset’s optics. This extra filtering tends to slightly over-smooth the image.

【Compared to a monitor, each headset pixel covers a larger part of the field of view. The Oculus SDK also resamples the image through an extra layer to compensate for the headset optics, which slightly over-smooths the result.】

 
 

All these factors together contribute to our desire to preserve as much image detail as possible when rendering. We found MSAA to produce sharper, more detailed images that we preferred.

【All of this argues for preserving as much image detail as possible; they found MSAA produced sharper images that keep more detail.】

 
 


Deferred compared with forward. Zoom in to compare.

 
 

A Better Fit With Forward

 
 

Current state-of-the-art rendering often leverages screen-space effects, such as screen-space ambient occlusion (SSAO) and screen-space reflections (SSR). Each of these are well known for their realistic and high-quality visual impact, but they make tradeoffs that aren’t ideal in VR. Operating purely in screen-space can introduce incorrect stereo disparities (differences in the images shown to each eye), which some find uncomfortable. Along with the cost of rendering these effects, this made us more comfortable forgoing support of those features in our use case.

【Modern renderers lean on screen-space effects such as SSAO and SSR for image quality, but these cannot be used directly in VR rendering: purely screen-space effects can introduce incorrect stereo disparities.】

 
 

Our decision to implement a forward renderer took all these considerations into account. Critically, forward rendering lets us use MSAA for anti-aliasing, adds arithmetic to our texture-heavy shaders (and removes GBuffer writes), removes expensive full-screen passes that can interfere with asynchronous timewarp, and—in general—gives us a moderate speedup over the more featureful deferred renderer. Switching to a forward renderer has also allowed the easy addition of monoscopic background rendering, which can provide a substantial performance boost for titles with large, complex distant geometry. However, these advantages come with tradeoffs that aren’t right for everyone. Our aim is to share our learnings with VR developers as they continue fighting to make world-class content run at 90Hz.

【They chose a forward renderer with all these factors in mind: MSAA, more arithmetic in texture-heavy shaders (and no G-buffer writes), no expensive full-screen passes (which interfere with asynchronous timewarp), plus monoscopic background rendering: the distant background is rendered once and submitted to both eyes instead of twice (available in the Oculus SDK).】

 
 

Our implementation is based on Ola Olsson’s 2012 HPG paper, Clustered Deferred and Forward Shading. Readers familiar with traditional forward rendering may be concerned about the CPU and GPU overhead of dynamic lights when using such a renderer. Luckily, modern approaches to forward lighting do not require additional draw calls: All geometry and lights are rendered in a single pass (with an optional z-prepass). This is made possible by using a compute shader to pre-calculate which lights influence 3D “clusters” of the scene (subdivisions of each eye’s viewing frustum, yielding a frustum-voxel grid). Using this data, each pixel can cheaply determine a list of lights that has high screen-space coherence, and perform a lighting loop that leverages the efficient branching capability of modern GPUs. This provides accurate culling and efficiently handles smaller numbers of dynamic lights, without the overhead of additional draw calls and render passes.

【This is the forward+ approach; for details see the 2012 paper, and see my comparison of the three rendering paths in the Base notes. The rest of the paragraph is the forward+ basic principle: a compute shader precomputes which lights affect each 3D cluster of the view frustum, so each pixel only considers the few lights relevant to it, i.e. light culling.】
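An illustrative CPU-side sketch of the clustered idea (my gloss, not Oculus' code): the frustum is split into a grid of "froxels", a per-cluster light list is built up front, and each shaded point walks only its own cluster's list.

struct FClusterGrid
{
    FIntVector Dim;               // number of clusters in x, y, z
    float NearZ, FarZ;            // view-space depth range covered by the grid
    TArray<TArray<int32>> Lights; // precomputed light indices per cluster
};

int32 ClusterIndexFor(const FClusterGrid& Grid, const FIntPoint& Pixel,
                      const FIntPoint& ViewSize, float ViewZ)
{
    // Screen-space tile of this pixel.
    const int32 X = Pixel.X * Grid.Dim.X / ViewSize.X;
    const int32 Y = Pixel.Y * Grid.Dim.Y / ViewSize.Y;

    // Exponential depth slicing keeps cluster size roughly uniform on screen.
    const float Slice = FMath::Log2(ViewZ / Grid.NearZ) /
                        FMath::Log2(Grid.FarZ / Grid.NearZ) * Grid.Dim.Z;
    const int32 Z = FMath::Clamp(FMath::FloorToInt(Slice), 0, Grid.Dim.Z - 1);

    // Flat index into Grid.Lights; the shading loop iterates only that list.
    return (Z * Grid.Dim.Y + Y) * Grid.Dim.X + X;
}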

 
 


(Visualization of 3D light grid, illustrating the lighting coherence and culling)

 
 

Beyond the renderer, we’ve modified UE4 to allow for additional GPU and CPU optimizations. The renderer is provided as an unmaintained sample and not an officially-supported SDK, but we’re excited to give projects using Unreal Engine’s world-class engine and editor additional options for rendering their VR worlds.

【They published a UE4 fork everyone can try.】

 
 

You can grab it today from our Github repository as an Unreal Developer at https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr. To see it in action, try out Farlands, Dreamdeck, and Showdown.


The Vanishing of Milliseconds: Optimizing the UE4 renderer for Ethan Carter VR

Original:

https://medium.com/@TheIneQuation/the-vanishing-of-milliseconds-dfe7572d9856#.auamge3rg

 
 


 
 

As a game with very rich visuals, The Vanishing of Ethan Carter (available for the Oculus Rift and Steam VR) has been a difficult case for hitting the VR performance targets. The fact that its graphics workload is somewhat uncommon for Unreal Engine 4 (and, specifically, largely dissimilar to existing UE4 VR demos) did not help. I have described the reasons for that at length in a previous post; the gist of it, however, is that The Vanishing of Ethan Carter’s game world is statically lit in some 95% of areas, with dynamic lights appearing only in small, contained indoor spaces.

【The Vanishing of Ethan Carter is a visually rich VR game, and hitting the performance targets was very, very hard. Its rendering workload also differs a lot from the usual UE4 workload: the world is almost entirely statically lit, with dynamic lights in only a few places.】

 
 

Important note: Our (The Astronauts’) work significantly pre-dates Oculus VR’s UE4 renderer. If we had it at our disposal back then, I would probably not have much to do for this port; but as it were, we were on our own. That said, I highly recommend the aforementioned article and code, especially if your game does not match our rendering scenario, and/or if the tricks we used simply do not work for you.

【This work predates the Oculus VR UE4 renderer, so the team was on its own. The author highly recommends that article and code all the same.】

 
 

Although the studied case is a VR title, the optimizations presented are mostly concerned with general rendering and may be successfully applied to other titles; however, they are closely tied to the UE4 and may not translate well to other game engines.

【Although the optimizations aim for generality, optimization is inherently specialization: much of this is tied to UE4 and to the specific use case.】

 
 

There are Github links in the article. Getting a 404 error does not mean the link is dead — you need to have your Unreal Engine and Github accounts connected  to see UE4 commits.

【Connect your GitHub account to Unreal Engine to open some of the links in this article.】

 
 

 
 

Show me the numbers

 
 

To whet the reader’s appetite, let us compare the graphics profile and timings of a typical frame in the PS4/Redux version to a corresponding one from the state of the VR code on my last day of work at The Astronauts:

 
 

【First, a comparison of a typical frame against the optimized result.】

 
 


GPU profiles from the PS4/Redux and VR versions, side by side. Spacing has been added to have the corresponding data line up. Detailed textual log data available as Gists: PS4/Redux and VR version.

 
 


Timing graphs displayed with the STAT UNITGRAPH command, side by side.

 
 

Both profiles were captured using the UE4Editor -game -emulatestereo command line in a Development configuration, on a system with an NVIDIA GTX 770 GPU, at default game quality settings and 1920×1080 resolution (960×1080 per eye). Gameplay code was switched off using the PAUSE console command to avoid it affecting the readouts, since it is out of the scope of this article.

【The hardware and settings behind the numbers above.】

 
 

As you can (hopefully) tell, the difference is pretty dramatic. While a large part of it has been due to code improvements, I must also honour the art team at The Astronauts — Adam Bryła, Michał Kosieradzki, Andrew Poznański, and Kamil Wojciekiewicz have all done a brilliant job of optimizing the game assets!

【The result owes as much to the art side as to the code.】

 
 

This dead-simple optimization algorithm that I followed set a theme for the couple of months following the release of Ethan Carter PS4, and became the words to live by:

 
 

  1. Profile a scene from the game.
  2. Identify expensive render passes.
  3. If the feature is not essential for the game, switch it off.
  4. Otherwise, if we can afford the loss in quality, turn its setting down.

 
 

【The optimization loop: profile a game scene, identify the expensive render passes, switch off features the game doesn't need, and otherwise turn a feature's settings down if the quality loss is acceptable.】

 
 

 
 

Hitting the road to VR

 
 

The beginnings of the VR port were humble. I decided to start the transition from the PS4/Redux version with making it easier to test our game in VR mode. As is probably the case with most developers, we did not initially have enough HMDs for everyone in the office, and plugging them in and out all the time was annoying. Thus, I concluded we needed a way to emulate one.

【The beginnings of the VR port were humble: there weren't enough HMDs for everyone, and plugging them in and out all the time was annoying for testing, so the first need was a usable way to emulate one.】

 
 

Turns out that UE4 already has a handy -emulatestereo command line switch. While it works perfectly in game mode, it did not enable that Play in VR button in the editor. I hacked up the FInternalPlayWorldCommandCallbacks::PlayInVR_*() methods to also test for the presence of FFakeStereoRenderingDevice in GEngine->StereoRenderingDevice, apart from just GEngine->HMDDevice. Now, while this does not accurately emulate the rendering workload of a VR HMD, we could at least get a rough, quick feel for stereo rendering performance from within the editor, without running around with a tangle of wires and connectors. And it turned out to be good enough for the most part.

【UE4 already ships -emulatestereo, but it didn't enable the editor's Play in VR button. The fix: make FInternalPlayWorldCommandCallbacks::PlayInVR_*() also check for FFakeStereoRenderingDevice in GEngine->StereoRenderingDevice rather than only GEngine->HMDDevice, so stereo rendering performance can be judged roughly in-editor without a headset; this solved the device shortage.】

 
 

While trying it out, Andrew, our lead artist, noticed that game tick time is heavily impacted by having miscellaneous editor windows open. This is most probably the overhead from the editor running a whole lot of Slate UI code. Minimizing all the windows apart from the main frame, and setting the main level editor viewport to immersive mode seemed to alleviate the problem, so I automated the process and added a flag for it to ULevelEditorPlaySettings. And so, the artists could now toggle it from the Editor Preferences window at their own leisure.

【Lead artist Andrew then noticed that game tick time is heavily impacted by open editor windows (overhead from Slate UI code). Minimizing everything but the main frame and setting the level viewport to immersive mode alleviates it, so this was automated behind a flag in ULevelEditorPlaySettings that artists can toggle from Editor Preferences.】

 
 

These changes, as well as several of the others described in this article, may be viewed in my fork of Unreal Engine on Github (reminder: you need to have your Unreal Engine and Github accounts connected to see UE4 commits).

 
 

 
 

Killing superfluous renderer features

 
 

Digging for information on UE4 in VR, I discovered that Nick Whiting and Nick Donaldson from Epic Games have delivered an interesting presentation at Oculus Connect, which you can see below.

【A good talk on UE4 VR; worth watching.】

 
 

https://www.youtube.com/watch?v=0oM6Xe7fT-8

 
 

Around the 37 minute mark is a slide which in my opinion should not have been a “bonus”, as it contains somewhat weighty information. It made me realize that, by default, Unreal’s renderer does a whole bunch of things which are absolutely unnecessary for our game. I had been intellectually aware of it beforehand, but the profoundness of it was lost on me until that point. Here is the slide in question:

【This made the author realize that UE4 enables, by default, a pile of expensive features the game doesn't need, and that trade-offs had to be made. The slide around the 37-minute mark gives a good starting set of VR-oriented settings.】

 
 


 
 

I recommend going over every one of the above console variables in the engine source and seeing which of their values makes most sense in the context of your project. From my experience, their help descriptions are not always accurate or up to date, and they may have hidden side effects. There are also several others that I have found useful and will discuss later on.

【Strongly recommended: go through each of these console variables in the engine source and judge the right value in the context of your own project; someone else's settings won't necessarily fit your needs.】

 
 

It was the first pass of optimization, and resulted in the following settings — an excerpt from our DefaultEngine.ini:

【The settings they used, for reference:】

 
 

[SystemSettings]

r.TranslucentLightingVolume=0

r.FinishCurrentFrame=0

r.CustomDepth=0

r.HZBOcclusion=0

r.LightShaftDownSampleFactor=4

r.OcclusionQueryLocation=1

[/Script/Engine.RendererSettings]

r.DefaultFeature.AmbientOcclusion=False

r.DefaultFeature.AmbientOcclusionStaticFraction=False

r.EarlyZPass=1

r.EarlyZPassMovable=True

r.BasePassOutputsVelocity=False

 
 

The fastest code is that which does not run

May I remind you that Ethan Carter is a statically lit game; this is why we could get rid of translucent lighting volumes and ambient occlusion (right with its static fraction), as these effects were not adding value to the game. We could also disable the custom depth pass for similar reasons.

【Reminder: Ethan Carter is a statically lit game, which is why translucent lighting volumes, ambient occlusion (and its static fraction), and the custom depth pass could all go.】

 
 

Trade-offs

On most other occasions, though, the variable value was a result of much trial and error, weighing a feature’s visual impact against performance.

【In most other cases, each value was trial and error, weighing visual impact against performance.】

 
 

One such setting is r.FinishCurrentFrame, which, when enabled, effectively creates a CPU/GPU sync point right after dispatching a rendering frame, instead of allowing to queue multiple GPU frames. This contributes to improving motion-to-photon latency at the cost of performance, and seems to have originally been recommended by Epic (see the slide above), but they have backed out of it since (reminder: you need to have your Unreal Engine and Github accounts connected to see UE4 commits). We have disabled it for Ethan Carter VR.

【r.FinishCurrentFrame, when enabled, creates a CPU/GPU sync point right after dispatching each rendering frame instead of letting multiple GPU frames queue up. That improves motion-to-photon latency at the cost of performance; Epic originally recommended it (see the slide above) but later backed out, and it was disabled for Ethan Carter VR.】

 
 

The variable r.HZBOcclusion controls the occlusion culling algorithm. Not surprisingly, we have found the simpler, occlusion query-based solution to be more efficient, despite it always being one frame late and displaying mild popping artifacts. So do others.

【r.HZBOcclusion selects the occlusion culling algorithm; the simpler occlusion query-based solution proved more efficient, despite being a frame late and showing mild popping artifacts.】

 
 

Related to that is the r.OcclusionQueryLocation variable, which controls the point in the rendering pipeline at which occlusion queries are dispatched. It allows balancing between more accurate occlusion results (the depth buffer to test against is more complete after the base pass) against CPU stalling (the later the queries are dispatched, the higher the chance of having to wait for query results on the next frame). Ethan Carter VR’s rendering workload was initially CPU-bound (we were observing randomly occurring stalls several milliseconds long), so moving occlusion queries to before base pass was a net performance gain for us, despite slightly increasing the total draw call count (somewhere in the 10–40% region, for our workload).

【r.OcclusionQueryLocation controls where in the rendering pipeline occlusion queries are dispatched, trading occlusion accuracy (the depth buffer is more complete after the base pass) against CPU stalls (the later the queries, the likelier the CPU waits on results next frame). Ethan Carter VR was initially CPU-bound with multi-millisecond stalls, so moving the queries before the base pass was a net win despite 10-40% more draw calls.】

 
 


Left eye taking up more than twice the time? That is not normal.

 
 

Have you noticed, in our pre-VR profile data, that the early Z pass takes a disproportionately large amount of time for one eye, compared to the other? This is a tell-tale sign that your game is suffering from inter-frame dependency stalls, and moving occlusion queries around might help you.

【Notice in the image above how the early Z pass takes disproportionately long for one eye: a tell-tale sign of inter-frame dependency stalls, which moving the occlusion queries around can help with.】

 
 

For the above trick to work, you need r.EarlyZPass enabled. The variable has several different settings (see the code for details); while we shipped the PS4 port with a full Z prepass (r.EarlyZPass=2) in order to have D-buffer decals working, the VR edition makes use of just opaque (and non-masked) occluders (r.EarlyZPass=1), in order to conserve computing power. The rationale was that while we end up issuing more draw calls in the base pass, and pay a bit more penalty for overshading due to the simpler Z buffer, the thinner prepass would make it a net win.

【r.EarlyZPass must be enabled for the trick above to work. The PS4 port shipped with a full Z prepass (=2) so that D-buffer decals work; the VR build uses only opaque, non-masked occluders (=1) to save computation: more base-pass draw calls and a little overshading from the simpler Z buffer, but the thinner prepass makes it a net win.】

 
 

We have also settled on bumping r.LightShaftDownSampleFactor even further up, from the default of 2 to 4. This means that our light shaft masks’ resolution is just a quarter of the main render target. Light shafts are very blurry this way, but it did not really hurt the look of the game.

【r.LightShaftDownSampleFactor was bumped from the default 2 up to 4, so the light shaft masks render at a quarter of the main render target's resolution; light shafts get very blurry, but it didn't hurt the look of the game.】

 
 

Finally, I settled on disabling the “new” (at the time) UE 4.8 feature of r.BasePassOutputsVelocity. Comparing its performance against Rolando Caloca’s hack of injecting meshes that utilize world position offset into the velocity pass with previous frame’s timings (which I had previously integrated for the PS4 port to have proper motion blur and anti-aliasing of foliage), I found it simply outperformed the new solution in our workload.

【The UE 4.8 feature r.BasePassOutputsVelocity was also disabled: the older hack of injecting world-position-offset meshes into the velocity pass (already integrated for the PS4 port for proper motion blur and foliage anti-aliasing) simply outperformed it in this workload.】

 
 

 
 

Experiments with shared visibility

 
 

If you are not interested in failures, feel free to skip to the next section (Stereo instancing…).

 
 

Several paragraphs earlier I mentioned stalls in the early Z prepass. You may have also noticed in the profile above that our draw time (i.e. time spent in the render thread) was several milliseconds long. It was a case of a Heisenbug: it never showed up in any external profilers, and I think it has to do with all of them focusing on isolated frames, and not sequences thereof, where inter-frame dependencies rear their heads.

【Note the several-millisecond draw (render thread) time in the profile above: a Heisenbug that never showed up in external profilers, probably because they focus on isolated frames rather than sequences, where inter-frame dependencies rear their heads.】

 
 

Anyway, while I am still not convinced that the suspicious prepass GPU timings and CPU draw timings were connected, I took to the conventional wisdom that games are usually CPU-bound when it comes to rendering. Which is why I took a look at the statistics that UE4 collects and displays, searching for something that could help me deconstruct the draw time. This is the output of STAT INITVIEWS, which shows details of visibility culling performance:

【Following the conventional wisdom that rendering is usually CPU-bound, the author dug into UE4's collected statistics to deconstruct the draw time; the STAT INITVIEWS output is shown below.】

 
 


Output of STAT INITVIEWS in the PS4/Redux version.

 
 

Whoa, almost 5 ms spent on frustum and occlusion culling! That call count of 2 was quite suggestive: perhaps I could halve this time by sharing the visible object set data between eyes?

【Almost 5 ms on frustum and occlusion culling, and the call count of 2 is suggestive: could the time be halved by sharing the visible set between the two eyes?】

 
 

To this end, I had made several experiments. There was some plumbing required to get the engine not to run the view relevance code for the secondary eye and use the primary eye’s data instead. I had added drawing a debug frustum to the FREEZERENDERING command to aid in debugging culling using a joint frustum for both eyes. I had improved the DrawDebugFrustum() code to better handle the inverse-Z projection matrices that UE4 uses, and also to allow a plane set to be the data source. Getting one frustum culling pass to work for both eyes was fairly easy.

【After several experiments and some plumbing, getting one frustum culling pass to serve both eyes turned out to be fairly easy.】

 
 

But occlusion culling was not.

【But occlusion culling could not be shared.】

 
 

For performance reasons mentioned previously, we were stuck with the occlusion query-based mechanism (UE4 runs a variant of the original technique). It requires an existing, pre-populated depth buffer to test against. If the buffer does not match the frustum, objects will be incorrectly culled, especially at the edges of the viewport.

【For the performance reasons above they were stuck with the occlusion query-based mechanism, which needs an existing, pre-populated depth buffer to test against; if that buffer doesn't match the frustum, objects are culled incorrectly, especially at the viewport edges.】

 
 

There seemed to be no way to generate a depth buffer that could approximate the depth buffer for a “joint eye”, short of running an additional depth rendering pass, which was definitely not an option. So I scrapped the idea.

【There was no way to approximate a "joint eye" depth buffer short of an extra depth rendering pass, which was not an option, so the idea was scrapped.】

 
 

Many months and a bit more experience later, I know now that I could have tried reconstructing the “joint eye” depth buffer via reprojection, possibly weighing in the contributions of eyes according to direction of head movement, or laterality; but it’s all over but the shouting now.

【In hindsight, a "joint eye" depth buffer might be reconstructed via reprojection, weighting each eye's contribution by head-movement direction or laterality.】

 
 

And at some point, some other optimization — and I must admit I never really cared to find out which one, I just welcomed it — made the problem go away as a side effect, and so it became a moot point:

【Eventually some other optimization made the problem go away as a side effect (the author never cared to find out which one), so it became moot:】

 
 


Output of STAT INITVIEWS in the VR version.

 
 

 
 

Stereo instancing: not a silver bullet

 
 

Epic have developed the feature of instanced stereo rendering for UE 4.11. We had pre-release access to this code courtesy of Epic and we had been looking forward to testing it out very eagerly.

【UE 4.11 introduced instanced stereo rendering, which the team awaited very eagerly.】

 
 

It turned out to be a disappointment, though.

【It turned out to be a disappointment.】

 
 

First off, the feature was tailored quite specifically to the Bullet Train UE4 VR demo.

【First, the feature was tailored quite specifically to the Bullet Train demo.】

 
 

https://www.youtube.com/watch?v=DmaxmnPzMWE

 
 

Note that this demo uses dynamic lighting and has zero instanced foliage in it. Our game was quite the opposite. And the instanced foliage would not draw in the right eye. It was not a serious bug; evidently, Epic focused just on the features they needed for the demo, which is perfectly understandable, and the fix was easy.

【Note that the demo uses dynamic lighting and has zero instanced foliage, the opposite of this game; instanced foliage would not draw in the right eye. Not a serious bug: Epic understandably focused on what their demo needed, and the fix was easy.】

 
 

But the worst part was that it actually degraded performance. I do not have that code laying around anymore to make any fresh benchmarks, but from my correspondence with Ryan Vance, the programmer at Epic who prepared a code patch for us (kudos to him for the initiative!):

【Worse, it actually degraded performance. The author no longer has the code around for fresh benchmarks, but quotes his correspondence with Ryan Vance, the Epic programmer who prepared a patch for them (kudos for the initiative!):】

 
 

Comparing against a pre-change build reveals a considerable perf hit: on foliage-less scenes (where we’ve already been GPU-bound) we experience a ~0.7 ms gain on the draw thread, but a ~0.5 ms loss on the GPU.

【On foliage-less scenes (already GPU-bound) it gained ~0.7 ms on the draw thread but lost ~0.5 ms on the GPU.】

 
 

Foliage makes everything much, much worse, however (even after fixing it). Stat unit shows a ~1 ms GPU loss with vr.InstancedStereo=0 against baseline, and ~5 ms with vr.InstancedStereo=1!

【Foliage makes everything much worse: ~1 ms GPU loss with vr.InstancedStereo=0 against baseline, and ~5 ms with vr.InstancedStereo=1.】

 
 

Other UE4 VR developers I have spoken to about this seem to concur. There is also thread at the Unreal forums with likewise complaints. As Ryan points out, this is a CPU optimization, which means trading CPU time for GPU time. I scrapped the feature for Ethan Carter VR — we were already GPU-bound for most of the game by that point.

【Other UE4 VR developers concur, and a forum thread carries similar complaints. As Ryan points out, it is a CPU optimization: it trades GPU time for CPU time. Ethan Carter VR was already GPU-bound for most of the game, so the feature was scrapped.】

 
 

 
 

The all-seeing eyes

 
 


The problematic opening scene.

 
 

At a point about two-thirds into the development, we had started to benchmark the game regularly, and I was horrified to find that the very opening scene of the game, just after exiting the tunnel, was suffering from poor performance. You could just stand there, looking forward and doing nothing, and we would stay pretty far from VR performance targets. Look away, or take several steps forward, and we were back under budget.

【Regular benchmarking revealed, to the author's horror, that the game's very opening scene, just past the tunnel, missed the VR performance targets while the player simply stood still looking forward; look away or walk a few steps and it was back under budget.】

 
 

A short investigation using the STAT SCENERENDERING command showed us that primitive counts were quite high (in the 4,000–6,000 region). A quick look around using the FREEZERENDERING command did not turn up any obvious hotspots, though, so I took to the VIS command. The contents of the Z-buffer after pre-pass (but before the base pass!) explained everything.

【STAT SCENERENDERING showed quite high primitive counts (4,000-6,000); a quick look with FREEZERENDERING turned up no obvious hotspots, so the VIS command was used, and the Z-buffer contents after the pre-pass explained everything.】

 
 


Note the missing ground in the foreground, in the bottom-left visualizer panel.

 
 

At the beginning of the game, the player emerges from a tunnel. This tunnel consists of the wall mesh and a landscape component (i.e. terrain tile) that has a hole in it, which resulted in the entire component (tile) being excluded from the early Z-pass, allowing distant primitives (e.g. from the other side of the lake!) to be visible “through” large swaths of the ground. This was also true of components with traps in them, which are also visible in this scene.

【The game opens in a tunnel built from a wall mesh plus a landscape component (terrain tile) with a hole in it; the hole excluded the entire tile from the early Z-pass, letting distant primitives (even from across the lake) be "visible" through large swaths of the ground. The same applied to tiles containing traps.】

 
 

I simply special-cased landscape components to be rendered as occluders even when they use masked materials (reminder: you need to have your Unreal Engine and Github accounts connected to see UE4 commits). This cut anywhere from several hundred to a couple of thousand draw calls in that scene, depending on the exact camera location.

【The fix: special-case landscape components to render as occluders even with masked materials, cutting hundreds to thousands of draw calls from the scene.】

 
 

 
 

Fog so thick one might have spread it on bread

 
 

Still not happy with the draw call count, I took to RenderDoc. It has the awesome render overlay feature that helps you quickly identify some frequent problems. In this case, I started clicking through occlusion query dispatch events in the frame tree with the depth test overlay enabled, and a pattern began to emerge.

【Still unhappy with the draw call count, the author turned to RenderDoc; its render overlay feature quickly exposes common problems, here used to step through occlusion query dispatches with the depth test overlay enabled.】

 
 


RenderDoc’s depth test overlay. An occlusion query dispatched for an extremely distant, large (about 5,000 x 700 x 400 units) object, showing a positive result (1 pixel is visible).

 
 

Since UE4 dispatches bounding boxes of meshes for occlusion queries, making it somewhat coarse and conservative (i.e. subject to false positives), we were having large meshes pass frustum culling tests, and then occlusion, by having just 1 or 2 pixels of the bounding box visible through thick foliage. Skipping through to the actual meshes in the base pass would reveal all of their pixels failing the depth test anyway.

【UE4 dispatches mesh bounding boxes for occlusion queries: coarse and conservative (prone to false positives). Large meshes passed frustum culling and then occlusion with just 1-2 bounding-box pixels visible through thick foliage, only for every actual mesh pixel to fail the depth test in the base pass anyway.】

 
 


RenderDoc’s depth test overlay in UE4’s base pass. A mesh of decent size (~30k vertices, 50 x 50 x 30 bounding box), distant enough to occupy just 3 pixels (L-shaped formation in the centre). Successful in coarse occlusion testing, but failing the per-pixel depth tests.

 
 

Of course, every now and then, a single pixel would show through the foliage. But even then, I could not help noticing that it would be almost completely washed out by the thick fog that encompasses the forest at the beginning of the game!

【Of course, every now and then a single pixel would show through the foliage, but even then it was almost completely washed out by the thick fog that blankets the forest at the start of the game.】

 
 

This gave me the idea: why not add another plane to the culling frustum, at the distance where fog opacity approaches 100%?

【That sparked the idea: why not add another plane to the culling frustum at the distance where fog opacity approaches 100%, culling everything behind it?】
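A sketch of the math (my gloss, not the actual Ethan Carter code): for simple exponential fog with density b, transmittance falls off as T(d) = exp(-b * d). "Opacity approaches 100%" once T drops below some epsilon, so everything beyond the solved distance can be culled by the extra far plane.

float FarCullDistanceFromFog(float FogDensity, float Epsilon = 1.0f / 255.0f)
{
    // Solve exp(-FogDensity * d) = Epsilon for d. Past this distance less than
    // one 8-bit colour step of the scene survives the fog, so skipping those
    // draws is visually lossless.
    return -FMath::Loge(Epsilon) / FogDensity;
}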

 
 

Solving the fog equation for the distance and adding the far cull plane shaved another several hundred draw calls. We had the draw call counts back under control and in line with the rest of the game.

【Solving the fog equation for that distance and adding the far cull plane shaved off another several hundred draw calls, bringing the counts back in line with the rest of the game.】

 
 

 
 

Insane LODs

 
 

At some point late in development, AMD’s Matthäus G. Chajdas was having a look at a build of the game and remarked that we are using way too highly tessellated trees in the aforementioned opening scene. He was right: looking up the asset in the editor had revealed that screen sizes of LODs 1+ were set to ridiculous amounts in the single-digit percentage region. In other words, the lower LODs would practically never kick in.

【Late in development, AMD's Matthäus G. Chajdas looked at a build and remarked that the opening scene used far too highly tessellated trees. He was right: the screen sizes for LODs 1+ were set so low (single-digit percentages) that the lower LODs practically never kicked in.】

 
 

When asked why, the artists responded that when using the same mesh asset for hand-planted and instanced foliage, they had the LODs kick in at different distances, and so they used a “compromise” value to compensate.

 
 

Needless to say, I absolutely hate it when artists try to clumsily work around such evident bugs instead of reporting them. I whipped up a test scene, confirmed the bug and started investigating, and it became apparent that instanced foliage does not take instance scaling into account when computing the LOD factors (moreover, it is not even really technically feasible without a major redecoration, since the LOD factor is calculated per foliage type per entire cluster). As a result, all instanced foliage was culled as if it had a scale of 1.0, which usually was not the case for us.

 
 

Fortunately, the scale does not vary much within clusters. Taking advantage of this property, I put together some code for averaging the scale over entire instance clusters, and used that in LOD factor calculations. Far from ideal, but as long as scale variance within the cluster is low, it will work. Problem solved.

 
 

【Instanced foliage ignored instance scaling when computing LOD factors, so everything LODed as if scaled 1.0. Since scale varies little within a cluster, averaging the scale over each instance cluster and feeding that into the LOD factor calculation solved the problem.】

 
 

 
 

The money shot

 
 

But the most important optimization, the one which I believe put the entire endeavour in the realm of possibility, was the runtime toggling of G-buffers. I must again give Matthäus G. Chajdas credit for suggesting this one; seeing a GPU profile of the game prompted him to ask if we could maybe reduce our G-buffer pixel format to reduce bandwidth saturation. I slapped my forehead, hard. ‘Why, of course, we could actually get rid of all of them!’

【The most important optimization: runtime toggling of G-buffers, cutting G-buffer bandwidth saturation to improve throughput.】

 
 

At this point I must remind you again that Ethan Carter has almost all of its lighting baked and stowed away in lightmap textures. This is probably not true for most UE4 titles.

【Reminder again: Ethan Carter bakes nearly all of its lighting into lightmap textures, which is probably not true of most UE4 titles.】

 
 

Unreal already has a console variable for that called r.GBuffer, only it requires a restart of the engine and a recompilation of base pass shaders for changes to take effect. I have extended the variable to be an enumeration, assigning the value of 2 to automatic runtime control.

【Unreal's r.GBuffer variable controls this, but changing it requires an engine restart and a recompilation of the base pass shaders. The author extended it into an enumeration, where the value 2 means automatic runtime control.】

 
 

This entailed a bunch of small changes all around the engine:

【This entailed a handful of small engine changes:】

 
 

  1. Moving light occlusion and gathering to before the base pass.
  2. Having TBasePassPS conditionally define the NO_GBUFFER macro for shaders, instead of the global shader compilation environment.
  3. Creating a new shader map key string.
  4. Finally, adjusting the draw policies to pick the G-buffer/no G-buffer shader variant at runtime.

 
 

This change saved us a whopping 2–4 milliseconds per frame, depending on the scene!

【This saved a whopping 2-4 ms per frame, depending on the scene!】

 
 

It does not come free, though — short of some clever caching optimization, it doubles the count of base pass shader permutations, which means significantly longer shader compiling times (offline, thankfully) and some additional disk space consumption. Actual cost depends on your content, but it can easily climb to almost double of the original shader cache size, if your art team is overly generous with materials.

【It isn't free, though: short of clever caching it doubles the base-pass shader permutations, which means significantly longer shader compile times (offline, thankfully) and extra disk space, easily approaching double the original shader cache size if the art team is generous with materials.】

 
 

The fly in the ointment

Except of course the G-buffers would keep turning back on all the time. And for reasons that were somewhat unclear to me at first.

【The fly in the ointment: the G-buffers kept turning themselves back on, at first for unclear reasons.】

 
 

A quick debugging session revealed that one could easily position themselves in such a way that a point light, hidden away in an indoor scene at the other end of the level, was finding its way into the view frustum. UE4’s pretty naive light culling (simple frustum test, plus a screen area fraction cap) was simply doing a bad job, and we had no way of even knowing which lights they were.

【A quick debugging session showed that a point light hidden indoors at the far end of the level could find its way into the view frustum; UE4's naive light culling (a simple frustum test plus a screen-area fraction cap) did a poor job, with no way of even knowing which lights were responsible.】

 
 

I quickly whipped up a dirty visualisation in the form of a new STAT command — STAT RELEVANTLIGHTS — that lists all the dynamic lights visible in the last frame, and having instructed the artists on its usage, I could leave it up to them to add manual culling (visibility toggling) via trigger volumes.

【The new STAT RELEVANTLIGHTS command lists every dynamic light visible in the last frame, letting the artists add manual culling (visibility toggles) via trigger volumes.】

 
 


STAT RELEVANTLIGHTS output. Left: scene with fully static lighting. Right: fully dynamic lighting; one point light has shadow casting disabled.

 
 

Now all that was left to optimize was game tick time, but I was confident that Adam Bienias, the lead programmer, would make it. I was free to clean my desk and leave for my new job!

【And with that the optimization story ends; the author cleaned his desk and left for a new job.】

 
 

 
 

Conclusions

 
 

In hindsight, all of these optimizations appear fairly obvious. I guess I was simply not experienced enough and not comfortable enough with the engine. This project had been a massive crash course in rendering performance on a tight schedule for me, and there are many corners I regret cutting and not fully understanding the issue at hand. The end result appears to be quite decent, however, and I allow myself to be pleased with that. 😉

【In hindsight all of these optimizations look fairly obvious; the end result is decent and the author allows himself to be pleased with it.】

 
 

It seems to me that renderer optimization for VR is quite akin to regular optimization: profile, make changes, rinse, repeat. Original VR content may be more free in their choice of rendering techniques, but we were constrained by the already developed look and style of the game, so the only safe option was to fine-tune what was already there.

【The optimization workflow: profile, make changes, rinse, repeat.】

 
 

I made some failed attempts at sharing object visibility information between eyes, but I am perfectly certain that it is possible. Again, I blame my ignorance and inexperience.

【The attempts at sharing visibility between eyes failed, but the author is certain it is possible.】

 
 

The problem of early-Z pass per-eye timings discrepancy/occlusion query stalling calls for better understanding. I wish I had more time to diagnose it, and the knowledge how to do it, since all the regular methods failed to pin-point it (or even detect it), and I had only started discovering xperf/ETW and GPUView.

【The per-eye early-Z timing discrepancy and occlusion query stalls deserve deeper investigation; regular methods failed even to detect them, and the author had only begun exploring xperf/ETW and GPUView.】

 
 

Runtime toggling of G-buffers is an optimization that should have made it into the PS4 port already, but again — I had lacked the knowledge and experience to devise it. On the other hand, perhaps it is only for the better that we could not take this performance margin for granted.

【Runtime G-buffer toggling should have made it into the PS4 port too; more to learn there.】

 
 


Advanced VR Rendering Performance

Author: Alex Vlachos, Valve (Alex@ValveSoftware.com)

This is part two of the talk; part one was given at GDC 2015. Video and slides from last year are free online: http://www.gdcvault.com/play/1021771/Advanced-VR

The goal throughout: the best possible VR rendering performance while preserving quality.

The talk has four parts:


Multi-GPU for VR

This part is about using GPU hardware, i.e. multiple GPUs, to scale performance.

First, a recap from last year's talk: the hidden area mesh, i.e. pixels of the final render target that can never be seen through the optics are not rendered at all.

(slides: hidden area mesh illustrations)
First, consider a single GPU doing all the work. There are several ways to schedule a single-GPU VR frame; sequential rendering (one eye after the other) is the example here.

(slide: single-GPU frame timeline: shadows, left eye, submit L, right eye, submit R, all inside the 11.11 ms vsync window)

The figure shows the single-GPU workflow for one frame; note that both eyes share the shadow buffer.

Now consider rendering with several GPUs at once. AMD's and NVIDIA's multi-GPU APIs are broadly similar and provide the following key features (see the sketch after this list):

  • Broadcast draw calls plus an affinity mask selecting which GPUs execute them
  • Independent shader constant buffers per GPU, each set separately
  • Transfers of render target data between GPUs, including asynchronous transfers that do not interrupt the destination GPU's current work (especially useful)
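Pseudo-code of the broadcast pattern (all function names here are hypothetical; the real entry points live in vendor extensions such as NVAPI or AMD AGS):

SetGpuAffinityMask(GPU0 | GPU1);                    // broadcast the following draws to both GPUs
SetConstantBufferForGpu(GPU0, LeftEyeViewProjCB);   // per-GPU constants may differ
SetConstantBufferForGpu(GPU1, RightEyeViewProjCB);
DrawSceneOnce();                                    // one submitted draw, executed on each GPU
AsyncCopyRenderTarget(GPU1, GPU0, RightEyeTarget);  // async transfer of the result to the primary GPU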

The two-GPU case:

(slide: two-GPU timeline: each GPU renders shadows plus one eye; the secondary GPU's image is transferred to GPU 0 before submit, inside the 11.11 ms window)

  • Each GPU renders what one eye sees
  • Each GPU must still render its own shadow buffer
  • The secondary GPU's result is transferred to the primary GPU, which submits the frame to the VR system
  • In practice this yields roughly a 30-35% performance gain

The four-GPU case:

(slide: four-GPU timeline: each GPU renders half of one eye; shadows are replicated on every GPU; results transfer to the primary GPU before submit)

  • Each GPU renders only half of one eye's image
  • Every GPU still computes the shadow buffers independently
  • Pixel shading cost per GPU drops to 1/4, but vertex shading runs on every GPU, so that cost stays at 1x versus a single GPU
  • Coordinating more GPUs in the driver also raises the CPU cost

Also note that because cross-GPU transfers can be asynchronous, it is worth considering how to schedule the secondary GPUs' transfers to the primary. Of the several orderings shown, the third one minimizes the final wait and was the one adopted.

(slide: three alternative transfer schedules for four GPUs)

By now it should be clear that going from one GPU to two is a substantial win, but adding further GPUs improves frame time much less: as the splittable cost (mainly pixel shading) keeps shrinking, the unsplittable, replicated cost (shadow buffer rendering and vertex-shading work) becomes the bottleneck, with every GPU repeating the same work.
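In Amdahl's-law terms (my gloss, not from the slides): if a fraction s of the single-GPU frame time t1 is replicated work (shadows, vertex shading), then

t(N) = s * t1 + (1 - s) * t1 / N

which approaches s * t1 no matter how many GPUs are added; only the pixel-shading share keeps dividing.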

The figure compares frame times for different GPU counts at the same workload.

(slide: frame-time comparison for one, two, and four GPUs at equal workload)

Conversely, the strength of multi-GPU is that the headroom can be spent on higher final image quality (where the pixel shading cost is very high) instead of lower frame time.

The figure compares the image quality achievable with different GPU counts at the same frame budget.

(slide: image-quality comparison at equal frame time)


Fixed Foveated Rendering & Radial Density Masking

This part exploits the optical properties of VR headsets to cut rendering work.

After projection, the pixel density distribution across the render target is the opposite of what we want:

  • Projection matrix: the centre of the image gets fewer samples per solid angle than the periphery
  • VR optics: the centre of the image is where you look and where you see most sharply

The result is over-rendering of the periphery.

Illustration of over-rendering:

(slides: over-rendering illustration)

Optimization: Fixed Foveated Rendering

Render according to the mask below: keep full resolution at the centre of the image and progressively reduce the number of pixels rendered toward the periphery.

(slides: fixed foveated rendering masks)

Multi-GPU rendering is recommended in this mode.

Using NVIDIA’s “Multi-Resolution Shading” we gain an additional ~5-10% GPU perf with less CPU overhead (See “GameWorksVR”, Nathan Reed, SIGGRAPH 2015)

The author then came up with a new trick:

Radial Density Masking

For the peripheral region, render only an interleaved checkerboard of 2×2-pixel quads to reduce the number of pixels shaded.

Skip rendering a checker pattern of 2×2 pixel quads to match current GPU architectures

bgt_6_18


Then fill in the pixels that were not rendered using a reconstruction filter:

bgt_6_19 (reconstruction filters: averaging the rendered neighbours vs. optimized bilinear samples; weights near to far: 0.375, 0.375, 0.125, 0.125 and 0.5, 0.28125, 0.09375, 0.09375, 0.03125)

On the left is the filter in its theoretical form; on the right, an optimized bilinear-sample weight template derived from it that can be applied directly.

To summarize the steps:

First render the interleaved 2×2-pixel quads, then apply the filter to fill in the remaining pixels; a sketch follows.
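A toy CPU-side illustration of those two steps (this is not Valve's implementation, which runs as GPU passes and uses the optimized bilinear weights above; grayscale pixels and a hard central radius are simplifications):

    #include <cmath>
    #include <vector>

    struct Image {
        int w, h;
        std::vector<float> px;                        // grayscale for brevity
        float& at(int x, int y) { return px[y * w + x]; }
    };

    // Checkerboard of 2x2-pixel quads: quads with even (qx+qy) were rendered.
    static bool rendered(int qx, int qy) { return ((qx + qy) & 1) == 0; }

    void fillSkippedQuads(Image& img, float maskRadius)
    {
        int qw = img.w / 2, qh = img.h / 2;
        for (int qy = 0; qy < qh; ++qy)
            for (int qx = 0; qx < qw; ++qx) {
                float cx = qx * 2 + 1 - img.w * 0.5f, cy = qy * 2 + 1 - img.h * 0.5f;
                if (std::sqrt(cx * cx + cy * cy) < maskRadius) continue; // center: fully rendered
                if (rendered(qx, qy)) continue;
                // Average the 4 edge-adjacent quads, which are always rendered ones.
                float sum = 0.f; int n = 0;
                const int off[4][2] = {{-1,0},{1,0},{0,-1},{0,1}};
                for (auto& o : off) {
                    int nx = qx + o[0], ny = qy + o[1];
                    if (nx < 0 || ny < 0 || nx >= qw || ny >= qh) continue;
                    sum += img.at(nx * 2, ny * 2); ++n;   // one representative texel per quad
                }
                float v = n ? sum / n : 0.f;
                for (int y = 0; y < 2; ++y)               // write back the whole 2x2 quad
                    for (int x = 0; x < 2; ++x)
                        img.at(qx * 2 + x, qy * 2 + y) = v;
            }
    }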

In the Aperture Robot Repair demo this saved 5-15% of GPU cost, and it is especially effective on low-end GPUs.


Reprojection

If your engine cannot reach the target frame rate, the VR system should produce the current frame by reprojecting the previous one.

Reprojection comes in two forms:

  • Rotation-only reprojection
  • Position & rotation reprojection

But note carefully: this kind of reprojection should be treated as the last safety net for frame rate, used only when GPU performance falls below your application's minimum spec.

Rotation-Only Reprojection

Averaging two consecutive rendered frames position-by-position produces an image with judder:

bgt_6_20


Judder has many causes, including camera movement, animation, object motion, and so on.

A big cause of judder here is that the camera is not simulated accurately enough.

First, rotation reprojection should be centered on each eye, not on the center of the head; otherwise the reprojected rotation will not match the rotation the user perceives.

bgt_6_21

Second, the distance between the eyes must be considered: if it does not match the IPD of the person wearing the headset, the rotation radius differs, and the reprojected result again disagrees with what the user feels.

bgt_6_22

Taking everything into account, though, rotation-only reprojection is good enough to use, certainly compared with dropping frames; the core remapping is sketched below.
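A minimal sketch of that remapping, using GLM for the rotations; the pinhole parameters and the looking-down-negative-Z convention are assumptions for illustration. Both quaternions must be the per-eye orientations, pivoting on the eye point as discussed above:

    #include <glm/glm.hpp>
    #include <glm/gtc/quaternion.hpp>

    // Map a view ray of the *new* eye orientation into the *old* frame's view
    // space, then project it back onto the old image plane to find where to sample.
    glm::vec2 reprojectToOldUV(const glm::quat& oldEyeRot, const glm::quat& newEyeRot,
                               const glm::vec3& newViewDir, float focalX, float focalY)
    {
        glm::vec3 world = newEyeRot * newViewDir;          // new view space -> world
        glm::vec3 old   = glm::inverse(oldEyeRot) * world; // world -> old view space
        return glm::vec2(focalX * old.x / -old.z,          // pinhole projection, -Z forward
                         focalY * old.y / -old.z);
    }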

Positional Reprojection

This is still an unsolved problem.

  • Traditional rendering keeps only one depth value per pixel, which makes reprojecting translucency a challenge, in particular positional reprojection of particle systems
  • With an MSAA depth buffer the color has already been stored, so writing depth back against it can cause color bleeding
  • User movement disoccludes content that was never rendered, and filling those holes is a challenge of its own

Asynchronous Reprojection

A new concept the author proposes: the ideal safety net.

This first requires that the GPU can be preempted at a fine granularity. Current GPUs can in theory be preempted between draw calls, but in practice it depends on what the GPU actually supports.

The big problem with the asynchronous approach is that there is no guarantee the reprojection completes within one vsync, and if it does not, it is useless.

An application that wants to use asynchronous timewarp must therefore pay attention to preemption granularity:

“You can split up the screen into tiles and run the post processing on each tile in a separate draw call. That way, you provide the opportunity for async timewarp to come in and preempt in between those draws if it needs to.” – “VR Direct”, Nathan Reed, GDC 2015

Interleaved Reprojection

Older GPUs have no preemption support and cannot do asynchronous reprojection, so we need an alternative.

If your system does not support always-on asynchronous reprojection, the OpenVR API offers every-other-frame rotation-only reprojection. In this mode the application gets roughly 18 ms to render each frame. It is a very good trade for holding frame rate:

“In our experience, ATW should run at a fixed fraction of the game frame rate. For example, at 90Hz refresh rate, we should either hit 90Hz or fall down to the half-rate of 45Hz with ATW. This will result in image doubling, but the relative positions of the double images on the retina will be stable. Rendering at an intermediate rate, such as 65Hz, will result in a constantly changing number and position of the images on the retina, which is a worse artifact.” –“Asynchronous Timewarp Examined”, Michael Antonov, Oculus blog, March, 2015
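In OpenVR the application can request this fallback explicitly. A minimal sketch; the logic that decides cantHold90 is left to the application, and a successful VR_Init is assumed:

    #include <openvr.h>

    void applyReprojectionFallback(bool cantHold90)
    {
        // Asks the compositor to run the app at half rate (45 Hz) and fill
        // the alternate frames with rotation-only reprojection.
        vr::VRCompositor()->ForceInterleavedReprojectionOn(cantHold90);
    }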


Adaptive Quality

Holding frame rate is very hard. Compared with traditional games, VR adds:

  • Fine-grained user control of the camera
  • New models of interaction between the user and the game world

The author notes that getting Robot Repair to its target frame rate was the hardest, most exhausting part of the entire project: tuning content and rendering so that the user hits 90 fps no matter where they look or what they do was the most painful work.

Adaptive quality means dynamically adjusting rendering quality to what the current GPU can sustain so that frame rate is preserved.

  • Goal #1: Reduce the chances of dropping frames and reprojecting
  • Goal #2: Increase quality when there are idle GPU cycles

First, which rendering settings can be scaled in VR:

  • Rendering resolution / viewport
  • MSAA level / anti-aliasing algorithm
  • Fixed Foveated Rendering (see part two above)
  • Radial Density Masking (see part two above)
  • Etc.

And some rendering settings should not be scaled:

  • Shadows
  • Visual effects, such as specular

An example of the quality ladder the authors use for dynamic adjustment:

bgt_6_23 (adaptive quality ladder; level 0 is the default)

  Level  MSAA  Resolution scale  Render resolution
  +6     8x    1.4               2116x2352
  +5     8x    1.3               1965x2184
  +4     8x    1.2               1814x2016
  +3     8x    1.1               1663x1848
  +2     8x    1.0               1512x1680
  +1     4x    1.1               1663x1848
   0     4x    1.0               1512x1680
  -1     4x    0.9               1360x1512
  -2     4x    0.81              1224x1360
  -3     4x    0.73              1102x1224
  -4     4x    0.65              992x1102 + Radial Density Masking

The author showed a video of the quality level switching live; the bar along the top indicates the current level.

bgt_6_24


The key to adjusting render quality automatically is measuring the GPU workload.

In a VR system the GPU workload varies, with contributions from lens distortion, chromatic aberration, chaperone bounds, overlays, etc.

We need to understand the timing of the whole VR system's frame. OpenVR exposes a total GPU timer that covers all GPU work:

bgt_6_25 (the timer starts at application rendering and ends after VR system rendering, showing the time remaining before the next VSync)

GPU timers lag behind:

  • A GPU query returns results for an earlier frame
  • The one or two frames already in the pipeline can no longer be changed either

The figure below shows the pipeline: a frame spans several VSyncs from CPU work through GPU work, so at the moment you change a setting, frames already in flight on the CPU still render with the old settings; the frame reflecting your change appears one or two frames later.

As for the first point: a query issued before the current frame has finished and been submitted reads back the contents of the render buffer, i.e., the previous frame's result.

bgt_6_26 (CPU/GPU timelines across several VSyncs: game simulation and D3D submission on the CPU, with the GPU rendering each frame during the following interval and the timer read back afterwards)
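A minimal sketch of reading that total GPU timer back through OpenVR (keeping in mind, per the lag just described, that the value describes an already-completed frame):

    #include <openvr.h>

    float lastFrameTotalGpuMs()
    {
        vr::Compositor_FrameTiming t = {};
        t.m_nSize = sizeof(vr::Compositor_FrameTiming);  // required by the API
        if (vr::VRCompositor()->GetFrameTiming(&t, 0))   // 0 = most recent frame
            return t.m_flTotalRenderGpuMs;               // application + VR system GPU work
        return -1.f;                                     // no timing available yet
    }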

The author's scheme for adjusting the render level dynamically: three rules.

Goal: keep the GPU at 70-90% utilization.

Above 90%: immediately drop two render levels.

Below 70%: raise one render level.

Prediction reaches 85% with linear growth: immediately drop two render levels; since measurements lag by about two frames, prediction is necessary.

Keeping 10% of the GPU idle is very useful: it leaves room for other processes' GPU needs and for occasional spikes from the system.

For this reason we budget each frame's render time at under 10 ms, rather than the 11.11 ms proposed last year.
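A sketch of the three rules as a controller, under stated assumptions: levels -4..+6 as in the ladder above, utilization measured against the 11.11 ms vsync, and a simple linear extrapolation standing in for the talk's prediction:

    #include <algorithm>

    struct AdaptiveQualityController {
        int   level     = 0;       // -4 (lowest) .. +6 (highest); 0 is the default
        float prevGpuMs = 0.f;
        static constexpr float kVsyncMs = 11.11f;

        void update(float gpuMs)   // fed each frame from a GPU timer query
        {
            float busy      = gpuMs / kVsyncMs;
            float predicted = (2.f * gpuMs - prevGpuMs) / kVsyncMs;  // linear growth;
                                                                     // covers the ~2-frame lag
            if (busy > 0.90f || predicted > 0.85f)
                level -= 2;        // back off hard before frames start dropping
            else if (busy < 0.70f)
                level += 1;        // creep quality back up when there is headroom
            level = std::clamp(level, -4, 6);
            prevGpuMs = gpuMs;
        }
    };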

One more thing to watch: if the resolution scalar drops too far, text becomes unreadable. So when GPU performance is very poor, we recommend enabling the Interleaved Reprojection Hint to hold frame rate instead of scaling resolution further down.

Accordingly, the Aperture Robot Repair demo offers two Adaptive Quality ladders to choose from.

bgt_6_27 (the two ladders; the default level 0 is 4xMSAA at 1.0x resolution)

  Option A: +6 to +2 use 8xMSAA at 1.4x / 1.3x / 1.2x / 1.1x / 1.0x resolution; +1 to -4 use 4xMSAA at 1.1x / 1.0x / 0.9x / 0.81x / 0.73x / 0.65x, with Radial Density Masking enabled at -4
  Option B (text-friendly): identical down to -2 (4xMSAA, 0.81x), then holds 0.81x and enables the Interleaved Reprojection Hint instead of reducing resolution further

【This part needs another look at the video.】

Another issue to keep in mind is GPU memory; it is one of the constraints behind the render-target sizes chosen for Aperture.

bgt_6_28 (render-target GPU memory, color + depth + resolve, per eye and for two eyes)

  Scalar  MSAA  Resolution   1 eye    2 eyes
  2.0     8x    3024x3360    698 MB   1,396 MB
  2.0     4x    3024x3360    388 MB   776 MB
  1.4     8x    2116x2352    342 MB   684 MB
  1.2     8x    1814x2016    251 MB   502 MB
  1.0     8x    1512x1680    174 MB   348 MB
  1.1     4x    1663x1848    117 MB   234 MB
  1.0     4x    1512x1680    97 MB    194 MB
  0.81    4x    1224x1360    64 MB    128 MB

Aperture allocates both a 1.4 8xMSAA and a 1.1 4xMSAA render target per eye, for a total of 342 MB + 117 MB = 459 MB per eye (918 MB for 2 eyes)! So we use sequential rendering to share the render target between eyes and limit the resolution scalar to 1.4x on 4 GB GPUs.

bgt_6_29 (the same memory table, shown again for the 2.0-scalar discussion)

For a 2.0 resolution scalar, we require 698 MB + 117 MB = 815 MB per eye.

Valve’s Unity Rendering Plugin

Valve uses a custom rendering plugin in Unity; the plugin will soon be released to everyone, free and open source.

The plugin is a single-pass forward renderer (because we want 4xMSAA and 8xMSAA) supporting up to 18 dynamic shadowing lights and Adaptive Quality

Decoupling CPU and GPU performance

The prerequisite is that your render thread is self-sufficient.

If the CPU has not prepared the next frame's rendering work in time, the render thread takes the previous frame's GPU workload, patches in the latest HMD pose and the current dynamic-resolution settings, and resubmits it to the GPU.

【The part here on removing animation judder needs another look at the video.】

Then you can plan to run your CPU at 1/2 or 1/3 GPU framerate to do more complex simulation or run on lower end CPUs


Summary

  • Multi-GPU support should be in all VR engines (at least 2-GPUs)
  • Fixed Foveated Rendering and Radial Density Masking are solutions that help counteract the optics vs projection matrix battle
  • Adaptive Quality scales fidelity up and down while leaving 10% of the GPU available for other processes. Do not rely on reprojection to hit framerate on your min spec!
  • Valve VR Rendering Plugin for Unity will ship free soon
  • Think about how your engine can decouple CPU and GPU performance with resubmission on your render thread

Optimizing the Unreal Engine 4 Renderer for VR

https://developer.oculus.com/blog/introducing-the-oculus-unreal-renderer/

 

For Farlands, the Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine. It’s also used in Dreamdeck and the Oculus Store version of Showdown. We’re sharing the renderer’s source as a sample to help developers reach higher quality levels and frame rates in their own applications. As of today, you can get it as an Unreal developer from https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr.

【The Oculus team wrote an experimental, fast, single-pass forward renderer for Unreal Engine and shared the source on GitHub; it is already used in Dreamdeck and other Oculus apps.】

 

Rendering immersive VR worlds at a solid 90Hz is complex and technically challenging. Creating VR content is, in many ways, unlike making traditional monitor-only content—it brings us a stunning variety of new interactions and experiences, but forces developers to re-think old assumptions and come up with new tricks. The recent wave of VR titles showcase the opportunities and ingenuity of developers.

【Rendering an immersive VR world at a solid frame rate is very challenging. Rendering VR content is not like rendering to a monitor; the new interactions change many things. For rendering, that means revisiting old technical choices: techniques that suit screen rendering do not necessarily suit VR, and several are re-examined here.】

 

As we worked, we re-evaluated some of the traditional assumptions made for VR rendering, and developed technology to help us deliver high-fidelity content at 90Hz. Now, we’re sharing some results: an experimental forward renderer for Unreal Engine 4.11.

【Our work was to reassess the value of these established techniques for VR; below are some of the results.】

 

We’ve developed the Oculus Unreal Renderer with the specific constraints of VR rendering in mind. It lets us more easily create high-fidelity, high-performance experiences, and we’re eager to share it with all UE4 developers.

【We developed a dedicated renderer for VR content that achieves more efficient results; see GitHub.】

 

Background

 

As the team began production on Farlands, we took a moment to reflect on what we learned with the demo experiences we showed at Oculus Connect, GDC, CES, and other events. We used Unreal Engine 4 exclusively to create this content, which provided us with an incredible editing environment and a wealth of advanced rendering features.

【The team built Farlands with Unreal; the content has been shown at the major events, so no details here.】

 

Unfortunately, the reality of rendering to Rift meant we’d only been able to use a subset of these features. We wanted to examine those we used most often, and see if we could design a stripped-down renderer that would deliver higher performance and greater visual fidelity, all while allowing the team to continue using UE4’s world-class editor and engine. While the Oculus Unreal Renderer is focused on the use cases of Oculus applications, it’s been retrofit into pre-existing projects (including Showdown and Oculus Dreamdeck) without needing major content work. In these cases, it delivered clearer visuals, and freed up enough GPU headroom to enable additional features or increase resolution 15-30%.

【UE4 is excellent, but its rendering can still be specialized for VR applications to gain efficiency and better image quality.】

bgt_5_1

Comparison at high resolution: The Oculus Unreal Renderer runs at 90fps while Unreal’s default deferred renderer is under 60fps.

【With forward rendering, the Oculus renderer's efficiency crushes Unreal's default deferred renderer.】

 

The Trouble With Deferred VR

 

【For background, see the notes in Base on forward vs. deferred rendering.】

 

Unreal Engine is known for its advanced rendering feature set and fidelity. So, what was our rationale for changing it for VR? It mostly came down our experiences building VR content, and the differences rendering to a monitor vs Rift.

【UE has a large feature set; the job is to pick what actually fits VR rendering.】

 

When examining the demos we’d created for Rift, we found most shaders were fairly simple and relied mainly on detailed textures with few lookups and a small amount of arithmetic. When coupled with a deferred renderer, this meant our GBuffer passes were heavily texture-bound—we read from a large number of textures, wrote out to GBuffers, and didn’t do much in between.

【At VR's higher resolutions, deferred rendering puts enormous pressure on G-buffer bandwidth.】

 

We also used dynamic lighting and shadows sparingly and leaned more heavily on precomputed lighting. In practice, switching to a renderer helped us provide a more limited set of features in a single pass, yielded better GPU utilization, enabled optimization, removed bandwidth overhead, and made it easier for us to hit 90 Hz.

【We use dynamic lighting and shadows sparingly and lean on precomputed lighting instead. In practice, switching renderers gave us a more limited feature set in a single pass, better GPU utilization, room to optimize, no G-buffer bandwidth overhead, and an easier path to 90 Hz.】

 

We also wanted to compare hardware accelerated multi-sample anti-aliasing (MSAA) with Unreal’s temporal antialiasing (TAA). TAA works extremely well in monitor-only rendering and is a very good match for deferred rendering, but it causes noticeable artifacts in VR. In particular, it can cause judder and geometric aliasing during head motion. To be clear, this was made worse by some of our own shader and vertex animation tricks. But it’s mostly due to the way VR headsets function.

【We also wanted to compare hardware-accelerated MSAA against Unreal's TAA.】

【TAA works extremely well for monitor rendering and pairs nicely with deferred rendering, but in VR it produces noticeable artifacts: judder and geometric aliasing during head motion.】

 

Compared to a monitor, each Rift pixel covers a larger part of the viewer’s field of view. A typical monitor has over 10 times more pixels per solid angle than a VR headset. Images provided to the Oculus SDK also pass through an additional layer of resampling to compensate for the effects of the headset’s optics. This extra filtering tends to slightly over-smooth the image.

【Compared with a monitor, each headset pixel covers a larger part of the field of view. The Oculus SDK also resamples the image through an extra layer to compensate for the headset optics, which smooths the final image a little more.】

 

All these factors together contribute to our desire to preserve as much image detail as possible when rendering. We found MSAA to produce sharper, more detailed images that we preferred.

【All of this argues for preserving as much image detail as possible when rendering, and we found MSAA produced the sharper, more detailed images we preferred.】

bgt_5_2

Deferred compared with forward. Zoom in to compare.

 

A Better Fit With Forward

 

Current state-of-the-art rendering often leverages screen-space effects, such as screen-space ambient occlusion (SSAO) and screen-space reflections (SSR). Each of these is well known for its realistic and high-quality visual impact, but they make tradeoffs that aren’t ideal in VR. Operating purely in screen-space can introduce incorrect stereo disparities (differences in the images shown to each eye), which some find uncomfortable. Along with the cost of rendering these effects, this made us more comfortable forgoing support of those features in our use case.

【Modern renderers lean on screen-space effects such as SSAO and SSR for quality, but these cannot be adopted directly for VR rendering.】

 

Our decision to implement a forward renderer took all these considerations into account. Critically, forward rendering lets us use MSAA for anti-aliasing, adds arithmetic to our texture-heavy shaders (and removes GBuffer writes), removes expensive full-screen passes that can interfere with asynchronous timewarp, and—in general—gives us a moderate speedup over the more featureful deferred renderer. Switching to a forward renderer has also allowed the easy addition of monoscopic background rendering, which can provide a substantial performance boost for titles with large, complex distant geometry. However, these advantages come with tradeoffs that aren’t right for everyone. Our aim is to share our learnings with VR developers as they continue fighting to make world-class content run at 90Hz.

【We chose a forward renderer designed around the factors above: MSAA, texture-heavy shaders, and no full-screen passes (which can interfere with asynchronous timewarp). It also makes monoscopic background rendering easy: distant background content is rendered once and shared by both eyes rather than rendered twice (the Oculus SDK supports this).】

 

Our implementation is based on Ola Olsson’s 2012 HPG paper, Clustered Deferred and Forward Shading. Readers familiar with traditional forward rendering may be concerned about the CPU and GPU overhead of dynamic lights when using such a renderer. Luckily, modern approaches to forward lighting do not require additional draw calls: All geometry and lights are rendered in a single pass (with an optional z-prepass). This is made possible by using a compute shader to pre-calculate which lights influence 3D “clusters” of the scene (subdivisions of each eye’s viewing frustum, yielding a frustum-voxel grid). Using this data, each pixel can cheaply determine a list of lights that has high screen-space coherence, and perform a lighting loop that leverages the efficient branching capability of modern GPUs. This provides accurate culling and efficiently handles smaller numbers of dynamic lights, without the overhead of additional draw calls and render passes.

【The implementation here is the forward+ approach; see the 2012 paper for details, and my notes comparing the three rendering paths for the basics. The rest of the paragraph is the core idea of forward+: a pre-pass selects the lights that significantly affect each cluster of pixels, and shading then considers only those lights, i.e., light culling. A toy sketch follows the figure.】

bgt_5_3

(Visualization of 3D light grid, illustrating the lighting coherence and culling)
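A toy, self-contained sketch of the lookup side of that idea (grid sizes and the binning are illustrative, not the Oculus renderer's): a compute pre-pass fills per-cluster light lists; each pixel then derives its cluster index and loops only over that short list.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    constexpr int GX = 16, GY = 8, GZ = 24;            // froxel grid resolution

    int clusterIndex(float u, float v, float viewDepth, float zNear, float zFar)
    {
        int cx = std::min(int(u * GX), GX - 1);        // u,v in [0,1) screen space
        int cy = std::min(int(v * GY), GY - 1);
        float t = std::log(viewDepth / zNear) / std::log(zFar / zNear);
        int cz = std::clamp(int(t * GZ), 0, GZ - 1);   // exponential depth slices
        return (cz * GY + cy) * GX + cx;
    }

    int main()
    {
        // Flattened per-cluster light lists, as the compute pre-pass would emit.
        std::vector<int> offset(GX * GY * GZ, 0), count(GX * GY * GZ, 0), indices;
        int c = clusterIndex(0.5f, 0.5f, 2.f, 0.1f, 100.f);
        offset[c] = 0; count[c] = 2; indices = {3, 7}; // pretend lights 3 and 7 hit it
        for (int i = 0; i < count[c]; ++i)             // the per-pixel lighting loop
            std::printf("shade with light %d\n", indices[offset[c] + i]);
    }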

 

Beyond the renderer, we’ve modified UE4 to allow for additional GPU and CPU optimizations. The renderer is provided as an unmaintained sample and not an officially-supported SDK, but we’re excited to give projects using Unreal Engine’s world-class engine and editor additional options for rendering their VR worlds.

【We made a UE4 branch; everyone is welcome to try it.】

 

You can grab it today from our Github repository as an Unreal Developer at https://github.com/Oculus-VR/UnrealEngine/tree/4.11-ofr. To see it in action, try out Farlands, Dreamdeck, and Showdown.