At this time Unity just started to support Microsofts DirectX 11 and with DirectX 11 came the DirectCompute API. This API opens up a whole new way to use the GPU by writing compute shaders.
【Unity GPGPU 很好用，但是没有教程让使用者非常难受，因此作者准备搞一发。】
I hope to start of with a introduction about why DirectCompute was needed and how it differs to the traditional graphics pipeline, how to write the kernels via compute shaders, how the internal GPU tiling of threads works, how to access textures, how to use the various buffers types, how to use thread synchronization and shared memory and final how to maximize performance.
The Graphics pipline
All developers would need to do was send the geometry that they needed to render to the GPU and OpenGL would pass it through the pipeline until the end output was pixels displayed on the screen. This pipeline was fixed and although developers could enable and disable certain parts as well as adjust some settings there was not a lot that could be changed. For a time this was all that was needed but as developers started to push the GPU further and demand more advanced features it became apparent that this pipeline had to become more flexible. The solution was to make certain parts of the pipeline programmable. Developers could now write there own programs to perform certain parts of the pipeline. These new programs became know as shaders (there is a technical distinction between programs and shaders but that’s another story).
This new programmable pipeline opened up whole new possibilities and without it we would not have the quality of graphic you see in the previous generation of games. While the pipeline was originally only developed for creating graphics the new flexibility meant that the GPU could now be used to process many types of algorithms. Researches quickly began to modify algorithms to run in the multi-threaded environment of the GPU and areas as diverse as physics, finance, mathematics, medicine and many more began to use the GPU to process their data. The raw power of the GPU was too hard to resist and GPGPU or General Purpose Graphical Processing Unit programming was born.
General Purpose Graphical Processing Unit programming
GPGPU quickly began to go mainstream and enter industrial use. The was still one problem however. Graphics API’s were still tied down to the graphics pipeline. It didn’t matter how much flexibility shaders gave you over a certain stage of the pipeline they still had to conform to the restrictions of the pipeline. Vertex shaders still have to output vertices and fragment shaders still have to output pixels. While great strides and creative work arounds were developed it soon became apparent that things had to change if GPGPU was to develop further. A API was needed that would free developers from the shackles of the graphics pipeline and provide a environment where the raw power of the GPU could be harnessed in a non-graphics related setting.
Over the next few years GPGPU API’s started to appear and currently developers are spoilt for choice between CUDA, OpenCL and DirectCompute. These new API’s presented a whole new way to work with the GPU and are no longer bound to the traditional graphics pipeline.
While the desire for GPGPU was to provide a way to work with the GPU in a non-graphics related setting the games industry were naturally just as eager to make use of the new capabilities. As games have become more realistic they have started to simulate real world physics. These computations are often complicated and demanding to process. The power and flexibility offered by GPGPU has allowed these computations to be performed in ever greater detail and scope. The origin graphics API’s opened up a whole new world in 3D games which we now all take for granted. The new GPGPU API’s are now doing the same thing and bringing in a whole new era in gaming. This current generation of gaming will be dominated by the uses of these API’s resulting in ever more realistic graphics and believable physics.
All you need to do now is learn how to use them. Stay tuned for the next part which will show you how to set up the kernels and how the tiling of threads works.