Category: All

DirectCompute tutorial for Unity 5: Append buffers

In today’s tutorial I will be expanding on the topic of buffers covered in the last tutorial by introducing append buffers. These buffers offer greater flexibility over structured buffers by allowing you to dynamically increase their size during run time from a compute or Cg shader.

(append buffer特点是灵活,size可变)

 

The great thing about these buffers is that they can still be used as structured buffers which makes the actually rendering of their contents simpler. Start of by creating a new shader and pasting in this code. This is what we will be using to draw the points we add into the buffer. Notice the buffer is just declared as a structured buffer.

(使用方式和standard一致,下面是shader代码)

 

 

Now lets make a script to create our buffers. Create a new C# script, paste in this code and then drag it onto the main camera in a new scene.

(下面是C#代码)

 

 

Notice this line here…

 

 

This is the creation of a append buffer. It must be of the type “ComputeBufferType.Append” unsurprisingly. Notice we still pass in a size for the buffer (the width * width parameter). Even though append buffers need to have their elements added from a shader they still need to have a predefined size. Think of this as a reserved area of memory that the elements can be added to. This also raises a subtle error that can arise which I will get to later.

(创建append buffer,这里要注意的是还是要设置一个初始大小。)

 

The append buffer starts of empty and we need to add to it from a shader. Notice this line here…

 

Here we are running a compute shader to fill our buffer. Create a new compute shader and paste in this code.

(我们用一个shader来给这个buffer填充数据)

 

 

This will fill our buffer with a position for each thread that runs. Nothing fancy. Notice this line here…

 

 

The “Append(pos)” is the line that actually adds a position to the buffer. Here we are only adding a position if the x and y dispatch id are even numbers. This is why append buffers are so useful. You can add to them on any conditions you wish from the shader.

(这里注意:Append方法就是在buffer里面加一个position。这里就是当x,y超出范围的时候采用这个方法来添加。)

 

The dynamic contents of a append buffer can cause a problem when rendering however. If you remember from last tutorial we rendered a structured buffer using this function…

(这里buffer的动态增长会导致一些问题)

 

Unity’s “DrawProcedual” function needs to know the number of elements that need to be drawn. If our append buffers contents are dynamic how do we know how many elements there are?

(Graphics.DrawProcedural方法需要提前知道buffer size,但是上面的append buffer size是不确定的。)

 

To do that we need to use a argument buffer. Take a look at this line from the script…

(因此我们用到了argument buffer)

 

Notice it has to be of the type “ComputeBufferType.DrawIndirect“, it has 4 elements and they are integers. This buffer will be used to tell Unity how to draw the append buffer using the function “DrawProceduralIndirect“.

(这个buffer会告诉unity buffer的最终大小,绘制方法要改用DrawProceduralIndirect方法)

 

These 4 elements represent the number of vertices, the number of instances, the start vertex and the start instance. The number of instances is simply how many times to draw the buffer in a single shader pass. This would be useful for something like a forest where you have a few trees that need to be drawn many times. The start vertex and instance just allow you to adjust where to draw from.

(DrawProceduralIndirect的参数分别是:instance数量,instance大小,开始顶点,开始instance。这种方式特别适合绘制森林这样的少量对象绘制多次的情况,每次只需要位置变化。)

 

These values can be set from the script. Notice this line here…

(argBuffer数值设置方法一,一般buffer填充四个值再copy给argBuffer)

 

 

Here we are just telling Unity to draw one instance of our buffer starting from the beginning. Its the first number that is the important one however. Its the number of vertices and see its set to 0. This is because we need to get the exact number of elements that are in the append buffer. This is done with the following line…

 

This copies the number of elements in the append buffer into our argument buffer. At this stages its important to make sure everything’s working correctly so we will also get the values in the argument buffer and print out the contents like so…

(第二种方式就是直接赋值,Debug.Log告诉你这四个值啥含义。)

 

Now run the scene, making sure you have bound the shader and material to the script. You should see the vertex count as 256. this is because we ran 1024 threads with the compute shader and we added a position only for even x and y id’s which ends up being the 256 points you see on the screen.

(跑结果,会得到vertex count达到了256.也就是执行了1024线程)

 

Now remember I said there was a subtle error that can arise and it has to do with the buffers size? When the buffer was declared we set its size as 1024 and now we have entered 256 elements into it. What would happen if we added more that 1024 elements? Nothing good. Appending more elements than the buffers size causes a error in the GPU’s driver. This causes all sorts of issues. The best thing that could happen is that the GPU would crash. The worst is a persistent error in the calculation of the number of elements in the buffer that can only be fixed by restarting Unity and means that the buffer will not be draw correctly with no indication as to why. Whats worse is that since this is a driver issue you may not experience the issue the same way on different computers leading to inconstant behavior which is always hard to fix.

(注意append buffer有最大限制,就是看当前支持多少线程并行,像这里大于1024就会挂。)

 

This is why I have printed out the number of elements in the buffer. You should always check to make sure the value is within the expected range.

(使用的时候最好先检查一下GPU支持情况)

 

Once you have copied the vertex count to the argument buffer you can then draw the buffer like so…

(使用了argument buffer后你就可以用下面的函数绘制结果了)

 

Your not limited to filling a compute buffer from a compute shader. You can also fill it from a Cg shader. This is useful if you want to save some pixels from a image for some sort of further post effect. The catch is that since you are now using the normal graphics pipeline you need to play by its rules. You need to output a fragment color into a render texture even if you don’t want to. As such if you find yourself in that situation it means that whatever your doing is probably better done from a compute shader. Other than that the process works much the same but like everything there are a few things to be careful of.

(另一种新的写法如下:)

 

Create a new shader and paste in this code…

 

 

Notice that the fragment shader is much the same as the compute shader used previously. Here we are just adding a position to the buffer for each pixel rendered if the uv (as the id) is a even number on both the x and y axis. Don’t ever try and create a material from this shader in the editor! The reason is that Unity will try and render a preview image for the material and will promptly crash. You need to pass the shader to a script and create the material from there.

(和原来很像,但是增加了每个pixel的buffer的position)

 

Create a new C# script and paste in the following code…

 

 

Again you will see that this is very much like the previous script using the compute shader. There are a few differences however.

(你会发现和之前的代码很像,略有区别)

 

 

Instead of the compute shader dispatch call we need to use Graphics blit and we need to bind and unbind the buffer. We also need to provide a render texture as the source destination for Graphics blit. This makes the process Pro only unfortunately.

(不再采用dispatch方法,而是采用Graphics的方法来替代。Graphics blit就是来生成render texture)

 

Attach this script to the main camera in a new scene and run the scene after binding the material and shader to the script. It should look just like the previous scene (a grid of points) but they will be blue.

(跑这个新代码会得到和原来一样的效果。)

 

 

DirectCompute tutorial for Unity 4: Buffers

In DirectCompute there are two types of data structures you will be using, textures and buffers. Last tutorial covered textures and today I will be covering buffers. While textures are good for data that needs filtering or mipmaping like color information, buffers or more suited to representing information for vertices like position or normals. Buffers can also easily send or retrieve data from the GPU which is a feature that’s rather lacking in Unity.

(你可选的数据结构除了texture,还有buffer。texture在你的数据需要filtering/mipmapping的时候特别好用,buffer适用于存储点信息比如position/normals)

 

There are 5 types of buffers you can have, structured, append, consume, counter and raw. Today I will only be covering structured buffers because this is the most commonly used type. I will try and cover the others in separate tutorials as they are a little more advanced and I need to cover some of the basics first.

(有五种类型的buffer你可以用:structured, append, consume, counter and raw,这一篇只讲structured buffer)

 

So let’s start of by creating a C# script, name it BufferExample and paste in the following code.

t003 - 01

 

We will also need a material to draw our buffer with so create a normal Cg shader, paste in the follow code and create a material out of it.

(创建一个material赋予下面的shader)

t003 - 02

 

Now to run the scene attach the script to the main camera node and bind the material to the material attribute on the script. This script has to be attached to a camera because to render a buffer we need to use the “OnPostRender” function which will only be called if the script is on a camera (EDIT – turns out you can use “OnRenderObject” if you don’t want the script attached to a camera). Run the scene and you should see a number of random red points.

(把上面的c#代码赋给主相机,代码的material挂上上面建的cameraMaterial)

t003 - 03

 

(下面讲解一下代码)

Now notice the creation of the buffer from this line.

 

The three parameters passed to the constructor are the count, stride and type. The count is simply the number of elements in the buffer. In this case 1024 points. The stride is the size in bytes of each element in the buffer. Since each element in the buffer represents a position in 3D space we need a 3 component vector of floats. A float is 4 bytes so we need a stride of 4 * 3. I like to use the sizeof operator as I think it makes the code more clear. The last parameter is the buffer type and it is optional. If left out it creates a buffer of default type which is a structured buffer.

(三个参数:elements数量,每个elements大小,类型默认就是structured buffer)

 

Now we need to pass the positions we have made to the buffer like so.

(传入数据)

 

This passes the data to the GPU. Just note that passing data to and from the GPU can be a slow processes and in general it is faster to send data than retrieve data.

 

Now we need to actually draw the data. Drawing buffers has to be done in the “OnPostRender” function which will only be called if the script is attached to the camera. The draw call is made using the “DrawProcedural” function like so…

(在OnPostRender函数里绘制数据)

 

There are a few key points here.

The first is that the materials pass must be set before the DrawProcedural call. Fail to do this and nothing will be drawn. You must also bind the buffer to the material but you only have to do this once, not every frame like I am here. Now have a look at the “DrawProcedural” function. The first parameter is the topology type and in this case I am just rendering points but you can render the data as lines, line strips, quads or triangles. You must however order them to match the topology. For example if you render lines every two points in the buffer will make a line segment and for triangles every three points will make a triangle. The next two parameters are the vertex count and the instance count. The vertex count is just the number vertices you will be drawing, in this case the number of elements in the buffer. The instance count is how many times you want to draw the same data. Here we are just rendering the points once but you could render them many times and have each instance in a different location.

(材质设置必须在call DrawProcedural 函数之前,不然绘不出来。)

(DrawProcedural函数参数:拓扑结构类型;vertex count;instance count)

 

(渲染线的效果如下:)

t003 - 04

Now for the material. This is pretty straight forward. You just need to declare your buffer as a uniform like so…

(对于material就直接声明structurebuffer来使用吧)

 

Since buffers are only in DirectCompute you must also set the shader target to SM5 like so…

(shader版本要求)

 

The vertex shader must also have the argument “uint id : SV_VertexID“. This allows you to access the correct element in the buffer like so…

(vertex shader存在SV_VertexID这个参数可以让你很容易的进入目标element)

 

Buffers are a generic data structure and don’t have to be floats like in this example. We could use integers instead. Change the scripts start function to this…

(buffer支持多种数据结构,但是比较推荐使用int,下面就是float转int后使用的例子)

 

 

and the shaders uniform declaration to this…

You could even use a double but just be aware that double precision in shaders is still not widely supported on GPU’s although its common on newer cards.

(buffer你可以使用double,但目前GPU对此支持还不够好,建议不要用)

 

You are not limited to using primitives like float or int. You can also use Unity’s Vectors like so…

(你还可以使用unity vectors数据结构!!!)

 

 

With the uniform as…

You can also create you own structs to use.  Change your scripts start function to this with the struct declaration above…

(还可以使用自定义数据结构)

 

 

and the shader to this…

 

This will draw each point like before but now they will also be random colors. Just be aware that you need to use a struct not a class.

(shader中要注意要使用struct关键字而不是class来建结构)

 

When using buffers you will often find you need to copy one into another. This can be easily done with a compute shader like so…

(buffer拷贝)

 

 

You can see that buffer1 is copied into buffer2. Since buffers are always 1 dimensional it is best done from a 1 dimension thread group.

(拷贝的时候注意维度要一样)

 

Just like textures are objects so to are buffers and they have some functions that can be called from them. You can use the load function to access a buffers element just like with the subscript operator.

(buffer使用和texture一样有其他方法使用:load)

 

Buffers also have a “GetDimension” function. This returns the number of elements in the buffer and their stride.

(buffer使用和texture一样有其他方法使用:getdimensions)

 

 

 

 

 

DirectCompute tutorial for Unity 3: Textures

The focus of days tutorial is on textures. These are arguably the most important feature when using DirectCompute. It is highly likely that every shader you write will use at least one texture. Unfortunately render textures in Unity are Pro only so this tutorial covers Pro only topics. The good new is that this will be the only Pro only tutorial. Textures in DirectCompute are very simple to use but there are a few traps you can fall into. Lets start with something simple. Create a compute shader and paste in the following code.

(使用简单,但是注意只有unity pro版本支持此功能)

 

(kernel例子如下)

t002-01

 

And then a C# script called TextureExample and paste in the following code.

t002-02

 

Attach the script, bind the shader and run the scene. You should see a texture with the uv’s displayed as colors. This is what we have written into the texture with the compute shaders.

t002-03

 

Now notice the kernel argument “uint2 id : SV_DispatchThreadID” from last tutorial. This is the threads position in the groups of threads and just like with buffers we need this to know what location to write the results into the texture, like this “tex[id] = result“. This time however we don’t need the “flattened” index like with buffers. We can just use the uint2 directly. This is because unlike buffers textures can be multidimensional. We have a 2D thread group and a 2D texture.

(使用2D线程组,2D纹理的时候,uint2就够用了)

 

Now look at the declaration of the texture variable “RWTexture2D<float4> tex;“. The “RWTexture2D” part is important. This is obviously a texture but whats with the RW? This declares that the texture is of the type “unordered access view”. This just means that you can write to any location in the texture from the shader. You may be able to write to the texture but you can not read from it.  Remove the RW part and its just a normal texture but you cant write to it. Just remember that you can only read from a Texture2D and only write to RWTexture2D.

(RWTexture2D和Texture2D区别:RW代表你可以在这个shader里面写这张texture的任意位置但是不能读,没有RW就是普通的Texture只能读对应位置。)

 

Now lets look at the script. Notice how the texture is created.

 

There are two important things here. The first is the “enableRandomWrite”. You must set this to true if you want to write into the texture. This basically says that the texture can have unordered access views. If you don’t do this nothing will happen when you run the shader and Unity wont give you a error. It will just fail for no apparent reason. The second is the “Create” function call. You must call create on the texture before you write into it. Again, if you don’t nothing will happen and you wont get a error. It just wont work. If you are use to writing into a texture with graphics blit then you may notice you don’t have to call create. This is because graphics blit checks if the texture is created and creates it if its not. The dispatch function cant do this because it does not know what textures are being written into when it is called.

(enableRandomWrite:表示你可以写texture。然后你必须先调用create方法然后才能使用这张texture。)

 

Also note that the texture is released in the “OnDestroy” function. When you are finished with your render textures make sure you release them.

(纹理需要在OnDestroy方法里面释放)

 

Now lets look at the dispatch call.

 

Remember from the last tutorial that this is where we set how many groups to run. So why are the number of groups to run the texture size divide by 8? Look at the shader. You can see we have 8 threads per group running from the “[numthreads(8,8,1)]” line. Now we need a thread for each pixel in the texture to run. So we have a texture 64 pixels wide and if we divide the pixels by the number of threads per group we get the number of groups we will need. We end up with 8 groups of 8 threads which is 64 threads in total in the x dimension and its the same for the y dimension. This gives us a total of 4096 threads running (64 * 64) which is the number of pixels in the texture.

(线程解释,同上一篇,这边不再细说)

 

Now have a look at this part of the script.

 

This is where we are setting the uniforms for the shader. We have to set the texture we are writing into and we also need the textures width and height so we can calculate the uv’s from the dispatch id. Its common to pass variables into a shader like this but in this case we don’t need the width and height. We can get it from within the shader. Change your shader to this…

(设置shader全局量,这里包括纹理的宽高和纹理,其实可以从传入的texture来获得,不需要设置,例子如下:)

t002-04

 

and remove these two lines from your script…

 

t002-05

 

Run the scene. It should look the same. Notice the “tex.GetDimensions(w, h);” line. Textures are objects. This means they have functions you can call from them.

(得到相同结果,GetDimensions函数的作用就是获得texture的维度)

 

In this case we are asking for the textures dimensions. Textures have a number of functions with a number of overloads you can call. I will go through the most common and how to use them but first we need to change the scene a bit. What we want to do now is copy the contents of our texture into another texture and then display that result.

(texture相关的多种使用方式举例:)

t002-07

 

And change the script to this…

t002-08

 

Now bind the new shader you have created to the “shaderCopy” attribute and run the scene. The scene should look the same as before. What we are doing here is to fill the first texture with its uv’s as a color like before and then we are copying the contents to another texture. This is so I can demonstrate the many ways to sample from a texture in a shader. Notice this line “float4 t = tex[id];” in the copy shader. This is the simplest way to sample from a texture. Just like the dispatch id is the location you want to write to it is also the location you want to read from. You can see here that we can sample from a texture just like it was a array.

(texture拷贝的shader实现)

 

There are other ways of doing this. For example…

 

Here we are accessing the mipmaps of the texture instead. Level 0 would be the first mipmap, the one with the same size as the texture. The next dimension in the mipmap array is the location to sample at using the dispatch id. Just note that we have not enabled mipmaps on the texture so the result of sampling levels other than 0 has no effect.

(采用mipmap的值,0代表不压缩的一层)

 

We can do the same with the textures load function.

(相同功能,不同写法)

 

In this case the uint3’s x and y value is the location to sample at and the z value (the 0) is the mipmap level. Both these examples work the same.

 

There is one feature of textures that you will be using a lot. The textures ability to filter and wrap. To do this you need to use a sampler state. If you are use to using HLSL out side of Unity you may know that you need to create a sampler object to use. This works a bit differently in Unity. Basically you have two choices for filtering (Linear or Point) and two for wrap (Clamp or Repeat). All you need to do is declare a sampler state in the shader with the word Linear or Point in the name and the word Repeat or Clamp in the name. For example you could use the name myLinearClamp or aPointRepeat, etc. I prefer a underscore then the name. Change you shader to this.

(采用texture的filter和warp功能,你需要用到sampler state,下面是例子)

 

 

If you run the scene it should still look the same. Notice this line “float4 t = tex.SampleLevel(_LinearClamp, uv, 0);“. Here we are using the textures SampleLevel function. The function takes a sampler state, the uv’s and a mipmap level. The uv’s need to be normalized, that is have a range of 0 to 1. Notice the SamplerState variables at the top. If you are using sampler states you are mostly likely wanting to have bilinear filtering. If that’s the case use the _LinearClamp or _LinearRepeat sampler state.

(注意sampleLevel的使用方式,你还是会看到相同的结果)

(sample方法的定义与HLSL一样,建议参考学习HLSL的texture取sample的方法就知道了,作用主要是定义超出texture边界的区域如何取sample,比如clamp意思是整体扩充重复,repeat就是边界点重复)

 

If you are use to using HLSL in a fragment shader (as opposed to a compute shader here) you may know that you can use this function for filtering.

(HLSL的写法在这里是不可用的!)

 

Notice its called Sample not SampleLevel and the mipmap parameter is gone. If you try to use this in a compute shader you will get a error because this function does not exists. The reason why is surprisingly complicated and gives a insight into how the GPU works. Behind the scenes fragments shaders (or any shader) work in much the sample why as a compute shader as they share the same GPU architecture. They run in threads and the threads are arranged into thread groups. Now remember that threads in a group can share memory. Fragment shaders always run in a group of at least 2 by 2 threads. When you sample from a texture the fragment shader checks what its neighbors uv’s are. From this it can work out what the derivatives of the uv’s are. The derivatives are just a rate of change and in areas where there is a high rate of change the textures higher mipmaps are used and in areas of low rate of change the lower mipmaps are used. This is how the GPU reduces aliasing problems and it also has the handy byproduct of reducing memory bandwidth (the higher mipmaps are smaller).

(为了统一和兼容,以及使用mipmap)

 

All the examples I have given are in 2D but the same principles apply in 3D. You just need to create the texture a little differently.

(上面都是2D的例子,下面举3D的例子)

 

This creates a 3D texture that is 64*64*64 pixels. Notice the “volumeDepth” is set to 64 and the “isVolume” is set to true. Remember to set the number of groups and threads to the correct values. The dispatch id also needs to be a uint3 or int3;

(创建一个3D纹理,执行3D shader)

 

and…

 

That about covers it for textures. Next time I look at buffers. How to use the default buffer, the other buffer types, how to draw data from your buffers and set/get your buffer data.

 

(最后补充一下与上面译文无关的就是:shader里面其实定义采用的是rendertexture, texture2D, texture3D的父类texture。因此在shader里面这些只要是texture子类的纹理都是可以直接使用的。这点很重要,你可以采用导入的texture来在compute shader里面做计算。)

DirectCompute tutorial for Unity 2: Kernels and thread groups

Today I will be going over the core concepts for writing compute shaders in Unity.

 

At the heart of a compute shader is the kernel. This is the entry point into the shader and acts like the Main function in other programming languages. I will also cover the tiling of threads by the GPU. These tiles are also known as blocks or thread groups. DirectCompute officially refers to these tiles as thread groups.

【基本的计算单元是 kernel】

 

To create a compute shader in Unity simply go to the project panel and then click create->compute shader and then double click the shader to open it up in Monodevelop for editing. Paste in the following code into the newly created compute shader.

【unity 创建kernel】

t001 - 01

 

This is the bare minimum of content for a compute shader and will of course do nothing but will serve as a good starting point. A compute shader has to be run from a script in Unity so we will need one of those as well. Go to the project panel and click Create->C# script. Name it KernelExample and paste in the following code.

【C# 代码来执行kernel】

t001 - 02

 

Now drag the script onto any game object and then attach the compute shader to the shader attribute. The shader will now run in the start function when the scene is run. Before you run the scene however you need to enable dx11 in Unity. Go to Edit->Project Settings->Player and then tick the “Use Direct3D 11” box. You can now run the scene. The shader will do nothing but there should also be no errors.

【unity场景中如何使用】

t001 - 03

 

In the script you will see the “Dispatch” function called. This is responsible for running the shader. Notice the first variable is a 0. This is the kernel id that you want to run. In the shader you will see the “#pragma kernel CSMain1“. This defines what function in the shader is the kernel as you may have many functions (and even many kernels) in one shader. There must be a function will the name CSMain1 in the shader or the shader will not compile.

【解释一下Dispatch函数就是来执行computeshader的,一个computeshader可以包含很多kernel,可以按照kernel名字取用。】

 

Now notice the “[numthreads(4,1,1)]” line. This tells the GPU how many threads of the kernel to run per group. The 3 numbers relate to each dimension. A thread group can be up to 3 dimensions and in this example we are just running a 1 dimension group with a width of 4 threads. That means we are running a total of 4 threads and each thread will run copy of the kernel. This is why GPU’s are so fast. They can run thousands of threads at a time.

【numthreads这个用来标识逻辑的X*Y*Z的数据结构与线程的对应关系,三个值积就是一次执行这个kernel的线程总数(线程组)】

 

Now lets get the kernel to actually do something. Change the shader to this…

t001 - 04

 

and the scripts start function to this…

t001 - 05

 

Now run the scene and you should see the numbers 0, 1, 2 and 3 printed out.

 

Don’t worry too much about the buffer for now. I will cover them in detail in the future but just know that a buffer is a place to store data and it needs to have the release function called when you are finished with it.

【buffer先不用管后面讲】

 

Notice this argument added to the CSMain1 function “int3 threadID : SV_GroupThreadID“. This is a request to the GPU to pass into the kernel the thread id when it is run. We are then writing the thread id into the buffer and since we have told the GPU we are running 4 threads the id ranges from 0 to 3 as we see from the print out.

【SV_GroupThreadID用到的GPU线程标识,这个computeshader结果就是写出这个标识】

 

Now those 4 threads make up whats called a thread group. In this case we are running 1 group of 4 threads but you can run multiple groups of threads. Lets run 2 groups instead of 1. Change the shaders kernel to this…

【可以采用不止一组GPU线程来跑这个kernel】

t001 - 06

 

and the scripts start function to this…

t001 - 07

 

Now run the scene and you should have 0-3 printed out twice.

 

Now notice the change to the dispatch function. The last three variables (the 2,1,1) are the number of groups we want to run and just like the number of threads groups can go up to 3 dimensions and in this case we are running 1 dimension of 2 groups. We have also had to change the kernel with the argument “int3 groupID : SV_GroupID” added. This is a request to the GPU to pass in the group id when the kernel is run. The reason we need this is because we are now writing out 8 values, 2 groups of 4 threads. We now need the threads  position in the buffer and the formula for this is the thread id plus the group id times the number of threads ( threadID.x + groupID.x*4 ).

【上面的做法是执行两组线性的kernel操作的概念】

 

This is a bit awkward to write. Surely the GPU knows the threads position? Yes it does. Change the shaders kernel to this and rerun the scene.

【但是这样的写法对于kernel来说,外面用多少组是不透明的,不好,改进如下】

t001 - 08

 

The results should be the same, two sets of 0-3 printed. Notice that the group id argument has been replaced with “int3 dispatchID : SV_DispatchThreadID“. This is the same number our formula gave us except now the GPU is doing it for us. This is the threads position in the groups of threads.

【结果是一样的,但是shader使用的组数对sheder来说透明了】

 

So far these have all been in 1 dimension. Lets step thing up a bit and move to 2 dimensions and instead of rewriting the kernel lets just add another one to the shader. Its not uncommon to have a kernel for each dimension in a shader performing the same algorithm. First add this code to the shader below the previous code so there are two kernels in the shader.

【下一步:使用2D kernel来替代执行1D kernel两遍】

t001 - 09

 

and the script to this…

t001 - 10

 

Run the scene and you will see a row printed from 0 to 7 and the next row 8 to 15 and so on to 63.

t001 - 11

 

Why from 0 to 63? Well we now have 4 2D groups of threads and each group is 4 by 4 so has 16 threads. That gives us 64 threads in total.

 

Notice what value we are out putting from this line “int id = dispatchID.x + dispatchID.y * 8“. The dispatch id is the threads position in the groups of threads for each dimension. We now have 2 dimension so we need the threads global position in the buffer and this is just the dispatch x id plus the dispatch y id times the total number of threads in the first dimensions (4 * 2). This is a concept you will have to be familiar with when working with compute shaders. The reason is that buffers are always 1 dimensional and when working in higher dimension you need to calculate what index the result should be written into the buffer at.

【线程解释:[numthreads(4,4,1)]表示一个group16个线程,computershader.Dispatch(kernel, 2, 2, 1);表示开启了2维的4个线程组,因此一共64个线程。】

 

The same theory applies when working with 3 dimensions but as it gets fiddly I will only demonstrate up to 2 dimensions. You just need to know that in 3 dimensions the buffer position is calculated as “int id = dispatchID.x + dispatchID.y * groupSizeX + dispatchID.z * groupSizeX * groupSizeY” where group size is the number of groups times number of threads for that dimension.

【相同的概念扩展到3维】

 

You should also have a understanding of how the semantics work. Take for example this kernel argument…

【你还需要了解相关语义】

 

int3 dispatchID : SV_DispatchThreadID

 

SV_DispatchThreadID is the semantic and tells the GPU what value it should pass in for this argument. The name of the argument does not matter. You can call it what you want. For example this argument works the same as above.

【前面是变量名,随便改,下面的例子就是和上面功能一样。后面是标识语义,告诉GPU是什么意义的值】

 

Also the variable type can be changed. For example…

See the int3 has been changed to int. This is fine if you are only working with 1 dimension. You could also just use a int2 for 2 dimensions and you could also use a unsigned int (uint) instead of a int if you choose.

【int3可表示三维,int使用在一维的情况下是ok的】

 

Since we now have two kernels in the shader we also need to tell the GPU what kernel we want to run when we make the dispatch call. Each kernel is given a id in the order they appear. Our first kernel would be id 0 and the next is id 1. When the number of kernels in a shader becomes larger this can become a bit confusing and its easy to set the wrong id. We can solve this by asking the shader for the kernels id by name. This line here “int kernel = shader.FindKernel (“CSMain2”);” gets the id of kernel “CSMain2“. We then use this id when setting the buffer and making the dispatch call.

【一个computeshader多个kernel的情况下dispatch方法需要用ID来标识使用的kernel,可以使用FindKernel 来标识。】

 

About now you maybe thinking that this concept of groups  of threads is a bit confusing. Why cant I just use one group of threads? Well you can but just know that there is a reason that threads are arranged into groups by the GPU. For a start a thread group is limited by the number of threads it can have ( defined by the line “[numthreads(x,y,z)]” in the shader). This limit is currently 1024 but may change with new hardware. For example you can have a maximum of “numthreads(1024,1,1)” for 1D, “numthreads(32,32,1)” for 2D and so on. You can however have any number of groups of threads and as you will often be processing data with millions of element the concept of thread groups is essential. Threads in a groups can also share memory and this can be used to make dramatic performance gains for certain algorithms but I will cover that in a future post.

【可以一直只用一个线程一个组,但是线程组共享内存而且性能可以显著提升,另外要注意最大总和不要超过1024个线程】

 

Well I think that about covers kernels and thread groups. There is just one more thing I want to cover. How to pass uniforms into your shader. This works the same as in Cg shaders but there is no uniform key word. For the most part this relatively simple but there are a few “Gotcha’s” so I will briefly go over it.

【最后一点:如何传全局量给shader】

 

For example if you want to pass in a float you need this line in the shader…

【数值的例子】

and this line in your script…

To set a vector you need this in the shader…

and this in the script…

You can only pass in a Vector4 from the script but your uniform can be a float, float2, float3 or float4. It will be filled with the appropriate values.

 

Now here’s where it gets tricky. You can pass in arrays of values. Note that this first example wont work. I will explain why.  You need this line in your shader…

【数组的例子】

and this in your script…

Now this wont work. Whether this is by design or a bug in Unity I don’t know. You need to use vectors as uniforms for this to work. In your shader…

and your script…

So here we have a array of two float4’s and it is set from a array of 8 floats from a script. The same principles apply when setting matrices. In your shader…

and your script…

And of course you can have arrays of matrices. In your shader…

and your script…

This same logic does not seem to apply to float2x2 or float3x3. Again, whether this is a bug or design I don’t know.

DirectCompute tutorial for Unity 1: Introduction

unity的GPGPU的使用方式computeshader真的非常让人震惊,代码已经不再向openCL,CUDA一样的难读难写难于维护,整体上已经非常好用了。
在这个基础上,等待显卡这两年的跨越式发展后,游戏中采用GPU的并行计算能力会变得非常普及。
这样以后对于原来CPU上面的所有的for/while/foreach等操作都可以进行GPU封装,让用户可以无意识的,或者简单选择的使用GPU并行计算能力。

原文地址:https://scrawkblog.com/category/directcompute

下面开始是教程正文翻译,不想看英文只需看中文总结就可以:


At this time Unity just started to support Microsofts DirectX 11 and with DirectX 11 came the DirectCompute API. This API opens up a whole new way to use the GPU by writing compute shaders.

【Unity GPGPU 很好用,但是没有教程让使用者非常难受,因此作者准备搞一发。】

 

I hope to start of with a introduction about why DirectCompute was needed and how it differs to the traditional graphics pipeline, how to write the kernels via compute shaders, how the internal GPU tiling of threads works, how to access textures, how to use the various buffers types, how to use thread synchronization and shared memory and final how to maximize performance.

 

The Graphics pipline

 

All developers would need to do was send the geometry that they needed to render to the GPU and OpenGL would pass it through the pipeline until the end output was pixels displayed on the screen. This pipeline was fixed and although developers could enable and disable certain parts as well as adjust some settings there was not a lot that could be changed.  For a time this was all that was needed but as developers started to push the GPU further and demand more advanced features it became apparent that this pipeline had to become more flexible. The solution was to make certain parts of the pipeline programmable. Developers could now write there own programs to perform certain parts of the pipeline. These new programs became know as shaders (there is a technical distinction between programs and shaders but that’s another story).

【shader作用】

 

This new programmable pipeline opened up whole new possibilities and without it we would not have the quality of graphic you see in the previous generation of games.  While the pipeline was originally only developed for creating graphics the new flexibility meant that the GPU could now be used to process many types of algorithms. Researches quickly began to modify algorithms to run in the multi-threaded environment of the GPU and areas as diverse as physics, finance, mathematics, medicine and many more began to use the GPU to process their  data. The raw power of the GPU was too hard to resist and GPGPU or General Purpose Graphical Processing Unit programming was born.

【发展出GPGPU】

 

General Purpose Graphical Processing Unit programming

 

GPGPU quickly began to go mainstream and enter industrial use. The was still one problem however. Graphics API’s were still tied down to the graphics pipeline. It didn’t matter how much flexibility shaders gave you over a certain stage of the pipeline they still had to conform to the restrictions of the pipeline. Vertex shaders still have to output vertices and fragment shaders still have to output pixels. While great strides and creative work arounds were developed it soon became apparent that things had to change if GPGPU was to develop further. A API was needed that would free developers from the shackles of the graphics pipeline and provide a environment where the raw power of the GPU could be harnessed in a non-graphics related setting.

【VS,PS的结构限制了GPGPU的使用方式】

 

Over the next few years GPGPU API’s started to appear and currently developers are spoilt for choice between CUDA, OpenCL and DirectCompute. These new API’s presented a whole new way to work with the GPU and are no longer bound to the traditional graphics pipeline.

【GPGPU API出现解决上面的问题】

 

While the desire for GPGPU was to provide a way to work with the GPU in a non-graphics related setting the games industry were naturally just as eager to make use of the new capabilities. As games have become more realistic they have started to simulate real world physics.  These computations are often complicated and demanding to process. The power and flexibility offered by GPGPU has allowed these computations to be performed in ever greater detail and scope. The origin graphics API’s opened up a whole new world in 3D games which we now all take for granted. The new GPGPU API’s are now doing the same thing and bringing in a whole new era in gaming. This current generation of gaming will be dominated by the uses of these API’s resulting in ever more realistic graphics and believable physics.

【开始在游戏中考虑GPGPU】

 

All you need to do now is learn how to use them. Stay tuned for the next part which will show you how to set up the kernels and how the tiling of threads works.