DirectX10 Tutorial 9: The Geometry Shader

This tutorial is going to cover the basics of using the geometry shader stage present in DX10+. The geometry stage is extremely useful for rendering sprites, billboards and particle systems. This is the first part of a three-part series which will cover geometry shaders, billboards and particle systems.

The Geometry Shader

The geometry shader stage was introduced in DX10, and people initially assumed that it would be useful for tessellation purposes (which is true), but it's even more useful for particle systems and sprite rendering. The geometry stage sits between the vertex and pixel shader stages, and its primary use is creating new primitives from existing ones.

Just to recap: vertices are sent to the vertex shader within a vertex buffer that is stored on the GPU. A draw call issued to the API sends a vertex buffer down the pipeline. Each vertex first enters the vertex shader, where it is transformed and its vertex data is modified as necessary. Once vertices have been processed and output by the vertex shader, they get combined into primitives during the primitive-setup stage of the API. The type of primitive created from the vertices sent through the vertex buffer depends on the primitive topology set (points, lines or triangles). Normally, once a primitive is constructed, it moves on to the screen-mapping and fragment-generation (triangle-to-pixel conversion) stages before reaching the pixel shader stage and finally being drawn to the screen.

The geometry shader (GS) is an optional shader stage that sits between the vertex shader and the primitive-construction stage, or simply between the vertex and pixel shader stages in a high-level view. Without a GS present, primitives that exit the vertex shader enter the primitive-construction stage, after which they move to the screen-mapping stage (figure 1a). When a GS is present, vertices are sent from the vertex shader into the GS prior to primitive construction. The GS takes a whole primitive as input; this means the number of vertices the GS accepts from the vertex shader is determined by the primitive topology set. If the topology is set to points, the GS accepts only a single vertex, while for lines and triangles it accepts 2 and 3 vertices respectively. The GS then uses the vertex data of the input primitive to create new vertices, which form new primitives. These newly created vertices are output by the GS as a vertex stream and sent through the primitive-construction stage before finally continuing through the pipeline (figure 1b). In summary, the GS takes whole primitives as input and outputs whole primitives as a list of vertices.

Figure 1. The geometry shader as part of the graphical pipeline.

Now it may not seem very useful to send primitives into the GS just to get primitives out, but it becomes much more useful when you consider that the GS can output multiple primitives from a single input primitive. This means that we can input a single triangle into the GS and get 4 triangles out (which was the basis for performing basic tessellation with the geometry shader, see figure 2). There is no correlation between the input primitive type and the output primitive type in the GS, so you can easily input a line and output triangles. In this tutorial, we are going to examine the GS for use in rendering sprites, as this is a simple example that highlights the benefits of the GS. A sprite is a 2D textured quad which is rendered in screen space (homogeneous clip space) so that it is always flattened against the screen. Sprites are heavily used in games to render text, cursors and GUIs.

Figure 2. Primitive construction in the GS


So let's just discuss sprites a little before we move onto the geometry shader. As mentioned, a sprite is simply a textured quad. Sprite dimensions and position are usually controlled by two sets of values: the anchor point position and the sprite dimensions. The anchor point is a point on the sprite that is used to position it, while the dimensions give the width and height of the sprite. The most common anchor point used for sprites at the moment is the top-left vertex of the quad (see figure 3a). So for the most basic of sprites we need these two sets of values, and since sprites are positioned in screen space these values are 2-dimensional (x, y) and are measured in pixels. For example, if we want a 200×100 pixel banner at the top middle of our 1024×768 game, the anchor point would be at (412px, 0px) while the width and height would be 200px and 100px respectively. Most sprites usually have additional properties such as an opacity level and a z-order (a pseudo-depth value). So a basic sprite struct might be defined as shown in figure 3b. Now it is important to note that even though I said the sprites are positioned using pixel values, in actual fact they are positioned in clip space; the pixel coordinates will need to be converted to homogeneous clip-space coordinates (which is very simple), but more on that later. For now, just remember that the final coordinates sent down the pipeline need to be in homogeneous clip space.

Figure 3. Sprite construction and data representation.

Okay, so now we want to render this struct. Normally we would need two triangles to render a quad, using a triangle strip to reduce the number of vertices we need. Even so, this means we need to send 4 vertices for each sprite. Each vertex will contain a 2D clip-space position and 2D texture coordinates, as well as the sprite properties such as opacity and z-order. Assuming float values are used, each vertex will take up 24 bytes of memory, and each sprite will take up 96 bytes. Now an average GUI will be made up of numerous sprites, so let's just guesstimate that an average GUI uses around 100 sprites. This means that to render the GUI, 9.375 KB needs to be sent to the GPU from the CPU each frame. The transfer of data across the PCIe bus is rather slow, so reducing the amount of data transferred per frame will help improve performance. Granted, 9 KB isn't a lot, but consider the fact that particle systems are usually rendered using billboards (which are similar to sprites in that they are just textured quads), and a single particle emitter might be responsible for several thousand particles. In such cases the memory costs add up. Furthermore, it is expensive to have to generate 4 vertices for each sprite on the CPU each frame.

Now since we know we can create new primitives from existing ones in the GS, we can use this to our advantage and build the quad inside the GS. Optimally, we'd like to send our sprite struct straight to the GS and have it create the two triangles for us. Well, you know what, we can do exactly that! Vertices are basically just structs, so we will create a vertex for each sprite struct containing all the necessary sprite data. What data do we need to send to the GS? Pretty much just the anchor point and the dimensions, as well as the sprite properties. Texture coordinates can be calculated in the GS, as will be shown later. The size of our sprites is now 24 bytes (a 75% reduction in size!).

But how will we send these vertices down the pipeline so that the GS only gets a single sprite’s data? Well, what primitive uses only a single vertex? A point. If we create a vertex buffer containing the sprite data and send it down the pipeline as a point list, then the GS will receive a single vertex each time from which we can now create our sprites. Before we get to the actual sprite creation, let’s discuss the technical details of the GS.
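In D3D10 terms, that just means binding the sprite buffer and setting the point-list topology before drawing; a hedged sketch (the device and buffer names are illustrative, not necessarily the demo's exact code):

```cpp
// Bind the sprite vertex buffer and draw it as a point list:
// each point becomes one GS invocation, i.e. one sprite.
UINT stride = sizeof(SpriteVertex);
UINT offset = 0;
pD3DDevice->IASetVertexBuffers(0, 1, &pVertexBuffer, &stride, &offset);
pD3DDevice->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_POINTLIST);
pD3DDevice->Draw(numSprites, 0);
```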

Programming the Geometry Shader

As mentioned above, the geometry shader (GS) takes a single primitive as its input. This primitive is presented to the GS as an array of its vertices (1 vertex for a point, 2 vertices for a line segment and 3 vertices for a triangle). The GS then outputs primitives as a stream of vertices. There are three types of vertex streams that can be output: point streams, line streams and triangle streams. For line and triangle streams, the output topology is always a strip, so the output vertices are constructed as if they were line strips or triangle strips. The GS specifies both the input format and the output stream type in its declaration. The GS also requires an upper limit on the number of vertices that it can return (I'm assuming so that it can pre-allocate a vertex buffer for the output vertex stream). This is done as follows:

[maxvertexcount(4)]
void GS( point GS_INPUT particles[1], inout TriangleStream<PS_INPUT> triStream )

The maxvertexcount attribute defines the upper bound on the number of vertices in the output stream. The first argument specifies the type and number (determined by the primitive topology) of input vertices that the GS accepts from the vertex shader before starting to execute. The second argument specifies the type of output stream returned as well as the vertex type stored in the output stream. To output a vertex from the GS, vertices are appended to the output stream using the stream's Append function:

	triStream.Append( v );

Since the output stream is a strip for both lines and triangles, it may be necessary to end the current strip and start a new one. This is useful when trying to render two unconnected triangles that cannot be rendered using a single triangle strip (e.g. the triangles output in figure 2 require two triangle strips for them to be rendered). Restarting a strip is done with the RestartStrip function (simple, hey?). For example, to render two unconnected triangles you would do the following (assume that the coordinates in v1-v6 are set correctly):

[maxvertexcount(6)]
void GS( point GS_INPUT input[1], inout TriangleStream<PS_INPUT> triStream )
{
	PS_INPUT v1,v2,v3,v4,v5,v6;

	//first triangle
	triStream.Append(v1);
	triStream.Append(v2);
	triStream.Append(v3);

	//end the current strip so the next three vertices form a separate triangle
	triStream.RestartStrip();

	//second triangle
	triStream.Append(v4);
	triStream.Append(v5);
	triStream.Append(v6);
}

And that's all there is to the GS! The GS outputs vertices in homogeneous clip space just like the vertex shader, so if you are outputting 3D primitives, don't forget to multiply with your world and projection matrices (this will be covered in detail in the next tutorial on billboards).

A Geometry Shader Example: Sprite Rendering

So let's come back to our simple example of rendering sprites. We create the sprite data in our application and store it within a vertex buffer as follows. I've left out all the necessary steps needed to create the input layout and so on (please look at the source code for more details).

SpriteVertex verts[3];
numSprites = 3;

//header is positioned at 200, 0 with dimensions 400x42 (on an 800x600 screen)
verts[0].topLeft[0] = convertPixelsToClipSpace(800,200);
verts[0].topLeft[1] = -convertPixelsToClipSpace(600,0);
verts[0].dimensions[0] = convertPixelsToClipSpaceDistance(800,400);
verts[0].dimensions[1] = convertPixelsToClipSpaceDistance(600,42);
verts[0].opacity = 1;

verts[1].topLeft[0] = convertPixelsToClipSpace(800,0);
verts[1].topLeft[1] = -convertPixelsToClipSpace(600,500);
verts[1].dimensions[0] = convertPixelsToClipSpaceDistance(800,100);
verts[1].dimensions[1] = convertPixelsToClipSpaceDistance(600,100);
verts[1].opacity = 1;

verts[2].topLeft[0] = convertPixelsToClipSpace(800,700);
verts[2].topLeft[1] = -convertPixelsToClipSpace(600,500);
verts[2].dimensions[0] = convertPixelsToClipSpaceDistance(800,100);
verts[2].dimensions[1] = convertPixelsToClipSpaceDistance(600,100);
verts[2].opacity = 0.3;

//create vertex buffer
D3D10_BUFFER_DESC bd;
D3D10_SUBRESOURCE_DATA initData;
initData.pSysMem = verts;

bd.Usage = D3D10_USAGE_DEFAULT;
bd.ByteWidth = sizeof( SpriteVertex ) * (numSprites);
bd.BindFlags = D3D10_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = 0;
bd.MiscFlags = 0;

if ( FAILED( pD3DDevice->CreateBuffer( &bd, &initData, &pVertexBuffer ) ) ) return fatalError("Could not create vertex buffer!");

Since the coordinates have to be in clip space, I have used two simple functions to convert from pixel positions to clip-space positions and to convert from pixel distances to clip-space distances. The conversion is dependent on the resolution of the frame buffer and simply maps the range [0, max screen pixel width] to [-1,1] and [0, max screen pixel height] to [-1,1] (though the y value needs to be negated, as -1 is at the bottom of the screen). These functions are shown below.

inline float convertPixelsToClipSpace( const int pixelDimension, const int pixels )
{
	return (float)pixels/pixelDimension*2 - 1;
}

inline float convertPixelsToClipSpaceDistance( const int pixelDimension, const int pixels )
{
	return (float)pixels/pixelDimension*2;
}

Our sprite vertex struct is defined as shown; it contains all the data necessary to create our sprite quad in the GS.  The pixel shader input struct is also shown.

struct SPRITE_INPUT
{
	float2 topLeft : ANCHOR;
	float2 dimensions : DIMENSIONS;
	float opacity : OPACITY;
};

struct PS_INPUT
{
	float4 p : SV_POSITION;
	float2 t : TEXCOORD;
	float opacity : OPACITY;
};

The GS simply creates 4 vertices and outputs them as follows. Remember that the sprite's topLeft and dimensions are already in homogeneous clip space, so all that's necessary is to offset the topLeft position by the dimensions to get the coordinates of the other 3 vertices. We also set the texture coordinates here. Nothing else needs to be done to the vertices, so we simply append them to the output vertex stream.

[maxvertexcount(4)]
void GS( point SPRITE_INPUT sprite[1], inout TriangleStream<PS_INPUT> triStream )
{
	PS_INPUT v;
	v.opacity = sprite[0].opacity;

	//create sprite quad

	//bottom left
	v.p = float4(sprite[0].topLeft[0],sprite[0].topLeft[1]-sprite[0].dimensions[1],0,1);
	v.t = float2(0,1);
	triStream.Append(v);

	//top left
	v.p = float4(sprite[0].topLeft[0],sprite[0].topLeft[1],0,1);
	v.t = float2(0,0);
	triStream.Append(v);

	//bottom right
	v.p = float4(sprite[0].topLeft[0]+sprite[0].dimensions[0],sprite[0].topLeft[1]-sprite[0].dimensions[1],0,1);
	v.t = float2(1,1);
	triStream.Append(v);

	//top right
	v.p = float4(sprite[0].topLeft[0]+sprite[0].dimensions[0],sprite[0].topLeft[1],0,1);
	v.t = float2(1,0);
	triStream.Append(v);
}

The technique now needs to include the geometry shader; this is done as follows:

technique10 RENDER
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( CompileShader( gs_4_0, GS() ) );
        SetPixelShader( CompileShader( ps_4_0, PS() ) );
        SetBlendState( SrcAlphaBlendingAdd, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
    }
}

This is pretty much all there is to the geometry shader. There is a lot of power available in using it, so don't be afraid to. In the next several tutorials I will cover some more uses for the geometry shader. I've included a demo program that implements the above example and shows the basics of rendering sprites with opacity. If you are interested in building a custom DX10 GUI manager, the example code is a good starting point.

Figure 4. Demo Application

Download Source Code: GitHub


20 thoughts on “DirectX10 Tutorial 9: The Geometry Shader”

  1. Thiago January 16, 2011 / 9:26 pm

    nice tutorial
    Cant download the source code !!! No link
    I want to know how you change the textures ? you have just one sampler and render all sprites that use that texture in one batch, and all sprites that uses other texture in another batch ….

  2. joey zureik January 17, 2011 / 10:38 am

    I working on import 3d models in to direct x 10 I can not wait to see tutorial I used your blog a lot thanks a bunch for all the good information bobby

  3. Rudy January 19, 2011 / 8:01 am

    Hi Bobby,
    Nice tutorials, at least so ive seen so far in tutorial #1.. I know you posted that tutorial a while back so I’m wondering if by any chance can I get the source code for does older tutorials?
    Thanks for your time!

  4. Bobby January 19, 2011 / 8:47 am

    hey guys,

    i havent uploaded the source code for this tutorial since I only have access to the server from work. I’ll try upload it a bit later today.

    @rudy: all the source code for previous tutorials is available!

  5. Thiago January 19, 2011 / 4:23 pm

    i still cant download the source >.<
    error 404

  6. Thiago January 19, 2011 / 4:24 pm

    … i did not read your comment …. sorry

  7. Rudy January 19, 2011 / 5:58 pm

    @bobby the downloads for previous tutorials wasnt working for some reason, tried it this morning n it worled :)…. Btw I was checking out your book reviews n they seem real interesting books but a little too advance, I was wondering if u knew of any good begginers book, im comfortable with cpp but know nothing about graphics, been reading the tuts that come with DirectX10. Was just wondering if theres a book thats super begginer just for me to reassure myself that im understanding truly what im reading!
    Thx once again for your time!

  8. Bobby January 19, 2011 / 6:38 pm

    Ttutorial downloads should all work, they are hosted on my work’s server and so that might be a reason they aren’t working… I wish wordpress would let me upload zip files.

    Anyways I’m trying to find a new cheap host for the files.

    @rudy: I think the best beginner book I’ve encountered is “Introduction to game programming with Directx10”, its far from perfect but it is leaps an bounds better than the other books I’ve encountered.

  9. Justin Grant July 3, 2011 / 4:53 am

    Hey Bobby, thanks for the great tutorial. This was just what I needed to port my particle system from DX9 to DX10. I’m having some issues though with alpha blending. It looks like it’s not blending at all when you are rendering two images on top of each other. Is there a render setting or something I need to set to get this working? The easiest way to see what I’m talking about it by running your example and moving one of the sprites on top of another one.

    • Justin Grant July 3, 2011 / 5:55 am

      Nevermind! I figured it out. All I was missing was setting the AlphaToCoverageEnable flag to true.

  10. ljb August 22, 2011 / 5:56 pm

    nice tutorial,go on!
    how to use D3DX11CreateFFT in dx11

  11. Vidya August 24, 2011 / 1:01 pm

    Thanks a lot Buddy!

    But I am really looking for the tutorial =>

    “How to give the camera mouse control using Windows Raw Input”….

    I would be free from more internet search :))

  12. Michael April 25, 2012 / 2:30 pm

    Hi, thanks for another great tutorial :). I was wondering if you could maybe explain a bit more how the transformation happens onto screen space ( I don’t really need it right now, but its driving me crazy that I don’t understand whats going on 😀 ).

    What exactly happens when you are trying to map a certain texture to a screen at all times ? It sounds more logical for me to map entire 3D objects in a world than it is to have a simple image on my screen :(.


  13. Bobby May 1, 2012 / 3:59 pm

    you dont actually map a texture to the screen, you are mapping the texture onto a quad that you create in the GS. So it’s exactly the same as if you’d have rendered a quad and textured it. The only trick is to orient the quad so that its always facing the camera or in the same position relative to the camera (e.g. when rendering GUIs).

  14. akismet-a83dca4c2ab5a8a4489ff45d7efe8a1f May 15, 2012 / 5:47 pm

    Hi Bobby.
    Thanks for this tutorial, it rely help me out allot.
    Any idea when the next tow parts are coming?

  15. Mohammad Adil June 25, 2012 / 9:51 am

    I am a graphics enthusiast with some experience in offline rendering (ray tracing and stuff). I am new to the GPU programming and looking to implement deferred shading using directx. I am unable to find much resources about deferred shading on the internet. Can you please guide me on how to implement deferred shading on directx

  16. jason July 24, 2012 / 10:30 am

    Hi Bobby,
    Thanks for the great tutorial! But when I use the PS_Input struct in VS(), I got this error:

    Warning Warning: Unknown Input Semantic: ANCHOR0

    This is my code:

    struct psInput
    float4 pos : SV_POSITION;
    float2 t : TEXCOORD;
    float opacity : OPACITY;

    vsInput mainVS(vsInput input)
    return input;

    is there anything wrong with it?

    thanks a lot !

    • jason July 24, 2012 / 10:45 am

      sorry I pasted the wrong code, here is the correct one:

      struct vsInput
      float2 topLeft : ANCHOR;
      float2 dimensions : DIMENSIONS;
      float opacity : OPACITY;

      when I try to use it in VS() like this:
      vsInput mainVS(vsInput input)
      return input;

      I got the error:
      Warning Warning: Unknown Input Semantic: ANCHOR0

  17. reikken December 9, 2013 / 2:04 am

    Why does the alpha transparency not work in this code? It worked in tutorial 6. What’s the difference now?

    (I know that setting AlphaToCoverageEnable to true will enable a transparency of sorts, but it pixellates the image (why?), and it isn’t set in tutorial 6, which has transparency just fine without it. (has better transparency, even, because the image doesn’t get pixellated))
