SpriteBatching Super Easy Optimization.

Sprite Batching is a really easy to do optimization for rendering sprites. It is so easy, in fact, that any “engine” that does not support it, isn’t worth crap. (Exaggerating a bit there)

I made this tutorial a while ago and never posted it here. At the end of it I will give you a link to another tutorial by a very talented coder on how to do a much more modern and robust sprite batcher.

So after spending countless hours on the interwebs and pushing my google-fu skillz to their limit, I got a simple but fast SpriteBatcher working. I am going to explain the process of making a SpriteBatcher and then give you the code in case you would just like to modify it for your own needs.

Note: I hate it when mathematics books give you the simplest example problem and then tell you to do all sorts of tricky problems. So I will be showing you something a little more complicated than it needs to be, so you will know what to do if you want to change something.

Let’s get started.

What you need: a computer, an IDE/Notepad, something more than glBegin/glEnd already working, and at least some basic experience with Vertex Arrays/VBOs in LWJGL.
Tip: http://www.java-gaming.org/topics/introduction-to-vertex-arrays-and-vertex-buffer-objects-opengl/24272/view.html
A big plus is knowing what a Texture Atlas is.

We need to understand what a SpriteBatcher is. A huge slowdown when programming stuff in OpenGL is draw calls. Every draw call costs CPU time, so by batching as many sprites as we can into one draw call, we reduce the CPU load and improve performance. We can do this with Vertex Arrays. Why not VBOs? Because sprites are very dynamic little buggers and can possibly change every frame (and then some). VBOs can be faster when the data is not so dynamic, which is not the case for a sprite batcher. Now that I have blabbered on for a while, let’s look at some actual code.

Quick Note: This is my batcher and I use a class called TexRegion which is what it sounds like, a texture region. This is to show you how to set it up for working with Texture Atlases.

public class SpriteBatcher {
   private static float[] empty = new float[8];
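   // Default tint: opaque white, so the texture is drawn unmodified.
   // (A default of 0,0,0,0 would multiply the texture down to transparent black.)
   private static Vector4f empty1 = new Vector4f(1,1,1,1);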
   
   private float[] vertArray;
   private byte[] colorArray;
   private float[] texArray;
   private int draws;
   private int maxDraws = 1000;
   private int vertIndex;
   private int colIndex;
   private int texIndex;
   private int currentTex;
   private FloatBuffer vertBuff, texBuff;
   private ByteBuffer colBuff;
   
   static{
      empty[0] = 0;
      empty[1] = 0;
      empty[2] = 1;
      empty[3] = 0;
      empty[4] = 1;
      empty[5] = 1;
      empty[6] = 0;
      empty[7] = 1;
   }

So what is all this jazz? The two static fields are for when you want the sprite batcher to draw something without specifying a color or texture region. It is better not to create these things every time we need them via new Vector4f or new float[].

We have three arrays: two float arrays for vertex coords and tex coords, and one byte array for colors. You can guess that the three index ints are what we will be using to keep track of where we are in filling up the batcher. Then we also have an int to keep track of what texture we are working with.

We have two float buffers for vertex and texture coords and one byte buffer for color. Why a byte buffer? Since we want sprites that can change transparency every frame, we need RGBA. If we used floats, this would be 4 components * 4 bytes * 4 corners = 64 bytes of color per sprite; with bytes it is only 16. By reducing the bytes we send to the GPU, we can increase performance slightly. If you would like the more accurate floats, simply drop the byte buffer and add another float buffer.
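
To make that arithmetic concrete, here is a minimal sketch of the per-sprite color cost for both representations. The class and method names (ColorFootprint, bytesPerSprite) are made up for this illustration and are not part of the batcher.

```java
// Hypothetical helper, not part of the original SpriteBatcher.
final class ColorFootprint {
    static final int CORNERS = 4;     // one quad per sprite
    static final int COMPONENTS = 4;  // R, G, B, A

    // Bytes of color data uploaded per sprite for a given component size.
    static int bytesPerSprite(int bytesPerComponent) {
        return CORNERS * COMPONENTS * bytesPerComponent;
    }

    public static void main(String[] args) {
        System.out.println("float colors: " + bytesPerSprite(Float.BYTES) + " bytes per sprite");
        System.out.println("byte colors:  " + bytesPerSprite(Byte.BYTES) + " bytes per sprite");
    }
}
```

That works out to 64 bytes versus 16 bytes of color per sprite, on top of the vertex and texture coordinates, which stay as floats either way.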

Last, we have the maximum number of sprites per batch (maxDraws) and the number currently queued (draws). Why do we have these? There is an optimal size for VBOs and vertex arrays. That is to say, you want to give things to the GPU in bite-sized chunks. The best I have found for this batcher is between 1000 and 1500 sprites at a time. So let’s make a constructor for this class.

   public SpriteBatcher()
   {
      this(1000);
   }
   
   public SpriteBatcher(int size)
   {
      vertArray = new float[size*2*4];
      vertBuff = BufferUtils.createFloatBuffer(vertArray.length);
      colorArray = new byte[size*4*4];
      colBuff = BufferUtils.createByteBuffer(colorArray.length);
      texArray = new float[size*2*4];
      texBuff = BufferUtils.createFloatBuffer(texArray.length);
      vertIndex = 0;
      colIndex = 0;
      texIndex = 0;
      maxDraws = size;
      draws = 0; 
   }

The default constructor sets the size to 1000, but we also let people choose what they want the size to be.
Most things here are straightforward. vertArray needs size * 2 (coordinates per corner) * 4 (number of corners) elements. vertBuff gets vertArray’s length; you could also write size*2*4. The same goes for the texture array. The only thing to note is that the byte array uses size * 4 (RGBA components) * 4 (corners). Set all indexes to 0, draws to 0, and maxDraws to size.
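
As a quick sanity check on those sizes, this sketch computes the array lengths for a given batch size. BatchSizes and its methods are hypothetical names used only here; the multipliers are the ones from the constructor above.

```java
// Hypothetical helper, not part of the original SpriteBatcher.
final class BatchSizes {
    static final int CORNERS = 4;

    static int vertexFloats(int sprites)   { return sprites * 2 * CORNERS; } // x, y per corner
    static int texCoordFloats(int sprites) { return sprites * 2 * CORNERS; } // u, v per corner
    static int colorBytes(int sprites)     { return sprites * 4 * CORNERS; } // r, g, b, a per corner

    // Total bytes handed to OpenGL when a completely full batch is rendered.
    static long totalBytes(int sprites) {
        long floats = (long) vertexFloats(sprites) + texCoordFloats(sprites);
        return floats * Float.BYTES + colorBytes(sprites);
    }

    public static void main(String[] args) {
        int size = 1000;
        System.out.println("vertArray length:  " + vertexFloats(size));
        System.out.println("texArray length:   " + texCoordFloats(size));
        System.out.println("colorArray length: " + colorBytes(size));
        System.out.println("bytes per full batch: " + totalBytes(size));
    }
}
```

Per sprite that is 8 vertex floats, 8 texture floats and 16 color bytes, which is why draw() later advances its indexes by 8, 8 and 16.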

Let’s keep things in OpenGL style and create two methods that will be used to start and end rendering with the batcher, begin() and end().

   public void begin()
   {
      glEnableClientState(GL11.GL_VERTEX_ARRAY);
      glEnableClientState(GL11.GL_TEXTURE_COORD_ARRAY);
      glEnableClientState(GL11.GL_COLOR_ARRAY);
   }
   
   public void end()
   {
      render();
           
      glDisableClientState(GL11.GL_VERTEX_ARRAY);
      glDisableClientState(GL11.GL_TEXTURE_COORD_ARRAY);
      glDisableClientState(GL11.GL_COLOR_ARRAY);
   }

Very simple. begin() enables the client states; end() calls render() and then disables them. Now let’s look at render().

   private void render()
   {
      glBindTexture(GL11.GL_TEXTURE_2D, currentTex);
      vertBuff.put(vertArray);
      vertBuff.flip();
      colBuff.put(colorArray);
      colBuff.flip();
      texBuff.put(texArray);
      texBuff.flip();
      glVertexPointer(2, 0, vertBuff);
      glColorPointer(4,true, 0, colBuff);
      glTexCoordPointer(2, 0, texBuff);
      glDrawArrays(GL_QUADS, 0, draws*4);
      vertBuff.clear();
      colBuff.clear();
      texBuff.clear();
      vertIndex = 0;
      colIndex = 0;
      texIndex = 0;
      draws = 0; 
   }

Still very simple. Bind whatever texture is being used, fill the buffers, flip the buffers (never forget that), and specify the pointers. Note the color pointer: we are using bytes and saying that they are unsigned. Then we draw draws*4 vertices because there are 4 vertices for each sprite. Why are we not using the whole index buffer trick with drawElements or drawRangeElements? Due to the sprites’ dynamic nature, they will rarely share vertices, so you will lose 1-2 fps by adding in an index buffer. If you do not know what I mean when I say index buffer, do not fret! Use the google-fu! Or just ignore it and continue on.
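
If the put/flip/clear dance is new to you, here is a small sketch using only java.nio, so it runs without OpenGL. FlipDemo and fill() are hypothetical names. Two differences from the batcher are assumptions of this sketch: it uses a heap buffer (LWJGL wants direct buffers such as the ones from BufferUtils), and it copies only the first count values instead of the whole array.

```java
import java.nio.FloatBuffer;

// Hypothetical demo class, not part of the original SpriteBatcher.
final class FlipDemo {
    // Copies the first `count` floats into the buffer and prepares it for reading.
    static FloatBuffer fill(FloatBuffer buf, float[] data, int count) {
        buf.clear();              // position = 0, limit = capacity
        buf.put(data, 0, count);  // position advances to count
        buf.flip();               // limit = count, position = 0: ready to be read
        return buf;
    }

    public static void main(String[] args) {
        FloatBuffer buf = FloatBuffer.allocate(8);
        float[] quad = {0, 0, 1, 0, 1, 1, 0, 1};
        fill(buf, quad, 8);
        System.out.println("position=" + buf.position() + " limit=" + buf.limit());
    }
}
```

Without the flip(), the position would still sit at the end of the data you just wrote, so a reader starting from the current position would find nothing useful there.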

Finally, we clear the buffers, set the indexes back to 0, and set draws to 0. Whoa! Really simple! Well… now comes the complex part: actually filling the arrays with useful information such as where a sprite is, what size it is, what texture it is using, what color it is (if any), and yes… whether it is rotated at all.

So here is the scariest looking method in the whole class. draw(blah blah blah sprite stuff)

   public void draw(int texID, float[] region, float x, float y, float width, float height, float rotation, Vector4f col )
   {
      if(texID != currentTex)
      {
         render();
         currentTex = texID; 
      }
      if(draws == maxDraws)
      {
         render();
      }

      final float p1x = -width/2;
      final float p1y = -height/2;
      final float p2x = width/2;
      final float p2y = -height/2;
      final float p3x = width/2;
      final float p3y = height/2;
      final float p4x = -width/2;
      final float p4y = height/2;

      float x1;
      float y1;
      float x2;
      float y2;
      float x3;
      float y3;
      float x4;
      float y4;

      // rotate
      if (rotation != 0) {
         final float cos = (float) FastMath.cosDeg(rotation);
         final float sin = (float) FastMath.sinDeg(rotation);

         x1 = cos * p1x - sin * p1y;
         y1 = sin * p1x + cos * p1y;

         x2 = cos * p2x - sin * p2y;
         y2 = sin * p2x + cos * p2y;

         x3 = cos * p3x - sin * p3y;
         y3 = sin * p3x + cos * p3y;

         x4 = cos * p4x - sin * p4y;
         y4 = sin * p4x + cos * p4y;
      } else {
         x1 = p1x;
         y1 = p1y;

         x2 = p2x;
         y2 = p2y;

         x3 = p3x;
         y3 = p3y;

         x4 = p4x;
         y4 = p4y;
      }
      x1+=x;
      x2+=x;
      x3+=x;
      x4+=x;
      y1+=y;
      y2+=y;
      y3+=y;
      y4+=y;
      
      vertArray[vertIndex]    = x1;
      texArray[texIndex]       = region[0];
      vertArray[vertIndex+1]    = y1;
      texArray[texIndex+1]    = region[1];
      
      vertArray[vertIndex+2]    = x2;
      texArray[texIndex+2]    = region[2];
      vertArray[vertIndex+3]    = y2;
      texArray[texIndex+3]    = region[3];
      
      vertArray[vertIndex+4]    = x3;
      texArray[texIndex+4]    = region[4];
      vertArray[vertIndex+5]    = y3;
      texArray[texIndex+5]    = region[5];
      
      vertArray[vertIndex+6]    = x4;
      texArray[texIndex+6]    = region[6];
      vertArray[vertIndex+7]    = y4;
      texArray[texIndex+7]    = region[7];
      
      colorArray[colIndex]     = getColor(col.x);
      colorArray[colIndex+1]    = getColor(col.y);
      colorArray[colIndex+2]    = getColor(col.z);
      colorArray[colIndex+3]    = getColor(col.w);
      
      colorArray[colIndex+4]    =  getColor(col.x);
      colorArray[colIndex+5]    =  getColor(col.y);
      colorArray[colIndex+6]    =  getColor(col.z);
      colorArray[colIndex+7]    =  getColor(col.w);
      
      colorArray[colIndex+8]    =  getColor(col.x);
      colorArray[colIndex+9]    =  getColor(col.y);
      colorArray[colIndex+10] =  getColor(col.z);
      colorArray[colIndex+11] =  getColor(col.w);
      
      colorArray[colIndex+12] =  getColor(col.x);
      colorArray[colIndex+13] =  getColor(col.y);
      colorArray[colIndex+14] =  getColor(col.z);
      colorArray[colIndex+15] =  getColor(col.w);
      
      
      vertIndex+=8;
      texIndex+=8;
      colIndex += 16;
      draws++; 
   }

Whoa! Lots of stuff happening here. Let’s explain. First we check if the texture is different from the one we are using; if it is, we render() what we have so far and then set the new one as our texture. Then we check whether the batch is full (draws == maxDraws) and, if it is, render() again.
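
To see what those two checks mean for the number of draw calls, here is a small simulation that needs no OpenGL. FlushCounter is a hypothetical class; it mirrors the checks in draw() plus the final render() in end(), but counts only flushes that actually contain sprites.

```java
// Hypothetical simulation, not part of the original SpriteBatcher.
final class FlushCounter {
    // Returns how many non-empty batches a sequence of texture IDs produces.
    static int countFlushes(int[] textureIds, int maxDraws) {
        int flushes = 0;
        int draws = 0;
        int currentTex = 0; // an int field starts out as 0, like in the batcher

        for (int texId : textureIds) {
            if (texId != currentTex) {   // texture switch forces a flush
                if (draws > 0) flushes++;
                draws = 0;
                currentTex = texId;
            }
            if (draws == maxDraws) {     // full batch forces a flush
                flushes++;
                draws = 0;
            }
            draws++;
        }
        if (draws > 0) flushes++;        // end() renders whatever is left
        return flushes;
    }

    public static void main(String[] args) {
        int[] interleaved = {1, 1, 1, 2, 2, 1};
        int[] sorted      = {1, 1, 1, 1, 2, 2};
        System.out.println("interleaved: " + countFlushes(interleaved, 1000));
        System.out.println("sorted:      " + countFlushes(sorted, 1000));
    }
}
```

The same six sprites cost three batches when the textures are interleaved and two when they are grouped, which is the main reason a texture atlas pairs so well with a batcher like this.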

Now comes the fun part, rotation. If you would like you can skip this but I think you should read on.

We are going to render the quads with the center of the quad at the x and y given. This means that we need to divide the width/height by 2 and add or subtract it depending on which corner of the quad we are specifying. We could just draw like we would in Java2D, using x and y as the top-left corner, but making it the center greatly simplifies the stuff the user has to manage. Why are we not using x and y here yet? The corner coordinates are not in screen space at this point because we are going to rotate them around the origin, which we assume is (0,0). Next we set up vars for our final coordinates after we rotate and translate into screen space. But WAIT!! What if we don’t need to rotate? Well, we have the if statement to check whether we need to rotate; if not, we just set the final coords to the p1x/p1y stuff and translate them into screen space by adding x or y. Now for the rotation.

We store the sin and cos of the angle so we only have to calculate them once. After that, we use them as a rotation matrix:
R = | cos(degree)  -sin(degree) |
    | sin(degree)   cos(degree) |
To see where the sin and cos multiplication and addition come from, go here:
http://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions

We have rotated coordinates so we can translate them into screen space by adding x to x coords and y to y coords.
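
Here is the same rotate-then-translate step as a stand-alone sketch you can run and poke at. QuadRotation and corners() are hypothetical names, and I am using java.lang.Math here instead of the FastMath class from the batcher, so that the example has no dependencies.

```java
// Hypothetical helper, not part of the original SpriteBatcher.
final class QuadRotation {
    // Returns {x1,y1, x2,y2, x3,y3, x4,y4} for a quad centered on (x, y).
    static float[] corners(float x, float y, float width, float height, float degrees) {
        float hw = width / 2;
        float hh = height / 2;
        // Corners around the origin, in the same order as p1..p4 in draw().
        float[] local = {-hw, -hh, hw, -hh, hw, hh, -hw, hh};

        double rad = Math.toRadians(degrees);
        float cos = (float) Math.cos(rad);
        float sin = (float) Math.sin(rad);

        float[] out = new float[8];
        for (int i = 0; i < 8; i += 2) {
            // Rotate around the origin, then translate to (x, y).
            out[i]     = cos * local[i] - sin * local[i + 1] + x;
            out[i + 1] = sin * local[i] + cos * local[i + 1] + y;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] c = corners(10, 20, 4, 2, 90);
        for (int i = 0; i < 8; i += 2) {
            System.out.printf("(%.2f, %.2f)%n", c[i], c[i + 1]);
        }
    }
}
```

With a 4x2 quad centered on (10, 20) and rotated 90 degrees, the first corner (-2, -1) ends up at roughly (11, 18): the quad is now 2 wide and 4 tall, still centered on the same point.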

Now that we have all the information we need, we can fill up the arrays with the new data. We start at the current index and add 1 to it for each subsequent slot in the array. The texture array gets its tex coordinates from a float[] that is given in the method call. This is so we can specify only partial regions of a texture (i.e., a texture atlas). Then we add the same color for each corner of our quad. This is the getColor() method.

   private byte getColor(float f)
   {
      return (byte) (f*255);
   }

Now that everything is filled up, we increase the indexes and add 1 to the draw count.
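
A note on the getColor() cast: Java bytes are signed, so 1.0f becomes 255, which is stored as -1. That is fine because the color pointer tells OpenGL to read the bytes as unsigned. The sketch below shows the round trip. ColorBytes is a hypothetical class, and the clamping is my own addition (the original getColor() does not clamp, so a value like 2.0f would wrap around).

```java
// Hypothetical helper, not part of the original SpriteBatcher.
final class ColorBytes {
    // Converts a 0..1 color component to 0..255, stored in a signed Java byte.
    // Clamping is an addition of this sketch; the original method does not do it.
    static byte toByte(float f) {
        float clamped = Math.max(0f, Math.min(1f, f));
        return (byte) (clamped * 255);
    }

    // Reads the byte back the way OpenGL sees it when told it is unsigned.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        float[] samples = {0f, 0.5f, 1f, 2f};
        for (float f : samples) {
            byte b = toByte(f);
            System.out.println(f + " -> stored " + b + ", read as " + toUnsigned(b));
        }
    }
}
```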

Here are some convenience methods for rendering if you don’t specify a color or float[].

   public void draw(int texID, float x, float y, float sizex, float sizey )
   {
      draw(texID, empty, x, y, sizex, sizey, 0, empty1);
   }

   public void draw(int texID, float x, float y, float sizex, float sizey,float rotation, Vector4f col )
   {
      draw(texID, empty, x, y, sizex, sizey, rotation, col);
   }
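
Those overloads always use the whole texture. For a texture atlas you pass your own float[8] region instead. My TexRegion class is not shown in this post, so the following is only a sketch of how such a region could be computed from pixel coordinates. AtlasRegion and region() are hypothetical names, and the corner order is assumed to match the empty array at the top of the class.

```java
// Hypothetical helper, not the TexRegion class mentioned in this post.
final class AtlasRegion {
    // Returns texture coordinates for a sub-rectangle of an atlas, in the order
    // (u0,v0), (u1,v0), (u1,v1), (u0,v1), matching the corner order used in draw().
    static float[] region(int px, int py, int pw, int ph, int atlasWidth, int atlasHeight) {
        float u0 = (float) px / atlasWidth;
        float v0 = (float) py / atlasHeight;
        float u1 = (float) (px + pw) / atlasWidth;
        float v1 = (float) (py + ph) / atlasHeight;
        return new float[] {u0, v0, u1, v0, u1, v1, u0, v1};
    }

    public static void main(String[] args) {
        // A 64x64 sprite located at pixel (64, 128) in a 256x256 atlas.
        float[] r = region(64, 128, 64, 64, 256, 256);
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

The resulting array can be handed straight to the long draw(texID, region, x, y, width, height, rotation, col) method.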

And here is the whole class.

http://pastebin.java-gaming.org/f5814300d3e

This will work on just about every desktop system out there because it only uses GL11. (On mobile devices you would need a few changes, since OpenGL ES has no GL_QUADS… although I don’t know why you would EVER not use libgdx.)

Now if you want to improve even more for those many proud owners of a graphics card supporting OpenGL 3.2 or better (where geometry shaders are part of core), you can use geometry shaders, which will take even more stress off the CPU. I will add this some time in the future in such a way that you do not have to change how you call the SpriteBatcher to render stuff. (Just plug it in and it works.)

Quick performance specs:
On an integrated chip, you will get fillrate limited before you come even close to a CPU limit.
On a dedicated GPU, you will hit a CPU bottleneck first, but it is still much faster than fixed-function.

On my 6 year old computer (quad core @ 2.6 GHz, GeForce 250 with 1 GB VRAM, 4 GB RAM, 3 GB effective) it can do 50k sprites at 60fps no problem.
On my 2 year old laptop (i5 2.8 GHz, GeForce 420M which is never used, 4 GB RAM) it can do 50k at 30fps on the integrated chip, fillrate limited.

Have a nice day,
Stumpy.

Now this was a while ago and I have learned that you really do not need to drop the colors to a byte. The performance gain is almost nonexistent. This is also a very primitive batcher. It is fast but primitive. It works really well for just throwing gobs and gobs of sprites and particles without much thought.

Here is an infinitely better tutorial. More complex but more robust.

https://github.com/mattdesl/lwjgl-basics/wiki/Sprite-Batching

And Finally here is a cool image of a game that uses this sprite batcher in action.


Sorry

Sorry I have not been active. I really did not think many people would read this and I got caught up in studies. I will finish the particles tutorial. However, I have learned a whole lot since I started it. It will still be all in Java2D but I may change a few things to help give a better understanding.

Also, I am going to post a bunch more tutorials and code snippets here. Stay tuned.