Wednesday, July 3, 2013

WebGL instancing with ANGLE_instanced_arrays

And still no Shakespeare...
Ever find yourself in a position where you absolutely, positively have to get a ton of monkeys on screen with a single draw call?

Yeah, okay, me neither. (At least, not until I put together the demo for this post.)

But there are certainly occasions when it makes a lot of sense to redraw the same mesh many times with some minor tweak like position, orientation, color, etc. Simple examples of this are trees and other vegetation, streetlights, boxes in a warehouse, soldiers in an army, and so on.

Traditionally with WebGL the optimal way to draw those repeated meshes, commonly referred to as instances, would be something like the following pseudocode:

bindMeshArrays(); // Set up the vertex attributes once

for (var i = 0; i < meshInstances.length; i++) {
  var instance = meshInstances[i];
  gl.uniform3fv(meshPosition, instance.position);
  gl.uniform4fv(meshColor, instance.color);
  gl.drawElements(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0);
}

This is good, because we only set up the vertex attributes (implied to be in bindMeshArrays) once, then make many draw calls in quick succession, only changing the properties that are different from instance to instance (in this case, just position and color). This makes for a near minimal amount of overhead for each draw call, and in an environment like JavaScript where each call is expensive that's important!

(This example is pretty simple, and in fact you can actually optimize it further by getting creative with how you provide the GPU with the data. Check out Gregg Tavares' wonderful Google I/O talk for more ideas about how to render lots of geometry very quickly.)

Even in this scenario, however, you have to make at least three WebGL calls for every instance of the mesh: two uniform updates and one draw. That doesn't sound terrible, but in the real world it would probably be quite a bit more, and the fact is that in JavaScript every call into native code (like the WebGL API) carries a certain expense. Thus while drawing repeated meshes in this way works, and generally works well, it could still be better.

And that's exactly why we have ANGLE_instanced_arrays.

[Update: Peter Jacobs on Google+ brought up a potential point of confusion. Despite the name, this extension works on any device with the appropriate hardware support (ARB_instanced_arrays and ARB_draw_instanced), not just on Windows when using ANGLE. The ANGLE in the name simply indicates that the extension spec was written by the ANGLE authors.]

This extension, which is currently available in the Chrome Dev channel behind the draft extensions flag, allows you to pack information about each mesh instance into an attribute array, just like you do with vertex position, normal, etc. But instead of listing these values once per vertex, you only have to specify them once per instance. Let's take a look at how it works:

// A nice little line of monkeys down the X axis
var offsets = new Float32Array([
  0.0, 0.0, 0.0,
  1.0, 0.0, 0.0,
  2.0, 0.0, 0.0,
  3.0, 0.0, 0.0
]);

var colors = new Float32Array([
  1.0, 0.0, 0.0, 1.0, // Red monkey
  0.0, 1.0, 0.0, 1.0, // Green monkey
  0.0, 0.0, 1.0, 1.0, // Blue monkey
  1.0, 1.0, 1.0, 1.0  // White monkey
]);

var offsetBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, offsetBuffer);
gl.bufferData(gl.ARRAY_BUFFER, offsets, gl.STATIC_DRAW);

var colorBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, colorBuffer);
gl.bufferData(gl.ARRAY_BUFFER, colors, gl.STATIC_DRAW);

This should look pretty familiar! In fact, there's no difference between this code and your standard vertex buffer setup except that the information is intended to be used once per instance. The real difference comes in when you go to draw with that data.
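Hand-typing four monkeys is fine, but for a ton of them you'd probably fill those arrays in a loop. Here's a minimal sketch of that idea. Note that buildInstanceData, its spacing parameter, and the red-to-blue color ramp are all made up for illustration and aren't part of the demo:

```javascript
// Hypothetical helper (not from the demo): lays out `count` instances in a
// line along the X axis, `spacing` units apart, with an arbitrary color ramp.
function buildInstanceData(count, spacing) {
  var offsets = new Float32Array(count * 3); // x, y, z per instance
  var colors = new Float32Array(count * 4);  // r, g, b, a per instance

  for (var i = 0; i < count; i++) {
    // Only X changes; Y and Z keep the typed array's default of 0.
    offsets[i * 3] = i * spacing;

    // Illustrative ramp: first instance is red, last instance is blue.
    var t = count > 1 ? i / (count - 1) : 0;
    colors[i * 4 + 0] = 1.0 - t;
    colors[i * 4 + 1] = 0.0;
    colors[i * 4 + 2] = t;
    colors[i * 4 + 3] = 1.0;
  }

  return { offsets: offsets, colors: colors };
}

var data = buildInstanceData(4, 1.0);
```

The resulting data.offsets and data.colors can be handed to gl.bufferData exactly like the hand-written arrays above.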

var instanceCount = 4;
var ext = gl.getExtension("ANGLE_instanced_arrays"); // Vendor prefixes may apply!

// Bind the rest of the vertex attributes normally

// Bind the instance position data
gl.bindBuffer(gl.ARRAY_BUFFER, offsetBuffer);
gl.vertexAttribPointer(offsetLocation, 3, gl.FLOAT, false, 12, 0);
ext.vertexAttribDivisorANGLE(offsetLocation, 1); // This makes it instanced!

// Bind the instance color data
gl.bindBuffer(gl.ARRAY_BUFFER, colorBuffer);
gl.vertexAttribPointer(colorLocation, 4, gl.FLOAT, false, 16, 0);
ext.vertexAttribDivisorANGLE(colorLocation, 1); // This makes it instanced!

// Draw the instanced meshes
ext.drawElementsInstancedANGLE(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0, instanceCount);

This is actually doing the same thing as our first code snippet with the for loop, but as you can see there's only one draw call here! The shader would have to be updated, but essentially you would only change the position and color inputs from a uniform to an attribute. The important bits in this code are the calls to vertexAttribDivisorANGLE and drawElementsInstancedANGLE.
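To make that shader change concrete, here's a rough sketch of what the two vertex shaders might look like. The demo's actual shaders aren't shown in this post, so every name in here (position, viewProjection, vColor, offset, color) is an assumption for illustration:

```javascript
// Before (for-loop version): per-instance values arrive as uniforms that are
// re-set before every draw call. All names here are illustrative.
var loopVertexShader = [
  "attribute vec3 position;",
  "uniform vec3 meshPosition;",
  "uniform vec4 meshColor;",
  "uniform mat4 viewProjection;",
  "varying vec4 vColor;",
  "void main() {",
  "  vColor = meshColor;",
  "  gl_Position = viewProjection * vec4(position + meshPosition, 1.0);",
  "}"
].join("\n");

// After (instanced version): the same two values become attributes, fed from
// the per-instance buffers. Nothing else in the shader needs to change.
var instancedVertexShader = [
  "attribute vec3 position;",
  "attribute vec3 offset;", // was: uniform vec3 meshPosition
  "attribute vec4 color;",  // was: uniform vec4 meshColor
  "uniform mat4 viewProjection;",
  "varying vec4 vColor;",
  "void main() {",
  "  vColor = color;",
  "  gl_Position = viewProjection * vec4(position + offset, 1.0);",
  "}"
].join("\n");
```

With shaders along these lines, offsetLocation and colorLocation would come from gl.getAttribLocation(program, "offset") and gl.getAttribLocation(program, "color"), and each one needs gl.enableVertexAttribArray like any other attribute array.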

vertexAttribDivisorANGLE specifies for the given attribute location how often a value should be repeated. A divisor of 0 means the attribute isn't instanced. A divisor of one means that each value in the attribute stream is used for a single instance. A divisor of two would mean that each value is used for two consecutive instances. For example:

ext.vertexAttribDivisorANGLE(offsetLocation, 1);
ext.vertexAttribDivisorANGLE(colorLocation, 2);

Using the data above, this would render two red monkeys at X=0 and X=1, and two green monkeys at X=2 and X=3.

ext.vertexAttribDivisorANGLE(offsetLocation, 2);
ext.vertexAttribDivisorANGLE(colorLocation, 1);

This, on the other hand, would render a red and a green monkey at X=0 and a blue and a white monkey at X=1.

(To be honest, I don't see much real-world use for divisors other than 0 and 1, but it's nice to know what they do.)
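If the divisor math is hard to picture, it boils down to this: for a non-zero divisor, instance i reads element floor(i / divisor) from the attribute buffer. Here's a small plain JavaScript sketch (no WebGL required) that replays the two examples above. The function names are made up for illustration:

```javascript
// For a non-zero divisor, the buffer element that a given instance reads.
function elementForInstance(instanceIndex, divisor) {
  return Math.floor(instanceIndex / divisor);
}

// Hypothetical helper: describes what each instance would look like using the
// four-monkey offset and color data from earlier in the post.
function describeInstances(instanceCount, offsetDivisor, colorDivisor) {
  var offsetX = [0, 1, 2, 3];
  var colorNames = ["red", "green", "blue", "white"];
  var result = [];
  for (var i = 0; i < instanceCount; i++) {
    var color = colorNames[elementForInstance(i, colorDivisor)];
    var x = offsetX[elementForInstance(i, offsetDivisor)];
    result.push(color + " at X=" + x);
  }
  return result;
}

var firstExample = describeInstances(4, 1, 2);
// ["red at X=0", "red at X=1", "green at X=2", "green at X=3"]

var secondExample = describeInstances(4, 2, 1);
// ["red at X=0", "green at X=0", "blue at X=1", "white at X=1"]
```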

drawElementsInstancedANGLE (and drawArraysInstancedANGLE) are straightforward replacements for their non-instanced counterparts drawElements and drawArrays, the only difference being that each takes the number of instances to render as an additional argument.
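Two practical wrinkles are worth sketching out: gl.getExtension returns null when the extension isn't available, and the divisor is persistent attribute state, so it should be set back to 0 before anything non-instanced draws with the same attribute locations. Something along these lines would handle both. Keep in mind that drawMeshInstances, instancedLocations, and drawOneFallback are hypothetical names for this sketch and not part of WebGL or the demo:

```javascript
// Hypothetical helper. Assumes buffers and attribute pointers are already
// bound. `instancedLocations` is an array of per-instance attribute
// locations, and `drawOneFallback(i)` is a caller-supplied function that sets
// uniforms and draws instance i the old-fashioned way.
// Returns the number of draw calls issued.
function drawMeshInstances(gl, ext, indexCount, instanceCount,
                           instancedLocations, drawOneFallback) {
  var i;

  if (!ext) {
    // Extension unavailable: one draw call per instance.
    for (i = 0; i < instanceCount; i++) {
      drawOneFallback(i);
    }
    return instanceCount;
  }

  // Advance these attributes once per instance instead of once per vertex.
  for (i = 0; i < instancedLocations.length; i++) {
    ext.vertexAttribDivisorANGLE(instancedLocations[i], 1);
  }

  ext.drawElementsInstancedANGLE(gl.TRIANGLES, indexCount,
                                 gl.UNSIGNED_SHORT, 0, instanceCount);

  // Reset the divisors so later non-instanced draws behave normally.
  for (i = 0; i < instancedLocations.length; i++) {
    ext.vertexAttribDivisorANGLE(instancedLocations[i], 0);
  }

  return 1;
}
```

A call might look like drawMeshInstances(gl, gl.getExtension("ANGLE_instanced_arrays"), indexCount, 4, [offsetLocation, colorLocation], drawOneMonkey), where drawOneMonkey is whatever your for-loop body used to be.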

Pretty simple, right? When used correctly this can lead to better performance and less code, both noble goals.

And now for a quick confession: The demo I created is actually a pretty terrible example of this API. On most systems I've tested it on, hardware instancing (that is, using the extension) either yields the same performance as software instancing (using a for loop) or only provides 2-3 extra FPS. That's because this scene isn't really draw-call bound, and so instancing doesn't have much opportunity to speed it up.

As with anything in graphics, you shouldn't just blindly throw this API at everything you can but instead measure what the bottlenecks actually are. If you are fill-rate bound or vertex-bound then instancing won't solve your problems. But if you're seeing your app get hung up on making lots of draw calls while your GPU spins its wheels waiting for JavaScript to catch up, this is the API for you!