Wednesday, July 3, 2013

WebGL instancing with ANGLE_instanced_arrays

And still no Shakespeare...
Ever find yourself in a position where you absolutely, positively have to get a ton of monkeys on screen with a single draw call?

Yeah, okay, me neither. (At least, not until I put together the demo for this post.)

But there are certainly occasions when it makes a lot of sense to redraw the same mesh many times with some minor tweak like position, orientation, color, etc. Simple examples of this are trees and other vegetation, streetlights, boxes in a warehouse, soldiers in an army, and so on.

Traditionally with WebGL the optimal way to draw those repeated meshes, commonly referred to as instances, would be something like the following pseudocode:

bindMeshArrays(); // Set up the vertex attributes once

for (i = 0; i < meshInstances.length; i++) {
  var instance = meshInstances[i];
  gl.uniform3fv(meshPosition, instance.position);
  gl.uniform4fv(meshColor, instance.color);
  gl.drawElements(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0);
}

This is good, because we only set up the vertex attributes (implied to be in bindMeshArrays) once, then make many draw calls in quick succession, only changing the properties that are different from instance to instance (in this case, just position and color). This makes for a near minimal amount of overhead for each draw call, and in an environment like JavaScript where each call is expensive that's important!

(This example is pretty simple, and in fact you can actually optimize it further by getting creative with how you provide the GPU with the data. Check out Gregg Tavares' wonderful Google I/O talk for more ideas about how to render lots of geometry very quickly.)

Even in this scenario, however, you have to make at least three WebGL calls for every instance of the mesh. That doesn't sound terrible, but in the real world it would probably be quite a bit more, and the fact is that in JavaScript every call into native code (like the WebGL API) carries a certain expense. Thus while drawing repeated meshes in this way works, and generally works well, it could still be better.

And that's exactly why we have ANGLE_instanced_arrays.

[Update: Peter Jacobs on Google+ brought up a potential point of confusion. Despite the name, this extension works on any device with the appropriate hardware support (ARB_instanced_arrays and ARB_draw_instanced), not just on Windows when using ANGLE. The ANGLE in the name simply indicates that the extension spec was written by the ANGLE authors.]

This extension, which is currently available in the Chrome Dev channel behind the draft extensions flag, allows you to pack information about each mesh instance into an attribute array, just like you do vertex position, normal, etc. But instead of listing these values once per vertex, you only have to specify them once per instance. Let's take a look at how it works:

// A nice little line of monkeys down the X axis
var offsets = new Float32Array([
  0.0, 0.0, 0.0,
  1.0, 0.0, 0.0,
  2.0, 0.0, 0.0,
  3.0, 0.0, 0.0
]);

var colors = new Float32Array([
  1.0, 0.0, 0.0, 1.0, // Red monkey
  0.0, 1.0, 0.0, 1.0, // Green monkey
  0.0, 0.0, 1.0, 1.0, // Blue monkey
  1.0, 1.0, 1.0, 1.0  // White monkey
]);

var offsetBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, offsetBuffer);
gl.bufferData(gl.ARRAY_BUFFER, offsets, gl.STATIC_DRAW);

var colorBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, colorBuffer);
gl.bufferData(gl.ARRAY_BUFFER, colors, gl.STATIC_DRAW);

This should look pretty familiar! In fact, there's no difference between this code and your standard vertex buffer setup except that the information is intended to be used once per instance. The real difference comes in when you go to draw with that data.
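If your instance data starts out as an array of objects, like the meshInstances array from the loop version, it has to be flattened into typed arrays before it can be uploaded. Here's a minimal sketch of that step, assuming each instance has a 3-element position and a 4-element color. (packInstanceData is a made-up helper name for illustration, not part of WebGL.)

```javascript
// Hypothetical helper: flattens per-instance objects into the tightly
// packed Float32Arrays that gl.bufferData expects.
function packInstanceData(meshInstances) {
  var offsets = new Float32Array(meshInstances.length * 3);
  var colors = new Float32Array(meshInstances.length * 4);

  for (var i = 0; i < meshInstances.length; i++) {
    // TypedArray.set copies the source values starting at the given index
    offsets.set(meshInstances[i].position, i * 3);
    colors.set(meshInstances[i].color, i * 4);
  }

  return { offsets: offsets, colors: colors };
}
```

The returned offsets and colors arrays can then be handed to gl.bufferData exactly like the hand-written ones.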

var instanceCount = 4;
var ext = gl.getExtension("ANGLE_instanced_arrays"); // Vendor prefixes may apply!

// Bind the rest of the vertex attributes normally

// Bind the instance position data
gl.bindBuffer(gl.ARRAY_BUFFER, offsetBuffer);
gl.vertexAttribPointer(offsetLocation, 3, gl.FLOAT, false, 12, 0);
ext.vertexAttribDivisorANGLE(offsetLocation, 1); // This makes it instanced!

// Bind the instance color data
gl.bindBuffer(gl.ARRAY_BUFFER, colorBuffer);
gl.vertexAttribPointer(colorLocation, 4, gl.FLOAT, false, 16, 0);
ext.vertexAttribDivisorANGLE(colorLocation, 1); // This makes it instanced!

// Draw the instanced meshes
ext.drawElementsInstancedANGLE(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0, instanceCount);

This is actually doing the same thing as our first code snippet with the for loop, but as you can see there's only one draw call here! The shader would have to be updated, but essentially you would only change the position and color inputs from a uniform to an attribute. The important bits in this code are the calls to vertexAttribDivisorANGLE and drawElementsInstancedANGLE.
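For reference, the shader change might look something like the following. This is only a sketch: the attribute, uniform, and varying names (position, offset, color, viewProjection, vColor) are assumptions made for illustration, not the demo's actual shader.

```javascript
// Sketch of an instanced vertex shader. All names here are assumed.
var instancedVertexShaderSrc = [
  "attribute vec3 position; // per-vertex (divisor 0)",
  "attribute vec3 offset;   // per-instance (divisor 1), was a uniform",
  "attribute vec4 color;    // per-instance (divisor 1), was a uniform",
  "",
  "uniform mat4 viewProjection;",
  "varying vec4 vColor;",
  "",
  "void main() {",
  "  vColor = color;",
  "  gl_Position = viewProjection * vec4(position + offset, 1.0);",
  "}"
].join("\n");
```

With a shader like this, offsetLocation and colorLocation would come from gl.getAttribLocation, the same as any other attribute.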

vertexAttribDivisorANGLE specifies for the given attribute location how often a value should be repeated. A divisor of 0 means the attribute isn't instanced. A divisor of one means that each value in the attribute stream is used for a single instance. A divisor of two would mean that each value is used for two consecutive instances. For example:

ext.vertexAttribDivisorANGLE(offsetLocation, 1);
ext.vertexAttribDivisorANGLE(colorLocation, 2);

Using the data above, this would render two red monkeys at X=0 and X=1, and two green monkeys at X=2 and X=3.

ext.vertexAttribDivisorANGLE(offsetLocation, 2);
ext.vertexAttribDivisorANGLE(colorLocation, 1);

This, on the other hand, would render a green and red monkey at X=0 and a blue and white monkey at X=1.

(To be honest, I don't see much real-world use for divisors other than 0 and 1, but it's nice to know what they do.)
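If the divisor rules are hard to keep straight, it may help to think of the lookup as integer division: instance i reads element floor(i / divisor) of the attribute stream. Here's a small plain-JavaScript simulation of that rule using the monkey data from earlier. No WebGL is involved, and elementForInstance and describeInstances are made-up helper names.

```javascript
// Which element of an instanced attribute does a given instance read?
// Only meaningful for divisor >= 1; a divisor of 0 means "per vertex".
function elementForInstance(instanceIndex, divisor) {
  return Math.floor(instanceIndex / divisor);
}

// Lists what each instance would look like for a pair of divisors.
function describeInstances(instanceCount, offsetDivisor, colorDivisor) {
  var xPositions = [0, 1, 2, 3];
  var colorNames = ["red", "green", "blue", "white"];
  var result = [];
  for (var i = 0; i < instanceCount; i++) {
    var x = xPositions[elementForInstance(i, offsetDivisor)];
    var color = colorNames[elementForInstance(i, colorDivisor)];
    result.push(color + " at X=" + x);
  }
  return result;
}

describeInstances(4, 1, 2);
// ["red at X=0", "red at X=1", "green at X=2", "green at X=3"]
describeInstances(4, 2, 1);
// ["red at X=0", "green at X=0", "blue at X=1", "white at X=1"]
```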

drawElementsInstancedANGLE (and drawArraysInstancedANGLE) are straightforward replacements for their non-instanced counterparts drawElements and drawArrays, the only difference being that each takes the number of instances to render as an additional argument.

Pretty simple, right? When used correctly this can lead to better performance and less code, both noble goals.
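Since the extension won't be available everywhere, it can be worth hiding it behind a small wrapper. The sketch below is one possible approach, not the demo's code: getInstancing is a hypothetical helper that returns null when the extension is missing, so you can fall back to the for loop. It also sets the divisors back to 0 after drawing, because a divisor stays attached to its attribute location until you change it and would otherwise affect later non-instanced draws that use the same location.

```javascript
// Hypothetical helper: returns an object with a drawInstanced() method,
// or null if ANGLE_instanced_arrays isn't available.
function getInstancing(gl) {
  var ext = gl.getExtension("ANGLE_instanced_arrays");
  if (!ext) { return null; }

  return {
    // instancedLocations: attribute locations that advance once per instance.
    // Buffers and vertexAttribPointer calls are assumed to be set up already.
    drawInstanced: function (instancedLocations, indexCount, instanceCount) {
      var i;
      for (i = 0; i < instancedLocations.length; i++) {
        ext.vertexAttribDivisorANGLE(instancedLocations[i], 1);
      }

      ext.drawElementsInstancedANGLE(
          gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0, instanceCount);

      // Put the attributes back to per-vertex so later draws aren't affected
      for (i = 0; i < instancedLocations.length; i++) {
        ext.vertexAttribDivisorANGLE(instancedLocations[i], 0);
      }
    }
  };
}
```

Usage would be along the lines of calling getInstancing(gl) once at startup, then instancing.drawInstanced([offsetLocation, colorLocation], indexCount, instanceCount) each frame if it returned something, and the uniform-based loop otherwise.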

And now for a quick confession: The demo I created is actually a pretty terrible example of this API. On most systems I've tested, using hardware instancing (that is, using the extension) either yields the same performance as software instancing (using a for loop) or only provides 2-3 extra FPS. That's because this scene isn't really draw-call bound, and so instancing doesn't have much opportunity to speed it up.

As with anything in graphics, you shouldn't just blindly throw this API at everything you can but instead measure what the bottlenecks actually are. If you are fill-rate bound or vertex-bound then instancing won't solve your problems. But if you're seeing your app get hung up on making lots of draw calls while your GPU spins its wheels waiting for JavaScript to catch up, this is the API for you!


  1. Instancing really comes into play when you have tens or hundreds of thousands of objects to display (large crowds, asteroid fields, particle systems, etc.) and when these objects are substantially parametrized (position, rotation, bones, displacements, etc.)

    Classical instancing breaks down in these situations pretty quickly.

    Pseudo-instancing (repeating an object into one buffer and duplicating attributes) works for a while, but it too breaks down as it quickly gobbles up all available ram and vertex bandwidth.

    Instanced drawing will consume the same ram that instancing will consume, while delivering better performance than pseudo instancing or instancing. This is a substantial advantage, especially on VRAM challenged devices (mobiles, ahem).

  2. I'm using Canary. The extension is available and is being used. But the scene is only being rendered at 2 frames per second.

    I can run Doom 3 and Portal etc smoothly enough on my hardware so a face full of monkeys shouldn't be a problem. Maybe you should create more demos for low-to-mid range hardware so you don't scare away any potential WebGL noobs who are looking to move away from Flash etc.

    Great stuff though. I'm glad I found your blog today :-)

  3. This is a great extension. I re-jiggered an old application to use it and saw a pretty substantial performance bump. Looking forward to it getting out from behind the draft extensions flag!

    1. Unless we find some major problem the extension should be out from behind the flag in Chrome 30. We were able to push its release forward because there are people in Google that have found it very useful as well. :)

  4. i got 3FPS. instancing does not seem to work yet.

  5. I am instancing tens of thousands of circles and projecting them into a fisheye lens, and this extension works great. It took me a day to realize that if you do not turn off instancing (by setting the divisor to 0) it will screw up regular (non-instanced) drawArrays calls -- I would call this a bug more than a feature.

  6. Interesting. On my iPad Air 2 (Safari), I get ~20 FPS using hardware instancing, but ~31 FPS with instancing disabled.

    1. WebGL instancing does seem to help with performance as a whole, however. The three.js demo at with 65,000 instanced triangles has little trouble running at 60 FPS on the same iPad. That demo appears to be using the same ANGLE_instanced_arrays extension.

  7. This was very helpful. I am rendering 100k randomly-generated 'circles', each made up of 10 triangles. My first approach got me 3 FPS. With instancing, I am getting a healthy 60 FPS. Thanks!

  8. Should be streamlined now with three.js

  9. Win10, Chrome 56, Radeon R7 370. W/o HW instancing I get about 27 fps, w/ HW instancing - steady 60 fps. It DOES work for me.

    1. In Chrome, instancing works badly. I found that on Intel HD cards it works badly even in Firefox and other browsers.

      For Chrome on "low" cards they use a CPU implementation of OpenGL ES, and we can see in the "SwiftShader" code that they simply call drawElements once per instance, so it's as if you weren't batching at all.

      On a better card (I tested on NVIDIA) instancing works a little better. And when you need to draw more vertices than UNSIGNED_SHORT indices allow, all with the same geometry, the instancing extension gives a CRAZY performance improvement.

      Still, in common cases Chrome is at best a little faster on some cards, while Firefox shows much better results.

  10. As of now, Chrome supports WebGL 2 with native HW instancing (based on OpenGL ES 3.0 specs, not some ANGLE extension). I've made a fur demo using this tech -

  11. It is not only about performance, but also overcoming memory limitations. Take for example component: it can draw only around 1e5 points due to GPU memory restrictions. Instancing in turn expands number of points up to 1e7 or more, due to avoiding repeating vertices.

    1. Btw the instancing is implemented in regl-error2d:, and that extended memory limitations 40 times