Wednesday, June 2, 2010

Stupidly fast WebGL Matrices

I've made a matrix library for WebGL. It's fast. Really fast. It's linked right here: glMatrix
[EDIT: And just been updated! Now even faster!]

And, lest some scoff at my claims of fastness, I've also made a benchmark: glMatrix Benchmark

Don't bother with that last link if your browser doesn't have WebGL enabled. None of the comparison libraries really work without it. (Though glMatrix will function with any javascript sequence.)

Okay, so let's get a little more serious. Why in the world would I want to make yet another matrix library when there is already a good selection of them? That's a tough one, especially because I'm a very firm believer in not reinventing any wheels. What it really comes down to is this: I tried all the other matrix libraries I could find and didn't feel like any of them were meeting my needs.
  • Sylvester is a very nice and robust library that I've seen some people using with WebGL, and I can't really blame them. It's probably the most complete one that I've seen, and is aimed at people doing heavy duty math in their browser. The biggest problem with it is simply that Sylvester was built for robustness, not speed, and as such is ill suited for realtime graphics.
  • Khronos recommends using CanvasMatrix in their introductory tutorial. It's apparently from Apple (according to the comment at the top), and has a nice interface, but once again isn't really built with an eye towards performance.
  • mjs was built to be far more speed conscious, and it shows. Done by the guy who did the original Spore WebGL demo, it's a nice little library and it's what I've been using so far. I only have a few complaints with it: I'm not a huge fan of the syntax ('M4X4' is awkward to type over and over again), and you end up creating a lot of temporary matrices to store intermediate values in (passing a single matrix as both the source and result matrix can corrupt the value). There are also some odd bits of missing functionality (like a full, generic inverse) that may or may not be a problem. Still, it's very workable.
  • EWGL Matrices appear to be the latest of the bunch, and the idea behind them was very specifically to be extremely fast. To that end they did a great job, but there were still a couple of things that I was annoyed by. All operations take place on the source matrix, which can lead to some unnecessary duplications in certain scenarios (basically the inverse of mjs's problem). And its object-oriented nature tends to force you to use, for example, their vector types when passing translation or axis data, which I find somewhat cumbersome.
Now, all of these complaints are, in reality, quite petty. If that were the extent of it I'd just use one of the latter two and be done with it. But there was one feature that was missing from all of them (except Sylvester, which doesn't count for speed reasons) that killed it for me: none of them had facilities to multiply a vector by a matrix! To me that seems like a rather obvious one, because while we certainly want to let the GPU do as much work as possible in an environment like WebGL, there are times when you simply want to rotate a point in memory.
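
To make "rotate a point in memory" concrete, here is a minimal sketch of the kind of operation I mean. The function name `transformPoint` is made up for illustration and is not the library's actual code. It assumes a 16-element matrix stored column by column (the layout WebGL expects) and treats the point as having an implicit w of 1.

```javascript
// Hypothetical helper for illustration only, not glMatrix source.
// m:    16-element 4x4 matrix in column-major order
// v:    3-element point, treated as [x, y, z, 1]
// dest: optional output; if omitted the result is written back into v
function transformPoint(m, v, dest) {
  if (!dest) { dest = v; }
  // Cache the inputs first so writing into dest is safe when dest === v
  var x = v[0], y = v[1], z = v[2];
  dest[0] = m[0] * x + m[4] * y + m[8] * z + m[12];
  dest[1] = m[1] * x + m[5] * y + m[9] * z + m[13];
  dest[2] = m[2] * x + m[6] * y + m[10] * z + m[14];
  return dest;
}

// 90 degree rotation about Z, followed by a translation of (10, 0, 0)
var m = [ 0, 1, 0, 0,   // column 0
         -1, 0, 0, 0,   // column 1
          0, 0, 1, 0,   // column 2
         10, 0, 0, 1];  // column 3
var p = transformPoint(m, [1, 0, 0], []); // p is [10, 1, 0]
```

Nothing here touches the GPU, which is the whole point: for a handful of vertices on the CPU side this is all you need.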

So, of course, noticing a small deficiency in the existing options I did what any normal programmer would do: Wrote my own. :) 

There were a few key things I was aiming for when I wrote it:
  • The interface needed to be clean and consistent
  • Operations needed to be able to happen in-place OR written out to a destination array, leaving the original untouched.
  • The library should not lock you into using a certain type or set of classes. (ie: It should work just as easily with Arrays as WebGLFloatArrays)
  • And for crying out loud it needs to be able to transform vectors!
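
The second and third goals above can be sketched with one tiny function. This is an illustrative example with a made-up name (`addVec3`), not a listing from the library: the destination argument is optional, and because the code only ever uses numeric indexing it should work with a plain Array or a typed array such as Float32Array.

```javascript
// Hypothetical function showing the "optional destination" pattern.
function addVec3(a, b, dest) {
  if (!dest) { dest = a; } // no destination given: operate in place on a
  dest[0] = a[0] + b[0];
  dest[1] = a[1] + b[1];
  dest[2] = a[2] + b[2];
  return dest;
}

var a = [1, 2, 3];
var out = new Float32Array(3);

addVec3(a, [10, 20, 30], out); // a is untouched, out holds 11, 22, 33
addVec3(a, [1, 1, 1]);         // no dest, so a itself becomes [2, 3, 4]
```

The caller decides whether the original gets overwritten, so there is no need for temporary copies in either case.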
Believe it or not, speed was a completely secondary concern while building it. Once I got all my desired functionality in there and started doing some benchmarks, however, I realized that I wasn't too far off from the faster existing libraries. So I spent another evening optimizing the crap out of it to make my little matrix library "stupidly fast". A few key optimizations were:
  • Unroll EVERYTHING. There's not a single loop in the code.
  • Take advantage of javascript's variable caching. If a matrix element was read more than once in any function it was stored in a local variable first. I was honestly surprised at just how much of a difference this one made!
  • Inline anything that made sense. ie: Although I have a vector normalize function, I did an inlined version in the matrix inverse to cut down on call overhead and additional variable creation.
  • Take shortcuts when possible! If the source and destination matrix for a transpose are the same, we don't need to alter the elements on the diagonal. Or when rotating if we notice that it's along the X, Y, or Z axis we can cut out a lot of calculations that would just end up as 0 anyway.
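
As a rough illustration of how those points combine, here is a sketch of a 4x4 transpose written in that style. It is a hypothetical example (the name `transpose4` is mine), not a copy of the library's source: no loops, values that would be overwritten are cached in locals first, and the in-place case skips the diagonal entirely.

```javascript
// Hypothetical sketch of an unrolled transpose, not the library's code.
// mat:  16-element 4x4 matrix
// dest: optional output; if omitted (or the same array) mat is transposed in place
function transpose4(mat, dest) {
  if (!dest || mat === dest) {
    // In place: the diagonal (0, 5, 10, 15) never moves, so only the
    // six off-diagonal pairs are swapped. Cache the values that get
    // overwritten before they are read.
    var a01 = mat[1], a02 = mat[2], a03 = mat[3];
    var a12 = mat[6], a13 = mat[7];
    var a23 = mat[11];

    mat[1] = mat[4];
    mat[2] = mat[8];
    mat[3] = mat[12];
    mat[4] = a01;
    mat[6] = mat[9];
    mat[7] = mat[13];
    mat[8] = a02;
    mat[9] = a12;
    mat[11] = mat[14];
    mat[12] = a03;
    mat[13] = a13;
    mat[14] = a23;
    return mat;
  }

  // Separate destination: a straight unrolled copy of all 16 elements
  dest[0] = mat[0];  dest[1] = mat[4];  dest[2] = mat[8];   dest[3] = mat[12];
  dest[4] = mat[1];  dest[5] = mat[5];  dest[6] = mat[9];   dest[7] = mat[13];
  dest[8] = mat[2];  dest[9] = mat[6];  dest[10] = mat[10]; dest[11] = mat[14];
  dest[12] = mat[3]; dest[13] = mat[7]; dest[14] = mat[11]; dest[15] = mat[15];
  return dest;
}

var m = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
var t = transpose4(m, []); // m is left alone, t holds the transpose
transpose4(m);             // m is now transposed in place
```

The in-place branch does 12 writes instead of 16, and neither branch pays for loop counters or bounds checks on an index variable.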
And the end result is that glMatrix outperforms even EWGL in every scenario I've tested! No small feat!

Of course, "stupidly fast" is probably pushing it. After all, this is Javascript we're talking about. Even the most naive of C matrix libs would run circles around the best javascript libs. But when it comes to WebGL we really don't have much of a choice now do we? At the very least I feel confident in saying that this library is one of the fastest (if not THE fastest) javascript matrix libs available today.

This being a first release, I fully expect a few bugs here and there and welcome any feedback on how to improve things. But hopefully it can serve as a meaningful contribution to the WebGL development scene.

Happy coding!

17 comments:

  1. Congratz, it is the fastest.
    There are still some things that can be faster but you are indeed the fastest released library out there.

    Also I will probably add some issues for speed issues.

    many greetz

    EasyWebgl

  2. Great work, things just keep getting faster. Take a look at the benchmark for Rotation (Arbitrary axis): you seem to be selling yourself short. Aren't you multiplying as well, which some of the others aren't? That would make you nearly twice as fast ;-)

  3. http://learningwebgl.com/blog/?p=1828

    How do you do compared to GPU accelerated?
    This guy says his proof of concept sped Sylvester up roughly 7 fold.

  4. @gero3 - Please do! Any improvements are welcome.

    @Paul - I'm not sure what you mean when you say I'm selling myself short. In what way?

    @Kyber - Comparing this to the GPU accelerated matrix is rather apples to oranges. They're not really aimed at the same thing. If you look at the post you linked to you'll notice that the matrices he's multiplying are 1024x1024, which is a massive number of calculations and has no practical use in a realtime app. And if you're doing standard 4x4 matrices the overhead of pushing the matrix data to the GPU and reading back the result will completely destroy any performance gains. (I'd be willing to bet at that point Sylvester would be far faster, actually).

    So in short: the GPU will easily outperform my library on the actual multiplication, but the read/write overhead will mean that my library will be much faster overall.

  5. FYI: I've also updated the Spore Critter demo to use glMatrix, if anyone is looking for a real-world demo.

  6. FYI, I think the mat4.multiplyVec3 function is incorrect. You just need to transpose the matrix indices to match the convention being used:

    [x, y, z, w][[i0, i1, i2, i3], [i4, i5, i6, i7], [i8, i9, i10, i11], [i12, i13, i14, i15]] = [x*i0 + y*i4 + z*i8 + i12, ...]

    It would also be nice to support transforming vec3 (or vec4) by a matrix that has a w-column != [0,0,0,1]^-1.

    Enjoying playing with the code otherwise. Thanks!

  7. Good catch! I'll get that in the next release along with some other fixes/optimizations. Should be within the next day or so.

  8. Whoa, sorry for the comment spam, blogspot is having issues today.

  9. No problem, I've cleaned it up for you.

    And I can agree that GPU calculations in general should really get some more attention. A lot of people don't seem to "get" what GPU acceleration of that nature is useful for, though. They want it to be another CPU. That's a shame, really, because under the right circumstances it can be spectacularly useful.

  10. It's a small world I guess. Here I am scouring the web for information on how to do skinning in webgl. I'm reading over this entry in awe of the brilliance of this developer and then I'm, like, "Wait a second, I know that guy!".
    I am excited that you're so into webgl. With your brains and...and your brains we could do great things! No, I really do want to pick your brain on a project I've been thinking about for a couple of years.

  11. Wow, so over a month later I notice that my old FATPOT buddy has been looking at my site! Silly blogger, I can't seem to figure out a way to get it to notify me when I get new comments on my posts. :( I'd love to chat some time! Drop me a line!

  12. You can make it faster in Firefox by using Float64Arrays instead of Float32Arrays. Might be because you're not converting from double to float each time you set a value.

  13. Very nice :) Any thoughts on where GL's Shader Language fits in? Will it ever make sense to try to encapsulate it too behind prettier js interfaces? how might that work alongside glmatrix? http://en.wikipedia.org/wiki/GLSL etc...

  14. I'm just plowing through some of the tutorials out there, and was concerned that Sylvester/glUtil was used as the main matrix/vector library.

    I got a bit spooked looking at the glUtil code because it had a few problems. For example, the author did not understand javascript var hoisting. Also, makeOrtho was duplicated!

    So the question is: can I completely replace Sylvester/glUtil with glMatrix?

  15. Hey, I just noticed Quaternions! Way to go! My opengl prof is really delighted with using them rather than the usual transforms. Nice to avoid gimbal-lock.

  16. Sorry if this is spamming but I've been digging into the various matrix packages so ..

    On my MB Air, using Chrome, and running your benchmark, CanvasMatrix is the fastest! I was surprised so looked at the code and it may be due to the difference between using arrays/indexing vs using objects/properties (m11,m12 .. etc).

    This is absolutely weird! I tried Firefox, and glMatrix was great. Oddly enough, Safari only showed glMatrix results.

    I hope the various JS engines get array processing equally fast, especially with the gl matrix types like Float32Array .. presumably they can do real indexing rather than JS pseudo indexing with the object/property approach (0,1,2..as object properties rather than true indexes.)

    Thanks a bunch for the great library! I'll let you know how it goes in our openGL/Graphics class. I hope to rewrite all of Ed Angel's examples in WebGL+glMatrix.

  17. Yea, this is one of my main concerns: the performance with javascript. That, and the "experimental-" part on initializing OpenGL ES.
    My software rasterizer can perform ~ 1,000,000 T&L calculations under 30 ms on a mobile processor using assembly.
