Thursday, April 5, 2012

If I built a physics engine...

So let's be clear about something right from the very start: I'm not a physics guy. I love me a good physics engine, but I will likely never build one because that's just not where my expertise lies. Far be it from me to criticize the algorithms used by your broadphase collision pass, because I have only the haziest, most high-level idea of what that actually is. I'm a graphics guy through and through, and that's where I'm happy to stay.

However, I do know a thing or two about writing fast Javascript code, which is something that will become fairly important to the new breed of browser-oriented physics engines that are starting to crop up, like Ammo.js, Cannon.js, and the various Box2D ports. Unfortunately, I've seen a few common threads within these engines that I think will limit their effectiveness in real-world use, and I wanted to get some of it off my chest while development of these libraries is still in its early days. Some of these trends are inherited from the underlying C++ libs these libraries wrap (like Ammo.js, which is built from Bullet with Emscripten), but others appear to be self-imposed. That's the group I'm talking to.

In my previous post about optimizing the garbage created by my texture loading I linked to a great article by Ashley Gullen that contained a couple of lines that I don't think were given enough emphasis:
Also, avoid vector objects if at all possible (as in vector2 with x and y properties). While again it may be convenient to have functions return these objects so they can change or return both values at once, you can easily end up with hundreds of these created every frame, resulting in terrible GC performance.
This is something that every speed-oriented Javascript library needs to take to heart, and fast! Unfortunately, we've all been trained by C++ and friends that encapsulation of these things is good and desirable, and as such the first thing that jumps to mind when we see an x, y, and z component is "Class it!" That's a good principle in most cases, but here it's subversively damaging.
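
To make that instinct concrete, the "Class it!" approach tends to produce something like this sketch (not any particular library's code), where every operation mints a fresh object for the GC to chase down later:
function Vector3(x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z;
}
// Each call allocates and returns a brand new Vector3.
Vector3.prototype.add = function(other) {
    return new Vector3(this.x + other.x, this.y + other.y, this.z + other.z);
};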

Consider the following: I'm making a simple physics demo with the latest and greatest physics engine, "Physics", that's rendered with a separate 3D framework, "Renderer". (Yeah, people don't have very creative naming conventions in my imagination.) I'll probably end up with a couple of code chunks like this somewhere:
// Shared with updateBox below.
var physicsBox, renderBox;

function setupBox(pos, size) {
    physicsBox = new Physics.Box();
    physicsBox.setPosition(new Physics.Vector3(pos.x, pos.y, pos.z));
    physicsBox.setSize(new Physics.Vector3(size.x, size.y, size.z));

    physicsWorld.add(physicsBox);

    renderBox = new Renderer.Cube();
    renderBox.setPosition(new Renderer.Vector3(pos.x, pos.y, pos.z));
    renderBox.setSize(new Renderer.Vector3(size.x, size.y, size.z));

    scene.add(renderBox);
}

function updateBox() {
    var pos = physicsBox.getPosition();
    var rot = physicsBox.getRotation();

    renderBox.setPosition(new Renderer.Vector3(pos.x, pos.y, pos.z));
    renderBox.setRotation(new Renderer.Quaternion(rot.x, rot.y, rot.z, rot.w));
}
Actually, that doesn't look too bad! Very clean and readable, if overly simplified. But let's examine what's happening here. First off, since the physics and rendering libraries don't share a math lib I can't simply pass vectors from one to the other. This means that I'm doing a lot of translating back and forth, which not only costs me execution time but also produces heaps of garbage for my browser to pick up because I'm constructing a new vector every other line! This is especially harmful in the update function, which I expect to be called every frame, and probably multiple times each frame to update multiple objects. (After all, what good is a physics simulation with only one box?)

I should point out that there's nothing really objectionable about the creation of the new Physics.Box and Renderer.Cube. These are both objects that we expect to stick around for a while and track important state for us. There's no reason they shouldn't be nice, encapsulated objects. It's really just the throwaway vectors that we're concerned about.

Of course the above code sample is pretty naive. We could cache off some vector instances for this kind of data shuffling to prevent some of the garbage creation (I'll sketch what that might look like in a moment), but there's only so much that we as library users can do. You know that the library creators are making use of these classes internally too. Take this simplified version of the naive broadphase code from Cannon.js:
for(var i=0; i < n; ++i) {
    for(var j=0; j < n; ++j) {
        // If body[i] is a plane and body[j] is a sphere...

        r = new CANNON.Vec3(x[i]-x[j],
                  y[i]-y[j],
                  z[i]-z[j]),
        quat = new CANNON.Quaternion(qx[i],qy[i],qz[i],qw[i]),
        normal = quat.vmult(body[i]._shape.normal),
        q = r.dot(normal)-body[j]._shape.boundingSphereRadius();

        if(q < 0.0){
            pairs1.push(i);
            pairs2.push(j);
        }
    }
}
There is the potential to create three new vector or quaternion objects for every potential object pair in the physics simulation per pass. (The third is hidden away inside quat.vmult, which actually returns a new CANNON.Vec3.) There's really no way the user of a library can avoid this kind of allocation, but it's also hard to fault the library builders for making use of their own classes.

(As a side note, it's rather unfair of me to pick on Cannon.js here. A good deal of its internal simulation loop appears to be quite memory-conscious, and the above code is actually something of an outlier. Many of the user interactions with the library still require use of its vector classes, though.)
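
And just to be concrete about the user-side caching I mentioned earlier, here's roughly how the updateBox example could reuse a couple of pre-allocated objects instead of allocating new ones every frame. (The set() methods on the hypothetical Renderer.Vector3 and Renderer.Quaternion are purely my invention here.)
var cachedPos = new Renderer.Vector3(0, 0, 0);
var cachedRot = new Renderer.Quaternion(0, 0, 0, 1);

function updateBox() {
    var pos = physicsBox.getPosition();
    var rot = physicsBox.getRotation();

    // Reuse the same two objects every frame instead of allocating new ones.
    cachedPos.set(pos.x, pos.y, pos.z);
    cachedRot.set(rot.x, rot.y, rot.z, rot.w);

    renderBox.setPosition(cachedPos);
    renderBox.setRotation(cachedRot);
}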

So how could this situation be improved? Well, for one we can observe that in my fake example code above both the physics and rendering libraries are relying on vectors that are basically just x, y, and z floats bundled together. What if we were to make our vector operations separate functions rather than methods on the vector prototype? So instead of this:
var v3 = v1.add(v2);
We do:
var v3 = Vector3.add(v1, v2);
Or, even better:
Vector3.add(v1, v2, vOut);
By making the output an explicit parameter rather than an implicit return we eliminate the need to create new objects inside the function and encourage object reuse.
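Just to illustrate, a minimal sketch of what that might look like (Vector3 here is a stand-in namespace of my own, not any particular library's API):
var Vector3 = {
    add: function(v1, v2, vOut) {
        // Write the component-wise sum into the caller-provided output object.
        vOut.x = v1.x + v2.x;
        vOut.y = v1.y + v2.y;
        vOut.z = v1.z + v2.z;
        return vOut; // returned for convenience; no new object is created
    }
};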

Yeah, it's not quite as pretty, but now v1, v2, and vOut can be ANY object with x, y, and z properties. That means that our Renderer, for example, could potentially operate directly off of the vectors used by the physics library. That would turn our earlier example into this, if both the physics and rendering libs were on board with the idea:
function setupBox(pos, size) {
    physicsBox = new Physics.Box();
    physicsBox.setPosition(pos);
    physicsBox.setSize(size);

    physicsWorld.add(physicsBox);

    renderBox = new Renderer.Cube();
    renderBox.setPosition(pos);
    renderBox.setSize(size);

    scene.add(renderBox);
}

function updateBox() {
    renderBox.setPosition(physicsBox.getPosition());
    renderBox.setRotation(physicsBox.getRotation());
}
Quite a bit cleaner, and with no throwaway object creation! Yay!

Now that's great, and it would likely be a marked improvement over the current state of things, but if it were me personally building a physics engine I'd take it one step further. Rather than use an arbitrary object with x, y, z properties I'd discard the idea of an object entirely and just use 3-element arrays, preferably TypedArrays. (Hm... why does that sound familiar?) It's a bit awkward to use v[0] instead of v.x, but there are some real benefits to doing so! For one, array indexing is very fast in Javascript, especially for TypedArrays. Second, whenever you pass data to WebGL (or WebCL in the future, or a number of other APIs like Web Worker Transferables) it's expected to be in the form of a TypedArray anyway. Essentially, TypedArrays are fast becoming the language that the web speaks when performance matters.
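
As a rough sketch (in the spirit of glMatrix-style libraries, but not any specific API), the same add operation over 3-element arrays might look like this:
// out, a, and b can each be a Float32Array of length 3 or a plain [x, y, z] array.
function vec3Add(out, a, b) {
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    return out;
}

// Allocate once up front, then reuse every frame.
var velocity = new Float32Array([0, 0, 0]);
var gravity = new Float32Array([0, -9.8, 0]);
vec3Add(velocity, velocity, gravity);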

If your vectors are in an object, you're going to be doing a lot of new Float32Array([v.x, v.y, v.z]); as you pass data around to different APIs, which is just another variation of what we've spent this entire post trying to get away from. If your vectors (and quaternions and matrices!) are TypedArrays from the beginning they can be passed to WebGL and friends with no data duplication and no garbage creation. In fact, you may be able to work your way down to zero garbage creation per frame! That's an extremely lofty goal, but one that's plausible if your core libraries don't force you to create throwaway objects.

Allow me to end by saying that despite my griping I've really been very impressed by the capabilities of the Javascript physics engine options available today. Despite their flaws they are all accomplishing something that I would have thought to be nearly impossible only a few years ago. And this is only the first round! I completely expect that they'll just get better and better as time goes on. One of the things that will make them better, though, is to embrace the realities and limitations of Javascript as a language, and at some point that means that new Vector() is gonna have to die.

[UPDATE]

So after seeing a couple of frameworks toy with revamping some of their vector math (most notably Three.js discussing changes to their vertex class and Cannon.js trying to optimize their vector math), it's become apparent that some clarification, and a slight bit of backtracking, needs to be done here.

I failed to mention (because frankly I had forgotten) that creating TypedArray instances is a very expensive operation in Javascript. How expensive relative to other data structures will vary from environment to environment, but the point is that you want to create as few of them as possible. That, of course, is the main thrust of this entire post, regardless of your chosen data type. Depending on how your library needs to use its vector/matrix/etc. classes, though, TypedArrays may very well be the wrong choice for you!

If you need to be creating new vectors all the time, or will never be pushing your data into an API that requires TypedArrays, then there's actually a lot of benefit to either using a standard Javascript array or a lightweight object. (The rest of my post still stands, though! {x: 0, y: 0, z: 0} is still preferable to new MyCustom.Vector().)

If you are frequently passing data to WebGL, Audio APIs, Web Workers, or other TypedArray-based libraries, however, you will probably be forced to take that expensive creation hit at some point anyway. In those cases the absolute worst thing you can be doing for your performance is something like:
gl.uniform3fv(uniformX, new Float32Array([v.x, v.y, v.z]));
At that point you have probably destroyed any performance gains you might otherwise have had, with the overhead of not only the TypedArray creation but also the intermediate JS array used to package the data. It would be far better to either use TypedArrays from the get-go or keep a cache of them around explicitly for data passing.
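
For example, here's a sketch of that caching approach, assuming you're stuck with object-style vectors (scratchVec and setVec3Uniform are just illustrative names of my own):
var scratchVec = new Float32Array(3); // created once, reused for every upload

function setVec3Uniform(gl, uniform, v) {
    scratchVec[0] = v.x;
    scratchVec[1] = v.y;
    scratchVec[2] = v.z;
    gl.uniform3fv(uniform, scratchVec); // no per-call allocation
}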

Like so many other things in software development this is an area where there are no silver bullets. It's up to you to examine the needs of your code, experiment with the alternatives, and profile your performance every chance you get!

10 comments:

  1. Very well written, and very true. I've found the same thing over and over again when looking to use libraries and I really hope people take your words to heart.

    I'm really hoping that people get over the current trend to make things easy to use at the cost of performance. It's easy to use a code translator/abstractions/etc to make an API more usable, but it's impossible to solve systemic performance problems in a large codebase.

    My math library of choice is goog.vec (http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/vec/), partly because I'm using Closure but also because it's insanely fast - not because it's doing anything hacky, but because it follows the kind of rules you've mentioned. It uses Float32Array for everything, and the API design strongly discourages allocation of temporary values. If you're really sensitive to GC (as you should be!), then by using some rather painful-but-easy-to-understand workarounds for the lack of stack variables in JS it's possible to get math-heavy apps running within 10% of native perf with no garbage per frame.

  2. Thanks for your post and for looking into my code, Brandon! I have to admit that I followed the "Class it!"-way of thinking when building Cannon.js, with a lot of unnecessary object allocations. I am an impatient physics guy, I want to see the algorithms work and the rigid bodies moving on the screen without too much hassle :P

    I've definitely listened to your words of expertise, and hope to get more efficient Cannon.js code in the future. After reading this, I plan to put TypedArrays in each RigidBody object (for positions, orientations etc) and throw away the World.x, World.y, ... data arrays. Does that sound like a good strategy?
    The "Vector3.add(v1, v2, vOut)" structure sounds very promising for performance and for encouraging others to reuse objects.

    Your work is awesome, keep it up!

    Cheers!

  3. I like this post a lot because I was thinking the same too. It bugs me that we have to write the same classes in CS. We make a generic Matrix class in Java, for example, knowing that primitive values in Java can't be put in generic types, and so we have to use boxing classes (like Float instead of float), which means that EVERY operation on two float values requires two unboxing and one boxing operation, plus one more object for the GC. Imagine a matrix multiplication with two 4x4 matrices and you know that this will never be of any practical use in any game loop...

  4. Enjoyed the article and agree with the principles...

    On the other hand, compilers have a way of confounding my preconceptions. For the longest time in C/C++ I'd fret about the copy cost of functions that return by value. And yet, when I looked at the ASM I would see the compiler cleverly just re-assign 'ownership' of a local from a called function rather than copy it. That meant I could write code more cleanly and not lose performance.

    I have a sneaking suspicion that the same dynamic could be happening with newer JS compilers. With C++ I could look at the ASM - what tools can we use to see if the boogeyman is real?

  5. You bring up a good point, Matt, and for testing optimization techniques on the level that you're talking about I highly recommend looking at Florian Loitsch's series "Optimizing for V8" (http://floitsch.blogspot.com/2012/03/optimizing-for-v8-introduction.html). I must admit that some of his articles are a little lower level than I'm used to dealing with, but they're still a fascinating peek into the core of a Javascript engine.

    In the meantime, though, we've heard from several engineers representing the various browsers lately all urging us to be careful about unnecessary object creation. So I think it's safe to say that minimizing potential garbage collection scenarios is still a valid optimization technique for the immediate future.

  6. I recommend this article by one of the guys who works on V8: http://blog.mrale.ph/post/12396216081/the-trap-of-the-performance-sweet-spot

    I'm researching ways to improve my library for mapping JS to binary data and came upon this. Part of that is understanding exactly the ways such a thing would be used. Right now it's optimized more for translating structures at less frequent intervals or just once (like loading a file format) but I'd like to make it usable for the case you outlined here.

    As it is currently it would allocate a new object each pass to make the translation and then destroy it after use, but that's almost definitely not even good enough here, despite the fact that it would be cleaning up after itself. The cost to constantly make those new objects so often would be too high I think.

    https://github.com/Benvie/reified

  7. I think it's really terrible that JavaScript is still as bad as it ever was at the basic task of managing data, and that people are now having to manually pack data structures into arrays in a language without anything resembling pointers or data references. I guess even the worst possible language for your application is the language of choice if it's the only one that runs in browsers. :/

    In the interim, if you care about this problem you should make sure to take a look at the ES6 Binary Data proposal at http://wiki.ecmascript.org/doku.php?id=harmony:binary_data , and make sure to let those involved (David Herman is probably still the primary person there) know what you think and make sure they understand your use cases. Even if we can ever use Binary Data in the real world (likely in 5+ years based on how fast JS moves), it won't entirely eliminate garbage, but it would at least reduce the number of GC objects for data structures like matrices and vectors to 1, while still preserving named members and allowing you to mix types. Binary data also allows having a structured window into an ArrayBuffer that you can move around, which approximates a lot of pointer use cases in a straightforward way, so that's a step forward too.

    Matt: It's my understanding that JS semantics are still far too complex for SpiderMonkey or V8 to safely do anything like elide copies or do escape analysis to hoist object members onto the stack or anything like that. It is possible to do those sorts of optimizations on another language that compiles down to JS, though; emscripten is able to implement types like Vector3 without incurring GC overhead by mapping them to a virtual heap, and my JSIL .NET->JS translator is able to eliminate many copies by doing simple escape analysis (someone with better CS chops could probably also do stuff like named return value optimization).

  8. Class-ifying is not as bad as you make out, actually (as long as you provide enough methods in the prototypes), and as long as you make sure not to create a lot of new instances of those. A good example is still EWGL-math, as you can see in http://stepheneb.github.com/webgl-matrix-benchmarks/matrix_benchmark.html

  9. Hey, Toji! That's a nice post out there.

    Some things I'd like to mention:

    1. The choice of operators. We're kind of used to mathematical notation and use +, - and * all around, but perhaps it could be good for performance to reset the mind to use +=, -= all around instead? This is a habit I got from C++ for some reason; it's probably another flavour of the same notion you suggested with Add(v1, v2, vOut).


    2. Buffers, buffers, buffers!
    If typed arrays are expensive to create, why create them? Let's use as few of them as possible! Like, I don't know, one? One per subsystem? And keep all the vectors inside, and always refer to them as their indices, not as objects - so that we'd basically have functions like:

    function Add3(buf, idx, idxx, outIdx) {
       buf[outIdx+0] = buf[idx+0] + buf[idxx+0];
       buf[outIdx+1] = buf[idx+1] + buf[idxx+1];
       buf[outIdx+2] = buf[idx+2] + buf[idxx+2];
    }

    The whole physics simulation could perhaps operate on a constant number of large, fixed-size typed arrays. That could help a lot...


    Let me stress that the above are ideas and nothing more, until somebody implements and profiles them.

  10. If you (or anyone who reads this) were to write a small WebGL demo that needed some physics (nothing too advanced, just simple triangular rigid bodies), which one of the currently available 3d engines would you use? I see they all have some drawbacks, and I couldn't decide which to pick yet..
    thanks
