Friday, March 23, 2012

Javascript memory optimization and texture loading

Over the past couple of weeks I've seen a few great resources about optimizing your javascript. One is a wonderful GCD presentation by Lilli Thompson about best practices when writing code for V8 (Chrome's Javascript engine.) Although the presentation uses Chrome for it's concrete examples, much of the information is very practical for any browser.

GDC 2012: From Console to Chrome

It's a long presentation, but well worth the time. If you simply can't bring yourself to sit and listen for a bit, though, at the very least take a look through her slide deck.

The second resource is a great blog post by Ashley Gullen of Scirra about reducing the amount of time your app spends in that dreaded garbage collecting coma.

How to write low garbage real-time Javascript

I highly encourage you to go through both of those if you are a web developer. Even as an expert dev with years of experience, I'm sure you'll learn something!

So after going through the above resources I had an itch to try an optimize something as an coding exercise. Browsing through my WebGL code, I stumbled on a function that seemed like a perfect target for my scrutiny. If you do much with WebGL I'm guessing that you have a VERY similar function in your own code:

function loadTexture(gl, src, callback) {
    var texture = gl.createTexture();
    var image = new Image();
    image.addEventListener("load", function() {
        gl.bindTexture(gl.TEXTURE_2D, texture);
        gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
    
        if(callback) { callback(texture); }
    });
    image.src = src;
    return texture;
}

Really simple and self contained. Provide a url of an image, get a texture back, and have a callback fired when the texture is fully loaded if desired. I've been using this function in one form or another for almost all of my WebGL texture loading since my first demos.

Looking at it with new eyes after reading Ashley's garbage collection post, though, I recognize that there are a couple of drawbacks to the code. We create a new image, a new event handler, and a new closure along with every texture. That doesn't sound like too much, but we know that they will need to get garbage collected at some point! It's probably not a big issue for most WebGL apps, which commonly load several textures on launch and then remain fairly static. But what if you had an app that was continuously streaming in new textures?

Using Chrome's profiler I recorded a portion of a walkthrough with the RAGE demo to see what the memory usage looked like. The results were less than pretty:


That sawtooth pattern you see at the top is how much memory I'm using, and it's exactly what we don't want to see. The sharp drops are garbage collection sessions, each of which can halt the browser and make my framerate stutter. According to this we have the potential to see performance hiccups about every 2-3 seconds.

Also worrying is the steady climb of the graph at the bottom. That line represents both the number of DOM nodes created by my app AND the number of event listeners. They're exactly 1:1, which makes sense: I create one event listener for every Image. It's troubling to me that this keeps going up and never falls, since that says to me that I likely am hanging onto a reference somewhere that I shouldn't be.

So what can we do to this simple function to make it better?

Well, first off let's consider that even if we attempt to load several hundred images at the same time the browser will only download a few at a time. As such, we can limit ourselves to a fixed number of images that can be loaded at any time (let's say 16) and not have it be detrimental to our performance. This means that we can keep a fixed cache of Image elements that can be reused over the life of the application for texture loading. It also means that we can pre-allocate an array to store them, and since the array size remains static the browser should be able to make it fairly speedy to use:

var MAX_CACHE_IMAGES = 16;
var imageCache = new Array(MAX_CACHE_IMAGES);

We could also pre-allocate 16 Image elements to fill that cache with, but that would be a bit overkill for simple apps. Instead, we'll allocate them as needed up until we've reached the max cache count. That will keep the overhead minimal for those apps that load 3 textures during their lifetime.

We'll set up a system that pushes the images back onto an available stack once they're done loading whatever was requested of them last, and when new image requests are made they're pop from the available Images first. If none are available we'll create a new one if we haven't reached the max image limit yet, and if we've already created the max number of images and none are currently available we'll simply queue up the request. The code for all that will look something like this:

var cacheTop = 0;
var remainingCacheImages = MAX_CACHE_IMAGES;
var pendingRequests = [];


function releaseImage(img) {
    var req;
    if(pendingRequests.length) {
        req = pendingRequests.shift();
        img.src = req;
    } else {
        imageCache[cacheTop++] = img;
    }
}


function loadImage(src) {
    var img;


    if(cacheTop) {
        img = imageCache[--cacheTop];
        img.src = src;
    } else if (remainingCacheImages) {
        img = new Image();
        img.addEventListener("load", function() { 
            releaseImage(img); 
        }, false);
        img.src = src;
        --remainingCacheImages;
    } else {
        pendingRequests.push(src);
    }
};

A few notes about this the choices I've made in this code. First off, in a situation like this where we have a stack of objects that are growing and shrinking (imageCache) it would seem like a natural choice to use push/pop to place images on the array and remove them when needed. That would mean the array size is constantly changing, though. The browser could probably handle that pretty well, but in this case it's an incredibly simple thing to keep our own pointer to the top of the stack (cacheTop). We use it to tell us where the first available image element is in the array, decrementing as we use the images and incrementing as we add them back which means the array never needs to change size.

Also note that we never actually remove images from the array, we just shift the pointer back and forth. This actually means that it's possible for an image to be in the array more than once but we don't care. Any images at indexes higher than cacheTop are considered invalid. Removing them from the array may feel like a good practice but in reality it's just unneeded work.

If our cache is empty and we haven't created 16 images yet we'll allow the call to create a new image and add an event listener to it. This event listener is the only one we'll add to the image during it's lifetime, so the listener and the accompanying closure are a fixed, one time cost just like the image itself.

Things are a little bit messier when we look at the case when no images are available in the cache but we've created all that we're allowed to. We don't want to throw an error back at the user, so instead we push it into a queue of pending requests. And this time we do use push, since we have no idea how many pendingRequests we may accumulate before another cached image is available for loading. It's not going to be as fast as the static array access, but it's flexible and the use is limited enough that the browser should be able to optimize it fairly well. Plus, we should hope that we'll only hit this overflow in extreme situations so it shouldn't need to be super optimal.

So now we've got a decently efficient system for loading lots of images, but it's not doing anything with the image data once loaded. In order for this to be of much use we need it to upload that image data to the GPU as a texture before discarding it. Also, we would like to be notified when the texture is fully loaded, so we'll need a way to provide a callback for that too. In the old code this was all communicated to the image loaded event through the closure, which exposed the variables from it's containing function. The problem that we have here is that we only create the event listener once (and we want it to stay that way!), so while it may have the right information accessible for the first image loaded it won't be right for subsequent ones.

We somehow need to associate additional data like the texture object and callback function with the image being loaded. We could actually add these items directly to the Image element itself. After all, Javascript says that the following is perfectly legal:

var img = new Image();
img.texture = gl.createTexture();

But we also know from Lilli's presentation that slapping new properties onto an object like that makes V8 sad. What to do?

My solution is to create our own object that associates all the required objects for us. So instead of using Image directly in the load/release functions above we can use a custom object like this instead:

var TextureImageLoader = function() {
    var self = this;


    this.gl = null;
    this.texture = null;
    this.callback = null;


    this.image = new Image();
    this.image.addEventListener("load", function() {
        var gl = self.gl;
        gl.bindTexture(gl.TEXTURE_2D, self.texture);
        gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, self.image);


        releaseImage(self);
        if(self.callback) { self.callback(self.texture); }
    });
};


TextureImageLoader.prototype.loadTexture = function(gl, src, texture, callback) {
    this.gl = gl;
    this.texture = texture;
    this.callback = callback;
    this.image.src = src; // Triggers a new load
};

Now the event callback has all the context that it needs and can still be allocated once. This object should also be pretty efficient since it has a well defined property set that we never alter, and we'll only allocate 16 of them at most.

Of course, we'll need to store the additional properties in our pending requests as well, so we'll create another object for it:

var PendingTextureRequest = function(gl, src, texture, callback) {
    this.gl = gl;
    this.src = src;
    this.texture = texture;
    this.callback = callback;
};

This is somewhat unfortunate because it means that every request for an image that we can't handle immediately will create some garbage to track the pending request, but it's probably a safe bet that this simple little object will be a lot less impactful on our memory than creating a new Image for every texture. (Have you ever inspected an Image element? They're crazy complex! All DOM elements are.)

So now lets put that all together and wrap it up in a nice little function to keep our global scope clean:


var loadTexture = (function() {
    var MAX_CACHE_IMAGES = 16;


    var textureImageCache = new Array(MAX_CACHE_IMAGES);
    var cacheTop = 0;
    var remainingCacheImages = MAX_CACHE_IMAGES;
    var pendingTextureRequests = [];


    var TextureImageLoader = function(loadedCallback) {
        var self = this;


        this.gl = null;
        this.texture = null;
        this.callback = null;


        this.image = new Image();
        this.image.addEventListener("load", function() {
            var gl = self.gl;
            gl.bindTexture(gl.TEXTURE_2D, self.texture);
            gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, self.image);
            // Mipmaps generation/Filter modes/etc


            loadedCallback(self);
            if(self.callback) { self.callback(self.texture); }
        });
    };


    TextureImageLoader.prototype.loadTexture = function(gl, src, texture, callback) {
        this.gl = gl;
        this.texture = texture;
        this.callback = callback;
        this.image.src = src;
    };


    var PendingTextureRequest = function(gl, src, texture, callback) {
        this.gl = gl;
        this.src = src;
        this.texture = texture;
        this.callback = callback;
    };


    function releaseTextureImageLoader(til) {
        var req;
        if(pendingTextureRequests.length) {
            req = pendingTextureRequests.shift();
            til.loadTexture(req.gl, req.src, req.texture, req.callback);
        } else {
            textureImageCache[cacheTop++] = til;
        }
    }


    return function(gl, src, callback) {
        var til;
        var texture = gl.createTexture();


        if(cacheTop) {
            til = textureImageCache[--cacheTop];
            til.loadTexture(gl, src, texture, callback);
        } else if (remainingCacheImages) {
            til = new TextureImageLoader(releaseTextureImageLoader);
            til.loadTexture(gl, src, texture, callback);
            --remainingCacheImages;
        } else {
            pendingTextureRequests.push(new PendingTextureRequest(gl, src, texture, callback));
        }


        return texture;
    };
})();

Whew! That's, um... a bit more complex than the old code. Surely replacing ~10 lines of easily readable code with that 60+ line monster can't actually be beneficial, can it? Well... let's give it a spin in the same RAGE demo from earlier:


Wow! What a difference! We still have that unfortunate sawtooth effect, which means that I still have work to do on other parts of this app, but changing this single utility function in my code (albeit a function used incredibly often) had a very visible effect on the amount of garbage created. Whereas before we were doing a garbage collect every 2-3 seconds now we're doing it every 6-7 because the memory grows much slower.

And check out the bottom graph! Whereas previously we had a steadily growing collection of Image elements and event listeners here our graph is completely flat! Our image recycling has made a profound difference. (Both recordings were started after the initial page load, so the image cache had already been filled out.)

But the best part, in my opinion? The function signature didn't change at all! That means that the optimizations made here can be dropped into any code that used the old function and they "magically" get a bit better with no additional effort. Can't beat that! :) Of course this is a relatively minor optimization in the grand scheme of things, but make a few more of these minor optimizations and they can add up to a noticeably smoother experience overall.

This was a fun little exercise for me (that actually resulted in useful code! Woohoo!) and hopefully one that makes for at least an interesting read for the rest of you. I think I'll try and do some more posts like this in the future if the feedback is favorable and I find some bits of code that could use some more love. :)

11 comments:

  1. Great post. I watched that video and have wanted to do some optimisation. Nice to see a good case study :)

    ReplyDelete
  2. Thanks for the wisdom.

    Curious to know how you got the counters to show up in the profiler (in your screenshots: DOM nodes count, event listeners count etc...)? Did you set some flags in about:flags or are you running an early dev version of chrome? (all I know is that the counters don't show up by default in Chrome 17.0.963)

    ReplyDelete
    Replies
    1. right click on screen in Chrome -> Inspect Element -> Timeline -> Memory -> Record (The round grey dot at the bottom left hand side of the developer console.

      Delete
  3. I really dug this post and love the end result.

    Instead of an array `textureImageCache` could just be a normal object literal because you aren't using any of the features of the array (its methods/length).

    You could avoid creating a new function for each `load` event listener by adding a `handleEvent` method to `TextureImageLoader.prototype` and adding the listener via `this.image.addEventListener("load", this, false)`. The `this` binding of the listener, `handleEvent`, will be the TextureImageLoader instance so no need to pass a `self` variable around :D
    See http://www.w3.org/TR/DOM-Level-2-Events/events.html#Events-EventListener.

    Also, it looks like most modern browsers allow 6 parallel connects per hostname (though 35-60 if you start using subdomains) so your queue could most likely be smaller:
    http://www.browserscope.org/?category=network

    Again, awesome post!

    ReplyDelete
  4. In my own vector code I've usually used this style:

    function vadd(a,b,r){ r = r || {x:0,y:0}; r.x = a.x+b.x; r.y = a.y + b.y; return r; }

    That way I get the convenience and later improve the performance where necessary.

    ReplyDelete
    Replies
    1. Just like gl-matrix (https://github.com/toji/gl-matrix) :)

      FYI there are 2d extensions to gl-matrix on this fork: https://github.com/vilya/gl-matrix

      Delete
  5. Thanks! That might become handy in my terrain streaming experiments. I have bigger garbage producers to fix first though. :)

    ReplyDelete
  6. Great article! Thanks for sharing.

    Just a remark, i think the name pool (instead of cache) would be more appropriate to show the intention of reusing this objects.

    ReplyDelete
  7. I've tried these codes in optimizing images, and though what I did was a whole day trial and error, everything here are useful codes. Thanks for sharing!

    ReplyDelete
  8. thank you for the advice, hopefully this way I can blog faster loading

    ReplyDelete