Tuesday, August 23, 2011

jsStruct: C-style struct reading in Javascript

Protip: I ramble a lot before getting to the link in question. You probably just want to jump straight to the project.

So, in case you haven't picked up on it yet, I tend to work with a lot of binary files in Javascript. This is, to put it kindly, an absolute mess. (I would put it unkindly, but this is a family friendly blog!)

Now, to the credit of the browser makers, binary parsing has certainly gotten a lot better in a very short period of time. When I was doing my Quake 2 and Quake 3 demos, the only way to parse binary was to request your file as a raw string from the server and use String.charCodeAt() to grab the bytes one by one and reconstruct them into the appropriate data types. This meant that parsing a float looked like this:

BinaryFile.prototype.readLong = function() {
    var off = this.offset;
    var buf = this.buffer;
    // Mask each char code down to a single byte
    var b0 = buf.charCodeAt(off) & 0xff;
    var b1 = buf.charCodeAt(off+1) & 0xff;
    var b2 = buf.charCodeAt(off+2) & 0xff;
    var b3 = buf.charCodeAt(off+3) & 0xff;
    this.offset += 4;
    // Reassemble the four bytes as a little-endian 32-bit integer
    var result = (b0 + (b1 << 8) + (b2 << 16) + (b3 << 24));
    return result;
};

// bin_log2 is not defined in this snippet; Math.log(2) is assumed here,
// so that Math.exp(n * bin_log2) equals 2^n.
var bin_log2 = Math.log(2);

// Code "borrowed" from Google's Numbers.java file in the GWT Quake2 port
BinaryFile.prototype.readFloat = function() {
    var i = this.readLong(); // TODO: inline
    var exponent = (i >>> 23) & 255;
    var significand = i & 0x007fffff;
    var result;
    if (exponent == 0) {
        // Denormalized number (or zero)
        result = (Math.exp((-126 - 23) * bin_log2) * significand);
    } else if (exponent == 255) {
        result = significand == 0 ? +Infinity : NaN;
    } else {
        // Normalized number: restore the implicit leading 1 bit
        result = (Math.exp((exponent - 127 - 23) * bin_log2) * (0x00800000 | significand));
    }
    return (i & 0x80000000) == 0 ? result : -result;
};

Yikes! Not only does it look terrifying, it's also slow as dirt.

Fortunately some bright fellow out there (I wish I knew who, I'd love to shake his hand) looked at the ArrayBuffer code that was being built for WebGL's vertex arrays and said "Hey! With just a tiny bit of tweaking we can use this for arbitrary binary manipulation, not just WebGL!" And suddenly the above blob of bit shifting turned into this:

var floatValue = dataView.getFloat32(offset, true);

And there was much rejoicing!
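To make that one-liner a little more concrete, here is a minimal round-trip sketch. The buffer, the offset, and the variable names are made up for illustration. The last argument to setFloat32 and getFloat32 selects little-endian byte order when true.

```javascript
// Hypothetical 8-byte buffer, just for illustration.
var buffer = new ArrayBuffer(8);
var dataView = new DataView(buffer);

// Write a float at byte offset 4 in little-endian order...
dataView.setFloat32(4, 1.5, true);

// ...and read it back with the same offset and byte order.
var floatValue = dataView.getFloat32(4, true);

// Reading the same four bytes as big-endian yields a different number,
// which is why the flag has to match the file format being parsed.
var wrongEndian = dataView.getFloat32(4, false);
```

Quake's files are little-endian, which is why every read in this post passes true.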

So now we have the very nice Typed Arrays specification, and have even gone so far as to allow XHR calls to return ArrayBuffers directly, which is awesome! Binary is a first class citizen in Javascript for the first time!

Despite the massive improvements, however, there is at least one thing left to be desired: C-style struct reading. For those of you not familiar with the concept, in C/C++ you have the ability to "map" a random chunk of binary data onto a struct with nothing more than some pointer fiddling/casting. This also makes reading binary structures from a file incredibly easy and insanely fast. For example, take this structure from the Quake (original) source code:

typedef struct
{
    float mins[3], maxs[3];
    float origin[3];
    int headnode[MAX_MAP_HULLS];
    int visleafs;
    int firstface, numfaces;
} dmodel_t;

If we have a large array of those in a binary lump somewhere (like Quake does), we can interpret them in a single, superfast call like so:

dmodel_t* models = (dmodel_t*)(binaryBufferPtr + modelByteOffset);

models is now a pointer to an array of dmodel_t's! Hooray! (C experts: please forgive the gross oversimplification!)

Now, let's say that you want to read this same structure into your javascript code from binary. With the latest and greatest Typed Array-powered code, that would look something like this:

var view = new DataView(arrayBuffer, modelByteOffset);
var model = {
    mins: [
        view.getFloat32(0, true),
        view.getFloat32(4, true),
        view.getFloat32(8, true)
    ],
    maxs: [
        view.getFloat32(12, true),
        view.getFloat32(16, true),
        view.getFloat32(20, true)
    ],
    origin: [
        view.getFloat32(24, true),
        view.getFloat32(28, true),
        view.getFloat32(32, true)
    ],
    headnode: [ // I'm assuming MAX_MAP_HULLS == 4, that's probably wrong
        view.getInt32(36, true),
        view.getInt32(40, true),
        view.getInt32(44, true),
        view.getInt32(48, true)
    ],
    visleafs: view.getInt32(52, true),
    firstface: view.getInt32(56, true),
    numfaces: view.getInt32(60, true)
};

And, of course, that only reads in a single struct. You need to do that in a for loop if you want to accurately match the original code. (And make sure that the offsets are updated for each loop!)
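For the curious, that loop might look something like the sketch below. It assumes MAX_MAP_HULLS == 4 and no padding, so each dmodel_t is 16 four-byte values (64 bytes) laid out back to back. The readArray and readModels helpers and the DMODEL_SIZE constant are names I made up for this example.

```javascript
// Size of one dmodel_t in bytes, assuming MAX_MAP_HULLS == 4 and no padding:
// (3 + 3 + 3) floats + (4 + 1 + 1 + 1) ints = 16 values * 4 bytes each.
var DMODEL_SIZE = 64;

// Read `count` consecutive four-byte values with the named DataView getter.
function readArray(view, getter, offset, count) {
    var out = [];
    for (var i = 0; i < count; i++) {
        out.push(view[getter](offset + i * 4, true));
    }
    return out;
}

// Hypothetical helper: read `modelCount` structs starting at modelByteOffset.
function readModels(arrayBuffer, modelByteOffset, modelCount) {
    var view = new DataView(arrayBuffer, modelByteOffset);
    var models = [];
    for (var m = 0; m < modelCount; m++) {
        var base = m * DMODEL_SIZE; // start of this struct within the view
        models.push({
            mins: readArray(view, "getFloat32", base, 3),
            maxs: readArray(view, "getFloat32", base + 12, 3),
            origin: readArray(view, "getFloat32", base + 24, 3),
            headnode: readArray(view, "getInt32", base + 36, 4),
            visleafs: view.getInt32(base + 52, true),
            firstface: view.getInt32(base + 56, true),
            numfaces: view.getInt32(base + 60, true)
        });
    }
    return models;
}
```

Every field offset is relative to base, so the only thing that changes per iteration is the stride multiplication.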

Now, realistically that isn't too bad. It's certainly legible enough, and as long as you don't inadvertently goof up a byte offset somewhere it's not too hard to write out either. But it's a far cry from our one-line "parse" in good ol' C.

For me, after writing the 21st variation on the above code in my current experimental project, I got sick of counting bytes and decided that there must be a better way. After a bit of research online I didn't come up with anything too promising, so I decided to do what any good programmer would do and write my own! The result is jsStruct.

jsStruct allows you to declare Javascript objects in a way that mimics C declarations. For example, if we wanted to rebuild our previous example struct, it would look like this:

[EDIT: After valid ordering concerns raised by some-truth-some-guy the syntax has been tweaked]
var dmodel_t = Struct.create(
    Struct.array("mins", Struct.float32(), 3),
    Struct.array("maxs", Struct.float32(), 3),
    Struct.array("origin", Struct.float32(), 3),
    Struct.array("headnode", Struct.int32(), MAX_MAP_HULLS),
    Struct.int32("visleafs"),
    Struct.int32("firstface"),
    Struct.int32("numfaces")
);

Nice and compact! Of course, the cool part is reading, which now looks like this:

var models = dmodel_t.readStructs(arrayBuffer, modelByteOffset, modelCount);

Yay! Back to one line! models will now contain an array of modelCount dmodel_t objects, which will in turn contain all the appropriate data from your binary buffer. Easy as that!

Now for all the appropriate disclaimers: I have only tested this in Chrome/Chromium, so it may need some tweaking on other browsers. I know that Firefox doesn't yet support Typed Arrays fully, so it may be a bit before this works there. Also, there is absolutely no consideration given to older browsers here: you either support Typed Arrays or you don't use this utility. Same goes for ECMAScript 5. I also haven't added struct writing yet, so this only helps you at the moment if you want to read binary files, not create them.

It should also be pointed out that while this gets us closer to the convenience of C-style struct manipulation, it's still going to be far slower. I've tried to make the struct reading pretty efficient: a new "readStructs" function is custom-built when you call Struct.create(), so there will be an on-load performance hit as we create the required code dynamically, but thereafter it should be about as speedy as Javascript can be for this type of operation. At the end of the day, though, we still have to read the values one by one, so we'll never have a prayer of being as fast as a simple pointer assignment.
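To give a feel for what "creating the required code dynamically" can mean, here is a minimal sketch of the general technique. To be clear, this is not jsStruct's actual implementation: createReader, the [name, getter, size] field format, and the point_t example are all made up for illustration. The idea is to compute every byte offset once, bake it into generated source as a constant, and compile that source with new Function.

```javascript
// Hypothetical factory: `fields` is a list of [name, DataView getter, byte size].
function createReader(fields) {
    var body = "return {\n";
    var offset = 0;
    for (var i = 0; i < fields.length; i++) {
        var name = fields[i][0];
        var getter = fields[i][1];
        // Each offset is computed here, once, and written into the source.
        body += "  " + JSON.stringify(name) + ": view." + getter +
                "(base + " + offset + ", true),\n";
        offset += fields[i][2];
    }
    body += "};";

    // Compile the generated source into a function that reads one struct.
    var readOne = new Function("view", "base", body);
    var structSize = offset;

    return {
        byteLength: structSize,
        readStructs: function (arrayBuffer, byteOffset, count) {
            var view = new DataView(arrayBuffer, byteOffset);
            var out = [];
            for (var s = 0; s < count; s++) {
                out.push(readOne(view, s * structSize));
            }
            return out;
        }
    };
}

// Usage with a made-up 12-byte struct.
var point_t = createReader([
    ["id", "getInt32", 4],
    ["x", "getFloat32", 4],
    ["y", "getFloat32", 4]
]);
```

The string building happens only when the struct is defined. After that, each read is a plain function call with no per-field bookkeeping, which is the trade-off described above.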

I honestly have no idea if there's anyone out there other than myself that will find this useful (not too many people are crazy enough to want to muck with binary in Javascript in the first place) but hopefully this will make life a little easier for the next guy that's as crazy as me! :)


  1. Awesome. This is a great idea, and an insanely clean API. Plus it's hidden behind an object so there is the possibility to improve it as time progresses. Sure it's not going to be as fast as pointer math, but who knows, it may get close one day if ECMAScript includes optimized binary handling functions.

    As an aside:
    I notice that you're using strict mode. You may wish to 'freeze' your object and 'prevent extension' and by implication make it a static object which may give it a performance benefit to an advanced JIT. I don't think that V8 optimizes for this sort of thing yet, but one day it more than likely will.


  2. Javascript hash tables are specified as unordered. Are you assuming an ordering to define your deserialization routine? Shouldn't a struct definition be an array of name/type pairs?

  3. That's a good point. I did some tests in that regard, and while the keys may show up in any order when inspecting the object through, say, the console, they always appear to enumerate in the order they were declared. That's the case with the various versions of Chrome I tested with, anyway. It would be interesting to see if any of the other browsers actively re-organized the keys. In any case, if they did, that would certainly break the current methodology.

    Frankly I'm a little hesitant to switch over to an array-based system, if only because the current syntax is so dang clean, but sticking to the letter of the standards is a pretty compelling argument. :( I'll think about it.

  4. Okay, figured out a good compromise on the ordering issue. I'll push it to GitHub when I get home. The new syntax will look like this:

    var simpleStruct = jsStruct.create(

    var complexStruct = jsStruct.create(
        jsStruct.struct("myStruct", simpleStruct),
        jsStruct.string("myString", 4),
        jsStruct.array("myArray", jsStruct.int8, 4),
        jsStruct.array("myStructArray", simpleStruct, 2)
    );

    Not quite as clean as before, but still pretty manageable! This approach will also improve scalability in the future, I think.

  5. Very nice!

    Might want to add a "packed" pragma to change the packing to be on 1-byte, 2-byte, etc, boundaries. (I assume you currently use a default alignment of 4 bytes.)

    Might also want to add a big endian / little endian flag.

    Then, maybe bitfields. :-)

  6. *sigh* Nothing is ever simple, is it? :)

    Little/Big endian would be simple, as the DataView already has that built in. Bitfields would be doable, but interesting. Not sure how I would want to handle that. (I have several ideas)

    I'll have to give the packing pragma idea some thought.

  7. Hey Brandon,

    I worked on something related a year ago, you might find it useful: