Saturday, August 14, 2010

Rendering Quake 3 maps with WebGL: Tech talk

So, I promised I would talk more about the development side of the Quake 3 demo and I'm here to deliver. Warning! This is going to be somewhat long and technical, so steer clear if you're just here for the shiny graphics!


So first off, why Quake 3? After all, CopperLicht has already been there and done that, and I think many people won't recognize this as much of a step up from the Quake 2 demo I already did. Not to mention that along with my Doom 3 model loader this is my 3rd id-related demo in a row, which may seem a bit boring to some people.


For me it mostly comes down to what I wanted to achieve personally. Thus far each of my demos has been the result of a specific goal on my part: My Spore demo was just to familiarize myself with WebGL. The Hellknight model was to toy around with animation and game-oriented file formats. The Quake 2 demo was a (somewhat failed) stab at large scenes that the user could navigate. I had intended next to move on to some special effects (I still want to try Megatexture-style streaming, for example), but the problems with the Quake 2 demo left me itching for some closure. So I redirected my efforts and decided that I wanted to do a project that created a "real world" game environment in WebGL. Something that looked and acted like an actual game, running at framerates that would actually be considered playable, without "dumbing it down" for the browser. Partially for the challenge, and partially to demonstrate that yes, WebGL is (or at least will be when released fully) a viable platform for game development.




Given those criteria, Quake 3 very quickly became a natural fit. For one, I own the game and therefore had the appropriate resources at hand. Two, the formats it uses are well documented and easy to extract. Three, the format is similar enough to the Quake 2 maps that I could hit the ground running with the previous project's code. Finally, Quake 3 is very well known. People know what it looks like, what it FEELS like. If I could capture even some of that feeling and bottle it up in a canvas tag I knew it would catch people's attention. Oh, and Quakecon was coming up and I thought it would be really cool to release a demo right around the same time! :)


So, with that in mind the first and foremost concern I had was speed. Quake is fast. Quake is twitchy. Quake is not a 5 FPS slideshow because we can't convince JavaScript to do any better. I knew that I had to produce something that ran smoothly or the internet at large would dismiss it as "WebGL just isn't fast enough". This performance-oriented mentality led to some interesting optimizations, some of which are a bit counter-intuitive at first glance.


The biggest thing I did was to try and reduce the number of draw calls as much as possible. To this end I packed all lightmaps into a single texture so that we could avoid switching textures where possible. I also pre-calculated all curved surfaces in the level (using a fixed number of segments per patch) and stored them as static geometry. All geometry in the level is stored in a single Vertex/Index buffer pair, so I can bind the VBO once at the start of the draw routine and never touch it again. Additionally the indexes were pre-sorted by shader. This allows me to draw all geometry that shares a single shader with one call, and subsequently means that I never make more than one draw call per shader stage.
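To make the pre-sorting step concrete, here is a minimal sketch of grouping face indices by shader so that each shader ends up owning one contiguous range of a shared index buffer. The face layout (`shader`, `indices`) and the name `buildSortedIndices` are assumptions made for illustration; they are not taken from the demo's source.

```javascript
// Hypothetical face records: each face names its shader and lists indices
// into the one shared vertex buffer.
function buildSortedIndices(faces) {
  // Bucket indices per shader, keeping shaders in first-seen order.
  const buckets = new Map();
  for (const face of faces) {
    if (!buckets.has(face.shader)) buckets.set(face.shader, []);
    const bucket = buckets.get(face.shader);
    for (const index of face.indices) bucket.push(index);
  }

  // Concatenate the buckets and record where each shader's range starts.
  const indices = [];
  const ranges = [];
  for (const [shader, bucket] of buckets) {
    ranges.push({ shader: shader, offset: indices.length, count: bucket.length });
    for (const index of bucket) indices.push(index);
  }
  return { indices: new Uint16Array(indices), ranges: ranges };
}

const sorted = buildSortedIndices([
  { shader: "wall", indices: [0, 1, 2] },
  { shader: "lava", indices: [3, 4, 5] },
  { shader: "wall", indices: [2, 1, 6] },
]);
```

With a layout like this, each entry in `ranges` can be drawn with a single `gl.drawElements(gl.TRIANGLES, range.count, gl.UNSIGNED_SHORT, range.offset * 2)` call, where the `* 2` converts an index position into a byte offset for 16-bit indices.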


Along those lines it may surprise some people to know that I am doing absolutely no geometry culling in the demo. The entire map, visible or not, is rendered brute force every frame. This probably sounds like a bad idea but the fascinating part is that I was never able to implement a culling scheme that actually improved performance! Brute force rendering was always faster. And when you think about it that makes a lot of sense.


When a WebGL draw call is made the actual rendering of geometry happens for all intents and purposes at native speeds, since most of it is simply spinning around on the GPU anyway. The only thing that really slows it down at all is jumping back and forth to JavaScript, which is many orders of magnitude slower than your GPU and always will be. Thus the more processing we do in JavaScript the slower we will go, and the more we can push off to the GPU the better. As such, in the time that it would take us to (in script) figure out where our camera is, trace through the BSP tree, find the potentially visible node set, test all the node geometry for visibility, flag it for rendering, and then actually do the rendering, your graphics card could probably have rendered all of that unseen geometry 10 times and not have broken a sweat. So it makes a lot of sense (in this case) to just let the card do its thing and not worry about whether or not you can see that particular triangle.


Now, that's probably not going to hold true for all WebGL apps, so there are a couple of pointers to keep in mind if you are trying to implement visibility culling for your project:
  • Don't break your geometry up for the sake of culling! Unless you are dealing with many millions of polygons you will be very hard pressed to gain any benefits from doing so. If you do cull, do so based on state: If you can determine that no geometry using a certain shader will be visible then skip that shader entirely, but otherwise try to draw all geometry using a specific shader in one call.
  • Only calculate visibility when you must. For example, when I was still trying to do some culling I would only recalculate the visible shaders if I detected that I had changed nodes in the BSP tree. Otherwise there was no chance that the visible set would have changed, and recalculating it per-frame would just be wasted cycles.
  • If possible, offload your visibility calculations to a web worker. Anything that can avoid blocking the UI (render) thread is a good thing, and most people will be pretty forgiving if a small chunk of a wall isn't visible for a fraction of a second after turning a corner.
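The second pointer (only recalculating visibility when the camera changes nodes) can be sketched roughly as follows. Everything here is hypothetical: `computeVisibleShaders` stands in for whatever expensive BSP walk an app would do, and `leafIndex` is assumed to identify the node the camera currently sits in.

```javascript
// Hypothetical cache: recompute the visible shader set only when the camera
// moves into a different node, and reuse the previous result otherwise.
function createVisibilityCache(computeVisibleShaders) {
  let lastLeaf = -1;
  let visible = null;
  return function visibleShadersFor(leafIndex) {
    if (leafIndex !== lastLeaf) {
      visible = computeVisibleShaders(leafIndex);
      lastLeaf = leafIndex;
    }
    return visible;
  };
}

// Example with a made-up visibility table and a counter for recalculations.
let recalcs = 0;
const fakeVisibility = { 0: ["wall", "floor"], 1: ["wall", "lava"] };
const visibleShadersFor = createVisibilityCache(function (leaf) {
  recalcs++;
  return new Set(fakeVisibility[leaf]);
});

visibleShadersFor(0); // first call: computes
visibleShadersFor(0); // same node: reuses the cached set
visibleShadersFor(1); // node changed: recomputes
```

A render loop would call `visibleShadersFor` once per frame and skip any shader range whose name is not in the returned set.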
Speaking of web workers, it's worth noting that all of the level loading and pre-processing takes place in a worker thread in the demo. This allows the page to remain responsive while the map loads, which is a problem that plagues a lot of WebGL demos. It does take a bit more planning to break up the work in this way, but in the end it makes a big difference in how your app will be perceived. There are parts of this that could be improved with better browser support (For example, allowing typed arrays in a web worker would be beautiful) but even without those niceties the performance is noticeably better. In fact I would like to move more logic to worker threads, such as the collision detection, but just didn't have the time.
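As a rough illustration of that split, and assuming (as described above) that typed arrays cannot be created inside the worker, the worker can send plain arrays and the render thread can wrap them just before upload. The message shape and the function names below are invented for this sketch.

```javascript
// Worker side (hypothetical): describe the compiled geometry with plain
// arrays, which can be passed through postMessage.
function buildGeometryMessage(vertices, indices) {
  return { type: "geometry", vertices: vertices, indices: indices };
}

// Render-thread side: wrap the plain arrays in typed arrays once, right
// before they would be handed to gl.bufferData.
function receiveGeometry(msg) {
  if (msg.type !== "geometry") return null;
  return {
    vertices: new Float32Array(msg.vertices),
    indices: new Uint16Array(msg.indices),
  };
}

// In a browser the two halves would be joined roughly like this:
//   worker: postMessage(buildGeometryMessage(verts, idx));
//   page:   worker.onmessage = function (e) { upload(receiveGeometry(e.data)); };
// (`upload` is a placeholder.) Here they are simply called directly.
const buffers = receiveGeometry(buildGeometryMessage([0, 0, 0, 1, 0, 0], [0, 1]));
```

The conversion cost lands on the render thread, but it is a single pass per buffer rather than the whole parsing and pre-processing job.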


Oh, and on the subject of collision detection: there really isn't much to say. A lot of my code is just a tweaked version of the Quake 3 movement code (it's about the only part of the demo that could legitimately be called a "port") because there's not much more that could be done in the way of optimization. Don't fix what isn't broken, I guess. The quirks of web-based controls cause the movement to be a bit more unstable than I would like, but overall it gets the job done.


The other big thing I wanted to talk about was shaders, and it's probably the part that will be the most interesting to the graphically minded amongst you. When Quake 3 was launched the cutting edge cards of the day were all fixed function with, if you were lucky, cool new features like multitexture and register combiners! *Oooooh!* The Quake 3 engine (Or, idTech 3 I guess) was built with this in mind and features a "shader" system that was targeted at the capabilities of the day. Consisting of a plain text set of instructions about things like blending functions, texture coordinate manipulations, and multiple render stages it's a primitive system by today's standards but is amazingly robust in the effects that it can produce.


It's also a reasonably complicated format, with a lot of keywords to track and a bunch of variables that can look fairly obscure until you go digging through the source to find out what they mean. For that reason almost every Quake 3 loader I've seen outside of some commercial ones simply ignores the shaders altogether (including CopperLicht, BTW). And since most of the time the shader names also correspond to a texture name this works pretty well and you can get about 80-90% of your level to look right. That last 10-20% stands out pretty badly, though, and since I was determined to capture that Quake feeling as I mentioned before I couldn't avoid them. In the end I'm very glad I didn't because it makes a huge difference on how people perceive the level as they walk through it.
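For anyone who hasn't seen one, a shader script has the general shape shown in `sampleScript` below: a name, a block of shader-wide directives, and one nested block per render stage. The parser is a deliberately small sketch and not the demo's loader. It assumes braces sit on their own lines, keeps only the last occurrence of a repeated keyword, and the shader name and values in the sample are made up.

```javascript
// Illustrative shader text; the name and values are invented.
const sampleScript = [
  "textures/example/glow_wall",
  "{",
  "    cull none",
  "    {",
  "        map textures/example/glow_wall.tga",
  "        rgbGen identity",
  "    }",
  "    {",
  "        map textures/example/glow.tga",
  "        blendFunc GL_ONE GL_ONE",
  "        tcMod scroll 0.1 0   // slide the glow layer sideways",
  "    }",
  "}",
].join("\n");

function parseShaders(text) {
  const shaders = [];
  let shader = null; // shader currently being read
  let stage = null;  // stage currently being read, if any
  let depth = 0;     // 0 = between shaders, 1 = in shader, 2 = in stage
  for (const raw of text.split("\n")) {
    const line = raw.replace(/\/\/.*$/, "").trim(); // strip comments
    if (line === "") continue;
    if (line === "{") {
      depth++;
      if (depth === 2) {
        stage = {};
        shader.stages.push(stage);
      }
    } else if (line === "}") {
      if (depth === 2) stage = null;
      depth--;
    } else if (depth === 0) {
      shader = { name: line, directives: {}, stages: [] };
      shaders.push(shader);
    } else {
      // Keyword followed by its arguments; keys are lowercased so lookups
      // don't depend on how a file capitalizes them.
      const parts = line.split(/\s+/);
      const target = depth === 2 ? stage : shader.directives;
      target[parts[0].toLowerCase()] = parts.slice(1);
    }
  }
  return shaders;
}

const parsed = parseShaders(sampleScript);
```

Here `parsed[0].stages[1].blendfunc` holds `["GL_ONE", "GL_ONE"]`, which is the kind of value that later has to be turned into GL blend state.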


Initially I was reading in the shaders and translating them into a series of state and uniform changes that were fed into a single global GL shader. This worked for a while, but as the number of shader keywords I was processing increased and I encountered more and more special cases it quickly became apparent that that approach was going to become unworkable. In the end I switched the whole system over so that each stage is now compiled into a custom GL shader that handles all of the texcoord transforms, color modulation, and other effects that don't involve setting GL state. This gives me a couple of benefits in that the shaders are now very flexible and the state I have to bind for each stage is pretty minimal. The code took a bit of tweaking to get right, but I'm very happy with how it ended up. The code to do this is in http://media.tojicode.com/q3bsp/q3shader.js which is used by the worker thread, and http://media.tojicode.com/q3bsp/glq3shader.js which is used by the render thread.
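The idea of compiling a stage into its own GL shader can be sketched as plain string generation. This is not the code from glq3shader.js; the stage layout (`tcMods` with `type`, `s`, `t`), the attribute and uniform names, and the two handled transform types are all assumptions chosen to keep the example short.

```javascript
// Build GLSL vertex shader source for one stage. Only "scroll" and "scale"
// texcoord transforms are handled in this sketch; anything else is skipped.
function buildVertexShader(stage) {
  let body = "  vec2 tc = texCoord;\n";
  for (const mod of stage.tcMods || []) {
    // toFixed keeps a decimal point so the values are valid GLSL floats.
    const s = mod.s.toFixed(4);
    const t = mod.t.toFixed(4);
    if (mod.type === "scroll") {
      body += "  tc += vec2(" + s + ", " + t + ") * time;\n";
    } else if (mod.type === "scale") {
      body += "  tc *= vec2(" + s + ", " + t + ");\n";
    }
  }
  return [
    "attribute vec3 position;",
    "attribute vec2 texCoord;",
    "uniform mat4 modelViewProjection;",
    "uniform float time;",
    "varying vec2 vTexCoord;",
    "void main() {",
    body + "  vTexCoord = tc;",
    "  gl_Position = modelViewProjection * vec4(position, 1.0);",
    "}",
  ].join("\n");
}

const vertexSource = buildVertexShader({
  tcMods: [
    { type: "scale", s: 2, t: 2 },
    { type: "scroll", s: 0.1, t: 0 },
  ],
});
```

The resulting string would be passed to `gl.shaderSource` and `gl.compileShader`, leaving only `time` and the matrix to be set as uniforms per frame. Baking the constants into the source is what keeps the per-stage state small, at the cost of one program per distinct stage.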


Now if anything I do feel that this is the biggest area for potential improvement in the demo. Right now I'm creating a new GL shader program for each stage of each Quake 3 shader I process. This could probably be reduced quite a bit if I were: A) checking for duplicate stages and re-using the same shader and B) checking to see if multiple stages could be collapsed into one multitextured stage. There's also the fact that although at this point I am able to render most of the game's shaders correctly there are still some parts of the shader format that I'm not handling, and a couple of parts that I could be handling better. For example: I'm not processing alphaGen lightingSpecular properly at all, and I'm not doing anything to handle volumetric fog (which is probably the demo level's biggest omission.) But while there's room to grow I think the basic framework laid down is pretty solid, and I'm happy with it.
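Improvement A could look something like the sketch below: key each compiled program by its generated source so that identical stages share one program. The `compile` callback is a stand-in for the real compile-and-link step, and the names are hypothetical.

```javascript
// Cache compiled programs by their source text so duplicate stages reuse
// the same program instead of compiling again.
function createProgramCache(compile) {
  const cache = new Map();
  return function getProgram(vertexSrc, fragmentSrc) {
    const key = vertexSrc + "\n//----\n" + fragmentSrc;
    let program = cache.get(key);
    if (program === undefined) {
      program = compile(vertexSrc, fragmentSrc);
      cache.set(key, program);
    }
    return program;
  };
}

// Example with a fake compiler that just counts how often it is called.
let compiles = 0;
const getProgram = createProgramCache(function (vs, fs) {
  compiles++;
  return { id: compiles };
});

const programA = getProgram("vsA", "fsA");
const programB = getProgram("vsA", "fsA"); // same sources: reused
const programC = getProgram("vsA", "fsB"); // different fragment source: compiled
```

This only works when per-stage constants are the same in both stages; stages that differ only in a constant would need that constant moved into a uniform before they could share a program.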

The last thing I wanted to touch on is that I did make some changes to the original resource files in order to make the demo work online. It was my goal to alter as little as possible about the game files in creating the demo, and I feel like I largely succeeded in this area. The few concessions I did make to the browser are pretty small in the end:
  • All texture files (JPG and TGA) were converted to PNGs. Obviously this worked out well for the TGAs (which are basically used any time the artists wanted an alpha channel), but in the case of the JPGs it actually increased the filesizes, which wasn't exactly the happiest thing. Considering that WebGL will use a JPG image as a texture just fine this may seem like an odd choice, but it was primarily done to avoid multiple hits against the server. The problem is that Quake never specifies the format of the image it's trying to load. It either has no extension or a "default" extension of TGA, and internally the engine will attempt to open a TGA version and if that fails it will fall back to a JPG. That's all well and good, but that "fail and fall back" process gets a whole lot longer and uglier when the texture file is sitting on a server at the other end of the country somewhere. It certainly would be possible with a few server hacks to allow the engine to request a TGA and receive a JPG back, which may in the end be a better solution. I wanted to have my demo code be 100% client side, however, so unifying the texture formats was the better plan.
  • Some textures were resized to ensure their dimensions were powers of 2.
  • For any surfaces that have some sort of special effect the engine parses and pulls the effect definition from one of several shader files, each of which describes multiple shaders. The map files don't tell you which shader file the one they're looking for is in, though, and the game simply loads all of them into memory on startup. Once again, this isn't exactly optimal when talking about a web environment so when I was ready to post the demo I hunted down all of the shaders the map I chose used and compiled them by hand into a single shader file. The shaders themselves were not changed, and it's worth noting that the demo code has the ability to load and run with multiple shader files (if you look at the source you can see the list of shaders I used to develop locally commented out). In this case it's just much more efficient to have the user download a single file which they will use all of instead of multiple files that they may use part or none of.
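The first two concessions boil down to a couple of small helpers, sketched here under assumptions: `textureUrl` maps whatever name a map or shader file uses onto the one PNG that exists on the server (so there is exactly one request and no fallback), and the power-of-two helpers show the check a conversion script might apply. The function names, the `baseUrl` parameter, and the example paths are all hypothetical.

```javascript
// Map a texture name, with or without a .tga/.jpg extension, to a single
// PNG URL under a hypothetical asset root.
function textureUrl(name, baseUrl) {
  const stem = name.replace(/\.(tga|jpe?g)$/i, "");
  return baseUrl + "/" + stem + ".png";
}

// True when n is a positive power of two (1, 2, 4, 8, ...).
function isPowerOfTwo(n) {
  return n > 0 && (n & (n - 1)) === 0;
}

// Smallest power of two that is >= n, useful as a resize target.
function nextPowerOfTwo(n) {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

const withExtension = textureUrl("textures/example/metal.tga", "assets");
const withoutExtension = textureUrl("textures/example/clouds", "assets");
const resizeTarget = isPowerOfTwo(192) ? 192 : nextPowerOfTwo(192);
```

Rounding up rather than down is just one choice; a real conversion pass might prefer the nearest power of two to limit file size.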
And that's about it! Everything else is pulled straight from the PAK files. That means that aside from some minor image conversions (which many programs can do in batches) everything can be dropped in from presumably any Quake 3 level and it will just "go".


So that's my brain dump on the subject, hopefully someone out there finds it at least mildly interesting! This demo has been another great learning experience for me, and has given me some new subjects that I want to investigate in the future. Hopefully the next demo won't take as long as this one did!



UPDATE: It's been almost two years since this was posted, and Daniel P. in the comments below wanted to know how I would improve on the demo today. You can read all about that right here.

6 comments:

  1. Very interesting, thanks. One particular thing I'm snatching from this and scurrying away with is sharing a coordinate VBO among multiple index VBOs.

    A big bottleneck I've found is the actual creation of textures in WebGL. Things get lumpy when a big bunch of JPEGS asynchronously arrive at once and get created together- if only they could be created in a separate thread from the renderer! Do you have any thoughts on that at all?

    cheers,
    Lindsay

  2. Very nice demo and very interesting read. My suggestion for a next project would be WebGL loading and displaying Doom 3 maps, with shadows, normal mapping and stuff. That might be too difficult for today's WebGL runtimes, but I'd find that very interesting :)
    Cheers, Peter

  3. On the subject of textures, I am not doing anything for that in particular but I did have a similar problem with my shaders. The worker thread would finish building them and send them all in one big lump to the render thread, which then stalled for a few seconds while it compiled them all in a tight for loop. My solution was to use setInterval to effectively offload my loop to the message pump, compiling one shader per interval event. (You can see this code in q3bsp.js in the q3bsp.prototype.bindShaders function around line 283) This does slow down the arrival of the compiled shader a bit, but it does allow the app to continue rendering smoothly and if you have a good default material in place (I'm just using a basic checkered pattern) it doesn't cause too much of an issue for the user.

    In regards to the Doom 3 maps, I've actually tried it already (before I did my quake 2 demo) and decided that it wasn't a very appropriate format for the web. Doom 3 maps are broken into multiple files (.map, .proc, materials, etc) all of which contain some little pieces of the final map and usually a lot of editor specific stuff on top of that. You end up reading a lot of information that you don't need, waiting on multiple files before you can show anything, and generally getting annoyed at the format. :)

    Actually, if anything I would love to try loading maps from Valve's source engine next (FYI: They're another variant of the BSP format), and the only thing holding me back from that is that a single Team Fortress 2 map is about 65 megs in size! And that's not including textures or other external resources! The resources for this demo are about 12 megs total, and that works pretty well but I can't imagine trying to load something 6-7 times that size through a browser! Maybe someday, but not right now...

  4. 65 megs in size!

    For "web games" to really blur the lines with local games they're going to have to pre-load and cache game assets over the long term.

  5. It's been almost 2 years since you did this. My question is: how much could you improve the rendering with today's knowledge?

    Replies
    1. That's a great question, and I think it deserves more than just a quick comment on a two year old post! I'll answer it as a blog post soon.
