Sunday, July 4, 2010

WebGL's greatest challenge as a gaming platform

[UPDATE: I noticed this post still gets a decent amount of traffic, so I figured it's worth pointing out that it's hopelessly out of date at this point. Microsoft supports WebGL now (mostly), JS is faster all the time and we have asm.js now to boot, and browsers have largely embraced fullscreen, pointer lock, gamepad, and camera APIs. So, yeah... don't reference this post for anything. Ever.

Post is preserved here simply for the sake of mocking my (lack of) predictive powers.] 

This may come as a surprise to some of you, but I'm rather fond of WebGL. :) And if you've been following some of my demos you'll probably notice that they tend to have a somewhat game-oriented tilt to them. This is fairly natural for me, since I'm both a gamer and a programmer. It's two passions in one sweet package, kinda like a Reese's Peanut Butter Cup.

Anyway, I'm a firm believer that as it matures WebGL will play a big part in game development, just as the web has already altered the game industry. I doubt that we'll ever reach a point that browser based games take over entirely, but they're going to keep getting bigger and better and the lines between desktop and browser games are going to blur. Some people will embrace this, some will fight it, and if nothing else it will be an interesting ride for everyone involved.

There are some key limitations, however, that need to be overcome before WebGL (and the browser as a whole) can truly be considered a full-fledged, general purpose gaming platform. And those issues are going to be hard to solve. Incredibly hard, for both technical and theological reasons. So hard, in fact, I'm not sure that the minds behind the web today are able to fix them, or even that they want to. In fact, I'm not convinced that they should.

Before we jump into that, though, I want to talk about some perceived "issues" that I've seen cropping up frequently around the web. There are a myriad of reasons people don't think WebGL will work out, and most of them are either very shortsighted or downright wrong.

Please keep in mind that I'm talking mostly about 3D gaming here, since I think just about any other application of WebGL (e.g. medical imaging, mapping, architecture, education, data visualization, etc.) will succeed practically without trying. Gaming stands alone in how demanding it can be on the average user's system, and thus is concerned about factors that many other applications aren't. That being the case it's easy to see why people might have their doubts, but in my opinion the following problems aren't going to be very problematic in the long run:

It seems that any time people start talking up WebGL there's at least one guy in the crowd that has to throw in: "Yeah, that's cool and all. But Microsoft is going to do their own thing and everyone will just use that, so WebGL won't really matter." This is so painfully narrow minded that it physically hurts me to hear otherwise intelligent people say it.

Let's look at the known facts on this one: Microsoft has yet to announce support of any browser-based 3D API for IE9, WebGL or otherwise. And... that's about it! They've been pretty tight lipped on the subject. What we do know is that Microsoft is pushing very hard for a standards based, HTML5 oriented browser with their next revision. We know that they will be including items like Canvas, Video, and Audio tags, vastly improved CSS support, and a host of other happy things that have nothing to do with promoting a closed Microsoft ecosystem. Given that, it would actually seem quite out of place for them to about-face and try pushing a Windows-specific "DirectWeb". If anything my impression is that they're waiting for WebGL to mature and prove itself a bit more before announcing support (since WebGL is still a very young and experimental tech.)

But let's look at the different scenarios and their likely fallout anyway, just for kicks:
  • Microsoft supports WebGL - Yay! And there was much rejoicing!
  • Microsoft doesn't support a 3D API at all - In this case someone will probably develop a plugin that allows IE to run WebGL content, or those individuals that want to use it will install Chrome Frame and be done with it. This still leaves WebGL as the de facto standard, though, so there's no reason not to code to it.
  • Microsoft rejects WebGL and pushes a proprietary API - This is where things get more interesting. So now we have ~45% of the web that can use WebGL and ~55% of the web that uses some other API. For starters, those are large enough numbers on both sides of the fence that no serious commercial venture would dare to ignore either of them, and you would probably see a lot of sites that can use both APIs depending on the browser, or simply use a wrapper layer or engine to hide the difference away. But wait! There's more! You see, those numbers aren't entirely accurate since they represent browsers as a whole, not browsers with 3D support. So let's look a bit closer:

    (FYI: I'm looking at Wikipedia's entry for my numbers. Percentages can vary wildly depending on which stat counter you ask, but this one seems like it hits a good middle ground)

    According to these stats, IE is used by a little over 50% of the market, but it's the breakdown of browser versions that we care about. IE6 weighs in at about 30%, IE7 around 21%, and the remainder is IE8, which means only about 50% of IE users are working with the latest version. Now when IE9 is introduced we can't expect every IE user to jump ship and start using it. I would imagine that most of those still on IE6 are there by company mandate, not by choice, so that number will probably shift very little. People on IE7 obviously aren't in much of a hurry to upgrade either, so we can probably assume that number will fall pretty slowly too. IE8 users are going to be the most likely to upgrade, so best case scenario is that a little over 50% of all IE users make the move. But some of them won't be able to, since IE9 will be Vista/7 only. Digging for some stats there shows that only about 26% of Windows machines reported are running Vista or higher. But it's probably fair to say that people running these OSes are more of your early adopter types and are probably more likely to upgrade too, so that can skew the number of IE9 users a bit higher. Let's just be generous and presume that within a few months of release about a third of all IE users will be running IE9, OK? Great.

    So now we're looking at 33% of 55% of the browser market share (~18%) that are using a browser with Microsoft's proprietary solution. And that's not even really taking into account mobile devices, which are becoming an ever more important part of the web ecosystem. Now, ask yourself: how much traction is that API really going to get? Thing is, I bet Microsoft has run the same numbers (probably far more accurately than I have here) and can see the same thing. I doubt they want to waste resources fighting that battle.
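    For anyone who wants to check the back-of-the-napkin math, here is a minimal TypeScript sketch of it. The two inputs are just the rough estimates quoted above (not measured data), and the function and variable names are made up for illustration.

```typescript
// Both inputs are the post's own rough estimates, hard-coded as assumptions.
const ieShareOfAllBrowsers = 0.55; // "~55% of the web"
const ie9ShareOfIeUsers = 0.33;    // "about a third of all IE users"

/** Fraction of all browsers running one particular version of one browser. */
function reachOfVersion(browserShare: number, versionShareWithinBrowser: number): number {
  return browserShare * versionShareWithinBrowser;
}

const proprietaryApiReach = reachOfVersion(ieShareOfAllBrowsers, ie9ShareOfIeUsers);

// Rounded to a whole percentage for comparison with the "~18%" figure.
const proprietaryApiReachPercent = Math.round(proprietaryApiReach * 100);
```

    Swap in numbers from a different stat counter and the conclusion moves a little, but the proprietary slice stays a minority as long as IE9 adoption is only a fraction of IE's total share.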

So when it comes right down to it, Microsoft can do whatever they want in regards to 3D online and I don't think it will change the situation significantly. WebGL is here to stay.

Javascript Performance
This is an argument that has a little more weight to it because it's undeniable that Javascript is slow. Yes, sorry Chrome, but even with your fancy V8 engine JS performance pales in comparison to native code. Asking an industry that is impossibly processing power hungry to accept huge performance hits on top-of-the-line hardware is a rough pill to swallow. There are a few points to keep in mind in this area though:
  • We have proven time and time again that we don't need massive processors to create compelling experiences. Take the Nintendo DS, PSP, or iPhone for example. All are great pieces of hardware, but they all fall far short of a 360 or PS3 hardware-wise, and are miles behind a high end PC. Yet gaming is alive and well on these devices. Surely if the game industry can produce AAA quality titles for these devices we can work with some less-than-blazing Javascript performance.
  • Web workers will go a long way towards allowing us to intelligently split up the workload. The trend towards multiprocessor development is well underway, and web workers have some of the cleanest, safest implementations of threading I've seen to date. This isn't anything unique to the web either, and we should be able to harness that power just as easily as anyone else. Probably better, actually, since your average web developer is forced to think asynchronously from day 1.
  • There are some very large companies out there that stand to benefit greatly from making Javascript faster, and these same companies have the time, resources, talent, and motivation to make it happen. There's no question that performance will continue to get better as time goes on, and I'd be willing to bet that Moore's law will be in full effect here.

So yes, Javascript is slow, but it's continually getting faster, and we know we can work within those limits anyway. It's not going to stop anyone. Finally, WebGL does nothing to diminish the processing speed of your GPU. Once that draw call is made you're basically running at native speeds, and there are some positively jaw dropping things that can be done in shaders these days. So, is Crysis 3 going to be browser based? No. But I'm certain that we'll be amazed at what will be done with this tech, processing limitations and all.

Even ignoring Javascript, WebGL itself is slow!
This tends to be true right now, but not for the reasons many people would expect. In a traditional desktop 3D app there tend to be two places that things get bottlenecked: you're either CPU limited, where the CPU is constantly running at 100% and the GPU is sitting around waiting for work to do, or you are GPU limited, which is just the opposite. There are other things that can act as the limiting factor, but those are the big ones.

WebGL is similar, but with a few twists thrown in. For one, it's pretty difficult to become GPU limited in WebGL in a real-world scenario, since Javascript typically isn't capable of generating draw calls fast enough to overwhelm the GPU. It certainly is possible to become CPU limited if your script is too complex, and this may be the cause for the slowdown on some demos. We have a third bottleneck point here that's not present in your typical apps, though, and that's what I'll call "compositing limited".

In a web page all of the visual elements (text, divs, images, etc) need to be composited together to form the final page you see. It's kinda like working with layers in Photoshop. Some plugins like Flash can subvert this, since they're explicitly opaque (you can't see any of the page underneath them), but WebGL elements can mesh completely with other page elements, creating some really cool effects. There's a great demo highlighting this effect here. It's (in my opinion) one of the most powerful aspects of WebGL. That means, though, that a WebGL element has to synchronize its display with the rest of the page, which can lead to some pretty severe limitations on display rates, especially for larger WebGL canvases.

The practical effect of this is that scenes which would otherwise be running at insanely high framerates are limited to 30-40 fps, which is not exactly awe inspiring performance. But it's not the Javascript or the rendering that's slowing it down, it's the process of embedding it in the page.
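To make the three bottlenecks concrete, here is a toy TypeScript model. It assumes the CPU, GPU, and compositing stages overlap perfectly so that the slowest one sets the frame rate, which is a simplification, and the millisecond costs are invented purely for illustration.

```typescript
type Bottleneck = "cpu" | "gpu" | "compositing";

interface FrameCosts {
  cpuMs: number;         // script time per frame
  gpuMs: number;         // rendering time per frame
  compositingMs: number; // time to merge the canvas with the rest of the page
}

/** Names the stage that takes longest and therefore limits the frame rate. */
function slowestStage(costs: FrameCosts): Bottleneck {
  if (costs.compositingMs >= costs.cpuMs && costs.compositingMs >= costs.gpuMs) {
    return "compositing";
  }
  return costs.cpuMs >= costs.gpuMs ? "cpu" : "gpu";
}

/** Frames per second under the "slowest stage wins" assumption. */
function framesPerSecond(costs: FrameCosts): number {
  const frameMs = Math.max(costs.cpuMs, costs.gpuMs, costs.compositingMs);
  return 1000 / frameMs;
}

// Hypothetical scene: cheap script, cheap rendering, expensive page compositing.
const compositedScene: FrameCosts = { cpuMs: 4, gpuMs: 2, compositingMs: 30 };
```

With those made-up numbers the scene could render at hundreds of frames per second on its own, yet the compositing cost holds it in the low 30s, which is the pattern described above.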

The fortunate part here is that there are individuals at Google and Mozilla right now working on this exact problem, and I have a lot of faith that they will solve it. Please keep in mind that WebGL is still in its infancy and there are still some bugs to be worked out. This is one of them, and I fully expect that it will disappear in the future, especially as browsers move to hardware accelerated compositing.

Intel's OpenGL drivers suck
Yes. Yes they do. Actually, many Intel chips can't even run GL2.0 level shaders so it's a moot point. For the rest of them: ANGLE.

Okay, so if none of the above items are the problem, what is? The full answer is somewhat complex and multi-faceted, but it really comes down to one overarching theme:

Browsers don't want to ever take control away from the user

But that's a good thing, right? I mean, I WANT to be in control of my browsing experience. Well... yes and no.

Consider this: When was the last time you were playing a game that didn't allow you to directly control your mouse cursor? This is standard fare for pretty much any FPS or third-person action game. Moving the mouse is typically thought of as "looking around" or "aiming". In most of these cases you never even have a cursor, just a little permanently centered aiming reticle. Now consider your browser. How would you replicate that same type of control scheme? Short answer: You can't. (Not without some sort of plugin, but this is HTML 5, remember. We're trying to escape plugins, not require more of them.) Your browser simply doesn't have a built-in mechanism for hiding the mouse or restricting its movements. So now you've got a cursor flying around the screen while you play your in-browser FPS, which means that if you fire your gun at the wrong time you've just closed your window, opened another app, deleted some program or shortcut... Bad Things© all around. In order to prevent this kind of inadvertent clicking the web app would have to take control of the mouse cursor away from the user for a little while, probably hiding it altogether until the user gives some sort of indication (like pressing escape) that they want it back, at which point they need to trust that the web app will actually respond to that request.
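For reference, the "mouse as aiming" control scheme boils down to something like the TypeScript sketch below. It assumes the game is handed relative mouse deltas while the cursor is hidden, which is exactly the capability the browser lacks as described here. The names and the sensitivity parameter are hypothetical.

```typescript
interface ViewAngles {
  yaw: number;   // radians, left/right
  pitch: number; // radians, up/down
}

// Looking straight up or straight down is the limit, so pitch is clamped.
const MAX_PITCH = Math.PI / 2;

/**
 * Turns a relative mouse movement into a new view direction.
 * deltaX and deltaY are how far the mouse moved since the last frame,
 * not an absolute cursor position.
 */
function applyMouseLook(
  view: ViewAngles,
  deltaX: number,
  deltaY: number,
  sensitivity: number
): ViewAngles {
  const yaw = view.yaw + deltaX * sensitivity;
  const unclampedPitch = view.pitch - deltaY * sensitivity;
  const pitch = Math.max(-MAX_PITCH, Math.min(MAX_PITCH, unclampedPitch));
  return { yaw, pitch };
}
```

Because only deltas matter, the cursor itself has no useful position to show, which is why native games hide it and why a visible, free-roaming cursor is such a liability.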

So now we should ask the big question: Should the browser allow for a page to take that kind of control of your system? Probably not, because anything that your game can do will also be available to any other website for any other purpose. That means every ad, scam, and phishing site would also have the ability to hijack your cursor at any time without your permission, and that's also a Bad Thing©. After all, would you trust a phishing site to give you back your mouse cursor when you asked it to? More to the point, do you want it to take it away in the first place?

And there we have our paradox. In order to allow for certain types of applications we have to implicitly take control away from the user, possibly to their detriment. But by taking away that control we lose a piece of what makes the web such an attractive thing in the first place. In a space as volatile and potentially malicious as the internet it is a perfectly reasonable choice on the part of the browser to say "Sorry, but I favor the user". But that puts some fairly severe constraints on what kind of games we can reasonably create inside a browser. And it's not just mouse control either, there's a myriad of different limitations like this. For example, while developing my Quake 2 demo I frequently and unconsciously found myself trying to "crouch" (move down) and forward at the same time (Ctrl+W). This was... frustrating to say the least. We also don't have a true fullscreen mode, or access to gamepads, or the ability to use a webcam or microphone... the list goes on. All of these things seem perfectly reasonable in the context of a game and absolutely nightmarish when you consider giving that same capability to an online advertisement. Thus the biggest challenge WebGL faces as a gaming platform is getting the browser to let go enough that we can craft the experiences we want while still giving the user a safe environment to play in. No small feat.

So what can we actually do about it? Well, I would imagine that the first round of WebGL games that we see will either avoid these issues by design (a Diablo clone would work great, for instance) or will use a mixture of HTML 5 and plugins like Flash to achieve the desired effects. Sure, that breaks the "pure HTML" model that we all claim to love, but when it comes to creating a product most companies couldn't care less about such idealism and will just go with what works. After that... who knows! Maybe there is a way that we can allow all of these things and still keep the user in control. If there is I certainly haven't thought of it, but I'm sure there are better minds than mine working on this and other problems right now. Until that time we're going to have to accept that there are trade-offs for working on the web, just as there always have been.

But maybe that's not such a bad thing either. Necessity is the mother of invention after all, and some of the most amazing advances of the digital age have come out of people working with the worst of limitations. Perhaps what we really need isn't more control but a different perspective, a new way of looking at things like gaming that we've never thought of before. Heaven knows that if there's anything the internet is good at it's crushing the old ways of doing things in favor of the new.


  1. Microsoft doesn't just have one proprietary 3D API, but it has three!
    First, there is Direct3D for ActiveX (which is still the native plug-in API of choice for Internet Explorer). However, I don't believe in that as a real contender.
    Then, there's Silverlight and its close cousin the Windows Presentation Foundation. This is still getting a fair bit of support from within Microsoft, and it runs in Microsoft browsers.
    Finally, there is the Microsoft.Xna framework, which is available on Windows through trusted code plug-ins, and on Xbox, and also announced for the Windows Phone. This, if anything, will be Microsoft's challenger to WebGL.

    That being said, there are much lower-level problems to gaming on WebGL. How about trying to write a keyboard command binding editor for international keyboards in WebGL, JavaScript and HTML DOM? No can do. Hard-coded US keyboard layout assumptions are unlikely to wow with user friendliness...
    Low-latency audio and networking also come to mind, as do a bunch of other similar, pesky things that we take for granted in the native development world, but really don't have at all inside a browser.
    Hopefully, this can all be solved in the end, and someone will provide Gecko with WebGL as an ActiveX plugin for IE -- but there's a long, long way to go until we're there.

  2. Regarding the mouse capture problem, this seems similar to the fullscreen video tag problem. In the HTML5 spec they say,

    "User agents should not provide a public API to cause videos to be shown full-screen. A script, combined with a carefully crafted video file, could trick the user into thinking a system-modal dialog had been shown, and prompt the user for a password. There is also the danger of "mere" annoyance, with pages launching full-screen videos when links are clicked or pages navigated. Instead, user-agent specific interface features may be provided to easily allow the user to obtain a full-screen playback mode."

    Similarly, perhaps they could allow mouse capture with a canvas tag? (with the same considerations... e.g. that it would have no programmatic fullscreen but it would have it available under a right-click context menu or something like that).

    This also works around the compositing problem of video and canvas tags.

  3. I think if we want to go down the html5 apps road we will need some kind of privilege system, not just for games. Something a bit like the system already in place for the geolocation api. In the end I think it should be like the dialog you get when you install apps on android.

  4. Yeah, the most effective thing I can think of in these kinds of scenarios is to have a dialog of some sort that says "This page is requesting greater control of your mouse/webcam/whatever. Do you wish to allow?" Or maybe an explicit, browser controlled button that the user must click to enter that privileged mode. In both cases, it would be good to have an always-available key that you could hit to force the app to release control. My biggest worry there is that users are notorious for ignoring dialogs, so I don't know how much protection that would actually offer.

    Actually, one tradeoff that I just considered which might make life a little better is that when the browser is in "privileged mouse mode" any attempts to navigate to new pages via the mouse should be ignored. That would alleviate some of the fears of rogue apps trying to force you to click on their ads.

    In any case, there's no question that it's a tricky problem but I think we'll eventually come up with a decent solution to it. The evolutionary path of the web pretty much demands it!
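    The prompt, release key, and "no navigation while captured" ideas in this comment can be sketched as a small state machine. This TypeScript is only an illustration of that proposal: the class and method names are invented, and nothing here corresponds to a real browser API.

```typescript
type InputMode = "normal" | "awaitingConsent" | "privileged";

/** Hypothetical browser-side bookkeeping for one page's input capture. */
class InputCaptureSession {
  private mode: InputMode = "normal";

  getMode(): InputMode {
    return this.mode;
  }

  /** The page asks for control. Nothing is granted until the user answers. */
  requestCapture(): void {
    if (this.mode === "normal") {
      this.mode = "awaitingConsent";
    }
  }

  /** The user's answer to the browser-owned dialog or button. */
  answerPrompt(allow: boolean): void {
    if (this.mode !== "awaitingConsent") {
      return;
    }
    this.mode = allow ? "privileged" : "normal";
  }

  /** The always-available release key, handled by the browser and never by the page. */
  pressReleaseKey(): void {
    this.mode = "normal";
  }

  /** Mouse-driven navigation is ignored while the page holds the mouse. */
  allowsMouseNavigation(): boolean {
    return this.mode !== "privileged";
  }
}
```

    The point of the design is that the page can only ever ask. Every transition into or out of the privileged mode is driven by the user.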

  5. I have actually thought up your concern about user input all by myself earlier today. Well, in a more compact fashion.

    There is actually a rather simple solution to this problem: implement these features, but only allow them on those specific sites that you trust. Much like how scripts, popups and embedded content can now be blocked, it would be nice if the security measures are a required part of the API's implementation.

    Going a step further, perhaps "trusted certificates", much like the ones used on secure web pages, could be used. And a blacklist of bad scripts/sites would be a must, too.
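    A per-site trust check along those lines might look like the following TypeScript sketch. It assumes a default-deny policy in which the blacklist always wins, and the type, function, and example origins are all hypothetical.

```typescript
interface TrustPolicy {
  trustedOrigins: ReadonlyArray<string>; // sites the user has explicitly allowed
  blockedOrigins: ReadonlyArray<string>; // known-bad sites from a shared blacklist
}

/** Decides whether a site may use the privileged input features at all. */
function mayUsePrivilegedInput(origin: string, policy: TrustPolicy): boolean {
  // The blacklist takes priority, even over the user's own trusted list.
  if (policy.blockedOrigins.indexOf(origin) !== -1) {
    return false;
  }
  // Anything not explicitly trusted is denied by default.
  return policy.trustedOrigins.indexOf(origin) !== -1;
}

// Made-up example origins.
const examplePolicy: TrustPolicy = {
  trustedOrigins: ["https://games.example", "https://ads.example"],
  blockedOrigins: ["https://ads.example"],
};
```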

  6. Random Thoughts

    Flash needs gamepad support as much as WebGL. Maybe the 2 can collaborate on a standard. Do Macs support gamepads as well as Windows? Maybe Apple could help out with a cross platform gamepad interface interaction system that also works for the web and Flash. After all, aren't iOS games essentially "full screen"?
    I hate controls in flash games. Not only do they not cleanly take over control of the mouse, they almost NEVER allow you to configure your input keys like a proper pc/console game would.


    As for taking control away from the user, here's what I think would work safely:

    The os will run a service (the user will have to install this at least once, maybe it'll be pre-installed in the future) that is designed to interact with the mouse input, keyboard input, gamepad input and the web (canvas/flash).

    The user's browser will need to have an add-on or plug-in (I remember instant-action using addons while interstellar marines uses a plug-in) that they must manually install (hopefully only once) which works for all sites and then...

    If a website is in my user defined list of web games I play.
    And if a certain setting/variable exists in the javascript somewhere on the page.
    And if I have the service running and listening in the background of the OS.
    And if I have the addon installed.
    And if a special canvas/flash object on the page has focus (will canvas be able to go "full screen"?)
    And if I then press Shift+F1
    Then Windows will take control of the keyboard and mouse in a manner similar to the way it does when I launch a full screen session of Peggle.

    If the game happens to be full screen the control will most likely feel like a normal full screen game.

    If the game is a box on a page, you'll still see everything else you'd normally see but windows makes an invisible barrier over everything and the mouse/keyboard can only affect what's in the area where the game-box is.
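    The "invisible barrier" amounts to clamping the pointer to the game's rectangle. A minimal TypeScript sketch, assuming positions and the game box are expressed in the same pixel coordinates (the names are made up):

```typescript
interface Point {
  x: number;
  y: number;
}

interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

/** Returns the nearest position inside the game box for a given pointer position. */
function confineToRect(pointer: Point, box: Rect): Point {
  return {
    x: Math.min(Math.max(pointer.x, box.left), box.left + box.width),
    y: Math.min(Math.max(pointer.y, box.top), box.top + box.height),
  };
}
```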

    Hit ESC and the overlay disappears while the game menu appears. The user can click the game menu or the page or their taskbar.

    The browser and everything else will have to know not to respond "normally" to any input while the overlay is active.

    The browser shouldn't have the ability to take control. It should just be able to follow standardized protocols that give special control to the user's operating system.

    "Windows Advanced Web Input Control Panel" in windows
    "Web Input" in OS X
    "some randomly named system daemon" in Linux

    I've seen too much cool webcam stuff from Mr. Doob. The open web needs webcam and microphone access. Flash has done a good job of keeping the user in control of their hardware. The browsers can do it just as well.

  7. I would agree that the controls for games that rely heavily on buttons and mouse actions could be a big problem for WebGL. However, I think in the FPS scenario mouse aiming can be avoided altogether by using the video game console method: dual stick controllers. For example, "wasd" to move, "ijkl" to aim. With some gamepad/keyboard mapping software, it is possible to create an experience close to a game console on WebGL. It does suck that you need gamepad mapping software and have to configure it in the first place. I would hope that all major browsers include gamepad support; that would truly make WebGL worth game developers' full attention.

  8. One of the reasons that the dual-stick configuration works on a console is that the thumb sticks are analog. Tapping the stick only slightly will move your view only slightly, while pushing it to its extents will whip your view around quickly.

    Keyboards, on the other hand, are binary on/off states, so movement of your viewport would always be at a constant speed. This would make for a very frustrating game experience. Imagine trying to use a precision sniper rifle and make quick 180 turns with the same movement speed. Not fun.
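    The difference is easy to see in code. In this TypeScript sketch the maximum turn rate and the dead zone are assumed values chosen for illustration, and the function names are made up.

```typescript
const MAX_TURN_RATE = Math.PI; // radians per second at full deflection (assumed)
const DEAD_ZONE = 0.1;         // ignore tiny stick wobble (assumed)

/** Analog stick: deflection runs from -1 to 1 and the turn rate scales with it. */
function analogTurnRate(deflection: number): number {
  const magnitude = Math.min(Math.abs(deflection), 1);
  if (magnitude < DEAD_ZONE) {
    return 0;
  }
  const scaled = (magnitude - DEAD_ZONE) / (1 - DEAD_ZONE);
  const direction = deflection < 0 ? -1 : 1;
  return direction * scaled * MAX_TURN_RATE;
}

/** Keyboard: each key is either held or not, so the turn rate is all or nothing. */
function keyboardTurnRate(leftHeld: boolean, rightHeld: boolean): number {
  const direction = (rightHeld ? 1 : 0) - (leftHeld ? 1 : 0);
  return direction * MAX_TURN_RATE;
}
```

    A half-pushed stick gives roughly half the turn rate, while the keyboard only ever produces zero or the maximum, which is why fine sniper adjustments and fast 180s can't share one key.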

    As for gamepad support, the big question there is what standard to support. There are only two really well accepted gamepad standards in PC land right now: DirectInput and XInput (360 controller API), which of course are both Microsoft controlled. Now these are both great APIs, especially XInput's ease-of-use, but I can't imagine the open-standards-happy HTML 5 crew adopting a Microsoft protocol for Microsoft hardware as easily as they did OpenGL. Far more likely is that they would want to roll their own interface, but that opens a totally different can of worms about what kind of hardware you target and how it interacts with drivers and so on.

    It's all doable, of course, but considering that it would only really benefit games (whereas something like WebGL can be used for many applications) I doubt it's high on anyone's priority list.

  9. If you really think that programming games in javascript inside a browser is a step forward, you really need to lay off whatever the hell it is you're smoking. Javascript was designed as a scripting language. Contorting it to the point where it is competing with superior languages (read: C) is insane. Sure, people may write games in this interpreted language which lacks support for many OO concepts, but in reality, the best games will never be programmed in javascript. Bleeding edge games will always be written in lower level languages because optimization is not expendable. You will not see Call of Duty, or Final Fantasy ever written in JavaScript.

    Why anyone seems to think that microsoft is going to play nice with regards to browser compatibility is beyond me. When has MS EVER played nice? They can't even get their html rendering to play nice, what makes you think they can get (or even want to get) a full blown graphics language to play nice?

    Why does anyone out there want to make life harder for themselves by having to support and take care of multiple browsers and their various inconsistencies? It's far easier to use a plugin where you don't have to worry about cross-browser bullshit.

  10. @solid: I agree with some of the things you said, but I must correct you on some things...

    "Sure, people may write games in this interpreted language which lacks support for many OO concepts"

    Having been using Javascript a lot lately I can say that you're wrong here. It fully supports OO concepts, you just have to implement a bit differently than you're used to.
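    As a small illustration of "a bit differently": inheritance and method overriding can be built from prototype objects rather than classes. The sketch below is written in TypeScript for the type annotations, but the mechanism (Object.create and prototype delegation) is plain Javascript. The Entity and monster names are invented for the example.

```typescript
interface Entity {
  name: string;
  describe(): string;
}

// The "base class" is just an object that other objects delegate to.
const entityProto = {
  describe(this: Entity): string {
    return this.name + " is an entity";
  },
};

// The "subclass" is a new object whose prototype is the base object.
const monsterProto: { describe(this: Entity): string } = Object.create(entityProto);

// Override the method, calling the base version first (like a super call).
monsterProto.describe = function (this: Entity): string {
  return entityProto.describe.call(this) + " that bites";
};

/** Factory that plays the role of a constructor. */
function createMonster(name: string): Entity {
  const monster: Entity = Object.create(monsterProto);
  monster.name = name;
  return monster;
}

const grunt = createMonster("Grunt");
```

    Here grunt delegates to monsterProto, which in turn delegates to entityProto, so the usual OO ideas of inheritance and overriding are present even though no class keyword is involved.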

    "You will not see Call of Duty, or Final Fantasy ever written in JavaScript."

    To me this seems to have a LOT more to do with licensing issues rather than technological issues to be honest.