Monday, July 30, 2012

Sprite tile maps on the GPU

I had a fun idea yesterday while playing (what else) Spelunky and decided I'd give it a go. The result is this demo, which I'm reasonably proud of despite it's simplicity.

I know it doesn't look all that impressive, after all the SNES was doing that kind of stuff a long way back and they didn't need any fancy WebGL to do it. Heck, even a rudimentary HTML Canvas renderer can get similar results, so what makes this so special?

Well, what if I told you the entire thing was done on the GPU by drawing a full-screen quad? Interest piqued?


This is a fun little exercise in GPU abuse that may even prove useful in the right context. I'm not going to go super in-depth on every line of code, but I did want to explain the basic technique and discuss some avenues for expansion in the future.

The core concept is this: Everything is driven by two textures. One is your standard tile-based sprite sheet.



And the other is a specially built map of each tile on the screen.



Yes, it's tiny, but that's actually the image that's generating the map you see in the demo! The trick is that in the shader we treat each pixel of that image as a lookup table for the sprite sheet. Sprites are identified by storing the X and Y coordinate in the Red and Green channels respectively. So a Red value of 1 and a Green value of 2 (out of 255) indicates that the tile at (1, 2) - in this case the first cluster of buried gold - is the tile at that point on the map. Thus while the map appears to be drawn in black and white the black contains subtle variations that indicate different tiles. A fully white pixel is taken to mean "No tile here" and nothing is rendered.

Put that basic idea in place along with some minor math-fu to properly lay out all the pixels and take into account things like view offsets and you've got an entire map that can be drawn with a single call! In order to achieve the parallax scrolling as seen on the demo background we actually render two passes, and so the demo draws two quads per frame instead of just one. But still, hard to complain about that! :)

I'm not sure how novel of an idea this is. Certainly all of the elements that make it up have been used elsewhere (the tile lookup is vaguely reminiscent of virtual texturing as seen in RAGE) but I've never seen anyone actually do this kind of rendering on the GPU before. If anyone has some examples of previous implementations of the same concept I'd love to see them!

I'll get a prettified version of the main rendering code on Github soon, but in the meantime you can take a gander at the file used for the demo. At 250 lines it's not a lot of code to parse through, and most of the interesting bits come from the size calculations in the vertex and fragment shaders.

As it's stands right now, the code is surprisingly flexible given it's simplicity:
  • It can render tiles of any size, no need to be powers of two
  • It can handle 65,535 different types of tiles, which should be enough for anybody! ;)
  • The maps can be incredibly large (at least 2048x2048 tiles, depending on max texture size) with effectively no performance hit
  • Visibility culling is "free" and extremely precise (per pixel)
  • The code can scale the tiles by any factor you want (in the demo the 16x16 tiles are rendered at 2X scaling)
  • You can have as many different layers as you want (though there is a performance hit for those.)
  • I really love the idea of being able to embed full game levels in a web page as an image! It would be super easy to share them, and the nature of the image makes it almost a little preview of the map contained within!
Not bad for a morning of hacking, no? There are some downsides, though, so you'll have to take that into consideration.
  • Only supports square tiles aligned to a grid
  • Layer rendering is back-to-front to ensure proper transparency, which means there can be quite a bit of unnecessary overdraw.
  • Viewport offsets are currently snapped to the nearest pixel. Floating point offsets introduce artifacts at tile edges. (Might be able to fix this in the shader)
  • This only really makes sense for static tiles. Anything that moves (player sprites, enemies, etc) would be drawn separately
  • Changing tiles on the fly will require a gl.texSubImage2d call, which will almost certainly be slower to update than a more traditional tile renderer. (But should be acceptable for infrequent changes.)
  • It should also be noted that this IS a WebGL technique, which prevents it from being used on many mobile devices and older hardware where a traditional tile renderer would work just fine.
There are some obvious areas that can be improved: Linear filtering on the tiles is possible, but would require a gutter around each one which the current code hasn't taken into account. Animating tiles (making water bubble or grass wave) should also be possible but hasn't been implemented. You could probably reduce the overdraw too by flagging tiles with transparency to be drawn in a separate pass.

There's also a question of what do with the remaining color channels. I'm only making use of the Red and Green channels, which means the Blue and Alpha channels are available to do... something. My first instinct would be game logic flags, like "Solid" and "Water" or tying into script triggers. You could also use the extra bits for the aforementioned animation or transparency flags.

Finally, building the "map" textures for this kind of rendering by hand is a royal pain. It would need a good editor around it to be really useful, but that shouldn't be difficult to slap together.

In any case, it's a good foundation to build out from. I'm not sure if I'll have the time to flesh it out any further on my own, but I'd be happy to lend a hand to anyone that would like to experiment with adding this technique to their own renderer!

16 comments:

  1. planning on digging deeper into HTML5 gameprogramming and this is awesome :)

    probably a silly question, but do you think an engine like http://browserquest.mozilla.org/ could use this technique?

    wouldn't this decrease "renderingload" a lot for inefficient javascript engines, making it way more scaleable? (if I understand correctly they store mapdata in json)

    ReplyDelete
    Replies
    1. It certainly could, although the screen-by-screen approach they use for rendering makes a technique like this have less of an impact.

      Delete
  2. Very smart, your entire map data is less than 1 Kb!. Like Jeroen said, it's a great improvement over others tiles engines.

    ReplyDelete
  3. Hm, it seems like the dependent texture reads involved in this would be kind of expensive. Is it possible to do the kind of procedural geometry generation you'd need to generate a quad-per-tile in WebGL?

    Also, to change individual tiles you could probably use sub region updates (texSubImage2D, etc), and you could possibly just store the tile indexes in a different kind of buffer - assuming WebGL has wide enough support for that, it would probably also let you index more than 65k tiles.

    ReplyDelete
    Replies
    1. "Is it possible to do the kind of procedural geometry generation you'd need to generate a quad-per-tile in WebGL?"

      Since WebGL lacks geometry shaders the obvious way of going about that isn't available. It would be easy, however, to figure out how many tiles can be on screen at once and generate a buffer of that many quads + some bordering geometry to account for scrolling. You still need to adjust the texture coords on the fly, though. While you could do that in Javascript with a separate texcoord buffer, I'm not sure that would be faster (unless you were standing still). May be interesting to test.

      I'm going to add some simple toggles to the demo soon for the sake of benchmarking (turning off the background layer, skipping the tile lookup, etc) so hopefully that will let us see how expensive the technique really is.

      Delete
    2. Dependent reads aren't so bad for modern hardware...

      While not quite the same, http://developer.download.nvidia.com/shaderlibrary/webpages/screenshots/shaders/post_halftone_dots.html is similar -- it grabs grayscale as a texture lookup (into a volume texture -- if all your tiles were the same size, you could do that too)

      Delete
  4. The dependent texture fetches are highly coherent, and are likely cached on-chip. Doing any kind of work per-tile would have to happen in JS and cause more CPU->GPU communication than you want. It might make more sense if the tiles were highly dynamic. If you're doing 2D, I can't image you're likely to be GPU-bound.

    If you want to do filtering, you might have some issues since your texture derivatives are going to be strange at tile boundaries (a similar problem to virtual texturing).

    ReplyDelete
  5. I did a quick port today of this code to XNA: https://bitbucket.org/nickgravelyn/gputilemap. Also has a small app to convert TMX maps made with the Tiled editor into the image maps used by the algorithm. This is a really cool algorithm; thanks for sharing!

    ReplyDelete
  6. This mechanism is more commonly known as a Texture Atlas http://en.wikipedia.org/wiki/Texture_atlas

    Recent interest in games like Minecraft has renewed people's curiosity for this technique.

    "Viewport offsets are currently snapped to the nearest pixel. Floating point offsets introduce artifacts at tile edges. (Might be able to fix this in the shader)"

    You can indeed alleviate the artifacts (AKA bleeding) at tile boundaries by introducing a "margin" factor into your texture coordinates. I have employed this specific solution with success in my Minecraft-inspired WebGL game: https://github.com/feisty/toe/blob/master/lib/extract.coffee#L15

    ReplyDelete
  7. Damn, you beat me to implement this idea. As soon as I read about virtual texturing, I was reminded of tile based systems like the early consoles. You could easily do multiple layers with blending like the SNES.

    For animation you could use some of the variables for shaders which would just count up and would add an offset to the tile index. The tile map would then just have animated tiles adjacent to each other. You could have several variables for different number of animation cycles, like 4, 8 and 16 frames etc. If you have a lot of different animations cycles, then just use another lookup texture which would be pretty quick to update once per frame.

    ReplyDelete
  8. Several render warnings in the Chrome console for this demo? Something up?

    ReplyDelete
    Replies
    1. I've been seeing that a lot lately but haven't dived into it too much. I think it's that I create a texture and return it while downloading the associated image. This causes several frames to be drawn with a texture that has no initialization other than gl.createTexture(), which apparently the gl backend is not happy with.

      I could fix it by initializing those textures to a 1x1 black until the image arrives, but I'd want to know the performance characteristics of that and haven't had the time yet to measure it. I could also only return the texture when it's done loading and throw some tests around the render loop, but I feel that clutters up the code unnecessarily. So for the moment the warning has stayed.

      I really SHOULD do something about it, though.

      Delete
  9. Instead of sorting shit back to front, why don't you simply enable depth testing and store the depth in the remaining color channel?
    And then when drawing the sprites, simply take the depth value from that color channel.

    That would be no solution for transparency though... just for the two states: "Transparent" and "Opaque"

    ReplyDelete
  10. You said you draw two quads?
    Does that mean you draw the "Mask" over the whole screen first, and then, in the 2nd pass, you render the tiles onto it?
    If yes, why don't you simply do it in a single pass? Simply change the texture coordinates when reading the mask...

    ReplyDelete
  11. This comment has been removed by the author.

    ReplyDelete
  12. This is a neat idea and I thank you for presenting it. I suspect, however, that it is likely to be quite a bit slower than the usual alternatives for the majority of situations. You're essentially trading fill rate (that is, you're making your fragment/pixel shader much more expensive) in exchange for geometry (you're eliminating N quads and using just 1). This would tend to be a winning trade only where (1) fragment shading is very cheap and (2) the particular game in question would otherwise use a very great number of quads. But fragment shading is generally not cheap, especially when dependent texture reads are required (as they are here). And few games use tiles that are so small and numerous that the geometry cost dominates even an inexpensive fragment shader cost; in other words, a typical, cheap fragment shader is still expensive enough that it dominates performance for most games.

    My hypothesis would be that unless your tiles are each just a few screen pixels in size, this technique will be slower. Have you done any comparative performance testing that would prove or disprove my hypothesis?

    ReplyDelete