Wednesday, April 25, 2012

Total Annihilation Graphics Engine

For a long time I've wanted to spend some time writing down my recollections of what I did on the TA graphics engine.  It was a weird time, just before hardware acceleration showed up.  Early hardware acceleration had pretty insane driver overhead.  For example, the first Glide API did triangle setup because the hardware didn't have it yet.  Accelerated transform was out of the question.  Anyway, none of this was really a factor because that stuff was just showing up when we were working on TA and we couldn't have sold any games on it.

Anyway I met Chris at GDC in 1996 and he fairly quickly offered me a job working on the game.  I had just wrapped up work on Radix a few months before and was looking for something new since most of the Radix guys were going back to school.

So I went back to Ottawa, and while I waited for visa paperwork to move to the States I ended up writing Thred, which became a whole other story that I'll talk about some other time.  Once the visa paperwork came through I moved to Seattle at the end of July 1996, just in time for Seafair.

Monday morning rolls around and I start meeting my new co-workers and getting the vibe.  I got a brand new smoking hot Pentium 166 MHz right out of a Dell box.  Upgraded to 32 MB of RAM even!  That was the first time I ever saw a DIMM, incidentally.  We all ooo'd and ahhh'd over this new amazing DIMM technology.  I was super excited to be there and actually getting paid too!

I had already done a bit of work remotely so I had a little bit of an idea about the code but I hadn't seen the whole picture. The engine was primarily written in C using mostly fixed point math.  At that point using floats wasn't really done but it made sense to start using them. So we did.  This means we ended up with an engine that was a blend of fixed point and floating point, including a decent amount of floating point asm code.  Ugh.  Jeff and I both tried to rip out the fixed point stuff but it was ingrained too deep.  Oh well.

So my primary challenge on the rendering side was to increase the performance of the unit rendering, improve image quality and add new features to support the game play.

The engine was also very limited graphically in a lot of ways because it was using an 8-bit palette. This meant I had to use a lot of lookup tables to do things like simple alpha blending.  Even simple Gouraud shading would require a lookup table for the light intensity value. Nasty compared to what we do today.  The artists did come up with a versatile palette for the game but 256 colors is still 256 colors at the end of the day.
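To make the lookup-table idea concrete, here is a minimal sketch in Python, not the original TA code.  It precomputes, for every pair of palette indices, the index of the palette color closest to their blend.  The names `nearest_index` and `build_blend_table` and the 16-entry gray palette are assumptions for illustration; a real 256-color palette gives a 256x256 table (64 KB) built the same way.

```python
def nearest_index(palette, rgb):
    """Index of the palette entry closest to rgb (squared RGB distance)."""
    r, g, b = rgb
    return min(
        range(len(palette)),
        key=lambda i: (palette[i][0] - r) ** 2
        + (palette[i][1] - g) ** 2
        + (palette[i][2] - b) ** 2,
    )


def build_blend_table(palette, alpha):
    """table[src][dst] = palette index closest to alpha*src + (1-alpha)*dst."""
    n = len(palette)
    table = [bytearray(n) for _ in range(n)]
    for s in range(n):
        for d in range(n):
            mixed = tuple(
                round(alpha * sc + (1 - alpha) * dc)
                for sc, dc in zip(palette[s], palette[d])
            )
            table[s][d] = nearest_index(palette, mixed)
    return table


# Hypothetical 16-step gray ramp standing in for the game's palette.
palette = [(i * 17, i * 17, i * 17) for i in range(16)]
blend50 = build_blend_table(palette, 0.5)

# At draw time a "blend" is a single table read, with no arithmetic on colors.
src_index, dst_index = 0, 15
blended_index = blend50[src_index][dst_index]
```

A shading table works the same way, indexed by (light intensity, color index) instead of (source, destination).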

Getting all of the units to render as real 3d objects was slow.  Basically all of the units and buildings were 3d models.  Everything else was either a tiled terrain backdrop or what we called a feature which was just an animated sprite (e.g. trees). 

So there were a few obvious things to do to make this faster.  One of them was to somehow cache the 3d units and turn them into sprites which could be rendered a lot more quickly.  For a normal unit like a tank we would cache off a bitmap that contained the image of the tank rendered at the correct orientation (we call this an imposter today; look up Talisman).  There was a caching system with a pool that we could ask to give us the bitmap.  It could de-allocate to make room in the cache using a simple round robin scheme.  The more memory your machine had the bigger the cache was, up to some limit.  We would store off the orientation of that image and then simply blt it to the screen to draw the tank.  If a tank was driving across flat terrain at the same angle we could just move the bitmap around because we used an orthographic projection.  Units sitting on the ground doing nothing were effectively turned into bitmaps.  Wreckage too.
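A rough sketch of what a round robin sprite cache like this could look like, in Python.  This is an illustration under assumptions, not the shipped code: the class name `SpriteCache`, the fixed slot count, and keying on a (unit type, orientation) tuple are all made up here, and the real pool was sized by available memory rather than by a slot count.

```python
class SpriteCache:
    """Fixed pool of cached bitmaps with round robin replacement."""

    def __init__(self, slots):
        self.keys = [None] * slots      # which key currently owns each slot
        self.bitmaps = [None] * slots   # the cached image in each slot
        self.index = {}                 # key -> slot number
        self.cursor = 0                 # next slot to hand out or overwrite

    def get(self, key, render):
        """Return the cached bitmap for key, calling render(key) on a miss."""
        slot = self.index.get(key)
        if slot is not None:
            return self.bitmaps[slot]   # hit: no 3d rendering needed

        # Miss: take the slot under the cursor, evicting whatever is there.
        slot = self.cursor
        self.cursor = (self.cursor + 1) % len(self.keys)
        old_key = self.keys[slot]
        if old_key is not None:
            del self.index[old_key]

        self.keys[slot] = key
        self.bitmaps[slot] = render(key)
        self.index[key] = slot
        return self.bitmaps[slot]


# Usage: the key is whatever identifies "same picture", here a hypothetical
# (unit type, heading step) pair.
render_log = []

def render_unit(key):
    render_log.append(key)              # stands in for the expensive 3d render
    return ("bitmap for", key)

cache = SpriteCache(slots=2)
cache.get(("tank", 3), render_unit)
cache.get(("tank", 3), render_unit)     # second call is served from the cache
```

Round robin ignores how recently a sprite was used, which keeps the bookkeeping trivial at the cost of sometimes evicting a sprite that is still on screen.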

There was another wrinkle here: the actual units were made from polygons that had to be sorted.  But sometimes the animators would move the polys through each other, which caused weird popping, so a static sorting was no good.  In addition it didn't handle intersection at all.  So I decided to double the size of the bitmap that I used and Z-buffer the unit (in 8 bits) only against itself.  So it was still turned into a bitmap but at least the unit itself could intersect, animate etc. without having to worry about it.  I think at the time this was the correct decision, and actually having a full screen Z-buffer for the game probably also would have been the correct decision (instead we rendered in layers).
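Here is a small Python sketch of Z-buffering a unit only against itself.  It is illustrative only: it reads "double the size of the bitmap" as one 8-bit depth plane stored next to the 8-bit color plane, which is an assumption, and it draws horizontal spans instead of real polygons.  The helper names are hypothetical.

```python
def make_unit_buffers(width, height):
    """One 8-bit color plane (0 = transparent) and one 8-bit depth plane."""
    color = [bytearray(width) for _ in range(height)]
    depth = [bytearray([255] * width) for _ in range(height)]  # 255 = farthest
    return color, depth


def plot(color, depth, x, y, z, c):
    """Write the pixel only if it is nearer than what is already stored."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = c


def draw_span(color, depth, y, x0, x1, z0, z1, c):
    """Horizontal span from x0 to x1 with linearly interpolated depth."""
    n = x1 - x0
    for i in range(n + 1):
        z = z0 + (z1 - z0) * i // n if n else z0
        plot(color, depth, x0 + i, y, z, c)


# Two spans that pass through each other: color 1 slopes away from the
# viewer, color 2 slopes toward it. No sort order gets this right, but the
# per-pixel depth test does.
color, depth = make_unit_buffers(5, 1)
draw_span(color, depth, 0, 0, 4, 0, 200, 1)
draw_span(color, depth, 0, 0, 4, 200, 0, 2)
row = list(color[0])
```

Once the unit is finished only the color plane needs to be blitted to the screen; the depth plane is scratch space unless it is kept around for later compositing.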

Now all of this sounds great but there were other issues.  For example a lot of units moving on the screen at the same time could still bring the machine to its knees.  I could limit this to some extent by limiting the number of units that got updated in any given frame.  For example rotation could be snapped more, which means not every unit has to get rendered every frame.  Of course units of the same type with the same transform could just use the same sprite.  Even with everything I could come up with at the time you could still worst case it and kill performance.  Sorry!  I was given a task that was pretty hard and I did my best.

Once I had all the moving units going I realized I had a problem.  The animators wanted the buildings to animate with spinney things and other objects that moved every frame!  The buildings were some of the most expensive units to render because of their size and complexity.  By animating even one part they were flushing the cache every frame and killing performance.  So I came up with another idea.  I split the building into animating and non-animating parts.  I pre-rendered the non-animating parts into a buffer and kept around the z-buffer.  Then each frame I rendered just the animating parts against that base texture using the z-buffer and then used the result for the screen.  In retrospect I could have sped this up by doing this part on the screen itself but there were some logistical issues due to other optimizations.
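The split can be sketched like this in Python.  Again this is a hedged illustration rather than the real routine: `compose_building` is a made-up name, and the animating parts are passed in as a flat list of already-rasterized (x, y, z, color) fragments to keep the example short.

```python
def compose_building(base_color, base_depth, moving_fragments):
    """Depth-test this frame's animating fragments against the cached base.

    base_color / base_depth: cached render of the non-animating parts,
        one bytearray per row (depth 255 = nothing there).
    moving_fragments: iterable of (x, y, z, color) for the animating parts.
    Returns a new color image; the cached base is left untouched so it can
    be reused next frame.
    """
    color = [bytearray(row) for row in base_color]
    depth = [bytearray(row) for row in base_depth]
    for x, y, z, c in moving_fragments:
        if z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = c
    return color


# Cached static part: two solid pixels at depth 100 and one empty pixel.
base_color = [bytearray([5, 5, 0])]
base_depth = [bytearray([100, 100, 255])]

# This frame's spinning part: in front, behind, and over empty space.
fragments = [(0, 0, 50, 9), (1, 0, 150, 9), (2, 0, 200, 9)]
frame = compose_building(base_color, base_depth, fragments)
```

The per-frame cost is a copy of the base plus the animating fragments, instead of re-rendering the whole building.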

After I had the building split out, the animating stuff split out, the z-buffering and the caching I still had a few more things I needed to do.  I haven't talked about shadows at all.  Unit shadows and building shadows were handled differently.  Unit shadows simply took the cached texture and rendered it offset from the unit with a special shader (shader haha it was a special blt routine really) that used a darkening palette lookup.  E.g. if there was anything at that texel just render shadow there like an alpha test type deal.  This gave me some extra bang for the buck in the caching because I had another great use for that texture and I think the shadows hold up well.
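As a sketch of that kind of shadow blt (in Python, with hypothetical names and a stand-in darkening table, not the original routine): wherever the cached sprite has a texel, the screen pixel under it at the shadow offset is replaced by its darker palette entry.

```python
def blit_shadow(screen, sprite, darken, ox, oy, transparent=0):
    """Darken screen pixels under every non-transparent sprite texel.

    screen: list of bytearray rows of palette indices, modified in place.
    sprite: the unit's cached image; only its shape is used.
    darken: 256-entry table mapping a palette index to a darker one.
    ox, oy: where the shadow lands on screen (unit position plus offset).
    """
    for sy, sprite_row in enumerate(sprite):
        y = oy + sy
        if not 0 <= y < len(screen):
            continue                      # clipped off the top or bottom
        line = screen[y]
        for sx, texel in enumerate(sprite_row):
            x = ox + sx
            if texel != transparent and 0 <= x < len(line):
                line[x] = darken[line[x]]


# Stand-in table: pretend the palette is a brightness ramp so that halving
# the index darkens it. A real table would be built from the palette.
darken = bytes(i // 2 for i in range(256))

screen = [bytearray([100] * 4) for _ in range(2)]
sprite = [bytearray([0, 7]), bytearray([7, 7])]
blit_shadow(screen, sprite, darken, ox=1, oy=0)
```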

Not all was well in shadow land when it came to buildings though.  Due to their tall spires and general complexity I decided to go ahead and properly project the shadows.  This ended up significantly increasing the footprint of the buildings and the fill rate started to become sub-optimal because a single building could really take up a lot of the screen.  Render the shadow (which overlaps the building and a lot more) then render the building itself on top and you are just wasting bandwidth.  So the next step was to render the projected shadow, render the building (both into the cache), then cut out the shape of the building from the shadow and then RLE encode the shadow since it's all the same intensity.  Now rendering consisted of rendering the shadow (not overlapping and faster because it's a few RLE spans) and then rendering the building.  Ahhh... way faster.
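The cut-out-then-RLE step might look something like the following Python sketch.  The run format, a list of (start, length) pairs per row, and the function names are assumptions; the post does not describe the actual encoding.  Clipping is left out to keep it short.

```python
def encode_shadow(shadow_mask, building_mask):
    """Per row, list the (start_x, length) runs that are shadow but not building."""
    rows = []
    for shadow_row, building_row in zip(shadow_mask, building_mask):
        runs, start = [], None
        for x, (s, b) in enumerate(zip(shadow_row, building_row)):
            visible = bool(s) and not b   # building pixels are cut out
            if visible and start is None:
                start = x                 # a run begins
            elif not visible and start is not None:
                runs.append((start, x - start))
                start = None              # the run ended at x - 1
        if start is not None:             # run reaches the end of the row
            runs.append((start, len(shadow_row) - start))
        rows.append(runs)
    return rows


def draw_shadow_runs(screen, rows, darken, ox, oy):
    """Darken the encoded runs on screen. No clipping in this sketch."""
    for dy, runs in enumerate(rows):
        line = screen[oy + dy]
        for start, length in runs:
            for x in range(ox + start, ox + start + length):
                line[x] = darken[line[x]]


shadow = [[1, 1, 1, 1, 1, 0]]     # projected shadow, overlapping the building
building = [[0, 0, 1, 1, 0, 0]]   # where the building itself will be drawn
runs = encode_shadow(shadow, building)

darken = bytes(i // 2 for i in range(256))   # stand-in darkening table
screen = [bytearray([100] * 6)]
draw_shadow_runs(screen, runs, darken, ox=0, oy=0)
```

Because the shadow is a single intensity, the runs carry no pixel data at all, and pixels that the building will overwrite are never touched.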

Now the whole way that TA did texture mapping was just screwed.  Frankly we had no idea what we were doing.  Jeff knew it was fucked but it was just so already built that it wasn't changeable in the time we had anyway.  I could do a 100x better job at this today 16 years later.

So we had some pretty serious image quality issues mostly related to aliasing, especially of textures (there were no UVs; each quad had a texture with a specific rotation stretched to it).  So the one thing I did that I think worked well was anti-aliasing the buildings.  Basically for the non-animating part of the building I would allocate a buffer that was double the size in each dimension.  I rendered the building at this larger size and then anti-aliased that into the final cache.  So the AA only happened once when it got cached, which means I could spend some cycles.  This only applied to buildings.

Now doing AA in 8 bits is going to require some sort of lookup table.  Since I had 4 pixels that I wanted to shrink down to 1 pixel I came up with a simple solution.  It's very similar to what we use for bloom type stuff today, which is simply separating the vertical and horizontal elements.  So the lookup table took two 8-bit values and returned a single 8-bit value that represented the closest color in the palette to the average of the colors.  No, I didn't take gamma into account correctly or much else to be honest.  Anyway I would simply do a lookup on the top two pixels and the bottom two pixels.  The results from those two ops were then looked up to give me the final color, so 3 lookups.  It drastically improved the look of the buildings.
Except I fucked up and left a bug in there.  Ever notice that a lot of the buildings have a weird purple halo?  Basically the table broke when dealing with the edge and transparency because I didn't have a correct way to represent that.  Then I ran out of time; I think I could have fixed it.
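The three-lookup downsample can be sketched as follows in Python.  This is a reconstruction from the description above, not the original code; the function names and the 16-entry gray palette are made up, and like the original it averages in plain palette RGB with no gamma handling.

```python
def nearest_index(palette, rgb):
    """Index of the palette entry closest to rgb (squared RGB distance)."""
    r, g, b = rgb
    return min(
        range(len(palette)),
        key=lambda i: (palette[i][0] - r) ** 2
        + (palette[i][1] - g) ** 2
        + (palette[i][2] - b) ** 2,
    )


def build_pair_table(palette):
    """pair[a][b] = palette index closest to the average of colors a and b."""
    n = len(palette)
    pair = [bytearray(n) for _ in range(n)]
    for a in range(n):
        for b in range(n):
            avg = tuple((ca + cb) // 2 for ca, cb in zip(palette[a], palette[b]))
            pair[a][b] = nearest_index(palette, avg)
    return pair


def downsample_2x(src, pair):
    """Shrink a double-size image: each 2x2 block becomes one pixel, 3 lookups."""
    out = []
    for y in range(0, len(src) - 1, 2):
        row = bytearray()
        for x in range(0, len(src[y]) - 1, 2):
            top = pair[src[y][x]][src[y][x + 1]]             # lookup 1
            bottom = pair[src[y + 1][x]][src[y + 1][x + 1]]  # lookup 2
            row.append(pair[top][bottom])                    # lookup 3
        out.append(row)
    return out


# Hypothetical 16-step gray ramp standing in for the game's palette.
palette = [(i * 17, i * 17, i * 17) for i in range(16)]
pair = build_pair_table(palette)

big = [bytearray([0, 0, 4, 4]),
       bytearray([15, 15, 4, 4])]
small = downsample_2x(big, pair)
```

Note that this sketch has no special case for a transparent index: it would be averaged like any other color.  That may be related to the halo described above, though the post does not spell out the exact mechanism.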

Anyway I wrote some particle system stuff, lighting effect stuff and some other cool effects that didn't get used (psionics!).  But the unit rendering was by far the most complicated part of the whole renderer and it's what I ended up spending the most time on.

BTW I still think TA was an amazing game and I'm still interested in pushing that kind of game more in the future.  It seems like every time I do an RTS a few years later I'm ready to take another shot at it (SupCom was the last one, I'll do some posts about its engine sometime too).

Long enough?


  1. Very interesting read! I was always very curious of TA's engine.

  2. Such a cool read. Thank you kind sir!

  3. Yes, I remember having to dont-cache moving pieces on structures or they wouldn't update, and how that disabled shadows on those pieces or something. Also, your mindgun code wasn't completely removed, as attested by this picture from my old TA modding days:

    1. Yup, you found the mindgun stuff all right. Cool that it's still there in some form.

  4. Sounds pretty hardcore. These days a lot of game developers can afford to almost ignore optimization and use higher-level development tools to create games more rapidly and therefore cheaper.

    I enjoy seeing developers that really push the tech to try to do something no-one has achieved before. TA was a great example and it's fantastic to see you doing the same with PA. A million units? Do it.

  5. Jon, this was very interesting to read. As interesting as reading Michael Abrash's book about the time he worked with Carmack on Quake. Any plans for the same kind of write-up about the not-so-old days of SupCom/Forged Alliance? I guess this might be possible after the Planetary Annihilation release, which is obviously consuming all your bandwidth :)
    Thanks again for this post !

  6. Thanks for this piece of history. TA pushed the limits and considering how fast the gaming scene was moving at the time, that was well worth a few hacks !

    I have now discovered Spring and Balanced Annihilation, but in my mind it is TA that set the standard.

    Other readers might enjoy this Reddit discussion through which I came here :
