Articles posted April 2005

More about blitters

Picking up from my last article on blitters, the next thing I was going to talk about was how to time a blitter. At a rough guess, if you have schematics handy, you can look at a blitter chip and do an approximation of how long a blit will take. A blitter chip, like a CPU, needs a clock, so that's a good starting point. For example, the blitter on the classic Williams games (Joust, Robotron, etc) runs at 4MHz. That means it's going to be limited to 4 million operations in a second.

But you can go beyond that. If you look at the width of the data bus that the blitter has, then you can tell how many bits it can operate on at a time. The Williams blitters have a 4-bit bus, and there are two of them running in parallel, so that's 8 bits per operation, or 2 4-bit pixels.

Taking the guessing game one step further, if there is only one address bus on the chip (as is the case for the Williams blitters), then it can't read graphics data from the source and write to a target at the same time. In fact, it will need to constantly swap back and forth between the source address and the destination address. So, assuming that it takes one clock to read a source byte, one clock to swap to the target address, one clock to write a target byte, and one clock to swap back to the source address, you're looking at 4 clocks per operation.

So, putting all this together, I'd have a first guesstimate that the Williams blitter could handle (4,000,000 clocks per second / 4 clocks per operation) * (8 bits per operation / 8 bits per byte) = 1 million bytes per second, give or take. So that's the ballpark to expect. I'm currently working on an update to the early Williams games that will factor this in, in the hopes that Robotron might slow down enough to match the arcade (a number of folks have noticed that Robotron runs too fast in MAME at the higher levels).
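For anyone who wants to plug in other numbers, the same back-of-the-envelope arithmetic can be written as a tiny Python function. This is only a sketch of the guess described above, not a measurement; the function name and its parameters are made up for illustration.

```python
def estimate_blit_rate(clock_hz, clocks_per_op, bits_per_op):
    """Rough bytes-per-second estimate for a blitter.

    All three inputs are guesses read off a schematic, not measurements.
    """
    ops_per_second = clock_hz / clocks_per_op
    bytes_per_op = bits_per_op / 8
    return ops_per_second * bytes_per_op


# Williams blitter guess: 4MHz clock, 4 clocks per operation
# (read, swap, write, swap back), 8 bits (two 4-bit pixels) per operation.
williams_rate = estimate_blit_rate(4_000_000, 4, 8)
print(williams_rate)  # 1000000.0
```

Changing any one of the three guesses (say, 2 clocks per operation instead of 4) shifts the estimate proportionally, which is why it is only a ballpark.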

Of course, the ideal situation would be if we could measure this directly. Unfortunately with the Williams games, there's no obvious way to do it because the blitter HALTs the main CPU while the blit is happening, in order to gain full control over the systemwide address and data busses. If I had an oscilloscope and a PCB to play with, I could probably do some measurements that way. But there's no obvious software path to do it. (Okay, I lied, there is one: I could program the sound CPU to measure the timing, and then send start/stop signals to the sound CPU just before/after the blit is done. That might be worth trying.)

Fortunately, the designers of most later blitter-based hardware decided that it didn't really make sense to suspend the main CPU while a blit was happening, so their blitters were built to operate in parallel with the main CPU. These are generally known as "asynchronous blitters". The Incredible Technologies blitters are like this. The nice thing is that this setup makes it relatively straightforward to measure how long blits of different sizes take to complete. Having done this now for the 8-bit IT games, I'll describe the gory details in my next blitter-related post.

Pentium M Benchmarks

A few recent forum posts have been asking about the performance of Pentium M chips. Since I recently upgraded to a Pentium M based laptop (Dell Inspiron 8600 @ 2.0GHz with an ATI Mobility Radeon 9600), I figured I would run some benchmarks to see how it competes against the desktop systems featured on the MAME32QA site. Unfortunately, there are no straight Pentium 4 benchmarks to compare against, but the AMD64 machines are pretty spiffy.

In general, I'm quite happy with the results. Overall, the performance was 8% slower than the Athlon64 3700+ desktop system. For several games, my system came out on top. For others, it kind of tanked.

The full results are here, but here are the interesting bits:

Games where the Pentium M 2GHz significantly outpaces the Athlon64 3700+: mk2 (+10%), popeye (+14%), rfjet (+13%), tekken3 (+10%), vasara (+27%), xexex (+16%)

Games where the Pentium M 2GHz significantly loses to the Athlon64 3700+: propcycl (-40%), radikalb (-28%), robotron (-20%), starblad (-28%)

Even more interesting is what happens when you compile with P4 optimizations (this is the "P-M P4 Build" column on the results page). The P4 optimizations could in theory help because the Pentium M has SSE2 capabilities and thus the P4 build could take advantage of that. On the other hand, the P4 optimizations also tweak the instruction scheduling to work better on the longer pipelines of the P4, while the Pentium M has much shorter pipelines.

In general, I found that P4 optimizations tended to exaggerate the results. That is, games which were slower than the AMD chip got even slower with P4 optimizations enabled, while games which were faster got even faster. In the end, the results were a wash, still ending up about 8% slower overall than the AMD64 3700+.

Interesting results, nonetheless. :-)

The long-promised site is now up in beta form. Have a look!

All about blitters

A number of games I've done drivers for — Williams, Strata, Art & Magic, etc — use blitters as their way of drawing graphics on the screen. This is different from the way most arcade games work, and is actually much more similar to a modern computer. In these games (and in computers), there is a large chunk of video memory which is called the "frame buffer". The frame buffer contains 'n' bits of information for each pixel you see on the screen.

Now, one immediate problem you run into with a frame buffer is that there is some hardware that is constantly scanning through this memory and pushing that data to the screen. If you are in the middle of modifying a bunch of pixels and the scanner intersects the area you are drawing to, then you can produce an effect called "tearing", where a partially rendered object is displayed (this is a simplification, but it illustrates the general idea). To get around this, most blitter-based games have two frame buffers. At any given time, one of the frame buffers is actively being scanned and displayed to the screen, while drawing happens to the other frame buffer. After the beam has scanned to the bottom of the display, the two buffers are swapped. This is known as "page flipping".
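Page flipping as described above can be sketched in a few lines of Python. This is a toy model, not how any particular board does it: the class name, the one-byte-per-pixel layout, and the vblank() hook are all assumptions made for the example.

```python
class PageFlipper:
    """Two frame buffers: one being scanned to the screen, one being drawn to."""

    def __init__(self, width, height):
        self.width = width
        # Assumption: one byte per pixel, just to keep the example simple.
        self.buffers = [bytearray(width * height) for _ in range(2)]
        self.visible = 0  # index of the buffer currently being displayed

    @property
    def display_buffer(self):
        return self.buffers[self.visible]

    @property
    def draw_buffer(self):
        return self.buffers[1 - self.visible]

    def plot(self, x, y, color):
        # All drawing goes to the hidden buffer, so it can never tear.
        self.draw_buffer[y * self.width + x] = color

    def vblank(self):
        # Called once the beam has reached the bottom: swap the two roles.
        self.visible = 1 - self.visible


flipper = PageFlipper(4, 4)
flipper.plot(1, 2, 7)
shown_before = flipper.display_buffer[2 * 4 + 1]  # still 0, pixel is hidden
flipper.vblank()
shown_after = flipper.display_buffer[2 * 4 + 1]   # now 7, pixel is visible
```

The point of the design is that the swap is just an index change, so nothing partially drawn is ever on screen.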

The early Williams games didn't have enough video RAM to do page flipping, so they had to be very aware of what scanline was currently being displayed. Once the scanning beam had passed below the area of video RAM they wanted to animate, then they could make their changes without fear of tearing. This is generally referred to as "drawing behind the beam".
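The "behind the beam" test boils down to a comparison against the current scanline. Here is a minimal Python sketch of that check, assuming scanlines are numbered from 0 at the top of the screen; the function name and the sprite table are hypothetical and not taken from any Williams code.

```python
def safe_to_update(beam_scanline, objects):
    """Return names of objects whose screen area the beam has already passed.

    `objects` maps a name to (top_scanline, bottom_scanline), inclusive.
    """
    return [name for name, (_top, bottom) in objects.items()
            if beam_scanline > bottom]


sprites = {"player": (40, 55), "enemy": (120, 135), "score": (0, 7)}
print(safe_to_update(100, sprites))  # ['player', 'score']
```

With the beam at scanline 100, anything that ends above it can be redrawn without tearing, while the object at scanlines 120-135 has to wait until the beam has gone by.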

So where does a blitter come into play? Well, in general it was entirely possible to sit there and muck about with the frame buffer using the main CPU. In fact, Defender doesn't have a blitter at all — it is all drawn by the main CPU. The problem is that it takes a lot of CPU power to draw lots of pixels on the screen. And although a CPU can certainly accomplish a lot, it is not specifically designed for drawing lots of pixels at high speeds. One solution to this could be to add a second CPU that is responsible for drawing the graphics, based on commands from the first CPU. In fact, a number of games such as Gyruss and the Cinematronics/Leland games (Quarterback, Ataxx, Super Off Road, etc) do just that.

However, a CPU is an expensive part. And it's not optimized for doing graphics. So a number of folks caught onto the idea of designing custom ICs that were dedicated to performing very fast graphics operations. In general, it's not enough to just draw pixels; these chips also had to do a lot of bit manipulation and address computations to handle things like X and Y flipping, transparency, scaling, etc. The term that has come up over time to describe this kind of operation (copying large arrays of data and manipulating them during the copy) is a "blit". And custom chips that are dedicated to this sort of work are called blitters.

So, you can think of a blitter as a custom chip that is designed to copy graphics (which are normally stored in ROM or RAM) to a frame buffer while manipulating the data in a programmed fashion.
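To make that concrete, here is a minimal software blit in Python that copies a rectangle of pixels into a frame buffer with optional X/Y flipping and a transparent color. It is a sketch only: real blitters each have their own register layouts and features, and the function name, the one-byte-per-pixel layout, and the lack of clipping are all simplifying assumptions.

```python
def blit(src, src_width, src_height, dest, dest_width, dest_x, dest_y,
         flip_x=False, flip_y=False, transparent=None):
    """Copy a src_width x src_height block of pixels into dest.

    Assumes one byte per pixel and that the block fits inside dest
    (no clipping is done).
    """
    for row in range(src_height):
        src_row = src_height - 1 - row if flip_y else row
        for col in range(src_width):
            src_col = src_width - 1 - col if flip_x else col
            pixel = src[src_row * src_width + src_col]
            if transparent is not None and pixel == transparent:
                continue  # leave the existing frame buffer pixel alone
            dest[(dest_y + row) * dest_width + (dest_x + col)] = pixel


# A 2x2 "sprite" drawn X-flipped at (1, 1) into a 4x4 frame buffer
# that starts out filled with color 9; color 0 is transparent.
sprite = bytes([1, 0,
                2, 3])
frame = bytearray([9] * 16)
blit(sprite, 2, 2, frame, 4, 1, 1, flip_x=True, transparent=0)
```

After the call, the transparent pixel has left the background color 9 showing through at (1, 1), and the other three pixels land mirrored left to right.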

So what makes this tricky in MAME? Well, first off, every company designed their own blitter. There is no standard way of blitting. Generally, this just takes some reverse engineering power and a lot of patience to figure out what's happening.

The really tricky part is the fact that blitters don't perform their operations instantaneously; it takes some time to actually shuffle through all that data and render it to the frame buffer. As a simplification, most blitters in MAME are implemented as "instantaneous" blitters, meaning each blit is treated as finished the moment it is started. The problem is that many games rely on the speed of the blitter to limit their speed, or else overtax the blitter so that the original game slowed down when too much was being drawn at one time. Figuring out how fast the blitters operated is the trick, and I'll talk more about that next time.
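The difference between an "instantaneous" blitter and a timed one can be sketched as follows. This is a hypothetical Python model, not MAME's actual code: the class name, the bytes_per_second parameter, and the idea of tracking a busy_until time are assumptions for illustration, and the throughput figure in the example is just a placeholder.

```python
class TimedBlitter:
    """Toy model of a blitter that takes time proportional to blit size."""

    def __init__(self, bytes_per_second):
        self.bytes_per_second = bytes_per_second
        self.busy_until = 0.0  # emulated time (seconds) when the blitter frees up

    def start_blit(self, now, width_bytes, height):
        """Start a blit at emulated time `now`; return the time it finishes.

        An "instantaneous" blitter would simply return `now` here.
        """
        start = max(now, self.busy_until)  # wait for any blit still in progress
        duration = (width_bytes * height) / self.bytes_per_second
        self.busy_until = start + duration
        return self.busy_until


blitter = TimedBlitter(bytes_per_second=1_000_000)  # placeholder throughput
first_done = blitter.start_blit(0.0, 16, 16)   # 256 bytes
second_done = blitter.start_blit(0.0, 16, 16)  # has to wait for the first
```

With a model like this, a game that queues up too many blits in one frame naturally falls behind, which is the slowdown the instantaneous approach loses.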

What happened?

Wow, after doing so well for so long, I had a week-long dry spell on the old blog. It had to happen eventually. Things at work have heated up and so I'm finding less free time to ramble about what I'm doing here. And with what little free time I have, I'm too busy trying to accomplish MAME-related stuff to write about it. :-)

So I hope to have some more detailed updates in the next few days, but here are a few tidbits. First, I decided that I needed to buy a new ROM programmer. The old one I have connects to a custom ISA card, and hence requires an ancient vintage PC to connect it to. Now, I realize some people out there still try to run MAME on such a beast, but I felt it was time to upgrade to something more modern using fancy "USB" technology. ;-) Seems to work pretty well so far.

One of the remaining games I want to get emulated is Gaelco's Speed Up, which is the first of the three games they made that run on the early 3D hardware (the other two being Surf Planet and Radikal Bikers). The problem is that there are a number of 42-pin ROMs on the board that neither I nor Guru can seem to read. We've even traced out the pins and are pretty sure the pin layout matches a known EPROM type (27C322), but attempting to read the data that way is unsuccessful. Unfortunately, these ROMs contain all the model and sound data, and thus the game is unplayable without them. We will continue to try and find a solution on one side of the ocean or the other.

Beyond that, I've been taking a good hard look at my Incredible Technologies 8-bit driver, and have made some nice progress that will hopefully bode well for more accurate emulation of other systems with blitter chips in the future. More on that later.