CPU Scheduling in MAME (part 3)
Part 2 of this article discussed the cooperation between the scheduler and timers that enables events to be synchronized between multiple CPUs. In short, when a CPU needs to signal an event to another CPU, it sets an "instant" timer which causes all CPUs to execute up to that point in time. Then, the timer callback is fired and the event can be signalled safely and accurately.
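To make that pattern concrete, here is a rough sketch of what such a signal looks like in driver code. I'm using the older double-based timer API here (timer_set with a TIME_NOW duration); the exact names and signatures have varied across MAME versions, so treat this as illustrative rather than definitive:

    /* callback fires only after every CPU has caught up to this moment */
    static void signal_callback(int param)
    {
        /* all CPUs now agree on the current time, so the event can
           be delivered safely and accurately */
        cpu_set_irq_line(1, 0, ASSERT_LINE);
    }

    /* called from CPU #0's memory map when it wants to signal CPU #1 */
    static WRITE8_HANDLER( signal_w )
    {
        /* instead of poking CPU #1 directly, schedule an "instant"
           timer; setting it also ends the current timeslice */
        timer_set(TIME_NOW, data, signal_callback);
    }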
The big problem with this approach is that it really only works well if the target CPU's local time is behind that of the CPU doing the signalling. The example we looked at last time had CPU #0 signalling an interrupt to CPU #1. Since the round-robin ordering causes CPU #0 to execute first, we were (pretty much) guaranteed that when CPU #0 wanted to issue its signal, its local time would be greater than the local time of CPU #1.
But what happens when CPU #1 wants to send a signal back to CPU #0?
Taking the naive approach, and returning to our previous example, let's say that we have CPU #0 running at 14MHz, and CPU #1 running at 2MHz. There is a scheduled timer that is set to fire in 150usecs, or at time 0.000150. So we begin executing CPU #0 for 2100 cycles, and it completes its timeslice normally, returning 2112 cycles, meaning its local time ends up being 0.000150857.
Now we execute CPU #1 for its timeslice, which turns out to be 300 cycles as in the example from Part 1. However, this time, at 50 cycles into execution, CPU #1 decides that it needs to send a signal to CPU #0. So, rather than sending the signal directly, it sets an instant timer to go off immediately, at time = 50 / 2,000,000 = 0.000025. This also has the side effect of aborting the execution of CPU #1 and ending the current round robin.
So, at this point, the scheduler contacts the timer system and indicates that the global time should be updated to the minimum of all the CPUs' local times, which would be 0.000025. The timer system sees that there is a timer scheduled to go off at 0.000025, and fires it; in the callback, we send our signal to CPU #0.
But wait, isn't CPU #0's local time already way off in the future at 0.000150857? Yep. This means that the signal arrives much later than it should have (0.000150857 - 0.000025 = 0.000125857 seconds, or 1762 CPU cycles too late), and we have lost our synchronization.
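If you want to double-check the arithmetic, here is a standalone sketch (plain C, not MAME code) that reproduces the numbers above:

    #include <stdio.h>

    int main(void)
    {
        const double cpu0_clock = 14000000.0;   /* CPU #0: 14MHz */
        const double cpu1_clock =  2000000.0;   /* CPU #1:  2MHz */

        /* CPU #0 was asked for 2100 cycles but actually ran 2112 */
        double cpu0_local = 2112.0 / cpu0_clock;    /* 0.000150857 */

        /* CPU #1 fired its instant timer 50 cycles in */
        double signal_time = 50.0 / cpu1_clock;     /* 0.000025 */

        double latency = cpu0_local - signal_time;  /* 0.000125857 sec */
        printf("signal is %.9f sec = %.0f CPU #0 cycles late\n",
               latency, latency * cpu0_clock);
        return 0;
    }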
How do we solve this problem? Well, we could have swapped the execution order, making the scheduler execute CPU #1 first. But in order to do that, we would have needed to predict the future once again. If the communication details between two CPUs are well-understood and follow strict rules, then it might be possible to make this sort of fine-grained scheduler tweaking work. Up to now, however, there has not been a good case made for running out-of-order like this. And so the round-robin order remains fixed.
Traditionally in MAME — in fact, even before I ever wrote the timer system and the scheduler — the way this sort of communication issue was resolved was to increase the interleave factor between CPUs. The interleave factor was a number that indicated how many times per video frame the CPUs in MAME were forced to re-synchronize their execution times. (This was specified in terms of video frames because, before the timers existed, all timing in MAME was done relative to video frames.)
The interleave factor is specified globally in MAME's machine driver structure. It is implemented by computing how many synchronizations per second the factor implies (for example, an interleave of 100 with a game that runs at 60Hz would imply 6000 synchronizations per second), and simply setting a timer with a NULL callback to go off at that rate. No callback is needed because no action needs to be performed; rather, the mere existence of a timer firing at that rate effectively brings all CPUs into sync at that frequency.
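In implementation terms, that amounts to something like the following sketch. Again this uses the older double-based timer API, where timer_pulse creates a repeating timer and TIME_IN_HZ converts a frequency into a period; the surrounding function is hypothetical:

    /* sketch: turning the interleave factor into a NULL-callback timer */
    static void setup_interleave_timer(double interleave, double frame_rate)
    {
        /* e.g. interleave of 100 at 60Hz = 6000 syncs per second */
        double syncs_per_second = interleave * frame_rate;

        /* no callback needed: the timer firing is itself the sync point */
        timer_pulse(TIME_IN_HZ(syncs_per_second), 0, NULL);
    }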
Back to our example, let's say our game runs at a 60Hz frame rate, and we bump the interleave factor to 500. That gives us 500 * 60 = 30,000 synchronizations per second, or one every 0.000033333 seconds. This means that there will be a timer set to fire every 0.000033333 seconds like clockwork. So let's re-evaluate what happens and why the interleave improves things.
Remember that the timer system figures out when the first timer is set to fire. Previously, our first timer was going to go off at 0.000150, but now we have this interleave timer which is going to go off even earlier at 0.000033333, so that determines what our first timeslice will be. Taking 0.000033333 * 14,000,000 gives us 466.67, which rounds up to 467 cycles, so we execute CPU #0 for 467 cycles. Let's say it comes back having executed 470 cycles. That puts our local time at 0.000033571.
Now we execute CPU #1 for its timeslice, which turns out to be 67 cycles with the new timer in place. Again, at 50 cycles into execution, CPU #1 decides that it needs to send a signal to CPU #0. So we set an instant timer to go off immediately, at time = 0.000025. This ends the round robin, and the timer system is notified just as before.
This time, however, CPU #0 receives the signal at 0.000033571, which is only 0.000008571 seconds or 120 CPU cycles too late. This is a big improvement over 1762 cycles, but it's still not perfect. By increasing the interleave we could make it even better if we wanted. In fact, the interleave factor effectively determines the worst-case latency for a signal from one CPU to another.
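Here is the same kind of standalone arithmetic check for the interleaved case:

    #include <stdio.h>

    int main(void)
    {
        const double cpu0_clock = 14000000.0;

        /* with the interleave timer, CPU #0 only ran 470 cycles */
        double cpu0_local  = 470.0 / cpu0_clock;    /* 0.000033571 */
        double signal_time = 50.0 / 2000000.0;      /* 0.000025    */

        double latency = cpu0_local - signal_time;  /* 0.000008571 sec */
        printf("signal is %.9f sec = %.0f cycles late (was 1762)\n",
               latency, latency * cpu0_clock);
        return 0;
    }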
Could we make it perfect? Well, actually, we could. If we set up a timer to run at a frequency of 2,000,000 times per second (the clock speed of the 2nd fastest CPU), then we would get as close as possible to perfect interleave. CPU #0 would never execute longer than a single clock on CPU #1, so when CPU #1 sent a signal, it would hit CPU #0 at the same time as the end of that particular clock cycle on CPU #1.
To set the interleave that high would require specifying an interleave of 33333 in the game's machine driver. Try it sometime. Things get very, very slow. This is because a context switch between two CPUs is not free, and when you set up a timer to fire that frequently, you spend all your time context switching and very little time actually executing any code on the CPUs.
The ideal solution to this is to detect when it is likely that CPU #1 needs to signal CPU #0, and temporarily boost the interleave so that, at least for a while, synchronization is guaranteed. This is the purpose of the cpu_boost_interleave call. It takes two parameters. The first parameter is how frequently the timer should fire — note that it is not specified in terms of video frames, but rather as an absolute time. You can also pass 0 here, which causes the system to automatically pick the clock rate of the 2nd fastest CPU, which will give you ideal synchronization. The second parameter specifies how long, in seconds, you want to maintain this level of interleave. Generally, you don't want it on too long.
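Put together, a typical use in a driver might look something like this. The write handler and the surrounding names are hypothetical, but passing 0 as the first parameter is the common idiom: boost around the moment you hand the other CPU some work and expect an answer back soon:

    /* hypothetical command-write handler on the master CPU */
    static WRITE8_HANDLER( slave_command_w )
    {
        soundlatch_w(0, data);                /* post the command */
        cpu_set_irq_line(1, 0, HOLD_LINE);    /* interrupt the slave */

        /* 0 = automatically use the 2nd fastest CPU's clock rate;
           keep the boosted interleave up for 50 microseconds while
           the slave computes and writes back its response */
        cpu_boost_interleave(0, TIME_IN_USEC(50));
    }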
Most commonly, games in MAME are set up so that "master" CPUs are run first, and communication tends to go from earlier CPUs to later ones in the round-robin order. Interleave boosting is used in these cases when the "slave" CPUs need to send some information back, and the master is sitting there waiting for a response.
Keep in mind that none of these systems are perfect, yet they have been successfully used for many thousands of different platforms. In Part 4, I'll wrap this series up with a bit about spinning, yielding, and triggers.