<< Newer Article #140 Older >>

CPU Scheduling in MAME (part 4)

Part 3 of this series discussed the problems involved in scheduling communication from a "later" CPU (one scheduled later in the round-robin order) to an "earlier" CPU. Since the MAME scheduler does not support changing the order of CPU execution within the round-robin, the only options to improve latency are to increase the interleave factor, either globally or temporarily.

There are two other means of altering the scheduling of CPUs during execution. These are the cpu_yield() and cpu_spin() calls. Both of these methods have been abused often in the past due to a lack of understanding about how they actually work, so now is the time to set the record straight.

What cpu_yield() does is end the current CPU's timeslice early. It does not affect the execution of any other CPUs in the system.

Let's look at an example.

Again we'll say that CPU #0 is running at 14MHz, and CPU #1 is running at 2MHz. There is a scheduled timer that is set to fire at time 0.000150. So we begin executing CPU #0 for 2100 cycles, but this time, partway through its timeslice (say, 1250 cycles in), one of its read/write handlers calls cpu_yield(). This aborts the current timeslice, leaving CPU #0's local time set to 0.000089286.

Now CPU #1 gets its turn to execute. Normally it would have executed for its entire timeslice up to time 0.000150; however, since the previous CPU stopped early, the scheduler only schedules up to the time when the previous CPU stopped execution. This equates to 0.000089286 * 2,000,000 = 179 cycles. Let's say it executes for 180 cycles, giving it a local time of 0.00009.

At this point, we tell the timer system that the global time is 0.000089286, but there are no timers ready to fire until 0.000150, so nothing happens, and the round robin begins again.

So far so good, but there is an additional side-effect that is not immediately obvious: after a cpu_yield(), the CPU is descheduled until the next time the interleave timer fires. (Recall from the last part of this article that the interleave value causes a timer to be scheduled at the requested interleave rate.) This means that in future scheduling rounds, CPU #0 will not participate, until the interleave timer fires and allows it once again.

In fact, cpu_yield() belongs to a class of synchronization calls: cpu_yielduntil_trigger(), cpu_yielduntil_int(), cpu_yielduntil_time(). All of these perform the same basic operation as cpu_yield() — that is, they stop execution on the current CPU and remove it from scheduling — but each specifies a different event that will enable the CPU to be scheduled again. This descheduling of the CPU has some interesting consequences.

Let's look at the previous example again, with this additional knowledge in hand. To make things a little simpler to explain, we'll change the situation so that instead of calling cpu_yield(), we will call cpu_yielduntil_time(0.00005). This is essentially telling the scheduler to not only give up our timeslice, but remove us from the scheduling equation altogether for the next 50 microseconds. So:

CPU #0 executes as before, ending its timeslice early at time 0.000089286 by calling cpu_yielduntil_time(0.00005). This aborts its timeslice and also internally sets a timer to go off at the current local time (0.000089286) plus 0.00005 seconds, or at time 0.000139286.

Then CPU #1 executes up to the time when the previous CPU stopped execution, which is time 0.000089286. This is again 179 cycles, so we run the CPU, and it comes back claiming 180 cycles, making its local time now 0.00009.

The timer system is called, but nothing is ready to fire, so the round-robin starts over. This time, when we ask the timer system when the next timer is set to go off, it reports 0.000139286, due to the timer that was set in response to cpu_yielduntil_time().

Now the round-robin begins again, except that CPU #0 is completely removed from the scheduling, so we skip right to CPU #1. Computing cycles: ((0.000139286 - 0.00009) * 2,000,000) = 99 cycles. We run CPU #1 with that request, and get back 101 cycles, putting its local time at 0.0001405, and the round-robin ends.

At this point, the timer set in cpu_yielduntil_time() fires, and it enables CPU #0 to be scheduled for the next round. However, notice that the two CPUs are now signficantly out of sync with respect to each other. CPU #0 is still stuck back at local time 0.000089286, while CPU #1 is already at 0.0001405. There is still a timer set to go off at 0.000150, so that will be used as the target for the next round-robin.

For CPU #0, that next pass at execution will run 0.000150 - 0.000089286 = 0.000060714 seconds, or 850 cycles, which is much longer than normal because it needs to execute more cycles to catch up to the target time. CPU #1, on the other hand, is almost at the target time, and only needs to execute 19 cycles to reach the target.

So the big side-effect of using cpu_yielduntil_time() in this manner is that you cause the yielding CPU to get behind in the scheduling process. Repeated use of the yield calls can put the CPU farther and farther behind, which can result in some seriously wacky behavior. For example, since CPU #0 is starting out at a time significantly behind CPU #1, if it sets a timer, that timer could already be in the past relative to CPU #1.

Analagous to the cpu_yield() calls are the cpu_spin() calls. These calls operate exactly the same as their partners, except that the local time for a spinning CPU is automatically bumped to the current global time at the end of each timeslice. This means that the CPU doesn't get behind; rather, it effectively "burns" all of the remaining cycles in a way that the spinning CPU will never get them back.

The thing that is confusing to people is this: Yielding is a form of synchronization. Spinning is a hack. Even though they look similar, they are used for two very different purposes.

There is really only one legitimate reason for spinning, and that is for adding spin loop optimizations to games that are doing no useful work while waiting for some event. If the event happens at some known time in the future, use cpu_spinuntil_time(). If that event is an interrupt, use cpu_spinuntil_int(). If that event is some other externally-driver factor, use cpu_spinuntil_trigger(), and then call cpu_trigger() when the event occurs.

If you find yourself using a cpu_spin() call for synchronization, you are masking some other problem in the emulation system.

Finally, a word about triggers. Triggers are simply a signalling mechanism that is used to indicate that a particular event has occurred. A trigger is identified by an integer, which is essentially a random number that is chosen by the person creating/using the trigger. There is no collision detection or means of allocating them (though there probably should be). To signal a trigger, you simply call cpu_trigger() with the trigger identifier. Triggers are mostly used in conjunction with the yield and spin calls to block execution of a CPU until the trigger is signalled. It's really no more complicated than that.

And that concludes my discussion of CPU scheduling in MAME. If there are significant questions, I may follow up a little later with a part 5 to answer some of the remaining questions. Thanks for hanging in there — I realize a lot of this can be difficult to follow!

(Edited to fix incorrect information about cpu_yield()).