<< Newer Article #206 Older >>

FD1094 Analysis, part 3

Unfortunately, just knowing the pseudo-random number generation formula is only good enough to identify the low 6 bits in 8188 out of the 8192 bytes of the key. The top two bits aren't generated by the formula, and the first four bytes are completely independent.

The first four bytes of the FD1094 key are what's known as the "global" key. The first byte indicates which internal mode the FD1094 switches into whenever an interrupt is taken (the FD1094 has 256 different modes that it can be in, and programs can switch between them at will while running). The three bytes following affect how the bit permutations work across the entire key, which is why we generally refer to it as the "global" key. In order to decrypt a given word of the program code, we take the 8-bit value from the key at the corresponding address, plus the 24 bits from the global key, plus the 8-bit current mode, and use those 40 bits to determine how to shuffle the bits to produce the final result.

Fortunately, there are some patterns in the global key that help reduce the keyspace by a few bits (each bit that is "locked" reduces the search space by a factor of 2, so each bit counts!), so in the end we only have to search all 8 bits of the first byte, and 16 bits over the next three bytes, so 24 bits total. Add that to the 22 bits necessary to fully describe a seed value for the random number generator, and we're up to 46 bits worth of "root" key data necessary to generate all of the full 8192-byte key, minus the top two bits.

So, what's in those top two bits of the rest of the key? Well, as I mentioned in part 1, bit $80 is set consistently at addresses from $0004-$0FFF, and bit $40 is set consistently at addresses from $1004-$1FFF. So actually, for most of the bytes, one of the two remaining bits is a constant value.

It's that final bit that is the tricky part. One of the features of the FD1094 is that, after all the bit permutation has completed, certain decrypted values are thrown away and replaced by the value $FFFF. This was likely done either to prevent the use of certain opcodes, or (more likely) so that $FF fills--which are a common encryption weakness--could be encrypted in a random fashion. Regardless of the reasons, the final bit has a significant effect on this $FFFF replacement step. If the bit is 0, a certain subset of decrypted values are replaced; if the bit is 1, a more expansive set of decrypted values are replaced (including all of those that get replaced if the bit is 0).

To simplify the following explanation, I'm going to break the decrypted values into 3 categories. Category A opcodes are never converted to $FFFF. Category B opcodes are always converted to $FFFF, regardless of the state of the mystery bit. And Category C opcodes are only coverted to $FFFF if the mystery bit is set to 1.

Thus, at encryption time, the programmers had to be careful about what opcodes were present in the original code stream. Obviously opcodes in Category A were fine. However, opcodes in Category B are always converted to $FFFF, so they had to completely avoided; presumably there was some mechanism to avoid that problem. And for the opcodes in Category C, it had to be ensured that the mystery bit was set to 0 in the key.

Now, there are several potential theories about how the mystery bit was generated for the key. One theory is that the bit comes from the random number generator, but is then modified to change the bit to 0 wherever there is an opcode from Category C in the path. Statistical analysis of all the existing keys disproves this theory: if it were true, then the mystery bit should be set to 0 consistently less than 50% of the time. In reality the mystery bit varies quite a bit; on some games it is 0 in 30% of the bytes, and on others it goes as high as 70%.

A second theory is that by default the mystery bit is set to 1 everywhere except where there is an opcode from Category C. This would mean that the mystery bit is strictly determined by the original code. This theory seems more likely. A quick analysis of the simplest game (Sonic Boom) reveals that the mystery bit is only 0 if there is at least one Category C opcode present, and is 1 in all other cases. This is not definitive, but it is a good indication.

So, how can we figure out what that last bit is? That's the trick, and the reason that I've been pounding away at creating some new debugger extensions. It involves a lot of tedious work, and I don't think we'll get 100% perfect results, but I'm hopeful we can get something playable at least. It essentially involves analyzing all possible decrypted combinations that you get when you decrypt the code with the mystery bit set and clear, discarding any invalid opcodes, and then manually selecting the right answer based on analysis of the code. Through this mechanism I've managed to get the title screen of Excite League to come up, and have identified over 30% of the mystery bits in Bullet.

Keep your eyes peeled here for news of additional progress....