Writing an assembler

Over the weekend I decided to write an assembler. The official impetus was that there were no known good assemblers for the CPU architecture I'm looking at. The unofficial impetus was that it sounded like fun, especially since I already had a nice expression evaluation engine in the new MAME debugger, and that was half the battle.

Of course, I hadn't written an assembler in 20 years, which is a good story in itself. 20 years ago I was just a pimply teenager learning how to write assembly code on my IBM PCjr. At the time, the only real assembler available was Microsoft's MASM. I bought it, but it turned out to need so much of my computer's meager 128K of memory that it could not assemble anything bigger than a few hundred lines of code.

So I went searching on the local BBSes and came across a cool assembler called (IIRC) a86. It was written by some guy at Intel, I think. But it was shareware, and expensive shareware at that. Furthermore, the author claimed to have used alternate encodings of instructions in such a way that he could detect whether your binary had been assembled with his assembler. Well, that sucked.

On the plus side, this assembler had a lot of nice features that MASM was lacking at the time, including support for local labels and no need for a linking step, as well as a much smaller memory footprint. But what to do about this whole shareware business?

Well, as was my typical response to such situations when I was a teenager and had a lot of time on my hands, I decided I could write my own assembler ... in assembly language! It would be fast and would support everything I wanted. In fact, I would make it assemble the exact same syntax as a86, since I liked a86 so much.

So I spent probably a month or two writing an assembler in assembly language. I called it qasm (for 'Q'uick Assembler; after all, it was hand-coded in assembly language, so it had to be fast!). I figured out how to do basic expression evaluation on my own (I didn't do it all that well at the time), and eventually created something that could assemble most of the code I had written while tinkering with a86. Then it was time for the real test.

Using a86, I assembled qasm. Then I took qasm.com and used it to assemble itself. Finally, I took the qasm.com that was generated from that step and assembled qasm again, just to be sure that the one that was self-assembled still worked.

I didn't know it at the time, but this is the process of bootstrapping a compiler. If you've ever compiled gcc on your own, you've done it as well. Use an existing tool to compile the compiler, then use the newly generated compiler to compile itself. Finally, use the self-generated compiler to compile itself again as a sanity check.
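The story above doesn't say how the final self-assembled binary was checked beyond confirming that it still worked. One common way to tighten that sanity check (GCC's bootstrap compares its stage 2 and stage 3 object files in a similar spirit) is to verify that the last two stages produce byte-identical output. Here is a minimal sketch in C; the function name `files_identical` and any file names passed to it are hypothetical, not part of qasm or a86.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if both files have identical contents,
   0 if they differ, and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        if (a != NULL) fclose(a);
        if (b != NULL) fclose(b);
        return -1;
    }

    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {          /* differing byte, or one file ended early */
            result = 0;
            break;
        }
        if (ca == EOF)           /* both files ended at the same point */
            break;
    }

    fclose(a);
    fclose(b);
    return result;
}
```

If the binary built by the existing tool and the binary built by the self-assembled tool are the same source run through correct assemblers, the second and third generation outputs should match exactly, so a result of 0 here would point at a code generation bug.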

Anyway, back to the present. Let me just say that assemblers are much easier to write in C. It's also much easier to write architecturally sound code in C. I'm actually quite proud of how it's turning out. There is a core frontend that can be used to write pretty much any assembler, and there is a target-specific portion that interprets the specific opcodes and generates the machine code. I'll probably release the source once it's available. It would be pretty straightforward to make it into a universal assembler at some point down the road.

But for the moment, I think I'll just stick to getting it working as a v60 assembler. :-)
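For readers wondering what a frontend/target split like the one described above might look like, here is a rough sketch in C. It is not the actual code: the names `asm_target`, `assemble_line`, and `toy_target` are hypothetical, and the "toy" opcodes are invented for illustration and have nothing to do with the v60. The frontend only knows how to split a line into a mnemonic and a numeric operand; everything about encoding lives behind a function pointer.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Target-specific half: turns one parsed instruction into machine code. */
typedef struct {
    const char *name;
    /* Writes the encoding to out; returns bytes written, or -1 if the
       mnemonic is unknown to this target. */
    int (*encode)(const char *mnemonic, long operand, uint8_t *out);
} asm_target;

/* Invented toy encoding (assumption for illustration, not a real CPU). */
static int toy_encode(const char *mnemonic, long operand, uint8_t *out)
{
    if (strcmp(mnemonic, "nop") == 0) {
        out[0] = 0x00;
        return 1;
    }
    if (strcmp(mnemonic, "ldi") == 0) {
        out[0] = 0x01;
        out[1] = (uint8_t)operand;   /* low 8 bits of the operand */
        return 2;
    }
    return -1;
}

const asm_target toy_target = { "toy", toy_encode };

/* Core frontend: splits a line into a mnemonic and an optional numeric
   operand (decimal, octal, or 0x-prefixed hex), then hands the result to
   the target. Returns bytes emitted, or -1 on error. */
int assemble_line(const asm_target *target, const char *line, uint8_t *out)
{
    char mnemonic[16];
    long operand = 0;
    int fields = sscanf(line, " %15s %li", mnemonic, &operand);

    if (fields < 1)
        return -1;               /* blank or unparseable line */
    return target->encode(mnemonic, operand, out);
}
```

Calling `assemble_line(&toy_target, "ldi 0x2a", buf)` would write the two bytes 0x01 0x2a into `buf`. Supporting another CPU in this sketch means supplying another `asm_target`, which is the sense in which such a design could grow into a universal assembler. A real frontend would also need labels, expressions, and multiple passes, all of which are left out here.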