Hello. It's imamelia here, ready to teach you as much as I know about ASM, with lessons, examples, and a few bad jokes. Now, of course, I'm not an expert by any means; I'm still learning, too. But they say that the best way to learn is to teach others. I had fun writing this tutorial, anyway, and I certainly hope that it can help people in their ASM endeavors.
What, right now, do you think you know about ASM? Lots? Some? None at all? Maybe you've looked through the custom blocks, sprites, or patches sections at SMW Central and thought, “Wow, cool. I want to make some of those, too!” Perhaps you have opened the .asm file of a custom sprite intending to remap its graphics, and you couldn't help but be curious about the rest of the code. What is “JSR GET_DRAW_INFO”? What does “PLB” do? Why is it that some numbers have both a # sign and a $ sign in front of them, but others have only the $ sign? What is the average flight speed of a swallow? Okay, maybe you weren't thinking too much about that last question. Well, in any case, don't worry. This tutorial may not answer all of your questions, but it should answer a lot of them. You'll learn why GET_DRAW_INFO is so important (Lab 3), what PLB does and how to use it (Lesson 3-2), what the difference between #$-- and $-- is and when to use which (Lesson 2-1), and...while ASM may not be the best tool for calculating the average flight speed of a swallow, you can use it to calculate the average flight speed of a rideable bomb-dropping Albatoss.
Well, enough blabbering. I'd say it's about time for the actual lessons, don't you think? Let's go!
Part 1: An Introduction to the Assembly System
Lesson 1-1: Hexadecimal and Binary
Lesson 1-2: A Bit of Vocabulary
Lesson 1-1: Hexadecimal and Binary
First of all, if you've looked at any ASM, you may have noticed something. Some numbers, like 14, 28, and 80, look normal, but some, like 7F, A8, and EC, have letters (usually capital) in them. Since when do letters count as numbers? What kind of weirdos run this place? Sure, a number like 7F wouldn't exactly be a “number” per se in our normal base 10 system...but we're not counting in base 10. We're counting in base 16, or hexadecimal. While base 10, the decimal system, uses 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), hexadecimal uses 16 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). That means that, almost invariably, when you see the number 10 in ASM code, it doesn't mean 10, ten, the number of fingers a normal human has. It means 10 as in 16, or 4 times 4. If you wanted to express the number of fingers you have in hexadecimal, then you'd write it as A or 0A (unless, of course, you have a birth defect or are from another planet where people don't have exactly 10 fingers). See that? In hexadecimal, after 9 comes A, which is the same as 10 in decimal. After A comes B, then C, D, E, and F, which correspond to 11, 12, 13, 14, and 15 in decimal. After F, as you might have guessed, comes 10, which is 16 in decimal. Base 10 uses ones, tens, hundreds, thousands, ten thousands, hundred thousands, etc., but base 16 uses ones, sixteens, two hundred and fifty-sixes, four thousand and ninety-sixes, etc., multiplying the value of each column by 16 to get the value of the next one.
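If you'd like to see those column values do their job, here's a small sketch in Python. Python is just being used as a calculator here; it isn't something SMW uses. The helper name hex_to_decimal is made up for this example. It converts a hex number to decimal by multiplying each digit by its column value (1, 16, 256, 4096, ...), and then double-checks the answer against Python's built-in int(text, 16).

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(text: str) -> int:
    """Convert a hex string such as "7F" to an ordinary decimal integer."""
    total = 0
    # Walk the digits from right to left: the ones column, then sixteens, etc.
    for position, ch in enumerate(reversed(text.upper())):
        digit = HEX_DIGITS.index(ch)       # 0-15; raises ValueError for non-hex
        total += digit * 16 ** position    # column values: 1, 16, 256, 4096, ...
    return total

for h in ["0A", "10", "7F", "A8", "EC", "100"]:
    value = hex_to_decimal(h)
    assert value == int(h, 16)  # Python's built-in conversion agrees
    print(f"${h} = {value}")
```

This prints $0A = 10, $10 = 16, $7F = 127, $A8 = 168, $EC = 236, and $100 = 256. Take 7F as a check: 7 sixteens plus 15 ones is 112 + 15 = 127.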
Now, the question is, why use hexadecimal at all? Who the heck wants to be bothered with remembering numbers like 256 and 65,536 when numbers like 100 and 10,000 are so much easier? Well, you have to remember, humans really only like base 10 because we have 10 fingers. Computers, obviously, have no fingers at all (but if you see one that does, run), and at the most basic level, they can understand only two things: off and on. Up and down. Left and right. Indented or flat. Present or absent. They process data with zillions of tiny “pieces” that can only have two possible states, and if we translate that into numbers, the result is that the numbers we end up with have only two different digits: 0 and 1. I'll touch on the hexadecimal question later, but this brings me to the second part of this lesson...binary.
Binary is almost like hexadecimal on a smaller scale. Binary is base 2, so it uses only two digits: 0 and 1. Sound familiar? In binary, instead of ones, tens, hundreds, and so on, we have ones, twos, fours, eights, etc., multiplying the value of each column by 2 to get the value of the next one. So if you see a “10” that you know is in binary, its equivalent base-ten value would be only 2. 10 in binary = 2 in decimal. 11 in binary = 3 in decimal. 100 in binary = 4 in decimal, 101 = 5, 110 = 6, and so on and so forth. If you want to talk about having “ten” fingers, but you want to express it in binary, the number you would write is 1010. Count the value of the columns (zero ones, one two, zero fours, and one eight), and you'll see that they add up to 10 (decimal).
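Here's the same column-adding idea for binary, again as a Python sketch with made-up helper names. The first function adds up the columns (1, 2, 4, 8, ...) that hold a 1. The second goes the other way by dividing by 2 over and over, which is one common method for converting to binary, though not the only one.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the column values (1, 2, 4, 8, ...) of every digit that is 1."""
    total = 0
    for position, ch in enumerate(reversed(bits)):
        if ch == "1":
            total += 2 ** position
        elif ch != "0":
            raise ValueError(f"not a binary digit: {ch!r}")
    return total

def decimal_to_binary(n: int) -> str:
    """Divide by 2 over and over; the remainders are the digits, ones column first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder = the digit for the current column
        n //= 2
    return "".join(reversed(digits))  # flip so the biggest column comes first

for b in ["10", "11", "100", "101", "110", "1010"]:
    print(f"{b} in binary = {binary_to_decimal(b)} in decimal")
print(decimal_to_binary(10))  # 1010
```

The loop prints the same pairs as the paragraph above (10 = 2, 11 = 3, 100 = 4, 101 = 5, 110 = 6, 1010 = 10), and converting ten back gives 1010.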
Now, to tie this all together and conclude this lesson, let's go back to the hexadecimal question that I posed earlier. Why even use hexadecimal, as opposed to decimal? If we like base 10 so much, then why can't we just use that in our ASM? (Well, technically we can, but that's for a later lesson.) The answer, if you haven't guessed yet, lies in the two-value scenario. Computers think in binary, which consists of 1s and 0s and is based around the number 2...BUT the number 16 just so happens to be a power of 2; 2^4 = 16. That means every hexadecimal digit stands for exactly four binary digits. So hexadecimal really isn't that big of a stretch from binary, but it's a lot easier on us humans than long strings of 0s and 1s. That's why we use it for things like ASM. (Side note: “Hexadecimal” is often shortened to just “hex.” Unless you're a native speaker of Greek, which is where the “hexa-” part of the word comes from [hexa-: six; the “deci-” part, meaning ten, comes from Latin], you may be tempted to make some Harry Potter reference right about now.)
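Because 2^4 = 16, you can convert between hex and binary one digit at a time, with no big arithmetic. This Python sketch uses made-up helper names. It expands each hex digit into a group of four bits, and it packs groups of four bits back into hex digits.

```python
def hex_to_binary(text: str) -> str:
    """Expand each hex digit into exactly 4 binary digits, since 2**4 == 16."""
    return " ".join(format(int(ch, 16), "04b") for ch in text)

def binary_to_hex(bits: str) -> str:
    """Go the other way: every group of 4 bits becomes one hex digit."""
    bits = bits.replace(" ", "")
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # pad on the left to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(g, 2), "X") for g in groups)

print(hex_to_binary("A"))         # 1010
print(hex_to_binary("7F"))        # 0111 1111
print(binary_to_hex("11101100"))  # EC
```

Notice that A comes out as 1010, the same binary "ten" from the previous lesson, and that a two-digit hex number always becomes exactly eight binary digits.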
Lesson 1-2: A Bit of Vocabulary
Now, if you really want to learn ASM, there might be a few terms you'll want to know. First of all, a bank is a block of 65,536 bytes of data (10000 in hexadecimal). BUT...not all data is used for the same purposes. Plain and simple:
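-$0000-$1FFF are RAM. In banks 00-3F and 80-BF (which include all the banks SMW's own code lives in), these addresses all point to the same RAM, which is also reachable as $7E0000-$7E1FFF. They can and do change value during the game.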
-$8000-$FFFF, except in banks 70-7F, are ROM, and their values are different in every bank. ROM data never changes; once it's there, it's there to stay.
-$2000-$FFFF in bank 7E and $0000-$FFFF in bank 7F are not ROM, but RAM. They work like any other RAM, but the bank has to be specified. You can't use just $8200; it has to be $7E8200 (or $7F8200).
-$0000-$FFFF in banks 70-7? are SRAM; this is used for things in the game that save, such as the number of exits found.
A big thank-you goes to Alcaro for compiling most of the bank data in the first place.
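A full address like $7E8200 is just a bank number (the top two hex digits) followed by a position inside that bank (the bottom four hex digits). Since a bank is $10000 bytes, you can pull the two parts apart with plain division. The Python sketch below does only that arithmetic. It doesn't try to decide whether an address is RAM, ROM, or SRAM, and the helper names are made up for this example.

```python
BANK_SIZE = 0x10000  # 65,536 bytes per bank ($10000 in hex)

def split_address(addr: int) -> tuple[int, int]:
    """Split a 24-bit address like $7E8200 into (bank, offset within the bank)."""
    bank = addr // BANK_SIZE   # same as addr >> 16: the top two hex digits
    offset = addr % BANK_SIZE  # same as addr & 0xFFFF: the bottom four hex digits
    return bank, offset

def join_address(bank: int, offset: int) -> int:
    """Put a bank and an offset back together into one 24-bit address."""
    return bank * BANK_SIZE + offset

bank, offset = split_address(0x7E8200)
print(f"bank ${bank:02X}, offset ${offset:04X}")  # bank $7E, offset $8200
```

This is also why “$8200” by itself isn't enough for the extra RAM in banks 7E and 7F. The offset is the same in every bank, so only the bank number tells the game which $8200 you mean.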
A bit is a single binary digit. In fact, the very word “bit” is short for binary digit. This digit, obviously, can be either 0 or 1; if the bit is clear or reset, then it is equal to 0. If it is set, then it is equal to 1.
A byte is composed of 8 bits, which is why it can be written as a two-digit hexadecimal number with any value from 00 to FF. The bits are numbered in the following order:
76543210
The bit farthest to the right, therefore, is bit 0, the bit farthest to the left is bit 7, and all the ones in between are...well, all the numbers in between. Now each bit has a certain hexadecimal value. If our byte has the value 00000001, then bit 0 is set and all other bits are clear. This is equal to 01 in hexadecimal. If our byte has the value 00000010, then bit 1 is set and all other bits are clear. This is equal to 02 in hexadecimal. You can continue the pattern by simply setting the next bit and clearing all others, so that 00000100 = 04, 00001000 = 08, 00010000 = 10, 00100000 = 20, 01000000 = 40, and 10000000 = 80. One way you may see bits expressed is as letters, with a lowercase letter indicating a clear bit and a capital letter indicating a set one. (E.g. if “bbbbbbbb” are our eight bits, then BbbbbbbB would indicate that bits 0 and 7 are both set and all others are clear.) To recap:
- A byte is a 2-digit hexadecimal number (any value from 00 to FF).
- A byte is composed of 8 bits.
- The bits are numbered from left to right as 76543210.
- They are often called by their numbers, e.g. “bit 2”.
- They are sometimes represented by letters, with a lowercase letter indicating a clear bit (a 0) and a capital letter indicating a set bit (a 1).
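Here's a Python sketch of everything in that recap. All the function names are made up for illustration. Each bit's value on its own is 2 to the power of its number, which gives $01, $02, $04, ... $80. Setting, clearing, and checking a bit is just a matter of combining the byte with that value. The last function writes a byte in the letter notation described above.

```python
def bit_value(n: int) -> int:
    """Value of bit n on its own: bit 0 = $01, bit 1 = $02, ..., bit 7 = $80."""
    return 1 << n  # same as 2 ** n

def is_set(byte: int, n: int) -> bool:
    return (byte & bit_value(n)) != 0

def set_bit(byte: int, n: int) -> int:
    return byte | bit_value(n)

def clear_bit(byte: int, n: int) -> int:
    return byte & ~bit_value(n) & 0xFF  # keep the result within one byte

def letters(byte: int, letter: str = "b") -> str:
    """Letter notation: capital = set (1), lowercase = clear (0), bit 7 first."""
    return "".join(
        letter.upper() if is_set(byte, n) else letter.lower()
        for n in range(7, -1, -1)
    )

for n in range(8):
    print(f"bit {n}: {bit_value(n):08b} = ${bit_value(n):02X}")

print(letters(0x81))  # BbbbbbbB (bits 0 and 7 set)
```

The loop reproduces the table from the lesson (00000001 = $01 up through 10000000 = $80), and $81 comes out as BbbbbbbB, the same example used above.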
An address, or RAM address, refers to a byte (or sometimes 2 bytes) at a specific point in the RAM. Examples: $13, $0680, $13BF, $7FAB10. A single byte of RAM can contain any hexadecimal value from 00 to FF at any given moment. RAM is essentially used to tell the game what to do. In Super Mario World, for example, doing a certain thing to one RAM address could activate the P-switch, but doing the same thing to a different RAM address could give the player invincibility.
An opcode is basically a command that tells the computer what to do. In 65c816 ASM (65c816 is the processor the SNES's CPU is based on, and it's the formal name of the ASM Super Mario World uses, but you don't need to know it), all opcodes consist of three letters, such as LDA, RTS, and EOR. There are a total of 99 opcodes in 65c816 ASM (I counted), including useless ones, although you'll mainly be using just a fraction of them. (If you're curious, that list at the beginning of this document has all of them in alphabetical order.) An opcode tells the processor whether to put a number into this place or that place, compare it to another number, decrease its value by 1, multiply it by 2, jump to another part of the code, copy something to something else, and more. Also, many opcodes are immediately followed by a number or sometimes other data, which, of course, is the number, address, etc. that the opcode will affect.
An addressing mode is...well, essentially, you can make a single opcode do slightly different things by using different addressing modes. For example, you can make some opcodes affect a RAM address (which you may or may not know the exact value of), or you can make them affect a specific number. These are two different addressing modes. So an addressing mode is basically how an opcode uses the data that follows it. (If it's a stand-alone opcode and doesn't have anything after it, then it has only one addressing mode or doesn't exactly use addressing modes per se.)
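Here's a toy model in Python of those two addressing modes for a load instruction like LDA. It's only an illustration under simplifying assumptions: a pretend 8 KB RAM, one-byte values, and made-up helper names load_immediate and load_direct. The real processor isn't a Python function, but the idea of “use this number” versus “use the value stored at this address” is the same.

```python
# Toy model only: a pretend RAM and two ways a load can use its operand.
ram = bytearray(0x2000)  # pretend RAM $0000-$1FFF; every byte starts at $00
ram[0x05] = 0x42         # suppose address $05 currently holds $42

def load_immediate(operand: int) -> int:
    """Like LDA #$05: the operand itself is the value to load."""
    return operand & 0xFF

def load_direct(operand: int) -> int:
    """Like LDA $05: the operand is an address; load whatever is stored there."""
    return ram[operand]

print(f"immediate: ${load_immediate(0x05):02X}")  # immediate: $05
print(f"direct:    ${load_direct(0x05):02X}")     # direct:    $42
```

Same opcode, same operand, two different results. That difference is what the addressing mode decides.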
There are also three main registers that you'll use in ASM. The accumulator, also known as A, is the most common. “A” is...sort of a temporary place to hold data. You can load a number into A, you can store the value in A to a RAM address, you can compare the value in A to another value, and more. There are two more registers you should know about: the X register and the Y register. These are very similar to A, but their opcodes are more limited and they are usually used for different purposes. The most common use of the X and Y registers is for indexing, which is covered in Lesson 2-6. Together, A, X, and Y are the three variables you use to get data into and out of the game. Think of each of them like...one of those vertical carts that you can push. You take a box off of a shelf, put it onto the cart, and then transfer it to another shelf. Similarly, in ASM, you can take a number or the value of a RAM address, put it into A, X, or Y, and then transfer it to another RAM address.
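To make the cart analogy a little more concrete, here's another toy Python model. The class ToyCPU and its methods are made up for illustration; the methods only borrow the names of the real LDA, STA, LDX, and STX opcodes. The model copies a value from one pretend RAM address to another by way of A. The analogy bends in one place: loading doesn't actually take the box off the shelf. LDA copies the value, so the original address keeps it.

```python
class ToyCPU:
    """A very simplified model: the A, X, and Y registers plus some pretend RAM."""

    def __init__(self) -> None:
        self.a = 0
        self.x = 0
        self.y = 0
        self.ram = bytearray(0x2000)  # pretend RAM $0000-$1FFF, all $00 at first

    def lda(self, addr: int) -> None:
        """Like LDA $addr: copy the byte at addr into A (the 'cart')."""
        self.a = self.ram[addr]

    def sta(self, addr: int) -> None:
        """Like STA $addr: copy A onto another 'shelf'. A keeps its value."""
        self.ram[addr] = self.a

    def ldx(self, addr: int) -> None:
        """Like LDX $addr: the same idea, using X instead of A."""
        self.x = self.ram[addr]

    def stx(self, addr: int) -> None:
        """Like STX $addr: store X's value to an address."""
        self.ram[addr] = self.x

cpu = ToyCPU()
cpu.ram[0x10] = 0x2A  # hypothetical: some value is sitting at $10
cpu.lda(0x10)         # put it on the cart...
cpu.sta(0x20)         # ...and drop a copy off at $20
print(f"$10 = ${cpu.ram[0x10]:02X}, $20 = ${cpu.ram[0x20]:02X}")  # $10 = $2A, $20 = $2A
```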
An operand is something, such as a RAM address or value, that is affected by an opcode. For example, in “LDA #$05”, the #$05 is the operand (the LDA is the opcode). It sounds more difficult to understand than it is.
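As a tiny illustration, here's a made-up Python helper, split_instruction, that splits a line of ASM text into its opcode and its operand. It only handles the simple “opcode, then an optional operand” shape. Real assemblers also have to deal with labels, comments, and much more.

```python
def split_instruction(line: str) -> tuple[str, str]:
    """Split a line such as "LDA #$05" into (opcode, operand).

    Stand-alone opcodes such as "RTS" have no operand, so "" is returned for it.
    """
    parts = line.split(maxsplit=1)
    if not parts:
        raise ValueError("empty line")
    opcode = parts[0].upper()
    operand = parts[1].strip() if len(parts) > 1 else ""
    return opcode, operand

print(split_instruction("LDA #$05"))  # ('LDA', '#$05')
print(split_instruction("RTS"))       # ('RTS', '')
```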
I think we're ready for some actual lessons, don't you? Now that you have a foundation...LET'S DO IT!!