[size=8]Note: This is a [b][color=#ff0000]really long[/color][/b] tutorial, make sure you are comfortable and read it carefully. Take pauses and remember to go steady.[/size]
This tutorial is intended to teach the basic aspects of Super FX assembly, from the basic registers to sorta complex codes. If you have a question and/or think something isn't clear/right, please post in this thread rather than PMing me.
[b]Index:[/b]
A)CPU Basics
B)Getting used to the registers
C)Basic operations
D)Example codes
[b]A)CPU Basics[/b]
What is the Super FX? It is a general-purpose co-processor with a RISC architecture, clocked at up to 21.4 MHz (it is said to tolerate overclocking up to a maximum of 60 MHz). The Super FX has 16 registers, R0 to R15, each 16 bits wide. The CPU is also pipelined: it fetches the next instruction while the current one is being executed. There are other registers that only the SNES can access; they are covered in this tutorial as well.
[b]B)Getting used to the registers[/b]
As noted, the Super FX has 16 registers, but not all of them are purely general-purpose. This table lists the registers and their special roles on the Super FX; registers in [i]italic[/i] cannot be accessed by the GSU itself:
Register Name | Super NES CPU Address | Special Functions
R0 | $3000:$3001 | Default Source/Destination Registers
R1 | $3002:$3003 | PLOT Instructions, X coordinate
R2 | $3004:$3005 | PLOT Instructions, Y coordinate
R3 | $3006:$3007 | -
R4 | $3008:$3009 | LMULT Instructions, lower 16 bits
R5 | $300A:$300B | -
R6 | $300C:$300D | FMULT and LMULT instructions, multiplication
R7 | $300E:$300F | MERGE instruction, source 1
R8 | $3010:$3011 | MERGE instruction, source 2
R9 | $3012:$3013 | -
R10 | $3014:$3015 | None but best used as Stack Pointer
R11 | $3016:$3017 | LINK instruction destination register
R12 | $3018:$3019 | LOOP instruction counter
R13 | $301A:$301B | LOOP instruction branch
R14 | $301C:$301D | ROM address pointer
R15 | $301E:$301F | Program counter
Status/Flag | $3030:$3031 | Indicates the status of the GSU.
Program Bank Register | $3034 | The program bank register specifies the memory bank register to be accessed.
ROM Bank Register | $3036 (Read-Only) | The ROM bank register specifies the ROM bank when loading data from ROM using the ROM buffering system.
RAM Bank Register | $303C (Read-Only) | The RAM bank register specifies the RAM bank when loading/writing data from RAM.
Cache Base Register | $303E:$303F (Read-Only) | The cache base register specifies the starting address when data are loaded from ROM or RAM to the cache RAM.
Screen Base Register | $3038 (Write-Only) | The screen base register is used to specify the start address in the character data storage area.
Screen Mode Register | $303A (Write-Only) | The screen mode register specifies the color gradient and screen height during PLOT processing and controls ROM and RAM bus assignments.
Colour Register | - | The colour register contains data which specifies the colours to be plotted when PLOT processing is performed.
Plot Option Register | - | The plot option register contains flags which specify the mode to be used when a COLOR, GETC, or PLOT instruction is executed.
Backup RAM Register | $3033 (Write-Only) | Makes sure data at Banks $78:$79 get protected or not for writing.
Version Code Register | $303B (Read-Only) | Checks for the version of the Super FX chip.
Config Register | $3037 (Write-Only) | The CONFIG register selects the operating speed of the multiplier in the GSU and sets up a mask for the interrupt signal.
Clock Select Register | $3039 (Write-Only) | This register assigns the Super FX operating frequency.
[b]C)Basic operations[/b]
Now that you are aware of the registers, it is time to learn the basic codes. This section covers what each operation does so we can apply them later in this tutorial.
Immediate operations:
[code]IBT R10,#$83
IWT R4,#$1234[/code]
Super FX is a 16-bit CPU, so almost every 8-bit operation sign-extends the byte to a word by taking bit 7 and copying it into bits 8 through 15. The above code shows that: doing [b]IBT R10,#$83[/b] makes [u]R10 = $FF83[/u]. Why? #$83 in binary is [b]1000 0011[/b]. Count the bits: the top bit (bit 7) is 1, and that bit is always copied into the upper byte. Word operations set the value as is.
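If you want to check the sign-extension rule on your PC, here is a small Python sketch of it. This is not Super FX code, just a model of what the text above describes; the function names [b]ibt[/b] and [b]iwt[/b] are made up for this illustration.

```python
def ibt(value):
    """Model of IBT: sign-extend an 8-bit immediate to 16 bits."""
    value &= 0xFF
    # If bit 7 is set, bits 8 through 15 are filled with ones.
    return (value | 0xFF00) if (value & 0x80) else value


def iwt(value):
    """Model of IWT: the 16-bit immediate is stored as is."""
    return value & 0xFFFF


print(f"${ibt(0x83):04X}")    # IBT R10,#$83   -> $FF83
print(f"${ibt(0x44):04X}")    # bit 7 is clear -> $0044
print(f"${iwt(0x1234):04X}")  # IWT R4,#$1234  -> $1234
```

Any byte from $80 to $FF ends up with $FF in the high byte, while $00 to $7F keeps $00 there.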
Loading banks:
[code]IBT R7,#$10
FROM R7
ROMB
IBT R0,#$01
RAMB[/code]
The above code sets the ROM bank to $10, so ROM operations are done with bank $10 in mind. It also sets the RAM bank: the value $01 in R0 selects bank $71, so RAM operations are done in bank $71.
Source and Destination:
[code]TO R10
GETB
FROM R1
COLOR
WITH R3
SUB R3[/code]
The above code sets R10 as the [u]destination[/u] of GETB (Get Byte from ROM), so the Super FX reads a byte from ROM and puts it in R10. Next, it sets R1 as the source and copies the value of R1 to the colour register. Finally, it sets R3 as both source and destination and subtracts R3 from it.
The above operation can be read like this: Data -> R10 ; R1 -> COLOR ; R3-R3=R3 (0)
Be aware that R0 is the default source and destination register: any operation other than a branch or the MOVE/MOVES/ALT operations resets the source and destination back to R0.
Store and Loading:
[code]GETB
STW (R4)
LDB (R1)
SM ($5678),R0
LM R3,($1234)[/code]
GETB loads a byte from ROM at address ROMB:R14. STW stores a 16-bit (word) value from the source register to the address held in the named register. In that code the register is R4, so if R4 = $3232 the data is stored at address RAMB:$3232. LDB does the reverse, loading a value (in this case a byte) from that address into the destination register. SM and LM do the same, except that they load/store 16-bit values only and you specify the address directly. [b]NOTE: If you load 16-bit values, take care of even/odd addresses. If the address is even, the high byte will be located at Address+1, however if the address is odd, the high byte will be located at Address-1.[/b]
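The even/odd note can be summed up as "the high byte lives at the address with bit 0 flipped". Here is a tiny Python sketch of that rule. It is only a model, the function name is invented for this post, and it assumes the low byte sits at the address you gave.

```python
def word_byte_addresses(address):
    """Return (low_byte_address, high_byte_address) for a 16-bit RAM access.

    Assumption: the low byte is at the given address and the high byte
    is at the same address with bit 0 flipped, as the note above describes.
    """
    address &= 0xFFFF
    return address, address ^ 1


low, high = word_byte_addresses(0x3232)
print(f"even: low=${low:04X} high=${high:04X}")  # even: low=$3232 high=$3233

low, high = word_byte_addresses(0x3233)
print(f"odd:  low=${low:04X} high=${high:04X}")  # odd:  low=$3233 high=$3232
```

So a word stored at an odd address does not spill into the next even address; keeping word data on even addresses avoids the surprise.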
Jumping and Comparing:
Suppose that R1 = $8000 and R3 = $2FFF
[code]FROM R1
CMP R3
BCS Label
NOP[/code]
[code]LINK #4
IWT R15,#Label
NOP
Return:
[...]
Label:
JMP R11[/code]
[code]FROM R1
LJMP R11
NOP[/code]
The first code is a simple compare, the same operation the SNES does: set the source, compare it with a register, then branch on the resulting flags. Here $8000 - $2FFF does not borrow, so the carry flag is set and BCS branches.
The other code is a simple "subroutine". LINK takes a value from 1 to 4 and stores the return address by doing R15 + (1 through 4) = R11. R15 is the Program Counter, which points to where the processor is executing code, so if you modify R15 the Super FX jumps to the new location. [b]NOTE: Because of the pipeline, the instruction right after a jump has already been fetched and still runs. Be careful with instructions of two or more bytes there, as the Super FX will only read the first byte of that instruction.[/b]
JMP works like the SNES version, except that the jump address comes from a register. LJMP does a long jump: the register named in the instruction supplies the program bank, and the source register supplies the address to jump to.
Bitshift, Addition and Subtraction operations:
[code]INC R0
DEC R1
ADC R2
ADC #3
ADD R4
ADD #5
SBC R6
SUB #7
SUB R8
ASR
LSR
ROR
ROL[/code]
The above operations are pretty much self-explanatory. INC increases a register by 1 and DEC decreases it by 1. ADC adds with carry (Source + Rn + Carry = Destination), while ADD adds without carry. SBC subtracts with carry and SUB subtracts without carry. ASR is an arithmetic shift right, LSR is a logical shift right, ROR rotates right through carry and ROL rotates left through carry.
The difference between the two shifts is that ASR keeps bit 15 (the sign bit) as it was, while LSR shifts normally and puts a 0 into bit 15.
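To see the ASR/LSR difference with real numbers, here is a short Python sketch. It is a model only, with made-up function names; each function returns the 16-bit result plus the bit that was shifted out (which goes to carry).

```python
def lsr(value):
    """Logical shift right: bit 15 becomes 0. Returns (result, carry)."""
    value &= 0xFFFF
    return value >> 1, value & 1


def asr(value):
    """Arithmetic shift right: bit 15 is kept. Returns (result, carry)."""
    value &= 0xFFFF
    return (value >> 1) | (value & 0x8000), value & 1


result, carry = lsr(0x8002)
print(f"LSR $8002 -> ${result:04X}, carry={carry}")  # LSR $8002 -> $4001, carry=0

result, carry = asr(0x8002)
print(f"ASR $8002 -> ${result:04X}, carry={carry}")  # ASR $8002 -> $C001, carry=0
```

In other words, ASR halves a signed value and keeps it negative, while LSR treats the value as unsigned.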
Bitwise operations:
[code]FROM R1
NOT
AND R1
OR R5
XOR R2
BIC #15[/code]
NOT is a simple operation: it inverts every bit. AND keeps a bit set only if it is set in both values; every other bit is cleared. OR works the opposite way: a bit is set if it is set in either value. XOR is similar to OR, but a bit is set only when the two bits are different; when they match, the result bit is 0. Last but not least, BIC performs a logical AND between the source register and the 1's complement of the operand, meaning the stated value is inverted [u]THEN[/u] the [b]AND[/b] operation is done.
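Here is a small Python sketch of the bitwise operations on 16-bit values, using the same numbers as the commented bitwise example in section D. The helper names are invented for this post and the code is only a model of what the text describes.

```python
MASK = 0xFFFF  # registers are 16 bits wide


def not16(a):
    """NOT: invert every bit of a 16-bit value."""
    return ~a & MASK


def bic(a, b):
    """BIC: AND the source with the inverted operand."""
    return a & ~b & MASK


r1 = 0xFF80
r2 = (r1 ^ 0x000F) & MASK   # XOR #15  -> $FF8F
r0 = r2 & r1                # AND R1   -> $FF80
print(f"${r2:04X} ${r0:04X}")

r0 = bic(r0, r2)            # BIC R2   -> $0000
print(f"${r0:04X}")

r0 = not16(r0)              # NOT      -> $FFFF
print(f"${r0:04X}")
```

BIC is handy for clearing specific bits: every bit that is 1 in the operand ends up as 0 in the result.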
Multiplication:
[code]TO R1
MULT R0
TO R2
UMULT R0
LMULT[/code]
Multiplication is simple on Super FX. MULT and UMULT do 8-bit multiplication only, while LMULT and FMULT do 16-bit calculations.
The difference between MULT and UMULT is that MULT does signed operations (it checks bit 7) while UMULT doesn't. They also differ from LMULT and FMULT in that you can choose the register to multiply by, whereas FMULT and LMULT always multiply by R6, and LMULT uses R4 as the destination for the low word of the 32-bit result.
For example: Source -> R5 = $52CF and R1 = $63CF
[code]FROM R5
MULT R1[/code]
The result would be R0 = $0961. Why? The operation is 8-bit but the result is 16-bit, and it takes the sign bit into account: only the low bytes are used, each sign-extended first. You can check it yourself in Windows Calculator: $FFCF*$FFCF=$0961 (keeping the low 16 bits). As for an unsigned multiplication, let's take this example: Source -> R5 = $364F and R1 = $B2CF
[code]FROM R5
UMULT R1[/code]
The result would be R0 = $3FE1. Why? Same reason as above, [b]HOWEVER[/b] the operation isn't signed, therefore in Windows calculator, you'd do $004F*$00CF=$3FE1.
Long multiplications are a tad harder, but they are handy in complex operations. FMULT omits the R4 destination (only the high word is kept) while LMULT stores the whole result. Take this as an example: Source -> R5 = $B556 and R6 = $DAAB
[code]FROM R5
LMULT[/code]
The result would be: R0 = $0AE3 and R4 = $5C72. To check the result in Windows Calculator, do $FFFFB556*$FFFFDAAB = $0AE35C72 (keeping the low 32 bits). Remember, only UMULT ignores the most significant bit as a sign (bit 7 in 8-bit operations, bit 15 in 16-bit operations).
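If you don't have a calculator at hand, the three worked examples above can be reproduced with a few lines of Python. This is a sketch that models the arithmetic only; the function names are invented here and are not part of any assembler or emulator.

```python
def s8(v):
    """Interpret the low byte of v as a signed 8-bit number."""
    v &= 0xFF
    return v - 0x100 if v & 0x80 else v


def s16(v):
    """Interpret v as a signed 16-bit number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def mult(src, rn):
    """MULT: signed 8-bit x 8-bit, 16-bit result."""
    return (s8(src) * s8(rn)) & 0xFFFF


def umult(src, rn):
    """UMULT: unsigned 8-bit x 8-bit, 16-bit result."""
    return ((src & 0xFF) * (rn & 0xFF)) & 0xFFFF


def lmult(src, r6):
    """LMULT: signed 16-bit x 16-bit. Returns (high word, low word for R4)."""
    product = (s16(src) * s16(r6)) & 0xFFFFFFFF
    return product >> 16, product & 0xFFFF


def fmult(src, r6):
    """FMULT: same product as LMULT, but only the high word is kept."""
    return lmult(src, r6)[0]


print(f"${mult(0x52CF, 0x63CF):04X}")    # $0961
print(f"${umult(0x364F, 0xB2CF):04X}")   # $3FE1
high, low = lmult(0xB556, 0xDAAB)
print(f"${high:04X} ${low:04X}")         # $0AE3 $5C72
```

Notice that the high bytes of the inputs to MULT and UMULT never matter; only the low bytes take part in the multiplication.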
Loop and Cache:
[code]IBT R12,#$04
CACHE
MOVE R13,R15
[...]
LOOP
NOP[/code]
The above code is simple: R12 sets the number of times a routine should be looped. The MOVE opcode copies the address from R15 (PC) to R13, the loop-back address register. The CACHE opcode should be used before loops so that, once the code has been read, later passes run from Cache RAM rather than ROM/RAM. LOOP decrements R12 and checks whether it is zero: if it is, it doesn't loop again; otherwise, it jumps to the address specified in R13.
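The LOOP behaviour can be pictured as a "do ... while" loop. Below is a rough Python model of it; [b]run_loop[/b] and [b]body[/b] are names made up for this sketch, and the pipeline (the instruction after LOOP) is not modelled.

```python
def run_loop(r12, body):
    """Model of a block that ends with LOOP.

    The body runs first, then LOOP decrements R12 (16-bit) and
    jumps back while R12 is not zero. Returns the number of passes.
    """
    r12 &= 0xFFFF
    passes = 0
    while True:
        body()
        passes += 1
        r12 = (r12 - 1) & 0xFFFF  # LOOP: R12 = R12 - 1
        if r12 == 0:              # zero: fall through, no more jumps
            break
    return passes


print(run_loop(4, lambda: None))  # 4
```

Because the decrement happens before the check, this model says that starting with R12 = 0 would wrap around to $FFFF and run 65536 passes, so it is safer to always load R12 with at least 1.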
Misc. Code:
[code]SWAP
MOVE R11,R15
TO R12
HIB
LOB
MERGE
STOP[/code]
Let's start with SWAP. SWAP exchanges the high byte and the low byte. For example, if R0 is $1234, after a SWAP it'd be $3412. MOVE copies the value from source to destination; the syntax is MOVE Destination,Source. MOVES does the same as MOVE except that it sets flags, which can be useful for testing values. HIB takes the high byte of the source and places it in the low byte of the destination. LOB does AND #$00FF and keeps the low byte only.
MERGE is a slightly more complicated operation, but it works like this: MERGE takes the high byte of R7 and places it in the high byte of the destination register, and takes the high byte of R8 and places it in the low byte of the destination register, effectively merging them.
STOP does what it says: it stops the Super FX so the SNES can read the output result.
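The byte-shuffling opcodes are easy to try out in Python. The sketch below models SWAP, HIB, LOB and MERGE as described above; the function names are only for illustration.

```python
def swap(v):
    """SWAP: exchange the high and low bytes."""
    v &= 0xFFFF
    return ((v << 8) | (v >> 8)) & 0xFFFF


def hib(v):
    """HIB: high byte of the source moved into the low byte."""
    return (v >> 8) & 0xFF


def lob(v):
    """LOB: keep the low byte only (AND #$00FF)."""
    return v & 0xFF


def merge(r7, r8):
    """MERGE: high byte of R7 on top, high byte of R8 at the bottom."""
    return (r7 & 0xFF00) | ((r8 >> 8) & 0xFF)


print(f"${swap(0x1234):04X}")            # $3412
print(f"${hib(0x1234):04X}")             # $0012
print(f"${lob(0x1234):04X}")             # $0034
print(f"${merge(0xAB12, 0xCD34):04X}")   # $ABCD
```

The low bytes of R7 and R8 are simply thrown away by MERGE.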
[code]ADD R3
ALT1
ADD R3
ALT2
ADD R3
ALT3
ADD R3[/code]
The above code deals with alternate codes: by using the ALTn instructions, you can replace certain operations with others. The assembler does this automatically, but you can use it yourself to save a few bytes or even cycles.
For example, without any ALTn, ADD R3 stays as is. With ALT1, ADD R3 turns into ADC R3. With ALT2, ADD R3 turns into ADD #3. With ALT3, ADD R3 turns into ADC #3. Beware of them!
Bitmap Code:
[code]IBT R0,#$02
CMODE
FROM R7
COLOR
LOOP
PLOT[/code]
The bitmap codes are easy to understand, but they require attention when working with them. CMODE sets the flags for the PLOT operation, such as transparency, dither, sprite mode and 256-colour options. COLOR copies the value of the source register into the Colour Register, which is the palette index used for plotting.
The PLOT opcode works like a printer: it uses the X and Y coordinates (specified by R1 and R2 respectively), the palette index set by COLOR and the Screen Base Register. Keep in mind that PLOT increments X for you, so you don't have to do it.
[code]RPIX
GETC[/code]
The above are extra codes for bitmap processing. RPIX is the alternate code for PLOT: it reads the pixel at the current coordinates and puts its colour information into the destination register. GETC works like GETB, except that it places the data straight into the Colour Register.
A reminder that this section deals only with the basics of code. Below are some simple examples, with commented code for easier understanding.
[b]D)Example codes[/b]
[code]SUB R0 ;Do R0-R0=R0 (0)
RAMB ;Store bank value from R0 to RAM Bank
IBT R1,#$44 ;R1 = $0044
IWT R2,#$8000 ;R2 = $8000
FROM R1 ;Source is R1
STB (R2) ;Store byte value from R1 on address at R2. High byte is ignored.[/code]
[code]IBT R0,#$01 ;R0 = $0001
ROMB ;Store bank value from R0 to ROM Bank
IWT R2,#$1DFB ;R2 = $1DFB
IWT R14,#$8000 ;R14 = $8000 - Also start ROM buffering (ROM pointer)
TO R6 ;Set R6 as destination
GETB ;Get data from ROM to destination - ROMB:R14 - In this case $01:8000
TO R4 ;Set R4 as destination
LDW (R2) ;Load word value from address in R2 to destination in R4. R0 becomes the destination again.[/code]
[code]LINK #4 ;Get return address by doing R15+4 = R11
IWT R15,#JumpHere ;Jump to Label
WITH R5 ;Meanwhile load this opcode and make R5 source and destination
ReturnLabel:
FROM R5 ;When return, get R5 as source
ADD R1 ;Do R5+R1=R2
SM ($1AAF),R2 ;Store the result (16-bit) from R2 to address $1AAF
STOP ;Stop the CPU
JumpHere:
UMULT #5 ;Do R5*5=R5
JMP R11 ;Return
TO R2 ;Set R2 as destination[/code]
[code]IBT R1,#$80 ;R1 = $FF80
FROM R1 ;Set R1 as source
TO R2 ;Set R2 as destination
XOR #15 ;Do R1^F = R2 ($FF8F)
FROM R2 ;Set R2 as source
AND R1 ;Do R2 & R1 = R0 ($FF80)
BIC R2 ;Do R0 & (~ R2) = R0 ($0000) - It Inverts the register THEN it ANDs it. Bear that in mind.
NOT ;Invert all bits = R0 ($FFFF)[/code]
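[code]IBT R0,#$02 ;R0 = $0002 (plot option flags)
CMODE ;Copy R0 to the Plot Option Register
IBT R1,#$00 ;R1 = X coordinate = 0
IBT R2,#$00 ;R2 = Y coordinate = 0
IBT R12,#$15 ;R12 = loop counter ($15 = 21 pixels)
IWT R7,#$8000 ;R7 = $8000 - RAM address of the colour data
CACHE ;Run the following code from Cache RAM
MOVE R13,R15 ;R13 = loop start (address of the next opcode)
LDB (R7) ;Load byte from RAMB:R7 to R0
INC R7 ;Point to the next byte
COLOR ;Copy R0 to the Colour Register
LOOP ;Decrement R12, jump to R13 if it isn't zero
PLOT ;Runs because of the pipeline: plot at (R1,R2), then X = X+1
STOP ;Stop the CPU[/code]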