This chapter gives rules and examples to follow when designing an assembly language program. The chapter includes a “learn by doing” section that contains information about how calling sequences work. This involves writing a skeleton version of your prospective assembly routine in a high-level language, and then compiling it with the -S option to generate a human-readable assembly language file. The assembly language file can then be used as the starting point for coding your routine.
This assembler works in either 32-bit or 64-bit compilation mode. While these modes are very similar, due to the difference in data, register and address sizes, the 64-bit assembler linkage conventions are not always the same as those for 32-bit mode. For details on some of these differences, see the MIPSpro Porting and Transition Guide.
The procedures and examples in this chapter, for the most part, describe 32-bit compilation mode. In some cases, specific differences necessitated by 64-bit mode are highlighted.
When you write assembly language routines, you should follow the same calling conventions that the compilers observe, for two reasons:
Often your code must interact with compiler-generated code, accepting and returning arguments or accessing shared global data.
The symbolic debugger gives better assistance in debugging programs using standard calling conventions.
The conventions for the compiler system are a bit more complicated than some, mostly to enhance the speed of each procedure call. Specifically:
The compilers use the full, general calling sequence only when necessary; where possible, they omit unneeded portions of it. For example, the compilers avoid dedicating a register to a frame pointer whenever possible.
The compilers and debugger observe certain implicit rules rather than communicating via instructions or data at execution time. For example, the debugger looks at information placed in the symbol table by a “.frame” directive at compilation time, so that it can tolerate the lack of a register containing a frame pointer at execution time.
This section describes three general areas of concern to the assembly language programmer:
Usable and restricted registers.
Stack frame requirements on entering and exiting a routine.
The “shape” of data (scalars, arrays, records, sets) laid out by the various high level languages.
The main processor has 32 integer registers. Each is 32 bits wide in the MIPS1 and MIPS2 architectures. In MIPS3 and later architectures, each register is 64 bits wide. The uses and restrictions of these registers are described in Table 1-1 in Chapter 1.
The floating point coprocessor has 16 floating-point registers. Each register can hold either a single precision (32 bit) or a double precision (64 bit) value. All references to the 32-bit versions of these registers use an even register number (e.g., $f4). Table 1-4 and Table 1-4 list the floating point registers and describe their use.
This discussion of the stack frame, particularly regarding the graphics, describes 32-bit operations. In 32-bit mode, restrictions such as stack addressing are enforced strictly. While these restrictions are not enforced rigidly for 64-bit stack frame usage, their observance is probably still a good coding practice, especially if you count on reliable debugging information.
The compilers classify each routine into one of the following categories:
Non-leaf routines, that is, routines that call other procedures.
Leaf routines, that is, routines that do not themselves execute any procedure calls. Leaf routines are of two types:
Leaf routines that require stack storage for local variables
Leaf routines that do not require stack storage for local variables.
You must decide the routine category before determining the calling sequence.
To write a program with proper stack frame usage and debugging capabilities, use the following procedure:
Regardless of the type of routine, you should include a .ent pseudo-op and an entry label for the procedure. The .ent pseudo-op is for use by the debugger, and the entry label is the procedure name. The syntax is:
.ent procedure_name
procedure_name:
If you are writing a leaf procedure that does not use the stack, skip to step 3. For leaf procedures that use the stack, or for non-leaf procedures, you must allocate all the stack space that the routine requires. The syntax to adjust the stack size is:
subu $sp,framesize
where framesize is the size of frame required; framesize must be a multiple of 16. Space must be allocated for:
Local variables.
Saved general registers. Space should be allocated only for those registers saved. For non-leaf procedures, you must save $31, which is used in the calls to other procedures from this routine. If you use registers $16–$23, you must also save them.
Saved floating-point registers. Space should be allocated only for those registers saved. If you use registers $f20–$f30 (for 32-bit) or $f24-$f31 (for 64-bit), you must also save them.
Procedure call argument area. You must allocate the maximum number of bytes for arguments of any procedure that you call from this routine.
Note: Once you have modified $sp, you should not modify it again for the rest of the routine.
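The space accounting in step 2 can be sketched in C as follows. This is only an illustration: the names round_up_16 and frame_size are hypothetical, and the sketch assumes 32-bit mode with 4 bytes per saved general register and 8 bytes per saved floating-point double.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 16, the granularity that
   step 2 requires for framesize. */
static size_t round_up_16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}

/* Hypothetical helper: number of bytes to subtract from $sp.
   locals    - bytes of local variables
   saved_gpr - general registers saved (assumed 4 bytes each)
   saved_fpr - floating-point doubles saved (assumed 8 bytes each)
   arg_area  - largest argument area of any procedure called */
size_t frame_size(size_t locals, size_t saved_gpr,
                  size_t saved_fpr, size_t arg_area)
{
    size_t raw = locals + 4u * saved_gpr + 8u * saved_fpr + arg_area;
    size_t frame = round_up_16(raw);

    assert(frame % 16 == 0 && frame >= raw);
    return frame;
}
```

For instance, a non-leaf routine with 20 bytes of locals, one saved general register ($31), and a 16-byte argument area needs 40 bytes, which the sketch rounds up to a framesize of 48.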
Now include a .frame pseudo-op:
.frame framereg,framesize,returnreg
The virtual frame pointer is a frame pointer as used in other compiler systems but has no register allocated for it. It consists of the framereg ($sp, in most cases) added to the framesize (see step 2 above). Figure 7-1 illustrates the stack components.
The returnreg specifies the register containing the return address (usually $31). These usual values may change if you use a varying stack pointer or are specifying a kernel trap routine.
If the procedure is a leaf procedure that does not use the stack, skip to step 7. Otherwise you must save the registers you allocated space for in step 2.
To save the general registers, use the following operations:
.mask bitmask,frameoffset
sw reg,framesize+frameoffset-N($sp)
The .mask directive specifies the registers to be stored and where they are stored. A bit should be on in bitmask for each register saved (for example, if register $31 is saved, bit 31 should be 1 in bitmask). Bits are set in bitmask in little-endian order, even if the machine configuration is big-endian. The frameoffset is the offset from the virtual frame pointer (this number is usually negative). N should be 0 for the highest-numbered register saved and then incremented by four for each subsequently lower-numbered register saved. For example:
sw $31,framesize+frameoffset($sp)
sw $17,framesize+frameoffset-4($sp)
sw $16,framesize+frameoffset-8($sp)
Figure 7-2 illustrates this example.
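The bitmask and store offsets described in this step can be checked with a small C sketch. The helper names gpr_mask and save_offset are hypothetical; the sketch assumes 4-byte stores, with k counting saved registers downward from the highest-numbered one so that N = 4 * k.

```c
#include <assert.h>
#include <stdint.h>

/* Build a .mask bitmask: bit r is set when general register $r is saved. */
uint32_t gpr_mask(const int *regs, int count)
{
    uint32_t mask = 0;

    for (int i = 0; i < count; i++) {
        assert(regs[i] >= 0 && regs[i] <= 31);
        mask |= (uint32_t)1 << regs[i];
    }
    return mask;
}

/* Offset from $sp at which the k-th saved register is stored.
   k = 0 is the highest-numbered register, so N = 4 * k. */
int save_offset(int framesize, int frameoffset, int k)
{
    return framesize + frameoffset - 4 * k;
}
```

With only $31 saved, gpr_mask yields 0x80000000, and save_offset(24, -4, 0) yields 20. Those are the values used by the .mask directive and the sw instruction in the non-leaf example later in this chapter.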
Now save any floating-point registers that you allocated space for in step 2 as follows:
.fmask bitmask,frameoffset
s.[sd] reg,framesize+frameoffset-N($sp)
Notice that saving floating-point registers is identical to saving general registers except we use the .fmask pseudo-op instead of .mask, and the stores are of floating-point singles or doubles. The discussion regarding saving general registers applies here as well, but remember that N should be incremented by 16 for doubles. The stack framesize must be a multiple of 16.
This step describes parameter passing: how to access arguments passed into your routine and how to pass arguments correctly to other procedures. For information on high-level language-specific constructs (call-by-name, call-by-value, string or structure passing), refer to the MIPSpro Compiling, Debugging and Performance Tuning Guide.
As specified in step 2, space must be allocated on the stack for all arguments even though they may be passed in registers. This provides a saving area if their registers are needed for other variables.
General registers must be used for passing arguments. For 32-bit compilations, general registers $4–$7 and float registers $f12, $f14 are used for passing the first four arguments (if possible). You must allocate a pair of registers (even if it's a single precision argument) that start with an even register for floating-point arguments appearing in registers.
For 64-bit compilations, general registers $4–$11 and float registers $f12 through $f19 are used for passing the first eight arguments (if possible).
In the table below, the “fN” arguments are considered single- and double-precision floating-point arguments, and “nN” arguments are everything else. The ellipses (...) mean that the rest of the arguments do not go in registers regardless of their type. The “stack” assignment means that you do not put this argument in a register. The register assignments occur in the order shown in order to satisfy optimizing compiler protocols:
Table 7-1. Parameter Passing (32-Bit)
Argument List | Register and Stack Assignments |
|---|---|
f1, f2 | $f12, $f14 |
f1, n1, f2 | $f12, $6, stack |
f1, n1, n2 | $f12, $6, $7 |
n1, n2, n3, n4 | $4, $5, $6, $7 |
n1, n2, n3, f1 | $4, $5, $6, stack |
n1, n2, f1 | $4, $5, ($6, $7) |
n1, f1 | $4, ($6, $7) |
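The rows of Table 7-1 can be reproduced by a short C sketch. It models only those rows: each floating-point argument is treated as occupying an even-aligned pair of 4-byte argument slots, as the text above states, and floating-point registers are used only before any non-floating-point argument has appeared. The function name assign_args_32 and the 'f'/'n' encoding are illustrative assumptions, not part of the compiler system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assign each argument in kinds ('f' = floating point, anything else =
   non-floating-point) to a register or the stack, following the rows of
   Table 7-1. Writes a comma-separated list into out. */
void assign_args_32(const char *kinds, char *out, size_t outsz)
{
    int slot = 0;      /* next 4-byte argument slot; slots 0-3 map to $4-$7 */
    int fp_used = 0;   /* floating-point argument registers used so far */
    int seen_int = 0;  /* set once a non-floating-point argument appears */

    assert(outsz > 0);
    out[0] = '\0';
    for (const char *k = kinds; *k != '\0'; k++) {
        char item[32];

        if (*k == 'f') {
            slot = (slot + 1) & ~1;        /* pairs start on an even slot */
            if (slot >= 4) {
                snprintf(item, sizeof item, "stack");
            } else if (!seen_int && fp_used < 2) {
                snprintf(item, sizeof item, "$f%d", 12 + 2 * fp_used);
                fp_used++;
            } else {
                snprintf(item, sizeof item, "($%d,$%d)", 4 + slot, 5 + slot);
            }
            slot += 2;
        } else {
            seen_int = 1;
            if (slot >= 4)
                snprintf(item, sizeof item, "stack");
            else
                snprintf(item, sizeof item, "$%d", 4 + slot);
            slot += 1;
        }

        size_t len = strlen(out);
        int n = snprintf(out + len, outsz - len, "%s%s", len ? "," : "", item);
        if (n < 0 || (size_t)n >= outsz - len)
            return;                        /* out is full; list is truncated */
    }
}
```

Calling assign_args_32("nf", buf, sizeof buf) leaves "$4,($6,$7)" in buf, matching the last row of the table: $5 is skipped so that the register pair starts on an even slot.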
Table 7-2. Parameter Passing (64-Bit)
Argument List | Register and Stack Assignments |
|---|---|
d1,d2 | $f12, $f13 |
s1,s2 | $f12, $f13 |
s1,d1 | $f12, $f13 |
d1,s1 | $f12, $f13 |
n1,d1 | $4,$f13 |
d1,n1,d1 | $f12, $5,$f14 |
n1,n2,d1 | $4, $5,$f14 |
d1,n1,n2 | $f12, $5,$6 |
s1,n1,n2 | $f12, $5,$6 |
d1,s1,s2 | $f12, $f13, $f14 |
s1,s2,d1 | $f12, $f13, $f14 |
n1,n2,n3,n4 | $4,$5,$6,$7 |
n1,n2,n3,d1 | $4,$5,$6,$f15 |
n1,n2,n3,s1 | $4,$5,$6, $f15 |
s1,s2,s3,s4 | $f12, $f13,$f14,$f15 |
s1,n1,s2,n2 | $f12, $5,$f14,$7 |
n1,s1,n2,s2 | $4,$f13,$6,$f15 |
n1,s1,n2,n3 | $4,$f13,$6,$7 |
d1,d2,d3,d4,d5 | $f12, $f13, $f14, $f15, $f16 |
d1,d2,d3,d4,d5,s1,s2, s3,s4 | $f12, $f13, $f14, $f15, $f16, $f17, $f18,$f19,stack |
d1,d2,d3,s1,s2,s3,n1, n2,n3 | $f12, $f13, $f14, $f15, $f16, $f17, $10,$11, stack |
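Every row of Table 7-2 follows one positional pattern: the argument at zero-based position pos goes in $f(12+pos) if it is floating point, in $(4+pos) otherwise, and on the stack once pos reaches 8. The C sketch below encodes that reading of the table. The names arg_home, arg_loc, and locate_arg_64 are hypothetical, and the sketch covers only the simple scalar arguments that the table lists.

```c
#include <assert.h>

enum arg_home { ARG_GPR, ARG_FPR, ARG_STACK };

struct arg_loc {
    enum arg_home home;  /* general register, floating-point register, or stack */
    int reg;             /* register number; -1 when the argument is on the stack */
};

/* kind is 's' (single), 'd' (double), or anything else for a
   non-floating-point argument; pos is the zero-based argument position. */
struct arg_loc locate_arg_64(int pos, char kind)
{
    struct arg_loc loc = { ARG_STACK, -1 };

    assert(pos >= 0);
    if (pos < 8) {
        if (kind == 's' || kind == 'd') {
            loc.home = ARG_FPR;
            loc.reg = 12 + pos;
        } else {
            loc.home = ARG_GPR;
            loc.reg = 4 + pos;
        }
    }
    return loc;
}
```

For the row n1,d1 the sketch places d1 (position 1) in $f13, and for the last row it places n1 and n2 (positions 6 and 7) in $10 and $11 and sends n3 (position 8) to the stack.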
Next, you must restore registers that were saved in step 4. To restore general purpose registers:
lw reg,framesize+frameoffset-N($sp)
To restore the floating-point registers:
l.[sd] reg,framesize+frameoffset-N($sp)
(Refer to step 4 for a discussion of the value of N.)
Get the return address:
lw $31,framesize+frameoffset($sp)
Clean up the stack:
addu $sp,framesize
Return:
j $31
To end the procedure:
.end procedure_name
The difference in stack frame usage for 64-bit operations can be summarized as follows:
The portion of the argument structure beyond the initial eight doublewords is passed in memory on the stack, pointed to by the stack pointer at the time of call. The caller does not reserve space for the register arguments; the callee is responsible for reserving it if required (either adjacent to any caller-saved stack arguments if required, or elsewhere as appropriate). No requirement is placed on the callee either to allocate space and save the register parameters, or to save them in any particular place.
In most cases, high-level language routines and assembly routines communicate via simple variables: pointers, integers, booleans, and single- and double-precision real numbers. Describing the details of the various high-level data structures (arrays, records, sets, and so on) is beyond our scope here. If you need to access such a structure as an argument or as a shared global variable, refer to the MIPSpro Compiling, Debugging and Performance Tuning Guide.
This section contains examples that illustrate program design rules. Each example shows a procedure written in C and its equivalent written in assembly language.
The following example shows a non-leaf procedure. Notice that it creates a stackframe, and also saves its return address since it must put a new return address into register $31 when it invokes its callee:
float
nonleaf(i, j)
int i, *j;
{
    double atof();
    int temp;
    temp = i - *j;
    if (i < *j) temp = -temp;
    return atof(temp);
}
.globl nonleaf
# 1 float
# 2 nonleaf(i, j)
# 3 int i, *j;
# 4 {
.ent nonleaf 2
nonleaf:
subu $sp, 24 ## Create stackframe
sw $31, 20($sp) ## Save the return
## address
.mask 0x80000000, -4
.frame $sp, 24, $31
# 5 double atof();
# 6 int temp;
# 7
# 8 temp = i - *j;
lw $2, 0($5) ## Arguments are in
## $4 and $5
subu $3, $4, $2
# 9 if (i < *j) temp = -temp;
bge $4, $2, $32 ## Note: $32 is a label,
## not a reg
negu $3, $3
$32:
# 10 return atof(temp);
move $4, $3
jal atof
cvt.s.d $f0, $f0 ## Return value goes in $f0
lw $31, 20($sp) ## Restore return address
addu $sp, 24 ## Delete stackframe
j $31 ## Return to caller
.end nonleaf
This example shows a leaf procedure that does not require stack space for local variables. Notice that it creates no stackframe, and saves no return address.
int
leaf(p1, p2)
int p1, p2;
{
    return (p1 > p2) ? p1 : p2;
}
.globl leaf
# 1 int
# 2 leaf(p1, p2)
# 3 int p1, p2;
# 4 {
.ent leaf 2
leaf:
.frame $sp, 0, $31
# 5 return (p1 > p2) ? p1 : p2;
ble $4, $5, $32 ## Arguments in
## $4 and $5
move $3, $4
b $33
$32:
move $3, $5
$33:
move $2, $3 ## Return value
## goes in $2
j $31 ## Return to
## caller
# 6 }
.end leaf
The next example shows a leaf procedure that requires stack space for local variables. Notice that it creates a stack frame, but does not save a return address.
char
leaf_storage(i)
int i;
{
    char a[16];
    int j;
    for (j = 0; j < 10; j++)
        a[j] = '0' + j;
    for (j = 10; j < 16; j++)
        a[j] = 'a' + j;
    return a[i];
}
.globl leaf_storage
# 1 char
# 2 leaf_storage(i)
# 3 int i;
# 4 {
.ent leaf_storage 2 ## "2" is the
## lexical level
## of the
## procedure. You
## may omit it.
leaf_storage:
subu $sp, 24 ## Create
## stackframe.
.frame $sp, 24, $31
# 5 char a[16];
# 6 int j;
# 7
# 8 for (j = 0; j < 10; j++)
sw $0, 4($sp)
addu $3, $sp, 24
$32:
# 9 a[j] = '0' + j;
lw $14, 4($sp)
addu $15, $14, 48
addu $24, $3, $14
sb $15, -16($24)
lw $25, 4($sp)
addu $8, $25, 1
sw $8, 4($sp)
blt $8, 10, $32
# 10 for (j = 10; j < 16; j++)
li $9, 10
sw $9, 4($sp)
$33:
# 11 a[j] = 'a' + j;
lw $10, 4($sp)
addu $11, $10, 97
addu $12, $3, $10
sb $11, -16($12)
lw $13, 4($sp)
addu $14, $13, 1
sw $14, 4($sp)
blt $14, 16, $33
# 12 return a[i];
addu $15, $3, $4 ## Argument is
## in $4.
lbu $2, -16($15) ## Return value
## goes in $2
addu $sp, 24 ## Delete
## stackframe
j $31 ## Return to
## caller.
.end leaf_storage
The rules and parameter requirements that exist between assembly language and other languages are varied and complex. The simplest approach to coding an interface between an assembly routine and a routine written in a high-level language is to do the following:
Use the high-level language to write a skeletal version of the routine that you plan to code in assembly language.
Compile the program using the -S option, which creates an assembly language (.s) version of the compiled source file (the -O option, though not required, reduces the amount of code generated, making the listing easier to read).
Study the assembly-language listing and then, imitating the rules and conventions used by the compiler, write your assembly language code.
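As an illustration of the first step, a skeleton might look like the following C routine. The routine name scale_sum and its parameters are hypothetical; they were chosen only because a pointer, an integer, and a double-precision argument exercise several of the parameter-passing rules at once.

```c
/* Hypothetical skeleton: stands in for the routine you intend to
   hand-code. It takes a pointer, an integer, and a double so that the
   generated listing shows how each kind of argument arrives and how a
   floating-point result is returned. */
double scale_sum(const double *v, int n, double factor)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum * factor;
}
```

Assuming the routine is saved in a file named skeleton.c and the compiler driver is invoked as cc, a command such as cc -S -O skeleton.c would produce the listing in skeleton.s.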
The machine's default memory allocation scheme gives every process two storage areas; these can grow without bound. A process exceeds virtual storage only when the sum of the two areas exceeds virtual storage space. The link editor and assembler use the scheme shown in Figure 7-3. An explanation of each area in the allocation scheme follows the figure.
Reserved for kernel operations.
Reserved for operating system use.
Used for local data in C programs.
Not allocated until a user requests it, as in System V shared memory regions.
The heap is reserved for sbrk and break system calls, and it is not always present.
The machine divides all data into one of five sections:
bss - Uninitialized data with a size greater than the value specified by the -G command line option.
sbss - Data with a size less than or equal to the value specified by the -G command line option. (512 is the default value for the -G option.)
sdata (small data) - Data initialized and specified for the sdata section.
data (data) - Data initialized and specified for the data section.
Reserved for any shared libraries.
Contains the .text section, .rdata section and all dynamic tables.
Reserved.
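The way the -G threshold separates the data sections listed above can be sketched in C. The rule below is an assumption drawn from the bss and sbss descriptions: it extends the same size test to initialized data (sdata versus data), which the text does not state explicitly. The names data_section and classify are hypothetical.

```c
#include <stddef.h>

enum data_section { SEC_BSS, SEC_SBSS, SEC_SDATA, SEC_DATA };

/* Assumed rule: items no larger than the -G threshold go to the small
   sections (sbss or sdata); larger items go to bss or data. */
enum data_section classify(size_t size, int initialized, size_t g_threshold)
{
    int is_small = size <= g_threshold;

    if (initialized)
        return is_small ? SEC_SDATA : SEC_DATA;
    return is_small ? SEC_SBSS : SEC_BSS;
}
```

With the threshold of 512 that the text gives as the default, a 4-byte uninitialized variable is classified as sbss and a 1024-byte uninitialized array as bss.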