Processor-internal stacks can be realized as conventional register files
with special addressing hardware.
A top-pointer identifies the topmost element of a stack.
It is implicitly modified during a push or pop operation.
Since a stack can grow beyond the size of processor-internal storage,
the bottom part must be extended into memory.
The processor-internal part of a stack realizes a circular
buffer for the topmost elements.
In the following,
the term stack refers to the processor-internal part, and
the term swap-area refers to the part held in memory, unless otherwise stated.
A last-pointer defines the bottom element of the stack.
Top-pointer and last-pointer
are abbreviated as
and
, respectively, in formulas. By convention, the stack grows towards higher addresses,
the top-pointer denotes the topmost element,
the last-pointer points to the cell below the bottom element of the stack,
and
denotes the size of the stack.
should be a power of 2, in which case
the last
bits of the top-pointer
can be used to address the stack.
The basic stack operations
-
push, pop, read or (over)write the topmost stack element
-
do not suffice to implement (functional) languages efficiently.
To allow for a uniform hardware implementation
and a regular instruction set,
the following
stack operations
are available in Fast (with
):
];
] := value;
] := value,
:=
;
],
:=
;
The index
must be less than the size
of the stack.
To access an element located in the swap-area,
a conventional memory access must be performed.
These stack operations greatly simplify the address generation hardware for the stack system, which is a critical part of the architecture. The address generation logic is the only part of Fast which could slow down the cycle time relative to conventional organizations of the register file. Thus it is important to make this part as fast as possible.
In the following, ``register'' denotes stack elements as well as general purpose registers. The register file of Fast is a dual port memory with 256 cells, thus the address of a register is 8 bits wide. The cells 1 to 127 are general purpose registers, and the four stacks have 32 cells each. Register 0 is hardwired to 0; if it is specified by an instruction as destination, the result is discarded.
Each stack is identified by a number
.
This number is mapped by a decoder
to the actual internal stack number.
To implement the stack switch operation,
a flag named SWITCH
is involved in this mapping.
If it is cleared,
the decoder does not modify the number,
otherwise the numbers 2 and 3 are swapped.
The stacks are externally specified by the
names M, H, I and A
which correspond to the numbers 0 to 3 before the mapping is done.
The address of a stack element thus consists of two parts:
a 2-bit stack-number St and a 5-bit stack index i.
The address within the register file is formed as
1 !
! (
)
(where '!' denotes the concatenation of bit strings). A top-pointer is 5 bits wide, a last-pointer is 32 bits wide. This is an implicit realization of the stacks as circular buffers. The last-pointer also defines the swap-area of each stack in memory. The contents of the 32-bit register point to the topmost element of the part of the stack which is held in memory. Top-pointers are only 5 bits wide to save chip area and to reduce cycle time.
The instruction pipeline of Fast is divided into a stage each for instruction fetching, instruction decoding and (register) operand fetching, execution, and storage of the result. The pipeline affects assembly programming only in terms of two delay slots for branch and load instructions. Other pipeline interlocks are treated transparently by the processor. A code re-organizer (instruction scheduler) takes care of the correct order of instructions in compliance with pipeline constraints.
Most RISC processors with delayed-branches do not allow for two consecutive branch instructions since they cannot restart such a sequence properly. Since Fast has two delay slots, a program counter for each stage of the pipeline is necessary to correctly handle possible traps that may occur in any combination of instructions. This is absolutely essential in case of stack overflow/underflow traps.
Fast has a Harvard architecture, i.e. data and instruction memory are conceptually separated. All memory accesses are performed with full 40-bit words. All addresses are word addresses, therefore no alignments are necessary.
Since the timing of memory accesses is not compatible with that of the four-stage pipeline, an asynchronous memory interface has been chosen. The execution of pipeline instructions proceeds as long as they do not depend on the results of outstanding loads, or are memory accesses themselves. Otherwise the pipeline is stopped and the processor waits for the completion of the memory access.
The basic instruction in Fast is a register-to-register instruction which may be denoted as: dest := src1 AluOp src2. This instruction takes two source registers src1 and src2, performs a dyadic operations AluOp on them, and stores the resulting value in the destination register dest. src2 may be replaced by an 8 bit immediate constant.
Operands may be held either in general purpose registers, in stack elements or special registers. Special registers are the four top-pointers, the four last-pointers, and the program counter. Using a consistent 40-bit architecture, a register-to-register instruction has the following format:

To generate 32 bit constants without accessing memory,
a Set Immediate High instruction
is available.
It sets the upper 24 bits of a register to an immediate
value and clears the lower 8 bits, which may be set with a
register-to-register instruction using an
as AluOp and
an 8 bits immediate as src2.
In order to simplify instruction decoding, other formats are introduced only when absolutely necessary. The load instruction has exactly the same format as the register-to-register instruction: dest := Mem[src1 AluOp src2].
The register file is not capable of delivering three source operands in a single cycle. The store instruction Mem[src1 AluOp Imm] := src2 therefore uses the dest field as an immediate value.
There are no condition codes in Fast, since they enlarge the processor state and it is difficult for the compiler to keep track of modifications of condition codes. This prevents possible optimizations and complicates code re-organization.
Fast has two different instructions to change the control flow: A conditional branch instruction: if condition(src2) then PC := src1 AluOp imm and a versatile call instruction: dest := PC , PC := src1 AluOp src2. src1 must be either a register or the program counter itself for PC-relative branches. These two instructions suffice to change the control flow. The call instruction may be used as a return instruction, too. Since the destination and the source operands may be nearly freely chosen, it is possible to synthesize the appropriate instructions. Additionally, a call instruction with an absolute address is available which is used by the code re-organizer. The condition inspected by a branch instruction may be either IsZero or IsNonZero, depending on a selection in the special field. An unconditional branch can be specified as if IsZero(Reg0) then PC := src1 AluOp imm since this condition always yields true. To simplify the work for the code re-organizer, the branches can conditionally annul the instructions in the delay slots. If a bit in the special field specifies conditional annulling, the delay instructions are annulled if the branch is not taken.