The reduction system
-Red
[GKK92]
developed at the University of Kiel
truly performs high-level program transformations governed by the
rewrite rules of a full-fledged applied
-calculus.
Programs of the untyped high-level reduction language KiR may be reduced either partially or to normal forms.
Conceptually,
program execution takes place entirely within the space of programs.
Intermediate programs may be made visible to the user in KiR notation.
-Red
comprises an abstract stack machine ASM for compiled graph
reduction and a compilation scheme for the transformation of
KiR programs into abstract machine code.
At runtime, ASM performs supercombinator reductions
under an applicative order regime,
switching to interpretation iff potential situations
of name clashes have to be resolved by full
-reductions.
The implementation of conventional languages and compiled graph reduction poses similar needs on processor architectures [HE+86]. However, there are some aspects that must be taken into consideration for the efficient implementation of functional languages [Kie85], some of which are also important for the implementation of other languages:
The most important issue is the architectural support for
highly efficient subroutine calls and parameter passing mechanisms.
In contrast to conventional implementations which use a single
runtime stack for recursive function invocations,
-Red
features four stacks to accommodate the runtime environment.
One stack each is to
Multiple stacks allow for a simpler compilation scheme than just a single stack
and also facilitate re-compilation of intermediate states of
program execution into high-level KiR programs,
as required by the reduction concept of
-Red
.
In contrast to this, compiling to conventional register machines
alienates the internal states and at least complicates re-compilation.
Moreover, multiple stacks increase the opportunities for some optimizations
which are not possible with a single stack.
An important feature of
-Red
is that the argument stack and the workspace stack are interchanged
between function calls.
The workspace frame of a calling
function (or at least its upper part)
can thus directly be taken as the argument frame of the called function.
This enables the compiler to generate code which
releases as early as possible and right from the top of the argument frame
entries that are no longer needed,
which might save considerable stack space.
Upon termination of a function call, the stack constellation of the calling
function can be re-installed by again switching argument
and workspace stacks,
and by moving the result of the called function to the new working stack.
ASM can not be compiled to conventional computers with reasonable efficiency,
since modern RISC processors do not provide the facilities
to directly support its stack concept.
We have therefore developed a tailor-made RISC processor architecture
with a unique stack system.
It comprises four processor-internal stacks which can be readily extended into main memory.
Two of the stacks can be flipped by inverting an internal flag.
The stacks are the main part of the architecture.
They combine
the advantages of
pure stacks
with the flexibility of
variable sized register windows.
Other support for
-Red
is available for
runtime type-checking by means of tags
which is to validate operand types simultaneously with the execution of primitive functions.
Fast is a general purpose 40-bit RISC processor with a Harvard architecture.
A versatile but small instruction set with register-to-register, load/store,
subroutine call and conditional branch instructions
supports efficient implementations of functional languages.
However, implementations of conventional languages may benefit
from this stack architecture as well.
To demonstrate this, we compare on this architecture the performance of
programs written in our functional language KiR and in C.
C has been chosen for two reasons:
it is in widespread use and hence allows for a comparison of
the performance of Fast with other RISC processors
.
Moreover,
it does not have nested function definitions,
and in this respect resembles the concept
of supercombinators insofar, as the parameter passing mechanisms
are very similar:
both require only a single argument frame to accommodate the environment
of a function call.
In the following, we will give an overview over the main features of the processor architecture, then we will explain the C compiler, and last we will compare the performance of a representative sample of programs written in KiR and C. The compilation of KiR to ASM code is specified in [GKK92] and will not be explained here.