Implementation Details of the Syntax Tree
Data Structures Used
The so-called syntax_tree is in reality a list of trees.
The root of each tree is a Fun_node_ptr.
Its children are Expr_node_ptrs, which may themselves have
Expr_node_ptrs as children.
Fun_node_ptr
typedef struct _Fun_node_struct {
    gchar*   fun_id;
    Nodetype nodetype;     //only e_invalid, e_userfun and
                           //e_constructor are used
    MLType   resulttype;   //type of the value which is returned or
                           //constructed
    GList*   paramslist;   //list of Expr_node_ptr's for each clause
    GList*   bindingslist; //list of Symboltable_entry_ptr's for each clause
                           //stores the type of id's occurring in parameters
    GList*   bodylist;     //unused in case of e_constructor, otherwise a
                           //list of Expr_node_ptr's for each clause
} Fun_node_struct, *Fun_node_ptr;
Expr_node_ptr
typedef struct _Expr_node_struct {
    Nodetype nodetype;           //the kind of this node
    gchar*   strrep;             //string representation of node
    MLValue  value;              //the value represented by this node
                                 //(may be unused)
    MLType   valuetype;          //type of the value found/computed by the typechecker
    MLType   valuetype_expected; //expected type of the value
    gpointer misc;               //additional stuff for later use
                                 //(e.g. the position on screen)
    //Maybe the following data should be moved into a misc-structure.
    //So use ACCESSMACROS!!!
    gint     lineno;             //line where the token was read
    gint     colno;              //col where the token was read
    struct _Expr_node_struct *child[EXPR_NODE_MAX_CHILDREN];
} Expr_node_struct, *Expr_node_ptr;
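As an illustration of how the child array can be walked, here is a minimal sketch in plain C. It uses a reduced stand-in, Mini_expr_node, instead of the real Expr_node_struct. The names Mini_expr_node and count_expr_nodes are hypothetical, the value 3 for EXPR_NODE_MAX_CHILDREN is an assumption (the real constant is not shown here), and the sketch assumes that unused child slots are NULL.

```c
#include <stddef.h>

/* Assumed value; the real constant is defined elsewhere. */
#define EXPR_NODE_MAX_CHILDREN 3

/* Hypothetical, reduced stand-in for Expr_node_struct. */
typedef struct Mini_expr_node {
    const char *strrep;   /* string representation of node */
    int lineno;           /* line where the token was read */
    int colno;            /* col where the token was read  */
    struct Mini_expr_node *child[EXPR_NODE_MAX_CHILDREN];
} Mini_expr_node;

/* Counts the nodes in the subtree rooted at node.
   Assumes that unused child slots are NULL. */
int count_expr_nodes(const Mini_expr_node *node)
{
    if (node == NULL)
        return 0;
    int total = 1;  /* this node */
    for (int i = 0; i < EXPR_NODE_MAX_CHILDREN; i++)
        total += count_expr_nodes(node->child[i]);
    return total;
}
```

The same recursion pattern would apply to any pass over the tree, with the counting step replaced by the work of that pass.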
Syntax Checking
Syntax checking is done by the code generated automatically by
(LEX and) YACC.
Symbol Checking
Within a datatype-system, only those types may be used that are already
defined or that will be defined later within the same
datatype-system.
That makes it slightly harder to check whether the type currently being
read is valid.
It is therefore necessary to remember a type that is undefined so far and
to check its definition again once the whole datatype-system has been parsed.
Function-systems are similar.
Within a body, only those functions may occur that are already defined
or that will be defined within the same system.
Datatype-systems
The current datatype within a datatype-system is stored in the
symbol_table if its identifier is not already stored in
the symbol_table.
Being in the symbol_table would mean that this identifier is
already defined elsewhere.
The same is done for each constructor.
Each type occurring in the definition of a constructor is checked.
If it is undefined, that is not yet an error.
To remember this identifier, it is stored in a special list called
undefined_symbols.
Once the whole datatype-system has been parsed, each identifier in
undefined_symbols is checked to see whether it is defined now.
If not, it is an error.
After that, undefined_symbols is cleared for the next system.
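The deferred check described above can be sketched as follows. This is only an illustration in plain C, not the actual implementation: Symbol_list, symbol_list_contains, symbol_list_add, note_type_use and resolve_undefined are hypothetical names, and fixed-size arrays stand in for the real symbol_table and undefined_symbols.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* Hypothetical stand-in for symbol_table and undefined_symbols:
   a fixed-size array of identifier strings (not owned). */
typedef struct {
    const char *names[MAX_SYMBOLS];
    int count;
} Symbol_list;

/* Returns 1 if id is in the list, 0 otherwise. */
int symbol_list_contains(const Symbol_list *list, const char *id)
{
    for (int i = 0; i < list->count; i++)
        if (strcmp(list->names[i], id) == 0)
            return 1;
    return 0;
}

/* Appends id; returns 0 on success, -1 if the list is full. */
int symbol_list_add(Symbol_list *list, const char *id)
{
    if (list->count >= MAX_SYMBOLS)
        return -1;
    list->names[list->count++] = id;
    return 0;
}

/* Called for each type used in a constructor definition.
   An unknown type is only remembered, not reported. */
void note_type_use(const Symbol_list *symbol_table,
                   Symbol_list *undefined_symbols, const char *id)
{
    if (!symbol_list_contains(symbol_table, id))
        symbol_list_add(undefined_symbols, id);
}

/* Called once the whole datatype-system has been parsed.
   Reports every identifier that is still undefined, clears the
   list for the next system and returns the number of errors. */
int resolve_undefined(const Symbol_list *symbol_table,
                      Symbol_list *undefined_symbols)
{
    int errors = 0;
    for (int i = 0; i < undefined_symbols->count; i++) {
        const char *id = undefined_symbols->names[i];
        if (!symbol_list_contains(symbol_table, id)) {
            fprintf(stderr, "undefined type: %s\n", id);
            errors++;
        }
    }
    undefined_symbols->count = 0;
    return errors;
}
```

In this sketch a type that is used before its definition, but defined later in the same system, passes the final check without an error.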
Function-systems
If the current function within a function-system is not already stored in
the symbol_table, it is stored now.
Otherwise it is an error.
Each identifier of a parameter is stored in a binding_table.
Finally, the body is parsed.
Each identifier that is not a function application has to be found
in the symbol_table or the binding_table.
If not, it is an error.
Each identifier of a function application that is not already in the
symbol_table is stored in undefined_symbols.
After the whole function-system has been parsed, these identifiers are
checked to see whether they are now defined in the symbol_table.
Error messages
To be able to present the user with a meaningful error message, it is
necessary to store the line and column number of each identifier together
with it in undefined_symbols.
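One possible shape for such an entry is sketched below in plain C. The names Undefined_symbol and format_undefined_error and the message layout are assumptions made for illustration. Only the idea of keeping lineno and colno next to the identifier comes from the text above.

```c
#include <stdio.h>

/* Hypothetical entry of undefined_symbols: the identifier together
   with the position where it was read. */
typedef struct {
    const char *id;
    int lineno;   /* line where the identifier was read */
    int colno;    /* col where the identifier was read  */
} Undefined_symbol;

/* Writes a message of the form "line:col: undefined identifier 'id'"
   into buf and returns the result of snprintf. */
int format_undefined_error(char *buf, size_t size,
                           const Undefined_symbol *sym)
{
    return snprintf(buf, size, "%d:%d: undefined identifier '%s'",
                    sym->lineno, sym->colno, sym->id);
}
```

Because the position is stored at the time the identifier is read, the message can still point to the right place when the check runs at the end of the system.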
Type Checking
Type checking is done in a separate step by traversing the whole
syntax tree.
That keeps the YACC file from becoming too complex.
Building the syntax tree and checking the symbols makes it complex enough.
Another reason is that the syntax tree is built bottom-up, whereas
type checking is done top-down.
**** to be done ****
Memory Issues