Compiler for actsl, extremely small


echidna
October 27th, 2007, 01:23 PM
Well, I'll try here first; once I know what needs explaining, I may write it up in that sticky thread...

The source code is posted in comp.compilers:

http://compilers.iecc.com/comparch/article/07-10-055

The source is fewer than 300 lines of ordinary C. The actsl version is a bit longer, because actsl uses somewhat longer constructs than C. If you want to write your own compiler, I honestly don't know of a smaller one to start from. The compiler compiles itself, so you don't even need to know C: you can write a different or better compiler directly in actsl.

Now I'll try to explain what this compiler and language are about, though I'm not sure how well I'll succeed. The main principle is that the language follows directly from the properties of the computer, so that the compiler is minimal and has the clearest possible structure. Every token causes a direct action: for every token there is an if statement in the compiler, and that if statement generates the code needed to implement the token. To make the language more advanced, some minimal parsing could be added in the future, but as long as it stays truly minimal, the clear structure of the compiler should remain.
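To make the "one if statement per token" idea concrete, here is a minimal sketch in C. It is not taken from the actual actsl source: the function names emit_token and compile are hypothetical, and PUSH, ADD, SUB and MUL are invented mnemonics standing in for whatever assembly the real compiler writes.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Append pseudo-code for one token to out. Returns 0, or -1 for an
   unknown token. PUSH/ADD/SUB/MUL are invented mnemonics (assumption),
   not the assembly that actsl really generates. */
static int emit_token(const char *tok, char *out, size_t outsize)
{
    size_t used = strlen(out);
    char *p = out + used;
    size_t left = outsize - used;

    if (isdigit((unsigned char)tok[0])) {   /* constant: push its value */
        snprintf(p, left, "PUSH %s\n", tok);
    } else if (strcmp(tok, "+") == 0) {     /* one if per operator token */
        snprintf(p, left, "ADD\n");
    } else if (strcmp(tok, "-") == 0) {
        snprintf(p, left, "SUB\n");
    } else if (strcmp(tok, "*") == 0) {
        snprintf(p, left, "MUL\n");
    } else {
        return -1;
    }
    return 0;
}

/* Compile a whitespace-separated source string one token at a time,
   with no parse tree in between. Returns 0 on success, -1 on error. */
int compile(const char *src, char *out, size_t outsize)
{
    char buf[256];

    if (outsize == 0 || strlen(src) >= sizeof buf) return -1;
    out[0] = '\0';
    strcpy(buf, src);
    for (char *tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
        if (emit_token(tok, out, outsize) != 0) return -1;
    }
    return 0;
}
```

Calling compile("3 1 +", out, sizeof out) fills out with one line of pseudo-code per token, in the same order the tokens appeared. That ordering is the whole point: the output follows the input token by token.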

Because the compiler is extremely small, the language is minimal too: there are only a few syntax rules to remember, so it should be easy to learn. Despite being simple, it has expressions, functions, if statements, while loops and pointers, so almost anything that can be done in C should also be possible in actsl. The compiler produces quite fast machine-code executables. All standard C and POSIX functions can be called from actsl code (you don't have to include any header files for that). Arguments cannot be passed to actsl's own functions the same way, but they can be passed on the data stack.

So, on to the language. Most importantly, all expressions are in reverse Polish notation. This is similar to some other languages, such as Forth, but the use of the stack is somewhat more extended in actsl, which allows pushing some kinds of data onto the stack that Forth does not. Reverse Polish notation is really simple: the value of every constant or variable is pushed onto the stack when it appears; an operator works on the last two values on the stack, and when it produces a result, that result is pushed onto the stack too. So 3 - 1 in arithmetic notation becomes 3 1 -. We also need no parentheses, because we can leave a previous result on the stack, supply more values and operators, and finally apply an operator to the results of two earlier operations: (3 + 1) * 2 becomes 3 1 + 2 *. It's not very difficult, and in fact very convenient once you get used to it.

In actsl, though, the operators are not only mathematical and logical; there are also the operators deref, ref and =. By the common convention in math and programming, a variable stands for its value. But in programming every variable has one more property: its reference, or address. The deref operator treats the result of the previous expression (the last value on the stack) as the address of some variable and replaces it with the value of that variable (the value at that address). The ref operator does the opposite: it pushes the reference, or address, of a variable onto the stack. Unlike all other operators, ref is a prefix operator (it must come before its operand). This is what makes all the pointer operations possible, as in C.

These operators are closely related to the assignment operator =, which assigns the previous value on the stack to the last value on the stack (the order is kind of reversed, but it makes it possible to always take the last value on the stack). We can assign only to a reference (store something at some address), not to a value, so to assign something to a variable we must reference that variable first. For example, 3 ref x = stores 3 in x. I hope that makes sense.
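The stack behaviour described above can be tried out with a small interpreter. This is a rough sketch in C, not the actsl compiler, and it makes several assumptions: variables are the single letters a to z, an "address" is simply an index into a 26-element array, = consumes both the value and the address, and only the operators +, - and * are handled. The name run is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NVARS 26       /* variables a..z; an "address" is an index 0..25 */
#define STACK_MAX 64
#define SEP " \t\n"

static int valid_addr(int a) { return a >= 0 && a < NVARS; }

/* Run a whitespace-separated RPN program over vars[]. The value left on
   top of the stack (if any) is stored in *result. Returns 0, or -1 on
   any error. */
int run(const char *src, int vars[NVARS], int *result)
{
    int stack[STACK_MAX];
    int sp = 0;
    char buf[256];

    if (strlen(src) >= sizeof buf) return -1;
    strcpy(buf, src);

    for (char *tok = strtok(buf, SEP); tok; tok = strtok(NULL, SEP)) {
        if (strcmp(tok, "ref") == 0) {
            tok = strtok(NULL, SEP);            /* prefix: operand follows */
            if (!tok || !islower((unsigned char)tok[0]) || tok[1] != '\0') return -1;
            if (sp == STACK_MAX) return -1;
            stack[sp++] = tok[0] - 'a';         /* push the address */
        } else if (strcmp(tok, "deref") == 0) {
            if (sp < 1 || !valid_addr(stack[sp - 1])) return -1;
            stack[sp - 1] = vars[stack[sp - 1]]; /* address -> value */
        } else if (strcmp(tok, "=") == 0) {
            if (sp < 2 || !valid_addr(stack[sp - 1])) return -1;
            vars[stack[sp - 1]] = stack[sp - 2]; /* previous value -> last address */
            sp -= 2;                             /* assumption: = consumes both */
        } else if (tok[1] == '\0' && strchr("+-*", tok[0])) {
            if (sp < 2) return -1;
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = tok[0] == '+' ? a + b : tok[0] == '-' ? a - b : a * b;
        } else if (isdigit((unsigned char)tok[0])) {
            if (sp == STACK_MAX) return -1;
            stack[sp++] = atoi(tok);             /* constant: push value */
        } else if (islower((unsigned char)tok[0]) && tok[1] == '\0') {
            if (sp == STACK_MAX) return -1;
            stack[sp++] = vars[tok[0] - 'a'];    /* variable: push its value */
        } else {
            return -1;
        }
    }
    if (result && sp > 0) *result = stack[sp - 1];
    return 0;
}
```

With a zeroed vars array, run("3 1 + 2 *", vars, &r) leaves 8 in r, and run("5 ref x = x 1 +", vars, &r) first stores 5 in x and then leaves 6 in r. A plain x pushes the value, while ref x pushes the address, which is why the assignment needs ref.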

Well then, what else... At present there is only one variable type, int (the declaration int xn says that xn is an int variable), which means 32 bits on a 32-bit machine. This is like many compilers once written for small 8-bit computers that had no real arithmetic processor, where real-number arithmetic had to be done either with BCD or with special libraries. Here it can be done with the GMP library, or by piping through dc or bc if there isn't much of it. Implementing floating-point variables is not terribly complicated, but then a symbol table would be needed and the compiler might grow to some 500 lines, which is of course not big at all, but it would no longer be minimal. It also means that there are no arrays, including character arrays, so arrays must be allocated with malloc before use: 100 arg call malloc ref buf = assigns to buf an array 100 bytes (not ints!) long.
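For readers who think in C, the malloc line above corresponds roughly to the following sketch. The helper names alloc_bytes, put_byte and get_byte are hypothetical, and intptr_t is used here only as a portable stand-in for actsl's int, which can hold an address on a 32-bit machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rough C equivalent of:  100 arg call malloc ref buf =
   The address returned by malloc is kept in an integer variable.
   Returns 0 if the allocation fails. */
intptr_t alloc_bytes(size_t nbytes)
{
    return (intptr_t)malloc(nbytes);
}

/* Store one byte at offset i of a buffer whose address is held in an integer. */
void put_byte(intptr_t buf, size_t i, unsigned char value)
{
    ((unsigned char *)buf)[i] = value;
}

/* Load one byte from offset i of such a buffer. */
unsigned char get_byte(intptr_t buf, size_t i)
{
    return ((unsigned char *)buf)[i];
}
```

Note that the 100 is a byte count: with 4-byte ints, such a buffer has room for 25 ints, not 100.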

Well, what else... To call an external function, we provide the arguments in reverse order, with the operator arg after every argument to push it onto the processor stack (which is not the data stack), and then write call funname. For actsl functions we simply don't provide arguments. Such a function call may appear in any expression, and the return value is pushed onto the stack. This happens for every function, so remember to use the value, i.e. assign it to some variable, so that such values don't remain hanging on the stack.
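Why reverse order? Under the usual C calling convention on 32-bit x86 (cdecl), arguments are pushed right to left, so that the first argument ends up nearest the top of the stack. The toy model below illustrates just that ordering. It is a sketch, not how actsl or the processor is implemented: the names pstack, push_arg and param are hypothetical, and the return address that a real call also pushes is ignored.

```c
#define PSTACK_MAX 16

/* A toy processor stack; slot[top - 1] is the most recently pushed value. */
struct pstack {
    int slot[PSTACK_MAX];
    int top;
};

/* Like the arg operator: push one argument. Returns 0, or -1 if full. */
int push_arg(struct pstack *s, int value)
{
    if (s->top == PSTACK_MAX) return -1;
    s->slot[s->top++] = value;
    return 0;
}

/* The callee's view: parameter n (0 = first) is counted down from the top.
   Stores the value in *out and returns 0, or returns -1 if n is out of range. */
int param(const struct pstack *s, int n, int *out)
{
    if (n < 0 || n >= s->top) return -1;
    *out = s->slot[s->top - 1 - n];
    return 0;
}
```

To model a call f(10, 20, 30), push 30, then 20, then 10. The callee then finds 10 as parameter 0 and 30 as parameter 2, which is the order a C function expects.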

Then what else... Defining a function is simple: start with fun funname and end with endfun; the return operator returns from the function. The if statement is simple too: if expression then statements endif. The while statement has the form while expression do statements endwhile, and inside it you can use continue to proceed to the next iteration and break to exit the innermost loop.
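A token-at-a-time compiler can handle while, do, continue, break and endwhile without any parse tree by keeping a small stack of open loops. The sketch below shows one way this could work; it is an assumption about the general technique, not the actual actsl code. The names gen and loop_token, the label names top and end, and the JZ and JMP mnemonics are all invented for illustration.

```c
#include <stdio.h>
#include <string.h>

#define DEPTH_MAX 16

struct gen {
    int loop[DEPTH_MAX];  /* label number of each currently open loop */
    int depth;            /* number of open loops */
    int next_label;       /* next unused label number */
    char out[512];        /* generated pseudo-code */
};

static void emit_label(struct gen *g, const char *name, int n)
{
    size_t used = strlen(g->out);
    snprintf(g->out + used, sizeof g->out - used, "%s%d:\n", name, n);
}

static void emit_jump(struct gen *g, const char *op, const char *name, int n)
{
    size_t used = strlen(g->out);
    snprintf(g->out + used, sizeof g->out - used, "%s %s%d\n", op, name, n);
}

/* Handle one loop-related token. Returns 0, or -1 if the token is
   unknown or appears outside any loop. */
int loop_token(struct gen *g, const char *tok)
{
    if (strcmp(tok, "while") == 0) {
        if (g->depth == DEPTH_MAX) return -1;
        g->loop[g->depth++] = g->next_label++;
        emit_label(g, "top", g->loop[g->depth - 1]);  /* condition starts here */
        return 0;
    }
    if (g->depth == 0) return -1;

    int n = g->loop[g->depth - 1];                    /* innermost loop */
    if (strcmp(tok, "do") == 0) {
        emit_jump(g, "JZ", "end", n);                 /* condition false: leave */
    } else if (strcmp(tok, "continue") == 0) {
        emit_jump(g, "JMP", "top", n);                /* next iteration */
    } else if (strcmp(tok, "break") == 0) {
        emit_jump(g, "JMP", "end", n);                /* exit innermost loop */
    } else if (strcmp(tok, "endwhile") == 0) {
        emit_jump(g, "JMP", "top", n);                /* loop back */
        emit_label(g, "end", n);
        g->depth--;
    } else {
        return -1;
    }
    return 0;
}
```

Start from a zero-initialized struct gen and feed it the loop tokens in source order. Because break and continue always look at the top of the loop stack, they affect only the innermost loop, which matches the behaviour described above.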

To compile the compiler, just copy the first program from the archive to actsl.c and use the command gcc actsl.c -o actsl. You can compile the stage 2 compiler too, using the compiler itself for that. To compile an actsl program, just run the compiler; it asks for the input file and the output file. The output file is assembly code, usually with the extension s. Then gcc something.s -o something creates an executable called something, which you can run with ./something. I hope I managed to say everything important, but if not, just ask. Anyway, have fun!