A pure imperative language
% Remco Bloemen % 20100216, last updated 20140304
In the previous post I developed a virtual machine for a silly language (Brainfuck) in a silly language (C++). This post is documents my attempt to create a language that is not silly. I’ll start with three usecases, show their C++ implementation and then start mutulating the C++ syntax.
Three challenging functions
The following three functions are my favorites when trying out a new language. You’ll be supprised at how often one or more of these examples are simply impossible in a language, yet they represent very common mathematical operations.
The first challenge is recursion:
int factorial(int n)
{
if(n == 1)
return 1;
return n * fac(n  1);
}
I could make it more challening by requiring guaranteed tail call optimization, but for now let’s just leave it at supporting recursive functions.
The second challenge is multiple return values:
void divide(int& quotient, int& remainder, int dividend, int divisor)
{
quotient = 0;
remainder = dividend;
while(remainder >= divisor) {
remainder = divisor;
++quotient;
}
}
The C++ solution is functional, but syntacticly it mixes inputs and outputs.
The third challenge is first class functions:
typedef double (*function)(double);
function makeDerivative(function f, double dx)
{
return delegate double (double x) {
return (f(x + dx)  f(x)) / dx;
};
}
Note that this last example is not valid
C! In fact, I doubt there is a way to express this function in C
without calling an external compiler and linking in the result.
Developing a solution
Since this is supposed to become a practical solution, I’ll start with a pragmatic language: C++. In a few steps I will completely disfigure this language, hopefully for the better.
Step zero: removing the sugar
Templates in C++ are little more than statically typed macros. Remove
them. Object oriented programming is little more than implicit passing
of a this
pointer to member functions. Type inheritance and virtuals
functions are a bit more complex, but these can be simultated with first
class functions. Oh yes and remove the whole preprocessor.
We are now basically left with C again, but without a few of the problems that C has. We can always add a new sugar coating later, but first we’ll redesign the cake.
Step one: multiple return values
The second challenge is probably the easiest, so I’ll start with this one. Let’s extend C++ in the following way:
(int quotient, int remainder)divide(int dividend, int divisor)
{
quotient = 0;
remainder = dividend;
while(remainder >= divisor) {
remainder = divisor;
++quotient;
}
}
// Function call syntax
(q, r) = divide(13, 4);
In summary: return values are named, return values are assigned as local
variables, return
no longer takes an argument, function calls take the
form () = ();
. Nesting of function calls like sqrt(sin(x)+cos(x))
is
no longer supported (altough this could easily be reintroduced with some
syntactic sugar).
Step two: pure functions
For the sake of clarity and tractability, let’s make the all functions pure. That is, all inputs are constant or unchangable.
Pure functions make it difficult to interact with stateful objects.
These not only include objects from objectoriented programming, but
also thing such as IO operations. In fact, functions like
currentTime()
and randomNumber()
are by their nature impure.
One solution is to introduce an explicit state variable:
(Object this, int returnValue) ObjectMemberFunction (Object this);
(pipe state, string input) readInput (pipe state);
(clockType state, timeType time) currentTime (clockType state);
(RNG state, int number) randomNumber (RNG state, int min, int max);
This will all be hidden under a sugar coating later. I have not yet tought about how to incorporate multithreaded memory access into this.
Step three: lambda expressions
The last challenge requires functions as values and thus the creation of anonymous functions.
I’ll introduce the following expression to create anonymous functions
( <outputs> ) λ ( <inputs> )
{
<statements>
}
Now challenge three can be implemented as:
typedef (double)*function(double);
(function derivative) makeDerivative (function f, double dx)
{
derivative = (double x) λ (double dfdx)
{
dfdx = (f(x + dx)  f(x)) / dx;
}
}
Step four: only lambda expressions
We are now in the situation where the following two pieces of code are equivalent:
(double y) addFive (double x)
{
y = x + 5;
}
addFive = (double y) λ (double x)
{
y = x + 5;
}
Redundancy in notation is never good, it reflects a discrepancy between syntax and semantics. I’ll remove the redundancy by elliminating the first form. All functions are to be declared via the λ expression.
Step five: reintroducing recursion
The last step introduced a problem, we can no longer define recursive functions! Look what happens:
factorial = (int f)λ(int n)
{
if(n == 0)
f = 1;
else
f = n * factorial(n  1);
}
The problem is, factorial
is not yet defined when inside the function.
This is the problem of recursive anonymous lambda expressions. The
common way to solve this is using a fixed point
combinator (see
also this article in
lisp
and the F♯
version).
A practical way to solve this is by introducing the self
keyword for
self referencing:
factorial = (int f)λ(int n)
{
if(n == 0)
f = 1;
else
f = n * self(n  1);
}
Note that I am breaking my own function calling rule for the sake of clarity. To enforce the rule one would need to explicitly introduce temporary variables. The correct version would be:
factorial = (int f)λ(int n)
{
bool isZero;
(isZero) = equals(n, 0);
if(isZero) {
(f) = identity(1);
} else {
int prevN;
(prevN) = subtract(n, 1);
(prevF) = self(prevN);
(f) = multiply(n, prevF);
}
}
Step six: syntax change
Since this language has become an odd one in the curly bracket family we might as well change all of the syntax:
Design principles:

Very simple grammar

Strong emphasize on following common mathematical syntax

Symetric infix operators have symetric symbols

Assymetric infix operators have assymetric symbols, of which the mirror image works as espected
Functionality Old notation New notation
Assignment x = y
x ↤ y
Anonymous tuple type (type, type, …)
(type, type, …)
Anonymous tuple (value, value, …)
(value, value, …)
Named tuples (type name, type name, …)
(name: type, name: type
Function type +outputs *inputs + →
Function definition { }
+name ↤ λ inputs → outputs { statements  }
Function call =
↤
Where inputs and outputs are anonymous tuples types when in a function type, named tuples in a function definition and anonymous tuples in a function call.
The factorial example now becomes:
factorial ↤ λ (n: int) → (f: int)
{
isZero ↤ equals(n, 0)
if(isZero) {
f ↤ 1
} else {
f ↤ n × self(n  1)
}
}
Step seven: the if metafunction
In the above example still contains silly syntaxt like
if (<em>bool</em>) {<em>statements</em> } else {<em>statements</em> }
.
This can be replaced by a simple metafunction:
if ↤ λ (condition: bool, then: T, else: T) → (output: T)
Depending on condition
this function returns either then
or else
.
Now, the naïve way to employ this functions is:
factorial ↤ λ (n: int) → (f: int)
{
f ↤ n × self(n  1)
isZero ↤ equals(n, 0)
f ↤ if(isZero, 1, f)
}
But this will lead to infinite recursion. The correct way is to make use of the firstclass functions:
factorial ↤ λ (n: int) → (f: int)
{
then ↤ λ (n: int) → (f: int)
{
f ↤ 1
}
else ↤ λ (n: int) → (f: int)
{
f ↤ n × self(n  1)
}
isZero ↤ equals(n, 0)
ff ↤ if(isZero, then, else)
f ↤ ff(n)
}
Step eight: the while metafunction
Another way to implement the factorial function is using a loop:
factorial ↤ λ (n: int) → (f: int)
{
f ↤ 1
do {
f ↤ n × f
n ↤ n  1
repeat ↤ ni > 1
} while (repeat)
}
But this do { <em>statements</em> } while (<em>condition</em>)
polutes
the language. It can be replaced by the following meta function:
while ↤ λ (loopState: T, loop: (loopState: T) → (loopState: T, repeat: bool)) → (output: T)
This function starts with loopState
and then itterates the function
loop
on it untill this function sets repeat
to false. The factorial
can now be implemented as:
factorial ↤ λ (n: int) → (f: int)
{
loop ↤ λ ((ni: int, fi: int)) → ((no: int, fo: int), repeat: bool)
{
fo ↤ ni × fi
no ↤ ni  1
repeat ↤ ni > 1
}
f ↤ while((n, 1), loop)
}
All the other control flow constructions such as switch
and for
can
be trivially rewritten using if
and while
.
to be continued…
That’s enough for now. Some things for the future:
 Formal grammar and type system
 Parser and compiler (using Quex and LLVM)
 Explicit internal and external variables to functions
 Generics
 Homoiconicity
 Ability to extend parser and compiler in language and at runtime
 A large coating of syntactic sugar developed in language
 Arrays, list, containers, etc…
 Embedded proofs and a verifying compiler
 Selfhosting
 Selfawareness