2010-02-16 (updated 2014-03-04)

A pure imperative language

% Remco Bloemen % 2010-02-16, last updated 2014-03-04

In the previous post I developed a virtual machine for a silly language (Brainfuck) in a silly language (C++). This post is documents my attempt to create a language that is not silly. I’ll start with three use-cases, show their C++ implementation and then start mutulating the C++ syntax.

Three challenging functions

The following three functions are my favorites when trying out a new language. You’ll be supprised at how often one or more of these examples are simply impossible in a language, yet they represent very common mathematical operations.

The first challenge is recursion:

int factorial(int n)
{
	if(n == 1)
		return 1;
	return n * fac(n - 1);
}

I could make it more challening by requiring guaranteed tail call optimization, but for now let’s just leave it at supporting recursive functions.

The second challenge is multiple return values:

void divide(int& quotient, int& remainder,  int dividend, int divisor)
{
	quotient = 0;
	remainder = dividend;
	while(remainder >= divisor) {
		remainder -= divisor;
		++quotient;
	}
}

The C++ solution is functional, but syntacticly it mixes inputs and outputs.

The third challenge is first class functions:

typedef double (*function)(double);

function makeDerivative(function f, double dx)
{
	return delegate double (double x) {
		return (f(x + dx) - f(x)) / dx;
	};
}

Note that this last example is not valid C! In fact, I doubt there is a way to express this function in C without calling an external compiler and linking in the result.

Developing a solution

Since this is supposed to become a practical solution, I’ll start with a pragmatic language: C++. In a few steps I will completely disfigure this language, hopefully for the better.

Step zero: removing the sugar

Templates in C++ are little more than statically typed macros. Remove them. Object oriented programming is little more than implicit passing of a this pointer to member functions. Type inheritance and virtuals functions are a bit more complex, but these can be simultated with first class functions. Oh yes and remove the whole pre-processor.

We are now basically left with C again, but without a few of the problems that C has. We can always add a new sugar coating later, but first we’ll redesign the cake.

Step one: multiple return values

The second challenge is probably the easiest, so I’ll start with this one. Let’s extend C++ in the following way:

(int quotient, int remainder)divide(int dividend, int divisor)
{
	quotient = 0;
	remainder = dividend;
	while(remainder >= divisor) {
		remainder -= divisor;
		++quotient;
	}
}

// Function call syntax
(q, r) = divide(13, 4);

In summary: return values are named, return values are assigned as local variables, return no longer takes an argument, function calls take the form () = ();. Nesting of function calls like sqrt(sin(x)+cos(x)) is no longer supported (altough this could easily be reintroduced with some syntactic sugar).

Step two: pure functions

For the sake of clarity and tractability, let’s make the all functions pure. That is, all inputs are constant or unchangable.

Pure functions make it difficult to interact with stateful objects. These not only include objects from object-oriented programming, but also thing such as IO operations. In fact, functions like currentTime() and randomNumber() are by their nature impure.

One solution is to introduce an explicit state variable:

(Object this, int returnValue) ObjectMemberFunction (Object this);

(pipe state, string input) readInput (pipe state);

(clockType state, timeType time) currentTime (clockType state);

(RNG state, int number) randomNumber (RNG state, int min, int max);

This will all be hidden under a sugar coating later. I have not yet tought about how to incorporate multi-threaded memory access into this.

Step three: lambda expressions

The last challenge requires functions as values and thus the creation of anonymous functions.

I’ll introduce the following expression to create anonymous functions

( <outputs> ) λ ( <inputs> )
{
	<statements>
}

Now challenge three can be implemented as:

typedef (double)*function(double);

(function derivative) makeDerivative (function f, double dx)
{
	derivative = (double x) λ (double dfdx)
	{
		dfdx = (f(x + dx) - f(x)) / dx;
	}
}

Step four: only lambda expressions

We are now in the situation where the following two pieces of code are equivalent:

(double y) addFive (double x)
{
	y = x + 5;
}

addFive = (double y) λ (double x)
{
	y = x + 5;
}

Redundancy in notation is never good, it reflects a discrepancy between syntax and semantics. I’ll remove the redundancy by elliminating the first form. All functions are to be declared via the λ expression.

Step five: re-introducing recursion

The last step introduced a problem, we can no longer define recursive functions! Look what happens:

factorial = (int f)λ(int n)
{
	if(n == 0)
		f = 1;
	else
		f = n * factorial(n - 1);
}

The problem is, factorial is not yet defined when inside the function. This is the problem of recursive anonymous lambda expressions. The common way to solve this is using a fixed point combinator (see also this article in lisp and the F♯ version).

A practical way to solve this is by introducing the self keyword for self referencing:

factorial = (int f)λ(int n)
{
	if(n == 0)
		f = 1;
	else
		f = n * self(n - 1);
}

Note that I am breaking my own function calling rule for the sake of clarity. To enforce the rule one would need to explicitly introduce temporary variables. The correct version would be:

factorial = (int f)λ(int n)
{
	bool isZero;
	(isZero) = equals(n, 0);
	if(isZero) {
		(f) = identity(1);
	} else {
		int prevN;
		(prevN) = subtract(n, 1);
		(prevF) = self(prevN);
		(f) = multiply(n, prevF);
	}
}

Step six: syntax change

Since this language has become an odd one in the curly bracket family we might as well change all of the syntax:

Design principles:

Very simple grammar
Strong emphasize on following common mathematical syntax
Symetric infix operators have symetric symbols
Assymetric infix operators have assymetric symbols, of which the mirror image works as espected

Functionality Old notation New notation

Assignment x = y x ↤ y Anonymous tuple type (type, type, …) (type, type, …) Anonymous tuple (value, value, …) (value, value, …) Named tuples (type name, type name, …) (name: type, name: type Function type +outputs *inputs + → Function definition { } +name ↤ λ inputs → outputs { statements | } Function call = ↤

Where inputs and outputs are anonymous tuples types when in a function type, named tuples in a function definition and anonymous tuples in a function call.

The factorial example now becomes:

factorial ↤ λ (n: int) → (f: int)
{
	isZero ↤ equals(n, 0)
	if(isZero) {
		f ↤ 1
	} else {
		f ↤ n × self(n - 1)
	}
}

Step seven: the if meta-function

In the above example still contains silly syntaxt like if (bool) {statements } else {statements }. This can be replaced by a simple meta-function:

if ↤ λ (condition: bool, then: T, else: T) → (output: T)

Depending on condition this function returns either then or else. Now, the naïve way to employ this functions is:

factorial ↤ λ (n: int) → (f: int)
{
	f ↤ n × self(n - 1)
	isZero ↤ equals(n, 0)
	f ↤ if(isZero, 1, f)
}

But this will lead to infinite recursion. The correct way is to make use of the first-class functions:

factorial ↤ λ (n: int) → (f: int)
{
	then ↤ λ (n: int) → (f: int)
	{
		f ↤ 1
	}
	else ↤ λ (n: int) → (f: int)
	{
		f ↤ n × self(n - 1)
	}
	isZero ↤ equals(n, 0)
	ff ↤ if(isZero, then, else)
	f ↤ ff(n)
}

Step eight: the while meta-function

Another way to implement the factorial function is using a loop:

factorial ↤ λ (n: int) → (f: int)
{
	f ↤ 1
	do {
		f ↤ n × f
		n ↤ n - 1
		repeat ↤ ni > 1
	} while (repeat)
}

But this do { statements } while (condition) polutes the language. It can be replaced by the following meta function:

while ↤ λ (loopState: T, loop: (loopState: T) → (loopState: T, repeat: bool)) → (output: T)

This function starts with loopState and then itterates the function loop on it untill this function sets repeat to false. The factorial can now be implemented as:

factorial ↤ λ (n: int) → (f: int)
{
	loop ↤ λ ((ni: int, fi: int)) → ((no: int, fo: int), repeat: bool)
	{
		fo ↤ ni × fi
		no ↤ ni - 1
		repeat ↤ ni > 1
	}
	f ↤ while((n, 1), loop)
}

All the other control flow constructions such as switch and for can be trivially rewritten using if and while.

to be continued…

That’s enough for now. Some things for the future:

Formal grammar and type system
Parser and compiler (using Quex and LLVM)
Explicit internal and external variables to functions
Generics
Homoiconicity
Ability to extend parser and compiler in language and at runtime
A large coating of syntactic sugar developed in language
Arrays, list, containers, etc…
Embedded proofs and a verifying compiler
Self-hosting
Self-awareness