# A pure imperative language

2010-02-16, last updated 2014-03-04

In the previous post I developed a virtual machine for a silly language (Brainfuck) in a silly language (C++). This post is documents my attempt to create a language that is not silly. I’ll start with three use-cases, show their C++ implementation and then start mutulating the C++ syntax.

## Three challenging functions

The following three functions are my favorites when trying out a new language. You’ll be supprised at how often one or more of these examples are simply impossible in a language, yet they represent very common mathematical operations.

The first challenge is recursion:

I could make it more challening by requiring guaranteed tail call optimization, but for now let’s just leave it at supporting recursive functions.

The second challenge is multiple return values:

```
void divide(int& quotient, int& remainder, int dividend, int divisor)
{
quotient = 0;
remainder = dividend;
while(remainder >= divisor) {
remainder -= divisor;
++quotient;
}
}
```

The C++ solution is functional, but syntacticly it mixes inputs and outputs.

The third challenge is first class functions:

```
typedef double (*function)(double);
function makeDerivative(function f, double dx)
{
return delegate double (double x) {
return (f(x + dx) - f(x)) / dx;
};
}
```

Note that this last example is not valid C`! In fact, I doubt there is a way to express this function in C`

without calling an external compiler and linking in the result.

## Developing a solution

Since this is supposed to become a practical solution, I’ll start with a pragmatic language: C++. In a few steps I will completely disfigure this language, hopefully for the better.

### Step zero: removing the sugar

Templates in C++ are little more than statically typed macros. Remove them. Object oriented programming is little more than implicit passing of a `this`

pointer to member functions. Type inheritance and virtuals functions are a bit more complex, but these can be simultated with first class functions. Oh yes and remove the whole pre-processor.

We are now basically left with C again, but without a few of the problems that C has. We can always add a new sugar coating later, but first we’ll redesign the cake.

### Step one: multiple return values

The second challenge is probably the easiest, so I’ll start with this one. Let’s extend C++ in the following way:

```
(int quotient, int remainder)divide(int dividend, int divisor)
{
quotient = 0;
remainder = dividend;
while(remainder >= divisor) {
remainder -= divisor;
++quotient;
}
}
// Function call syntax
(q, r) = divide(13, 4);
```

In summary: return values are named, return values are assigned as local variables, `return`

no longer takes an argument, function calls take the form `() = ();`

. Nesting of function calls like `sqrt(sin(x)+cos(x))`

is no longer supported (altough this could easily be reintroduced with some syntactic sugar).

### Step two: pure functions

For the sake of clarity and tractability, let’s make the all functions pure. That is, all inputs are constant or unchangable.

Pure functions make it difficult to interact with stateful objects. These not only include objects from object-oriented programming, but also thing such as IO operations. In fact, functions like `currentTime()`

and `randomNumber()`

are by their nature impure.

One solution is to introduce an explicit state variable:

```
(Object this, int returnValue) ObjectMemberFunction (Object this);
(pipe state, string input) readInput (pipe state);
(clockType state, timeType time) currentTime (clockType state);
(RNG state, int number) randomNumber (RNG state, int min, int max);
```

This will all be hidden under a sugar coating later. I have not yet tought about how to incorporate multi-threaded memory access into this.

### Step three: lambda expressions

The last challenge requires functions as values and thus the creation of anonymous functions.

I’ll introduce the following expression to create anonymous functions

Now challenge three can be implemented as:

### Step four: only lambda expressions

We are now in the situation where the following two pieces of code are equivalent:

Redundancy in notation is never good, it reflects a discrepancy between syntax and semantics. I’ll remove the redundancy by elliminating the first form. All functions are to be declared via the λ expression.

### Step five: re-introducing recursion

The last step introduced a problem, we can no longer define recursive functions! Look what happens:

The problem is, `factorial`

is not yet defined when inside the function. This is the problem of recursive anonymous lambda expressions. The common way to solve this is using a fixed point combinator (see also this article in lisp and the F♯ version).

A practical way to solve this is by introducing the `self`

keyword for self referencing:

Note that I am breaking my own function calling rule for the sake of clarity. To enforce the rule one would need to explicitly introduce temporary variables. The correct version would be:

### Step six: syntax change

Since this language has become an odd one in the curly bracket family we might as well change all of the syntax:

Design principles:

Very simple grammar

Strong emphasize on following common mathematical syntax

Symetric infix operators have symetric symbols

Assymetric infix operators have assymetric symbols, of which the mirror image works as espected

Functionality Old notation | New notation |
---|---|

Assignment `x = y` |
`x ↤ y` |

Anonymous tuple type `(type, type, …)` |
`(type, type, …)` |

Anonymous tuple `(value, value, …)` |
`(value, value, …)` |

Named tuples `(type name, type name, …) | `(name: type, name: type |

Function type +outputs **inputs* + |
`→` |

Function definition `{ }` |
+name ↤ λ inputs → outputs { statements | } |

Function call `=` |
`↤` |

Where *inputs* and *outputs* are anonymous tuples types when in a function type, named tuples in a function definition and anonymous tuples in a function call.

The factorial example now becomes:

### Step seven: the *if* meta-function

In the above example still contains silly syntaxt like `if (<em>bool</em>) {<em>statements</em> } else {<em>statements</em> }`

. This can be replaced by a simple meta-function:

Depending on `condition`

this function returns either `then`

or `else`

. Now, the naïve way to employ this functions is:

```
factorial ↤ λ (n: int) → (f: int)
{
f ↤ n × self(n - 1)
isZero ↤ equals(n, 0)
f ↤ if(isZero, 1, f)
}
```

But this will lead to infinite recursion. The correct way is to make use of the first-class functions:

### Step eight: the *while* meta-function

Another way to implement the factorial function is using a loop:

```
factorial ↤ λ (n: int) → (f: int)
{
f ↤ 1
do {
f ↤ n × f
n ↤ n - 1
repeat ↤ ni > 1
} while (repeat)
}
```

But this `do { <em>statements</em> } while (<em>condition</em>)`

polutes the language. It can be replaced by the following meta function:

This function starts with `loopState`

and then itterates the function `loop`

on it untill this function sets `repeat`

to false. The factorial can now be implemented as:

```
factorial ↤ λ (n: int) → (f: int)
{
loop ↤ λ ((ni: int, fi: int)) → ((no: int, fo: int), repeat: bool)
{
fo ↤ ni × fi
no ↤ ni - 1
repeat ↤ ni > 1
}
f ↤ while((n, 1), loop)
}
```

All the other control flow constructions such as `switch`

and `for`

can be trivially rewritten using `if`

and `while`

.

## to be continued…

That’s enough for now. Some things for the future:

- Formal grammar and type system
- Parser and compiler (using Quex and LLVM)
- Explicit internal and external variables to functions
- Generics
- Homoiconicity
- Ability to extend parser and compiler
*in language*and at runtime - A large coating of syntactic sugar developed in language
- Arrays, list, containers, etc…
- Embedded proofs and a verifying compiler
- Self-hosting
- Self-awareness