Optimizing brainfuck compiler using LLVM

Remco Bloemen

2010-02-27, last updated 2014-03-04

Optimizing brainfuck compiler using LLVM

LLVM is currently the hype in the world of compilers. This C++ based compiler infrastruture is hard on its way to replace the Gnu Compiler Collection. In contrast with GCC LLVM has a nice C++ interface wich allows you to easily implement your own language, or so I thought.

It turns out that LLVM is still somewhat of a moving target and examples from previous versions no longer compile. It was quite a struggle to get the simplest example to work since the API is very unintuitive and complex at first.

In the end I managed to get it working.

Then I played with it some more. Trying out other ways to translate brainfuck to LLVM IR.

In the end I found that I could get remarkable optimzation results. The key was:

If you have done all that, then the compiler is capable of compiling the following brainfuck “Hello Wordl!”-program:

>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>
++++++++[<++++>-]<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++
.------.--------.>>+.

into

which is optimal! All the loops and operations are elliminated! Compare that with the esotope compiler!

Benchmark!

Okay, now that I have my super-optimizing llvm-based brainfuck compiler it is time to benchmark it against the direct-threaded virtual machine I developed earlier.

The direct threaded VM has two modes; non-optimizing where it executes brainfuck instructions as-is and optimizing where it replaces certain sequences of instructions with more efficient onces.

The llvm jit-compiler has a broad selection of optimization passes that can be chained in uncountable ways. After a lot of trial and error I found the chain presented in the source below. In the benchmark I also included no optimization passes, only an instruction combining pass, the default selection for amd64, the aggressive default selection for amd64 and my own extensive chain.

The first benchmark program, Bottles, prints the lyric to “99 bottles of beer”. Developed by Andrew Paczkowski. origin.

The second, Hanoi, solves the three peg nine disk towers of Hanoi problem and animates the solution with ascii art. Developed by Clifford Wolf. origin

Last, Mandelbrot, calculates a 512×192 mandelbrot fractal and displays it with ascii characters. Developed by Erik Bosman. origin

Virtual machine Optimization Bottles Hanoi Mandelbrot
direct threaded none 0.008 29.647 1978.041
direct threaded simple 0.017 1.091 273.033
llvm jit none 0.129 11.731 111.611
llvm jit combining pass 0.121 7.245 52.894
llvm jit default 0.157 9.491 42.440
llvm jit aggressive 0.174 9.618 42.048
llvm jit extensive 0.375 50.141 37.745

The results are in seconds and include everything from starting the compiler until the exit of the brainfuck program.

One result stands out: the extensively optimized hanoi takes a lot of time! Well, actually, the optimization takes a lot of time, the resulting program executes in less than a second. Further analysis (using callgrind, callgraph shown below) showed that all the time was spend in the “Loop Invariant Code Motion” pass. This pass is largely responsible for the incredible optimization result shown earlier. I suspect that LLVM was also able to reduce hanoi to a near minimal set of output instructions, although the resulting code is too messy to confirm this.

Hanoi case callgraph
Hanoi case callgraph

Source!

As always, here is the source:

#include <stack>
#include <fstream>
#include <llvm/Module.h>
#include <llvm/Function.h>
#include <llvm/PassManager.h>
#include <llvm/CallingConv.h>
#include <llvm/Analysis/Verifier.h>
#include <llvm/Support/IRBuilder.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/Target/TargetData.h>
#include <llvm/ModuleProvider.h>
#include <llvm/LinkAllPasses.h>
#include <llvm/ExecutionEngine/JIT.h>
#include <llvm/Target/TargetSelect.h>
using namespace std;
using namespace llvm;

struct MyLoopInfo
{
    Value* beforeValue;
    PHINode* startValue;
    Value* endValue;
    Value* afterValue;

    BasicBlock* beforeBlock;
    BasicBlock* startBlock;
    BasicBlock* endBlock;
    BasicBlock* afterBlock;
};

Function* makeFunc(Module* module, const char* source, int tapeSize = 300)
{
    // Some useful types and constants
    const Type* voidType = Type::getVoidTy(getGlobalContext());
    const IntegerType* cellType = IntegerType::get(getGlobalContext(), 8);
    const IntegerType* indexType = IntegerType::get(getGlobalContext(), 32);
    const PointerType* tapeType = PointerType::get(cellType, 0);
    Value* zero = ConstantInt::get(cellType, 0);
    Value* one = ConstantInt::get(cellType, 1);
    Value* minOne = ConstantInt::get(cellType, -1);

    //declare i32 @getchar()
    Function* getchar = cast<Function>(module->getOrInsertFunction("getchar", cellType, NULL));
    getchar->setCallingConv(CallingConv::C);

    //declare i32 @putchar(i32)
    Function* putchar = cast<Function>(module->
    getOrInsertFunction("putchar", voidType, cellType, NULL));
    putchar->setCallingConv(CallingConv::C);

    // Contruct void main(char* tape)
    Function* main = cast<Function>(module->getOrInsertFunction("main", voidType, NULL));
    main->setCallingConv(CallingConv::C);
    BasicBlock* block = BasicBlock::Create(getGlobalContext(), "code", main);
    stack<MyLoopInfo> loops;
    IRBuilder<> codeIR(block);
    Value* head = codeIR.CreateAlloca(cellType, ConstantInt::get(indexType, tapeSize));
    Value* it = head;
    for(int i=0; i < tapeSize; i++)
    {
        codeIR.CreateStore(zero, it);
        it = codeIR.CreateGEP(it, one);
    }
    while(*source)
    {
    IRBuilder<> builder(block);
    switch(*source++)
    {
        case '>': head = builder.CreateGEP(head, one); break;
        case '<': head = builder.CreateGEP(head, minOne); break;
        case '+':
        {
            Value* headValue = builder.CreateLoad(head);
            Value* result = builder.CreateAdd(headValue, one);
            builder.CreateStore(result, head);
            break;
        }
        case '-':
        {
            Value* headValue = builder.CreateLoad(head);
            Value* result = builder.CreateSub(headValue, one);
            builder.CreateStore(result, head);
            break;
        }
        case '.':
        {
            Value* output = builder.CreateLoad(head);
            builder.CreateCall(putchar, output);
            break;
        }
        case ',':
        {
            Value* input = builder.CreateCall(getchar);
            builder.CreateStore(input, head);
            break;
        }
        case '[':
        {
            // Construct loop info
            MyLoopInfo loop;
            loop.beforeBlock = block;
            loop.startBlock = BasicBlock::Create(getGlobalContext(), "", main);
            loop.afterBlock = BasicBlock::Create(getGlobalContext(), "", main);
            loop.beforeValue = head;

            // Create branching instructions
            Value* headValue = builder.CreateLoad(head);
            Value* condition = builder.CreateIsNotNull(headValue);
            builder.CreateCondBr(condition, loop.startBlock, loop.afterBlock);

            // Create a phi node
            IRBuilder<> sbuilder(loop.startBlock);
            loop.startValue = sbuilder.CreatePHI(tapeType);
            loop.startValue->addIncoming(loop.beforeValue, loop.beforeBlock);

            // Push the loop
            loops.push(loop);
            block = loop.startBlock;
            head = loop.startValue;
            break;
        }
        case ']':
        {
            // Retrieve the loop info
            MyLoopInfo loop = loops.top(); loops.pop();
            loop.endValue = head;
            loop.endBlock = block;

            // Create a conditional branch
            Value* headValue = builder.CreateLoad(head);
            Value* condition = builder.CreateIsNotNull(headValue);
            builder.CreateCondBr(condition, loop.startBlock, loop.afterBlock);

            // Augement loops phi node
            loop.startValue->addIncoming(loop.endValue, loop.endBlock);

            // Switch to the after block
            block = loop.afterBlock;

            // Create a phi node
            IRBuilder<> abuilder(block);
            PHINode* headPhi = abuilder.CreatePHI(tapeType);
            headPhi->addIncoming(loop.beforeValue, loop.beforeBlock);
            headPhi->addIncoming(loop.endValue, loop.endBlock);
            head = headPhi;
            break;
        }
        default:
            break;
        }
    }

    // Close the function
    {
        IRBuilder<> builder(block);
        builder.CreateRetVoid();
    }
    return main;
}

int main(int argc, char* argv[])
{
    if(argc < 2)
    {
        cerr << "Usage: " << argv[0] << " bf_file" << endl;
        return -1;
    }
    ifstream sourceFile(argv[1]);
    string line, source;
    while(getline(sourceFile, line)) source += line;

    // Setup a module and engine for JIT-ing
    std::string error;
    InitializeNativeTarget();
    Module* module = new Module("bfcode", getGlobalContext());
    ExecutionEngine *engine = EngineBuilder(module)
        .setErrorStr(&error)
        .setOptLevel(CodeGenOpt::Aggressive)
        .create();
    if(!engine)
    {
        cout << "No engine created: " << error << endl;
        return -1;
    }

    // Compile the BF to IR
    cout << "Parsing..." << flush;
    Function* func = makeFunc(module, source.c_str());
    cout << " done" << endl;
    // cout << *module << endl;

    // Run optimization passes
    cout << "Optimizing..." << flush;
    FunctionPassManager pm(new ExistingModuleProvider(module));
    pm.add(new TargetData(*(engine->getTargetData())));
    pm.add(createVerifierPass());

    // Eliminate simple loops such as [>>++<<-]
    pm.add(createInstructionCombiningPass()); // Cleanup for scalarrepl.
    pm.add(createLICMPass());                 // Hoist loop invariants
    pm.add(createIndVarSimplifyPass());       // Canonicalize indvars
    pm.add(createLoopDeletionPass());         // Delete dead loops

    // Simplify code
    for(int repeat=0; repeat < 3; repeat++)
    {
        pm.add(createGVNPass());                  // Remove redundancies
        pm.add(createSCCPPass());                 // Constant prop with SCCP
        pm.add(createCFGSimplificationPass());    // Merge & remove BBs
        pm.add(createInstructionCombiningPass());
        pm.add(createCondPropagationPass());      // Propagate conditionals
        pm.add(createAggressiveDCEPass());        // Delete dead instructions
        pm.add(createCFGSimplificationPass());    // Merge & remove BBs
        pm.add(createDeadStoreEliminationPass()); // Delete dead stores
    }

    // Process
    pm.run(*func);
    cout << "done" << endl;
    cout << *module << endl;

    // Compile ...
    cout << "Compiling..." << flush;
    void (*bf)() = (void (*)())engine->getPointerToFunction(func);
    cout << " done" << endl;

    // ... and run!
    bf();

    return 0;
}

Compile with:

Note: The order of the arguments is important! The source file must come before the llvm-config part or the linking will fail! I lost about a hour on this non-intuitive behaviour.

Awesome! This is exactly what I’ve been looking for.

— B L 2011-03-03