Optimizing brainfuck compiler using LLVM

% Remco Bloemen
% 2010-02-27, last updated 2014-03-04

LLVM is currently the hype in the world of compilers. This C++-based compiler infrastructure is well on its way to replacing the GNU Compiler Collection. In contrast with GCC, LLVM has a nice C++ interface which allows you to easily implement your own language, or so I thought.

It turns out that LLVM is still somewhat of a moving target: examples written for previous versions no longer compile. It was quite a struggle to get even the simplest example to work, since the API is unintuitive and complex at first.

In the end I managed to get it working.

Then I played with it some more, trying out other ways to translate brainfuck to LLVM IR.

Eventually I found that I could get remarkable optimization results. The key was the chain of optimization passes presented in the source below.

With that in place, the compiler is capable of compiling the following brainfuck "Hello World!" program:

>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>
++++++++[<++++>-]<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++
.------.--------.>>+.

into

define void @main() {
code:
  call void @putchar(i8 72)
  call void @putchar(i8 101)
  call void @putchar(i8 108)
  call void @putchar(i8 108)
  call void @putchar(i8 111)
  call void @putchar(i8 32)
  call void @putchar(i8 87)
  call void @putchar(i8 111)
  call void @putchar(i8 114)
  call void @putchar(i8 108)
  call void @putchar(i8 100)
  call void @putchar(i8 33)
  ret void
}

which is optimal! All the loops and arithmetic have been eliminated! Compare that with the esotope compiler!
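
To get a feel for why this is possible, here is my own hand translation of the opening fragment of the program into C++ (the variable names are mine, for illustration only): every cell starts out as a known constant, so the loop runs a fixed number of times and a constant-folding, loop-eliminating optimizer can evaluate it completely at compile time.

#include <cstdio>

int main()
{
	// Equivalent C++ for the fragment ">+++++++++[<++++++++>-]<."
	// at the start of the "Hello World!" program above.
	unsigned char cell0 = 0, cell1 = 0;
	cell1 += 9;              // >+++++++++
	while (cell1 != 0)       // [
	{
		cell0 += 8;      //   <++++++++
		cell1 -= 1;      //   >-
	}                        // ]
	putchar(cell0);          // <.   9 * 8 == 72 == 'H'
	return 0;
}

Repeat this reduction for every loop in the program and nothing is left but the twelve putchar calls shown above.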

Benchmark!

Okay, now that I have my super-optimizing LLVM-based brainfuck compiler, it is time to benchmark it against the direct-threaded virtual machine I developed earlier.

The direct-threaded VM has two modes: non-optimizing, where it executes brainfuck instructions as-is, and optimizing, where it replaces certain sequences of instructions with more efficient ones.
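
To give an idea of what those simple optimizations look like, here is a rough sketch of the kind of peephole rewrite such a VM can do; the instruction names and the function below are made up for this illustration and are not taken from the VM's actual source. It replaces the common clear-loop idiom [-] with a single clear instruction:

#include <string>
#include <vector>

// Hypothetical VM instruction set for this sketch.
enum Op { INC, DEC, LEFT, RIGHT, IN, OUT, JZ, JNZ, CLEAR };

std::vector<Op> compileWithPeephole(const std::string& src)
{
	std::vector<Op> out;
	for (std::string::size_type i = 0; i < src.size(); ++i)
	{
		// Replace the clear-loop idiom "[-]" with one CLEAR instruction.
		if (src.compare(i, 3, "[-]") == 0)
		{
			out.push_back(CLEAR);
			i += 2;
			continue;
		}
		switch (src[i])
		{
			case '+': out.push_back(INC);   break;
			case '-': out.push_back(DEC);   break;
			case '<': out.push_back(LEFT);  break;
			case '>': out.push_back(RIGHT); break;
			case ',': out.push_back(IN);    break;
			case '.': out.push_back(OUT);   break;
			case '[': out.push_back(JZ);    break;
			case ']': out.push_back(JNZ);   break;
			default: break; // everything else is a comment character
		}
	}
	return out;
}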

The LLVM JIT compiler has a broad selection of optimization passes that can be chained in countless ways. After a lot of trial and error I found the chain presented in the source below. The benchmark compares five configurations: no optimization passes, only an instruction combining pass, the default selection for amd64, the aggressive default selection for amd64, and my own extensive chain.
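
The lighter configurations are set up the same way as the extensive chain in the full source below, just with fewer passes. A sketch of the "combining pass" configuration, reusing the module, engine and func variables from that source, would look roughly like this:

// "combining pass" configuration: verify the IR, then run only the
// instruction combining pass before handing the function to the JIT.
FunctionPassManager pm(new ExistingModuleProvider(module));
pm.add(new TargetData(*(engine->getTargetData())));
pm.add(createVerifierPass());
pm.add(createInstructionCombiningPass());
pm.run(*func);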

The first benchmark program, Bottles, prints the lyrics to "99 Bottles of Beer". It was developed by Andrew Paczkowski. origin.

The second, Hanoi, solves the three-peg, nine-disk Towers of Hanoi problem and animates the solution in ASCII art. It was developed by Clifford Wolf. origin

Last, Mandelbrot, calculates a 512×192 view of the Mandelbrot fractal and renders it in ASCII characters. It was developed by Erik Bosman. origin

| Virtual machine | Optimization   | Bottles | Hanoi  | Mandelbrot |
|-----------------|----------------|--------:|-------:|-----------:|
| direct threaded | none           |   0.008 | 29.647 |   1978.041 |
| direct threaded | simple         |   0.017 |  1.091 |    273.033 |
| llvm jit        | none           |   0.129 | 11.731 |    111.611 |
| llvm jit        | combining pass |   0.121 |  7.245 |     52.894 |
| llvm jit        | default        |   0.157 |  9.491 |     42.440 |
| llvm jit        | aggressive     |   0.174 |  9.618 |     42.048 |
| llvm jit        | extensive      |   0.375 | 50.141 |     37.745 |

The results are in seconds and include everything from starting the compiler until the exit of the brainfuck program.

One result stands out: the extensively optimized Hanoi takes a lot of time! Well, actually, the optimization takes a lot of time; the resulting program executes in less than a second. Further analysis (using callgrind; the callgraph is shown below) showed that all the time was spent in the "Loop Invariant Code Motion" pass. This pass is largely responsible for the incredible optimization result shown earlier. I suspect that LLVM was also able to reduce Hanoi to a near-minimal set of output instructions, although the resulting code is too messy to confirm this.

Hanoi case callgraph

Source!

As always, here is the source:

#include <iostream>
#include <string>
#include <stack>
#include <fstream>
#include <llvm/Module.h>
#include <llvm/Function.h>
#include <llvm/PassManager.h>
#include <llvm/CallingConv.h>
#include <llvm/Analysis/Verifier.h>
#include <llvm/Support/IRBuilder.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/Target/TargetData.h>
#include <llvm/ModuleProvider.h>
#include <llvm/LinkAllPasses.h>
#include <llvm/ExecutionEngine/JIT.h>
#include <llvm/Target/TargetSelect.h>
using namespace std;
using namespace llvm;

struct MyLoopInfo
{
	Value* beforeValue;
	PHINode* startValue;
	Value* endValue;
	Value* afterValue;

	BasicBlock* beforeBlock;
	BasicBlock* startBlock;
	BasicBlock* endBlock;
	BasicBlock* afterBlock;
};

Function* makeFunc(Module* module, const char* source, int tapeSize = 300)
{
	// Some useful types and constants
	const Type* voidType = Type::getVoidTy(getGlobalContext());
	const IntegerType* cellType = IntegerType::get(getGlobalContext(), 8);
	const IntegerType* indexType = IntegerType::get(getGlobalContext(), 32);
	const PointerType* tapeType = PointerType::get(cellType, 0);
	Value* zero = ConstantInt::get(cellType, 0);
	Value* one = ConstantInt::get(cellType, 1);
	Value* minOne = ConstantInt::get(cellType, -1);

	//declare i8 @getchar()
	Function* getchar = cast<Function>(module->getOrInsertFunction("getchar", cellType, NULL));
	getchar->setCallingConv(CallingConv::C);

	//declare void @putchar(i8)
	Function* putchar = cast<Function>(module->getOrInsertFunction("putchar", voidType, cellType, NULL));
	putchar->setCallingConv(CallingConv::C);

	// Construct void main()
	Function* main = cast<Function>(module->getOrInsertFunction("main", voidType, NULL));
	main->setCallingConv(CallingConv::C);
	BasicBlock* block = BasicBlock::Create(getGlobalContext(), "code", main);
	stack<MyLoopInfo> loops;
	IRBuilder<> codeIR(block);
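	// Allocate the tape on the stack and zero-initialize every cell
	// with an explicit store.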
	Value* head = codeIR.CreateAlloca(cellType, ConstantInt::get(indexType, tapeSize));
	Value* it = head;
	for(int i=0; i < tapeSize; i++)
	{
		codeIR.CreateStore(zero, it);
		it = codeIR.CreateGEP(it, one);
	}
	while(*source)
	{
	IRBuilder<> builder(block);
	switch(*source++)
	{
		case '>': head = builder.CreateGEP(head, one); break;
		case '<': head = builder.CreateGEP(head, minOne); break;
		case '+':
		{
			Value* headValue = builder.CreateLoad(head);
			Value* result = builder.CreateAdd(headValue, one);
			builder.CreateStore(result, head);
			break;
		}
		case '-':
		{
			Value* headValue = builder.CreateLoad(head);
			Value* result = builder.CreateSub(headValue, one);
			builder.CreateStore(result, head);
			break;
		}
		case '.':
		{
			Value* output = builder.CreateLoad(head);
			builder.CreateCall(putchar, output);
			break;
		}
		case ',':
		{
			Value* input = builder.CreateCall(getchar);
			builder.CreateStore(input, head);
			break;
		}
		case '[':
		{
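			// The tape head is kept as an SSA value, so a loop needs its
			// own basic blocks and phi nodes to merge the head pointer
			// coming from before the loop and from the loop body.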
			// Construct loop info
			MyLoopInfo loop;
			loop.beforeBlock = block;
			loop.startBlock = BasicBlock::Create(getGlobalContext(), "", main);
			loop.afterBlock = BasicBlock::Create(getGlobalContext(), "", main);
			loop.beforeValue = head;

			// Create branching instructions
			Value* headValue = builder.CreateLoad(head);
			Value* condition = builder.CreateIsNotNull(headValue);
			builder.CreateCondBr(condition, loop.startBlock, loop.afterBlock);

			// Create a phi node
			IRBuilder<> sbuilder(loop.startBlock);
			loop.startValue = sbuilder.CreatePHI(tapeType);
			loop.startValue->addIncoming(loop.beforeValue, loop.beforeBlock);

			// Push the loop
			loops.push(loop);
			block = loop.startBlock;
			head = loop.startValue;
			break;
		}
		case ']':
		{
			// Retrieve the loop info
			MyLoopInfo loop = loops.top(); loops.pop();
			loop.endValue = head;
			loop.endBlock = block;

			// Create a conditional branch
			Value* headValue = builder.CreateLoad(head);
			Value* condition = builder.CreateIsNotNull(headValue);
			builder.CreateCondBr(condition, loop.startBlock, loop.afterBlock);

			// Augment the loop's phi node
			loop.startValue->addIncoming(loop.endValue, loop.endBlock);

			// Switch to the after block
			block = loop.afterBlock;

			// Create a phi node
			IRBuilder<> abuilder(block);
			PHINode* headPhi = abuilder.CreatePHI(tapeType);
			headPhi->addIncoming(loop.beforeValue, loop.beforeBlock);
			headPhi->addIncoming(loop.endValue, loop.endBlock);
			head = headPhi;
			break;
		}
		default:
			break;
		}
	}

	// Close the function
	{
		IRBuilder<> builder(block);
		builder.CreateRetVoid();
	}
	return main;
}

int main(int argc, char* argv[])
{
	if(argc < 2)
	{
		cerr << "Usage: " << argv[0] << " bf_file" << endl;
		return -1;
	}
	ifstream sourceFile(argv[1]);
	string line, source;
	while(getline(sourceFile, line)) source += line;

	// Setup a module and engine for JIT-ing
	std::string error;
	InitializeNativeTarget();
	Module* module = new Module("bfcode", getGlobalContext());
	ExecutionEngine *engine = EngineBuilder(module)
		.setErrorStr(&error)
		.setOptLevel(CodeGenOpt::Aggressive)
		.create();
	if(!engine)
	{
		cout << "No engine created: " << error << endl;
		return -1;
	}

	// Compile the BF to IR
	cout << "Parsing..." << flush;
	Function* func = makeFunc(module, source.c_str());
	cout << " done" << endl;
	// cout << *module << endl;

	// Run optimization passes
	cout << "Optimizing..." << flush;
	FunctionPassManager pm(new ExistingModuleProvider(module));
	pm.add(new TargetData(*(engine->getTargetData())));
	pm.add(createVerifierPass());

	// Eliminate simple loops such as [>>++<<-]
	pm.add(createInstructionCombiningPass()); // Cleanup for scalarrepl.
	pm.add(createLICMPass());                 // Hoist loop invariants
	pm.add(createIndVarSimplifyPass());       // Canonicalize indvars
	pm.add(createLoopDeletionPass());         // Delete dead loops

	// Simplify code
	for(int repeat=0; repeat < 3; repeat++)
	{
		pm.add(createGVNPass());                  // Remove redundancies
		pm.add(createSCCPPass());                 // Constant prop with SCCP
		pm.add(createCFGSimplificationPass());    // Merge & remove BBs
		pm.add(createInstructionCombiningPass());
		pm.add(createCondPropagationPass());      // Propagate conditionals
		pm.add(createAggressiveDCEPass());        // Delete dead instructions
		pm.add(createCFGSimplificationPass());    // Merge & remove BBs
		pm.add(createDeadStoreEliminationPass()); // Delete dead stores
	}

	// Process
	pm.run(*func);
	cout << "done" << endl;
	cout << *module << endl;

	// Compile ...
	cout << "Compiling..." << flush;
	void (*bf)() = (void (*)())engine->getPointerToFunction(func);
	cout << " done" << endl;

	// ... and run!
	bf();

	return 0;
}

Compile with:

g++ ./main.cpp `llvm-config --cppflags --ldflags --libs core jit native all` -o ./test

Note: The order of the arguments is important! The source file must come before the llvm-config part or the linking will fail! I lost about an hour on this non-intuitive behaviour.
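
The resulting ./test binary takes the brainfuck source file as its only argument, so with the "Hello World!" program saved as, say, hello.bf (the file name here is just an example), a run looks like:

./test hello.bf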

Awesome! This is exactly what I’ve been looking for.

— B L 2011-03-03
