Optimizing brainfuck compiler using LLVM
% Remco Bloemen % 2010-02-27, last updated 2014-03-04
LLVM is currently all the hype in the world of compilers. This C++-based compiler infrastructure is well on its way to replacing the GNU Compiler Collection. In contrast with GCC, LLVM has a nice C++ interface which lets you easily implement your own language, or so I thought.
It turns out that LLVM is still somewhat of a moving target, and examples written for previous versions no longer compile. It was quite a struggle to get even the simplest example to work, since the API is unintuitive and complex at first.
In the end I managed to get it working. Then I played with it some more, trying out other ways to translate brainfuck to LLVM IR.
Eventually I found that I could get remarkable optimization results. The key was to:
- enable some complex loop optimization passes,
- allocate the brainfuck memory on the stack, and
- not create a loop to initialize the memory, but explicitly set every cell (see the sketch after this list).
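The last point deserves a word of explanation. My understanding is that with one explicit store per cell the optimizer can trace the constant zeros through the rest of the program, whereas an initialization loop hides them behind an induction variable. A minimal sketch in plain C++ (my own illustration, not the generated IR) of the difference:

```cpp
// Illustration only: what the two initialization strategies amount to.
char tape[300];

// Explicit stores: every cell is a known constant, so a later load from a
// statically known offset can be constant-folded away.
void init_explicit() {
    tape[0] = 0;
    tape[1] = 0;
    // ... one store per cell, 300 in total ...
    tape[299] = 0;
}

// Initialization loop: the same effect at runtime, but the constants are
// hidden behind the loop's induction variable, which makes the later
// constant propagation and loop elimination much harder.
void init_loop() {
    for (int i = 0; i < 300; ++i)
        tape[i] = 0;
}
```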
With all of that in place, the compiler is capable of compiling the following brainfuck "Hello World!" program:
>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>
++++++++[<++++>-]<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++
.------.--------.>>+.
into
define void @main() {
code:
call void @putchar(i8 72)
call void @putchar(i8 101)
call void @putchar(i8 108)
call void @putchar(i8 108)
call void @putchar(i8 111)
call void @putchar(i8 32)
call void @putchar(i8 87)
call void @putchar(i8 111)
call void @putchar(i8 114)
call void @putchar(i8 108)
call void @putchar(i8 100)
call void @putchar(i8 33)
ret void
}
which is optimal! All the loops and operations are eliminated! Compare that with the esotope compiler!
Benchmark!
Okay, now that I have my super-optimizing LLVM-based brainfuck compiler, it is time to benchmark it against the direct-threaded virtual machine I developed earlier.
The direct-threaded VM has two modes: a non-optimizing mode that executes brainfuck instructions as-is, and an optimizing mode that replaces certain sequences of instructions with more efficient ones.
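The exact rewrite rules of that VM are described in the earlier article; as a rough, hypothetical illustration of the idea, an optimizing brainfuck VM can collapse runs of `+`/`-` into a single add and recognize the common `[-]` clear-cell idiom:

```cpp
#include <string>
#include <vector>

// Hypothetical illustration of the kind of peephole rewrite an optimizing
// brainfuck VM can do; kind: '+' means "add arg", '0' means "set cell to 0",
// every other brainfuck instruction is passed through unchanged.
struct Op { char kind; int arg; };

std::vector<Op> optimize(const std::string& src)
{
    std::vector<Op> out;
    for (size_t i = 0; i < src.size(); ) {
        char c = src[i];
        if (c == '+' || c == '-') {
            int n = 0;
            while (i < src.size() && (src[i] == '+' || src[i] == '-'))
                n += (src[i++] == '+') ? 1 : -1;
            if (n != 0) out.push_back({'+', n});
        } else if (src.compare(i, 3, "[-]") == 0) {
            out.push_back({'0', 0});   // clear the current cell in one step
            i += 3;
        } else {
            out.push_back({c, 1});
            ++i;
        }
    }
    return out;
}
```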
The LLVM JIT compiler has a broad selection of optimization passes that can be chained in countless ways. After a lot of trial and error I settled on the chain presented in the source below. In the benchmark I also included: no optimization passes, only an instruction-combining pass, the default selection for amd64, the aggressive default selection for amd64, and my own extensive chain.
The first benchmark program, Bottles, prints the lyrics to "99 bottles of beer". Developed by Andrew Paczkowski. origin.
The second, Hanoi, solves the three-peg, nine-disk Towers of Hanoi problem and animates the solution in ASCII art. Developed by Clifford Wolf. origin.
Last, Mandelbrot, calculates a 512×192 Mandelbrot fractal and displays it with ASCII characters. Developed by Erik Bosman. origin.
| Virtual machine | Optimization   | Bottles | Hanoi  | Mandelbrot |
|-----------------|----------------|---------|--------|------------|
| direct threaded | none           |   0.008 | 29.647 |   1978.041 |
| direct threaded | simple         |   0.017 |  1.091 |    273.033 |
| llvm jit        | none           |   0.129 | 11.731 |    111.611 |
| llvm jit        | combining pass |   0.121 |  7.245 |     52.894 |
| llvm jit        | default        |   0.157 |  9.491 |     42.440 |
| llvm jit        | aggressive     |   0.174 |  9.618 |     42.048 |
| llvm jit        | extensive      |   0.375 | 50.141 |     37.745 |
The results are in seconds and include everything from starting the compiler until the exit of the brainfuck program.
One result stands out: the extensively optimized Hanoi takes a lot of time! Well, actually, the optimization takes a lot of time; the resulting program executes in less than a second. Further analysis (using callgrind, callgraph shown below) showed that all the time was spent in the "Loop Invariant Code Motion" pass. This pass is largely responsible for the incredible optimization result shown earlier. I suspect that LLVM was also able to reduce Hanoi to a near minimal set of output instructions, although the resulting code is too messy to confirm this.
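For reference, one way to reproduce this kind of measurement (a sketch using the same LLVM 2.x-era API as the full source below, with a hypothetical helper `timePass`) is to run a single pass in its own pass manager and time it:

```cpp
#include <ctime>
#include <iostream>
#include <llvm/Function.h>
#include <llvm/PassManager.h>
#include <llvm/ModuleProvider.h>
#include <llvm/LinkAllPasses.h>

// Hypothetical helper: run one optimization pass in isolation and report how
// long it took, e.g. timePass(module, func, createLICMPass()).
void timePass(llvm::Module* module, llvm::Function* func, llvm::Pass* pass)
{
    llvm::FunctionPassManager pm(new llvm::ExistingModuleProvider(module));
    pm.add(pass);
    std::clock_t start = std::clock();
    pm.run(*func);
    std::cout << double(std::clock() - start) / CLOCKS_PER_SEC
              << " seconds" << std::endl;
}
```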
Source!
As always, here is the source:
#include <iostream>
#include <string>
#include <stack>
#include <fstream>
#include <llvm/Module.h>
#include <llvm/Function.h>
#include <llvm/PassManager.h>
#include <llvm/CallingConv.h>
#include <llvm/Analysis/Verifier.h>
#include <llvm/Support/IRBuilder.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/Target/TargetData.h>
#include <llvm/ModuleProvider.h>
#include <llvm/LinkAllPasses.h>
#include <llvm/ExecutionEngine/JIT.h>
#include <llvm/Target/TargetSelect.h>
using namespace std;
using namespace llvm;
struct MyLoopInfo
{
Value* beforeValue;
PHINode* startValue;
Value* endValue;
Value* afterValue;
BasicBlock* beforeBlock;
BasicBlock* startBlock;
BasicBlock* endBlock;
BasicBlock* afterBlock;
};
Function* makeFunc(Module* module, const char* source, int tapeSize = 300)
{
// Some useful types and constants
const Type* voidType = Type::getVoidTy(getGlobalContext());
const IntegerType* cellType = IntegerType::get(getGlobalContext(), 8);
const IntegerType* indexType = IntegerType::get(getGlobalContext(), 32);
const PointerType* tapeType = PointerType::get(cellType, 0);
Value* zero = ConstantInt::get(cellType, 0);
Value* one = ConstantInt::get(cellType, 1);
Value* minOne = ConstantInt::get(cellType, -1);
// declare i8 @getchar()
Function* getchar = cast<Function>(module->getOrInsertFunction("getchar", cellType, NULL));
getchar->setCallingConv(CallingConv::C);
// declare void @putchar(i8)
Function* putchar = cast<Function>(module->
getOrInsertFunction("putchar", voidType, cellType, NULL));
putchar->setCallingConv(CallingConv::C);
// Construct void main()
Function* main = cast<Function>(module->getOrInsertFunction("main", voidType, NULL));
main->setCallingConv(CallingConv::C);
BasicBlock* block = BasicBlock::Create(getGlobalContext(), "code", main);
stack<MyLoopInfo> loops;
IRBuilder<> codeIR(block);
// Allocate the tape on the stack and zero every cell with explicit stores
// (no initialization loop, so the optimizer can fold the constants later)
Value* head = codeIR.CreateAlloca(cellType, ConstantInt::get(indexType, tapeSize));
Value* it = head;
for(int i=0; i < tapeSize; i++)
{
codeIR.CreateStore(zero, it);
it = codeIR.CreateGEP(it, one);
}
while(*source)
{
IRBuilder<> builder(block);
switch(*source++)
{
case '>': head = builder.CreateGEP(head, one); break;
case '<': head = builder.CreateGEP(head, minOne); break;
case '+':
{
Value* headValue = builder.CreateLoad(head);
Value* result = builder.CreateAdd(headValue, one);
builder.CreateStore(result, head);
break;
}
case '-':
{
Value* headValue = builder.CreateLoad(head);
Value* result = builder.CreateSub(headValue, one);
builder.CreateStore(result, head);
break;
}
case '.':
{
Value* output = builder.CreateLoad(head);
builder.CreateCall(putchar, output);
break;
}
case ',':
{
Value* input = builder.CreateCall(getchar);
builder.CreateStore(input, head);
break;
}
case '[':
{
// Construct loop info
MyLoopInfo loop;
loop.beforeBlock = block;
loop.startBlock = BasicBlock::Create(getGlobalContext(), "", main);
loop.afterBlock = BasicBlock::Create(getGlobalContext(), "", main);
loop.beforeValue = head;
// Create branching instructions
Value* headValue = builder.CreateLoad(head);
Value* condition = builder.CreateIsNotNull(headValue);
builder.CreateCondBr(condition, loop.startBlock, loop.afterBlock);
// Create a phi node
IRBuilder<> sbuilder(loop.startBlock);
loop.startValue = sbuilder.CreatePHI(tapeType);
loop.startValue->addIncoming(loop.beforeValue, loop.beforeBlock);
// Push the loop
loops.push(loop);
block = loop.startBlock;
head = loop.startValue;
break;
}
case ']':
{
// Retrieve the loop info
MyLoopInfo loop = loops.top(); loops.pop();
loop.endValue = head;
loop.endBlock = block;
// Create a conditional branch
Value* headValue = builder.CreateLoad(head);
Value* condition = builder.CreateIsNotNull(headValue);
builder.CreateCondBr(condition, loop.startBlock, loop.afterBlock);
// Augment the loop's phi node
loop.startValue->addIncoming(loop.endValue, loop.endBlock);
// Switch to the after block
block = loop.afterBlock;
// Create a phi node
IRBuilder<> abuilder(block);
PHINode* headPhi = abuilder.CreatePHI(tapeType);
headPhi->addIncoming(loop.beforeValue, loop.beforeBlock);
headPhi->addIncoming(loop.endValue, loop.endBlock);
head = headPhi;
break;
}
default:
break;
}
}
// Close the function
{
IRBuilder<> builder(block);
builder.CreateRetVoid();
}
return main;
}
int main(int argc, char* argv[])
{
if(argc < 2)
{
cerr << "Usage: " << argv[0] << " bf_file" << endl;
return -1;
}
ifstream sourceFile(argv[1]);
string line, source;
while(getline(sourceFile, line)) source += line;
// Setup a module and engine for JIT-ing
std::string error;
InitializeNativeTarget();
Module* module = new Module("bfcode", getGlobalContext());
ExecutionEngine *engine = EngineBuilder(module)
.setErrorStr(&error)
.setOptLevel(CodeGenOpt::Aggressive)
.create();
if(!engine)
{
cout << "No engine created: " << error << endl;
return -1;
}
// Compile the BF to IR
cout << "Parsing..." << flush;
Function* func = makeFunc(module, source.c_str());
cout << " done" << endl;
// cout << *module << endl;
// Run optimization passes
cout << "Optimizing..." << flush;
FunctionPassManager pm(new ExistingModuleProvider(module));
pm.add(new TargetData(*(engine->getTargetData())));
pm.add(createVerifierPass());
// Eliminate simple loops such as [>>++<<-]
pm.add(createInstructionCombiningPass()); // Cleanup for scalarrepl.
pm.add(createLICMPass()); // Hoist loop invariants
pm.add(createIndVarSimplifyPass()); // Canonicalize indvars
pm.add(createLoopDeletionPass()); // Delete dead loops
// Simplify code
for(int repeat=0; repeat < 3; repeat++)
{
pm.add(createGVNPass()); // Remove redundancies
pm.add(createSCCPPass()); // Constant prop with SCCP
pm.add(createCFGSimplificationPass()); // Merge & remove BBs
pm.add(createInstructionCombiningPass());
pm.add(createCondPropagationPass()); // Propagate conditionals
pm.add(createAggressiveDCEPass()); // Delete dead instructions
pm.add(createCFGSimplificationPass()); // Merge & remove BBs
pm.add(createDeadStoreEliminationPass()); // Delete dead stores
}
// Process
pm.run(*func);
cout << "done" << endl;
cout << *module << endl;
// Compile ...
cout << "Compiling..." << flush;
void (*bf)() = (void (*)())engine->getPointerToFunction(func);
cout << " done" << endl;
// ... and run!
bf();
return 0;
}
Compile with:
g++ ./main.cpp `llvm-config --cppflags --ldflags --libs core jit native all` -o ./test
Note: The order of the arguments is important! The source file must come before the `llvm-config` part or the linking will fail! I lost about an hour on this non-intuitive behaviour.
Awesome! This is exactly what I’ve been looking for.
— B L 2011-03-03