512 bit division

In the previous posts, I introduced the Chinese Remainder Theorem and used it to build a full multiplication function. In this post, I will introduce five small tricks and use them to build a simple division algorithm.

Find the Largest Power of Two Divisor of a Given Number

Given a number $a$. Find the largest power of two, $r = 2^n$, that divides a given number. Equivalently, find a number $r$ such that $\frac ar$ is an exact division and its quotient is an odd number.

The answer is -a & a. It is hard to explain mathematically because it combines arithmetic with bitwise operations. But an example in binary makes it intuitively clear:

$$ \begin{aligned} \dots\mathtt{00000110101101000} && a \\ \dots\mathtt{11111001010011000} && -a \\ \dots\mathtt{00000000000001000} && a\ \& -a \\ \end{aligned} $$

The key insight is that negation flips all high bits of a number up to but not including the least significant one. A simple and then removes all the flipped bits and we are left with only the least significant one.

For completeness, I’ll present it in Solidity as well:

contract Div512 {
    function twoDivisor(uint256 a) internal pure returns (uint256 r) {
        r = -a & a;
    }
}

Compute the largest power of two dividing a number

For zero there is no divisor, and the function returns zero. For odd numbers, it returns one. This makes sense as it is the largest power of two ($1 = 2^0$) that divides the number.

Divide $2^{256}$ by a Given Number

The number $2^{256}$ is fundamental to working with the Ethereum virtual machine, so it’s useful to do some tricks with it. It can not be expressed directly, but we can express fractions of it. For a given number $a$, we can compute $2^{256}$ divided by $a$. For this I found a trick involving negated numbers:

$$ \floor{\frac{2^{256}}{a}} = \floor{\frac{2^{256} - a}{a} + 1} = \floor{\frac{\mod{-a}_{2^{256}}}{a}} + 1 $$

Implemented in Solidity and optimized, this becomes:

contract Div512 {
    function div256(uint256 a) internal pure returns (uint256 r) {
        require(a > 1);
        assembly {
            r := add(div(sub(0, a), a), 1)
        }
    }
}

Divide $2^{256}$ by a given number

For the answer to be defined and fit in a single word, a needs to be larger than one. The assembly is to avoid a redundant zero check on division.

Compute $2^{256}$ Modulo a Given Number

Like the previous, we can also compute the remainder of 2^{256} divided by a instead of the quotient. The same trick with negation works:

$$ \mod{2^{256}}_a = \mod{2^{256} - a}_a = \mod{\mod{-a}_{2^{256}}}_a $$

Again, in Solidity:

contract Div512 {
    function mod256(uint256 a) internal pure returns (uint256 r) {
        require(a != 0);
        assembly {
            r := mod(sub(0, a), a)
        }
    }
}

Remainder of $2^{256}$ divided by a given number

The answer always exists for non-zero a and is strictly less than $a$. The assembly is to avoid a redundant zero check.

Add / Subtract Two 512-bit Numbers

The multiplication algorithm from before produces 512-bit numbers. To add two such numbers we can use an expression like:

contract Div512 {
    function add512(uint256 a0, uint256 a1, uint256 b0, uint256 b1)
    internal pure returns (uint256 r0, uint256 r1) {
        r0 = a0 + b0;
        r1 = a1 + b1 + (r0 < a0 ? 1 : 0);
    }
}

Simple 512-bit addition

The second line has an extra term that checks for carry and adds it. It takes about 65 gas. The Solidity compiler (v. 0.4.18) does not produce optimal code here. The ternary expression turns in to a branch. We can avoid this. Observe the yellow paper definition of the LT, the less-than instruction:

$$ \mathtt{lt}(a, b) = \begin{cases} 1 & a < b \\ 0 & \text{otherwise} \end{cases} $$

This function already returns zero or one exactly as we want it. We can thus add it directly. Casting a boolean to a number is not allowed in Solidity, so we need to switch to assembler again:

contract Div512 {
    function add512(uint256 a0, uint256 a1, uint256 b0, uint256 b1)
    internal pure returns (uint256 r0, uint256 r1) {
        assembly {
            r0 := add(a0, b0)
            r1 := add(add(a1, b1), lt(r0, a0))
        }
    }
}

Add two 512 bit numbers

This function takes only 10 gas (when measured by comparing to a function returning zero). Similarly for subtraction:

contract Div512 {
    function sub512(uint256 a0, uint256 a1, uint256 b0, uint256 b1)
    internal pure returns (uint256 r0, uint256 r1) {
        assembly {
            r0 := sub(a0, b0)
            r1 := sub(sub(a1, b1), lt(a0, b0))
        }
    }
}

Subtract 512-bit numbers

Dividing a 512-bit Number by a 256-bit Number

To show the usefulness of the above functions, I’ll make a simple division algorithm. We want to build a 512-by-256-bit division function, that is, we want to compute x such that:

$$ x = \floor{\frac{a_1 ⋅ 2^{256} + a_0}{b}} $$

Where $a_0$ and $a_1$ are the lower and higher half of the 512-bit number and $b$ is the 256-bit number. The result, $x$, will also be 512-bit. So it is also split into its low and high bits $x_0$ and $x_1$.

To solve it, first, we observe the following. With the div256 and mod256 functions we can compute two numbers q = div256(b) and r = mod256(b). These then satisfy:

$$ 2^{256} = q ⋅ b + r $$

If we substitute this identity in the fraction, we can split out a term:

$$ \floor{\frac{a_1 ⋅ \p{q ⋅ c + r} + a_0}{c}} = \floor{\frac{a_1 ⋅ r + x_0}{c}} + a_1 ⋅ q $$

Because $r$ is smaller than $2^{256}$, the new numerator is smaller than the one we had before — we made some progress. We can keep repeating this process. Each time splitting of a term and computing the new numerator. Eventually, we reach a point where the new $x_1$ is zero. When that happens we do a regular division of $x_0$ by $c$. The final result is the sum of all the $x_1 · q$ terms and the final division. This creates an elegant algorithm:

contract Div512 {
    function div512(uint256 a0, uint256 a1, uint256 b)
    internal pure returns (uint256 x0, uint256 x1) {
        uint256 q = div256(b);
        uint256 r = mod256(b);
        while (a1 != 0) {
            (t0, t1) = mul512(a1, q);
            (x0, x1) = add512(x0, x1, t0, t1);
            (t0, t1) = mul512(a1, r);
            (a0, a1) = add512(t0, t1, a0, 0);
        }
        (x0, x1) = add512(r0, r1, a0 / b, 0);
    }
}

512-by-256-division for $b > 1$

How many iterations does the loop take? In the worst case is $r$ is as large as possible because then we ‘scale back’ $x_1$ by the smallest amount. The largest $r$ can get is $2^{255} - 1$, when $c$ equals $2^{255} + 1$. This means $x_1$ is at least halved every iteration. Since $x_1$ is 256-bit, it takes at most 256 iterations.

As presented, it requires $b > 1$. A special case for $b = 1$ is easily added, just return the input $(a_0, a_1)$.

I did a bit of searching, but could not find this approach to division in the literature; a division algorithm where the radix is rewritten in terms of the denominator. Please tell me if you know one.

With these five new tools, we are developing a nice toolbox. We can already tackle some interesting problems. The division algorithm shown here is not optimized because there are better approaches. The really good ones require one more tool, modular inverses, which I will present in the next post.

Remco Bloemen
Math & Engineering
https://2π.com