Full Multiplication

A lot of smart contracts use the SafeMath library. It prevents contracts from having incorrect results, but it does so by failing transactions instead of making them correct. Let’s instead try to do the math correctly. In this series, I will derive some advanced techniques. Today, I’ll make a better safeMul.

If you multiply two numbers, the result will be a number twice the size. In Ethereum, when you multiply two numbers, the result can be up to 512 bits. But Ethereum only gives you the lower half; it simply ignores the rest. This is a common practice in mathematics called modular arithmetic.

However, ignoring numbers is not an acceptable practice in accounting. Care needs to be taken to avoid it or someone will lose something valuable. A popular library called SafeMath detects when it happens and then fails the transaction. But what if you do not want your transaction to fail?

What if you want to multiply any numbers and have the complete result?

Spoiler alert: this is snippet of Solidity code will do that for you:

contract FullMul {
    function mul512(uint256 a, uint256 b)
    public pure returns(uint256 r0, uint256 r1) {
        assembly {
            let mm := mulmod(a, b, not(0))
            r0 := mul(a, b)
            r1 := sub(sub(mm, r0), lt(mm, r0))
        }
    }
}

Optimized full 512 bit multiplication in Solidity.

But before we get into that, let’s define the problem precisely: We have two unsigned numbers $a$ and $b$, both 256 bits in length and we want their product, a 512-bit number $x$.

$$ x = a ⋅ b $$

Since this number is too large to be represented directly in code, we split it up into the least significant and most significant 256 bits, $r_0$ and $r_1$ respectively:

$$ \begin{aligned} r_0 &= \mod{x}_{2^{256}} & r_1 &= \floor{\frac{x}{2^{256}}} \end{aligned} $$

Where the square brackets with subscript represent the modulo operation and the lower-half brackets on the right represent the floor operation.

Schoolbook algorithm

The classical way of solving this problem is by long multiplication, the method we all learned in school. You split your large number into decimals, multiply the digits, and then add the intermediate results. This method also works in binary and other bases. Let me quickly show you how you would use it here:

Since we have 256 bit multiply build in, we can multiply any two 128 bit numbers and get the full result. So if we split our large number into groups of 128 bits we can compute all their products. Take $a_0$ and $a_1$ to respectively mean the least significant and most significant 128 bits of $a$, similarly for $b$:

$$ \begin{aligned} a_0 &= \mod{a}\_{2^{128}} & a_1 &= \floor{\frac{a}{2^{128}}} \\\\ b_0 &= \mod{b}\_{2^{128}} & b_1 &= \floor{\frac{b}{2^{128}}} \\\\ \end{aligned} $$

Now the original numbers a and b can be written as:

$$ \begin{aligned} a &= a_1 ⋅2^{128} + a_0 \\\\ b &= b_1 ⋅2^{128} + b_0 \\\\ \end{aligned} $$

If we substitute this in product equation it becomes:

$$ x = a_1 ⋅ b_1 ⋅2^{256} + (a_0 ⋅ b_1 + a_1 ⋅ b_0) ⋅ 2^{128} + a_0 ⋅ b_0 $$

Ignoring the constants, we now have four multiplications instead of one. But all four of them involve numbers less than $2^{128}$ that can be computed directly. The result is still too large, so we still need two numbers $r_0$ and $r_1$ to represent it. I will skip the steps of how to get $r_0$ and $r_1$ from this expression. It is straightforward, but annoying because of the shifts and carries. The final result is:

contract FullMul {
    uint256 constant H = 2**128;

    function mul512(uint256 a, uint256 b)
    public pure returns(uint256 r0, uint256 r1) {
        // Split in groups of 128 bit
        uint256 a0 = a % H;
        uint256 a1 = a / H;
        uint256 b0 = b % H;
        uint256 b1 = b / H;

        // Compute 256 bit intermediate products
        uint256 i00 = a0 * b0;
        uint256 i01 = a0 * b1;
        uint256 i10 = a1 * b0;
        uint256 i11 = a1 * b1;

        // Split results in (shifted) groups of 128 bit
        uint256 i010 = i01 * H; // Shifted up
        uint256 i011 = i01 / H;
        uint256 i100 = i10 * H; // Shifted up
        uint256 i101 = i10 / H;

        // Add all intermediate terms, taking care of overflow
        r0 = i00;
        r1 = i11 + i011 + i101;
        r0 += i010;
        if (r0 < i010) {
            r1 += 1;
        }
        r0 += i100;
        if (r0 < i100) {
            r1 += 1;
        }
    }
}

Schoolbook algorithm for 512 bit multiplication.

(Note that Solidity, as of 0.4.18, actually fails to compile the above example because the compiler can not handle that many local variables. This is easily solved by inlining some expressions, but since it reduces readability I opted not to do that for this example.)

The two multiplications for i01 and i10 can be replaced by one using the Karatsuba algorithm, at the expense of a few more additions. Since additions are 3 gas and multiplications only 5, this is not worth it. But if you want to do larger multiplications (say 4096 bit) it is worth looking into these methods.

We have now solved the problem using two modulo operations, four divisions, six additions, two conditional branches, and no less than six multiplications. The entire function takes a bit over 300 gas. This is not bad, but the gas cost is almost two orders of magnitude larger than the 5 gas for a regular multiplication, or 90 for a standard safeMul.

We can do a lot better.

Chinese Remainder

So here’s the trick: We use the rather obscure mulmod instruction and the Chinese Remainder Theorem. In short, the theorem states thati f we know a number modulo $2^{256}$ and $2^{256} - 1$, we can compute its 512-bit representation cheaply. The function to do this, chineseRemainder, is described in a previous post. To use it here, we first need to compute our product in the two moduli:

$$ \begin{aligned} x_0 &= \mod{a ⋅ b}\_{2^{256}} & x_1 &= \mod{a ⋅ b}\_{2^{256} -1 } \end{aligned} $$

The first one, $x_0$ is just a regular multiply, as it already truncates to 256 bits. The second one, $x_1$, can be computed directly using a single mulmod operation. This is a rather unknown opcode that computes:

$$ \mathtt{mulmod}(a, b, c) = \mod{a ⋅ b}_{c} $$

Put this together, and we have our new mul512 function:

contract FullMul {
    uint256 constant M1 = 2**256 - 1;

    function mul512(uint256 a, uint256 b)
    public pure returns(uint256 r0, uint256 r1) {
        uint256 x0 = a * b;
        uint256 x1 = mulmod(a, b, M1);
        (r0, r1) = chineseRemainder(x0, x1);
    }
}

512-bit multiplication.

The Solidity compiler, as of version 0.4.18, does not produce very optimal code here. The chineseRemainder function is so tiny it is not worth the call-overhead, so it should be inlined, but the compiler doesn’t do this. The compiler does recognize that M1 can be expressed efficiently as not(0). Manually inlining results in an efficient multiplication function:

contract FullMul {
    uint256 constant M1 = 2**256 - 1;

    function mul512(uint256 a, uint256 b)
    public pure returns(uint256 r0, uint256 r1) {
        assembly {
            let mm := mulmod(a, b, not(0))
            r0 := mul(a, b)
            r1 := sub(sub(mm, r0), lt(mm, r0))
        }
    }
}

Optimized full 512 bit multiplication in Solidity.

It is our chineseRemainder function (two subs and one lt) with a mul and mulmod added. We use assembly to avoid an unnecessary branch. The total gas cost is about 60 gas, compared to 5 for a normal multiply and 90 for a standard safeMul. In fact, it is slightly cheaper to use mul512 and check that r1 is zero than it is to use safeMul! Conclusion

It is possible to do full precision never-overflowing multiplication in the EVM for less gas than a regular safeMul. This is a good starting point for smart contracts do not want to reject transaction just because an intermediate value overflows.

In the next post, I will introduce a number of simple utility functions for dealing with large numbers. This builds up to a very popular function that nobody has yet implement correctly for all cases. Stay tuned!

Remco Bloemen
Math & Engineering
https://2π.com