Merkle Hashing

Dmitry Khovratovich's talk at devcon got me thinking about efficient hashing algorithms for Merkle trees. I'm also influenced by Keccak paper arguments to reduce the intermediate rounds in the Sponge construction.

The basic idea is to only use a cryptographically secure hash function at the leaf and root nodes, and a less secure function on intermediate nodes.

Unary tree Consider a degenerate Merkle tree where the branching factor is one:

$$ \mathrm{Tree}\ a = \mathrm{Leaf}\ a\ |\ \mathrm{Node}\ (\mathrm{Tree}\ a) $$

Tree a  = Leaf a | Node (Tree a)

We can construct a 'tree' hashing operation like so

hash (Node a) = sha3(innerHash a)

innerHash (Leaf a) = sha3(a)

innerHash (Node a) = innerHash(a) + 1

(assume all values in a sufficiently large field)

This scheme is essentially the same as sha3(sha3( leaf ) + length) which is secure. Yet it only requires two sha3 operations instead of O(nodes).

Fixed depth binary tree. Now let's try to generalization this to fixed depth binary trees. The tree structure becomes

$$ \mathrm{Tree}_d\ a = \mathrm{Leaf}_0\ a\ |\ \mathrm{Node}_d\ (\mathrm{Tree}_{d-1}\ a)\ (\mathrm{Tree}_{d-1}\ a) $$

The suffices are there to pass the depth of the tree along. They make it easier to form a closed form expression.

hash (Node d a) = sha3(innerHash a)

innerHash (Leaf d a) = sha3(a)

innerHash (Node d a b) = prime[d] * innerHash(a) + innerHash(b)

prime = 3, 5, ...

Example. Consider the case with four leaves:

a     b    c     d
 \   /      \   /
  3a+b      3c+d
    \         /
     \       /
 5(3a+b) + (3c+d) 

The resulting hash would be:

sha3( 15 sha3(a) + 5 sha3(b) + 3 sha3(c) + sha3(d) )

Security analysis. We loose some security in collision resistance because in addition to plain collisions on sha3, it is also sufficient to find certain linear relationships. This makes birthday attacks easier. I don't think this affects pre-image resistance.

More worrying is that we loose the ability to construct Merkle proofs entirely. Let's say I want to proof membership of a.

a     x
 \   /
  3a+x        y
    \        /
     \      /
   5(3a+x) + y 

I would provide you the values

a = a
x = sha3(b)
y = 3 sha3(c) + sha3(d)

You would compute

sha3( 15 sha3(a) + 5 x + y ) == rootHash

It's clear that if I can solve for x and y to make the inner expression match the (publicly known) pre-image of the root-hash.

From this example it follows that

You could argue that only the branch not taken needs to be secure, the chain to the leaf does not need the

This is related to the topic

Remco Bloemen
Math & Engineering