2022-12-30

Upper bound for transaction memory

What is good upper bound estimation of total EVM memory usage one can reach in a single transaction given the transaction gas limit?

The gas cost of memory from the yellow paper is:

$\mathrm{C_{mem}}(a) ≝ \mathrm{G_{memory}} ⋅ a + \floor{ \frac{a^2}{512} }$

With $\mathrm{G_{memory}} = 3$ and $a$ is the highest accessed address in 256-bit words. Accessing arbitrarily high addresses can be done at constant negligible cost using a MLOAD opcode.

So given $g$ gas, what is an upper bound for the accessible address?

$\begin{aligned} g &≤ 3 ⋅ a + \floor{ \frac{a^2}{512} } \\ g &≤ 3 ⋅ a + \frac{a^2}{512} + 1 \\ a &≤ 256 ⋅ \p{\sqrt{9 + (g - 1) ⋅ \frac{1}{128} } - 3} \\ a &≤ \sqrt{589312 + 512 ⋅ g} - 768 \end{aligned}$

Note that $a$ is in words, so the allocatable memory in a single MLOAD with gas $g$ is $32\cdota$ bounded by

$M(g) ≤ 512 ⋅ \sqrt{2} ⋅ \sqrt{g + 1151} - 48⋅512$

Which looks like

Recursion and the 63/64 rule.

A transaction can (recursively) call itself, which allows avoiding the quadratic cost of growing memory by splitting it up in multiple smaller allocations. On a call a transaction can pass some of the available gas. (See EIP-150 and EIP-2929). A call costs at least

$\mathrm{C_{call}} ≥ \mathrm{G_{warmaccess}} = 100$

gas. There is some additional overhead in setting up the stack, but I conservatively round that to zero. There is no substantial difference between using CALL, CALLCODE, DELEGATECALL and STATICCALL here and the CREATE and CREATE2 opcodes (which also allow recursion) are strictly more expensive. The call can pass along at most $g'$ of the $g$ gas remaining after subtracting the call cost:

$g' = g - \floor{\frac{g}{64}} < \frac{63}{64} ⋅ g + 1$

So if $g_i$ gas is left at recursion $i$ , the the maximum gas available at $g_{i+1}$ is bounded by

$g_{i+1} < \frac{63}{64} ⋅ \p{g_i - 100} + 1 = \frac{1}{64} ⋅ \p{63 ⋅ g_i - 6236}$

We can solve this linear recurrence to get the closed form bound

$g_i < \p{g_0 + 6236}⋅\p{\frac{63}{64}}^i - 6236$

And we can solve this for zero to determine the maximum recursion depth

$i_{\mathrm{max}} < \frac{\ln 6236 - \ln \p{g_0 + 6236}}{\ln 63 - \ln 64}$

Which looks like

Recursive allocation

At each level of recursion we loose a $64$ -th fraction of the gas. This fraction can still be used after the call returns, but this is not useful for memory maximization purposes. We need to allocate memory before the recursive call so it must be held during the call. (For the same reason it does not help to use the return data write of the CALL, a robust EVM implementation can allocate that after the recursion is finished.)

To put a bound on the maximum memory allocatable with recursion, we can take the non-recursive limit $M(g)$ and define the recursive limit to be $M_r(g)$ . This limit will have to satisfy a function relation that basically says it is invariant to adding another level of recursion:

$M_r(g) = \max_{g' ∈\delim[{0,g}]} M(g - g') + M_r\p{\floor{\frac{63⋅\p{g' - \mathrm{C_{call}}}}{64}}}$

This recursive function relation is a form of Bellman equation, which are hard to solve analytically. But numerically they can be solved by iterating until we find the fixed point. Bellman equations are basically the continuous analogue to dynamic programming.

Each line represent a level of recursion, starting with a single context (no recursion) as the bottom most line. We can also look at $n = \frac{g}{g - g'}$ , which is the $\frac{1}{n}$ fraction of the gas that is used locally to allocate while the rest is used for recursion.

It is visible that for small $n$ , if we are using $n$ levels of recursion then in each level we want to use close to $\frac{1}{n}$ of the gas to allocate memory. As $n$ gets large or the amount of gas available gets lower we want to forward less of the gas and allocate more locally. This matches our intuition that a good basic strategy is to divide the gas equally over a number of recursive calls to maximally avoid the quadratic factor.

In the limit of many recursions and lots of gas it converges to $64$ , so $g' = \frac{63}{64} ⋅ g$ . If we assume this limit and set $\mathrm{C_{call}} = 0$ the above functional relation simplifies to

$M_r(g) = M\p{\frac{g}{64}} + M_r\p{\p{\frac{63}{64}}^2 ⋅ g}$

This defines an summation series

$M_r(g) = \sum_{i∈[0,∞)} M\p{\frac{1}{64} ⋅ \p{\frac{63}{64}}^{2⋅i}⋅g }$

substituting the above bound on $M(g)$

$M_r(g) = \sum_{i∈[0,∞)} \p{ 512 ⋅ \sqrt{2} ⋅ \sqrt{\frac{1}{64} ⋅ \p{\frac{63}{64}}^{2⋅i}⋅g + 1151} - 48⋅512 }$

taking the limit of large $g$ we can ignore the additive constants

$M_r(g) = \sum_{i∈[0,∞)} \p{ 512 ⋅ \sqrt{2} ⋅ \sqrt{\frac{1}{64} ⋅ \p{\frac{63}{64}}^{2⋅i}⋅g}}$

this simplifies to

$M_r(g) = 64 ⋅ \sqrt{2 ⋅ g} ⋅ \sum_{i∈[0,∞)} \p{\frac{63}{64}}^{i}$

which is a geometric series with solution

$M_r(g) = 4096 ⋅ \sqrt{2 ⋅ g}$

We can plot this with our numerical results

The function indeed looks like an upper bound so our simplifying assumptions seem to hold. The bound could be tighter, but looks usable. So the upper bounds we found for $30$ MGas are $32$ MBytes using the analytic solution and $27$ MBytes numerically.

Question. What's a good lower bound? I.e. write EVM bytecode that maximizes the memory required.