Upper bound for transaction memory
Paweł Bylica asks:
What is good upper bound estimation of total EVM memory usage one can reach in a single transaction given the transaction gas limit?
The gas cost of memory from the yellow paper is:
\mathrm{C_{mem}}(a) ≝ \mathrm{G_{memory}} ⋅ a + \floor{ \frac{a^2}{512} }
With \mathrm{G_{memory}} = 3 and a is the highest accessed address in 256-bit words. Accessing arbitrarily high addresses can be done at constant negligible cost using a MLOAD
opcode.
So given g gas, what is an upper bound for the accessible address?
\begin{aligned} g &≤ 3 ⋅ a + \floor{ \frac{a^2}{512} } \\ g &≤ 3 ⋅ a + \frac{a^2}{512} + 1 \\ a &≤ 256 ⋅ \p{\sqrt{9 + (g - 1) ⋅ \frac{1}{128} } - 3} \\ a &≤ \sqrt{589312 + 512 ⋅ g} - 768 \end{aligned}
Note that a is in words, so the allocatable memory in a single MLOAD
with gas g is 32⋅a bounded by
M(g) ≤ 512 ⋅ \sqrt{2} ⋅ \sqrt{g + 1151} - 48⋅512
Which looks like
Recursion and the 63/64 rule.
A transaction can (recursively) call itself, which allows avoiding the quadratic cost of growing memory by splitting it up in multiple smaller allocations. On a call a transaction can pass some of the available gas. (See EIP-150 and EIP-2929). A call costs at least
\mathrm{C_{call}} ≥ \mathrm{G_{warmaccess}} = 100
gas. There is some additional overhead in setting up the stack, but I conservatively round that to zero. There is no substantial difference between using CALL
, CALLCODE
, DELEGATECALL
and STATICCALL
here and the CREATE
and CREATE2
opcodes (which also allow recursion) are strictly more expensive. The call can pass along at most g' of the g gas remaining after subtracting the call cost:
g' = g - \floor{\frac{g}{64}} < \frac{63}{64} ⋅ g + 1
So if g_i gas is left at recursion i, the the maximum gas available at g_{i+1} is bounded by
g_{i+1} < \frac{63}{64} ⋅ \p{g_i - 100} + 1 = \frac{1}{64} ⋅ \p{63 ⋅ g_i - 6236}
We can solve this linear recurrence to get the closed form bound
g_i < \p{g_0 + 6236}⋅\p{\frac{63}{64}}^i - 6236
And we can solve this for zero to determine the maximum recursion depth
i_{\mathrm{max}} < \frac{\ln 6236 - \ln \p{g_0 + 6236}}{\ln 63 - \ln 64}
Which looks like
Recursive allocation
At each level of recursion we loose a 64-th fraction of the gas. This fraction can still be used after the call returns, but this is not useful for memory maximization purposes. We need to allocate memory before the recursive call so it must be held during the call. (For the same reason it does not help to use the return data write of the CALL
, a robust EVM implementation can allocate that after the recursion is finished.)
To put a bound on the maximum memory allocatable with recursion, we can take the non-recursive limit M(g) and define the recursive limit to be M_r(g). This limit will have to satisfy a function relation that basically says it is invariant to adding another level of recursion:
M_r(g) = \max_{g' ∈\delim[{0,g}]} M(g - g') + M_r\p{\floor{\frac{63⋅\p{g' - \mathrm{C_{call}}}}{64}}}
This recursive function relation is a form of Bellman equation, which are hard to solve analytically. But numerically they can be solved by iterating until we find the fixed point. Bellman equations are basically the continuous analogue to dynamic programming.
Each line represent a level of recursion, starting with a single context (no recursion) as the bottom most line. We can also look at n = \frac{g}{g - g'}, which is the \frac{1}{n} fraction of the gas that is used locally to allocate while the rest is used for recursion.
It is visible that for small n, if we are using n levels of recursion then in each level we want to use close to \frac{1}{n} of the gas to allocate memory. As n gets large or the amount of gas available gets lower we want to forward less of the gas and allocate more locally. This matches our intuition that a good basic strategy is to divide the gas equally over a number of recursive calls to maximally avoid the quadratic factor.
In the limit of many recursions and lots of gas it converges to 64, so g' = \frac{63}{64} ⋅ g. If we assume this limit and set \mathrm{C_{call}} = 0 the above functional relation simplifies to
M_r(g) = M\p{\frac{g}{64}} + M_r\p{\p{\frac{63}{64}}^2 ⋅ g}
This defines an summation series
M_r(g) = \sum_{i∈[0,∞)} M\p{\frac{1}{64} ⋅ \p{\frac{63}{64}}^{2⋅i}⋅g }
substituting the above bound on M(g)
M_r(g) = \sum_{i∈[0,∞)} \p{ 512 ⋅ \sqrt{2} ⋅ \sqrt{\frac{1}{64} ⋅ \p{\frac{63}{64}}^{2⋅i}⋅g + 1151} - 48⋅512 }
taking the limit of large g we can ignore the additive constants
M_r(g) = \sum_{i∈[0,∞)} \p{ 512 ⋅ \sqrt{2} ⋅ \sqrt{\frac{1}{64} ⋅ \p{\frac{63}{64}}^{2⋅i}⋅g}}
this simplifies to
M_r(g) = 64 ⋅ \sqrt{2 ⋅ g} ⋅ \sum_{i∈[0,∞)} \p{\frac{63}{64}}^{i}
which is a geometric series with solution
M_r(g) = 4096 ⋅ \sqrt{2 ⋅ g}
We can plot this with our numerical results
The function indeed looks like an upper bound so our simplifying assumptions seem to hold. The bound could be tighter, but looks usable. So the upper bounds we found for 30 MGas are 32 MBytes using the analytic solution and 27 MBytes numerically.
Question. What's a good lower bound? I.e. write EVM bytecode that maximizes the memory required.