Sumcheck, MLEs and GKR

$$ \gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\ps#1{\delim\{{#1}\}} \gdef\F{\mathbb F} \gdef\set#1{\mathcal #1} \gdef\mat#1{\mathrm #1} \gdef\vec#1{\bm #1} \gdef\popcount{\mathrm{popcount}} \gdef\eq{\mathrm{eq}} \gdef\L{\mathrm{L}} \gdef\∀{\mathop{\huge ∀}\limits} \gdef\mul{\mathsf{mul}} \gdef\add{\mathsf{add}} \gdef\addc{\mathsf{add^C}} \gdef\bar#1{\overline{#1}} $$

Last year saw the publication of many interesting new proof systems that deviate from the pattern of doing plonkish-starkish arithmetization on top of a (univariate) polynomial commitment protocol. These new systems are broadly based on two mechanisms: folding schemes and sumcheck+GKR. In this note I'll explore sumcheck and GKR.

Sumcheck and GKR, introduced in 1992 and 2008 respectively, are relatively old protocols. I am not sure why they have thus far received little attention from implementers, but I suspect it is because the proof sizes are not particularly small and the protocols are arguably more complicated. Proof size was the main metric to benchmark protocols on, and here Groth16 is still king, with plonkish-starkish over KZG close behind. Now that it has become practical to recursively verify proofs this is less of a concern: a larger proof can be recursively verified in, say, a Groth16 proof and thereby made small.

The main reason there is renewed interest in GKR protocols is prover complexity. Both Groth16 and plonkish arithmetization require the prover to do $O(n⋅\log n)$ work in the size $n$ of the computation. Through clever use of multivariate polynomials the GKR-based methods can achieve $O(n)$ prover complexity.

Sumcheck

Sumcheck was introduced in LFKN92 and allows reducing certain sums of evaluations to a single evaluation. Take a field $\F$, a multivariate polynomial $f ∈ \F[X_1,…,X_n]$ and an evaluation domain $\set B = \set B_1 × ⋯ × \set B_n$ with $\set B_i \subseteq \F$. Typically the evaluation domain is the unit hypercube, $\set B = \ps{0,1}^n$, but I'll present it generically here.

The goal is to reduce proving

$$ h = \sum_{\vec x ∈ \set B} f\p{\vec x} $$

to proving $f(\vec r) = c$ for some $\vec r ∈ \F^n$ and $c ∈ \F$. This can then be proven by other means, for example by an opening of a polynomial commitment. The protocol proceeds as follows:

  1. Prover sends $f_1 ∈ \F[X]$, a univariate polynomial such that $$ f_1(X) = \sum_{\vec x ∈ \set B_2×⋯×\set B_n} f\p{X, \vec x} $$
  2. Verifier sends a random $r_1 ∈ \F$, checks that $\deg f_1 = \deg_1 f$, and checks that $$ h = \sum_{x ∈ \set B_1} f_1\p{x} $$
  3. Prover sends $f_2(X)$ where $$ f_2(X) = \sum_{\vec x ∈ \set B_3×⋯×\set B_n} f\p{r_1, X, \vec x} $$
  4. Verifier sends a random $r_2 ∈ \F$ and checks that $\deg f_2 = \deg_2 f$ and $$ f_1(r_1) = \sum_{x ∈ \set B_2} f_2\p{x} $$
  5. The prover and verifier repeat this process $n-3$ more times.
  6. Prover sends $f_n$ where $$ f_n(X) = f\p{r_1, …, r_{n-1}, X} $$
  7. Verifier sends a random $r_n ∈ \F$ and computes $c = f_n(r_n)$.
  8. The prover and verifier use further methods to check $$ c = f(\vec r) $$

See Justin Thaler's notes on Sum-Check T20a, T23a.
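To make the round structure concrete, here is a toy sketch in Python for the common case of a multilinear $f$ on $\set B = \ps{0,1}^n$, where each round polynomial has degree one. The modulus `P` and the combined prover/verifier loop are illustrative, not a hardened implementation; later snippets in this note reuse `P`, `random` and the helpers defined along the way.

```python
import random

P = 2**61 - 1  # toy prime modulus standing in for a production field

def sumcheck_prove_and_verify(table):
    """Sumcheck for a multilinear f given by its 2^n evaluations on {0,1}^n
    (first variable = most significant bit). Returns (rs, c) with f(rs) = c."""
    n = (len(table) - 1).bit_length()
    claim = sum(table) % P                    # h, the claimed sum
    rs = []
    for _ in range(n):
        half = len(table) // 2
        # The round polynomial is linear, so g(0) and g(1) determine it.
        g0 = sum(table[:half]) % P            # first free variable fixed to 0
        g1 = sum(table[half:]) % P            # first free variable fixed to 1
        assert (g0 + g1) % P == claim         # verifier's round check
        r = random.randrange(P)               # verifier's challenge
        rs.append(r)
        claim = (g0 + r * (g1 - g0)) % P      # new claim is g(r)
        # Fold the table, fixing the first free variable to r.
        table = [(a + r * (b - a)) % P
                 for a, b in zip(table[:half], table[half:])]
    assert table[0] == claim                  # final check: f(rs) = claim
    return rs, claim

rs, c = sumcheck_prove_and_verify([3, 1, 4, 1, 5, 9, 2, 6])
```

The fold in each round is what keeps the prover cheap: the table halves every round, so the total work is $O(2^n)$ for this multilinear case.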

Q: Does this generalize to other reductions than addition? What is it about addition that makes this work?

Q: Does this generalize to other projections than on an axis? Would a sequence of linear independent lines $\F → \F^n$ work?

Analysis

Soundness. In each round a cheating prover must send a polynomial that agrees with the correct one at the random challenge, which by Schwartz-Zippel happens with probability at most $\deg_i f / |\F|$. A union bound over the $n$ rounds gives a soundness error of at most $\sum_i \deg_i f / |\F|$, i.e. $O\p{\frac n{|\F|}}$ for bounded individual degrees. See also Thaler's note on small fields T23b.

Communication. There are $n$ rounds in the protocol and in round $i$ the prover sends a degree $\deg_i f$ univariate polynomial, which in general takes $\sum_i \p{\deg_i f + 1}$ elements of $\F$ total. In each round the verifier sends a random challenge, so $n$ elements of $\F$ total. Assuming the protocol is made non-interactive with Fiat-Shamir, the challenges do not add to the proof size.

Prover. TODO: Prover complexity, especially why it is better than the $O\p{k⋅\log k}$ (in terms of the number of values committed) of univariate proving systems. For the multilinear case the folding in the sketch above already gives $O\p{k}$.

Verifier. In each round $i$ the verifier computes a sum over $|\set B_i|$ evaluations of a degree $\deg_i f$ polynomial. With direct evaluation this takes $$ \p{|\set B_i|⋅\p{1 + \deg_i f} - 1} ⋅ \add + |\set B_i| ⋅ \deg_i f ⋅ \mul $$ but there are more efficient algorithms for polynomial multi-evaluation. It is also likely that $\set B_i$ and $f$ have additional structure that can be exploited in a particular application.

Multi-linear functions

A multilinear function is a multivariate polynomial such that no variable occurs with degree higher than one. In other words, when considering a single variable and holding the rest constant the function is linear.

The equality function

A particularly useful multilinear function is the equality function:

$$ \eq(\vec x, \vec y) = \prod_{i∈[0,n)}\p{x_i⋅y_i + (1 - x_i)⋅(1 - y_i)} $$

It is called the equality function because it is the indicator function for equality of two vectors on the unit hypercube:

$$ \∀_{\vec x, \vec y ∈ \{0,1\}^n} \eq(\vec x, \vec y) = \begin{cases} 1 & \vec x = \vec y \\ 0 & \vec x ≠ \vec y \end{cases} $$

It is symmetric under swapping its arguments and under simultaneously permuting the coordinates of both, i.e. given a permutation matrix $P$:

$$ \eq(\vec x, \vec y) = \eq(\vec y, \vec x) = \eq(P⋅\vec x, P⋅\vec y) $$

It can be efficiently evaluated using the product formula above, requiring the following operations in $\F$:

$$ (3⋅n-1) ⋅ \mul + n⋅\add + 2⋅n⋅\addc $$
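As a sanity check, a direct transcription of the product formula, reusing the toy modulus `P` from the sumcheck sketch above:

```python
def eq(xs, ys):
    """Multilinear equality indicator: 1 on matching boolean vectors."""
    acc = 1
    for x, y in zip(xs, ys):
        acc = acc * (x * y + (1 - x) * (1 - y)) % P
    return acc

assert eq([0, 1, 1], [0, 1, 1]) == 1
assert eq([0, 1, 1], [0, 1, 0]) == 0
```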

The hypercube basis

The equality function makes it easy to define basis functions for the corners of the hypercube. Given a corner $\vec c ∈ \{0,1\}^n$ the function $\L_{\vec c}(\vec x) = \eq(\vec c, \vec x)$ is $1$ on $\vec c$ and $0$ on all other corners. The evaluation of this function simplifies to

$$ \L_{\vec c}(\vec x) = \prod_{i∈[0,n)}\begin{cases} x_i & c_i = 1 \\ 1 - x_i & c_i = 0 \end{cases} $$

This simplifies the number of operations in $\F$ to:

$$ (n - 1)⋅\mul + (n - \popcount(\vec c))⋅\addc $$

If we need the values $\L_{\vec c}(\vec r)$ for all $\vec c ∈ \{0,1\}^n$ and some arbitrary $\vec r$ we can compute these efficiently by building up one dimension at a time. Taking $\bar{r_i} = 1 - r_i$

$$ \begin{aligned} x_0 &= \bar{r_0} &\quad x_{00} &= x_0 ⋅ \bar{r_1} &\quad x_{000} &= x_{00} ⋅ \bar{r_2} & \quad&⋯\\ x_1 &= r_0 & x_{01} &= x_0 ⋅ r_1 & x_{001} &= x_{00} ⋅ r_2 & &⋯\\ & & x_{10} &= x_1 ⋅ \bar{r_1} & x_{010} &= x_{01} ⋅ \bar{r_2} & &⋯\\ & & x_{11} &= x_1 ⋅ r_1 & x_{011} &= x_{01} ⋅ r_2 & &⋯\\ & & & & x_{100} &= x_{10} ⋅ \bar{r_2} & &⋯\\ & & & & x_{101} &= x_{10} ⋅ r_2 & &⋯\\ & & & & &\ \ ⋮ & &⋱\\ \end{aligned} $$

This can be done in-place and takes $\p{2^{n+1} - 4}⋅\mul + n⋅\addc$ operations in $\F$. See VSBW13, T17.
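A sketch of this expansion over the toy field; for brevity this version rebuilds the table each round rather than working strictly in place:

```python
def basis(rs):
    """All L_c(rs) for c in {0,1}^n, built one dimension at a time.
    Corner index: the first coordinate is the most significant bit."""
    table = [1]
    for r in rs:
        table = [t * f % P for t in table for f in (1 - r, r)]
    return table
```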

This method generalizes to any function that factors into the form $f(\vec x) = \prod_{i∈[0,n)} f_i(x_i)$, where we get the same number of multiplications plus $2⋅n$ evaluations of the $f_i$ at $0$ and $1$. If we want the sum of all these $f(\vec c)$ values we only need $\p{n-1}⋅\mul + n⋅\add$ operations using:

$$ \sum_{x ∈ \ps{0,1}^n} \prod_{i∈[0,n)} f_i(x_i) = \prod_{i∈[0,n)}\p{f_i(0) + f_i(1)} $$
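The identity in code, with hypothetical single-variable factors passed as Python callables:

```python
import math

def factored_sum(fs):
    """Sum over {0,1}^n of prod_i f_i(x_i), computed as prod_i (f_i(0) + f_i(1))."""
    return math.prod((f(0) + f(1)) % P for f in fs) % P

# Sum over {0,1}^2 of (2 + x_0)*(3*x_1) = (2 + 3)*(0 + 3) = 15
assert factored_sum([lambda x: 2 + x, lambda x: 3 * x]) == 15
```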

Both these methods generalize to arbitrary product domains $\set B = \set B_0 × ⋯ × \set B_{n-1}$, e.g.

$$ \sum_{\vec x ∈ \set B} \prod_{i∈[0,n)} f_i(x_i) = \prod_{i∈[0,n)} \sum_{x ∈ \set B_i} f_i(x) $$

Multilinear extensions (MLEs)

Given $k = 2^n$ values in $\F$ we can store them at the corners of an $n$-dimensional hypercube $\{0,1\}^n$. Take some ordering of the corners $\vec c_i$ and values $y_i$; then the multilinear function interpolating these values is

$$ f(\vec x) = \sum_{i ∈ [0,k)} y_i ⋅ \L_{\vec c_i}(\vec x) $$

Directly evaluating this takes $O\p{k⋅\log k}$ operations (see CTY11, T17). This can be reduced to $O\p{k}$ using $O\p{k}$ memory by first computing all $\L_{\vec c}\p{\vec x}$ values using the method above.
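A sketch of the $O\p{k}$ evaluation, combining the `basis` helper above with an inner product; the value layout must match the basis ordering:

```python
def mle_eval(values, rs):
    """Evaluate the MLE of `values` (laid out on {0,1}^n) at the point rs."""
    return sum(y * l for y, l in zip(values, basis(rs))) % P

# The MLE agrees with the stored values on the corners of the hypercube.
assert mle_eval([3, 1, 4, 1], [0, 1]) == 1
```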

Hyperrectangles

We have focused on the unit hypercube, $\{0,1\}^n$, so far. Another interesting basis is the symmetric hypercube $\{-1, 1\}^n$. This should generalize straightforwardly to any non-degenerate affine transformation.

TODO: Wikipedia has an interesting remark that transforming between coefficient and evaluation form on the unit hypercube is a Möbius transform, and for the symmetric hypercube a Hadamard transform.

Zero testing

Sumcheck can be used to prove $f(\vec x) = 0$ for all $\vec x ∈ \{0,1\}^n$. To do this, we first construct a function $g(\vec x)$ that is the multilinear extension of $f$ restricted to the unit cube:

$$ g(\vec x) = \sum_{\vec y ∈ \{0,1\}^n} \L_{\vec y}\p{\vec x}⋅f(\vec y) $$

Since the interpolation is unique, $g(\vec x)$ is the zero polynomial if and only if $f$ is zero on the unit cube. By Schwartz-Zippel, if $g(\vec r) = 0$ for a random $\vec r ∈\F^n$ then with high probability $g$ is the zero polynomial. Thus, to prove our claim we do the following:

  1. Verifier sends a random $\vec r ∈ \F^n$.
  2. Prover and verifier use sumcheck on the sum $$ 0 = g(\vec r) = \sum_{\vec y ∈ \ps{0,1}^n} \L_{\vec y}\p{\vec r}⋅f(\vec y) $$ resulting in a random $\vec r' ∈ \F^n$ and a claim $c ∈ \F$.
  3. The prover and verifier use further methods to check $$ \L_{\vec r'}\p{\vec r}⋅f(\vec r') = c $$

This reduced claim is identical to $\eq\p{\vec r, \vec r'}⋅f\p{\vec r'} = c$.
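Putting the pieces together, a sketch of the zero test: the weights $\L_{\vec y}\p{\vec r}$ come from the `basis` helper, and the sumcheck sketch from earlier does the rest.

```python
def zero_test(f_table):
    """Reduce 'f = 0 on {0,1}^n' to one claim eq(r, r')*f(r') = c (sketch)."""
    n = (len(f_table) - 1).bit_length()
    r = [random.randrange(P) for _ in range(n)]
    weighted = [l * y % P for l, y in zip(basis(r), f_table)]
    assert sum(weighted) % P == 0        # verifier requires g(r) = 0
    r_prime, c = sumcheck_prove_and_verify(weighted)
    return r, r_prime, c                 # remaining claim: eq(r, r')*f(r') = c

zero_test([0] * 8)                       # passes for the all-zero function
```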

Reduction to one query

Suppose we have a polynomial $f: \F^n → \F$ and we want to prove $f(\vec x_1) = y_1$ and $f(\vec x_2) = y_2$. We can reduce this to a single evaluation. Create a linear function $l: \F → \F^n$ such that $l(0) = \vec x_1$ and $l(1) = \vec x_2$:

$$ l(x) = \vec x_1 + x⋅(\vec x_2 - \vec x_1) $$

  1. Prover sends univariate polynomial $f'(x) = f(l(x))$.
  2. Verifier sends random $r ∈ \F$ and checks $f'(0) = y_1$ and $f'(1) = y_2$.
  3. The prover and verifier use further methods to check $$ f'(r) = f(l(r)) $$

This last claim is of the form $f(\vec x) = y$.
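A sketch for the multilinear case: the prover sends $f'$ as $n+1$ evaluations and the verifier interpolates to evaluate it at the challenge. `lagrange_interp` is a hypothetical helper written out for completeness; `mle_eval` is reused from above.

```python
def lagrange_interp(xs, ys, t):
    """Evaluate the unique degree < len(xs) polynomial through (xs, ys) at t."""
    total = 0
    for xi, yi in zip(xs, ys):
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (t - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def reduce_two_queries(f_table, x1, x2, y1, y2):
    """Reduce f(x1) = y1 and f(x2) = y2 to a single claim f(l(r)) = y."""
    n = (len(f_table) - 1).bit_length()
    line = lambda t: [(a + t * (b - a)) % P for a, b in zip(x1, x2)]
    # f' = f on the line l has degree at most n for multilinear f.
    f_prime = [mle_eval(f_table, line(t)) for t in range(n + 1)]
    assert f_prime[0] == y1 and f_prime[1] == y2   # verifier checks endpoints
    r = random.randrange(P)
    y = lagrange_interp(list(range(n + 1)), f_prime, r)
    return line(r), y                              # remaining claim: f(l(r)) = y
```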

Q: What is the size of $f'$? In T17 it is stated to be at most degree $n$, but this could be a special case.

If $f$ is multilinear then $f'$ has degree at most $n$. This generalizes to $k$ evaluation points, with $l$ interpolating $l(0)=\vec x_0, l(1) = \vec x_1, …, l(k-1) = \vec x_{k-1}$ and $\deg f'$ at most $\p{k-1}⋅ n$.

Alternative using linear combinations

See CFS17 and XZZ+19.

  1. Verifier sends random $r ∈ \F$.
  2. Prover and verifier run sumcheck on the combined claim $$ y_1 + r⋅y_2 = \sum_{\vec c ∈ \ps{0,1}^n} \p{\eq\p{\vec x_1, \vec c} + r⋅\eq\p{\vec x_2, \vec c}}⋅f\p{\vec c} $$ reducing both evaluations to a single claim on $f$ (sketched below).
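A sketch of this alternative, reusing the earlier helpers: both claims are folded into a single sumcheck whose claimed sum is $y_1 + r⋅y_2$.

```python
def combine_two_queries(f_table, x1, x2, y1, y2):
    """Fold f(x1) = y1 and f(x2) = y2 into one sumcheck (sketch)."""
    r = random.randrange(P)
    b1, b2 = basis(x1), basis(x2)
    # f(x1) + r*f(x2) = sum over c of (eq(x1, c) + r*eq(x2, c)) * f(c)
    weighted = [(l1 + r * l2) * y % P for l1, l2, y in zip(b1, b2, f_table)]
    assert sum(weighted) % P == (y1 + r * y2) % P
    r_prime, c = sumcheck_prove_and_verify(weighted)
    # Remaining claim: (eq(x1, r') + r*eq(x2, r')) * f(r') = c.
    return r_prime, c
```

In this standalone form it costs an extra sumcheck; in GKR the combination can instead be folded into the next layer's sumcheck.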

GKR

In GKR08 we have a $k$-layered circuit with the first layer being the input and each further layer computed from the previous one by adding or multiplying two values. For simplicity let's assume each layer has the same number of values, $2^n$. We encode each layer's values on a $\ps{0,1}^n$ hypercube, creating multilinear extensions $w_i(\vec x)$ for each layer $i$. The inputs are encoded in $w_0(\vec x)$ and the output will be in $w_k(\vec x)$. The circuit is encoded using two indicator functions; they are MLEs with the following values on the hypercube $\ps{0,1}^{3⋅n}$:

$$ \begin{aligned} \mathrm{add}_i\p{\vec x, \vec y, \vec z} &= \begin{cases} 1 & w_{i+1}(\vec x) = w_i(\vec y) + w_i(\vec z)\\ 0 & \text{otherwise} \end{cases} \\ \mathrm{mul}_i\p{\vec x, \vec y, \vec z} &= \begin{cases} 1 & w_{i+1}(\vec x) = w_i(\vec y) ⋅ w_i(\vec z)\\ 0 & \text{otherwise} \end{cases} \\ \end{aligned} $$

Note that the order of $\vec y$, $\vec z$ is important to avoid double counting: there should be only one $1$ per output node, i.e. per value of $\vec x$. With these circuit-encoding functions the values on the layers are related by

$$ w_{i+1}(\vec x) = \sum_{\vec y, \vec z ∈ \{0,1\}^n} \p{\hspace{.1em} \begin{aligned} &\phantom{+}\mathrm{add}_i\p{\vec x, \vec y, \vec z} ⋅ \p{w_i(\vec y) + w_i(\vec z)}\\ &+\mathrm{mul}_i\p{\vec x, \vec y, \vec z} ⋅ \p{w_i(\vec y) ⋅ w_i(\vec z)} \end{aligned} \hspace{.5em}} $$

By construction this holds for $\vec x ∈ \ps{0,1}^n$. To see that it holds for all $\vec x ∈ \F^n$ observe that $\mathrm{add}_i$ and $\mathrm{mul}_i$ are MLEs in $\vec x$ and a linear combination of MLEs is also an MLE. Since MLEs are uniquely identified by their values on $\ps{0,1}^n$ they must be identical. Note that $\mathrm{add}_i$ and $\mathrm{mul}_i$ do not need to be multilinear in $\vec y, \vec z$. In GKR08 a slightly more complex expression is used that does not require multilinearity in $\vec x$ either. The above expression is due to T15.
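Before stating the protocol, a tiny numeric sanity check of this relation using the earlier helpers: a single layer with one $\add$ gate and one $\mul$ gate (all values illustrative).

```python
# Layer with 2 values (n = 1): gate 0 adds, gate 1 multiplies the two inputs.
w = [3, 5]                                       # w_i on {0,1}
w_next = [(w[0] + w[1]) % P, w[0] * w[1] % P]    # w_{i+1} on {0,1}
add_t = [0] * 8                                  # add_i on {0,1}^3, index xyz
mul_t = [0] * 8
add_t[0b001] = 1                                 # add(x=0, y=0, z=1) = 1
mul_t[0b101] = 1                                 # mul(x=1, y=0, z=1) = 1

rx = [random.randrange(P)]                       # a random point, not a corner
lhs = mle_eval(w_next, rx)
rhs = 0
for y in (0, 1):
    for z in (0, 1):
        a = mle_eval(add_t, rx + [y, z])
        m = mle_eval(mul_t, rx + [y, z])
        rhs = (rhs + a * (w[y] + w[z]) + m * w[y] * w[z]) % P
assert lhs == rhs                                # holds for any rx in F
```

The protocol then proceeds as follows: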

  1. Prover sends the output values $w_k(\vec x)$.
  2. Verifier sends random $\vec r_{\vec x} ∈ \F^n$ and computes $w_k(\vec r_{\vec x})$.
  3. The prover and verifier run sumcheck on the relation between $w_{k}(\vec r_{\vec x})$ and $w_{k-1}(\vec x)$ as above (using $f$ for the summand) $$ w_k(\vec r_{\vec x}) = \sum_{\vec y, \vec z ∈ \ps{0,1}^n} f(\vec r_{\vec x}, \vec y, \vec z) $$ This results in $c ∈ \F$, random $\vec r_{\vec y}, \vec r_{\vec z} ∈ \F^n$ and a claim $c = f\p{\vec r_{\vec x}, \vec r_{\vec y}, \vec r_{\vec z}}$
  4. Prover sends $w_{k-1}(\vec r_{\vec y})$ and $w_{k-1}(\vec r_{\vec z})$, to be proven in a moment.
  5. Verifier computes $\mathrm{add}_{k-1}\p{\vec r_{\vec x}, \vec r_{\vec y}, \vec r_{\vec z}}$ and $\mathrm{mul}_{k-1}\p{\vec r_{\vec x}, \vec r_{\vec y}, \vec r_{\vec z}}$ locally and checks $c = f\p{\vec r_{\vec x}, \vec r_{\vec y}, \vec r_{\vec z}}$.
  6. The prover and verifier reduce the two queries $w_{k-1}(\vec r_{\vec y})$ and $w_{k-1}(\vec r_{\vec z})$ to one query $w_{k-1}(\vec r_{\vec x}')$.
  7. The prover and verifier repeat steps 3-6 $k-1$ more times.
  8. The prover and verifier use further methods to check the claimed value of $w_0(\vec r_{\vec x})$.

This reduces a claim on the output values $w_k(\vec x)$ to a claim on the input values $w_0(\vec r_{\vec x})$.

Note that it is assumed the verifier knows $\mathrm{add}_i$ and $\mathrm{mul}_i$ and can evaluate them economically. Alternatively the prover can use a polynomial commitment scheme or other means to convey those values to the verifier.

This protocol can be straightforwardly generalized to support layers of unequal size $2^{n_i}$, leading to potential further reductions in proof size and compute cost.

TODO: Analysis, especially complexity and proof size, and the efficient evaluation of $\mathrm{add}_i$, $\mathrm{mul}_i$ (see the next section).

Q: The choice of addition and multiplication gates is natural, but also arbitrary. What other gates can be supported? S23 presents a clever scheme using an exponentiation operator.

Quickly evaluating $\add_i$ and $\mul_i$ at a random $\vec r$

Observe that in a fully defined circuit each output node has exactly one of $\add$ or $\mul$ return $1$ for it, so together they are one-hot on each corner of the $\vec x$ hypercube. We can construct this basis in

$$ \p{2^{n+1} - 4}⋅\mul + n⋅\addc $$

operations.

Writing $\set M ⊆ \ps{0,1}^n$ for the set of output nodes computed by a $\mul$ gate, with inputs at $\vec y(\vec x)$ and $\vec z(\vec x)$, we have

$$ \mul_i(\vec r_x, \vec r_y, \vec r_z) = \sum_{\vec x ∈ \set M} \L_{\vec x}(\vec r_x)⋅\L_{\vec y(\vec x)}\p{\vec r_y}⋅\L_{\vec z(\vec x)}\p{\vec r_z} $$

and similarly for $\add_i$.

What remains is computing the various $\L_{\vec y(\vec x)}$ and $\L_{\vec z(\vec x)}$ values. Here the connectivity is sparse: only $2^n$ of the $2^{2⋅n}$ possible $\p{\vec y, \vec z}$ combinations are used. We can however assume that each node is used as an input at least once, so all $2^n$ basis values appear in each table. We get

$$ 3⋅\p{2^{n+1} - 4}⋅\mul + 3⋅n⋅\addc $$

operations to compute the three lookup tables for the $\L$'s. Then to combine them into $\add_i$ and $\mul_i$ we need a further

$$ 2^{n + 1}⋅\mul $$

operations. The procedure is:

  1. Compute the hypercube basis for $\L_{\vec c}(\vec r_x)$, $\L_{\vec c}(\vec r_y)$ and $\L_{\vec c}(\vec r_z)$ in $3⋅2^n$ space.
  2. Use two accumulators $\add$ and $\mul$, initialized to zero. For each gate in the circuit, look up the three basis values, multiply them, and add the product to the accumulator matching the gate type (a sketch follows below).
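A sketch of this sparse evaluation under the assumptions above, with the circuit given as a hypothetical list of `(op, x, y, z)` gate tuples and reusing the `basis` helper:

```python
def wiring_eval(gates, rx, ry, rz):
    """Evaluate add_i and mul_i at (rx, ry, rz) from a sparse gate list.
    gates: (op, x, y, z) tuples, op in {'add', 'mul'}, indices in [0, 2^n)."""
    bx, by, bz = basis(rx), basis(ry), basis(rz)   # three lookup tables
    add_acc, mul_acc = 0, 0
    for op, x, y, z in gates:
        term = bx[x] * by[y] % P * bz[z] % P
        if op == 'add':
            add_acc = (add_acc + term) % P
        else:
            mul_acc = (mul_acc + term) % P
    return add_acc, mul_acc
```

This runs in $O\p{2^n}$ field operations for a layer with $2^n$ gates, instead of the $O\p{2^{3⋅n}}$ of a dense multilinear evaluation.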

See XZZ+19.

