Sumcheck and MLEs

$$ \gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\ps#1{\delim\{{#1}\}} \gdef\F{\mathbb F} \gdef\set#1{\mathcal #1} \gdef\mat#1{\mathrm #1} \gdef\vec#1{\bm #1} \gdef\popcount{\mathrm{popcount}} \gdef\eq{\mathrm{eq}} \gdef\∀{\mathop{\huge ∀}\limits} \gdef\mul{\mathsf{mul}} \gdef\add{\mathsf{add}} \gdef\addc{\mathsf{add^C}} \gdef\bar#1{\overline{#1}} $$

Sumcheck

Sumcheck was introduced in LFKN92 and allows reducing certain sums of evaluations to a single evaluation. We are given a multivariate polynomial $f ∈ \F[X_1,…,X_n]$ over a field $\F$ and an evaluation domain $\set B = \set B_1 × ⋯ × \set B_n$ with $\set B_i \subseteq \F$. Typically the evaluation domain is the unit hypercube, $\set B = \ps{0,1}^n$, but I'll present it generically here.

The goal is to reduce proving

$$ h = \sum_{\vec x ∈ \set B} f\p{\vec x} $$

to proving $f(\vec r) = c$ for some $\vec r ∈ \F^n$ and $c ∈ \F$. This can then be proven by other means, for example by an opening of a polynomial commitment. The protocol proceeds as follows:

  1. Prover sends $f_1 ∈ \F[X]$, a univariate polynomial such that $$ f_1(X) = \sum_{\vec x ∈ \set B_2×⋯×\set B_n} f\p{X, \vec x} $$
  2. Verifier sends a random $r_1 ∈ \F$ and checks that $\deg f_1 ≤ \deg_1 f$ and that $$ h = \sum_{x ∈ \set B_1} f_1\p{x} $$
  3. Prover sends $f_2(X)$ where $$ f_2(X) = \sum_{\vec x ∈ \set B_3×⋯×\set B_n} f\p{r_1, X, \vec x} $$
  4. Verifier sends a random $r_2 ∈ \F$ and checks that $\deg f_2 ≤ \deg_2 f$ and that $$ f_1(r_1) = \sum_{x ∈ \set B_2} f_2\p{x} $$
  5. The prover and verifier repeat this process $n-3$ more times.
  6. Prover sends $f_n$ where $$ f_n(X) = f\p{r_1, …, r_{n-1}, X} $$
  7. Verifier checks that $\deg f_n ≤ \deg_n f$ and that $f_{n-1}\p{r_{n-1}} = \sum_{x ∈ \set B_n} f_n\p{x}$, then sends a random $r_n ∈ \F$ and computes $c = f_n(r_n)$.
  8. The prover and verifier use further methods to check $$ c = f(\vec r) $$

See Justin Thaler's notes on Sum-Check T20a, T23a.
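To make the rounds concrete, here is a minimal sketch in Python for the common case $\set B = \ps{0,1}^n$ with $f$ multilinear, so each round polynomial has degree one and can be sent as two evaluations. The prime, the interleaved prover/verifier loop, and all names are illustrative choices, not part of the protocol above.

```python
import random

P = 2**61 - 1  # an illustrative prime standing in for F

def sumcheck(evals):
    """Prover and verifier in one loop, for a multilinear f on {0,1}^n
    given by its 2^n cube evaluations (index bit 0 = first variable).

    Returns the challenges r and the final claim c = f(r), which is
    left to be checked by other means (e.g. a commitment opening)."""
    table = [e % P for e in evals]
    claim = sum(table) % P  # h, the claimed sum
    rs = []
    while len(table) > 1:
        # Prover: f_i(0) and f_i(1), the sums over the two half-cubes.
        lo, hi = sum(table[0::2]) % P, sum(table[1::2]) % P
        # Verifier: consistency with the previous claim, then challenge.
        assert (lo + hi) % P == claim
        r = random.randrange(P)
        claim = (lo + r * (hi - lo)) % P  # f_i(r), the new claim
        # Fix variable i to r by folding the evaluation table.
        table = [(a + r * (b - a)) % P
                 for a, b in zip(table[0::2], table[1::2])]
        rs.append(r)
    assert table[0] == claim  # here f(r) is directly available
    return rs, claim

rs, c = sumcheck([3, 1, 4, 1, 5, 9, 2, 6])  # n = 3, h = 31
```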

Q: Does this generalize to other reductions than addition? What is it about addition that makes this work?

Q: Does this generalize to other projections than on an axis? Would a sequence of linearly independent lines $\F → \F^n$ work?

Analysis

Soundness. In each round a cheating prover survives only if the random challenge hits a root of a nonzero polynomial of degree at most $\deg_i f$, so by Schwartz–Zippel the soundness error is at most $\sum_i \frac{\deg_i f}{|\F|} = O\p{\frac {d ⋅ n}{|\F|}}$ where $d = \max_i \deg_i f$. See also Thaler's note on small fields T23b.

Communication. There are $n$ rounds in the protocol and at each round $i$ the prover sends a degree $\deg_i f$ univariate polynomial, which in general takes $\sum_i \p{\deg_i f + 1}$ elements of $\F$ total. In each round the verifier sends a random challenge, so $n$ elements of $\F$ total.

Prover. TODO: Prover complexity, in particular that it is better than the $k⋅\log k$ (in terms of number of values committed) that univariate proving systems have.

Verifier. In each round $i$ the verifier computes a sum over $|\set B_i|$ evaluations of a degree $\deg_i f$ polynomial. With direct evaluation this takes $$ \p{|\set B_i|⋅\p{1 + \deg_i f} - 1} ⋅ \add + |\set B_i| ⋅ \deg_i f ⋅ \mul $$ but there are more efficient algorithms for polynomial multi-evaluation. It is also likely that $\set B_i$ and $f$ have additional structure that can be exploited in a particular application.

Multilinear functions

A multilinear function is a multivariate polynomial such that no variable occurs with degree higher than one. In other words, when considering a single variable and holding the rest constant the function is linear.

The equality function

A particularly useful multilinear function is the equality function:

$$ \eq(\vec x, \vec y) = \prod_{i∈[0,n)}\p{x_i⋅y_i + (1 - x_i)⋅(1 - y_i)} $$

It is called the equality function because it is the indicator function for equality of two vectors on the unit hypercube:

$$ \∀_{\vec x, \vec y ∈ \{0,1\}^n} \eq(\vec x, \vec y) = \begin{cases} 1 & \vec x = \vec y \\ 0 & \vec x ≠ \vec y \end{cases} $$

It is symmetric under swapping its arguments and under simultaneously permuting the coordinates of both vectors, i.e. given a permutation matrix $P$:

$$ \eq(\vec x, \vec y) = \eq(\vec y, \vec x) = \eq(P⋅\vec x, P⋅\vec y) $$

It can be efficiently evaluated using the product formula above, requiring the following operations in $\F$:

$$ (3⋅n-1) ⋅ \mul + n⋅\add + 2⋅n⋅\addc $$

Addendum. Each factor can actually be rewritten to use a single multiplication: $$ x_i⋅y_i + (1 - x_i)⋅(1 - y_i) = 1 - x_i - y_i + 2⋅x_i⋅y_i $$
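As a quick sketch (the modulus and names are mine), the rewritten factor translates directly to code:

```python
P = 2**61 - 1  # illustrative prime modulus

def eq(xs, ys):
    """eq(x, y) mod P using the one-multiplication factors."""
    acc = 1
    for x, y in zip(xs, ys):
        acc = acc * (1 - x - y + 2 * x * y) % P
    return acc

assert eq([0, 1, 1], [0, 1, 1]) == 1  # equal corners
assert eq([0, 1, 1], [0, 1, 0]) == 0  # differing corners
```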

The hypercube basis

The equality function makes it easy to define basis functions for the corners of the hypercube. Given a corner $\vec c ∈ \{0,1\}^n$ the function $\eq(\vec c, \vec x)$ is $1$ on $\vec c$ and $0$ on all other corners. The evaluation of this function simplifies to $$ \eq(\vec c, \vec x) = \prod_{i∈[0,n)}\begin{cases} x_i & c_i = 1 \\ 1 - x_i & c_i = 0 \end{cases} $$

This simplifies the number of operations in $\F$ to: $$ (n - 1)⋅\mul + (n - \popcount(\vec c))⋅\addc $$

If we need the values $\eq(\vec c, \vec r)$ for all $\vec c ∈ \{0,1\}^n$ and some arbitrary $\vec r$ we can compute these efficiently by building up one dimension at a time. Taking $\bar{r_i} = 1 - r_i$

$$ \begin{aligned} x_0 &= \bar{r_0} &\quad x_{00} &= x_0 ⋅ \bar{r_1} &\quad x_{000} &= x_{00} ⋅ \bar{r_2} & \quad&⋯\\ x_1 &= r_0 & x_{01} &= x_0 ⋅ r_1 & x_{001} &= x_{00} ⋅ r_2 & &⋯\\ & & x_{10} &= x_1 ⋅ \bar{r_1} & x_{010} &= x_{01} ⋅ \bar{r_2} & &⋯\\ & & x_{11} &= x_1 ⋅ r_1 & x_{011} &= x_{01} ⋅ r_2 & &⋯\\ & & & & x_{100} &= x_{10} ⋅ \bar{r_2} & &⋯\\ & & & & x_{101} &= x_{10} ⋅ r_2 & &⋯\\ & & & & &\ \ ⋮ & &⋱\\ \end{aligned} $$

This can be done in-place and takes $\p{2^{n+1} - 4}⋅\mul + n⋅\addc$ operations in $\F$. See VSBW13, T17.
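A sketch of the in-place expansion in Python (`P` as above). For simplicity the first round multiplies by the initial $1$, costing two multiplications more than the $2^{n+1}-4$ count; an implementation can special-case that away.

```python
def eq_table(rs):
    """All eq(c, r) for c in {0,1}^n, computed in place, indexed by c
    as a big-endian integer (the bit for r_0 is most significant)."""
    table = [0] * (1 << len(rs))
    table[0] = 1
    for i, r in enumerate(rs):
        # Expand the populated prefix from size 2^i to 2^(i+1),
        # iterating backwards so each entry is read before overwritten.
        for j in reversed(range(1 << i)):
            x = table[j]
            table[2 * j] = x * (1 - r) % P
            table[2 * j + 1] = x * r % P
    return table
```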

This method generalizes to any function that factors into the form $f(\vec x) = \prod_{i∈[0,n)} f_i(x_i)$, where we get the same number of multiplications plus $2⋅n$ evaluations of the $f_i$ at $0$ and $1$. If we only want the sum of all the $f(\vec c)$ values, we need just $\p{n-1}⋅\mul + n⋅\add$ operations using:

$$ \sum_{x ∈ \ps{0,1}^n} \prod_{i∈[0,n)} f_i(x_i) = \prod_{i∈[0,n)}\p{f_i(0) + f_i(1)} $$

Both these methods generalize to arbitrary product domains $\set B = \set B_0 × ⋯ × \set B_{n-1}$, e.g.

$$ \sum_{\vec x ∈ \set B} \prod_{i∈[0,n)} f_i(x_i) = \prod_{i∈[0,n)} \sum_{x ∈ \set B_i} f_i(x) $$
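A tiny numerical check of this identity, with made-up factor tables over $\set B_0 = \set B_1 = \ps{0,1,2}$:

```python
from itertools import product

f0 = {0: 3, 1: 5, 2: 7}  # arbitrary values of f_0 on B_0
f1 = {0: 2, 1: 4, 2: 6}  # arbitrary values of f_1 on B_1

lhs = sum(f0[x0] * f1[x1] for x0, x1 in product(f0, f1))
rhs = sum(f0.values()) * sum(f1.values())
assert lhs == rhs  # both are 15 · 12 = 180
```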

Q. Would it be possible to compute it in $O(n)$ time with $O(1)$ memory if we permute the order of elements to be a Gray code?

Multilinear extensions (MLEs)

Given $k = 2^n$ values in $\F$ we can store them at the corners of an $n$-dimensional hypercube $\{0,1\}^n$. Take some ordering of the corners $\vec c_i$ and values $y_i$; then the multilinear function interpolating these values is

$$ f(\vec x) = \sum_{i ∈ [0,k)} y_i ⋅ \eq(\vec c_i, \vec x) $$

Directly evaluating this takes $O\p{k⋅\log k}$ operations (see CTY11, T17). This can be reduced to $O\p{k}$ using $k$ memory by first computing all $\eq(\vec c, \vec x)$ values using the method above.
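A sketch of the $O\p{k}$ method, reusing the `eq_table` expansion above (the values $y_i$ must use the same big-endian corner ordering):

```python
def mle_eval(ys, xs):
    """Evaluate the MLE of the 2^n values ys at the point xs, via the
    precomputed eq(c, x) table; O(k) multiplications total."""
    return sum(y * e for y, e in zip(ys, eq_table(xs))) % P
```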

Hyperrectangles

We have focused on the unit hypercube, $\{0,1\}^n$, so far. Another interesting domain is the symmetric hypercube $\{-1, 1\}^n$. This should generalize straightforwardly under non-degenerate affine transformations.

Note. Wikipedia has an interesting remark that transforming between coefficient and evaluation form on the unit hypercube is a Möbius transform, and for the symmetric hypercube a Hadamard transform.

Note. Bit equality indicator in $\{0,1\}$ vs $\{-1,1\}$: $$ \begin{aligned} 1 - a - b + 2⋅a⋅b &&\quad& \frac{a⋅b + 1}{2} \end{aligned} $$ In the $\operatorname{eq}$ use case the $\frac12$ factor can be moved out of the product, and likely absorbed somewhere else entirely, so this seems beneficial. It does not appear possible for any basis to do better than $a⋅b + 1$, as neither addition nor multiplication alone is sufficient.

Zero testing

Sumcheck can be used to prove $f(\vec x) = 0$ for all $\vec x ∈ \{0,1\}^n$. To do this, we first construct a function $g(\vec x)$ that is the multilinear extension of $f$ restricted to the unit cube: $$ g(\vec x) = \sum_{\vec y ∈ \{0,1\}^n} \eq(\vec y, \vec x) ⋅ f(\vec y) $$

Since the interpolation is unique, $g(\vec x)$ is the zero polynomial if and only if $f$ is zero on the unit cube. By Schwartz–Zippel, if $g(\vec r) = 0$ for a random $\vec r ∈\F^n$ then $g$ is likely the zero polynomial. Thus, to prove our claim we do the following:

  1. Verifier sends a random $\vec r ∈ \F^n$.
  2. Prover and verifier use sumcheck on the sum (a sketch follows below) $$ 0 = g(\vec r) = \sum_{\vec y ∈ \ps{0,1}^n} \eq(\vec y, \vec r) ⋅ f(\vec y) $$
  3. The prover and verifier use further methods to check $$ \eq(\vec r', \vec r) ⋅ f(\vec r') = c $$ where $\vec r'$ is the vector of sumcheck challenges; the verifier can evaluate the $\eq$ factor itself, leaving only $f(\vec r')$.
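Step 2 needs a sumcheck where each round polynomial has degree two: one from the $\eq$ factor and one from the multilinear $f$. A sketch, reusing `P`, `random` and `eq_table` from above, with round polynomials sent as their values at $X = 0, 1, 2$:

```python
def product_sumcheck(A, B, claim):
    """Sumcheck of sum_j A[j]·B[j] over two multilinear tables. For the
    zero test: A = eq_table(r), B = f's cube values and claim = 0."""
    A, B = [a % P for a in A], [b % P for b in B]
    inv2 = pow(2, P - 2, P)  # 1/2 in F
    rs = []
    while len(A) > 1:
        a0, a1, b0, b1 = A[0::2], A[1::2], B[0::2], B[1::2]
        # Prover: the degree-2 round polynomial p at X = 0, 1, 2.
        p0 = sum(x * y for x, y in zip(a0, b0)) % P
        p1 = sum(x * y for x, y in zip(a1, b1)) % P
        p2 = sum((2 * u - x) * (2 * v - y)
                 for x, u, y, v in zip(a0, a1, b0, b1)) % P
        # Verifier: p(0) + p(1) must match the running claim.
        assert (p0 + p1) % P == claim
        r = random.randrange(P)
        # New claim p(r), by Lagrange interpolation on {0, 1, 2}.
        claim = (p0 * (r - 1) * (r - 2) * inv2
                 - p1 * r * (r - 2)
                 + p2 * r * (r - 1) * inv2) % P
        # Fold both tables at r.
        A = [(x + r * (u - x)) % P for x, u in zip(a0, a1)]
        B = [(y + r * (v - y)) % P for y, v in zip(b0, b1)]
        rs.append(r)
    assert A[0] * B[0] % P == claim  # eq(r', r) · f(r') = c
    return rs, claim
```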

Reduction to one query

Suppose we have a polynomial $f: \F^n → \F$ and we want to prove $f(\vec x_1) = y_1$ and $f(\vec x_2) = y_2$. We can reduce this to a single evaluation. Create a linear function $l: \F → \F^n$ such that $l(0) = \vec x_1$ and $l(1) = \vec x_2$:

$$ l(x) = \vec x_1 + x⋅(\vec x_2 - \vec x_1) $$

  1. Prover sends univariate polynomial $f'(x) = f(l(x))$.
  2. Verifier sends random $r ∈ \F$ and checks $f'(0) = y_1$ and $f'(1) = y_2$.
  3. The prover and verifier use further methods to check $$ f'(r) = f(l(r)) $$

This last claim is of the form $f(\vec x) = y$.

Q. What is the size of $f'$? In T17 it is stated to be at most degree $n$, but this could be a special case.

If $f$ is multilinear then $f'$ has degree at most $n$. This generalizes to $k$ evaluation points, with $l$ a degree $k-1$ curve interpolating $l(0)=\vec x_0, l(1) = \vec x_1, …, l(k-1) = \vec x_{k-1}$, and $\deg f'$ at most $\p{k-1}⋅ n$.
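For multilinear $f$ the prover's message can be produced directly from the cube values, reusing `mle_eval` from above; since $\deg f' ≤ n$, its values at $n+1$ points (here $t = 0, …, n$, an arbitrary choice) determine it:

```python
def line_restriction(ys, x1, x2):
    """Evaluations of f'(t) = f(l(t)) at t = 0, ..., n for the MLE of
    ys, with l(t) = x1 + t·(x2 - x1); n + 1 values pin down f'."""
    n = len(x1)
    def l(t):
        return [(a + t * (b - a)) % P for a, b in zip(x1, x2)]
    return [mle_eval(ys, l(t)) for t in range(n + 1)]
```

The verifier checks that the first two values equal $y_1$ and $y_2$, and interpolates the list to evaluate $f'(r)$.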

Alternative using linear combinations

See CFS17 and XZZ⁺19.

  1. Verifier sends random $r ∈ \F$.
  2. Prover and verifier use sumcheck on $$ y_1 + r ⋅ y_2 = \sum_{\vec y ∈ \ps{0,1}^n} \p{\eq(\vec x_1, \vec y) + r ⋅ \eq(\vec x_2, \vec y)} ⋅ f\p{\vec y} $$ which holds for multilinear $f$, reducing both claims to a single evaluation $f\p{\vec r'}$ as in the zero test above.

Zero-knowledge

Note. Even with the masking below, a single evaluation of $f$ is still leaked, namely the final claim $c = f\p{\vec r}$.

The sum-check protocol by itself is not zero-knowledge, but it can be made so. In CFS17 the prover does this using a random masking polynomial. Start with the prover claiming $h = \sum_{\vec x ∈ \set B} f\p{\vec x}$.

  1. Prover picks a random $g ∈ \F[X_1,…,X_n]$ with the same per-variable degrees as $f$ and computes $h' = \sum_{\vec x ∈ \set B} g\p{\vec x}$
  2. Prover sends $h'$ and a commitment to $g$.
  3. Verifier sends random $ρ ∈ \F$.
  4. The prover and verifier use the sumcheck protocol on $$ h + ρ ⋅ h' = \sum_{\vec x ∈ \set B} f\p{\vec x} + ρ ⋅ g\p{\vec x} $$ to reduce it to $$ c + ρ ⋅ c' = f\p{\vec r} + ρ ⋅ g\p{\vec r} $$
  5. Prover decommits $c' = g\p{\vec r}$.
  6. The prover and verifier now continue with the claim $c = f\p{\vec r}$.

The masking polynomial $g$ has $\prod_i(1 + \deg_i f)$ random coefficients, which is as large as the original $f$. Often $f$ has special structure that allows much faster evaluation than an arbitrary polynomial, and an arbitrary $g$ ruins these often crucial optimizations. In XZZ⁺19 (aka Libra) this is addressed by realizing that a smaller $g$ is sufficient. In particular $$ g(\vec x) = g_1(x_1) + g_2(x_2) + ⋯ + g_n(x_n) $$ where the $g_i ∈ \F[X]$ are random with $\deg g_i = \deg_{i} f$ suffices. Note that the constant term only has to be generated once, resulting in only $1 + \sum_i \deg_i f$ random coefficients.
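A sketch of generating such a mask together with its hypercube sum, using the identity $\sum_{\vec x ∈ \ps{0,1}^n} g(\vec x) = 2^{n-1} \sum_i \p{g_i(0) + g_i(1)}$. For simplicity each $g_i$ here keeps its own constant term, spending $n - 1$ more random coefficients than necessary; `P` and `random` are as above.

```python
def libra_mask(degs):
    """Random g(x) = sum_i g_i(x_i) with deg g_i = degs[i], returned as
    coefficient lists (lowest degree first) plus its sum over {0,1}^n."""
    n = len(degs)
    gs = [[random.randrange(P) for _ in range(d + 1)] for d in degs]
    g01 = sum(g[0] + sum(g) for g in gs) % P  # sum_i g_i(0) + g_i(1)
    h_mask = pow(2, n - 1, P) * g01 % P
    return gs, h_mask
```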

References
