Sumcheck and MLEs

\gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\ps#1{\delim\{{#1}\}} \gdef\F{\mathbb F} \gdef\set#1{\mathcal #1} \gdef\mat#1{\mathrm #1} \gdef\vec#1{\bm #1} \gdef\popcount{\mathrm{popcount}} \gdef\eq{\mathrm{eq}} \gdef\∀{\mathop{\huge ∀}\limits} \gdef\mul{\mathsf{mul}} \gdef\add{\mathsf{add}} \gdef\addc{\mathsf{add^C}} \gdef\bar#1{\overline{#1}}

Sumcheck

Sumcheck was introduced in LFKN92 and allows reducing certain sums of evaluations to a single evaluation. We are given a field \F, a multivariate polynomial f ∈ \F[X_1,…,X_n], and an evaluation domain \set B = \set B_1 × ⋯ × \set B_n with \set B_i \subseteq \F. Typically the evaluation domain is the unit hypercube, \set B = \ps{0,1}^n, but I'll present it generically here.

The goal is to reduce proving

h = \sum_{\vec x ∈ \set B} f\p{\vec x}

to proving f(\vec r) = c for some \vec r ∈ \F^n and c ∈ \F. This can then be proven by other means, for example by an opening of a polynomial commitment. The protocol proceeds as follows:

  1. Prover sends f_1 ∈ \F[X], a univariate polynomial such that f_1(X) = \sum_{\vec x ∈ \set B_2×⋯×\set B_n} f\p{X, \vec x}
  2. Verifier sends a random r_1 ∈ \F and checks that \deg f_1 ≤ \deg_1 f and that h = \sum_{x ∈ \set B_1} f_1\p{x}
  3. Prover sends f_2(X) where f_2(X) = \sum_{\vec x ∈ \set B_3×⋯×\set B_n} f\p{r_1, X, \vec x}
  4. Verifier sends a random r_2 ∈ \F and checks that \deg f_2 ≤ \deg_2 f and f_1(r_1) = \sum_{x ∈ \set B_2} f_2\p{x}
  5. The prover and verifier repeat this process n-3 more times.
  6. Prover sends f_n where f_n(X) = f\p{r_1, …, r_{n-1}, X}
  7. Verifier sends a random r_n ∈ \F, checks that \deg f_n ≤ \deg_n f and f_{n-1}(r_{n-1}) = \sum_{x ∈ \set B_n} f_n\p{x}, and computes c = f_n(r_n).
  8. The prover and verifier use further methods to check c = f(\vec r)

See Justin Thaler's notes on Sum-Check T20a, T23a.
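To make the round structure concrete, here is a minimal numeric sketch for the common case of a multilinear f on the unit hypercube, where each round polynomial is linear and so determined by its values at 0 and 1. The modulus P and all names are illustrative, and prover and verifier are interleaved in one function for brevity.

```python
# Minimal sumcheck sketch: multilinear f over {0,1}^n, given by its 2^n
# evaluations. Illustrative only; P and all names are made up for this sketch.
import random

P = 2**61 - 1  # any large prime works

def fold(evals, r):
    """Fix the first variable to r: evaluations of f(r, x_2, ..., x_n)."""
    half = len(evals) // 2
    return [(evals[i] + r * (evals[half + i] - evals[i])) % P for i in range(half)]

def sumcheck(evals):
    n = len(evals).bit_length() - 1
    claim = sum(evals) % P  # the claimed sum h
    rs = []
    for _ in range(n):
        half = len(evals) // 2
        # Prover: the round polynomial f_i is linear, so sending
        # f_i(0) and f_i(1) determines it.
        f0 = sum(evals[:half]) % P
        f1 = sum(evals[half:]) % P
        # Verifier: the previous claim must equal f_i(0) + f_i(1).
        assert (f0 + f1) % P == claim
        r = random.randrange(P)            # verifier's challenge r_i
        claim = (f0 + r * (f1 - f0)) % P   # the next claim is f_i(r)
        evals = fold(evals, r)             # prover fixes X_i = r
        rs.append(r)
    # Remaining claim: f(r_1, ..., r_n) = claim. Here we check it directly;
    # in a real protocol this is e.g. a polynomial commitment opening.
    assert evals[0] == claim
    return rs, claim

sumcheck([random.randrange(P) for _ in range(2**4)])
```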

Q: Does this generalize to other reductions than addition? What is it about addition that makes this work?

Q: Does this generalize to other projections than on an axis? Would a sequence of linear independent lines \F → \F^n work?

Analysis

Soundness. (TODO: check) The soundness error is O\p{\frac {b ⋅ n}{|\F|}} where b = \max_i |\set B_i|. See also Thaler's note on small fields T23b.

Communication. There are n rounds in the protocol and in round i the prover sends a degree \deg_i f univariate polynomial, which in general takes \sum_i \p{\deg_i f + 1} elements of \F in total. In each round the verifier sends a random element of \F, so n elements of \F in total.

Prover. TODO: Prover complexity, especially that it is better than the k⋅\log k (in terms of number of values committed) that univariate proving systems have.

Verifier. In each round i the verifier computes a sum over |\set B_i| evaluations of a \deg_i f degree polynomial. With direct evaluation this takes \p{|\set B_i|⋅\p{1 + \deg_i f} - 1} ⋅ \add + |\set B_i| ⋅ \deg_i f ⋅ \mul, but there are more efficient algorithms for polynomial multi-evaluation. It is also likely that \set B_i and f have additional structure that can be exploited in a particular application.
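For instance, the direct evaluation above is just Horner's rule at each point of \set B_i plus a running sum; a sketch with illustrative names:

```python
# Sketch of a verifier round check by direct (Horner) evaluation; P, names,
# and the coefficient representation are illustrative.
P = 2**61 - 1

def horner(coeffs, x):
    """Evaluate a polynomial (lowest coefficient first): deg muls and deg adds."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def round_check(prev_claim, coeffs, domain):
    """Check prev_claim = sum of the round polynomial over the round domain B_i."""
    return sum(horner(coeffs, x) for x in domain) % P == prev_claim % P
```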

Multilinear functions

A multilinear function is a multivariate polynomial such that no variable occurs with degree higher than one. In other words, when considering a single variable and holding the rest constant the function is linear.

The equality function

A particularly useful multilinear function is the equality function:

\eq(\vec x, \vec y) = \prod_{i∈[0,n)}\p{x_i⋅y_i + (1 - x_i)⋅(1 - y_i)}

It is called the equality function because it is the indicator function for equality of two vectors on the unit hypercube:

\∀_{\vec x, \vec y ∈ \{0,1\}^n} \eq(\vec x, \vec y) = \begin{cases} 1 & \vec x = \vec y \\ 0 & \vec x ≠ \vec y \end{cases}

It is symmetric under swapping its arguments and under jointly permuting the coordinates of both vectors, i.e. given a permutation matrix P:

\eq(\vec x, \vec y) = \eq(\vec y, \vec x) = \eq(P⋅\vec x, P⋅\vec y)

It can be efficiently evaluated using the product formula above, requiring the following operations in \F:

(3⋅n-1) ⋅ \mul + n⋅\add + 2⋅n⋅\addc

Addendum. Each factor can actually be rewritten to use a single multiplication: x_i⋅y_i + (1 - x_i)⋅(1 - y_i) = 1 - x_i - y_i + 2⋅x_i⋅y_i
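As a quick sketch, the one-multiplication form gives the following evaluation (P and names illustrative):

```python
# Sketch of evaluating eq(x, y) with one multiplication per factor, using
# x·y + (1-x)·(1-y) = 1 - x - y + 2·x·y; P is an illustrative modulus.
P = 2**61 - 1

def eq(xs, ys):
    acc = 1
    for x, y in zip(xs, ys):
        acc = acc * (1 - x - y + 2 * x * y) % P
    return acc

# Indicator behaviour on the hypercube:
assert eq([0, 1, 1], [0, 1, 1]) == 1
assert eq([0, 1, 1], [1, 1, 1]) == 0
```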

The hypercube basis

The equality function makes it easy to define basis functions for the corners of the hypercube. Given a corner \vec c ∈ \{0,1\}^n the function \eq(\vec c, \vec x) is 1 on \vec c and 0 on all other corners. The evaluation of this function simplifies to \eq(\vec c, \vec x) = \prod_{i∈[0,n)}\begin{cases} x_i & c_i = 1 \\ 1 - x_i & c_i = 0 \end{cases}

This simplifies the number of operations in \F to: (n - 1)⋅\mul + (n - \popcount(\vec c))⋅\addc

If we need the values \eq(\vec c, \vec r) for all \vec c ∈ \{0,1\}^n and some arbitrary \vec r we can compute these efficiently by building up one dimension at a time, taking \bar{r_i} = 1 - r_i:

\begin{aligned} x_0 &= \bar{r_0} &\quad x_{00} &= x_0 ⋅ \bar{r_1} &\quad x_{000} &= x_{00} ⋅ \bar{r_2} & \quad&⋯\\ x_1 &= r_0 & x_{01} &= x_0 ⋅ r_1 & x_{001} &= x_{00} ⋅ r_2 & &⋯\\ & & x_{10} &= x_1 ⋅ \bar{r_1} & x_{010} &= x_{01} ⋅ \bar{r_2} & &⋯\\ & & x_{11} &= x_1 ⋅ r_1 & x_{011} &= x_{01} ⋅ r_2 & &⋯\\ & & & & x_{100} &= x_{10} ⋅ \bar{r_2} & &⋯\\ & & & & x_{101} &= x_{10} ⋅ r_2 & &⋯\\ & & & & &\ \ ⋮ & &⋱\\ \end{aligned}

This can be done in-place and takes \p{2^{n+1} - 4}⋅\mul + n⋅\addc operations in \F. See VSBW13, T17.
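A sketch of the build-up (for clarity it allocates a new list per dimension; the in-place variant achieves the memory bound, and the operation count in \F is the same):

```python
# Sketch of computing eq(c, r) for all 2^n corners c, one dimension at a time.
# Entry i corresponds to the corner whose bits are those of i, most
# significant bit first; P is an illustrative modulus.
P = 2**61 - 1

def eq_table(rs):
    # The first dimension needs no multiplications: (1 - r_0, r_0).
    table = [(1 - rs[0]) % P, rs[0] % P]
    for r in rs[1:]:
        rbar = (1 - r) % P                  # the one addc per dimension
        # Each entry x_c splits into x_{c0} = x_c·r̄ and x_{c1} = x_c·r.
        table = [x * f % P for x in table for f in (rbar, r)]
    return table
```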

This method generalizes to any function that factors into the form f(\vec x) = \prod_{i∈[0,n)} f_i(x_i), where we get the same number of multiplications plus 2⋅n evaluations of the f_i at 0 and 1. If we want the sum of all these f(\vec c) values we only need \p{n-1}⋅\mul + n⋅\add operations, using:

\sum_{x ∈ \ps{0,1}^n} \prod_{i∈[0,n)} f_i(x_i) = \prod_{i∈[0,n)}\p{f_i(0) + f_i(1)}

Both these methods generalize to arbitrary product domains \set B = \set B_0 × ⋯ × \set B_{n-1}, e.g.

\sum_{\vec x ∈ \set B} \prod_{i∈[0,n)} f_i(x_i) = \prod_{i∈[0,n)} \sum_{x ∈ \set B_i} f_i(x)
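A quick numeric check of this product-domain identity (all names illustrative; the f_i are random value tables over small \set B_i):

```python
# Verify Σ_{x∈B} Π_i f_i(x_i) = Π_i Σ_{x∈B_i} f_i(x) on a product domain.
import random
from math import prod
from itertools import product

P = 2**61 - 1
Bs = [range(2), range(3), range(2)]                  # B_0 × B_1 × B_2
fs = [[random.randrange(P) for _ in B] for B in Bs]  # f_i as value tables

lhs = sum(prod(fs[i][x] for i, x in enumerate(xs))
          for xs in product(*Bs)) % P
rhs = prod(sum(f) % P for f in fs) % P
assert lhs == rhs
```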

Q. Would it be possible to compute it in O(n) time with O(1) memory if we permute the order of elements to be a Gray code?

Multilinear extensions (MLEs)

Given k = 2^n values in \F we can store them at the corners of an n-dimensional hypercube \{0,1\}^n. Given some ordering of the corners \vec c_i and values y_i, the multilinear function interpolating these values is

f(\vec x) = \sum_{i ∈ [0,k)} y_i ⋅ \eq(\vec c_i, \vec x)

Directly evaluating this takes O\p{k⋅\log k} operations (see CTY11, T17). This can be reduced to O\p{k} operations using O\p{k} memory by first computing all \eq(\vec c, \vec x) values using the method above.
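A sketch of the O\p{k} evaluation, reusing the table construction from above (names and modulus illustrative):

```python
# O(k) MLE evaluation at an arbitrary point: build the eq table, then take
# the inner product with the stored values. Corner ordering matches the
# table (bits of the index, most significant first); P is illustrative.
P = 2**61 - 1

def eval_mle(values, rs):
    table = [(1 - rs[0]) % P, rs[0] % P]
    for r in rs[1:]:
        rbar = (1 - r) % P
        table = [x * f % P for x in table for f in (rbar, r)]
    return sum(v * t for v, t in zip(values, table)) % P
```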

Hyperrectangles

We have focussed on the unit hypercube, \{0,1\}^n, so far. Another interesting basis is the symmetric hypercube \{-1, 1\}^n. And really this should generalize straightforwardly under non-degenerate affine transformations.

Note. Wikipedia has an interesting remark that transforming between coefficient and evaluation form on the unit hypercube is a Möbius transform, and for the symmetric hypercube a Hadamard transform.

Note. Bit equality indicator in \{0,1\} vs \{-1,1\}: \begin{aligned} 1 - a - b + 2⋅a⋅b &&\quad& \frac{a⋅b + 1}{2} \end{aligned} In the \eq use case the \frac12 factor can be moved out of the product, and likely absorbed somewhere else entirely, so this seems beneficial. It does not appear possible for any basis to do better than a⋅b + 1, as neither addition nor multiplication alone is sufficient.
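A two-line check of the \{-1,1\} indicator (P illustrative):

```python
# Check the {-1,1} bit-equality indicator (a·b + 1)/2 in a prime field.
P = 2**61 - 1
inv2 = pow(2, -1, P)  # the 1/2 factor that can be hoisted out of the product
for a in (-1, 1):
    for b in (-1, 1):
        assert (a * b + 1) * inv2 % P == (1 if a == b else 0)
```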

Zero testing

Sumcheck can be used to prove f(\vec x) = 0 for all \vec x ∈ \{0,1\}^n. To do this, we first construct a function g(\vec x) that is the multilinear extension of f restricted to the unit cube: g(\vec x) = \sum_{\vec y ∈ \{0,1\}^n} \eq(\vec y, \vec x) ⋅ f(\vec y)

Since the interpolation is unique, g(\vec x) is the zero polynomial if and only if f is zero on the unit cube. By Schwartz-Zippel, if g(\vec r) = 0 for a random \vec r ∈ \F^n then it is likely the zero polynomial. Thus, to prove our claim we do the following:

  1. Verifier sends a random \vec r ∈ \F^n.
  2. Prover and verifier use sumcheck on the sum 0 = g(\vec r) = \sum_{\vec y ∈ \ps{0,1}^n} \eq(\vec y, \vec r) ⋅ f(\vec y)
  3. The prover and verifier use further methods to check \eq(\vec r', \vec r) ⋅ f(\vec r') = c, where \vec r' is the random point produced by the sumcheck.
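A numeric sanity check of the underlying identity; the functions below are illustrative, one a higher-degree polynomial that vanishes on the cube and one that does not:

```python
# g(r) = Σ_y eq(y, r)·f(y) vanishes for every r iff f vanishes on {0,1}^n.
import random
from itertools import product

P = 2**61 - 1

def eq(xs, ys):
    acc = 1
    for x, y in zip(xs, ys):
        acc = acc * (1 - x - y + 2 * x * y) % P
    return acc

def g(f, r, n):
    return sum(eq(y, r) * f(y) for y in product((0, 1), repeat=n)) % P

r = [random.randrange(P) for _ in range(3)]
zero_f = lambda y: (y[0] * y[0] - y[0]) * (y[1] + 7) % P  # zero on the cube
one_corner = lambda y: y[0] * y[1] * y[2] % P             # f(1,1,1) = 1
assert g(zero_f, r, 3) == 0
assert g(one_corner, r, 3) != 0  # nonzero with overwhelming probability
```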

Reduction to one query

Suppose we have a polynomial f: \F^n → \F and we want to prove f(\vec x_1) = y_1 and f(\vec x_2) = y_2. We can reduce this to a single evaluation. Create a linear function l: \F → \F^n such that l(0) = \vec x_1 and l(1) = \vec x_2:

l(x) = \vec x_1 + x⋅(\vec x_2 - \vec x_1)

  1. Prover sends univariate polynomial f'(x) = f(l(x)).
  2. Verifier sends random r ∈ \F and checks f'(0) = y_1 and f'(1) = y_2.
  3. The prover and verifier use further methods to check f'(r) = f(l(r))

This last claim is of the form f(\vec x) = y.

Q. What is the size of f'? In T17 it is stated to be at most degree n, but this could be a special case.

If f is multilinear then f' has degree at most n. This generalizes to k evaluation points with l interpolating l(0) = \vec x_0, l(1) = \vec x_1, …, l(k-1) = \vec x_{k-1}; then l has degree k-1 and \deg f' is at most \p{k-1}⋅n.
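A sketch of the two-query reduction for a multilinear f given by its cube evaluations; the prover sends f' as n+1 point evaluations (enough since \deg f' ≤ n) and the verifier interpolates. All names are illustrative.

```python
# Two-query reduction sketch: f' = f∘l has degree ≤ n for multilinear f,
# so n+1 evaluations determine it. P and all names are illustrative.
import random

P = 2**61 - 1
n = 3

def eval_mle(values, rs):
    table = [(1 - rs[0]) % P, rs[0] % P]
    for r in rs[1:]:
        rbar = (1 - r) % P
        table = [x * f % P for x in table for f in (rbar, r)]
    return sum(v * t for v, t in zip(values, table)) % P

def line(x1, x2, t):
    """l(t) = x1 + t·(x2 - x1), so l(0) = x1 and l(1) = x2."""
    return [(a + t * (b - a)) % P for a, b in zip(x1, x2)]

def lagrange_eval(ys, t):
    """Evaluate the polynomial through (0, ys[0]), ..., (n, ys[n]) at t."""
    acc = 0
    for i, y in enumerate(ys):
        num, den = 1, 1
        for j in range(len(ys)):
            if j != i:
                num = num * (t - j) % P
                den = den * (i - j) % P
        acc = (acc + y * num * pow(den, -1, P)) % P
    return acc

f = [random.randrange(P) for _ in range(2**n)]
x1 = [random.randrange(P) for _ in range(n)]
x2 = [random.randrange(P) for _ in range(n)]

f_prime = [eval_mle(f, line(x1, x2, t)) for t in range(n + 1)]  # prover
assert f_prime[0] == eval_mle(f, x1)   # verifier: f'(0) = y_1
assert f_prime[1] == eval_mle(f, x2)   # verifier: f'(1) = y_2
r = random.randrange(P)
assert lagrange_eval(f_prime, r) == eval_mle(f, line(x1, x2, r))  # f'(r) = f(l(r))
```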

Alternative using linear combinations

See CFS17 and XZZ⁺19.

  1. Verifier sends random r ∈ 𝔽.
  2. Prover and verifier use sumcheck (for multilinear f) on y_1 + r⋅y_2 = \sum_{\vec c ∈ \ps{0,1}^n} \p{\eq(\vec x_1, \vec c) + r⋅\eq(\vec x_2, \vec c)} ⋅ f\p{\vec c}, reducing both claims to a single claim about f.

Zero-knowledge

Note. This still leaks a single evaluation of f.

The sumcheck protocol by itself is not zero-knowledge, but it can be made so. In CFS17 the prover achieves this using a random masking polynomial. Start with the prover claiming h = \sum_{\vec x ∈ \set B} f\p{\vec x}.

  1. Prover picks a random g ∈ \F[X_1,…,X_n] with the same individual degrees as f and computes h' = \sum_{\vec x ∈ \set B} g\p{\vec x}
  2. Prover sends h' and a commitment to g.
  3. Verifier sends random ρ ∈ 𝔽.
  4. The prover and verifier use the sumcheck protocol on h + ρ ⋅ h' = \sum_{\vec x ∈ \set B} \p{f\p{\vec x} + ρ ⋅ g\p{\vec x}} to reduce it to c + ρ ⋅ c' = f\p{\vec r} + ρ ⋅ g\p{\vec r}
  5. Prover decommits c' = g\p{\vec r}.
  6. The prover and verifier now continue with the claim c = f\p{\vec r}.

The masking polynomial g has \prod_i\p{1 + \deg_i f} random coefficients, which is as large as the original f. Often f has special structure that allows much faster evaluation than an arbitrary polynomial, and an arbitrary g ruins these often crucial optimizations. In XZZ⁺19 (aka Libra) this is addressed by realizing that a smaller g is sufficient: in particular g(\vec x) = g_1(x_1) + g_2(x_2) + ⋯ + g_n(x_n) where the g_i ∈ 𝔽[X] are random with \deg g_i = \deg_i f. Note that the constant term only has to be generated once, resulting in only 1 + \sum_i \deg_i f random coefficients.
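With this structured g the prover's extra sum h' is also cheap to compute: each term g_i(x_i) is summed over the 2^{n-1} assignments of the remaining variables, so

h' = \sum_{\vec x ∈ \ps{0,1}^n} g\p{\vec x} = 2^{n-1} ⋅ \sum_{i∈[0,n)} \p{g_i(0) + g_i(1)}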

References
