Discrete Convolutions
\gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\Mod#1{\ \operatorname{mod}\ #1} \gdef\F{\mathbb F} \gdef\M{\mathsf{\small M}} \gdef\A{\mathsf{\small A}} \gdef\Mc{\mathsf{\small M^c}} \gdef\Ac{\mathsf{\small A^c}} \gdef\NTT{\mathsf{\small NTT}} \gdef\LCV{\mathsf{\small LCV}} \gdef\CCV{\mathsf{\small CCV}} \gdef\NCV{\mathsf{\small NCV}} \gdef\CRT{\mathsf{\small CRT}} \gdef\ω{\text{ω}} \gdef\O{\mathcal{O}} \gdef\MF{\mathrm{F}} \gdef\T{\mathsf{T}} \gdef\E{\mathrm{E}} \gdef\diag{\operatorname{diag}} \gdef\I{\operatorname{I}}
Given two sequences a_i and b_i, their discrete convolution, denoted a * b, is the sequence c_i such that
c_i = \sum_j a_j ⋅ b_{i - j}
In practice the sequences are finite and variations of convolution depend on how the boundary is handled. In a linear convolution the sequences are zero-extended where missing. In a cyclic convolution a, b and c have the same length n and indices are taken modulo n, i.e. the sequences are periodic. If b is periodic except for a sign change every period it is a negacyclic convolution. Finally, correlation, denoted c = a \star b, is the same but with b reversed, i.e. c_i = \sum_j a_j ⋅ b_{i + j}. In linear algebra, linear and cyclic convolution are equivalent to multiplication by a Toeplitz or circulant matrix respectively. In digital signal processing, linear convolutions are known as finite impulse response filters. In machine learning, convolutional neural networks make use of linear convolutions. They also show up in many FFT and NTT algorithms.
In algebra, discrete convolution appears as polynomial multiplication. Given A, B, C ∈ \F[X] such that C(X) = A(X) ⋅ B(X), write all three in coefficient form, e.g. A(X) = a_0 + a_1 ⋅ X + a_2 ⋅ X^2 + ⋯; then c = a * b. This is the core of the large integer multiplication problem. To see this, consider that a number like 123 equals the polynomial 3 + 2 ⋅ X + 1 ⋅ X^2 evaluated at X = 10. To multiply two large numbers one multiplies the polynomials and then does carry propagation. The resulting inner multiplications have arguments of size equal to the chosen base. When this base is itself large the method can be employed recursively.
Using polynomial (quotient) rings we can concisely define linear (\LCV), cyclic (\CCV) and negacyclic (\NCV) convolution as multiplication in \F[X], \F[X]/\p{X^n -1} and \F[X]/\p{X^n +1} respectively:
\begin{aligned} &\LCV &\hspace{3em} C(X) &= A(X) ⋅ B(X) \\ &\CCV & C(X) &= A(X) ⋅ B(X) \Mod{X^n - 1} \\ &\NCV & C(X) &= A(X) ⋅ B(X) \Mod{X^n + 1} \\ \end{aligned}
A direct computation of n×m linear convolution takes \p{n⋅m -n -m +1}⋅\A + n⋅m⋅\M. For a (nega) cyclic convolution this is n⋅\p{n - 1}⋅\A + n^2⋅\M.
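To make the definitions concrete, here is a minimal Python sketch of the direct computations (the function names are just illustrative); it is meant to pin down the index conventions rather than to be fast:

```python
def linear_convolution(a, b):
    """Direct linear convolution: c_i = sum_j a_j * b_(i-j), zero-extended."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def cyclic_convolution(a, b):
    """Direct cyclic convolution: indices are taken modulo n."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def negacyclic_convolution(a, b):
    """Direct negacyclic convolution: wrapped-around terms flip sign."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j] * (-1 if i + j >= n else 1)
    return c

assert linear_convolution([1, 2], [3, 4]) == [3, 10, 8]
assert cyclic_convolution([1, 2], [3, 4]) == [11, 10]
assert negacyclic_convolution([1, 2], [3, 4]) == [-5, 10]
```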
Toom-Cook methods
Polynomials can be represented in many bases. The most common is the monomial basis, i.e. coefficient form. But they can also be represented by their evaluations on an (arbitrary) set of points \{x_i\}, called a Lagrange basis. The Lagrange bases have the unique advantage that polynomial multiplication becomes a simple element-wise product:
C(x_i) = A(x_i) ⋅ B(x_i)
Computing this takes only n multiplications, where n is the number of coefficients of C. To use it for convolutions we also need to convert between the monomial basis and a Lagrange basis (i.e. between coefficients and evaluations). This is a basis transformation in the linear algebra sense, so we can represent it as a so-called Vandermonde matrix V:
V = \begin{bmatrix} 1 & x_0 & x_0^2 & ⋯ \\[.7em] 1 & x_1 & x_1^2 & ⋯ \\[.7em] 1 & x_2 & x_2^2 & ⋯ \\[.7em] ⋮ & ⋮ & ⋮ & ⋱ \end{bmatrix}
We can now write the Toom-Cook convolution algorithm using matrix-vector products ⋅ and element-wise products ∘:
c = V^{-1} ⋅\p{\p{V⋅a} ∘ \p{V⋅b}}
The evaluation points \{x_i\} are arbitrary, so which points do we pick to make computations as easy as possible? Some great first candidates are \{0, 1, -1\}, which respectively result in the first coefficient, the sum of the coefficients and the alternating sum of the coefficients; no multiplications required.
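As a reference for the whole evaluate–multiply–interpolate pipeline, here is a hedged Python sketch that builds the Vandermonde system explicitly and solves it with exact rationals (Gauss–Jordan elimination). It is meant to show the structure, not to be fast, and it leaves out the point at infinity discussed next:

```python
from fractions import Fraction

def toom_cook(a, b, points):
    """Linear convolution by evaluation / pointwise product / interpolation.
    `points` must contain len(a) + len(b) - 1 distinct evaluation points."""
    n = len(a) + len(b) - 1
    assert len(points) == n
    # Evaluation: V.a and V.b with V the Vandermonde matrix of the points.
    eval_a = [sum(Fraction(ai) * x**i for i, ai in enumerate(a)) for x in points]
    eval_b = [sum(Fraction(bi) * x**i for i, bi in enumerate(b)) for x in points]
    # Pointwise products: C(x_i) = A(x_i) * B(x_i).
    y = [ea * eb for ea, eb in zip(eval_a, eval_b)]
    # Interpolation: solve V.c = y by Gauss-Jordan elimination over the rationals.
    m = [[Fraction(x)**j for j in range(n)] + [yi] for x, yi in zip(points, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# 3x3 linear convolution using the points {0, 1, -1, 2, -2}.
assert toom_cook([1, 2, 3], [4, 5, 6], [0, 1, -1, 2, -2]) == [4, 13, 28, 27, 18]
```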
Point at infinity
Convolutions are symmetrical under reversing all the sequences. In the polynomial version this means reversing the order of the coefficients, or equivalently P'(X) = P(X^{-1})⋅X^{n-1}. Each previous evaluation point has a corresponding point x' = x^{-1} in the reversed polynomials, except for 0 which has no inverse. Instead, we get a new evaluation point x' = 0 that has no pre-image. Evaluating at the new point gives P'(0) = p_{n-1}, much like P(0) = p_0. These are both trivial to compute, so we like to add both to our evaluation set. Somewhat informally we refer to the x' = 0 point as the 'point at infinity' x = ∞. Fortunately there is an easy way to add it to a Vandermonde matrix: coefficient reversal is identical to reversing a row in the matrix, so the row corresponding to x = ∞ is the row corresponding to x = 0 reversed, i.e. \begin{bmatrix} 0 & 0 & ⋯ & 1 \end{bmatrix}.
Karatsuba
Consider a 2×2 linear convolution. The result is c = [a_0\!⋅\!b_0,\ a_0\!⋅\!b_1\!+\!a_1\!⋅\!b_0,\ a_1\!⋅\!b_1] which takes four multiplications when evaluated directly. If we pick the basis \{0, 1, ∞\} we end up with the following algorithm for a 2×2 linear convolution:
\begin{aligned} V &= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} &\hspace{2em} V^{-1} &= \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & -1 \\ 0 & 0 & 1 \\ \end{bmatrix} & \end{aligned}
\begin{aligned} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & -1 \\ 0 & 0 & 1 \\ \end{bmatrix} ⋅ \p{ \p{ \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} a_0 \\ a_1\end{bmatrix} } ∘ \p{ \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} b_0 \\ b_1\end{bmatrix} } } \end{aligned}
Note that when we trim the V matrix we should keep the 1 in the last column for the point at infinity. Writing out all the multiplications we get
\begin{aligned} y_0 &= a_0 ⋅ b_0 &\hspace{2em} c_0 &= y_0\\ y_1 &= \p{a_0 + a_1} ⋅ \p{b_0 + b_1} & c_1 &= y_1 - y_0 - y_2\\ y_2 &= a_1 ⋅ b_1 & c_2 &= y_2 \\ \end{aligned}
We now have three multiplications instead of four, though we went from one addition to four. This specific instance of Toom-Cook is known as the Karatsuba algorithm. It is the core of the first sub-quadratic method for multiplication and led to the development of Toom-Cook and other methods.
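As a sanity check, a minimal Python sketch of this three-multiplication scheme:

```python
def karatsuba_2x2(a, b):
    """2x2 linear convolution with 3 multiplications (points {0, 1, infinity})."""
    a0, a1 = a
    b0, b1 = b
    y0 = a0 * b0                # evaluation at 0
    y1 = (a0 + a1) * (b0 + b1)  # evaluation at 1
    y2 = a1 * b1                # evaluation at infinity
    return [y0, y1 - y0 - y2, y2]

# (3 + 5X) * (7 + 11X) = 21 + 68X + 55X^2
assert karatsuba_2x2([3, 5], [7, 11]) == [21, 68, 55]
```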
Toom-3
For a 3×3 linear convolution we can use the points \{0, 1, -1, -2, ∞\}: all the easy ones, with the addition of -2. The choice of -2 leads to further optimizations due to Bod07.
\begin{aligned} V &= {\small\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\[.2em] 1 & 1 & 1 & 1 & 1 \\[.2em] 1 & -1 & 1 & -1 & 1 \\[.2em] 1 & -2 & 4 & -8 & 16 \\[.2em] 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}} &\hspace{2em} V^{-1} &= {\small\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\[.2em] \frac{1}{2} & \frac{1}{3} & -1 & \frac{1}{6} & -2 \\[.2em] -1 & \frac{1}{2} & \frac{1}{2} & 0 & -1 \\[.2em] -\frac{1}{2} & \frac{1}{6} & \frac{1}{2} & -\frac{1}{6} & 2 \\[.2em] 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}} & \end{aligned}
\begin{aligned} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} &= V^{-1} ⋅ \p{ \p{ {\small\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & -2 & 4 \\ 0 & 0 & 1 \\ \end{bmatrix}} ⋅ \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} } ∘ \p{ {\small\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & -2 & 4 \\ 0 & 0 & 1 \\ \end{bmatrix}} ⋅ \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} } } \end{aligned}
Both V and V^{-1} allow for more efficient evaluation than the direct matrix products. Taking t = a_0 + a_2, the evaluation a' = V⋅a can be computed as
\begin{aligned} a_0' &= a_0 \\ a_1' &= t + a_1 \\ a_2' &= t - a_1 \\ a_3' &= 2⋅\p{a_2' + a_2} - a_0 \\ a_4' &= a_2 \\ \end{aligned}
Using an addition for the doubling, this gives 5⋅\A instead of 7⋅\A + \Mc. Similarly V^{-1} ⋅ y can be computed in 9⋅\A + 3⋅\Mc instead of 14⋅\A + 9⋅\Mc using
\begin{split} \begin{aligned} t_1 &= \frac 12 ⋅ \p{y_1 - y_2} \\ t_2 &= y_2 - y_0 \\ t_3 &= \frac 13 ⋅ \p{y_1 - y_3} \end{aligned} \end{split} \hspace{2em} \begin{split} \begin{aligned} c_0 &= y_0 \\ c_2 &= t_2 + t_1 - y_4 \\ c_3 &= \frac 12 ⋅ \p{t_2 + t_3} + 2⋅y_4 \\ c_1 &= t_1 - c_3 \\ c_4 &= y_4 \\ \end{aligned} \end{split}
This brings the total to 19⋅\A + 3⋅\Mc + 5⋅\M compared to 4⋅\A + 9⋅\M for direct convolution.
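Putting the evaluation and interpolation sequences together, a small Python sketch of this Toom-3 instance (using exact rationals for the divisions by 2 and 3) might look as follows:

```python
from fractions import Fraction

def toom3(a, b):
    """3x3 linear convolution with 5 multiplications, points {0, 1, -1, -2, inf}."""
    def evaluate(p):
        p0, p1, p2 = p
        t = p0 + p2
        return [p0, t + p1, t - p1, 2 * (t - p1 + p2) - p0, p2]

    y = [Fraction(u) * v for u, v in zip(evaluate(a), evaluate(b))]  # 5 products
    # Interpolation sequence from the text.
    t1 = (y[1] - y[2]) / 2
    t2 = y[2] - y[0]
    t3 = (y[1] - y[3]) / 3
    c0 = y[0]
    c2 = t2 + t1 - y[4]
    c3 = (t2 + t3) / 2 + 2 * y[4]
    c1 = t1 - c3
    c4 = y[4]
    return [c0, c1, c2, c3, c4]

assert toom3([1, 2, 3], [4, 5, 6]) == [4, 13, 28, 27, 18]
```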
Question. Are there algorithms to find the optimal evaluation for a given matrix?
For larger sizes GMP uses the point set \{0, ∞, +1, -1, \pm 2^i, \pm 2^{-i} \}. This gives some implementation optimization opportunities, but doing the evaluation/interpolation step efficiently becomes increasingly difficult for larger sets. Without evaluation tricks an n×n linear convolution takes \p{2⋅n - 1}⋅\M pointwise multiplications, and the constant multiplications alone come to \p{2⋅n - 5}⋅\p{n - 1}⋅\Mc for an evaluation by V and \p{2⋅n - 3}⋅\p{2⋅n - 1}⋅\Mc for the interpolation by V^{-1}, on top of the corresponding additions.
Convolution theorem
If we pick an n-th order root of unity \ω_n and use its powers \{\ω_n^0, \ω_n^1, …, \ω_n^{n-1}\} as evaluation points, then our Vandermonde matrix V becomes a DFT matrix and the evaluation/interpolation steps become Fourier transforms \NTT_n. Since \prod_i \p{X - \ω_n^i} = X^n - 1, the resulting convolution is cyclic. This result is known as the convolution theorem.
c = \NTT_n^{-1}\p{\NTT_n\p{a} ∘ \NTT_n\p{b} }
This was first used for very large integer multiplication by the carefully optimized Schönhage–Strassen algorithm. Further improvements along this line led to the Harvey–van der Hoeven algorithm, which runs in \O\p{n⋅\log n}.
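A small Python sketch of the convolution theorem over a prime field: the transforms are written as plain \O\p{n^2} sums rather than a fast butterfly, and the prime p = 17 with ω = 4 (a primitive 4th root of unity modulo 17) is just a convenient toy choice:

```python
def ntt(a, omega, p):
    """Evaluate a at the powers of omega modulo p (the Vandermonde product, O(n^2))."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, p) for j in range(n)) % p for i in range(n)]

def cyclic_convolution_ntt(a, b, omega, p):
    """Cyclic convolution via transform, pointwise product, inverse transform."""
    n = len(a)
    y = [(u * v) % p for u, v in zip(ntt(a, omega, p), ntt(b, omega, p))]
    c = ntt(y, pow(omega, -1, p), p)      # inverse transform, up to a factor n
    n_inv = pow(n, -1, p)
    return [(ci * n_inv) % p for ci in c]

p, omega = 17, 4                          # 4^2 = 16 = -1 mod 17, so 4 has order 4
assert cyclic_convolution_ntt([1, 2, 3, 4], [1, 0, 0, 1], omega, p) == [3, 5, 7, 5]
```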
Bilinear algorithms
We can generalize the algorithms considered so far as special cases of bilinear algorithms of the form
c = C⋅\p{\p{A ⋅ a} ∘ \p{B ⋅ b} }
where A∈\F^{\ r × n_a}, B∈\F^{\ r × n_b} and C∈\F^{\ n_c × r} for some r called the bilinear rank. To motivate this generalization consider the following bilinear algorithm for 3×3 linear convolution due to Bla10 (§5.4):
\small \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 1 & 0 & 0 \\ -1 & 1 & -1 & 0 & 1 & 0 \\ 0 & -1 & -1 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 0 & 0\\ \end{bmatrix} ⋅ \p{ \p{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} } ∘ \p{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} } }
Its rank is one higher than Toom-3's, but at 13⋅\A + 6⋅\M it has fewer additions and no multiplications by constants. It also cannot be formulated as evaluation on some set of points. (Question. Can it be formulated as a Winograd factorization?)
Usually A, B and C are sparse by construction. The number of operations required for such matrices is at most
\p{m - n} ⋅ \A + k⋅\Mc
where n is the number of rows, m is the number of non-zero elements and k is the number of elements not equal to 0, 1 or -1. Note that this is an upper bound: as in the Toom-3 example above, shared subexpressions can bring the actual count down further.
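Executing a bilinear algorithm is straightforward once the triple (A, B, C) is given. The following Python sketch (helper names are illustrative) applies an arbitrary triple and uses the Karatsuba matrices from above as a check:

```python
def matvec(m, v):
    """Matrix-vector product over whatever ring the entries live in."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def bilinear(A, B, C, a, b):
    """Evaluate c = C . ((A.a) o (B.b)); the o-step costs r multiplications."""
    return matvec(C, [x * y for x, y in zip(matvec(A, a), matvec(B, b))])

# Karatsuba as a rank-3 triple.
A = B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 0, 0], [-1, 1, -1], [0, 0, 1]]
assert bilinear(A, B, C, [3, 5], [7, 11]) == [21, 68, 55]
```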
The Matrix Exchange Theorem
In Toom-Cook we have nice A and B matrices, but C becomes unwieldy. Fortunately, there is a trick to swap C with A or B. This is useful as we can often precompute A⋅a or B⋅b. Without loss of generality we'll assume swapping C and B.
Start with the observation that x ∘ y = \diag\p{x}⋅y where \diag : \F^{\,n} →\F^{\,n×n} is the vector-to-diagonal-matrix operator. Furthermore, given matrices A, B and a diagonal matrix D we have A⋅D⋅B = \p{B⋅\E}^{\T}⋅D⋅\p{\E⋅A}^{\T} where \E is the exchange matrix (see Bla10 5.6.1). (To do. This requires further restrictions such that the matrix is persymmetric.) Note that for any matrix M, \E⋅M is just M with its rows in reverse order and M⋅\E is M with its columns reversed. Applying this to our bilinear algorithm results in
\begin{aligned} c &= C ⋅\p{\p{A⋅a} ∘ \p{B⋅b}} \\ &= C ⋅\diag\p{A⋅a} ⋅ B ⋅ b \\ &= \p{B⋅\E}^{\T}⋅\diag\p{A⋅a} ⋅\p{\E ⋅ C}^{\T} ⋅b \\ &= \p{B⋅\E}^{\T}⋅\p{ \p{A⋅a} ∘ \p{\p{\E ⋅ C}^{\T} ⋅b }} \\ \end{aligned}
thus we have a new bilinear algorithm where B and C are swapped, transposed and have their rows/columns reversed.
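As a quick numeric check, the following Python sketch exchanges B and C in a rank-2 algorithm for 2×2 cyclic convolution (a square case where the persymmetry caveat above is satisfied) and verifies that both triples compute the same map:

```python
from fractions import Fraction

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def bilinear(A, B, C, a, b):
    return matvec(C, [x * y for x, y in zip(matvec(A, a), matvec(B, b))])

def transpose(m):
    return [list(col) for col in zip(*m)]

# A rank-2 algorithm for 2x2 cyclic convolution.
half = Fraction(1, 2)
A = B = [[1, 1], [1, -1]]
C = [[half, half], [half, -half]]

# Matrix exchange: B' = (E.C)^T (reverse rows of C, then transpose),
#                  C' = (B.E)^T (reverse columns of B, then transpose).
B2 = transpose(C[::-1])
C2 = transpose([row[::-1] for row in B])

a, b = [3, 5], [7, 11]
assert bilinear(A, B, C, a, b) == bilinear(A, B2, C2, a, b) == [76, 68]
```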
Question. Does this apply to all kinds of convolutions?
Correlations
Given an algorithm (A, B, C) for linear convolution, an algorithm for correlation is constructed as \p{A, C^{\T}, B^{\T}}. This is the matrix interchange theorem in Bla10.
Two-dimensional convolutions
Given an n×m convolution (A_1, B_1, C_1) and a k×l convolution (A_2, B_2, C_2) we can construct a two-dimensional (n,k)×(m,l) convolution as
\p{A_1⊗A_2, B_1⊗B_2, C_1⊗C_2}
where the inputs are vectorized with inner dimensions k,l, i.e. (i,j)\mapsto i⋅k +j and i⋅l + j respectively. This result holds for linear, cyclic and negacyclic convolutions.
Question. Does this hold for general polynomial product mod m? Does this hold in general for bilinear maps?
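A small numpy sketch of the Kronecker construction, nesting the 2×2 Karatsuba triple with itself to get a (2,2)×(2,2) two-dimensional linear convolution and checking it against a direct computation:

```python
import numpy as np

# 1D Karatsuba triple for 2x2 linear convolution.
A1 = B1 = np.array([[1, 0], [1, 1], [0, 1]])
C1 = np.array([[1, 0, 0], [-1, 1, -1], [0, 0, 1]])

# Kronecker products give a (2,2)x(2,2) two-dimensional linear convolution.
A, B, C = np.kron(A1, A1), np.kron(B1, B1), np.kron(C1, C1)

def conv2d_bilinear(a, b):
    """a, b are 2x2 arrays (row-major vectorization); the result is 3x3."""
    prods = (A @ a.flatten()) * (B @ b.flatten())
    return (C @ prods).reshape(3, 3)

def conv2d_direct(a, b):
    c = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1), dtype=int)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return c

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
assert np.array_equal(conv2d_bilinear(a, b), conv2d_direct(a, b))
```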
Overlap-add
Furthermore, we can use this to construct an (n⋅k)×(m⋅l) linear convolution by recombining the outputs, which is the overlap-add technique. For example, splitting two polynomials into blocks,
\begin{aligned} P(X) &= P_0(X) + P_1(X) ⋅ X^2 \\ Q(X) &= Q_0(X) + Q_1(X) ⋅ X^2 \end{aligned}
the product becomes
P(X)⋅Q(X) = P_0(X)⋅Q_0(X) + \p{P_0(X)⋅Q_1(X) + P_1(X)⋅Q_0(X)}⋅X^2 + P_1(X)⋅Q_1(X)⋅X^4
where the block products are small linear convolutions and the coefficients of overlapping blocks are added together.
Agarwal-Cooley
Modular reduction
Given a polynomial P(X) = \sum_i p_i⋅X^i and a monic modulus M(X) = \sum_i m_i⋅X^i we want to compute
Q(X) = P(X) \Mod M(X)
\begin{bmatrix} q_0 \\ q_1 \\ ⋮ \\ q_{m-1} \end{bmatrix} = \left[ \begin{array}{c|cccc} \I_m & r^{(0)} & r^{(1)} & ⋯ & r^{(n-m-1)} \end{array} \right] ⋅ \begin{bmatrix} p_0 \\ p_1 \\ ⋮ \\ p_{n-1} \end{bmatrix}
The reduction matrix starts with an m×m identity block, since the monomials X^i with i < m are left untouched, and is extended with however many columns are necessary: column r^{(j)} holds the m coefficients of X^{m+j} \Mod{M(X)}. For simple moduli such as X^m \pm 1 these extra columns are (negated) unit vectors and the block has a Toeplitz structure.
In particular this can be used to turn linear convolution algorithms into (nega)cyclic by reducing modulo X^n \pm 1.
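A small numpy sketch of this for the moduli X^n - 1 and X^n + 1: the reduction matrix is applied to the length 2⋅n - 1 output of a linear convolution to obtain the cyclic and negacyclic results:

```python
import numpy as np

def reduction_matrix(n_in, n, sign):
    """Matrix reducing a polynomial of degree < n_in modulo X^n - sign."""
    r = np.zeros((n, n_in), dtype=int)
    for j in range(n_in):
        # X^j mod (X^n - sign) = sign^(j // n) * X^(j mod n)
        r[j % n, j] = sign ** (j // n)
    return r

def linear_convolution(a, b):
    c = np.zeros(len(a) + len(b) - 1, dtype=int)
    for i, ai in enumerate(a):
        c[i:i + len(b)] += ai * np.asarray(b)
    return c

lin = linear_convolution([1, 2, 3], [4, 5, 6])        # length 2n - 1 = 5
cyclic = reduction_matrix(len(lin), 3, +1) @ lin      # reduce mod X^3 - 1
negacyclic = reduction_matrix(len(lin), 3, -1) @ lin  # reduce mod X^3 + 1
assert list(cyclic) == [31, 31, 28] and list(negacyclic) == [-23, -5, 28]
```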
Winograd Convolution
A generalization of Toom-Cook can be made by realizing that evaluating a polynomial at a point, P(x_i), is the same as reducing it modulo X - x_i. We are then solving the system of equations
C(X) = A(X) ⋅ B(X) \mod{X- x_i}
and then reconstructing C(X) from the residues mod X-x_i. This is a special case of the Chinese Remainder Theorem (\CRT). Generalizing we pick a set of coprime moduli polynomials \{M_i(X)\}, their product M(X) = \prod_i M_i(X) and a CRT basis \{E_i(X)\} (see appendix), compute
C_i(X) = \p{A(X) ⋅ B(X) \Mod{M_i(X)}}
C(X) = \sum_i C_i(X) ⋅ E_i(X)
If M(X) = X^n - 1 we have cyclic convolution, and with M(X) = X^n + 1 negacyclic.
R_i(X) is only determined up to a multiple of M_i(X); changing it by such a multiple changes E_i(X) by a multiple of M(X), which reflects the fact that the \CRT reconstruction is only unique \Mod{M(X)}.
Question. Another generalization of point evaluations is to include derivatives at those points. How does this generalization combine with CRT?
Question. Can all bilinear algorithms for convolutions be interpreted as Winograd convolutions? Does this extend to non-convolution bilinear algorithms?
Recursion
This method allows for recursion. For example, to compute a 2⋅n sized cyclic convolution we can pick the moduli M_0(X) = X^n - 1 and M_1(X) = X^n + 1, so that M(X) = \p{X^n - 1}⋅\p{X^n + 1} = X^{2⋅n} - 1, and get
\begin{aligned} C_0(X) &= A(X) ⋅ B(X) \mod{X^n - 1} \\ C_1(X) &= A(X) ⋅ B(X) \mod{X^n + 1} \end{aligned}
\begin{split}\begin{aligned} E_0(X) &= R_0(X) ⋅\p{X^n + 1} \\ E_1(X) &= R_1(X) ⋅\p{X^n - 1} \end{aligned}\end{split}\hspace{3em} \begin{split}\begin{aligned} R_0(X) &= \p{X^n + 1}^{-1} \mod{X^n - 1} \\ R_1(X) &= \p{X^n - 1}^{-1} \mod{X^n + 1} \end{aligned}\end{split}
X^n + 1 \Mod\p{X^n - 1} = 2
X^n - 1 \Mod\p{X^n + 1} = -2
So R_0(X) = \frac 12 and R_1(X) = -\frac 12.
\begin{aligned} E_0(X) &= \frac 12 ⋅\p{X^n + 1} &\hspace{2em} E_1(X) &= -\frac 12 ⋅\p{X^n - 1} \end{aligned}
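The reconstruction is then C(X) = C_0(X)⋅E_0(X) + C_1(X)⋅E_1(X). A small Python sketch of this split, reducing a length 2⋅n cyclic convolution to one cyclic and one negacyclic half and recombining them with E_0 and E_1 (exact rationals handle the factor \frac 12):

```python
from fractions import Fraction

def cyclic(a, b):
    """Direct cyclic convolution (indices mod n)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def negacyclic(a, b):
    """Direct negacyclic convolution (wrap-around terms change sign)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j] * (-1 if i + j >= n else 1)
    return c

def cyclic_via_crt(a, b):
    """Length 2n cyclic convolution from an n cyclic and an n negacyclic half."""
    n = len(a) // 2
    # Reduce mod X^n - 1 and X^n + 1: fold the two halves with + and - respectively.
    a0 = [x + y for x, y in zip(a[:n], a[n:])]
    b0 = [x + y for x, y in zip(b[:n], b[n:])]
    a1 = [x - y for x, y in zip(a[:n], a[n:])]
    b1 = [x - y for x, y in zip(b[:n], b[n:])]
    c0 = cyclic(a0, b0)        # C_0 = A.B mod X^n - 1
    c1 = negacyclic(a1, b1)    # C_1 = A.B mod X^n + 1
    # Recombine with E_0 = (X^n + 1)/2 and E_1 = -(X^n - 1)/2.
    half = Fraction(1, 2)
    low = [(x + y) * half for x, y in zip(c0, c1)]
    high = [(x - y) * half for x, y in zip(c0, c1)]
    return low + high

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert cyclic_via_crt(a, b) == cyclic(a, b)
```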
Re-introducing the point at infinity
We can add a modulus M_i(X) = X - ∞ corresponding to the point at infinity. Concretely this means working with the reversed polynomials and reducing modulo X, i.e. evaluating at the point x' = 0:
X^{n_c - 1} ⋅ C(X^{-1}) = \p{X^{n_a - 1} ⋅ A(X^{-1})} ⋅ \p{X^{n_b - 1} ⋅ B(X^{-1})} \Mod{X}
Larger convolutions
Nested Convolutions
Agarwal-Cooley
Lower bounds
Linear convolution cannot be computed with fewer than \p{n_a + n_b - 1}⋅\M (see Bla10 thm 5.8.3).
The polynomial product A(X)⋅B(X) \Mod{M(X)} cannot be computed with fewer than \p{2⋅n - t}⋅\M where t is the number of prime factors of M in \F[X] (see Bla10 thm 5.8.5).
Specifically, cyclic convolution cannot be computed with fewer than \p{2⋅n - t}⋅\M where t is the number of prime factors of X^n - 1 in \F[X].
References
- JS20 Ju, Solomonik (2020). Derivation and Analysis of Fast Bilinear Algorithms for Convolution. Github
- Bod07 Bodrato (2007). Towards Optimal Toom-Cook Multiplication for Univariate and Multivariate Polynomials in Characteristic 2 and 0.
- Bla10 Blahut (2010). Fast Algorithms for Signal Processing.
- TAL97 Tolimieri, An, Lu (1997). Algorithms for Discrete Fourier Transform and Convolution.
Appendix: Solving the Chinese Remainder Theorem
Given a set of coprime moduli m_i and values x_i such that x_i = x \Mod{m_i} for some x, we want to reconstruct x \Mod{m} where m = \prod_i m_i. To do this we will construct a basis e_i similar to Lagrange basis such that
x = \sum_i x_i ⋅ e_i \quad \Mod{m}
where e_i \Mod{m_j} equals 0 when i ≠ j and 1 when i = j. The former implies e_i = r_i ⋅ \prod_{j≠i} m_j for some r_i. The latter implies
r_i = \Bigl(\prod_{j≠i} m_j\Bigr)^{-1} \quad\Mod{m_i}
There are multiple solutions r_i' = r_i + k⋅m_i and this reflects multiple solutions x'= x + k⋅m. By convention the smallest is used. We can write the CRT basis concisely using [-]_m for modular reduction as
e_i = \frac{m}{m_i}⋅\left[\p{\frac{m}{m_i}}^{-1}\right]_{m_i}
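A small Python sketch of this construction for coprime integer moduli, using Python's built-in modular inverse:

```python
from math import prod

def crt_basis(moduli):
    """Basis e_i with e_i = 1 mod m_i and e_i = 0 mod m_j for j != i."""
    m = prod(moduli)
    return [(m // mi) * pow(m // mi, -1, mi) for mi in moduli]

moduli = [3, 5, 7]
e = crt_basis(moduli)          # [70, 21, 15]
x = 52
residues = [x % mi for mi in moduli]
assert sum(r * ei for r, ei in zip(residues, e)) % prod(moduli) == x
```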
If the moduli are not coprime we can correct with m = \operatorname{lcm}\p{m_i}. (Question. Does the above solution using \frac{m}{m_i} still hold?) Note that in this case solutions may not exist. For example if m_0 and m_1 have a common factor d and x_0 ≠ x_1 \Mod d there is no solution. (Question. What does the above solution produce in this case?)
The CRT establishes a ring isomorphism
\F/m ≅ \F/m_0 × \F/m_1 × ⋯
with the maps x \mapsto (\left[x\right]_{m_0}, \left[x\right]_{m_1},… ) and (x_0, x_1, …) \mapsto x_0⋅e_0 + x_1⋅e_1+⋯.
Question. How does this generalize to include derivatives?
Appendix: applications to proof systems
Note the similarity between bilinear algorithms and an R1CS predicate φ(w) ⇔ \p{A ⋅ w} ∘ \p{B ⋅ w} = C⋅w. If C is invertible this can be restated as a bilinear algorithm β(a,b) = C^{-1}⋅\p{\p{A ⋅ a} ∘ \p{B ⋅ b} } and we have φ(w) ⇔ β(w,w) = w. Applying the matrix exchange theorem, a given R1CS (A, B, C) is equivalent to \p{A, \p{\E ⋅ C^{-1}}^{\T}, \p{\p{B⋅\E}^{\T}}^{-1}}.
Since linear convolution is equivalent to polynomial multiplication, a polynomial commitment scheme allows for an efficient proof of convolution. To prove c = a*b, prove C(τ) = A(τ)⋅B(τ) at a random point τ ∈ \F. To prove a size n (nega)cyclic convolution, test that the following holds for n-term polynomials A, B, C, Q:
C(τ) + Q(τ)⋅\p{τ^n \pm 1}= A(τ)⋅B(τ)
If we have a proof for the reduction from FFT to convolution, this could be used to prove FFTs.
Appendix: bilinear maps
Given vector spaces U, V, W over a base field \F, a bilinear map is a function f: U × V → W such that
\begin{aligned} f(u_1 + u_2, v) &= f(u_1, v) + f(u_2, v) & f(λ⋅u, v) &= λ⋅f(u,v)\\ f(u, v_1 + v_2) &= f(u, v_1) + f(u, v_2) & f(u, λ⋅v) &= λ⋅f(u,v) \\ \end{aligned}
Bilinear algorithms are bilinear maps, so convolutions and polynomial modular multiplication are bilinear maps. Other examples of bilinear maps are matrix multiplications and the cross product. When W = \F, it is also called a bilinear form with further examples such as inner products.
Given bases for U, V and W, a bilinear map can be written as a tensor f_{ijk}:
f_k(u, v) = \sum_{i,j} f_{ijk} ⋅ u_i ⋅ v_j
From this a bilinear algorithm follows. For each non-zero element f_{ijk}, create a row in A and a row in B that are all zeros except for a single 1 in column i and column j respectively, and add a column to C with the value f_{ijk} in row k. This has complexity m⋅\Mc + n⋅\M where n is the number of nonzero entries and m the number of nontrivial entries (those not equal to 0, 1 or -1).
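A small Python sketch of this construction, with the tensor stored as a dictionary mapping (i, j, k) to f_{ijk} (a hypothetical representation), checked on the tensor of 2×2 linear convolution; note it yields rank 4, one more than Karatsuba:

```python
def tensor_to_bilinear(f, n_u, n_v, n_w):
    """Naive bilinear algorithm: one rank-one term per nonzero tensor entry."""
    A, B, C_cols = [], [], []
    for (i, j, k), value in f.items():
        A.append([1 if col == i else 0 for col in range(n_u)])
        B.append([1 if col == j else 0 for col in range(n_v)])
        C_cols.append([value if row == k else 0 for row in range(n_w)])
    C = [list(row) for row in zip(*C_cols)]   # transpose the collected columns
    return A, B, C

def bilinear(A, B, C, a, b):
    mv = lambda m, v: [sum(x * y for x, y in zip(row, v)) for row in m]
    return mv(C, [x * y for x, y in zip(mv(A, a), mv(B, b))])

# Tensor of 2x2 linear convolution: c_k = sum over i + j = k of a_i * b_j.
conv_tensor = {(i, j, i + j): 1 for i in range(2) for j in range(2)}
A, B, C = tensor_to_bilinear(conv_tensor, 2, 2, 3)
assert bilinear(A, B, C, [3, 5], [7, 11]) == [21, 68, 55]   # rank 4, not 3
```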
Appendix: Polynomial composition
Given two polynomials P(X) = \sum_i p_i ⋅ X^i and Q(X) = \sum_i q_i ⋅ X^i, define the composition R(X) = P(Q(X)) where the coefficients of R are given by:
\begin{aligned} R(X) &= P(Q(X)) \\ &= \sum_i p_i ⋅ \p{ \sum_j q_j ⋅ X^j }^i \end{aligned}
Multinomial theorem.
\begin{aligned} Q(X)^i &= \p{ \sum_j q_j ⋅ X^j }^i = \sum_{k_0 + k_1 + ⋯ = i} \frac{i!}{k_0!\,k_1!\,k_2!⋯} ⋅ q_0^{k_0} ⋅ q_1^{k_1} ⋯ X^{k_1 + 2⋅k_2 + 3⋅k_3 + ⋯} \end{aligned}
To do
https://arxiv.org/pdf/2201.10369.pdf
https://cr.yp.to/lineartime/multapps-20080515.pdf
https://eprint.iacr.org/2016/504.pdf page 5 describes how to turn convolution theorem into negacyclic.
Fast modular composition. p. 338 in Modern Computer Algebra.
https://relate.cs.illinois.edu/course/cs598evs-f20/ https://relate.cs.illinois.edu/course/cs598evs-f20/f/lectures/03-lecture.pdf
https://arxiv.org/abs/1707.04618
https://www.nature.com/articles/s41586-022-05172-4
https://doc.sagemath.org/html/en/reference/polynomial_rings/sage/rings/polynomial/convolution.html https://jucaleb4.github.io/files/conv_ba.pdf
https://jucaleb4.github.io/files/poster1.pdf