Discrete Convolutions
\gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\Mod#1{\ \operatorname{mod}\ #1} \gdef\F{\mathbb F} \gdef\M{\mathsf{\small M}} \gdef\A{\mathsf{\small A}} \gdef\Mc{\mathsf{\small M^c}} \gdef\Ac{\mathsf{\small A^c}} \gdef\NTT{\mathsf{\small NTT}} \gdef\LCV{\mathsf{\small LCV}} \gdef\CCV{\mathsf{\small CCV}} \gdef\NCV{\mathsf{\small NCV}} \gdef\CRT{\mathsf{\small CRT}} \gdef\ω{\text{ω}} \gdef\O{\mathcal{O}} \gdef\MF{\mathrm{F}} \gdef\T{\mathsf{T}} \gdef\E{\mathrm{E}} \gdef\diag{\operatorname{diag}} \gdef\I{\operatorname{I}}
Given two sequences a_i and b_i, their discrete convolution, denoted a * b, is the sequence c_i such that
c_i = \sum_j a_j ⋅ b_{i - j}
In practice the sequences are finite and variations of convolution depend on how the boundary is handled. In a linear convolution the sequences are zero-extended where missing. In a cyclic convolution a, b and c have the same length n and indices are taken modulo n, i.e. the sequences are periodic. If b is periodic except for a sign change every period it is a negacyclic convolution. Finally, correlation, denoted c = a \star b, is the same but with b reversed, i.e. c_i = \sum_j a_j ⋅ b_{i + j}. In linear algebra, linear and cyclic convolution are equivalent to multiplication by a Toeplitz or circulant matrix respectively. In digital signal processing, linear convolutions are known as finite impulse response filters. In machine learning, convolutional neural networks make use of linear convolutions. They also show up in many FFT and NTT algorithms.
In algebra, discrete convolution appears as polynomial multiplication. Given A, B, C ∈ \F[X] such that C(X) = A(X) ⋅ B(X), write all three in coefficient form, e.g. A(X) = a_0 + a_1 ⋅ X + a_2 ⋅ X^2 + ⋯; then c = a * b. This is the core of the large integer multiplication problem. To see this, consider that a number like 123 equals the polynomial 3 + 2 ⋅ X + 1 ⋅ X^2 evaluated at X = 10. To multiply two large numbers one multiplies the polynomials and then does carry propagation. The resulting inner multiplications have arguments of size equal to the chosen base. When this base is itself large the method can be employed recursively.
Using polynomial (quotient) rings we can concisely define linear (\LCV), cyclic (\CCV) and negacyclic (\NCV) convolution as multiplication in \F[X], \F[X]/\p{X^n -1} and \F[X]/\p{X^n +1} respectively:
\begin{aligned} &\LCV &\hspace{3em} C(X) &= A(X) ⋅ B(X) \\ &\CCV & C(X) &= A(X) ⋅ B(X) \Mod{X^n - 1} \\ &\NCV & C(X) &= A(X) ⋅ B(X) \Mod{X^n + 1} \\ \end{aligned}
A direct computation of n×m linear convolution takes \p{n⋅m -n -m +1}⋅\A + n⋅m⋅\M. For a (nega) cyclic convolution this is n⋅\p{n - 1}⋅\A + n^2⋅\M.
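To make the definitions concrete, here is a minimal Python sketch of the direct computations (the function names are just illustrative); it is meant to pin down the index conventions rather than to be fast:

```python
def linear_convolution(a, b):
    """Direct linear convolution: c_i = sum_j a_j * b_(i-j), zero-extended."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def cyclic_convolution(a, b):
    """Direct cyclic convolution: indices are taken modulo n."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def negacyclic_convolution(a, b):
    """Direct negacyclic convolution: wrapped-around terms flip sign."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j] * (-1 if i + j >= n else 1)
    return c

assert linear_convolution([1, 2], [3, 4]) == [3, 10, 8]
assert cyclic_convolution([1, 2], [3, 4]) == [11, 10]
assert negacyclic_convolution([1, 2], [3, 4]) == [-5, 10]
```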
Toom-Cook methods
Polynomials can be represented in many bases. The most common is the monomial basis, i.e. coefficient form. But they can also be represented by their evaluations on an (arbitrary) set of points \{x_i\}, called a Lagrange basis. The Lagrange bases have the unique advantage that polynomial multiplication becomes a simple element-wise product:
C(x_i) = A(x_i) ⋅ B(x_i)
Computing this takes only n multiplications, where n is the number of coefficients of C. To use it for convolutions we also need to convert between the monomial basis and a Lagrange basis (i.e. between coefficients and evaluations). This is a basis transformation in the linear algebra sense, so we can represent it as a so-called Vandermonde matrix V:
V = \begin{bmatrix} 1 & x_0 & x_0^2 & ⋯ \\[.7em] 1 & x_1 & x_1^2 & ⋯ \\[.7em] 1 & x_2 & x_2^2 & ⋯ \\[.7em] ⋮ & ⋮ & ⋮ & ⋱ \end{bmatrix}
We can now write the Toom-Cook convolution algorithm using matrix-vector products ⋅ and element-wise products ∘:
c = V^{-1} ⋅\p{\p{V⋅a} ∘ \p{V⋅b}}
The evaluation points \{x_i\} are arbitrary, so which points do we pick to make computations as easy as possible? Some great first candidates are \{0, 1, -1\}, which respectively result in the first coefficient, the sum of the coefficients and the alternating sum of the coefficients; no multiplications required.
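As a reference for the whole evaluate–multiply–interpolate pipeline, here is a hedged Python sketch that builds the Vandermonde system explicitly and solves it with exact rationals (Gauss–Jordan elimination). It is meant to show the structure, not to be fast, and it leaves out the point at infinity discussed next:

```python
from fractions import Fraction

def toom_cook(a, b, points):
    """Linear convolution by evaluation / pointwise product / interpolation.
    `points` must contain len(a) + len(b) - 1 distinct evaluation points."""
    n = len(a) + len(b) - 1
    assert len(points) == n
    # Evaluation: V.a and V.b with V the Vandermonde matrix of the points.
    eval_a = [sum(Fraction(ai) * x**i for i, ai in enumerate(a)) for x in points]
    eval_b = [sum(Fraction(bi) * x**i for i, bi in enumerate(b)) for x in points]
    # Pointwise products: C(x_i) = A(x_i) * B(x_i).
    y = [ea * eb for ea, eb in zip(eval_a, eval_b)]
    # Interpolation: solve V.c = y by Gauss-Jordan elimination over the rationals.
    m = [[Fraction(x)**j for j in range(n)] + [yi] for x, yi in zip(points, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# 3x3 linear convolution using the points {0, 1, -1, 2, -2}.
assert toom_cook([1, 2, 3], [4, 5, 6], [0, 1, -1, 2, -2]) == [4, 13, 28, 27, 18]
```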
Point at infinity
Convolutions are symmetrical under reversing all the sequences. In the polynomial version this means reversing the order of the coefficients, or equivalently P'(X) = P(X^{-1})⋅X^{n-1}. Each previous evaluation point has a corresponding point x' = x^{-1} in the reversed polynomials, except for 0 which has no inverse. Instead, we get a new evaluation point x' = 0 that has no pre-image. Evaluating at the new point gives P'(0) = p_{n-1}, much like P(0) = p_0. These are both trivial to compute, so we like to add both to our evaluation set. Somewhat informally we refer to the x' = 0 point as the 'point at infinity' x = ∞. Fortunately there is an easy way to add it to a Vandermonde matrix: coefficient reversal is identical to reversing a row in the matrix, so the row corresponding to x = ∞ is the row corresponding to x = 0 reversed, i.e. \begin{bmatrix} 0 & 0 & ⋯ & 1 \end{bmatrix}.
Karatsuba
Consider a 2×2 linear convolution. The result is c = [a_0\!⋅\!b_0,\ a_0\!⋅\!b_1\!+\!a_1\!⋅\!b_0,\ a_1\!⋅\!b_1] which takes four multiplications when evaluated directly. If we pick the basis \{0, 1, ∞\} we end up with the following algorithm for a 2×2 linear convolution:
\begin{aligned} V &= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} &\hspace{2em} V^{-1} &= \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & -1 \\ 0 & 0 & 1 \\ \end{bmatrix} & \end{aligned}
\begin{aligned} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} &= \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & -1 \\ 0 & 0 & 1 \\ \end{bmatrix} ⋅ \p{ \p{ \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} a_0 \\ a_1\end{bmatrix} } ∘ \p{ \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} b_0 \\ b_1\end{bmatrix} } } \end{aligned}
Note that when we trim the V matrix we should keep the 1 in the last column for the point at infinity. Writing out all the multiplications we get
\begin{aligned} y_0 &= a_0 ⋅ b_0 &\hspace{2em} c_0 &= y_0\\ y_1 &= \p{a_0 + a_1} ⋅ \p{b_0 + b_1} & c_1 &= y_1 - y_0 - y_2\\ y_2 &= a_1 ⋅ b_1 & c_2 &= y_2 \\ \end{aligned}
We now have three multiplications instead of four, though we went from one addition to four. This specific instance of Toom-Cook is known as the Karatsuba algorithm. It is the core of the first sub-quadratic method for multiplication and led to the development of Toom-Cook and other methods.
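As a sanity check, a minimal Python sketch of this three-multiplication scheme:

```python
def karatsuba_2x2(a, b):
    """2x2 linear convolution with 3 multiplications (points {0, 1, infinity})."""
    a0, a1 = a
    b0, b1 = b
    y0 = a0 * b0                # evaluation at 0
    y1 = (a0 + a1) * (b0 + b1)  # evaluation at 1
    y2 = a1 * b1                # evaluation at infinity
    return [y0, y1 - y0 - y2, y2]

# (3 + 5X) * (7 + 11X) = 21 + 68X + 55X^2
assert karatsuba_2x2([3, 5], [7, 11]) == [21, 68, 55]
```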
Toom-3
For a 3×3 linear convolution we can use the points \{0, 1, -1, -2, ∞\}: all the easy ones, with the addition of -2. The choice of -2 leads to further optimizations due to Bod07.
\begin{aligned} V &= {\small\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\[.2em] 1 & 1 & 1 & 1 & 1 \\[.2em] 1 & -1 & 1 & -1 & 1 \\[.2em] 1 & -2 & 4 & -8 & 16 \\[.2em] 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}} &\hspace{2em} V^{-1} &= {\small\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\[.2em] \frac{1}{2} & \frac{1}{3} & -1 & \frac{1}{6} & -2 \\[.2em] -1 & \frac{1}{2} & \frac{1}{2} & 0 & -1 \\[.2em] -\frac{1}{2} & \frac{1}{6} & \frac{1}{2} & -\frac{1}{6} & 2 \\[.2em] 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}} & \end{aligned}
\begin{aligned} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} &= V^{-1} ⋅ \p{ \p{ {\small\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & -2 & 4 \\ 0 & 0 & 1 \\ \end{bmatrix}} ⋅ \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} } ∘ \p{ {\small\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & -2 & 4 \\ 0 & 0 & 1 \\ \end{bmatrix}} ⋅ \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} } } \end{aligned}
Both V and V^{-1} allow for more efficient evaluation than the direct matrix products. Taking t = a_0 + a_2, the evaluation a' = V⋅a can be computed as
\begin{aligned} a_0' &= a_0 \\ a_1' &= t + a_1 \\ a_2' &= t - a_1 \\ a_3' &= 2⋅\p{a_2' + a_2} - a_0 \\ a_4' &= a_2 \\ \end{aligned}
Using an addition for the doubling, this gives 5⋅\A instead of 7⋅\A + \Mc. Similarly V^{-1} ⋅ y can be computed in 9⋅\A + 3⋅\Mc instead of 14⋅\A + 9⋅\Mc using
\begin{split} \begin{aligned} t_1 &= \frac 12 ⋅ \p{y_1 - y_2} \\ t_2 &= y_2 - y_0 \\ t_3 &= \frac 13 ⋅ \p{y_1 - y_3} \end{aligned} \end{split} \hspace{2em} \begin{split} \begin{aligned} c_0 &= y_0 \\ c_2 &= t_2 + t_1 - y_4 \\ c_3 &= \frac 12 ⋅ \p{t_2 + t_3} + 2⋅y_4 \\ c_1 &= t_1 - c_3 \\ c_4 &= y_4 \\ \end{aligned} \end{split}
This brings the total to 19⋅\A + 3⋅\Mc + 5⋅\M compared to 4⋅\A + 9⋅\M for direct convolution.
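Putting the evaluation and interpolation sequences together, a small Python sketch of this Toom-3 instance (using exact rationals for the divisions by 2 and 3) might look as follows:

```python
from fractions import Fraction

def toom3(a, b):
    """3x3 linear convolution with 5 multiplications, points {0, 1, -1, -2, inf}."""
    def evaluate(p):
        p0, p1, p2 = p
        t = p0 + p2
        return [p0, t + p1, t - p1, 2 * (t - p1 + p2) - p0, p2]

    y = [Fraction(u) * v for u, v in zip(evaluate(a), evaluate(b))]  # 5 products
    # Interpolation sequence from the text.
    t1 = (y[1] - y[2]) / 2
    t2 = y[2] - y[0]
    t3 = (y[1] - y[3]) / 3
    c0 = y[0]
    c2 = t2 + t1 - y[4]
    c3 = (t2 + t3) / 2 + 2 * y[4]
    c1 = t1 - c3
    c4 = y[4]
    return [c0, c1, c2, c3, c4]

assert toom3([1, 2, 3], [4, 5, 6]) == [4, 13, 28, 27, 18]
```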
Question. Are there algorithms to find the optimal evaluation for a given matrix?
For larger sizes GMP uses the point set \{0, ∞, +1, -1, \pm 2^i, \pm 2^{-i} \}. This gives some implementation optimization opportunities, but doing the evaluation/interpolation step efficiently becomes increasingly difficult for larger sets. Without evaluation tricks an n×n linear convolution takes \p{2⋅n - 1}⋅\M pointwise multiplications, and the constant multiplications alone come to \p{2⋅n - 5}⋅\p{n - 1}⋅\Mc for an evaluation by V and \p{2⋅n - 3}⋅\p{2⋅n - 1}⋅\Mc for the interpolation by V^{-1}, on top of the corresponding additions.
Convolution theorem
If we pick an n-th order root of unity \ω_n and use its powers \{\ω_n^0, \ω_n^1, …, \ω_n^{n-1}\} as evaluation points, then our Vandermonde matrix V becomes a DFT matrix and the evaluation/interpolation steps become Fourier transforms \NTT_n. Since \prod_i \p{X - \ω_n^i} = X^n - 1, the resulting convolution is cyclic. This result is known as the convolution theorem.
c = \NTT_n^{-1}\p{\NTT_n\p{a} ∘ \NTT_n\p{b} }
This was first used for very large integer multiplication by the carefully optimized Schönhage–Strassen algorithm. Further improvements along this line led to the Harvey–van der Hoeven algorithm, which runs in \O\p{n⋅\log n}.
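A small Python sketch of the convolution theorem over a prime field: the transforms are written as plain \O\p{n^2} sums rather than a fast butterfly, and the prime p = 17 with ω = 4 (a primitive 4th root of unity modulo 17) is just a convenient toy choice:

```python
def ntt(a, omega, p):
    """Evaluate a at the powers of omega modulo p (the Vandermonde product, O(n^2))."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, p) for j in range(n)) % p for i in range(n)]

def cyclic_convolution_ntt(a, b, omega, p):
    """Cyclic convolution via transform, pointwise product, inverse transform."""
    n = len(a)
    y = [(u * v) % p for u, v in zip(ntt(a, omega, p), ntt(b, omega, p))]
    c = ntt(y, pow(omega, -1, p), p)      # inverse transform, up to a factor n
    n_inv = pow(n, -1, p)
    return [(ci * n_inv) % p for ci in c]

p, omega = 17, 4                          # 4^2 = 16 = -1 mod 17, so 4 has order 4
assert cyclic_convolution_ntt([1, 2, 3, 4], [1, 0, 0, 1], omega, p) == [3, 5, 7, 5]
```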
Bilinear algorithms
We can generalize the algorithms considered so far as special cases of bilinear algorithms of the form
c = C⋅\p{\p{A ⋅ a} ∘ \p{B ⋅ b} }
where A∈\F^{\ r × n_a}, B∈\F^{\ r × n_b} and C∈\F^{\ n_c × r} for some r called the bilinear rank. To motivate this generalization consider the following bilinear algorithm for 3×3 linear convolution due to Bla10 (§5.4):
\small \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 1 & 0 & 0 \\ -1 & 1 & -1 & 0 & 1 & 0 \\ 0 & -1 & -1 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 0 & 0\\ \end{bmatrix} ⋅ \p{ \p{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} } ∘ \p{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ \end{bmatrix} ⋅ \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} } }
Its rank is one higher than Toom-3's, but at 13⋅\A + 6⋅\M it has fewer additions and no multiplications by constants. It also cannot be formulated as evaluation on some set of points. (Question. Can it be formulated as a Winograd factorization?)
Usually A, B and C are sparse by construction. The number of operations required for such matrices is at most
\p{m - n} ⋅ \A + k⋅\Mc
where n is the number of rows, m is the number of non-zero elements and k is the number of elements not equal to 0, 1 or -1. Note that this is an upper bound: as in the Toom-3 example above, shared subexpressions can bring the actual count down further.
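Executing a bilinear algorithm is straightforward once the triple (A, B, C) is given. The following Python sketch (helper names are illustrative) applies an arbitrary triple and uses the Karatsuba matrices from above as a check:

```python
def matvec(m, v):
    """Matrix-vector product over whatever ring the entries live in."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def bilinear(A, B, C, a, b):
    """Evaluate c = C . ((A.a) o (B.b)); the o-step costs r multiplications."""
    return matvec(C, [x * y for x, y in zip(matvec(A, a), matvec(B, b))])

# Karatsuba as a rank-3 triple.
A = B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 0, 0], [-1, 1, -1], [0, 0, 1]]
assert bilinear(A, B, C, [3, 5], [7, 11]) == [21, 68, 55]
```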
The Matrix Exchange Theorem
In Toom-Cook we have nice A and B matrices, but C becomes unwieldy. Fortunately, there is a trick to swap C with A or B. This is useful as we can often precompute A⋅a or B⋅b. Without loss of generality we'll assume swapping C and B.
Start with the observation that x ∘ y = \diag\p{x}⋅y where \diag : \F^{\,n} →\F^{\,n×n} is the vector-to-diagonal-matrix operator. Furthermore, given matrices A, B and a diagonal matrix D we have A⋅D⋅B = \p{B⋅\E}^{\T}⋅D⋅\p{\E⋅A}^{\T} where \E is the exchange matrix (see Bla10 5.6.1). (To do. This requires further restrictions such that the matrix is persymmetric.) Note that for any matrix M, \E⋅M is just M with its rows in reverse order and M⋅\E is M with its columns reversed. Applying this to our bilinear algorithm results in
\begin{aligned} c &= C ⋅\p{\p{A⋅a} ∘ \p{B⋅b}} \\ &= C ⋅\diag\p{A⋅a} ⋅ B ⋅ b \\ &= \p{B⋅\E}^{\T}⋅\diag\p{A⋅a} ⋅\p{\E ⋅ C}^{\T} ⋅b \\ &= \p{B⋅\E}^{\T}⋅\p{ \p{A⋅a} ∘ \p{\p{\E ⋅ C}^{\T} ⋅b }} \\ \end{aligned}
thus we have a new bilinear algorithm where B and C are swapped, transposed and have their rows/columns reversed.
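As a quick numeric check, the following Python sketch exchanges B and C in a rank-2 algorithm for 2×2 cyclic convolution (a square case where the persymmetry caveat above is satisfied) and verifies that both triples compute the same map:

```python
from fractions import Fraction

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def bilinear(A, B, C, a, b):
    return matvec(C, [x * y for x, y in zip(matvec(A, a), matvec(B, b))])

def transpose(m):
    return [list(col) for col in zip(*m)]

# A rank-2 algorithm for 2x2 cyclic convolution.
half = Fraction(1, 2)
A = B = [[1, 1], [1, -1]]
C = [[half, half], [half, -half]]

# Matrix exchange: B' = (E.C)^T (reverse rows of C, then transpose),
#                  C' = (B.E)^T (reverse columns of B, then transpose).
B2 = transpose(C[::-1])
C2 = transpose([row[::-1] for row in B])

a, b = [3, 5], [7, 11]
assert bilinear(A, B, C, a, b) == bilinear(A, B2, C2, a, b) == [76, 68]
```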
Question. Does this apply to all kinds of convolutions?
Correlations
Given an algorithm (A, B, C) for linear convolution, an algorithm for correlation is constructed as \p{A, C^{\T}, B^{\T}}. This is the matrix interchange theorem in Bla10.
Two-dimensional convolutions
Given an n×m convolution (A_1, B_1, C_1) and a k×l convolution (A_2, B_2, C_2) we can construct a two-dimensional (n,k)×(m,l) convolution as
\p{A_1⊗A_2, B_1⊗B_2, C_1⊗C_2}
where the inputs are vectorized with inner dimensions k,l, i.e. (i,j)\mapsto i⋅k +j and i⋅l + j respectively. This result holds for linear, cyclic and negacyclic convolutions.
Question. Does this hold for general polynomial product mod m? Does this hold in general for bilinear maps?
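A small numpy sketch of the Kronecker construction, nesting the 2×2 Karatsuba triple with itself to get a (2,2)×(2,2) two-dimensional linear convolution and checking it against a direct computation:

```python
import numpy as np

# 1D Karatsuba triple for 2x2 linear convolution.
A1 = B1 = np.array([[1, 0], [1, 1], [0, 1]])
C1 = np.array([[1, 0, 0], [-1, 1, -1], [0, 0, 1]])

# Kronecker products give a (2,2)x(2,2) two-dimensional linear convolution.
A, B, C = np.kron(A1, A1), np.kron(B1, B1), np.kron(C1, C1)

def conv2d_bilinear(a, b):
    """a, b are 2x2 arrays (row-major vectorization); the result is 3x3."""
    prods = (A @ a.flatten()) * (B @ b.flatten())
    return (C @ prods).reshape(3, 3)

def conv2d_direct(a, b):
    c = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1), dtype=int)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return c

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
assert np.array_equal(conv2d_bilinear(a, b), conv2d_direct(a, b))
```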
Overlap-add
Furthermore, we can use this to construct an (n⋅k)×(m⋅l) linear convolution by recombining the outputs, which is the overlap-add technique. For example, splitting two polynomials into blocks,
\begin{aligned} P(X) &= P_0(X) + P_1(X) ⋅ X^2 \\ Q(X) &= Q_0(X) + Q_1(X) ⋅ X^2 \end{aligned}
the product becomes
P(X)⋅Q(X) = P_0(X)⋅Q_0(X) + \p{P_0(X)⋅Q_1(X) + P_1(X)⋅Q_0(X)}⋅X^2 + P_1(X)⋅Q_1(X)⋅X^4
where the block products are small linear convolutions and the coefficients of overlapping blocks are added together.
Agarwal-Cooley
Modular reduction
Given a polynomial P(X) = \sum_i p_i⋅X^i and a monic modulus M(X) = \sum_i m_i⋅X^i we want to compute
Q(X) = P(X) \Mod M(X)
\begin{bmatrix} q_0 \\ q_1 \\ ⋮ \\ q_{m-1} \end{bmatrix} = \left[ \begin{array}{c|cccc} \I_m & r^{(0)} & r^{(1)} & ⋯ & r^{(n-m-1)} \end{array} \right] ⋅ \begin{bmatrix} p_0 \\ p_1 \\ ⋮ \\ p_{n-1} \end{bmatrix}
The reduction matrix starts with an m×m identity block, since the monomials X^i with i < m are left untouched, and is extended with however many columns are necessary: column r^{(j)} holds the m coefficients of X^{m+j} \Mod{M(X)}. For simple moduli such as X^m \pm 1 these extra columns are (negated) unit vectors and the block has a Toeplitz structure.
In particular this can be used to turn linear convolution algorithms into (nega)cyclic by reducing modulo X^n \pm 1.
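A small numpy sketch of this for the moduli X^n - 1 and X^n + 1: the reduction matrix is applied to the length 2⋅n - 1 output of a linear convolution to obtain the cyclic and negacyclic results:

```python
import numpy as np

def reduction_matrix(n_in, n, sign):
    """Matrix reducing a polynomial of degree < n_in modulo X^n - sign."""
    r = np.zeros((n, n_in), dtype=int)
    for j in range(n_in):
        # X^j mod (X^n - sign) = sign^(j // n) * X^(j mod n)
        r[j % n, j] = sign ** (j // n)
    return r

def linear_convolution(a, b):
    c = np.zeros(len(a) + len(b) - 1, dtype=int)
    for i, ai in enumerate(a):
        c[i:i + len(b)] += ai * np.asarray(b)
    return c

lin = linear_convolution([1, 2, 3], [4, 5, 6])        # length 2n - 1 = 5
cyclic = reduction_matrix(len(lin), 3, +1) @ lin      # reduce mod X^3 - 1
negacyclic = reduction_matrix(len(lin), 3, -1) @ lin  # reduce mod X^3 + 1
assert list(cyclic) == [31, 31, 28] and list(negacyclic) == [-23, -5, 28]
```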
Winograd Convolution
A generalization of Toom-Cook can be made by realizing that evaluating a polynomial at a point, P(x_i), is the same as reducing it modulo X - x_i. We are then solving the system of equations
C(X) = A(X) ⋅ B(X) \mod{X- x_i}
and then reconstructing C(X) from the residues mod X-x_i. This is a special case of the Chinese Remainder Theorem (\CRT). Generalizing we pick a set of coprime moduli polynomials \{M_i(X)\}, their product M(X) = \prod_i M_i(X) and a CRT basis \{E_i(X)\} (see appendix), compute
C_i(X) = \p{A(X) ⋅ B(X) \Mod{M_i(X)}}
C(X) = \sum_i C_i(X) ⋅ E_i(X)
If M(X) = X^n - 1 we have cyclic convolution, and with M(X) = X^n + 1 negacyclic.
R_i(X) is only determined up to a multiple of M_i(X); changing it by such a multiple changes E_i(X) by a multiple of M(X), which reflects the fact that the \CRT reconstruction is only unique \Mod{M(X)}.
Question. Another generalization of point evaluations is to include derivatives at those points. How does this generalization combine with CRT?
Question. Can all bilinear algorithms for convolutions be interpreted as Winograd convolutions? Does this extend to non-convolution bilinear algorithms?
Recursion
This method allows for recursion. For example, to compute a 2⋅n sized cyclic convolution we can pick the moduli M_0(X) = X^n - 1 and M_1(X) = X^n + 1, so that M(X) = \p{X^n - 1}⋅\p{X^n + 1} = X^{2⋅n} - 1, and get
\begin{aligned} C_0(X) &= A(X) ⋅ B(X) \mod{X^n - 1} \\ C_1(X) &= A(X) ⋅ B(X) \mod{X^n + 1} \end{aligned}
\begin{split}\begin{aligned} E_0(X) &= R_0(X) ⋅\p{X^n + 1} \\ E_1(X) &= R_1(X) ⋅\p{X^n - 1} \end{aligned}\end{split}\hspace{3em} \begin{split}\begin{aligned} R_0(X) &= \p{X^n + 1}^{-1} \mod{X^n - 1} \\ R_1(X) &= \p{X^n - 1}^{-1} \mod{X^n + 1} \end{aligned}\end{split}
X^n + 1 \Mod\p{X^n - 1} = 2
X^n - 1 \Mod\p{X^n + 1} = -2
So R_0(X) = \frac 12 and R_1(X) = -\frac 12.
\begin{aligned} E_0(X) &= \frac 12 ⋅\p{X^n + 1} &\hspace{2em} E_1(X) &= -\frac 12 ⋅\p{X^n - 1} \end{aligned}
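The reconstruction is then C(X) = C_0(X)⋅E_0(X) + C_1(X)⋅E_1(X). A small Python sketch of this split, reducing a length 2⋅n cyclic convolution to one cyclic and one negacyclic half and recombining them with E_0 and E_1 (exact rationals handle the factor \frac 12):

```python
from fractions import Fraction

def cyclic(a, b):
    """Direct cyclic convolution (indices mod n)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

def negacyclic(a, b):
    """Direct negacyclic convolution (wrap-around terms change sign)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j] * (-1 if i + j >= n else 1)
    return c

def cyclic_via_crt(a, b):
    """Length 2n cyclic convolution from an n cyclic and an n negacyclic half."""
    n = len(a) // 2
    # Reduce mod X^n - 1 and X^n + 1: fold the two halves with + and - respectively.
    a0 = [x + y for x, y in zip(a[:n], a[n:])]
    b0 = [x + y for x, y in zip(b[:n], b[n:])]
    a1 = [x - y for x, y in zip(a[:n], a[n:])]
    b1 = [x - y for x, y in zip(b[:n], b[n:])]
    c0 = cyclic(a0, b0)        # C_0 = A.B mod X^n - 1
    c1 = negacyclic(a1, b1)    # C_1 = A.B mod X^n + 1
    # Recombine with E_0 = (X^n + 1)/2 and E_1 = -(X^n - 1)/2.
    half = Fraction(1, 2)
    low = [(x + y) * half for x, y in zip(c0, c1)]
    high = [(x - y) * half for x, y in zip(c0, c1)]
    return low + high

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert cyclic_via_crt(a, b) == cyclic(a, b)
```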
Re-introducing the point at infinity
We can add a modulus M_i(X) = X - ∞ corresponding to the point at infinity. Concretely this means working with the reversed polynomials and reducing modulo X, i.e. evaluating at the point x' = 0:
X^{n_c - 1} ⋅ C(X^{-1}) = \p{X^{n_a - 1} ⋅ A(X^{-1})} ⋅ \p{X^{n_b - 1} ⋅ B(X^{-1})} \Mod{X}
Larger convolutions
Nested Convolutions
Agarwal-Cooley
Lower bounds
Linear convolution cannot be computed with fewer than \p{n_a + n_b - 1}⋅\M (see Bla10 thm 5.8.3).
The polynomial product A(X)⋅B(X) \Mod{M(X)} cannot be computed with fewer than \p{2⋅n - t}⋅\M where t is the number of prime factors of M in \F[X] (see Bla10 thm 5.8.5).
Specifically, cyclic convolution cannot be computed with fewer than \p{2⋅n - t}⋅\M where t is the number of prime factors of X^n - 1 in \F[X].
References
- JS20 Ju, Solomonik (2020). Derivation and Analysis of Fast Bilinear Algorithms for Convolution. Github
- Bod07 Bodrato (2007). Towards Optimal Toom-Cook Multiplication for Univariate and Multivariate Polynomials in Characteristic 2 and 0.
- Bla10 Blahut (2010). Fast Algorithms for Signal Processing.
- TAL97 Tolimieri, An, Lu (1997). Algorithms for Discrete Fourier Transform and Convolution.
Appendix: Solving the Chinese Remainder Theorem
Given a set of coprime moduli m_i and values x_i such that x_i = x \Mod{m_i} for some x, we want to reconstruct x \Mod{m} where m = \prod_i m_i. To do this we will construct a basis e_i similar to Lagrange basis such that
x = \sum_i x_i ⋅ e_i \quad \Mod{m}
where e_i \Mod{m_j} equals 0 when i ≠ j and 1 when i = j. The former implies e_i = r_i ⋅ \prod_{j≠i} m_j for some r_i. The latter implies
r_i = \Bigl(\prod_{j≠i} m_j\Bigr)^{-1} \quad\Mod{m_i}
There are multiple solutions r_i' = r_i + k⋅m_i and this reflects multiple solutions x'= x + k⋅m. By convention the smallest is used. We can write the CRT basis concisely using [-]_m for modular reduction as
e_i = \frac{m}{m_i}⋅\left[\p{\frac{m}{m_i}}^{-1}\right]_{m_i}
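A small Python sketch of this construction for coprime integer moduli, using Python's built-in modular inverse:

```python
from math import prod

def crt_basis(moduli):
    """Basis e_i with e_i = 1 mod m_i and e_i = 0 mod m_j for j != i."""
    m = prod(moduli)
    return [(m // mi) * pow(m // mi, -1, mi) for mi in moduli]

moduli = [3, 5, 7]
e = crt_basis(moduli)          # [70, 21, 15]
x = 52
residues = [x % mi for mi in moduli]
assert sum(r * ei for r, ei in zip(residues, e)) % prod(moduli) == x
```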
If the moduli are not coprime we can correct with m = \operatorname{lcm}\p{m_i}. (Question. Does the above solution using \frac{m}{m_i} still hold?) Note that in this case solutions may not exist. For example if m_0 and m_1 have a common factor d and x_0 ≠ x_1 \Mod d there is no solution. (Question. What does the above solution produce in this case?)
The CRT establishes a ring isomorphism
\F/m ≅ \F/m_0 × \F/m_1 × ⋯
with the maps x \mapsto (\left[x\right]_{m_0}, \left[x\right]_{m_1},… ) and (x_0, x_1, …) \mapsto x_0⋅e_0 + x_1⋅e_1+⋯.
Question. How does this generalize to include derivatives?
Appendix: applications to proof systems
Note the similarity between bilinear algorithms and an R1CS predicate φ(w) ⇔ \p{A ⋅ w} ∘ \p{B ⋅ w} = C⋅w. If C is invertible this can be restated as a bilinear algorithm β(a,b) = C^{-1}⋅\p{\p{A ⋅ a} ∘ \p{B ⋅ b} } and we have φ(w) ⇔ β(w,w) = w. Applying the matrix exchange theorem, a given R1CS (A, B, C) is equivalent to \p{A, \p{\E ⋅ C^{-1}}^{\T}, \p{\p{B⋅\E}^{\T}}^{-1}}.
Since linear convolution is equivalent to polynomial multiplication, a polynomial commitment scheme allows for an efficient proof of convolution. To prove c = a*b, prove C(τ) = A(τ)⋅B(τ) at a random point τ ∈ \F. To prove a size n (nega)cyclic convolution, test that the following holds for n-term polynomials A, B, C, Q:
C(τ) + Q(τ)⋅\p{τ^n \pm 1}= A(τ)⋅B(τ)
If we have a proof for the reduction from FFT to convolution, this could be used to prove FFTs.
Appendix: bilinear maps
Given vector spaces U, V, W over a base field \F, a bilinear map is a function f: U × V → W such that
\begin{aligned} f(u_1 + u_2, v) &= f(u_1, v) + f(u_2, v) & f(λ⋅u, v) &= λ⋅f(u,v)\\ f(u, v_1 + v_2) &= f(u, v_1) + f(u, v_2) & f(u, λ⋅v) &= λ⋅f(u,v) \\ \end{aligned}
Bilinear algorithms are bilinear maps, so convolutions and polynomial modular multiplication are bilinear maps. Other examples of bilinear maps are matrix multiplications and the cross product. When W = \F, it is also called a bilinear form with further examples such as inner products.
Given bases for U, V and W, a bilinear map can be written as a tensor f_{ijk}:
f_k(u, v) = \sum_{i,j} f_{ijk} ⋅ u_i ⋅ v_j
From this a bilinear algorithm follows. For each non-zero element f_{ijk}, create a row in A and a row in B that are all zeros except for a single 1 in column i and column j respectively, and add a column to C with the value f_{ijk} in row k. This has complexity m⋅\Mc + n⋅\M where n is the number of nonzero entries and m the number of nontrivial entries (those not equal to 0, 1 or -1).
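A small Python sketch of this construction, with the tensor stored as a dictionary mapping (i, j, k) to f_{ijk} (a hypothetical representation), checked on the tensor of 2×2 linear convolution; note it yields rank 4, one more than Karatsuba:

```python
def tensor_to_bilinear(f, n_u, n_v, n_w):
    """Naive bilinear algorithm: one rank-one term per nonzero tensor entry."""
    A, B, C_cols = [], [], []
    for (i, j, k), value in f.items():
        A.append([1 if col == i else 0 for col in range(n_u)])
        B.append([1 if col == j else 0 for col in range(n_v)])
        C_cols.append([value if row == k else 0 for row in range(n_w)])
    C = [list(row) for row in zip(*C_cols)]   # transpose the collected columns
    return A, B, C

def bilinear(A, B, C, a, b):
    mv = lambda m, v: [sum(x * y for x, y in zip(row, v)) for row in m]
    return mv(C, [x * y for x, y in zip(mv(A, a), mv(B, b))])

# Tensor of 2x2 linear convolution: c_k = sum over i + j = k of a_i * b_j.
conv_tensor = {(i, j, i + j): 1 for i in range(2) for j in range(2)}
A, B, C = tensor_to_bilinear(conv_tensor, 2, 2, 3)
assert bilinear(A, B, C, [3, 5], [7, 11]) == [21, 68, 55]   # rank 4, not 3
```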
Appendix: Polynomial composition
Given two polynomials P(X) = \sum_i p_i ⋅ X^i and Q(X) = \sum_i q_i ⋅ X^i, define the composition R(X) = P(Q(X)) where the coefficients of R are given by:
\begin{aligned} R(X) &= P(Q(X)) \\ &= \sum_i p_i ⋅ \p{ \sum_j q_j ⋅ X^j }^i \end{aligned}
Multinomial theorem.
\begin{aligned} Q(X)^i &= \p{ \sum_j q_j ⋅ X^j }^i = \sum_{k_0 + k_1 + ⋯ = i} \frac{i!}{k_0!\,k_1!\,k_2!⋯} ⋅ q_0^{k_0} ⋅ q_1^{k_1} ⋯ X^{k_1 + 2⋅k_2 + 3⋅k_3 + ⋯} \end{aligned}
To do
https://arxiv.org/pdf/2201.10369.pdf
https://cr.yp.to/lineartime/multapps-20080515.pdf
https://eprint.iacr.org/2016/504.pdf page 5 describes how to turn convolution theorem into negacyclic.
Fast modular composition. p. 338 in Modern Computer Algebra.
https://relate.cs.illinois.edu/course/cs598evs-f20/ https://relate.cs.illinois.edu/course/cs598evs-f20/f/lectures/03-lecture.pdf
https://arxiv.org/abs/1707.04618
https://www.nature.com/articles/s41586-022-05172-4
https://doc.sagemath.org/html/en/reference/polynomial_rings/sage/rings/polynomial/convolution.html https://jucaleb4.github.io/files/conv_ba.pdf
https://jucaleb4.github.io/files/poster1.pdf