Groth16
$$ \gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\F{\mathbb{F}} \gdef\G{\mathbb{G}} \gdef\g{\mathrm{G}} \gdef\h{\mathtt{H}} \gdef\e{\mathrm{e}} \gdef\Z{\mathrm{Z}} $$
Notes on the Groth16 proof system. An evergreen from the era of R1CS with extensive and widely used tooling such as bellman, zokrates and circom. It is the proof system currently used by ZCash, Hermez, DarkForest, TornadoCash, Semaphore, Loopring, and others.
Pick an elliptic curve pairing. Let $\G_1$, $\G_2$, $\G_3$ be groups with the same scalar field $\F$, generators $\g_1$, $\g_2$, $\g_3$ and a pairing $\e: \G_1 × \G_2 → \G_3$. For a multiset of $\mathcal S$ of elements from $\F$ define the zero polynomial $\Z_{\mathcal S} ∈ \F[X]$ as
$$ \Z_{\mathcal S}(X) = \prod_{s ∈ \mathcal S} \p{X - s} $$
Circuit
To proof your program in Groth16 you first need to convert it to an R1CS constraint system. A classic introduction to this transformation is Vitalik's writup. An early formal definition is in BCGTV13 appendix E.1. There are many proof systems besides Groth16 that use it and for a while it looked to become a standard format, until Starks and Plonks took over. To create R1CSs we now have high level programming languages and compilers like Circom.
R1CS
A rank-1 constraint system (R1CS) with $n$ variables and $m$ constraints and $p$ public inputs has a witness vector $\vec w ∈ \F^n$. By convention, the first $p$ elements of $\vec w$ are the public input and the first public input $w_0$ is fixed to $1$. The $m$ constraints in R1CS are a product equation between three inner products:
$$ (\vec w ⋅ \vec a_i) ⋅ (\vec w ⋅ \vec b_i) = \vec w ⋅ \vec c_i $$
where vectors $(\vec a_i, \vec b_i, \vec c_i) ∈ \F^{3⋅n}$ specify the constraint. The constraint vectors are usually very sparse, typically only nonzero for a single or few values. These constraint vectors can be aggregated in $n × m$ matrices so that the whole constraint system can be concisely written using an element-wise product
$$ \p{\vec w ⋅ A} ∘ \p{\vec w ⋅ B} = \vec w ⋅ C $$
Polynomials (QAP)
We fix a basis $\vec x ∈ \F^m$ and for each constraint define polynomials such that $A_i(x_j) = A_{ij}$, similarly for $B$, $C$. Then define $A(X) = \sum_{i ∈ [0,n)} w_i ⋅ A_i(X)$, similarly for $B, C$. Now the constraints are equivalent to $A(x_i) ⋅ B(x_i) = C(x_i)$. This form is called a quadratic arithmetic program (QAP) and can be verified by the existence of a low-degree $H(X)$ that satisfies
$$ A(X) ⋅ B(X) - C(X) = H(X) ⋅ \Z_{\vec x}(X) $$
To construct $H(X)$ divide the left-hand-side by $\Z_{\vec x}(X)$, which will only be 'exact' (precisely, a low-degree polynomial) if the constraints hold. The polynomial equation itself is efficiently verified by evaluating at a random value $τ$.
Note. In BKSV20 and GR22 it is shown that for Groth16 there exists potential malleability on the public input unless certain linear independence requirements are met. An easy way to meet these requirements is to add additional constraints $w_i ⋅ 0 = 0$ for all public inputs. This is done implicitly by the Arkworks and SnarkJS implementations.
Trusted setup
Groth16 requires some setup to create a shared set of values known as the common reference string. This setup can be split in two parts, one generic and one specific to the circuit.
Powers of Tau
Pick an upper bound $m$ for the number of constraints. Generate random values $τ, α, β$ and compute:
$$ \p{τ^0, τ^1, τ^2, …, τ^{2 ⋅ m - 2}} ⋅ \g_1 $$
$$ \p{τ^0, τ^1, τ^2, …, τ^{m - 1} } ⋅ \g_2 $$
$$ α ⋅ \p{ τ^0, τ^1, τ^2, …, τ^{m - 1} } ⋅ \g_1 $$
$$ β ⋅ \p{τ^0, τ^1, τ^2, …, τ^{m - 1}} ⋅ \g_1 $$
$$ β ⋅ \g_2 $$
Phase 2
Given circuit polynomials $A_i, B_i, C_i$, generate random values $γ, δ$ and define the polynomials $L_i$ by:
$$ L_i(X) = β ⋅ A_i(X) + α ⋅ B_i(X) + C_i(X) $$
We can not compute the $L_i(X)$ directly since we don't know $α$ and $β$, but we can still construct $L_i(τ) ⋅ \g_1$ using linear combinations of the values from the previous step. This way, we compute the proving key:
$$ (α, β, δ) ⋅ \g_1 $$
$$ (τ^0, τ^1, τ^2, τ^3, …, τ^{m-1}) ⋅ \g_1 $$
$$ δ^{-1} ⋅ (L_p(τ), L_{p+1}(τ), …, L_{m-1}(τ)) ⋅ \g_1, $$
$$ δ^{-1} ⋅ (τ^0, τ^1, τ^2, τ^3, …, τ^{m-2}) ⋅ \Z_{\vec x}(τ) ⋅ \g_1 $$
$$ (β, δ) ⋅ \g_2 $$
$$ (τ^0, τ^1, τ^2, τ^3, …, τ^{m-1}) ⋅ \g_2 $$
and the verification key:
$$ α ⋅ \g_1 $$
$$ γ^{-1} ⋅ (L_0(τ), L_1(τ), L_2(τ), …, L_{p-1}(τ)) ⋅ \g_1 $$
$$ (β, γ, δ) ⋅ \g_2 $$
Instead of providing $α ⋅ \g_1$ and $β ⋅ \g_2$ we could also provide $α ⋅ β ⋅ \g_3$ in the verifying key, but $\G_3$ elements tend to be big and weird so we avoid them.
The Phase 2 ceremony is critical for soundness. If $δ$ is known the prover can construct any $δ^{-1} ⋅ P(τ) ⋅ \g_1$ of degree $2 ⋅ n - 1$ where otherwise it is constraint to the form $δ^{-1} ⋅ H(τ) ⋅ \Z_{\vec x}(τ) ⋅ \g_1$. The forced factorization into $H$ and $\Z_{\vec x}$ is critical for soundness.
Multi-party computation
It is critical for the security of the system that the values $τ, α, β, δ, γ$ are not known to anyone. They must be disposed of securely, hence they are known as toxic waste. The ZCash team has developed multi-party computation protocols to share the burden of the setup over many participants, such that each produces only a fragment of the waste and you need all fragments to break the system. That is, the system is secure as long as at least one participant is secure. The protocol is described in BGM17 with historical context in a blog post. A good introduction to how trusted setups work is Vitalik's writeup.
I'll give a sketch of how such protocols work. Imagine we need values $S_i = α ⋅ β^i ⋅ A$ with secret $α$ and $β$. Start with some value initial values $α = β = 1$ and $S_i = A$ and send these to the first participant. The participant generates random $a$, $b$ and computes
$$ S_i' = a ⋅ b^i ⋅ S_i = α' ⋅ β'^i⋅ A $$
with updated secrets $α' = a ⋅ α$ and $β' = b ⋅ β$. It passes $S_i'$ on to the next participant and the process repeats. To make sure participants do this correctly they also publish $a ⋅ \g_2$, $b ⋅ \g_1$ and $b^i ⋅ \g_2$ and we verify
$$ \begin{aligned} \e(S_0, a ⋅ \g_2) &= \e(S_0', \g_2) \\ \e(S_{i-1}', b^i ⋅ \g_2) &= \e(S_i', \g_2) \\ \e(b ⋅ \g_1, b^{i-1} ⋅ \g_2) &= \e(\g_1, b^i ⋅ \g_2) \\ \end{aligned} $$
The actual protocol described in BGM17 includes additional safeguards such as proof-of-knowledge of $a$, $b$ and a random beacon to guarantee the result is uniformly distributed.
The setup for Groth16 has two phases where the second phase is specific for the circuit you want to proof. This is unfortunate because it means we need to recruit participants and do a setup ceremony for each circuit. Later proof systems have simpler setups that only require a secret value $τ$ and values $(τ^0, τ^1, τ^2, τ^3, …, τ^{n-1}) ⋅ (\g_1, \g_2)$. These can then be used for many proof systems and circuits. Large, well-constructed sets of values are available.
Proving
Given a witness $\vec w = (1, w_1, w_2, …, w_m)$ that satisfies the constraints, we first compute the degree $m$ polynomials $A, B, C, H$ as above. Then compute the private witness polynomials:
$$ \begin{aligned} L(X) &= \sum_{i ∊ [p,m)} w_i ⋅ L_i(X) \end{aligned} $$
Generate two random values $r, s$ and compute
$$ \begin{aligned} A &= (α + r ⋅ δ + A(τ)) ⋅ \g_1 \\ B &= (β + s ⋅ δ + B(τ)) ⋅ \g_2 \end{aligned} $$
$$ C = \p{\begin{aligned} & δ^{-1} ⋅ \p{L(τ) + H(τ) ⋅ \Z_{\vec x}(τ)} \\ + & s ⋅ \p{α + r ⋅ δ + A(τ)} \\ + & r ⋅ \p{β + s ⋅ δ + B(τ)} \\ - & r ⋅ s ⋅ δ \end{aligned} } ⋅ \g_1 $$
The proof is the triplet $(A, B, C)$. To compute $A(τ)⋅\g_1$ and, $B(τ)⋅\g_1$ efficiently we can precompute a basis $A_i(X), B_i(X)$ on $\vec x$ from the proving key and use a multi-scalar-multiplication (MSM)⋅ To compute $δ^{-1}⋅H(τ)⋅\Z_{\vec x}(τ)⋅\g_1$ we use Jordi's trick: Observe that $H(X)⋅\Z_{\vec x}(X)$ is a degree $2⋅n$ polynomial that equals zero on the domain $\vec x$. We pick a disjunct domain $\vec y$ and prepare a $2⋅n$ size basis over $\vec x \cup \vec y$. Since evaluations on $\vec x$ are zero we only need to MSM over the $n$ $\vec y$ values. To get evaluations on $\vec y$ we can interpolate $A(X), B(X), C(X)$ to $\vec y$ and element-wise compute $A(X)⋅B(X)-C(X)$. If $\vec x$ and $\vec y$ is chosen to be the odd/even $2⋅n$ roots of unity, the interpolation can be done using $6⋅\mathsf{NTT}_n$.
Q. Can we get rid of the $\mathsf{NTT}$'s on $C$ by MSMing it directly and subtracting $C(τ)⋅Z_{\vec x}(τ)$?. This won't work for $A⋅B$ since we can't multiply at $τ$. We also no longer have $A⋅B$ evaluating to zero on $\vec x$, so we end up with $4⋅\mathsf{NTT}_{n} + \mathsf{NTT}_{2⋅n}$ which cost about the same.
Note that the proof is malleable. Given a proof $(A, B, C)$ and arbitrary values $t, u$ the triplet $(t^{-1} ⋅ A, t ⋅ B + t ⋅ u ⋅ δ ⋅ \g_2, C + u ⋅ A)$ is also a valid proof and indistinguishable from a regularly produced one, see GM17 or BKSV20.
Verifying
Given proof $(A, B, C)$ and public inputs $w = (1, w_1, …, w_l)$. Aggregate the public inputs
$$ \begin{aligned} {\overline L} &= \sum_{i ∊ [0, p)} w_i ⋅ \p{γ^{-1} ⋅ L_i(τ) ⋅ \g_1} \\ \end{aligned} $$
then check
$$ \e(A, B) = \e(α ⋅ \g_1, β ⋅ \g_2) + \e( {\overline L}, γ ⋅ \g_2) + \e(C, δ ⋅ \g_2) $$
Intuitively, this verifies $A(τ) ⋅ B(τ) = C(τ) + H(τ) ⋅ \Z_{\vec x}(τ)$ with some extra terms mixed in. Some of these extra terms guarantee that the same value $\vec w$ is used consistently, others are just there to cancel out spurious terms. Groth16 is remarkable in how everything happens simultaneously in a single complex expression. It is hard to separate in clear parts. Later systems based on polynomial commitments have a much more modular structure, but also have larger proofs and more steps in the verification.
Completeness follows from an expansion of the pairing check. Soundness follows from the observation that the values $τ, α, β, γ, δ$ are unknown, therefore prover provided values must be linear combinations of the setup values. We can consider the pairing check a multivariate polynomial equation in these secret values, invoke Schwartz-Zippel and equate the coefficients. Straightforward but tedious solving for each coefficient shows the existence of a satisfying witness $w$. Zero knowledge follows from $A$ and $B$ being uniformly random through $r$ and $s$ and $C$ being fully determined through the claim.