Gaussian Process inference

\gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\ps#1{\delim\{{#1}\}} \gdef\N{\mathcal N} \gdef\vec#1{\bm #1} \gdef\mat#1{\mathrm #1} \gdef\k{\mathrm k} \gdef\u{\mathrm u} \gdef\c{\mathrm c} \gdef\T{\mathrm T}

Given \vec μ and \mat Σ such that \vec y is multivariate normal distributed.

P\p{\vec y} ∼ \N\p{\vec μ, \mat Σ}

Given a draw of standard normal individually independent values \vec n, we can transform this to a draw from \N\p{\vec μ, \mat Σ}:

\vec y = \vec μ + \mat L \cdot \vec n

Where \mat L is the Cholesky factor such that \mat L ⋅ \mat L^{\T} = \mat Σ.

Computing the Cholesky decompostion

https://en.wikipedia.org/wiki/Cholesky_decomposition#The_Cholesky_algorithm

https://en.wikipedia.org/wiki/Cholesky_decomposition#The_Cholesky%E2%80%93Banachiewicz_and_Cholesky%E2%80%93Crout_algorithms

do i = 1, size(A,1)
    L(i,i) = sqrt(A(i,i) - dot_product(L(i,1:i-1), L(i,1:i-1)))
    L(i+1:,i) = (A(i+1:,i) - matmul(conjg(L(i,1:i-1)), L(i+1:,1:i-1))) / L(i,i)
end do

Inversion using Cholesky

Inference

Supposed we know a subset of values \vec y_{\mathrm k} and we want to use this to infer about the remaining unknown values \vec y_{\mathrm u}.

P\p{\vec y_{\u} \vert \vec y_{\k}} ∼ \N\p{\vec μ_{\c}, \mat Σ_{\c}}

This is again normally distributed with

\begin{aligned} \vec μ_{\c} &= \vec μ_{\u} + Σ_{\k\u}⋅Σ_{\k\k}^{-1}⋅ \p{\vec y_{\k} - \vec μ_{\k}} \\ \mat Σ_{\c} &= \mat Σ_{\u\u} - Σ_{\u\k}⋅Σ_{\k\k}^{-1}⋅ \mat Σ_{\k\u} \end{aligned}

Where {}_\k represents known indices and {}_\u the unknown indices. To avoid inverting we want to re-use \mat L.

Note that the second matrix is also kind of an update:

\mat Σ_{\c} = \mat Σ_{\u\u} - Σ_{\u\k} ⋅ \p{\mat L_{\k\k} ⋅ \mat L_{\k\k}^\T}^{-1} ⋅ \mat Σ_{\u\k}^\T

Efficient computation

Suppose we know \mat L. Can we use this to quickly compute \mat L_{\k\k}? This amounts to computing the Cholesky factor of a subset of the matrix. This can be done using repeated single-row deletion update:

https://en.wikipedia.org/wiki/Cholesky_decomposition#Adding_and_removing_rows_and_columns

Can we also compute \mat L_{\c} efficiently?

Remco Bloemen
Math & Engineering
https://2π.com