Bayesian Networks

graph BT;
    BN[Bayesian Networks];
    DBN[Dynamic Bayesian Networks]-->BN;
    HMM[Hidden Markov Model]-->DBN;
    KF[Kalman Filter]-->BN;


A Bayesian network is a directed acyclic graph $G = (V, E)$ where each node $x ∈ V$ has a conditional probability distribution associated such that the joint probability on $V$ is

$$ \Pr{V} = \prod_{x ∈ V} \Prc{x}{π_x} $$

where $π_x$ denotes the parents of node $x$.

A dynamic Bayesian Network is a pair $(B_0, B_t)$ where $B_0$ is a Bayesian network representing the initial distribution (i.e. at time $0$) and $B_t$ is the transition network on the same nodes $V$ but with a different set of edges allowing cycles and self-loops. Define with $τ_x$ the ancestors in the transition network.

Now replicate $V$ for each time $t$ so we have $V_0, V_1, …$ all containing the same nodes. Similarly we have $x_0, x_1$.

$$ \Pr{V_0} = \prod_{x_0 ∈ V_0} \Prc{x_0}{π_{x_0}} $$

$$ \Pr{V_t}{V_{t-1}} = \prod_{x_t ∈ V_t} \Prc{x_t}{π_{x_t}, τ_{x_{t-1}}} $$


$$ \Pr{V_{0:T}} = \Pr{V_0} ⋅ \Pr{V_{1:T}} = \prod_{x ∈ V} \Prc{x}{π_x} ⋅ \prod_{t ∈ 1:T} \prod_{x_t ∈ V_t} \Prc{x_t}{π_{x_t}, τ_{x_{t-1}}} $$

Alternative. A special case is when $x_t$ only depends on $x_{t-1}$ and other nodes at time $t$. So history propagates directly. In this case $E_t$ only contains self-loops.

this leads to

$$ \Prc{V_t}{V_{t-1}} = \prod_{x ∈ V} \prod_{π_x ∈ V} \Prc{x_t}{π_{x_t}} $$

Note. This model is Markovian in that temporal relations are only between $t$ and $t+1$. If different lags are required we can add these as state variables. (Or modify the model, but state seems more meaningful).

Partial observation. Only a subset of $V$ is observable.

Note. Hidden Markov Models are a special case of dynamic bayesian models.

Note. Kalman Filters are a special case of dynamic bayesian models.

Note. Dynamic Bayesian Networks are a special case of Bayesian Networks.

Note. Time series models AR, MA, ARMA, ARIMA, and ARIFMA(?) are special cases of dynamic bayesian networks. FIR and IIR filters have ARIMA structure.

Learning structure

It is possible to learn the structure of the graph from the data.

For learning the base structure we can use all the available data for each variable, ignoring the temporal information. This is equivalent to learning a BN. For learning the transition network we consider the temporal information, in particular the data for all variables in two consecutive time slices, $X_t$ and $X_{t+1}$. Considering the base structure, we can then learn the dependencies between the variables at time $t$ and $t+1$. [source]

Learning distributions parameters: CMA-ES on cost function

In contracts with the 'Expectation Maximization' method, the model is not optimized for distribution fit, but for a specified cost function. This makes the learning less sensitive to modelling errors.

One cost function is maximum likelihood.

Special case: Kalman filters

$$ \vec x_{t+1} = A ⋅ \vec x_t + \vec c_t + \vec w_t $$

$$ \vec z_{t} = H ⋅ \vec x_t + \vec v_t $$

With noise vectors $\vec w_t ∼ \mathcal{N}(0, Q_t)$ and $\vec v_t ∼ \mathcal{N}(0, R_t)$.

$$ \Prc{\vec x_{t+1}}{\vec x_t} \sim \mathcal{N}(A ⋅ \vec x_t, Q) $$

$$ \Prc{\vec z_t}{\vec x_t} \sim \mathcal{N}(H ⋅ \vec x_t, R) $$

Special case: ARIMA

$$ Y_t = \mu_t + x_t \beta + S_t + e_t $$

$$ \mu_{t+1} = \mu_t + v_t $$


To do. How does this relate to

Remco Bloemen
Math & Engineering