Time Series
Dynamic Bayesian Networks (DBNs) are a general class of state-space time series models that include, as special cases:
- Hidden Markov Models
- Kalman Filters
- ARMA
Definition
A Bayesian network is a directed acyclic graph G = (V, E) where each node x ∈ V has an associated conditional probability distribution, such that the joint probability on V is
\Pr{V} = \prod_{x ∈ V} \Prc{x}{π_x}
where π_x denotes the parents of node x.
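As a quick illustration with three hypothetical nodes forming a chain a → b → c (so π_a = ∅, π_b = {a}, π_c = {b}), the factorization reads
\Pr{V} = \Pr{a} ⋅ \Prc{b}{a} ⋅ \Prc{c}{b}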
A dynamic Bayesian network is a pair (B_0, B_t), where B_0 is a Bayesian network representing the initial distribution (i.e. at time 0) and B_t is the transition network over the same nodes V but with a different edge set E_t, which may contain cycles and self-loops (read as edges from a node at time t-1 to a node at time t). Denote by τ_x the parents of node x in the transition network.
Now replicate V for each time t, so that we have V_0, V_1, …, all containing the same nodes; similarly, each node x is replicated as x_0, x_1, …
\Pr{V_0} = \prod_{x_0 ∈ V_0} \Prc{x_0}{π_{x_0}}
\Prc{V_t}{V_{t-1}} = \prod_{x_t ∈ V_t} \Prc{x_t}{π_{x_t}, τ_{x_{t-1}}}
Unrolling over time:
\Pr{V_{0:T}} = \Pr{V_0} ⋅ \prod_{t ∈ 1:T} \Prc{V_t}{V_{t-1}} = \prod_{x_0 ∈ V_0} \Prc{x_0}{π_{x_0}} ⋅ \prod_{t ∈ 1:T} \prod_{x_t ∈ V_t} \Prc{x_t}{π_{x_t}, τ_{x_{t-1}}}
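To make the unrolled factorization concrete, here is a minimal Python sketch (an assumed example, not from the source): a DBN with one hidden binary node x and one observed binary node z, where π_z = {x} within each slice and the transition network has only the self-loop on x, i.e. τ_x = {x}. The CPT values are arbitrary.

```python
# Sketch: joint probability of a full assignment in a tiny unrolled DBN
# (hidden node x with self-loop transition, observed node z with parent x).
import numpy as np

p_x0   = np.array([0.6, 0.4])                 # Pr(x_0)
p_x_tr = np.array([[0.9, 0.1], [0.2, 0.8]])   # Pr(x_t | x_{t-1})
p_z_x  = np.array([[0.7, 0.3], [0.1, 0.9]])   # Pr(z_t | x_t)

def log_joint(xs, zs):
    """log Pr(V_{0:T}) for a full assignment xs (hidden) and zs (observed)."""
    logp = np.log(p_x0[xs[0]]) + np.log(p_z_x[xs[0], zs[0]])
    for t in range(1, len(xs)):
        logp += np.log(p_x_tr[xs[t - 1], xs[t]])   # transition factor (tau)
        logp += np.log(p_z_x[xs[t], zs[t]])        # intra-slice factor (pi)
    return logp

print(log_joint(xs=[0, 0, 1], zs=[0, 1, 1]))
```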
Alternative. A special case is when x_t depends only on x_{t-1} and on other nodes at time t, so that history propagates through each node directly; in this case E_t contains only self-loops, i.e. τ_x = {x}.
This leads to
\Prc{V_t}{V_{t-1}} = \prod_{x_t ∈ V_t} \Prc{x_t}{π_{x_t}, x_{t-1}}
Note. This model is Markovian in that temporal relations only run from t to t+1. If longer lags are required, we can add the lagged values as extra state variables: e.g. to let x_{t+1} depend on x_{t-1}, introduce a node y with y_t = x_{t-1} so that x_{t+1} can depend on both x_t and y_t. (Or modify the model, but extra state seems more meaningful.)
Partial observation. Only a subset of V is observable.
Learning structure
It is possible to learn the structure of the graph from the data.
For learning the base structure we can use all the available data for each variable, ignoring the temporal information; this is equivalent to learning an ordinary BN. For learning the transition network we use the temporal information, in particular the data for all variables in two consecutive time slices, V_t and V_{t+1}. Given the base structure, we can then learn the dependencies between the variables at times t and t+1. [source]
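A rough sketch of this two-step procedure in Python, assuming tabular data in pandas and pgmpy's score-based hill-climbing search (the column names A, B, C, the CSV file, and the `white_list` restriction are illustrative, and pgmpy's exact signatures may differ between versions):

```python
# Sketch of the two-step DBN structure learning described above.
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

df = pd.read_csv("series.csv")          # one row per time step, columns A, B, C

# 1. Base structure: ignore temporal order, learn an ordinary BN on the slices.
base = HillClimbSearch(df).estimate(scoring_method=BicScore(df))

# 2. Transition structure: pair each time slice with the next one and learn
#    only edges going from time t to time t+1.
lagged = pd.concat(
    [df.add_suffix("_t").iloc[:-1].reset_index(drop=True),
     df.add_suffix("_t1").iloc[1:].reset_index(drop=True)],
    axis=1,
)
allowed = [(f"{a}_t", f"{b}_t1") for a in df.columns for b in df.columns]
trans = HillClimbSearch(lagged).estimate(
    scoring_method=BicScore(lagged), white_list=allowed
)
print(base.edges(), trans.edges())
```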
Learning distribution parameters: CMA-ES on a cost function
In contrast with the Expectation-Maximization (EM) method, the model is not optimized for distribution fit but for a specified cost function. This makes the learning less sensitive to modelling errors.
One possible cost function is the negative log-likelihood, which recovers maximum-likelihood estimation.
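A minimal sketch of this idea with the `cma` package, assuming a toy scalar transition x_{t+1} = a ⋅ x_t + c + w_t and a one-step-ahead squared prediction error as the cost (none of these choices are prescribed here; plugging in the negative log-likelihood instead would recover maximum likelihood):

```python
# Sketch: tune DBN transition parameters (here a, c of a scalar linear model)
# by minimizing a task-specific cost with CMA-ES rather than running EM.
import numpy as np
import cma  # pip install cma

rng = np.random.default_rng(0)
x = np.empty(200)
x[0] = 1.0
for t in range(199):                        # toy data, true a = 0.8, c = 0.3
    x[t + 1] = 0.8 * x[t] + 0.3 + 0.1 * rng.standard_normal()

def cost(theta):
    a, c = theta
    pred = a * x[:-1] + c                   # one-step-ahead predictions
    return float(np.mean((x[1:] - pred) ** 2))

es = cma.CMAEvolutionStrategy([0.0, 0.0], 0.5)   # initial guess, step size
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [cost(s) for s in candidates])
print("estimated (a, c):", es.result.xbest)
```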
Special case: Kalman filters
\vec x_{t+1} = A ⋅ \vec x_t + \vec c_t + \vec w_t
\vec z_{t} = H ⋅ \vec x_t + \vec v_t
With noise vectors \vec w_t ∼ \mathcal{N}(0, Q_t) and \vec v_t ∼ \mathcal{N}(0, R_t).
\Prc{\vec x_{t+1}}{\vec x_t} \sim \mathcal{N}(A ⋅ \vec x_t + \vec c_t, Q_t)
\Prc{\vec z_t}{\vec x_t} \sim \mathcal{N}(H ⋅ \vec x_t, R_t)
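A minimal numpy sketch of the corresponding filtering recursion (a single predict/update step; the matrices are assumed to be given, and the control term \vec c_t is dropped for brevity):

```python
# Sketch of one Kalman filter step for the linear-Gaussian DBN above.
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    # predict: propagate mean and covariance through the transition model
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # update: correct with the observation z via the Kalman gain
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```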
Special case: ARIMA
https://multithreaded.stitchfix.com/blog/2016/04/21/forget-arima/
Y_t = \mu_t + x_t \beta + S_t + e_t
\mu_{t+1} = \mu_t + v_t
where \mu_t is a local level following a random walk, x_t \beta is a regression component, S_t a seasonal component, and e_t, v_t are noise terms.
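A small simulation sketch of the local-level part of this model (the regression term x_t β and the seasonal term S_t are dropped, and the noise scales are arbitrary); since the model is linear-Gaussian it can be filtered with the Kalman step sketched above:

```python
# Sketch: simulate  mu_{t+1} = mu_t + v_t  and  Y_t = mu_t + e_t
# (local level only; regression and seasonality omitted).
import numpy as np

rng = np.random.default_rng(1)
T = 300
mu = np.zeros(T)
y = np.zeros(T)
for t in range(T):
    y[t] = mu[t] + 0.5 * rng.standard_normal()           # observation noise e_t
    if t + 1 < T:
        mu[t + 1] = mu[t] + 0.1 * rng.standard_normal()  # level noise v_t
```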
References
- A new approach to learning in Dynamic Bayesian Networks (DBNs) https://arxiv.org/pdf/1812.09027.pdf
- Kalman filter demystified: from intuition to probabilistic graphical model to real case in financial markets https://arxiv.org/pdf/1811.11618.pdf
- K. Murphy. A Tutorial on Dynamic Bayesian Networks https://www.cs.ubc.ca/~murphyk/Papers/dbntalk.pdf
- M. I. Jordan. An introduction to probabilistic graphical models. Berkeley, 2016 https://people.eecs.berkeley.edu/~jordan/prelims/
- https://www.amazon.com/gp/product/0262018020
- Learning temporal nodes Bayesian networks. https://ccc.inaoep.mx/~emorales/Papers/2013/2013-HernandezLearningTemporalNodes.pdf https://ccc.inaoep.mx/~esucar/Clases-mgp/Notes/c9-dbn.pdf
- https://www.bayesserver.com/
To do. How does this relate to https://en.wikipedia.org/wiki/Bayesian_structural_time_series?
- https://courses.cs.washington.edu/courses/cse515/09sp/slides/varel.pdf
- https://ethz.ch/content/dam/ethz/special-interest/mtec/chair-of-entrepreneurial-risks-dam/documents/dissertation/master%20thesis/Master_Thesis_%20Morzywolek.pdf
- https://github.com/jsyoon0823/TimeGAN
- http://isomorphisms.sdf.org/maxdama.pdf