Random variables
For the remainder of this chapter, assume we are given a probability space (Ω, Σ, \operatorname{Pr}).
Definition. Given a measurable space (Ω_X, Σ_X), a random variable is a measurable function X: Ω → Ω_X.
Note. The space (Ω_X, Σ_X) is called the observation space.
Definition. The pushforward measure X_* \operatorname{Pr} is denoted as
\operatorname{Pr}_X ≜ X_* \operatorname{Pr}
Theorem. The measure \operatorname{Pr}_X is a probability measure and (Ω_X, Σ_X, \operatorname{Pr}_X) is a probability space.
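Proof (sketch). The pushforward measure acts on A ∈ Σ_X as \Prs{X}{A} = \Pr{X^{-1}\p{A}}, which is well defined because measurability of X guarantees X^{-1}\p{A} ∈ Σ. Normalization follows from
\Prs{X}{Ω_X} = \Pr{X^{-1}\p{Ω_X}} = \Pr{Ω} = 1
and countable additivity carries over from \operatorname{Pr}, since X^{-1}\p{\bigcup_i A_i} = \bigcup_i X^{-1}\p{A_i} and preimages of pairwise disjoint sets are pairwise disjoint.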
Note. Later, we will use \Es X ⋅, \Vars X ⋅, etc. to denote the expected value and other operators in the (Ω_X, Σ_X, \operatorname{Pr}_X) probability space.
Definition. For any predicate P: Ω_X → \set{⊤,⊥} we define the set
P\p{X} ≜ \setb{ω \in Ω}{P\p{X\p{ω}}}
Note. It is not known a priori whether P\p{X} ∈ Σ. To address this, we can assume that Σ is constructed from Ω in a 'standard manner', such as the Borel sets \mathcal{B}\p{\dummyarg}. To avoid going deep into measure theory, let us take the informal, hand-wavy assumption that any 'well-behaved set' is a member of Σ.
Example. Consider the predicate X = x; we can now derive
\Pr{X = x} = \Pr{ \setb{ω ∈Ω}{X\p{ω} = x}}
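Example. As a concrete illustration, take a fair six-sided die: Ω = \set{1, …, 6}, \Pr{\set{ω}} = \frac{1}{6} for every ω ∈ Ω, and X: Ω → Ω the identity map. Then
\Pr{X = 3} = \Pr{\setb{ω ∈ Ω}{X\p{ω} = 3}} = \Pr{\set{3}} = \frac{1}{6}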
Real-valued random variables
Definition. A real-valued random variable is a random variable X: Ω → \R with observation space (\R, \mathcal{B}\p{\R}), i.e. the real numbers equipped with the Borel σ-algebra.
Note. Again, to avoid less insightful digressions into measure theory, I will omit the measurability requirement from here on. You can safely assume that any practical function X: Ω → \R is a real-valued random variable. Non-measurable real-valued functions are very rare in practice and tend to be highly pathological. In fact, even the Dirichlet function (the indicator function of the rationals) is measurable.
To do. Generalize this to \C, \R^n, \C^n, ...
Expected value
Definition. Given a random variable X, its expected value is
\E X ≜ \int_Ω X\p{ω} \d \Pr{ω}
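Example. When Ω is finite the integral reduces to a sum, \E X = \sum_{ω ∈ Ω} X\p{ω} ⋅ \Pr{\set{ω}}. For the fair die from the earlier example this gives
\E X = \sum_{x=1}^{6} x ⋅ \frac{1}{6} = \frac{21}{6} = \frac{7}{2}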
To do. https://en.wikipedia.org/wiki/Entropy_(information_theory) https://en.wikipedia.org/wiki/Mutual_information
Theorem. The expected value is a linear map, i.e. \E{X + Y} = \E{X} + \E{Y} and \E{a ⋅ X} = a ⋅ \E{X}. It is also monotone: if X ≤ Y almost surely and \E{X} and \E{Y} exist, then \E{X} ≤ \E{Y}.
Theorem. (Jensen's inequality) Given a random variable X: Ω → \R and a convex function f:\R → \R, then:
f\p{\E{X}} ≤ \E{f\p{X}}
Proof.
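A standard argument, sketched here under the assumption that \E X exists and is finite. Since f is convex, it admits a supporting line at \E X, i.e. there is an a ∈ \R with
\forall_{\R}^t\ f\p{t} ≥ f\p{\E X} + a ⋅ \p{t - \E X}
Substituting t = X\p{ω}, taking \E of both sides, and using linearity and monotonicity of \E yields
\E{f\p{X}} ≥ f\p{\E X} + a ⋅ \p{\E X - \E X} = f\p{\E X}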
To do. Mean. Skewness. Kurtosis. Moments. Cumulants. https://en.wikipedia.org/wiki/Moment_(mathematics)
Variance
Definition. Given a random variable X, its variance is
\Var X ≜ \E{\p{X - \E X}^2}
Theorem. \Var X = \E{X^2} - \E{X}^2.
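Proof. Expand the square and use linearity of \E, treating \E X as a constant:
\Var X = \E{X^2 - 2 ⋅ X ⋅ \E X + \E{X}^2} = \E{X^2} - 2 ⋅ \E{X}^2 + \E{X}^2 = \E{X^2} - \E{X}^2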
Covariance
Definition. Given random variables X, Y, their covariance is
\Cov{X, Y} ≜ \E{\p{X - \E X} ⋅ \p{Y - \E Y}}
Theorem. \Cov{X,Y} = \E{X⋅Y} - \E X ⋅ \E Y.
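Proof. Expand the product and use linearity of \E, treating \E X and \E Y as constants:
\Cov{X, Y} = \E{X ⋅ Y - X ⋅ \E Y - Y ⋅ \E X + \E X ⋅ \E Y} = \E{X ⋅ Y} - \E X ⋅ \E Y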
Theorem. \Cov{X,X} = \Var X.
Probability density function
Definition. Given a random variable X with values in a measure space (Ω_X, Σ_X, μ) such that \operatorname{Pr}_X is absolutely continuous with respect to μ, the probability density of X with respect to μ is the Radon–Nikodym derivative
p_X ≜ \frac{\d \operatorname{Pr}_X}{\d μ}
Theorem. By definition p_X satisfies
\forall_{Σ_X}^A\ \Pr{X \in A} = \int_{X^{-1}\p{A}} \d \Pr{ω} = \int_A \d \Prs{X}{x} = \int_A p_X(x) \d μ(x)
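Note. Taking A = Ω_X above shows that any density integrates to one against its reference measure:
\int_{Ω_X} p_X\p{x} \d μ\p{x} = \Pr{X ∈ Ω_X} = 1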
Note. For a continuous random variable the reference measure μ would be the Lebesgue measure.
Note. For a discrete random variable the reference measure μ would be the counting measure and p_X becomes a probability mass function such that
\Pr{X = b} = \Pr{X^{-1}\p{\set{b}}} = \int_{X^{-1}\p{\set{b}}} \d \Pr{ω} = \int_{\set{b}} p_X(x) \d μ\p{x} = p_X(b)
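Example. For the fair die from the earlier example, μ is the counting measure on \set{1, …, 6} and the probability mass function is constant: p_X\p{b} = \Pr{X = b} = \frac{1}{6} for every b ∈ \set{1, …, 6}.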
To do. How does this work for probability spaces that are neither discrete nor continuous, for example a Fock space.
Definition. Exponential family (to do. https://en.wikipedia.org/wiki/Exponential_family#Table_of_distributions).
\Prc{X = \vec x}{\vec θ} = h(\vec x) ⋅ \exp\p{\vec η(\vec θ) ⋅ \vec T(\vec x) - A(\vec θ)}
The natural family is the special case where \vec η(\vec θ) = \vec θ and \vec T(\vec x) = \vec x:
\Prc{X = \vec x}{\vec θ} = h(\vec x) ⋅ \exp\p{\vec θ ⋅ \vec x - A(\vec θ)}
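Example. As a one-dimensional illustration (the Bernoulli distribution is not otherwise introduced in these notes), a Bernoulli variable with success probability θ ∈ \p{0, 1} has \Prc{X = x}{θ} = θ^x ⋅ \p{1 - θ}^{1 - x} for x ∈ \set{0, 1}. This matches the exponential family form with h\p{x} = 1, T\p{x} = x, η\p{θ} = \log\frac{θ}{1 - θ} and A\p{θ} = -\log\p{1 - θ}:
\Prc{X = x}{θ} = \exp\p{x ⋅ \log\frac{θ}{1 - θ} + \log\p{1 - θ}}
Reparameterizing by the natural parameter η = \log\frac{θ}{1 - θ} gives the natural family with A\p{η} = \log\p{1 + e^{η}}.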
To do. Probability distributions like Uniform and Normal.
To do.: https://ermongroup.github.io/cs228-notes/representation/directed/
To do.: https://www.randomservices.org/random/dist/Mixed.html
To do.: https://en.wikipedia.org/wiki/Scoring_rule https://en.wikipedia.org/wiki/Loss_function https://en.wikipedia.org/wiki/Regret_(decision_theory)