Random variables

Throughout this chapter, assume we are given a probability space $(Ω, Σ, \operatorname{Pr})$.

Definition. Given a measurable space $(Ω_X, Σ_X)$, a random variable is a measurable function $X: Ω → Ω_X$.

Note. The space $(Ω_X, Σ_X)$ is called the observation space.

Definition. The pushforward measure $X_* \operatorname{Pr}$, also called the distribution of $X$, is denoted

$$ \operatorname{Pr}_X ≜ X_* \operatorname{Pr} $$

Theorem. The measure $\operatorname{Pr}_X$ is a probability measure and $(Ω_X, Σ_X, \operatorname{Pr}_X)$ is a probability space.
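
Proof. $\Prs{X}{Ω_X} = \Pr{X^{-1}\p{Ω_X}} = \Pr{Ω} = 1$, while non-negativity and countable additivity carry over from $\operatorname{Pr}$ because preimages preserve countable disjoint unions.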

Note. Later, we will use $\Es X ⋅$, $\Vars X ⋅$, etc. to denote the expected value and other operators in the $(Ω_X, Σ_X, \operatorname{Pr}_X)$ probability space.

Definition. For any predicate $P: Ω_X → \set{⊤,⊥}$ we define the set

$$ P\p{X} ≜ \setb{ω \in Ω}{P\p{X\p{ω}}} $$

Note. It is not a priori known whether $P\p{X} ∈ Σ$. To address this we can assume that $Σ$ is constructed from $Ω$ in a 'standard manner', such as the Borel sets $\mathcal{B}\p{\dummyarg}$. To avoid going deep into measure theory, let's take the informal, hand-wavy assumption that any 'well-behaved' set is a member of $Σ$.

Example. Consider the predicate $X = x$; we can now derive

$$ \Pr{X = x} = \Pr{\setb{ω ∈ Ω}{X\p{ω} = x}} $$
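
To make these definitions concrete, here is a minimal Python sketch (the names `Omega`, `Pr`, `X` and `pr_X_equals` are purely illustrative) of a finite probability space, a random variable on it, its pushforward distribution, and $\Pr{X = x}$ computed from the preimage:

```python
from fractions import Fraction
from collections import defaultdict

# A fair six-sided die as a finite probability space (Ω, 2^Ω, Pr).
Omega = {1, 2, 3, 4, 5, 6}
Pr = {omega: Fraction(1, 6) for omega in Omega}  # uniform measure on Ω

def X(omega):
    """Random variable X: Ω → {0, 1}, the parity of the roll."""
    return omega % 2

# Pushforward measure Pr_X(B) = Pr(X⁻¹(B)), tabulated on single outcomes.
Pr_X = defaultdict(Fraction)
for omega, p in Pr.items():
    Pr_X[X(omega)] += p

# Pr{X = x} computed directly from the preimage {ω ∈ Ω : X(ω) = x}.
def pr_X_equals(x):
    return sum((p for omega, p in Pr.items() if X(omega) == x), Fraction(0))

assert Pr_X[1] == pr_X_equals(1) == Fraction(1, 2)
```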

Real-valued random variables

Definition. A real-valued random variable is a random variable $X: Ω → \R$ measurable on the observation space $(\R, \mathcal{B}\p{\R})$, i.e. the real numbers with the Borel σ-algebra.

Note. Again, to avoid less insightful digressions into measure theory, I will skip the measurability requirement from here on. You can safely assume that any practical function $X: Ω → \R$ is a real-valued random variable: non-measurable real-valued functions are rare in practice and tend to be pathological. In fact, even the Dirichlet function (the indicator function of the rationals) is measurable.

To do. Generalize this to $\C$, $\R^n$, $\C^n$, ...

Expected value

Definition. Given a random variable $X$, its expected value is

$$ \E X ≜ \int_Ω X\p{ω} \d \Pr{ω} $$
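
Example. For a finite sample space the integral reduces to a weighted sum. For a fair six-sided die with $Ω = \set{1, …, 6}$, uniform $\operatorname{Pr}$ and $X\p{ω} = ω$ this gives

$$ \E X = \sum_{ω ∈ Ω} X\p{ω} ⋅ \Pr{\set{ω}} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5 $$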

To do. https://en.wikipedia.org/wiki/Entropy_(information_theory) https://en.wikipedia.org/wiki/Mutual_information

Theorem. The expected value is a linear map, i.e. $\E{X + Y} = \E{X} + \E{Y}$ and $\E{a ⋅ X} = a ⋅ \E{X}$. It is also monotone: if $X ≤ Y$ almost surely and $\E{X}$ and $\E{Y}$ exist, then $\E{X} ≤ \E{Y}$.
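
Note. For a finite sample space linearity is immediate from the definition, since

$$ \E{X + Y} = \sum_{ω ∈ Ω} \p{X\p{ω} + Y\p{ω}} ⋅ \Pr{\set{ω}} = \E X + \E Y $$

and in general it follows from linearity of the integral.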

Theorem. (Jensen's inequality) Given a random variable $X: Ω → \R$ and a convex function $f: \R → \R$, we have

$$ f\p{\E{X}} ≤ \E{f\p{X}} $$

Proof.
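Since $f$ is convex it has a supporting line at $m ≜ \E X$, i.e. there is a $c ∈ \R$ such that $f\p{m} + c ⋅ \p{x - m} ≤ f\p{x}$ for all $x ∈ \R$. Substituting $x = X\p{ω}$ and using linearity and monotonicity of the expected value gives

$$ f\p{\E X} = \E{f\p{m} + c ⋅ \p{X - m}} ≤ \E{f\p{X}} $$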

To do. Mean. Skewness. Kurtosis. Moments. Cumulants. https://en.wikipedia.org/wiki/Moment_(mathematics)

Variance

Definition. Given a random variable $X$, its variance is

$$ \Var X ≜ \E{\p{X - \E X}^2} $$

Theorem. $\Var X = \E{X^2} - \E{X}^2$.
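
Proof. Expand the square and use linearity, noting that $\E X$ is a constant:

$$ \Var X = \E{X^2 - 2 ⋅ X ⋅ \E X + \E{X}^2} = \E{X^2} - 2 ⋅ \E{X}^2 + \E{X}^2 = \E{X^2} - \E{X}^2 $$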

Covariance

Definition. Given random variables $X, Y$, their covariance is

$$ \Cov{X, Y} ≜ \E{\p{X - \E X} ⋅ \p{Y - \E Y}} $$

Theorem. $\Cov{X,Y} = \E{X⋅Y} - \E X ⋅ \E Y$.
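
Proof. Expand the product and use linearity:

$$ \Cov{X,Y} = \E{X ⋅ Y - X ⋅ \E Y - \E X ⋅ Y + \E X ⋅ \E Y} = \E{X ⋅ Y} - \E X ⋅ \E Y $$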

Theorem. $\Cov{X,X} = \Var X$.

Probability density function

Definition. Given a random variable $X$ with values in a measure space $(Ω_X, Σ_X, μ)$, the probability density of $X$ with respect to $μ$ is the Radon-Nikodym derivative

$$ p_X ≜ \frac{\d \operatorname{Pr}_X}{\d μ} $$

Theorem. By definition $p_X$ satisfies

$$ \forall_{Σ_X}^A\ \Pr{X \in A} = \int_{X^{-1}\p{A}} \d \Pr{ω} = \int_A \d \Prs{X}{x} = \int_A p_X(x) \d μ(x) $$

Note. For a continuous random variable the reference measure $μ$ would be the Lebesgue measure.
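For example, the uniform distribution on $[0, 1]$ has density $p_X\p{x} = 1$ on $[0, 1]$ (and $0$ elsewhere) with respect to the Lebesgue measure, so that $\Pr{X ∈ [a, b]} = \int_a^b \d x = b - a$ for $0 ≤ a ≤ b ≤ 1$.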

Note. For a discrete random variable the reference measure $μ$ would be the counting measure and $p_X$ becomes a probability mass function such that

$$ \Pr{X = b} = \Pr{X^{-1}\p{\set{b}}} = \int_{X^{-1}\p{\set{b}}} \d \Pr{ω} = \int_{\set{b}} p_X(x) \d μ\p{x} = p_X(b) $$

To do. How does this work for probability spaces that are neither discrete nor continuous, for example a Fock space?

Definition. An exponential family is a family of distributions whose density can be written in the form (to do. https://en.wikipedia.org/wiki/Exponential_family#Table_of_distributions):

$$ \Prc{X = \vec x}{\vec θ} = h(\vec x) ⋅ \exp\p{\vec \eta(\vec θ) ⋅ \vec T(\vec x) - A(\vec θ)} $$

The natural (exponential) family is the special case where $\vec \eta(\vec θ) = \vec θ$ and $\vec T(\vec x) = \vec x$:

$$ \Prc{X = \vec x}{\vec θ} = h(\vec x) ⋅ \exp\p{\vec θ ⋅ \vec x - A(\vec θ)} $$
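
Example. The Bernoulli distribution with success probability $p$ and $x ∈ \set{0, 1}$ is a natural exponential family with $h(x) = 1$, natural parameter $θ = \log \frac{p}{1 - p}$ and $A(θ) = \log\p{1 + e^θ}$:

$$ \Prc{X = x}{θ} = p^x ⋅ \p{1 - p}^{1 - x} = \exp\p{θ ⋅ x - A\p{θ}} $$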

To do. Probability distributions like Uniform and Normal.

To do. https://ermongroup.github.io/cs228-notes/representation/directed/

To do. https://www.randomservices.org/random/dist/Mixed.html

To do. https://en.wikipedia.org/wiki/Scoring_rule https://en.wikipedia.org/wiki/Loss_function https://en.wikipedia.org/wiki/Regret_(decision_theory)
