# Random variables

Throughout this chapter, assume we are given a probability space $(Ω, Σ, \operatorname{Pr})$.

**Definition.** Given a measurable space $(Ω_X, Σ_X)$, a *random variable* is a measurable function $X: Ω → Ω_X$.

**Note.** The space $(Ω_X, Σ_X)$ is called the *observation space*.

**Definition.** The *distribution* of $X$ is the pushforward measure of $\operatorname{Pr}$ along $X$, denoted

$$ \operatorname{Pr}_X ≜ X_* \operatorname{Pr} $$

**Theorem.** The measure $\operatorname{Pr}_X$ is a probability measure and $(Ω_X, Σ_X, \operatorname{Pr}_X)$ is a probability space.
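On a finite probability space the pushforward can be computed directly by summing the mass of each preimage. The following is a minimal sketch; the die space, measure, and parity variable are illustrative assumptions, not from the text.

```python
# Hedged sketch: a finite probability space and the pushforward measure Pr_X.
# The space Omega, measure Pr, and random variable X are illustrative choices.
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]                 # a fair die
Pr = {w: Fraction(1, 6) for w in Omega}    # probability measure on Omega

X = lambda w: w % 2                        # random variable: parity (0 = even, 1 = odd)

# Pushforward: Pr_X(B) = Pr(X^{-1}(B)); on points, Pr_X({x}) is the mass of the preimage.
Pr_X = {}
for w in Omega:
    Pr_X[X(w)] = Pr_X.get(X(w), Fraction(0)) + Pr[w]

# Pr_X is again a probability measure: its masses sum to 1.
assert sum(Pr_X.values()) == 1
```

Here $\operatorname{Pr}_X$ assigns mass $1/2$ to each parity, as expected.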

**Note.** Later, we will use $\Es X ⋅$, $\Vars X ⋅$, etc. to denote the expected value and other operators in the $(Ω_X, Σ_X, \operatorname{Pr}_X)$ probability space.

**Definition.** For any predicate $P: Ω_X → \set{⊤,⊥}$ we define the set

$$ P\p{X} ≜ \setb{ω \in Ω}{P\p{X\p{ω}}} $$

**Note.** It is not a priori known whether $P\p{X} ∈ Σ$. To solve this, we can assume that $Σ$ is constructed from $Ω$ in a 'standard manner', such as via Borel sets $\mathcal{B}\p{\dummyarg}$. To avoid going deep into measure theory, let's take the informal, hand-wavy assumption that any 'well-behaved' set is a member of $Σ$.

**Example.** Consider the predicate $X = x$. We can now derive

$$ \Pr{X = x} = \Pr{ \setb{ω ∈Ω}{X\p{ω} = x}} $$

## Real-valued random variables

**Definition.** A *real-valued random variable* is a random variable $X: Ω → \R$ measurable on the observation space $(\R, \mathcal{B}\p{\R})$, i.e. the real numbers with the Borel σ-algebra.

**Note.** Again, to avoid less insightful digressions into measure theory, I will skip the measurability requirement from here on. You can safely assume that any practical function $X:Ω → \R$ is a real-valued random variable. Non-measurable real-valued functions are very rare in practice and tend to be very pathological. In fact, even the Dirichlet function (the rational indicator function) is measurable.

**To do.** Generalize this to $\C$, $\R^n$, $\C^n$, ...

### Expected value

**Definition.** Given a random variable $X$, its *expected value* is

$$ \E X ≜ \int_Ω X\p{ω} \d \Pr{ω} $$
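On a finite probability space this integral reduces to a weighted sum, which makes the definition easy to check concretely. A minimal sketch, where the die space and the identity random variable are illustrative assumptions:

```python
# Hedged sketch: for a finite Omega, the integral defining E X reduces to
# a finite weighted sum. The space and X below are illustrative assumptions.
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]
Pr = {w: Fraction(1, 6) for w in Omega}
X = lambda w: w                            # identity random variable: the die value

E_X = sum(X(w) * Pr[w] for w in Omega)     # ∫_Ω X(ω) dPr(ω) as a finite sum
```

For the fair die this gives $\E X = 21/6 = 7/2$.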

**To do.** https://en.wikipedia.org/wiki/Entropy_(information_theory) https://en.wikipedia.org/wiki/Mutual_information

**Theorem.** The expected value is a *linear map*: $\E{X + Y} = \E{X} + \E{Y}$ and $\E{a ⋅ X} = a ⋅ \E{X}$.

**Theorem.** (Monotonicity) If $X ≤ Y$ almost surely and $\E{X}$ and $\E{Y}$ exist, then $\E{X} ≤ \E{Y}$.
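Linearity can be verified exactly on a small finite space. A hedged sketch, with a two-coin space and the variables $X$, $Y$, and scalar $a$ chosen purely for illustration:

```python
# Hedged check of linearity: E[X + Y] = E X + E Y and E[aX] = a E X,
# on a two-coin space; the space and variables are illustrative assumptions.
from fractions import Fraction
from itertools import product

Omega = list(product([0, 1], repeat=2))          # two fair coin flips
Pr = {w: Fraction(1, 4) for w in Omega}
E = lambda Z: sum(Z(w) * Pr[w] for w in Omega)   # expectation as a finite sum

X = lambda w: w[0]                               # first coin
Y = lambda w: w[1]                               # second coin
a = 3

assert E(lambda w: X(w) + Y(w)) == E(X) + E(Y)   # additivity
assert E(lambda w: a * X(w)) == a * E(X)         # homogeneity
```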

**Theorem.** (Jensen's inequality) For a random variable $X: Ω → \R$ and a convex function $f:\R → \R$:

$$ f\p{\E{X}} ≤ \E{f\p{X}} $$
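The inequality can be checked exactly for a particular convex function on a finite space. A hedged sketch, taking $f\p{x} = x^2$ and the fair-die space as illustrative assumptions:

```python
# Hedged check of Jensen's inequality for the convex function f(x) = x^2
# on a fair-die space; the example space is an assumption, not from the text.
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]
Pr = {w: Fraction(1, 6) for w in Omega}
E = lambda Z: sum(Z(w) * Pr[w] for w in Omega)

X = lambda w: w
f = lambda x: x * x

lhs = f(E(X))                  # f(E X) = (7/2)^2 = 49/4
rhs = E(lambda w: f(X(w)))     # E[X^2]  = 91/6

assert lhs <= rhs              # Jensen: f(E X) ≤ E f(X)
```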

**Proof.** Since $f$ is convex, it has a supporting line at $μ ≜ \E X$: there is some $c ∈ \R$ with $f\p{x} ≥ f\p{μ} + c ⋅ \p{x - μ}$ for all $x ∈ \R$. Applying this pointwise to $X$ and taking expectations, linearity gives $\E{f\p{X}} ≥ f\p{μ} + c ⋅ \p{\E X - μ} = f\p{\E X}$. $\blacksquare$

**To do.** Mean. Skewness. Kurtosis. Moments. Cumulants. https://en.wikipedia.org/wiki/Moment_(mathematics)

### Variance

**Definition.** Given a random variable $X$, its *variance* is

$$ \Var X ≜ \E{\p{X - \E X}^2} $$

**Theorem.** $\Var X = \E{X^2} - \E{X}^2$.
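Both forms of the variance can be compared exactly on a finite space. A hedged sketch with the fair-die space as an assumed example:

```python
# Hedged check of Var X = E[X^2] - (E X)^2 on a fair-die space
# (the space and X are illustrative assumptions).
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]
Pr = {w: Fraction(1, 6) for w in Omega}
E = lambda Z: sum(Z(w) * Pr[w] for w in Omega)

X = lambda w: w
mu = E(X)

var_def = E(lambda w: (X(w) - mu) ** 2)       # definitional form E[(X - E X)^2]
var_alt = E(lambda w: X(w) ** 2) - mu ** 2    # shortcut form E[X^2] - (E X)^2

assert var_def == var_alt
```

For the fair die both forms give $\Var X = 91/6 - 49/4 = 35/12$.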

### Covariance

**Definition.** Given random variables $X, Y$, their *covariance* is

$$ \Cov{X, Y} ≜ \E{\p{X - \E X} ⋅ \p{Y - \E Y}} $$

**Theorem.** $\Cov{X,Y} = \E{X⋅Y} - \E X ⋅ \E Y$.

**Theorem.** $\Cov{X,X} = \Var X$.
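Both covariance identities can be checked exactly on a two-die space. A hedged sketch; the space and the correlated pair $X$, $Y$ are illustrative assumptions:

```python
# Hedged check of Cov{X,Y} = E[XY] - E X · E Y and Cov{X,X} = Var X
# on a two-die space; the variables below are illustrative assumptions.
from fractions import Fraction
from itertools import product

Omega = list(product(range(1, 7), repeat=2))     # two fair dice
Pr = {w: Fraction(1, 36) for w in Omega}
E = lambda Z: sum(Z(w) * Pr[w] for w in Omega)

X = lambda w: w[0]                 # first die
Y = lambda w: w[0] + w[1]          # sum of both dice (correlated with X)

Cov = lambda U, V: E(lambda w: (U(w) - E(U)) * (V(w) - E(V)))
Var = lambda U: E(lambda w: (U(w) - E(U)) ** 2)

assert Cov(X, Y) == E(lambda w: X(w) * Y(w)) - E(X) * E(Y)
assert Cov(X, X) == Var(X)
```

Since $Y = X + W$ with $W$ an independent die, $\Cov{X, Y} = \Var X = 35/12$ here.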

### Probability density function

**Definition.** Given a random variable $X$ with values in a measure space $(Ω_X, Σ_X, μ)$, the *probability density* of $X$ with respect to $μ$ is the Radon–Nikodym derivative

$$ p_X ≜ \frac{\d \operatorname{Pr}_X}{\d μ} $$

**Theorem.** By definition $p_X$ satisfies

$$ \forall_{Σ_X}^A\ \Pr{X \in A} = \int_{X^{-1}\p{A}} \d \Pr{ω} = \int_A \d \Prs{X}{x} = \int_A p_X(x) \d μ(x) $$

**Note.** For a continuous random variable the reference measure $μ$ would be the Lebesgue measure.

**Note.** For a discrete random variable the reference measure $μ$ would be the counting measure and $p_X$ becomes a probability mass function such that

$$ \Pr{X = b} = \Pr{X^{-1}\p{\set{b}}} = \int_{X^{-1}\p{\set{b}}} \d \Pr{ω} = \int_{\set{b}} p_X(x) \d μ\p{x} = p_X(b) $$
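In the discrete case the density with respect to the counting measure can be computed directly: the mass function is the pushforward evaluated at points, and integrating it over a set $A$ recovers $\Pr{X ∈ A}$. A minimal sketch, with the die space and the mod-3 variable as illustrative assumptions:

```python
# Hedged sketch: for a discrete X, the density w.r.t. the counting measure is
# the probability mass function p_X(b) = Pr{X = b}. The example space is assumed.
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]
Pr = {w: Fraction(1, 6) for w in Omega}
X = lambda w: w % 3                 # takes values 0, 1, 2

# p_X(b) = Pr(X^{-1}({b})): the mass of the preimage of each point.
p_X = {}
for w in Omega:
    p_X[X(w)] = p_X.get(X(w), Fraction(0)) + Pr[w]

# Integrating p_X against the counting measure over A recovers Pr{X ∈ A}.
A = {0, 1}
assert sum(p_X[x] for x in A) == Fraction(2, 3)
```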

**To do.** How does this work for probability spaces that are neither discrete nor continuous, for example a Fock space.

**Definition.** *Exponential family* (**to do.** https://en.wikipedia.org/wiki/Exponential_family#Table_of_distributions).

$$ \Prc{X = \vec x}{\vec θ} = h(\vec x) ⋅ \exp\p{\vec \eta(\vec θ) ⋅ \vec T(\vec x) - A(\vec θ)} $$

The *natural family* is the special case $\vec \eta\p{\vec θ} = \vec θ$ and $\vec T\p{\vec x} = \vec x$:

$$ \Prc{X = \vec x}{\vec θ} = h(\vec x) ⋅ \exp\p{\vec θ ⋅ \vec x - A(\vec θ)} $$
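As a concrete instance, the Bernoulli distribution fits the natural form with $h\p{x} = 1$, natural parameter $θ = \operatorname{logit}\p{p}$, and log-partition function $A\p{θ} = \log\p{1 + e^θ}$. A hedged sketch (the function and parameter names are chosen for illustration):

```python
# Hedged sketch: the Bernoulli pmf written in natural exponential-family form,
# p(x | θ) = h(x) · exp(θ·x − A(θ)) with h(x) = 1, θ = logit(p),
# A(θ) = log(1 + e^θ). Names follow the natural-family formula above.
import math

def bernoulli_natural(x, theta):
    """Bernoulli pmf in natural-parameter form; x in {0, 1}."""
    A = math.log(1.0 + math.exp(theta))    # log-partition function A(θ)
    return math.exp(theta * x - A)

p = 0.3
theta = math.log(p / (1.0 - p))            # natural parameter: the logit of p

assert abs(bernoulli_natural(1, theta) - p) < 1e-12
assert abs(bernoulli_natural(0, theta) - (1.0 - p)) < 1e-12
```

Recovering $p$ and $1 - p$ from the natural form confirms that $A\p{θ}$ normalizes the density.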

**To do.** Probability distributions like Uniform and Normal.

**To do.**: https://ermongroup.github.io/cs228-notes/representation/directed/

**To do.**: https://www.randomservices.org/random/dist/Mixed.html

**To do.**: https://en.wikipedia.org/wiki/Scoring_rule https://en.wikipedia.org/wiki/Loss_function https://en.wikipedia.org/wiki/Regret_(decision_theory)