# Random variables

For the length of this chapter, assume we are given a probability space $(Ω, Σ, \operatorname{Pr})$.

Definition. Given a measurable space $(Ω_X, Σ_X)$, a random variable is a measurable function $X: Ω → Ω_X$.

Note. The space $(Ω_X, Σ_X)$ is called the observation space.

Definition. The pushforward measure $X_* \operatorname{Pr}$ is denoted as

$$\operatorname{Pr}_X ≜ X_* \operatorname{Pr}$$

Theorem. The measure $\operatorname{Pr}_X$ is a probability measure and $(Ω_X, Σ_X, \operatorname{Pr}_X)$ is a probability space.
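As a concrete finite sketch (the die-and-parity setup here is purely illustrative): pushing a uniform measure forward along a random variable redistributes the mass over the observation space, and the total mass stays $1$.

```python
from fractions import Fraction

# Probability space: a fair six-sided die (illustrative example).
pr = {w: Fraction(1, 6) for w in range(1, 7)}

# Random variable X: parity of the outcome, mapping into {"even", "odd"}.
X = lambda w: "even" if w % 2 == 0 else "odd"

# Pushforward measure: Pr_X(B) = Pr(X^{-1}(B)), accumulated pointwise here.
pr_X = {}
for w, p in pr.items():
    pr_X[X(w)] = pr_X.get(X(w), Fraction(0)) + p

# Pr_X("even") == Pr_X("odd") == 1/2, and the total mass is 1,
# so Pr_X is again a probability measure.
```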

Note. Later, we will use $\Es X ⋅$, $\Vars X ⋅$, etc. to denote the expected value and other operators in the $(Ω_X, Σ_X, \operatorname{Pr}_X)$ probability space.

Definition. For any predicate $P: Ω_X → \set{⊤,⊥}$ we define the set

$$P\p{X} ≜ \setb{ω \in Ω}{P\p{X\p{ω}}}$$

Note. It is not a priori known whether $P\p{X} ∈ Σ$. To resolve this we can assume that $Σ$ is constructed from $Ω$ in a 'standard manner', such as the Borel sets $\mathcal{B}\p{\dummyarg}$. To avoid going deep into measure theory, let's take the informal, hand-wavy assumption that any 'well-behaved' set is a member of $Σ$.

Example. Consider the predicate $X = x$; we can now derive

$$\Pr{X = x} = \Pr{ \setb{ω ∈Ω}{X\p{ω} = x}}$$

## Real-valued random variables

Definition. A real-valued random variable is a random variable $X: Ω → \R$ measurable on the observation space $(\R, \mathcal{B}\p{\R})$, i.e. the real numbers with the Borel σ-algebra.

Note. Again, to avoid less insightful digressions into measure theory, I will omit the measurability requirement from here on. You can safely assume that any practical function $X: Ω → \R$ is a real-valued random variable: non-measurable real-valued functions are rare in practice and tend to be pathological. In fact, even the Dirichlet function (the indicator function of the rationals) is measurable.

To do. Generalize this to $\C$, $\R^n$, $\C^n$, ...

### Expected value

Definition. Given a random variable $X$, its expected value is

$$\E X ≜ \int_Ω X\p{ω} \d \Pr{ω}$$

To do. https://en.wikipedia.org/wiki/Entropy_(information_theory) https://en.wikipedia.org/wiki/Mutual_information

Theorem. The expected value is a linear map, i.e. $\E{X + Y} = \E{X} + \E{Y}$ and $\E{a ⋅ X} = a ⋅ \E{X}$. It is also monotone: if $X ≤ Y$ almost surely and $\E{X}$ and $\E{Y}$ exist, then $\E{X} ≤ \E{Y}$.
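Both properties are easy to verify on a finite space. A sketch with two fair dice (the helper `E` below is defined locally for this example, not a library function):

```python
from fractions import Fraction
from itertools import product

# Sample space: two independent fair dice, uniform measure (illustrative).
omega = list(product(range(1, 7), repeat=2))
pr = Fraction(1, len(omega))

def E(f):
    """Expected value of a real-valued random variable f on this finite space."""
    return sum(f(w) * pr for w in omega)

X = lambda w: w[0]  # first die
Y = lambda w: w[1]  # second die

assert E(lambda w: X(w) + Y(w)) == E(X) + E(Y)  # additivity
assert E(lambda w: 3 * X(w)) == 3 * E(X)        # homogeneity
# X <= X + Y pointwise (since Y >= 1), so monotonicity gives E[X] <= E[X + Y].
assert E(X) <= E(lambda w: X(w) + Y(w))
```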

Theorem. (Jensen's inequality) Given a random variable $X: Ω → \R$ and a convex function $f: \R → \R$, we have

$$f\p{\E{X}} ≤ \E{f\p{X}}$$
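A quick exact check of the inequality with $f\p{x} = x^2$ on a fair die (the names here are illustrative, and the local `E` helper stands in for $\E{\dummyarg}$):

```python
from fractions import Fraction

# Fair die as a finite probability space (illustrative).
outcomes = range(1, 7)
pr = Fraction(1, 6)

def E(f):
    """Expected value on the fair-die space."""
    return sum(f(w) * pr for w in outcomes)

f = lambda x: x * x  # a convex function

lhs = f(E(lambda w: w))   # f(E[X]) = (7/2)^2 = 49/4
rhs = E(lambda w: f(w))   # E[f(X)] = E[X^2] = 91/6
assert lhs <= rhs         # Jensen: 49/4 <= 91/6
```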

Proof. Since $f$ is convex it has a supporting line at $\E X$: there are $a, b ∈ \R$ with $a ⋅ x + b ≤ f\p{x}$ for all $x$ and $a ⋅ \E X + b = f\p{\E X}$. Linearity and monotonicity of the expected value then give $f\p{\E X} = \E{a ⋅ X + b} ≤ \E{f\p{X}}$. ∎

To do. Mean. Skewness. Kurtosis. Moments. Cumulants. https://en.wikipedia.org/wiki/Moment_(mathematics)

### Variance

Definition. Given a random variable $X$, its variance is

$$\Var X ≜ \E{\p{X - \E X}^2}$$

Theorem. $\Var X = \E{X^2} - \E{X}^2$.
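The two forms are easy to compare exactly on a finite space. A fair-die sketch (local `E` helper as before, names illustrative):

```python
from fractions import Fraction

# Fair die as a finite probability space (illustrative).
outcomes = range(1, 7)
pr = Fraction(1, 6)

def E(f):
    """Expected value on the fair-die space."""
    return sum(f(w) * pr for w in outcomes)

X = lambda w: w
mean = E(X)
var_def = E(lambda w: (X(w) - mean) ** 2)      # definition: E[(X - E X)^2]
var_alt = E(lambda w: X(w) ** 2) - mean ** 2   # identity:  E[X^2] - E[X]^2
assert var_def == var_alt == Fraction(35, 12)  # variance of a fair die
```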

### Covariance

Definition. Given random variables $X, Y$, their covariance is

$$\Cov{X, Y} ≜ \E{\p{X - \E X} ⋅ \p{Y - \E Y}}$$

Theorem. $\Cov{X,Y} = \E{X⋅Y} - \E X ⋅ \E Y$.

Theorem. $\Cov{X,X} = \Var X$.
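Both covariance identities can be checked exactly on a small space. A sketch with two fair dice, where $Y$ deliberately depends on $X$ so the covariance is nonzero (helper names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Two fair dice; X = first die, Y = sum of both, so X and Y are correlated.
omega = list(product(range(1, 7), repeat=2))
pr = Fraction(1, len(omega))

def E(f):
    """Expected value on this finite space."""
    return sum(f(w) * pr for w in omega)

X = lambda w: w[0]
Y = lambda w: w[0] + w[1]

def cov(A, B):
    """Covariance by the definition E[(A - E A)(B - E B)]."""
    mA, mB = E(A), E(B)
    return E(lambda w: (A(w) - mA) * (B(w) - mB))

# Cov(X, Y) = E[X Y] - E[X] E[Y]
assert cov(X, Y) == E(lambda w: X(w) * Y(w)) - E(X) * E(Y)
# Cov(X, X) = Var(X)
assert cov(X, X) == E(lambda w: X(w) ** 2) - E(X) ** 2
```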

### Probability density function

Definition. Given a random variable $X$ with values in a measure space $(Ω_X, Σ_X, μ)$, the probability density of $X$ with respect to $μ$ is the Radon-Nikodym derivative

$$p_X ≜ \frac{\d \operatorname{Pr}_X}{\d μ}$$

Theorem. By definition $p_X$ satisfies

$$\forall_{Σ_X}^A\ \Pr{X \in A} = \int_{X^{-1}\p{A}} \d \Pr{ω} = \int_A \d \Prs{X}{x} = \int_A p_X(x) \d μ(x)$$

Note. For a continuous random variable the reference measure $μ$ would be the Lebesgue measure.
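A numeric sketch of the Lebesgue case: integrating the standard normal density over an interval recovers the probability of that interval. The Riemann sum below stands in for the Lebesgue integral, and the tolerance is illustrative; the CDF $Φ$ is expressed through the error function.

```python
import math

def p(x):
    """Standard normal density with respect to the Lebesgue measure."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal CDF via the error function."""
    return (1 + math.erf(x / math.sqrt(2))) / 2

a, b, n = -1.0, 2.0, 100_000
h = (b - a) / n
# Midpoint Riemann sum of the density over [a, b].
integral = sum(p(a + (i + 0.5) * h) for i in range(n)) * h

# Pr{X in [a, b]} = integral of p_X over [a, b] = Phi(b) - Phi(a)
assert abs(integral - (Phi(b) - Phi(a))) < 1e-9
```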

Note. For a discrete random variable the reference measure $μ$ would be the counting measure and $p_X$ becomes a probability mass function such that

$$\Pr{X = b} = \Pr{X^{-1}\p{\set{b}}} = \int_{X^{-1}\p{\set{b}}} \d \Pr{ω} = \int_{\set{b}} p_X(x) \d μ\p{x} = p_X(b)$$
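Under the counting measure the integral degenerates to a sum over the preimage. A finite sketch with the sum of two fair coins (names illustrative):

```python
from fractions import Fraction
from itertools import product

# Omega: two fair coin flips encoded as 0/1; X counts the heads.
omega = list(product((0, 1), repeat=2))
pr = Fraction(1, len(omega))
X = lambda w: w[0] + w[1]

def p_X(b):
    """PMF p_X(b) = Pr(X^{-1}({b})): sum Pr over the preimage of {b}."""
    return sum(pr for w in omega if X(w) == b)

assert p_X(0) == Fraction(1, 4)
assert p_X(1) == Fraction(1, 2)
assert p_X(2) == Fraction(1, 4)
assert sum(p_X(b) for b in (0, 1, 2)) == 1  # total mass 1
```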

To do. How does this work for probability spaces that are neither discrete nor continuous, for example a Fock space?

Definition. Exponential family (to do. https://en.wikipedia.org/wiki/Exponential_family#Table_of_distributions).

$$\Prc{X = \vec x}{\vec θ} = h\p{\vec x} ⋅ \exp\p{\vec \eta\p{\vec θ} ⋅ \vec T\p{\vec x} - A\p{\vec θ}}$$

The natural form, with $\vec η\p{\vec θ} = \vec θ$ and $\vec T\p{\vec x} = \vec x$, is

$$\Prc{X = \vec x}{\vec θ} = h\p{\vec x} ⋅ \exp\p{\vec θ ⋅ \vec x - A\p{\vec θ}}$$
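As a sketch, the Bernoulli distribution fits the natural form with $θ = \log\frac{p}{1-p}$ (the log-odds), $h\p{x} = 1$, $T\p{x} = x$, and log-partition function $A\p{θ} = \log\p{1 + e^θ}$; the helper names below are illustrative.

```python
import math

def bernoulli_natural(x, theta):
    """Bernoulli PMF in natural exponential-family form:
    h(x) * exp(theta * x - A(theta)), with h = 1 and A(theta) = log(1 + e^theta)."""
    A = math.log(1 + math.exp(theta))  # log-partition function
    return math.exp(theta * x - A)

p = 0.3
theta = math.log(p / (1 - p))  # natural parameter: the log-odds of p
assert abs(bernoulli_natural(1, theta) - p) < 1e-12        # recovers Pr{X = 1}
assert abs(bernoulli_natural(0, theta) - (1 - p)) < 1e-12  # recovers Pr{X = 0}
```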

To do. Probability distributions like Uniform and Normal.

To do.: https://ermongroup.github.io/cs228-notes/representation/directed/

To do.: https://www.randomservices.org/random/dist/Mixed.html

To do.: https://en.wikipedia.org/wiki/Scoring_rule https://en.wikipedia.org/wiki/Loss_function https://en.wikipedia.org/wiki/Regret_(decision_theory)

Remco Bloemen
Math & Engineering
https://2π.com