# Probability spaces

Naive probability theory is either discrete or continuous. Consider a six-sided die: we can compute the expected value using the discrete definition $\E{X} = \sum_x x ⋅ \Pr{X = x}$ and find $3.5$. Similarly, consider a wheel spinner with values in $\delim[{0,1})$: we can compute the expected value using the continuous definition $\E{X} = \int_0^1 x ⋅ p(x) \d x$ and find $0.5$. But now we do something interesting: we remove the $1$ face of the die and in its place we put the spinner. When we roll this die, we get either one of the values $2,3,4,5,6$ or a number in $\delim[{0,1})$. If we roll it many times and take the average, we find it is about $3.4$; with a bit of reasoning we can deduce the exact value $\frac{41}{12}$. The outcomes are neither discrete nor continuous, so how do we rigorously define what we intuitively mean (no pun intended) by expected value?
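The value $\frac{41}{12}$ can be checked numerically; a minimal Python sketch of the modified die (the sample size and seed are arbitrary):

```python
# Monte Carlo check of the die-with-spinner expectation.
# Faces 2..6 are kept; the face showing 1 is replaced by a uniform spinner on [0, 1).
import random

def roll(rng: random.Random) -> float:
    face = rng.randint(1, 6)
    return rng.random() if face == 1 else float(face)

rng = random.Random(0)
n = 1_000_000
estimate = sum(roll(rng) for _ in range(n)) / n

# Exact value: each face has probability 1/6; the spinner contributes its mean 1/2.
exact = (0.5 + 2 + 3 + 4 + 5 + 6) / 6  # = 41/12 ≈ 3.4167
print(estimate, exact)
```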

Another limitation of naive probability theory is with distributions such as the Dirac delta function. These are ill-defined in the normal understanding of real numbers and Riemann integrals. But again, they make intuitive sense and are useful in practice. How do we make them rigorous?

**Note.** I use the operator notation $A_B^C$ as shorthand for $A_{C ∈ B}$, meaning to apply operator $A$ with bound variable $C$ ranging over set $B$; for example $\sum_{[0,n)}^i x_i$ denotes the sum over $x_0, \dots, x_{n-1}$.

## Measure theory

### σ-algebra

**Definition.** Given a set $Ω$, a *σ-algebra* on $Ω$ is a set of subsets $Σ ⊆ \powerset{Ω}$ such that it contains the whole set,

$$ \tag{1} Ω ∈ Σ $$

is closed under complements, for any $S ∈ Σ$

$$ \tag{2} \p{Ω \setminus S} ∈ Σ $$

and is closed under countable unions, for any $I ∈ \powerset{Σ}$ such that $\card{I} ≤ \aleph_0$

$$ \tag{3} \p{\∪_{S ∈ I} S} ∈ Σ $$

**Note.** Within a particular σ-algebra, $Ω$ acts as a *universe* and I will use $\comp E$ to denote the complement with respect to $Ω$

$$ \comp E ≜ Ω \setminus E $$

From the definition it follows that $∅ ∈ Σ$ because $∅ = \comp Ω$, and that $Σ$ is closed under countable intersections, i.e. given $I ∈ \powerset Σ$ such that $\card{I} ≤ \aleph_0$

$$ \Intersection_{S ∈ I} S = \comp{\Union_{S ∈ I} \comp S} ∈ Σ $$
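For finite $Ω$ these closure properties can be checked mechanically; a small Python sketch (the particular σ-algebra is illustrative):

```python
# A small σ-algebra on Ω = {1, 2, 3, 4} generated by the partition {{1, 2}, {3, 4}},
# checked against the axioms and the derived closure under intersections.
from itertools import combinations

Omega = frozenset({1, 2, 3, 4})
Sigma = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), Omega}

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Axiom (1): Ω ∈ Σ.
assert Omega in Sigma
# Axiom (2): closed under complements.
assert all(Omega - S in Sigma for S in Sigma)
# Axiom (3): closed under unions, here checked over every nonempty subfamily.
families = subsets(Sigma)
assert all(frozenset().union(*F) in Sigma for F in families if F)
# Derived: closed under intersections, via De Morgan.
assert all(frozenset.intersection(*F) in Sigma for F in families if F)
print("all σ-algebra checks passed")
```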

### Measure space

**Definition.** A pair $(Ω, Σ)$ is a *measurable space* iff $Σ$ is a σ-algebra over $Ω$.

**Note.** In the probability theory that follows below, the set $Ω$ will represent the outcome space and $Σ$ the event space.

**Definition.** Given a measurable space $(Ω, Σ)$, a function $μ: Σ → [0, ∞]$ is a *measure* on $(Ω, Σ)$ iff

- $\∀_Σ^E\ μ(E) \ge 0$ (non-negativity),
- $μ(∅) = 0$, and
- $\∀_{\powerset{Σ}}^S\ \card{S} ≤ \aleph_0 ∧ \p{\∀_S^A \∀_{S \setminus \set{A}}^B\ A ∩ B = ∅} → μ\p{\∪_S^E E} = \sum_S^E μ(E)$ (countable additivity over pairwise disjoint families).

**Note.** While measure theory allows $μ$ to range over $[0, ∞]$, in probability theory the measure will be a probability and be restricted to $[0,1]$ with $μ(Ω) = 1$, see below. For now it is kept generic.

**Definition.** A triple $(Ω, Σ, μ)$ is a *measure space* iff

- $Ω$ is a set.
- $Σ$ is a σ-algebra on set $Ω$.
- $μ$ is a measure on $(Ω, Σ)$.
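As a sanity check of these axioms, here is a Python sketch of a small measure space (counting measure on a five-element set with $Σ = \powerset Ω$; the example is illustrative):

```python
# Counting measure on Ω = {1,...,5}: μ(S) = |S|, with Σ = 2^Ω.
# Check μ(∅) = 0 and additivity over a pairwise disjoint family.
from itertools import combinations

Omega = frozenset(range(1, 6))

def mu(S):
    return len(S)  # counting measure

assert mu(frozenset()) == 0

family = [frozenset({1}), frozenset({2, 3}), frozenset({4, 5})]
# Pairwise disjointness of the family.
assert all(not (A & B) for A, B in combinations(family, 2))
# Additivity: μ of the union equals the sum of the measures.
union = frozenset().union(*family)
assert mu(union) == sum(mu(S) for S in family)
print("measure axioms hold on this example")
```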

**Theorem.** Given a measure space $(Ω, Σ, μ)$ then

- If $S, T ∈ Σ$ and $S ⊂ T$ then $μ(S) ≤ μ(T)$ (monotonicity).
- Given countable $I ∈ \powerset Σ$ then $μ\p{\Union_{S ∈ I} S} ≤ \sum_{S ∈ I} μ(S)$ (countable subadditivity).
- Given an ascending chain $S_1 ⊂ S_2 ⊂ S_3 ⊂ ⋯$ then $μ\p{\Union_i S_i} = \lim_{i \to ∞} μ(S_i)$ (continuity from below).
- Given a descending chain $S_1 ⊃ S_2 ⊃ S_3 ⊃ ⋯$ with $μ(S_1) < ∞$ then $μ\p{\Intersection_i S_i} = \lim_{i \to ∞} μ(S_i)$ (continuity from above).

**Definition.** Given a measure space $(Ω, Σ, μ)$, a subset $S ∈ Σ$ is a *μ-null set* iff $μ(S) = 0$. A subset $S ∈ Σ$ is a *μ-full measure set* iff $\comp S$ is a μ-null set.

**Definition.** Given a measure space $(Ω, Σ, μ)$, a property $P$ of $Ω$ holds *μ-almost everywhere* iff there exists a μ-null set $S$ such that $P(x)$ for all $x ∈ \comp S$.

**Note.** In probability theory, this is also known as *μ-almost surely* and implies that the probability that the property holds is $1$.

**Definition.** A measure space $(Ω, Σ, μ)$ is a *complete measure space* iff for every μ-null set $S$ we have $\powerset S ⊆ Σ$.

**Definition.** Given a measure space $(Ω, Σ, μ)$, its *completion* is the smallest extension of $(Ω, Σ, μ)$ that is a complete measure space.

### Lebesgue measure

### Lebesgue's decomposition theorem

**Theorem.** With respect to the Lebesgue measure, every σ-finite measure $μ$ on $\R$ decomposes uniquely into an absolutely continuous part, a discrete (pure point) part, and a singular continuous part:

$$ μ = μ_{\text{continuous}} + \mu_{\text{discrete}} + \mu_{\text{singular}} $$
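As an illustration (a sketch, assuming the die-with-spinner setup from the introduction), the distribution of that mixed roll on $\R$ has no singular continuous part and decomposes as

$$ μ = \underbrace{\tfrac{1}{6}\, λ|_{[0,1)}}_{\text{continuous}} + \underbrace{\tfrac{1}{6} \sum_{k=2}^{6} δ_k}_{\text{discrete}} + \underbrace{0}_{\text{singular}} $$

where $λ|_{[0,1)}$ is the Lebesgue measure restricted to $[0,1)$ and $δ_k$ is the Dirac measure at $k$.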

### Measurable function

**To do.** Define *measurable function*.

### Pushforward measure

**Definition.** Given two measurable spaces $(Ω_1, Σ_1)$ and $(Ω_2, Σ_2)$, a measure $μ: Σ_1 → [0, ∞]$ and a measurable function $f: Ω_1 → Ω_2$, the *pushforward measure* $f_* μ: Σ_2 → [0, ∞]$ is defined for $B ∈ Σ_2$ by:

$$ f_* μ \p{B} ≜ μ\p{f^{-1}\p{B}} $$

**Theorem.** The pushforward measure is a measure on $(Ω_2, Σ_2)$.
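On a finite space a measure is just a weight per outcome and the pushforward sums the weights of every preimage point; a Python sketch (the fair die and parity map are illustrative):

```python
# Pushforward of a fair-die distribution along f(ω) = ω mod 2.
from collections import defaultdict
from fractions import Fraction

mu = {w: Fraction(1, 6) for w in range(1, 7)}  # fair die on Ω₁ = {1,...,6}
f = lambda w: w % 2                            # measurable map to Ω₂ = {0, 1}

pushforward = defaultdict(Fraction)
for w, p in mu.items():
    pushforward[f(w)] += p                     # (f_* μ)({y}) = μ(f⁻¹({y}))

assert pushforward[0] == Fraction(1, 2)  # even faces: 2, 4, 6
assert pushforward[1] == Fraction(1, 2)  # odd faces: 1, 3, 5
assert sum(pushforward.values()) == 1    # f_* μ is again a probability measure
```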

### Lebesgue integration

**To do.** Define the Lebesgue integral $\int_Ω ⋅ \d \Pr{ω}$.

**Definition.** (Notation) The following notations for the Lebesgue integral of $f$ with respect to $\operatorname{Pr}$ are used interchangeably

$$ \int_Ω f \d \operatorname{Pr} = \int_Ω f\p{x} \d \Pr{x} $$
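On a finite outcome space the Lebesgue integral reduces to a weighted sum; a minimal Python sketch (the space and function are illustrative):

```python
# On a finite probability space the Lebesgue integral ∫_Ω f dPr is
# just the Pr-weighted sum of f over the outcomes.
from fractions import Fraction

Pr = {w: Fraction(1, 4) for w in range(4)}   # uniform measure on Ω = {0, 1, 2, 3}
f = lambda w: w * w                          # a measurable function (all are, here)

integral = sum(f(w) * p for w, p in Pr.items())
assert integral == Fraction(7, 2)            # (0 + 1 + 4 + 9) / 4
```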

### Radon-Nikodym theorem

**Theorem.** Given a measurable space $(Ω, Σ)$ with two σ-finite measures $μ$ and $ν$ such that $\forall_Σ^A\ μ(A) = 0 → ν(A) = 0$ (i.e. $ν ≪ μ$: $ν$ is *absolutely continuous* with respect to $μ$), there exists a measurable function $f: Ω → \R ∪ \set{+∞}$ such that

$$ \forall_Σ^A\ ν(A) = \int_A f(ω) \d μ(ω) $$

furthermore this function is unique up to a $μ$-null set.

**Definition.** Denote the function $f$ from the above theorem as the *Radon–Nikodym derivative* $\frac{\d ν}{\d μ}$.
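In the purely discrete case the Radon–Nikodym derivative is just a ratio of point masses; a minimal Python sketch (the two measures are illustrative):

```python
# For two probability measures on a finite Ω given by point masses,
# dν/dμ(ω) = ν({ω}) / μ({ω}) wherever μ({ω}) > 0.
from fractions import Fraction
from itertools import chain, combinations

Omega = [1, 2, 3]
mu = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}  # reference measure
nu = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}  # ν ≪ μ here

dnu_dmu = {w: nu[w] / mu[w] for w in Omega}

# Check ν(A) = ∫_A (dν/dμ) dμ on every event A ⊆ Ω.
events = chain.from_iterable(combinations(Omega, r) for r in range(len(Omega) + 1))
for A in events:
    assert sum(nu[w] for w in A) == sum(dnu_dmu[w] * mu[w] for w in A)
print("Radon–Nikodym identity verified on all events")
```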

## Probability theory

### Probability measure

**Definition.** A measure $\operatorname{Pr}$ on $(Ω, Σ)$ is a *probability measure* iff $\Pr{Ω} = 1$.

**Theorem.** It follows that $\operatorname{Pr}: Σ → [0, 1]$. *Proof*. For any $E ∈ Σ$ we have $E ⊆ Ω$, so by monotonicity $\Pr E ≤ \Pr Ω = 1$; non-negativity gives the lower bound. ∎

### Probability space

**Definition.** A measure space $(Ω, Σ, \operatorname{Pr})$ is a *probability space* iff $\Pr{Ω} = 1$.

**Note.** The set $Ω$ is known as the *sample space*. The members of $Ω$ are known as *outcomes*. The set $Σ$ is known as the *event space* and members of $Σ$ are known as *events*. If an event is a singleton (i.e. contains a single outcome) it is known as an *elementary event*.

**Note.** From the above definitions the Kolmogorov axioms are apparent.

**To do.** Elementary theorems from https://ermongroup.github.io/cs228-notes/preliminaries/probabilityreview/.

From here on, assume we are given a probability space $(Ω, Σ, \operatorname{Pr})$.

**Note.** Given an event $e ∈ Σ$ and a number $n ∈ (0, \infty)$, the *odds of $e$* are “$\frac{n ⋅ \Pr{e}}{1 - \Pr{e}}$ to $n$” and the *odds against $e$* are “$\frac{n⋅\p{1 - \Pr{e}}}{\Pr{e}}$ to $n$”. If $n$ is left out it is assumed to be $1$. The *logit* or *log-odds* of $e$ is $\log \frac{\Pr{e}}{1 - \Pr{e}}$. The *log-probability* of $e$ is $\log \Pr e$.
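A quick numerical illustration of these quantities (the probability $2/3$ is arbitrary):

```python
# Odds, odds against, and log-odds for an event with Pr = 2/3.
import math

p = 2 / 3
odds_for = p / (1 - p)          # "2 to 1": the event is twice as likely as not
odds_against = (1 - p) / p      # "1 to 2"
logit = math.log(p / (1 - p))   # log-odds, natural base

assert abs(odds_for - 2) < 1e-12
assert abs(odds_against - 0.5) < 1e-12
assert abs(logit - math.log(2)) < 1e-12
```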

**To do.** https://en.wikipedia.org/wiki/Odds_ratio https://en.wikipedia.org/wiki/Risk_ratio Values from here https://en.wikipedia.org/wiki/Odds_ratio#Numerical_example, https://en.wikipedia.org/wiki/Odds_ratio#See_also https://en.wikipedia.org/wiki/Category:Summary_statistics_for_contingency_tables

**Definition.** Given a base $b > 1$, the *information content* $\operatorname I: Σ → \R ∪ \set{+∞}$ is the function defined by

$$ \operatorname I \p e ≜ - \log_b \Pr e $$

**Note.** The information content is also called *self-information*, *surprisal* or *Shannon information*.

**Note.** For base $b=2$ the units of $\operatorname I$ are called *bits* or *shannons*, for $b=\operatorname e$ they are called *nats* and for $b=10$ *hartleys*. These are collectively *units of information*. From here on, if the base is not specified it is $\operatorname e$.
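A small sketch of the three units (the event probability $1/8$ is arbitrary):

```python
# Information content of an event in different units.
import math

p = 1 / 8                       # Pr(e)
bits = -math.log2(p)            # base 2 → bits (shannons); a 1-in-8 event ≈ 3 bits
nats = -math.log(p)             # base e → nats
hartleys = -math.log10(p)       # base 10 → hartleys

assert abs(bits - 3) < 1e-12
assert abs(nats - 3 * math.log(2)) < 1e-12
assert abs(hartleys - 3 * math.log10(2)) < 1e-12
```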

### Conditional probability

**Definition.** Given $A, B ∈ Σ$ with $\Pr B > 0$, the *conditional probability* of $A$ given $B$ is

$$ \Prc AB ≜ \frac{\Pr{A ∩ B}}{\Pr B} $$

**To do.** Likelihood and such.

**Theorem.** (Bayes' rule) Given $A, B ∈ Σ$ with $\Pr A, \Pr B > 0$,

$$ \Prc AB = \Prc BA ⋅ \frac{\Pr{A}}{\Pr B} $$

*Proof*. Expand the definition of conditional probability

$$ \frac{\Pr{A ∩ B}}{\Pr B} = \frac{\Pr{B ∩ A}}{\Pr A} ⋅ \frac{\Pr{A}}{\Pr B} $$ ∎
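The rule can be sanity-checked on a finite example; a Python sketch with two fair dice (the events are chosen arbitrarily):

```python
# Numeric check of Bayes' rule on two fair dice.
# A = "first die shows 6", B = "sum is at least 10".
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes
pr = lambda event: Fraction(sum(event(o) for o in outcomes), len(outcomes))

A = lambda o: o[0] == 6
B = lambda o: o[0] + o[1] >= 10
AB = lambda o: A(o) and B(o)

pr_A_given_B = pr(AB) / pr(B)
pr_B_given_A = pr(AB) / pr(A)
# Bayes' rule: Pr(A | B) = Pr(B | A) · Pr(A) / Pr(B).
assert pr_A_given_B == pr_B_given_A * pr(A) / pr(B)
print(pr_A_given_B)  # Fraction(1, 2)
```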

### Independence

**Definition.** Given $A, B ∈ Σ$, $A$ and $B$ are *independent* iff

$$ \Pr{A ∩ B} = \Pr A ⋅ \Pr B $$
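A finite sketch contrasting an independent and a dependent pair of events (two fair dice; the events are chosen arbitrarily):

```python
# The first die's value and the second die's value are independent;
# the first die's value and the sum are not.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
pr = lambda e: Fraction(sum(e(o) for o in outcomes), len(outcomes))

A = lambda o: o[0] == 6          # first die shows 6
B = lambda o: o[1] == 6          # second die shows 6
C = lambda o: o[0] + o[1] == 12  # sum is 12

assert pr(lambda o: A(o) and B(o)) == pr(A) * pr(B)  # independent
assert pr(lambda o: A(o) and C(o)) != pr(A) * pr(C)  # dependent
```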

## References

- Brent Nelson. "The Lebesgue Integral". PDF.