Probability spaces

Naive probability theory is either discrete or continuous. Consider a six-sided die: we can talk about the expected value using the discrete definition and find $\mathbb{E}[X] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{7}{2}$. Similarly, consider a wheel spinner with values $[0, 1)$: we can talk about the expected value using the continuous definition and find $\mathbb{E}[Y] = \int_0^1 y \,\mathrm{d}y = \frac{1}{2}$. But now we do something interesting: we remove the face of the die showing $1$ and in its place we put the spinner. Now when we roll the die, we get either a value in $\{2, 3, 4, 5, 6\}$, or a number in $[0, 1)$. If we roll this die a large number of times and take the average, we find it is about $3.42$; with a bit of reasoning we can deduce the exact value $\frac{1}{6}(2 + 3 + 4 + 5 + 6) + \frac{1}{6} \cdot \frac{1}{2} = \frac{41}{12}$. The outcomes are neither discrete nor continuous; how do we rigorously define what we intuitively mean (no pun intended) by expected value?
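As a sanity check on the claimed value, here is a minimal Monte Carlo sketch of the hybrid die, assuming the construction above (five discrete faces plus a uniform spinner on $[0,1)$ in place of the 1-face):

```python
import random

def roll_hybrid_die() -> float:
    """Roll a die whose 1-face has been replaced by a uniform [0, 1) spinner."""
    face = random.randint(1, 6)
    if face == 1:
        return random.random()  # the spinner: uniform on [0, 1)
    return float(face)          # the remaining discrete faces 2..6

n = 1_000_000
average = sum(roll_hybrid_die() for _ in range(n)) / n
print(average)   # ~3.4167
print(41 / 12)   # exact expected value: 3.41666...
```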

Another limitation of naive probability theory shows up with distributions such as the Dirac delta function. These are ill-defined in the usual framework of real numbers and Riemann integrals, but they make intuitive sense and are useful in practice. How do we make them rigorous?

Note. I use the operator notation $\bigl(O\, x \in X : f(x)\bigr)$ as shorthand meaning to apply operator $O$ with bound variable $x$ ranging over set $X$; for example $\bigl(\sum x \in X : f(x)\bigr)$ denotes the sum $\sum_{x \in X} f(x)$.

Measure theory

σ-algebra

Definition. Given a set $X$, a σ-algebra on $X$ is a set $\Sigma \subseteq \mathcal{P}(X)$ of subsets such that it includes $X$ itself,

$$X \in \Sigma,$$

is closed under complements, for any $A \in \Sigma$

$$X \setminus A \in \Sigma,$$

and is closed under countable unions, for any $A_1, A_2, \ldots \in \Sigma$

$$\bigcup_i A_i \in \Sigma.$$

Note. Within a particular σ-algebra, $X$ acts as a universe and I will use $A^{\mathrm{c}}$ to denote the complement $X \setminus A$ with respect to $X$.

From the definition it follows that $\varnothing \in \Sigma$, because $\varnothing = X^{\mathrm{c}}$, and that by De Morgan's laws $\Sigma$ is closed under countable intersections, i.e. given $A_1, A_2, \ldots \in \Sigma$ we have $\bigcap_i A_i = \bigl(\bigcup_i A_i^{\mathrm{c}}\bigr)^{\mathrm{c}} \in \Sigma$.
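To make the closure properties concrete, a small sketch that checks the σ-algebra axioms by brute force on a finite set; the helper `is_sigma_algebra` and the example algebras are my own illustration, and on a finite set closure under countable unions reduces to closure under finite unions:

```python
def is_sigma_algebra(X: frozenset, sigma: set[frozenset]) -> bool:
    """Check the sigma-algebra axioms on a finite set X by brute force."""
    if X not in sigma:
        return False                  # must contain X itself
    for a in sigma:
        if X - a not in sigma:        # closed under complements
            return False
        for b in sigma:
            if a | b not in sigma:    # closed under (finite) unions
                return False
    return True

X = frozenset({1, 2, 3, 4})
trivial = {frozenset(), X}
coarse = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
print(is_sigma_algebra(X, trivial))  # True
print(is_sigma_algebra(X, coarse))   # True
print(is_sigma_algebra(X, {X}))      # False: missing the empty set (= complement of X)
```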

Measure space

Definition. A pair $(X, \Sigma)$ is a measurable space iff $\Sigma$ is a σ-algebra over $X$.

Note. In the probability theory that follows below, the set $X$ will represent the outcome space and $\Sigma$ the event space.

Definition. Given a measurable space $(X, \Sigma)$, a function $\mu : \Sigma \to [0, +\infty]$ is a measure on $(X, \Sigma)$ iff

  1. $\mu(A) \ge 0$ for all $A \in \Sigma$ (non-negativity),
  2. $\mu(\varnothing) = 0$ (null empty set), and
  3. $\mu\bigl(\bigcup_i A_i\bigr) = \sum_i \mu(A_i)$ for every countable collection $A_1, A_2, \ldots \in \Sigma$ of pairwise disjoint sets (countable additivity).

Note. While measure theory allows $\mu$ to range over $[0, +\infty]$, in probability theory the measure will be a probability and be restricted to $[0, 1]$ with $\mu(X) = 1$, see below. For now it is kept generic.
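For concreteness, a sketch of a discrete measure on the power set of a finite set, where the measure of a set is the sum of non-negative point weights (the specific weights are an arbitrary choice of mine):

```python
X = {1, 2, 3}
weight = {1: 0.5, 2: 1.0, 3: 2.5}   # arbitrary non-negative point masses

def mu(A: set) -> float:
    """A measure on the power set of X: sum the point masses in A."""
    return sum(weight[x] for x in A)

print(mu(set()))                 # 0.0  (null empty set)
A, B = {1}, {2, 3}               # disjoint sets
print(mu(A | B), mu(A) + mu(B))  # 4.0 4.0  (additivity on disjoint sets)
```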

Definition. A triple $(X, \Sigma, \mu)$ is a measure space iff

  1. $X$ is a set.
  2. $\Sigma$ is a σ-algebra on the set $X$.
  3. $\mu$ is a measure on $(X, \Sigma)$.

Theorem. Given a measure space $(X, \Sigma, \mu)$, then

  1. (Monotonicity) If $A, B \in \Sigma$ and $A \subseteq B$ then $\mu(A) \le \mu(B)$.
  2. (Subadditivity) Given countable $A_1, A_2, \ldots \in \Sigma$ then $\mu\bigl(\bigcup_i A_i\bigr) \le \sum_i \mu(A_i)$.
  3. (Continuity from below) Given an ascending chain $A_1 \subseteq A_2 \subseteq \cdots$ in $\Sigma$ then $\mu\bigl(\bigcup_i A_i\bigr) = \lim_{i \to \infty} \mu(A_i)$.
  4. (Continuity from above) Given a descending chain $A_1 \supseteq A_2 \supseteq \cdots$ in $\Sigma$ with $\mu(A_1) < \infty$ then $\mu\bigl(\bigcap_i A_i\bigr) = \lim_{i \to \infty} \mu(A_i)$.
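The first of these gives a taste of how such properties are proved: since $A \subseteq B$, we can write $B$ as the disjoint union $B = A \cup (B \setminus A)$, so countable additivity and non-negativity give

$$\mu(B) = \mu(A) + \mu(B \setminus A) \ge \mu(A).$$

The remaining properties follow by similar disjointification arguments.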

Definition. Given a measure space $(X, \Sigma, \mu)$, a subset $N \in \Sigma$ is a μ-null set iff $\mu(N) = 0$. A subset $A \in \Sigma$ is a μ-full measure set iff $A^{\mathrm{c}}$ is a μ-null set.

Definition. Given a measure space $(X, \Sigma, \mu)$, a property $P$ of $X$ holds μ-almost everywhere iff there exists a μ-null set $N$ such that $P(x)$ holds for all $x \in X \setminus N$.

Note. In probability theory, this is also known as μ-almost surely and implies that the probability that the property holds is $1$.

Definition. A measure space $(X, \Sigma, \mu)$ is a complete measure space iff for every μ-null set $N$ we have $\mathcal{P}(N) \subseteq \Sigma$, i.e. every subset of a null set is itself measurable (and then necessarily null).

Definition. Given a measure space $(X, \Sigma, \mu)$, the completion is the smallest extension $(X, \Sigma_0, \mu_0)$, with $\Sigma \subseteq \Sigma_0$ and $\mu_0|_\Sigma = \mu$, such that the measure space $(X, \Sigma_0, \mu_0)$ is complete. For example, the Lebesgue measurable sets arise as the completion of the Borel σ-algebra with respect to the Lebesgue measure.

Lebesgue measure

Lebesgue's decomposition theorem

Measurable function

To do. Define measurable function.
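Pending that, a sketch of the standard definition, which is what the pushforward below relies on: given measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$, a function $f : X \to Y$ is measurable iff preimages of measurable sets are measurable,

$$f^{-1}(B) \in \Sigma_X \qquad \text{for all } B \in \Sigma_Y.$$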

Pushforward measure

Definition. Given two measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$, a measure $\mu$ on $(X, \Sigma_X)$ and a measurable function $f : X \to Y$, the pushforward measure $f_* \mu : \Sigma_Y \to [0, +\infty]$ is defined by:

$$(f_* \mu)(B) = \mu\bigl(f^{-1}(B)\bigr) \qquad \text{for all } B \in \Sigma_Y.$$

Theorem. The pushforward measure $f_* \mu$ is a measure on $(Y, \Sigma_Y)$.
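A minimal sketch of the pushforward on a finite space; the uniform measure and the parity map `f` are arbitrary choices of mine for illustration:

```python
# Pushforward of a measure on a finite space X along f : X -> Y.
X = {1, 2, 3, 4, 5, 6}
weight = {x: 1 / 6 for x in X}   # uniform probability measure on X

def mu(A: set) -> float:
    return sum(weight[x] for x in A)

def f(x: int) -> str:
    """A measurable map into Y = {"even", "odd"}."""
    return "even" if x % 2 == 0 else "odd"

def pushforward(B: set) -> float:
    """(f_* mu)(B) = mu(f^{-1}(B))."""
    preimage = {x for x in X if f(x) in B}
    return mu(preimage)

print(pushforward({"even"}))          # 0.5
print(pushforward({"odd"}))           # 0.5
print(pushforward({"even", "odd"}))   # 1.0
```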

Lebesgue integration

To do. Define the Lebesgue integral $\int_X f \,\mathrm{d}\mu$.

Definition. Given
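For reference while the definition above is pending, the standard construction proceeds in stages: for an indicator function, $\int_X \mathbf{1}_A \,\mathrm{d}\mu = \mu(A)$; for a simple function $s = \sum_i c_i \mathbf{1}_{A_i}$ (a finite, non-negative linear combination of indicators), by linearity $\int_X s \,\mathrm{d}\mu = \sum_i c_i \, \mu(A_i)$; and for a non-negative measurable function $f$, as the supremum over simple functions bounded by it:

$$\int_X f \,\mathrm{d}\mu = \sup \left\{ \int_X s \,\mathrm{d}\mu \;\middle|\; s \text{ simple},\ 0 \le s \le f \right\}.$$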

Radon-Nikodym theorem

Theorem. Given a measurable space $(X, \Sigma)$ with two σ-finite measures $\mu$ and $\nu$ such that $\nu \ll \mu$ (i.e. $\mu(A) = 0$ implies $\nu(A) = 0$; $\nu$ is absolutely continuous with respect to $\mu$), there exists a measurable function $f : X \to [0, +\infty)$ such that

$$\nu(A) = \int_A f \,\mathrm{d}\mu \qquad \text{for all } A \in \Sigma,$$

furthermore this function is unique up to a μ-null set.

Definition. Denote the function $f$ from the above theorem as the Radon–Nikodym derivative $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$.
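A familiar instance: when a probability measure $\nu$ on $\mathbb{R}$ has a density with respect to the Lebesgue measure $\lambda$, that density is exactly the Radon–Nikodym derivative $\frac{\mathrm{d}\nu}{\mathrm{d}\lambda}$. For the standard normal distribution, for example,

$$\frac{\mathrm{d}\nu}{\mathrm{d}\lambda}(x) = \frac{1}{\sqrt{2\pi}}\,\mathrm{e}^{-x^2/2}.$$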

Probability theory

Probability measure

Definition. A measure $P$ on $(X, \Sigma)$ is a probability measure iff $P(X) = 1$.

Theorem. From this it follows that $P : \Sigma \to [0, 1]$: for any $A \in \Sigma$, monotonicity gives $P(A) \le P(X) = 1$.

Probability space

Definition. A measure space $(X, \Sigma, P)$ is a probability space iff $P$ is a probability measure, i.e. $P(X) = 1$.

Note. The set $X$ is known as the sample space. The members $x \in X$ are known as outcomes. The set $\Sigma$ is known as the event space and members $A \in \Sigma$ are known as events. If an event is a singleton $\{x\}$ (i.e. contains a single outcome) it is known as an elementary event.

Note. From the above definitions the Kolmogorov axioms are apparent.

To do. Elementary theorems from https://ermongroup.github.io/cs228-notes/preliminaries/probabilityreview/.

From here on, assume we are given a probability space $(X, \Sigma, P)$.

Note. Given an event $A$, the odds of $A$ are “$x$ to $y$” for any positive numbers with $x : y = P(A) : P(A^{\mathrm{c}})$, and the odds against $A$ are “$y$ to $x$”. If $y$ is left out it is assumed to be $1$. The logit or log-odds of $A$ is $\operatorname{logit}(A) = \ln \frac{P(A)}{1 - P(A)}$. The log-probability of $A$ is $\ln P(A)$.
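A small sketch converting between these representations (the function names are my own):

```python
import math

def odds(p: float) -> float:
    """Odds in favour, as the single number x in "x to 1"."""
    return p / (1 - p)

def logit(p: float) -> float:
    """Log-odds of an event with probability p."""
    return math.log(p / (1 - p))

p = 0.75
print(odds(p))      # 3.0        -> odds of "3 to 1"
print(logit(p))     # 1.0986...  (= ln 3)
print(math.log(p))  # log-probability: -0.2876...
```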

To do. https://en.wikipedia.org/wiki/Odds_ratio https://en.wikipedia.org/wiki/Risk_ratio Values from here https://en.wikipedia.org/wiki/Odds_ratio#Numerical_example, https://en.wikipedia.org/wiki/Odds_ratio#See_also https://en.wikipedia.org/wiki/Category:Summary_statistics_for_contingency_tables

Definition. Given a base $b$, the information content $\mathrm{I}_b$ of an event $A \in \Sigma$ is defined by

$$\mathrm{I}_b(A) = -\log_b P(A).$$

Note. The information content is also called self-information, surprisal or Shannon information.

Note. For base $b = 2$ the units of $\mathrm{I}_b$ are called bits or shannons, for $b = \mathrm{e}$ they are called nats, and for $b = 10$ hartleys. These are collectively units of information. From here on, if the base is not specified it is $2$ (bits).
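A sketch of the three unit systems, using a fair coin flip ($P(A) = \frac{1}{2}$) and a fair die face ($P(A) = \frac{1}{6}$); the function name is my own:

```python
import math

def information_content(p: float, base: float = 2) -> float:
    """I_b(A) = -log_b P(A)."""
    return -math.log(p, base)

print(information_content(1 / 2))          # 1.0    bit
print(information_content(1 / 6))          # 2.585  bits
print(information_content(1 / 6, math.e))  # 1.792  nats
print(information_content(1 / 6, 10))      # 0.778  hartleys
```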

Conditional probability

Definition. Given events $A, B \in \Sigma$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

To do. Likelihood and such.

Theorem. (Bayes' rule)

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

Proof. Expand the definition of conditional probability twice,

$$P(A \mid B) \, P(B) = P(A \cap B) = P(B \mid A) \, P(A),$$

and divide both sides by $P(B)$.

Independence

Definition. Given $A, B \in \Sigma$, $A$ and $B$ are independent iff

$$P(A \cap B) = P(A) \, P(B).$$
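To tie the last few definitions together, a brute-force sketch on a discrete space (two fair dice, an arbitrary choice of mine), checking Bayes' rule and an independence claim numerically:

```python
from fractions import Fraction

# Sample space: ordered pairs of two fair dice.
X = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event) -> Fraction:
    """Uniform probability measure: count outcomes satisfying the event."""
    return Fraction(sum(1 for x in X if event(x)), len(X))

def P_cond(a, b) -> Fraction:
    """Conditional probability P(A | B) = P(A and B) / P(B)."""
    return P(lambda x: a(x) and b(x)) / P(b)

A = lambda x: x[0] + x[1] == 7   # sum is seven
B = lambda x: x[0] == 3          # first die shows three

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
print(P_cond(A, B) == P_cond(B, A) * P(A) / P(B))  # True

# A and B are independent: P(A and B) = P(A) P(B)
print(P(lambda x: A(x) and B(x)) == P(A) * P(B))   # True
```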
