Probabilistic Uniqueness

Ground Truth

Consider a set of observations \mathcal{O} and a binary relation \mathord{\sim} \subseteq \mathcal{O} \times \mathcal{O} such that a \sim b means that the observations are of the same entity. Then \sim is an equivalence relation and the unique entities are given by the quotient set \mathcal{O}/\mathord{\sim}.

Noisy Equivalence

Now consider we can not observe a \sim b directly, but only through a noisy channel \hat{\sim}. This is modelled through the two probabilities of false-match-rate (FMR) and false-non-match-rate (FNMR).

\Pr(a\mathrel{\hat{\sim}}b~|~a \nsim b)

\Pr(a\mathrel{\hat{\nsim}}b~|~a \sim b)

Noisy Distances

Assume a distance function d:\mathcal{O} \times \mathcal{O} \rightarrow {\mathbb{R}}_{\geq 0} and probability mass functions

\begin{aligned} p_{\text{same}}(d) & = \Pr(d(a,b) = d~|~a \sim b) \\ p_{\text{diff}}(d) & = \Pr(d(a,b) = d~|~a \nsim b) \end{aligned}

Approximate Uniqueness

We can imaging candidate relations \hat{\sim} and a probability distribution over them. There are 2^{n^{2}} possible relations, but not all of them are equivalence relations. The number of equivalence relations is given by the Bell numbers B_{n}. While also super-exponential, it grows much slower. This suggest the equivalence relation is a useful constraint.

Remco Bloemen
Math & Engineering
https://2π.com