Locality-Sensitive Hashing

$$ \gdef\S{\mathrm{S}} \gdef\darc{d_{\mathrm{arc}}} \gdef\a#1#2{\mathrm{A}_{#1}\p{#2}} \gdef\sign#1{\mathrm{sign}\p{#1}} $$

Let $\S_n$ be an $n$-dimensional unit $n$-sphere

$$ \begin{align} \mathrm{S}_n = \setb{\vec x ∈ \R^n}{\norm{\vec x}_2 = 1} \end{align} $$

and let $\darc$ be the arc length distance

$$ \begin{align} \darc\p{\vec a, \vec b} = \cos^{-1}\p{\vec a ⋅ \vec b} \end{align} $$

Random projections

Hyperplanes are defined by a normal unit vector $\vec h ∈ \S^n$. For a given vector $\vec x$ we can determine on which side of the plane it lies by the sign of the dot product $\vec x ⋅ \vec h$.

$$ \begin{align} \mathrm{Pr}_{\vec h ∈ \mathcal S}\left[\sign{\vec h ⋅ \vec a} = \sign{\vec h ⋅ \vec b} \right] = 1 - \frac{\darc\p{\vec a, \vec b}}{π} \end{align} $$

If we pick $m$ random hyperplanes $\vec h_i ∈ \S^n$ we can map each vector $\vec x ∈ \S^n$ into a $m$-bit vector $b(\vec x)$:

$$ \begin{align} b_i\p{\vec x} = \begin{cases} 0 & \vec h_i ⋅ \vec x ≤ 0 \\ 1 & \vec h_i ⋅ \vec x > 0 \end{cases} \end{align} $$

Assuming the hyperplanes are uniformly random, the probability of two bitvectors being equal is:

$$ \begin{align} \Pr{b\p{\vec a} = b\p{\vec b}} = \p{1 - \frac{\darc\p{\vec a, \vec b}}{π} }^m \end{align} $$

The hyperplanes form a tesselation of $\S^n$ where each region has a unique bitvector and each boundary represents a single bit change. Interestingly this implies that a Hamiltonian path through regions (if one exists) creates a Gray code.

We have two distributions on $\S^n$, $\mathcal{U}_{match}$ and $\mathcal{U}_{non-match}$. Given a vector $\vec x$ we want to classify it's distribution.

Assumption. Iris embeddings of different irises are uniformly distributed on $\S^n$ so the $\darc$ of non-matching irises follows the formula.

$$ \begin{align} \Pr{b\p{\vec a} = b\p{\vec b}} = \p{1 - \frac{\darc\p{\vec a, \vec b}}{π} }^m \end{align} $$


What is the probability of a match if $\darc < T$.

We are a looking for a family of hash functions $\mathcal H$, $h: \S^n → H$ such that


Assume we are given $H\p{\vec a}$, what can we learn about $\vec a$?

We can at best hope to learn all of the hyper-plane signs. This is $m⋅k$ bits of information. The subset of $S^n$ that has the same signs is a convex polygon with an (assumed) expected area of $2^{-m⋅k} A_n$. We would not have any more information than this, so our posterior would at best be the uniform distribution over this region.

We can start by drawing many samples from $\mathcal S$. and find matching vectors $k$.

TODO https://www.notion.so/Draft-Outline-99ecafcd45384d58a06bc1003fe1088b







Remco Bloemen
Math & Engineering