Entropy Coding

In these lab notes I derive an arithmetic coder that is optimally efficient using 64 bit registers.

Range multiplication

Given two sub-intervals of \delim[{0, 1}):

\begin{aligned} B &= \delim[{b₀, b₁}) \\ S &= \delim[{s₀, s₁}) \end{aligned}

and we want to compute the product of S into B:

S · B = \delim[{bₒ + s₀ (b₁ - b₀), bₒ + s₁(b₁ - b₀)})

64 bit intervals

We want to represent non-empty subintervals of \delim[{0,1}) with 64 bit integers, 𝔹^{64}.

\gdef\b{\mathtt{b}} \gdef\r{\mathtt{r}} \gdef\GF#1{\mathrm{GF}\p{#1}}

Take \p{\b,\r} ∈ \p{𝔹^{64}}^2 with the following map:

\p{\b,\r} ↦ \delim[{\frac{\b}{2^{64}}, \frac{\b + \r + 1}{2^{64}}})

Some examples of this map:

\begin{aligned} \p{0,0} &↦ \delim[{0, 2⁻^{64}}) \\ \p{0,2^{64}-1} &↦ \delim[{0, 1}) \\ \p{2^{64}-1, 0} &↦ \delim[{1 - 2⁻^{64}, 1}) \\ \p{2^{64}-1, 2^{64}-1} &↦ \delim[{1 - 2⁻^{64}, 2 - 2⁻^{64}}) \\ \end{aligned}

Where the last example is invalid. For validity we require

\b + \r + 1 ≤ 2^{64}

A reverse map can be created as:

\delim[{l, h}) ↦ \p{\ceil{l · 2^{64}}, \floor{h · 2^{64}} - \ceil{l · 2^{64}} - 1}

In order for \b to be in 𝔹^{64} we require 0 ≤ \ceil{l · 2^{64}} < 2^{64}. We already have 0 ≤ l < 1. To satisfy the upper bound we additionally require:

l ≤ 1 - 2⁻^{64}

In order for \r to be in 𝔹^{64} we require 0 ≤ \floor{h · 2^{64}} - \ceil{l · 2^{64}} - 1 < 2^{64}. The upper bound is already satisfied. To satisfy the lower bound we require:

\begin{aligned} 0 &≤ \floor{h · 2^{64}} - \ceil{l · 2^{64}} - 1 \\ &< \p{h - l} · 2^{64} - 1 - 1 \\ l + 2^{-63} &< h \end{aligned}

Since h ≤ 1 this also implies the stronger bound l < 1 - 2^{-63} on l. For the reverse map to work we need l + 2^{-63} < h ≤ 1.

Going full circle results in:

\delim[{l, h}) ↦ \p{\b,\r} ↦ \delim[{\frac{\ceil{l · 2^{64}}}{2^{64}}, \frac{\floor{h · 2^{64}}}{2^{64}}})

It can be easily shown that the later interval is always a subinterval of the former. It is also easily seen that this operation is idempotent.

64 bit interval multiply

\begin{aligned} \p{\b_B,\r_B} &↦ \delim[{\frac{\b_B}{2^{64}}, \frac{\b_B + \r_B + 1}{2^{64}}}) \\ \p{\b_S,\r_S} &↦ \delim[{\frac{\b_S}{2^{64}}, \frac{\b_S + \r_S + 1}{2^{64}}}) \end{aligned}

\delim[{ \frac{\b_B}{2^{64}} + \frac{\b_S}{2^{64}} \p{\frac{\b_B + \r_B + 1}{2^{64}} - \frac{\b_B}{2^{64}}}, \frac{\b_B}{2^{64}} + \frac{\b_S + \r_S + 1}{2^{64}} \p{\frac{\b_B + \r_B + 1}{2^{64}} - \frac{\b_B}{2^{64}}} })

\delim[{ \frac{\b_B}{2^{64}} + \frac{\b_S\p{\r_B + 1}}{2^{128}}, \frac{\b_B}{2^{64}} + \frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{128}} })

The first bound in l + 2^{-63} < h ≤ 1 gives:

\begin{aligned} \frac{\b_B}{2^{64}} + \frac{\b_S\p{\r_B + 1}}{2^{128}} + 2^{-63} &< \frac{\b_B}{2^{64}} + \frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{128}} \\ \b_S\p{\r_B + 1} + 2^{64} &< \p{\b_S + \r_S + 1}\p{\r_B + 1} \end{aligned}

The second bound in l + 2^{-63} < h ≤ 1 gives:

\begin{aligned} \frac{\b_B}{2^{64}} + \frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{128}} & ≤ 1 \\ \b_B · 2^{64} + \p{\b_S + \r_S + 1}\p{\r_B + 1} & ≤ 2^{128} \end{aligned}

Mapping this back to \p{𝔹^{64}}² using the reverse map:

\begin{aligned} \b_B' &= \ceil{l · 2^{64}} \\ &= \ceil{\p{ \frac{\b_B}{2^{64}} + \frac{\b_S\p{\r_B + 1}}{2^{128}} }·2^{64}} \\ &= \b_B + \ceil{\frac{\b_S\p{\r_B + 1}}{2^{64}} } \\ \end{aligned}

Since we have \b_S < 2^{64} and \r_B + 1 ≤ 2^{64} we can implement this using a 64 bit multiply with 128 bit result, and an 128 bit and 64 bit add as \b_S · \r_B + \b_S, or in code:

uint64 h, l;
std::tie(h, l) = mul128(b_S, r_B);
std::tie(h, l) = add128(h, l, b_S);
const uint64 t = h + (l > 0 ? 1 : 0);
b_B += t;

Where the final term implements the ceiling operator.

\begin{aligned} \r_B' &= \floor{h · 2^{64}} - \ceil{l · 2^{64}} - 1 \\ &= \floor{\p{\frac{\b_B}{2^{64}} + \frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{128}}}· 2^{64}} - \b_B + \ceil{\frac{\b_S\p{\r_B + 1}}{2^{64}}} - 1 \\ &= \floor{\frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{64}}} - \ceil{\frac{\b_S\p{\r_B + 1}}{2^{64}}} - 1 \\ \end{aligned}

To obtain a valid result we need 0 ≤ \r_B' < 2^{64}. The lower bound goes like:

\begin{aligned} 0 &≤ \floor{\frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{64}}} - \ceil{\frac{\b_S\p{\r_B + 1}}{2^{64}}} - 1 \\ 0 &≤ \frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{64}} - \frac{\b_S\p{\r_B + 1}}{2^{64}} - 1 \\ 1 &< \frac{\p{\r_S + 1}\p{\r_B + 1}}{2^{64}} \\ 2^{64} &≤ \p{\r_S + 1}\p{\r_B + 1} \\ \r_S &≥ \frac{2^{64}}{\r_B + 1} - 1 \\ \r_S &≥ 1 \end{aligned}

Where in the last step I have used \r_B ≥ 2⁶³, which will be enforced later, in normalization. The upper bound \r_B' < 2^{64} goes like:

\begin{aligned} 2^{64} &> \floor{\frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{64}}} - \ceil{\frac{\b_S\p{\r_B + 1}}{2^{64}}} - 1 \\ 2^{64} &> \frac{\p{\b_S + \r_S + 1}\p{\r_B + 1}}{2^{64}} - \ceil{\frac{\b_S\p{\r_B + 1}}{2^{64}}} - 1 \\ \end{aligned}

The second term we already have as t. For the first term we need another tricky multiply. In this case we have \b_S + \r_S + 1 ≤ 2^{64} and \r_B + 1 ≤ 2^{64}, so the intermediate result can actually be 2^{64}. The -1 will make sure the final result will be in 𝔹^{64}. We must be careful, but modular arithmetic will handle the overflow fine. Let's rewrite the multiplication in values <2^{64}:

\p{\b_S + \r_S}·\r_B + \p{\b_S + \r_S} + \r_B + 1

Now we can calculate the final interval:

std::tie(h, l) = mul128(b_S + r_S, r_B);
std::tie(h, l) = add128(h, l, b_S + r_S);
std::tie(h, l) = add128(h, l, r_B);
std::tie(h, l) = add128(h, l, 1);
r_B = h - t - 1;

Normalization

Given an interval \delim[{l,h}) we want to extract the prefix bits that won't change anymore. To determine which bits won't change it is useful to look at h-l:

\begin{aligned} l &= \mathtt{0.0001101101000111111110110010001001}… \\ h &= \mathtt{0.0001101101001000000011000010000111}… \\ h - l &= \mathtt{0.} \underbrace{\mathtt{000000000000}}_{\mathrm{settled}} \underbrace{\mathtt{0000000}}_{\mathrm{outstanding}} \underbrace{\mathtt{100001111111110}…}_{\mathrm{active}} \end{aligned}

The settled bits can be written directly to the output. The outstanding bits can still change because of a carry, but are otherwise settled. To normalize this interval we output the first 12 bits, note that there are 7 bits outstanding, and rescale the interval by 2^{12+7}. There is one edge case we can consider:

\begin{aligned} h - l &= \mathtt{0.0000000000000000000100000000000000}… \\ &= \mathtt{0.00000000000000000000} \underbrace{\mathtt{11111111111111}…}_{\mathrm{active}} \end{aligned}

Writing the number in two different but equal ways can result in a different number of leading zeros. We take the one that results in the largest number of leading zeros. So in general we want to rescale by 2^n with n given by:

n = \floor{-\log_2 \p{h - l}} - 1

The interval is normalized if n=0 and thus iff \frac 12 < h - l.

\begin{aligned} l' = l·2^n &= \mathtt{1101101000111111.110110010001001}… \\ h' = h·2^n &= \mathtt{1101101001000000.011000010000111}… \\ \end{aligned}

Here we can be in one of two situations, either the integral parts are the same, or h's is one larger that l's. In this case h's is in fact one larger. We can subtract the integral part of l:

\begin{aligned} l'' = l'-\floor{l'} &= \mathtt{0.110110010001001}… \\ h'' = h'-\floor{l'} &= \mathtt{1.011000010000111}… \\ \end{aligned}

We have now scaled our \delim[{l,h} interval to a subinterval of \delim[{0,2}), but we also have

0 ≤ l < 1

½ < h-l ≤ 1

In summary, an interval is normalized when it is a subinterval of \delim[{0,2}) and the above inequalities hold.

After normalization we end up in one of two cases:

Our situation is as above, there are no bits outstanding. This is when we have a carry outstanding and h > 1. After further narrowing of the interval we will eventually end up in the first case and flush the carry buffer, or we will end up in a third case:

In this third case we should add the carry and subtract 1 from both h and l. We are then effectively back in the first case.

64 bit normalization

\delim[{\frac{\b}{2^{64}}, \frac{\b + \r + 1}{2^{64}}})

For this to be normalized in \delim[{0, 2}) we must have

And the normalization condition ½ < h-l ≤ 1 reduces to 2⁶³ ≤ \r < 2^{64}. This essentially means that \r ∊ 𝔹^{64} and \r must have the most significant bit set.

\begin{aligned} n &= \floor{-\log_2 \p{\frac{\r + 1}{2^{64}}}} - 1 \\ &= 63 - \ceil{\log_2 \p{\r + 1}} \\ \end{aligned}

The ceiling and the +1 cancel, except when \r=0. What remains is essentially a count leading zeroes operation:

const uint n = r == 0 ? 63 : count_leading_zeros(r);

\begin{aligned} l'' &= l·2^n - \floor{l·2^n} = \mod{\frac{\b}{2^{64}⁻^n}}_1 \\ h'' &= h·2^n - \floor{l·2^n} = \frac{\b + \r + 1}{2^{64}⁻^n} - \floor{\frac{\b}{2^{64}⁻^n}}\\ \end{aligned}

Where \mod{x}_1 = x - \floor{x} is the fractional part operator.

\begin{aligned} \b'' &= \ceil{l'' · 2^{64}} \\ &= \ceil{\mod{\frac{\b}{2^{64}⁻^n}}_1 · 2^{64}} \\ &= \ceil{\mod{\frac{\b}{2^{64}⁻^n}_1 · 2^{64}}_{2^{64}}} \\ &= \mod{\b · 2^n}_{2^{64}} \end{aligned}

Where \mod{x}_{2^{64}} = x \;\mathrm{mod}\; 2^{64} is the modular operator. Since \b · 2^n is strictly integer, the ceiling has no effect and the operation reduces to a simple right shift:

b <<= n;

\begin{aligned} \r'' &= \floor{h'' · 2^{64}} - \ceil{l'' · 2^{64}} - 1 \\ &= \floor{h'' · 2^{64}} - \b'' - 1 \\ &= \floor{\p{\frac{\b + \r + 1}{2^{64}⁻^n} - \floor{\frac{\b}{2^{64}⁻^n}}} · 2^{64}} - \b'' - 1 \\ &= \floor{\p{\b + \r + 1}·2^n - \floor{\frac{\b}{2^{64}⁻^n}}· 2^{64} } - \b'' - 1 \\ \end{aligned}

\begin{aligned} &= \floor{\p{\frac{\b·2^n}{2^{64}} - \floor{\frac{\b·2^n}{2^{64}}}}· 2^{64} + \p{\r + 1}·2^n} - \b'' - 1 \\ &= \floor{\mod{\b·2^n}_{2^{64}} + \p{\r + 1}·2^n} - \b'' - 1 \\ &= \floor{\b'' + \p{\r + 1}·2^n} - \b'' - 1 \\ &= \r·2^n + 2^n - 1 \\ \end{aligned}

This is essentially shifting \r to the right n places while shifting in ones. We don't need to worry about overflow here because \r is strictly less than 2^{64-n}.

r <<= n;
r += (1UL << n) - 1;

Appendix: 128 Bit arithmetic

pair<uint64_t, uint64_t> mul128_emu(uint64_t a, uint64_t b)
{
	using uint64_t;
	const uint64_t u1 = (a & 0xffffffff);
	const uint64_t v1 = (b & 0xffffffff);
	uint64_t t = (u1 * v1);
	const uint64_t w3 = (t & 0xffffffff);
	uint64_t k = (t >> 32);
	
	a >>= 32;
	t = (a * v1) + k;
	k = (t & 0xffffffff);
	uint64_t w1 = (t >> 32);
	
	b >>= 32;
	t = (u1 * b) + k;
	k = (t >> 32);
	
	const uint64_t h = (a * b) + w1 + k;
	const uint64_t l = (t << 32) + w3;
	return make_pair(h, l);
}

pair<uint64_t, uint64_t> add128(uint64_t h, uint64_t l, uint64_t n)
{
	l += n;
	h += (l < n) ? 1 : 0;
	return make_pair(h, l);
}

Remco Bloemen
Math & Engineering
https://2π.com