Perfect ending for entropy coding

Remco Bloemen

Finitely odd numbers

Here we use the ideas of Matt Timmermans and David A. Scott.

In the above, we derived a method for generating an infinite stream of bits representing a real number that encodes our symbols. If the stream of symbols is finite, we want to end the stream of bits. We do this by picking a so called finitely odd number, these numbers:

0.000000000000000.100000000000000.010000000000000.110000000000000.001000000000000.011000000000000.101000000000000.111000000000000.000100000000000.000100 \begin{aligned} &\mathtt{0.00000000000000}\ldots \\ &\mathtt{0.10000000000000}\ldots \\ &\mathtt{0.01000000000000}\ldots \\ &\mathtt{0.11000000000000}\ldots \\ &\mathtt{0.00100000000000}\ldots \\ &\mathtt{0.01100000000000}\ldots \\ &\mathtt{0.10100000000000}\ldots \\ &\mathtt{0.11100000000000}\ldots \\ &\mathtt{0.00010000000000}\ldots \\ &\phantom{\mathtt{0.000100}}\vdots \end{aligned}

Note that they are not sorted as ordinary real numbers, they are rather sorted in a form of shortlex order. This is important to us because we want to encode the stream of symbols as the shortest possible bit string.

We store the first available finitely odd number that uniquely encodes the sequence of symbols. This means the number we pick must be in the final interval. But this number will also be in the final interval of all the prefixes of our current symbol sequence. To solve this problem we do the following:

Ending a bit streams

At any point in encoding, the written bit stream can be sorted in four states:

00+11+ \begin{aligned} \ldots \mathtt{0}^- & \\ \ldots \mathtt{0}^+ & \\ \ldots \mathtt{1}^- & \\ \ldots \mathtt{1}^+ & \\ \end{aligned}

Where +^+ denotes that we can add a carry and ^- means we can not. The empty bit stream, the initial state, can be considered part of 1\mathtt{1}^- .

Insert state diagram.

Besides any potential bits to output, we can also decide to trigger the carry or not. To denote endings we leave off the infinite tail of zeros and prefix a 1.\mathtt{1.} if we want to trigger the carry and 0.\mathtt{0.} if we don’t. The ordered sequence of valid endings for each state are:

0\mathtt{0}^- 0+\mathtt{0}^+ 1\mathtt{1}^- 1+\mathtt{1}^+
0.\mathtt{0.} 0.\mathtt{0.} 0.\mathtt{0.} 1.\mathtt{1.}
0.1\mathtt{0.1} 1.\mathtt{1.} 0.1\mathtt{0.1} 0.\mathtt{0.}
0.01\mathtt{0.01} 0.1\mathtt{0.1} 0.01\mathtt{0.01} 0.1\mathtt{0.1}
0.11\mathtt{0.11} 1.1\mathtt{1.1} 0.11\mathtt{0.11} 1.1\mathtt{1.1}
0.001\mathtt{0.001} 0.01\mathtt{0.01} 0.001\mathtt{0.001} 0.01\mathtt{0.01}
0.011\mathtt{0.011} 0.11\mathtt{0.11} 0.011\mathtt{0.011} 0.11\mathtt{0.11}
0.101\mathtt{0.101} 1.01\mathtt{1.01} 0.101\mathtt{0.101} 1.01\mathtt{1.01}
0.111\mathtt{0.111} 1.11\mathtt{1.11} 0.111\mathtt{0.111} 1.11\mathtt{1.11}
0.0001\mathtt{0.0001} 0.001\mathtt{0.001} 0.0001\mathtt{0.0001} 0.001\mathtt{0.001}
\;\;\vdots \;\;\vdots \;\;\vdots \;\;\vdots

The situation for 0\mathtt{0}^- and 1\mathtt{1}^- are identical. The sequence for 0+\mathtt{0}^+ and 1+\mathtt{1}^+ only differs in the first entry. With the exception of the first two items in 1+\mathtt{1}^+ , these sequences are simply the finitely odd numbers, directly or shifted to the left once.

The endings also need to fall in the range [h,l)\left[{h, l}\right), further trimming the set of possibilities.

Updating and pruning the reserved set

The set EE can currently only grow, and does so with one entry for every symbol. As we start outputting bits and carries we need to update the set. Assume the set only contains valid entries, then for each of the ten possible transitions we do the following:

  1. Writing a carry, 0+1\mathtt{0}^+ \rightarrow \mathtt{1}^- :

    0+\mathtt{0}^+ \rightarrow 1\mathtt{1}^-
    0.\mathtt{0.}
    1.\mathtt{1.} 0.\mathtt{0.}
    0.1\mathtt{0.1}
    1.1\mathtt{1.1} 0.1\mathtt{0.1}
    0.01\mathtt{0.01}
    0.11\mathtt{0.11}
    1.01\mathtt{1.01} 0.01\mathtt{0.01}
    1.11\mathtt{1.11} 0.11\mathtt{0.11}
    0.001\mathtt{0.001}
    \;\;\vdots \;\;\vdots
  2. Writing a carry, 1+0\mathtt{1}^+ \rightarrow \mathtt{0}^- :

    1+\mathtt{1}^+ \rightarrow 0\mathtt{0}^-
    1.\mathtt{1.} 0.\mathtt{0.}
    0.\mathtt{0.}
    0.1\mathtt{0.1}
    1.1\mathtt{1.1} 0.1\mathtt{0.1}
    0.01\mathtt{0.01}
    0.11\mathtt{0.11}
    1.01\mathtt{1.01} 0.01\mathtt{0.01}
    1.11\mathtt{1.11} 0.11\mathtt{0.11}
    0.001\mathtt{0.001}
    \;\;\vdots \;\;\vdots
  3. Writing a zero, 00+\mathtt{0}^- \rightarrow \mathtt{0}^+ :

    0\mathtt{0}^- \rightarrow 0+\mathtt{0}^+
    0.\mathtt{0.} 0.\mathtt{0.}
    0.1\mathtt{0.1}
    0.01\mathtt{0.01} 0.1\mathtt{0.1}
    0.11\mathtt{0.11}
    0.001\mathtt{0.001} 0.01\mathtt{0.01}
    0.011\mathtt{0.011} 0.11\mathtt{0.11}
    0.101\mathtt{0.101}
    0.111\mathtt{0.111}
    0.0001\mathtt{0.0001} 0.001\mathtt{0.001}
    \;\;\vdots \;\;\vdots
  4. Writing a zero, 0+0+\mathtt{0}^+ \rightarrow \mathtt{0}^+ :

    0+\mathtt{0}^+ \rightarrow 0+\mathtt{0}^+
    0.\mathtt{0.} 0.\mathtt{0.}
    1.\mathtt{1.}
    0.1\mathtt{0.1} 1.\mathtt{1.}
    1.1\mathtt{1.1}
    0.01\mathtt{0.01} 0.1\mathtt{0.1}
    0.11\mathtt{0.11} 1.1\mathtt{1.1}
    1.01\mathtt{1.01}
    1.11\mathtt{1.11}
    0.001\mathtt{0.001} 0.01\mathtt{0.01}
    \;\;\vdots \;\;\vdots
  5. Writing a zero, 10+\mathtt{1}^- \rightarrow \mathtt{0}^+ :

    1\mathtt{1}^- \rightarrow 0+\mathtt{0}^+
    0.\mathtt{0.} 0.\mathtt{0.}
    0.1\mathtt{0.1} 1.\mathtt{1.}
    0.01\mathtt{0.01} 0.1\mathtt{0.1}
    0.11\mathtt{0.11} 1.1\mathtt{1.1}
    0.001\mathtt{0.001} 0.01\mathtt{0.01}
    0.011\mathtt{0.011} 0.11\mathtt{0.11}
    0.101\mathtt{0.101} 1.01\mathtt{1.01}
    0.111\mathtt{0.111} 1.11\mathtt{1.11}
    0.0001\mathtt{0.0001} 0.001\mathtt{0.001}
    \;\;\vdots \;\;\vdots
  6. Writing a zero, 1+0+\mathtt{1}^+ \rightarrow \mathtt{0}^+ :

    1+\mathtt{1}^+ \rightarrow 0+\mathtt{0}^+
    1.\mathtt{1.}
    0.\mathtt{0.} 0.\mathtt{0.}
    0.1\mathtt{0.1} 1.\mathtt{1.}
    1.1\mathtt{1.1}
    0.01\mathtt{0.01} 0.1\mathtt{0.1}
    0.11\mathtt{0.11} 1.1\mathtt{1.1}
    1.01\mathtt{1.01}
    1.11\mathtt{1.11}
    0.001\mathtt{0.001} 0.01\mathtt{0.01}
    \;\;\vdots \;\;\vdots
  7. Writing a one, 01\mathtt{0}^- \rightarrow \mathtt{1}^- :

    0\mathtt{0}^- \rightarrow 1\mathtt{1}^-
    0.\mathtt{0.}
    0.1\mathtt{0.1} 0.\mathtt{0.}
    0.01\mathtt{0.01}
    0.11\mathtt{0.11} 0.1\mathtt{0.1}
    0.001\mathtt{0.001}
    0.011\mathtt{0.011}
    0.101\mathtt{0.101} 0.01\mathtt{0.01}
    0.111\mathtt{0.111} 0.11\mathtt{0.11}
    0.0001\mathtt{0.0001}
    \;\;\vdots \;\;\vdots
  8. Writing a one, 0+1+\mathtt{0}^+ \rightarrow \mathtt{1}^+ :

    0+\mathtt{0}^+ \rightarrow 1+\mathtt{1}^+
    0.\mathtt{0.}
    1.\mathtt{1.} 1.\mathtt{1.}
    0.1\mathtt{0.1} 0.\mathtt{0.}
    1.1\mathtt{1.1}
    0.01\mathtt{0.01}
    0.11\mathtt{0.11} 0.1\mathtt{0.1}
    1.01\mathtt{1.01} 1.1\mathtt{1.1}
    1.11\mathtt{1.11}
    0.001\mathtt{0.001}
    \;\;\vdots \;\;\vdots
  9. Writing a one, 11\mathtt{1}^- \rightarrow \mathtt{1}^- :

    1\mathtt{1}^- \rightarrow 1\mathtt{1}^-
    0.\mathtt{0.}
    0.1\mathtt{0.1} 0.\mathtt{0.}
    0.01\mathtt{0.01}
    0.11\mathtt{0.11} 0.1\mathtt{0.1}
    0.001\mathtt{0.001}
    0.011\mathtt{0.011}
    0.101\mathtt{0.101} 0.01\mathtt{0.01}
    0.111\mathtt{0.111} 0.11\mathtt{0.11}
    0.0001\mathtt{0.0001}
    \;\;\vdots \;\;\vdots
  10. Writing a one, 1+1+\mathtt{1}^+ \rightarrow \mathtt{1}^+ :

    1+\mathtt{1}^+ \rightarrow 1+\mathtt{1}^+
    1.\mathtt{1.} 1.\mathtt{1.}
    0.\mathtt{0.}
    0.1\mathtt{0.1} 0.\mathtt{0.}
    1.1\mathtt{1.1}
    0.01\mathtt{0.01}
    0.11\mathtt{0.11} 0.1\mathtt{0.1}
    1.01\mathtt{1.01} 1.1\mathtt{1.1}
    1.11\mathtt{1.11}
    0.001\mathtt{0.001}
    \;\;\vdots \;\;\vdots

We observe that the transitions preserve the order!