Someone else linked that elsewhere in the comments and while it's certainly a nice visual it seems like it's not accurately portraying the paper. Isn't the grid supposed to have a weird alignment that depends on the bit depth? And there's supposed to be a second quantization step involving the residual.
Fair point. I've updated the animation to address this. The grid now uses the correct non-uniform centroids (optimal for the arcsine distribution in 2D), so you'll see grid lines cluster near the edges where unit-circle coordinates actually concentrate, rather than being evenly spaced. The spacing does change with bit depth.
On the second quantization step: the paper's inner-product variant uses (b-1) bits for the MSE quantizer shown here, then applies a 1-bit QJL (Quantized Johnson-Lindenstrauss) encoding of the residual to make dot-product estimates unbiased. I chose to omit QJL from the animation to keep it digestible as a visual, but I've added a note calling this out explicitly.
It looks nice! Fair enough about QJL - it seems to be nothing more than an unbiasing measure anyway.
I'm not sure if it's my own misunderstanding or if the paper [0] has something of an error. Section 3.1 starts out to the effect "let x be on the unit hypersphere" (but I'm fairly certain it's actually not). Neither algorithm 1 nor algorithm 2 show a normalization step prior to rotating x. Algorithm 2 line 8 shows that the scalar returned is actually the magnitude of the residual without accounting for QJL.
Anyway I'm pretty sure the authors inadvertently omitted that detail which really had me confused for a while there.
IIUC, The paper's notation S^(d-1) means the unit sphere in R^d (e.g., the familiar unit circle is S^1 living in R^2). So, i think, x in the algorithm is already a unit vector.
Reference:
Section 2:Preliminaries
...
We use the notation S^d−1 to denote the hypersphere in R^d of radius 1.
Section 3.1
Let x ∈ S^d−1 be a (worst-case) vector on the unit sphere in dimension d.
Right but in reality IIUC w ∈ R^d and it's x = w / ||w|| ∈ S^(d-1) and then given r = x - Qmse^-1( Qmse( x ) ) the scalar you use is derived as ||r|| (I'm missing a couple subscript twos there I think).
I was primarily aiming to confirm my understanding given the author's omission but also the scalar is subtly different than in your linked explanation (although conceptually equivalent).