[ot][crazy][spam] Notes: Matmul Acceleration

Sat Jun 25 03:38:05 PDT 2022

plan: review source code implementing this to understand PQ steps 1-4 better.
comparing against source reduces errors when taking in new material.

Encoding Function g(A) detail:
the sequence of indices of the K-means centroids chosen for a is called a's
_encoding_. the K centroids are called a _codebook_. a is considered
to be composed of C _subvectors_, each encoded separately [i think].
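
a minimal sketch of the encoding step as i read it, assuming classic PQ-style nearest-centroid assignment with one codebook per subvector. the names encode and codebooks and the toy centroid values are made up for illustration; real codebooks would be learned with K-means, and the paper's own encoder may work differently.

```python
def encode(a, codebooks):
    """Return one centroid index per subvector of a.

    codebooks[c] is the list of K centroids for subspace c.
    (hypothetical layout, not taken from any real implementation)
    """
    C = len(codebooks)
    sub_len = len(a) // C  # assumes len(a) is divisible by C
    encoding = []
    for c, codebook in enumerate(codebooks):
        sub = a[c * sub_len:(c + 1) * sub_len]
        # squared Euclidean distance from this subvector to each centroid
        dists = [sum((x - y) ** 2 for x, y in zip(sub, cent))
                 for cent in codebook]
        encoding.append(dists.index(min(dists)))
    return encoding


# toy codebooks: C = 2 subspaces, K = 2 centroids each
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
a = [0.9, 1.2, 0.8, 0.1]
print(encode(a, codebooks))  # [1, 1]
```

with K centroids per codebook, each index needs log2(K) bits, so the whole encoding of a fits in C * log2(K) bits.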

Table Construction h(B) detail:
- K <= 16 and 8-bit quantized lookup tables are commonly reported to
offer enormous speedups compared to other choices. 8-bit integers
allow more parallelism in SIMD.
- 8-bit quantization is done by subtracting the minimum of each table
and linearly rescaling, so that the maximum entry in any table is
<= 255. this is an invertible affine transform. see: appendix A
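
a rough sketch of the quantization described above, assuming one offset per table and a single scale shared across all tables. the function names and the round-to-nearest choice are my assumptions; appendix A may differ in the details. the inverse is only exact up to rounding.

```python
def quantize_tables(tables):
    """Shift each table by its own minimum, then rescale all tables by
    one shared factor so that no entry exceeds 255."""
    offsets = [min(t) for t in tables]
    shifted = [[v - off for v in t] for t, off in zip(tables, offsets)]
    largest = max(max(t) for t in shifted)
    scale = 255.0 / largest if largest > 0 else 1.0
    quantized = [[int(round(v * scale)) for v in t] for t in shifted]
    return quantized, offsets, scale


def dequantize_sum(q_sum, offsets, scale):
    """Invert the affine transform for a sum of one entry per table."""
    return q_sum / scale + sum(offsets)


tables = [[1.0, 3.0, 2.0], [-4.0, -2.0, 1.0]]
quantized, offsets, scale = quantize_tables(tables)
print(quantized)  # [[0, 102, 51], [0, 102, 255]]

# pick entry 1 from table 0 and entry 2 from table 1
q_sum = quantized[0][1] + quantized[1][2]
print(dequantize_sum(q_sum, offsets, scale))  # 4.0, same as 3.0 + 1.0
```

because the scale is shared, the 8-bit entries can be summed as integers first and converted back to floats once at the end.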

Aggregation f(,) detail:

sums the table entries (precomputed from b) that a's encoding selects,
rather than doing the original multiplications.
==
the equation makes it look like they basically precalculate a
bunch of dot products between b and data near a (the centroids), and
select from among these precalculated dot products to form the result
matrix.
this may be all that is going on.
the approach seems like it would tell an algorithm dev a lot about the
kind of information processing that is happening. future work might
imply that transformer models that use matrices can be replaced by
lookup tables and bit encodings, or trees of branch conditions with
names that are humanly meaningful.
i think apple released some research recently replacing a large part
of transformers with something simpler.
==
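
a small sketch of the precalculate-then-select reading of aggregation, for a single a and b. it assumes PQ-style tables holding the dot product of each centroid with the matching subvector of b; build_tables, aggregate and the toy values are hypothetical, and the encoding is written by hand as the nearest-centroid indices for a.

```python
def build_tables(b, codebooks):
    """h(b): dot product of every centroid with the matching subvector of b."""
    C = len(codebooks)
    sub_len = len(b) // C  # assumes len(b) is divisible by C
    tables = []
    for c, codebook in enumerate(codebooks):
        b_sub = b[c * sub_len:(c + 1) * sub_len]
        tables.append([sum(x * y for x, y in zip(cent, b_sub))
                       for cent in codebook])
    return tables


def aggregate(encoding, tables):
    """f(g(a), h(b)): sum one precomputed entry per subspace."""
    return sum(table[k] for k, table in zip(encoding, tables))


codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
a = [0.9, 1.2, 0.8, 0.1]
b = [2.0, 3.0, 4.0, 5.0]
encoding = [1, 1]  # nearest centroid index per subvector of a

tables = build_tables(b, codebooks)
approx = aggregate(encoding, tables)
exact = sum(x * y for x, y in zip(a, b))
print(tables)         # [[0.0, 5.0], [5.0, 4.0]]
print(approx, exact)  # 9.0 versus roughly 9.1
```

no multiplication touches a at query time: a only contributes indices, and the result is as accurate as the centroids are close to a's subvectors.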