A~: original training set, statistically close to A

3 Background - Product Quantization

- PQ is a classic algorithm for approximating inner products and Euclidean distances - commonly used for tasks like this.
- a * b is approximated by a^ * b, where a^ is close to a and has a special structure that makes the computation quick: a^ is formed by "concatenating learned prototypes in disjoint subspaces". Dot products between b and the prototypes are precomputed and reused for many a near a^. So a must come from a set of vectors that are all near some known a^; the precomputations are made with a^. Here a ranges over A's rows (transposed) and b over B's columns.

1. Prototype Learning: K-means is run C times on the rows of A~, once per subspace. The clusters become C sets of K prototypes.
2. Encoding Function g(a): the closest prototype to a is picked in each subspace. These choices are stored as integer indices using C * log2(K) bits.
3. Table Construction h(B): in each subspace, the dot product between b and every prototype is precomputed and cached. This yields C lookup tables of size K.
4. Aggregation f(·,·): the indices and tables above are used to look up an estimated partial of a * b in each subspace; the partials are then summed across all C subspaces.

(A rough code sketch of these four steps is at the end of these notes.)

---

The next section elaborates on the above four steps with summation formulae. I'll probably take a bit to collect my thoughts. They've basically described almost the entire algorithm in those four points - almost enough for an implementation. Maybe I'll look for any missing bits, or for information that helps with generalising it to other operations.

There's a concern that this biases the estimate toward a^ and A~. I wasn't expecting that; it's very tuned toward having well sampled a known space of data. Halting use, updating A~, C, and the prototypes with newly found outliers, and making sure those outliers get prototypes, would be very important for keeping results accurate. I infer this relates to the theoretical quality guarantees.
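
To convince myself the four points really are almost enough for an implementation, here's a minimal NumPy sketch of them as I understand them. The function names, the even split of the dimension across C subspaces, and the naive k-means are my own assumptions, not the paper's; this is a sketch, not the paper's method.

```python
import numpy as np

def learn_prototypes(A_tilde, C, K, iters=25, seed=0):
    """Step 1: k-means run independently in each of C disjoint subspaces of A~'s rows."""
    rng = np.random.default_rng(seed)
    D = A_tilde.shape[1]
    assert D % C == 0, "assume the dimension splits evenly across subspaces"
    d = D // C
    protos = np.empty((C, K, d))
    for c in range(C):
        X = A_tilde[:, c * d:(c + 1) * d]
        # naive Lloyd's k-means: random init from the data, then refine
        centers = X[rng.choice(len(X), size=K, replace=False)].copy()
        for _ in range(iters):
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
            labels = dists.argmin(axis=1)
            for k in range(K):
                if np.any(labels == k):
                    centers[k] = X[labels == k].mean(axis=0)
        protos[c] = centers
    return protos

def encode(A, protos):
    """Step 2, g(a): index of the nearest prototype in each subspace, per row of A."""
    C, K, d = protos.shape
    codes = np.empty((A.shape[0], C), dtype=np.int64)
    for c in range(C):
        X = A[:, c * d:(c + 1) * d]
        dists = ((X[:, None, :] - protos[c][None, :, :]) ** 2).sum(-1)
        codes[:, c] = dists.argmin(axis=1)
    return codes

def build_tables(B, protos):
    """Step 3, h(B): prototype . b for every subspace, prototype, and column b of B."""
    C, K, d = protos.shape
    # tables[c, k, j] = protos[c, k] . B[c*d:(c+1)*d, j]  -> shape (C, K, M)
    return np.stack([protos[c] @ B[c * d:(c + 1) * d, :] for c in range(C)])

def aggregate(codes, tables):
    """Step 4, f(.,.): look up each subspace's partial dot product and sum over subspaces."""
    N, C = codes.shape
    M = tables.shape[2]
    out = np.zeros((N, M))
    for c in range(C):
        out += tables[c, codes[:, c], :]  # pure table lookups at query time
    return out

# usage: approximate A @ B after training prototypes on A~
rng = np.random.default_rng(1)
A_tilde = rng.normal(size=(1000, 32))   # stand-in for A~
A = rng.normal(size=(200, 32))          # assumed to be distributed like A~
B = rng.normal(size=(32, 16))
protos = learn_prototypes(A_tilde, C=8, K=16)
approx = aggregate(encode(A, protos), build_tables(B, protos))
print(np.abs(approx - A @ B).mean())    # rough check of the approximation error
```

The codes are kept as int64 here for clarity; packing them into C * log2(K) bits per row is what the encoding step actually calls for. It also makes the bias concern concrete: if a row of A lands far from every learned prototype, the nearest-prototype lookup silently returns a partial computed for some a^ that isn't close to a at all.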