[ot][crazy][spam] Notes: Matmul Acceleration
Goal: Take implementation-oriented notes on https://arxiv.org/pdf/2106.10860.pdf .

My understanding of the approach is that it lossily compresses the data into a smaller representation that retains most of the significant information, combines it in a way that involves no multiply-adds, then decompresses it to produce an approximate matmul result. The goal is to understand the derivation and implementation well enough to daydream about implementing a convolution operation rather than matrix multiplication. [multiple people are working on convolution; i don't know whether i will find their results]

Planning to skip the abstract, then start through the intro.

1 Introduction
- Approximate Matrix Multiplication (AMM)
- assumptions: matrices are tall, dense, held in a single address space
- A is large data, B is a linear operator such as model weights
- nonlinear preprocessing reduces the problem to table lookups
- no multiply-adds if B is constant
- a family of quantizers is used rather than a single expensive one
- 3 parts:
  1. high-speed quantization functions
  2. high-speed low-bitwidth stable summation algorithm
  3. matmul algorithm using 1&2 with theoretical quality guarantees

1.1 Problem Formulation
A: NxD, B: DxM, N >> D >= M
tau: computation time bound
Problem formulation: choose g, h, f and constants alpha, beta to minimize the error
  epsilon(tau) = || alpha * f(g(A), h(B)) + beta - A*B ||_F
subject to the computation time staying under tau.
[interruption]
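a minimal numpy sketch of how i read this formulation (mine, not the paper's code; the names g_fn/h_fn/f_fn and the identity-function sanity check are my placeholders): g only sees A, h only sees B, f combines their outputs, and the error is the Frobenius norm of the residual against the exact product.

import numpy as np

def amm_error(A, B, g_fn, h_fn, f_fn, alpha=1.0, beta=0.0):
    # residual of the approximation alpha * f(g(A), h(B)) + beta against A @ B
    approx = alpha * f_fn(g_fn(A), h_fn(B)) + beta
    return np.linalg.norm(approx - A @ B)

# sanity check: plugging in exact (non-approximate) g, h, f gives zero error
A = np.random.randn(1000, 64)   # N x D, N >> D
B = np.random.randn(64, 16)     # D x M, D >= M
err = amm_error(A, B, g_fn=lambda A: A, h_fn=lambda B: B,
                f_fn=lambda Ag, Bh: Ag @ Bh)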
A~: a training set assumed statistically similar to A (drawn from the same distribution)

3 Background - Product Quantization
- PQ is a classic algorithm for approximating inner products and euclidean distances
- commonly used for tasks like this

a * b is approximated by a^ * b, where a^ is close to a and has a special structure that makes the computation quick. a^ is formed by "concatenating learned prototypes in disjoint subspaces". dot products are precomputed between b and the prototypes, and reused for every a that is near some combination of prototypes a^. so a must come from a set of vectors that are all near known prototypes; the precomputations are made with those prototypes. here a ranges over A's rows and b over B's columns, with a transposed for the dot product.

1. Prototype Learning
K-means is run once per subspace (C times in total) on the corresponding subvectors of A~'s rows. The clusters become C sets of K prototypes.

2. Encoding Function g(a)
The closest prototype to a's subvector is picked in each subspace. These choices are stored as integer indices using C log2(K) bits.

3. Table Construction h(B)
For each subspace, the dot product between b's subvector and each prototype is precomputed or cached. This makes C lookup tables of size K.

4. Aggregation f(,)
The above indices and tables are used to look up an estimated partial of a * b in each subspace. The partials are then summed across all C subspaces.

(a small numpy sketch of these four steps follows after this message)

---

The next section elaborates on the above 4 steps in detail with summation formulae. I'll probably take a bit to collect my thoughts. They basically described almost the entire algorithm in those 4 points, almost enough for implementation. Maybe I'll look for any missing bits, or information to help one think about generalising it to other operations.

There's a concern that this biases the data toward a^ and A~. I wasn't expecting that. It is very tuned toward having a well-sampled, known space of data. Halting use and updating a^, A~, C, and the prototypes with newly found outliers, and ensuring these outliers have prototypes, would be very important for keeping results accurate. I infer this relates to the theoretical quality guarantees.
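a minimal numpy sketch of my reading of PQ steps 1-4 above (mine, not the paper's code; the function names, the plain Lloyd's k-means, and the assumption that D divides evenly by C are my own; the 8-bit table quantization mentioned later is omitted here):

import numpy as np

def kmeans(X, K, iters=25, seed=0):
    # plain Lloyd's iterations; enough for a sketch
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(K):
            members = X[assign == k]
            if len(members):
                centroids[k] = members.mean(0)
    return centroids

def learn_prototypes(A_train, C, K):
    # step 1: k-means once per subspace (C runs) over A~'s subvectors
    subspaces = np.split(A_train, C, axis=1)        # assumes D % C == 0
    return [kmeans(S, K) for S in subspaces]        # C codebooks of K prototypes

def encode(A, prototypes):
    # step 2: g(A) -- index of the nearest prototype in each subspace
    subspaces = np.split(A, len(prototypes), axis=1)
    codes = [((S[:, None, :] - P[None, :, :]) ** 2).sum(-1).argmin(1)
             for S, P in zip(subspaces, prototypes)]
    return np.stack(codes, axis=1)                  # (N, C) small integers

def build_tables(b, prototypes):
    # step 3: h(b) -- dot product of b's subvector with every prototype
    subvecs = np.split(b, len(prototypes))
    return np.stack([P @ s for s, P in zip(subvecs, prototypes)])   # (C, K)

def aggregate(codes, tables):
    # step 4: f(,) -- look up one partial dot product per subspace and sum
    return sum(tables[c, codes[:, c]] for c in range(tables.shape[0]))  # (N,)

usage as i picture it: encode A once, then for each column b of B build its C x K table and aggregate; the encodings are reused for every column.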
plan: review source code implementing this to understand PQ steps 1-4 better. comparing with sources reduces errors when absorbing new material.

Encoding Function g(A) detail:
the sequence of indices of the K-means centroids chosen for a is called a's _encoding_. the K centroids of one subspace are called a _codebook_. a is considered to be composed of C disjoint _subvectors_, one per codebook [i think].

Table Construction h(B) detail:
- K <= 16 with 8-bit quantized lookup tables is commonly known to offer enormous speedups compared to other choices. 8-bit integers provide more parallelism per SIMD register.
- 8-bit quantization is done by subtracting the per-table minimum and linearly rescaling so the maximum per-table value is <= 255. it is an invertible affine transform. see: appendix A
(a small sketch of this quantization follows after this message)

Aggregation f(,) detail:
summation of the table entries selected by the encodings, rather than the original multiplications.

==
the equations make it kind of look like they basically precalculate a bunch of dot products with nearby data, and select from among these precalculated dot products to form the result matrix. this may be all that is going on. the approach seems like it would tell an algorithm developer a lot about the kind of information processing that is happening. future work might imply that transformer models that use matrices can be replaced by lookup tables and bit encodings, or trees of branch conditions with names that are humanly meaningful. i think apple released some research recently replacing a large part of transformers with something simpler.
==
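a minimal numpy sketch of how i picture the 8-bit table quantization and the corrected aggregation (my own reading, not the paper's code: i use one shared scale and per-table offsets so the sum is easy to invert; the paper's appendix A additionally restricts the scale, e.g. to a power of 2):

import numpy as np

def quantize_tables(tables):
    # subtract each table's minimum (per-table offset), then rescale everything
    # by one shared factor so every entry fits in 0..255 -- an invertible affine map
    offsets = tables.min(axis=1, keepdims=True)             # (C, 1)
    scale = 255.0 / max(float((tables - offsets).max()), 1e-12)
    q = np.clip(np.round((tables - offsets) * scale), 0, 255).astype(np.uint8)
    return q, offsets, scale

def estimate_dot(code_row, q_tables, offsets, scale):
    # aggregate the selected uint8 entries in a wide accumulator, then undo the
    # affine transform: divide by the shared scale, add back the summed offsets
    acc = sum(int(q_tables[c, code_row[c]]) for c in range(q_tables.shape[0]))
    return acc / scale + float(offsets.sum())

together with the earlier sketch, estimate_dot(codes[i], *quantize_tables(build_tables(b, protos))) approximates the dot product of A's row i with b.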
=== idea daydream between equal signs was likely not informed! as usual! ===

4 Our Method
- PQ is good for N,M >> D
- but only N >> M,D is needed
- g(a) can be slow to process
- new g(a): [stepping away]
4.1 new encoding g(a)
- new trainable hash functions
- balanced binary regression trees, each leaf a hash bucket
- leaf(x): traverse the tree from the root, selecting a child based on whether x_j is above or below a node-specific threshold v
- for simd, trees are limited to 16 leaves, and the split index j is depth-specific. vectorization details in Appendix B

this is called Algorithm 1, the MaddnessHash
input: vector x, split indices j^1..j^4, split thresholds v^1..v^4
    i <- 1                          // node index within the current level of the tree
    for t <- 1 to 4 do
        v <- v^t_i                  // lookup split threshold for node i at level t
        b <- x_(j^t) >= v ? 1 : 0   // above split threshold?
        i <- 2i - 1 + b             // descend to the left or right child
    end for
    return i                        // leaf / bucket index in 1..16
(a python sketch of this hash is at the end of these notes)
-------------------
i'm guessing i can learn more effectively what i might need from source code than most of this. skipping a little to see if theory is mentioned.
- hash function parameters are constructed in a greedy manner
- the algorithm for forming them is Algorithm 2
- experimentally, selecting more than 4 split indices gave little additional reduction in loss
Algorithm 2 is 22 lines long. section 4.2 describes its parts.

4.3 new method for optimizing pq prototypes
4.4 fast 8-bit aggregation f(,)
4.5 theoretical guarantees
proof and further discussion in Appendix F
A~ is drawn from a probability distribution D with maximum value sigma_A. C is the # of codebooks, lambda > 0 is a regularization parameter used in 4.3.
the generalization bound contains a difference term of the form
  C * sigma_A * ||b||_2 / (2 * sqrt(lambda)) * [ 1/256 + (8 + sqrt(nu(C, D, delta))) / sqrt(2n) ]
  where nu(C, D, delta) ≜ C * (4 * ceil(log2(D)) + 256) * log(2) - log(delta)   (D here being the column count)
the trailing "log2" in the pdf is multiplication by log(2); delta is the failure probability and n is the number of training rows. the expression bounds the added error of the approximation, so smaller values mean a tighter guarantee. i think i'd get more return looking through the appendix, for quality guarantees.

5 Experiments
- note: post-softmax, this algorithm gave the same results as exact matmul.
- the algorithm is the fastest, but would be over 4x faster still if hardware lookup-accumulates were optimized as much as multiply-accumulates.

6 Discussion, Conclusion
- theoretical guarantees incomplete, quantization errors not yet described
- tested only on cpus, not gpus; different parameters are expected to be optimal there
- a parallel implementation using threads has not been tested
- next they are thinking of exploiting weight reuse in convolutional layers to further optimize
- accelerating an entire network is considered an engineering challenge due to boilerplate work
- it is unknown when it is appropriate to use this acceleration
- no work has been done to differentiate across the hash function at all
- mostly promising for inference rather than training
- strong reduction in electricity needs due to implementability with multiplexers only. very strong locality of data access.
- speed increase over previous AMM methods is up to 10x
- future methods similar to this also hold promise. this kind of work unbottlenecks many things.
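a minimal python sketch of Algorithm 1 as i read it (0-indexed rather than the paper's 1-indexed form, so the child step becomes 2i + b; no SIMD; the parameter shapes and the made-up usage values are my own assumptions, since real parameters come from Algorithm 2's greedy training):

def maddness_hash(x, split_idxs, split_vals):
    # split_idxs: 4 feature indices j^1..j^4, one per tree level
    # split_vals: 4 sequences of per-node thresholds; level t has 2**t nodes
    i = 0                                        # node index within the current level
    for t in range(4):
        v = split_vals[t][i]                     # threshold for node i at level t
        b = 1 if x[split_idxs[t]] >= v else 0    # above the split threshold?
        i = 2 * i + b                            # descend to the left or right child
    return i                                     # leaf / bucket index in 0..15

# hypothetical usage with made-up parameters:
# split_idxs = [3, 17, 41, 5]                    # j^1..j^4
# split_vals = [[0.0], [0.1, -0.2], [0.3, 0.0, -0.1, 0.2],
#               [0.0, 0.1, -0.3, 0.2, 0.4, -0.1, 0.0, 0.05]]
# bucket = maddness_hash([0.5] * 64, split_idxs, split_vals)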
a github repo containing maddness and its predecessor bolt: https://github.com/dblalock/bolt
the last PR was a python interfacing cleanup, merged 12 hours ago
citations of this paper are listed at https://scholar.google.com/scholar?cites=16672894839769153249
- memory-efficient backpropagation (2022)
- current trends in data summaries (2022)
- ARM 4-bit PQ, SIMD-accelerated approximate nearest neighbor search
- High performance deep learning on resource constrained platforms (2021)