[ot][spam][crazy] notes/daydreams can maddness do convolution

Sat Jun 25 04:39:09 PDT 2022

i was looking at this wonky algorithm i found online and
pursued a little in discord.
its biggest bottleneck is a call to conv1d(). the author
cuda-accelerated it by hand; they are new to cuda.

thinking about conv1d via this mithral/maddness stuff, dunno.

in a matrix multiply, one combines rows of A with columns of B (each
output entry is one such dot product), or something like that.
so, the algorithm organizes a bunch of rows and columns, and I think
it just precalculates their dot products, and then does something
equivalent to interpolating the precalculations to do future stuff.
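
a rough sketch of the "precalculate dot products, then look them up" idea
as i understand it. all the names here (build_tables, encode,
lookup_matmul) are made up for illustration, and the nearest-prototype
step in encode() is a simple stand-in i'm assuming, not the paper's actual
encoding, which as far as i understand uses learned hash functions.

```python
import numpy as np

def build_tables(prototypes, B):
    """precompute the dot product of every prototype with every column of B.

    prototypes: (n_chunks, n_protos, chunk_len)
    B:          (n_chunks * chunk_len, m)
    returns:    (n_chunks, n_protos, m)
    """
    n_chunks, n_protos, chunk_len = prototypes.shape
    B_chunks = B.reshape(n_chunks, chunk_len, -1)
    return np.einsum('cpl,clm->cpm', prototypes, B_chunks)

def encode(A, prototypes):
    """for each row of A and each chunk, index of the nearest prototype.

    hypothetical stand-in for the real encoding step.
    """
    n_chunks, n_protos, chunk_len = prototypes.shape
    A_chunks = A.reshape(A.shape[0], n_chunks, chunk_len)
    # squared distances, shape (rows, n_chunks, n_protos)
    d = ((A_chunks[:, :, None, :] - prototypes[None]) ** 2).sum(-1)
    return d.argmin(-1)

def lookup_matmul(codes, tables):
    """approximate A @ B using only table lookups and sums, no multiplies."""
    n_chunks = tables.shape[0]
    return sum(tables[c, codes[:, c]] for c in range(n_chunks))
```

if a row of A happens to be built exactly out of prototypes, the lookup
result matches A @ B; otherwise it is only as good as the prototypes are.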

if that's true, there's likely a basic analogy with conv1d if i can
move the concepts through my mind. if it's not true, those concepts
will still do some of the process work for whatever the case actually
is. not sure what happened to the multiplication step of the
interpolation, if that's what it is, but it seems it has been sorted
out.

with convolution of 1d data, basically there are a bunch of dot products
taken between the kernel and successive windows of the data.
[ 1 2 3 4 ... 49 ] convolved with a kernel with indices [ 1 2 3 ... 7 ]
i think it produces many dot products, all of length 7, along the
length of the data (43 of them if there's no padding).
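
to check that picture, a small numpy sketch. the kernel values 1..7 are
just placeholders i picked, and conv1d_as_dot_products is a made-up name.
note that conv1d in ml libraries is usually cross-correlation (the kernel
is not flipped), which is what this assumes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_as_dot_products(data, kernel):
    """one dot product per window of len(kernel) along the data, no padding."""
    windows = sliding_window_view(data, len(kernel))  # (n_windows, kernel_len)
    return windows @ kernel

data = np.arange(1, 50, dtype=float)    # [1 2 3 ... 49]
kernel = np.arange(1, 8, dtype=float)   # placeholder kernel values [1 2 ... 7]
out = conv1d_as_dot_products(data, kernel)
print(out.shape)  # (43,)
```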

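given the conv1d is performed inside a trained model, example data for
building the precalculations can indeed be collected from the training.
basically, the kernel acts roughly like a matrix with all the rows the
same: every chunk of the data gets dotted with the same kernel.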

i'm guessing you could reuse the mithral/maddness approach with
convolution by turning the convolution kernel into a matrix, and
extracting all the sequential chunks of the data to process.
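
a sketch of that, assuming several filters stacked into a kernel matrix.
conv1d_via_matmul and its matmul parameter are hypothetical; the parameter
is only there to mark where an approximate matrix multiply could be
swapped in for the exact one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_via_matmul(data, kernels, matmul=np.matmul):
    """conv1d (no padding, stride 1) as a single matrix multiply.

    data:    (length,)
    kernels: (n_filters, kernel_len)
    returns: (n_windows, n_filters)
    """
    # every sequential chunk of the data becomes one row
    chunks = sliding_window_view(data, kernels.shape[1])
    # the kernel matrix has one column per filter
    return matmul(chunks, kernels.T)
```

the chunk matrix repeats almost every data item kernel_len times, which is
the repetitiveness that makes this the naive way to do it.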

it's a naive approach, since the data is so repetitive. but it would
work well enough to consider. to consider it better, one might think
of how each data item is multiplied by every item in the kernel, and
each product is summed with those of its neighbor items.
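
that other view, written out as a plain loop (conv1d_scatter is a made-up
name, and this is only meant to show the bookkeeping, not to be fast):

```python
import numpy as np

def conv1d_scatter(data, kernel):
    """each data item times each kernel item, accumulated into the output
    position it contributes to. no padding, stride 1, kernel not flipped."""
    out = np.zeros(len(data) - len(kernel) + 1)
    for i, x in enumerate(data):
        for j, w in enumerate(kernel):
            o = i - j  # output index this product belongs to
            if 0 <= o < len(out):
                out[o] += x * w
    return out
```

it gives the same numbers as the sliding dot products, just grouped by
data item instead of by output.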

it's interesting to me that in the research they found they could
look up faster than multiply. maybe it relates to how near the data is
in memory. i've probably often worked with fragmented memory.