[spam][crazy][personal] karl activity update

Victim of Undiscussed Horrifically Abusive Brainwashing gmkarl at gmail.com
Mon Dec 27 12:18:39 PST 2021


hi

my recent project has been https://github.com/xloem/mempickle .  newer
work is in the wip branch.  if past patterns hold i'll likely halt work
after sharing it, dunno.

basically it's the result of me finding that i could work with
transformer models (mainstream AI) with some success, but being stuck
on tiny devices like a cellphone and a raspberry pi.  the primary
function of the code is to convert models into a format that can be
memory mapped from disk, so they can be run on tiny devices rather
than in datacenters.  if i kept working on it i would mostly spend
time speeding them up and shrinking them, sending work upstream,
making things more compatible, maybe making a cell phone app, etc.
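
the general trick, roughly, as a minimal sketch using plain numpy and
torch with made-up filenames and shapes (not mempickle's actual format
or api):

    import numpy as np
    import torch

    # one-time conversion: dump a weight tensor as raw bytes on disk
    weight = torch.randn(4096, 4096)  # stand-in for a real model weight
    weight.numpy().tofile('weight.bin')

    # later, on the tiny device: map the file instead of reading it into
    # ram.  pages are faulted in from disk by the cpu's memory mapper as
    # they are touched.  mode='c' is copy-on-write, so torch can wrap it.
    mapped = np.memmap('weight.bin', dtype=np.float32, mode='c',
                       shape=(4096, 4096))
    weight_mm = torch.from_numpy(mapped)  # zero-copy view of the mapping

    x = torch.randn(1, 4096)
    y = x @ weight_mm.t()  # the multiply streams needed pages from disk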

if you're not familiar, there are models out there like GPT-J-6B and
T0pp that can basically pretend to be a human being convincingly,
follow arbitrary instructions, or do other exotically impressive
things if you know how to use them.  but they are 12-80 GB in size and
normally pass through _both_ system and graphics ram as a single large
chunk when run, so nobody is using them but researchers, corps, and
obsessed hobbyists.

by memory mapping them from disk they can be run on a device without
enough RAM to hold them, using the cpu's memory mapper.  this is still
very slow: on my raspberry pi gpt-j-6b was generating 10 essays in
parallel at a rate of about 30min/10words.  [on the cpu. raspberry pis
have a gpu that could accelerate it; support is unimplemented atm.]
the algorithm processes data in large sequential blocks loaded from
disk (each one roughly a single matrix multiplication), and loading
the matrix data takes the same amount of time regardless of the number
of multiplies done with it.  so 10 essays in parallel cost about the
same as 1, but it does not follow that 1 essay could run 10 times
faster sequentially without more algorithmic work.
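
to make the amortization concrete, a tiny timing sketch (shapes and
file name are made up, and it ignores the os page cache, which you'd
need to drop between runs for fair numbers):

    import time
    import numpy as np

    # pretend this 64MB file is one weight block of the model
    np.random.rand(4096, 4096).astype(np.float32).tofile('block.bin')

    def run(batch):
        start = time.time()
        # the slow part on a tiny device: pulling the block off disk
        w = np.fromfile('block.bin', dtype=np.float32).reshape(4096, 4096)
        x = np.random.rand(batch, 4096).astype(np.float32)
        y = x @ w  # multiply cost grows with batch, load cost doesn't
        return time.time() - start

    print('batch 1:  %.2fs' % run(1))
    print('batch 10: %.2fs' % run(10))  # nowhere near 10x slower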

- .ptmap is an ad-hoc format for memory mapping the data (pytorch's
existing format is zip-compressed, so it's harder to map).  the tool
can convert models from the normal format.
- i've uploaded some cutting-edge models converted to my format to
https://huggingface.co/baffo32 .  not all of them are fully uploaded
yet, as i'm on residential internet on a raspberry pi that usually
crashes before an upload completes, and i have bursts of amnesia
throughout the day where i forget what i am doing and planning.
- the code includes some hacks to work around crashes that facebook's
ai library (pytorch) hits on a raspberry pi because some math
instructions aren't supported (the general pattern is sketched after
this list).  it would be better to rebuild the library to fix these,
but i haven't managed to get the build to finish.  as of yesterdayish
the workarounds in the wip branch cover gradient calculation, so new
models can also be slowly trained on a raspberry pi.
- i also found the weights of some models were fragmented on disk.  i
defragmented them and sorted them by order of use, which does improve
speed for some models, especially on my rotating media.  this happens
if 'follow forward calls' is enabled in the wip branch, which also
attempts to preload weights from disk before they are needed, based on
the order of use (a sketch of recording that order is after this
list).  the progress output in that code is buggy; i was drafting it
while coding to work through the issues, and it would be more
organised if redone.
- example.py in the repo generates some text using the old and small
gpt2 model, which runs on a raspberry pi fast enough that watching it
happen is enjoyable, imo.
- to use a larger model in example.py, you can likely replace
'baffo32/gpt2-ptmap' with 'baffo32/gpt-j-6B-ptmap' and get more
intelligent text at a much slower rate.  note: i have not tested this;
my test setup is different and i may have made further changes to get
it working.
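
re the crash workarounds above: the general pattern looks something
like this.  it's a hypothetical example, gelu here stands in for
whichever op actually dies, and it's not the exact set of hacks in the
repo:

    import torch

    # pattern: if a fused op crashes with an illegal instruction on
    # this cpu, swap it for an equivalent built from ops known to work.
    def gelu_fallback(x):
        # tanh approximation of gelu using only basic elementwise ops
        return 0.5 * x * (1.0 + torch.tanh(
            0.7978845608 * (x + 0.044715 * x ** 3)))

    # monkey-patch before the model is loaded so layers pick it up
    torch.nn.functional.gelu = gelu_fallback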
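
and re 'follow forward calls': recording the order weights get used
can be done with pytorch forward pre-hooks, roughly like this (a
sketch of the idea, not the wip branch's actual code):

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained('gpt2')
    order = []

    # a forward pre-hook fires just before each module runs, so logging
    # module names there gives the order their weights are needed in.
    # the same spot could kick off preloading the next weights from disk.
    for name, module in model.named_modules():
        module.register_forward_pre_hook(
            lambda mod, inputs, name=name: order.append(name))

    model(torch.tensor([[0]]))  # one dummy forward pass
    print(order[:5])  # lay the weights out on disk in this order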

