hi, my recent project has been https://github.com/xloem/mempickle . newer work is in the wip branch. if past patterns hold i'll likely halt work after sharing it, dunno.

basically it's the result of me finding that i could work with transformer models (mainstream AI) with some success, but being stuck on tiny devices like a cellphone and a raspberry pi. the primary function of the code is to convert models into a format that can be memory mapped straight from disk, so they can be run on tiny devices rather than datacenters. if i kept working on it i would spend most of the time speeding them up and shrinking them, also sending work upstream and making things more compatible, maybe making a cell phone app, etc.

if you're not familiar, there are models out there like GPT-J-6B and T0pp that can convincingly pretend to be a human being, follow arbitrary instructions, or do other exotically impressive things if you know how to use them, but they are 12-80 GB large and pass through _both_ system and graphics ram as a single large chunk when run, so nobody is using them but researchers, corps, and obsessed hobbyists. by memory mapping them from disk they can be run on a device without enough RAM to hold them, using the cpu's memory mapper. this is still very slow: on my raspberry pi, gpt-j-6b was generating 10 essays in parallel at about 30 minutes per 10 words. [on the cpu. raspberry pis have a gpu that could accelerate it, but support is unimplemented atm.] the algorithm processes data in large sequential blocks loaded from disk (each roughly a single matrix multiplication), so 10 in parallel doesn't immediately translate to 1 generation running 10 times faster; without more algorithmic work, it takes the same amount of time to load the matrix data regardless of how many multiplies are made against it.

- .ptmap is an ad-hoc format for memory mapping the data (the existing format is zip compressed, so harder to map). the tool can convert from the normal format. a rough sketch of the general idea is at the end of this post.
- i've uploaded some cutting-edge models converted to my format to https://huggingface.co/baffo32 . not all of them are uploaded yet, as i'm on residential internet on a raspberry pi that usually crashes before an upload completes, and i have bursts of amnesia throughout the day where i forget what i am doing and planning.
- the code includes some hacks to work around crashes facebook's ai library exhibits on a raspberry pi, caused by some math instructions not being supported. it would be better to rebuild the library to fix these, but i haven't managed to get the build to finish. as of yesterdayish the workarounds in the wip branch cover gradient calculation, so new models can also be slowly trained on a raspberry pi too.
- i also found the weights of some models were fragmented; i defragmented them and sorted them by order of use, which does improve speed for some models, especially on my rotating media. this is done when 'follow forward calls' is enabled in the wip branch, which also attempts to preload weights from disk before they are needed, based on the order of use (also sketched at the end of this post). the progress output in that code is buggy; i was drafting it while coding to work through the issues, and it would be more organised if redone.
- example.py in the repo generates some text using the old and small gpt2 model, which runs on a raspberry pi fast enough to be enjoyable to watch, imo.
- to use a larger model in example.py, you can likely replace 'baffo32/gpt2-ptmap' with 'baffo32/gpt-j-6B-ptmap' and get more intelligent text at a much slower rate (see the snippet at the end of this post).
note: i have not tested that swap. my test setup is different, and i may have made further changes to get it working.
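
to make the memory-mapping idea concrete, here's a minimal sketch of the concept using pytorch and numpy. it is NOT the actual .ptmap layout, and save_mappable/load_mappable are names made up for this illustration; the point is just that tensors stored uncompressed at known offsets can be paged in lazily instead of decompressed into RAM:

```python
# minimal sketch of the memory-mapping idea -- NOT the real .ptmap layout.
# tensors are written uncompressed at page-aligned offsets plus a json index,
# so loading is an mmap (lazy, page-by-page) instead of a decompress-and-copy.
import json, mmap

import numpy as np
import torch

def save_mappable(state_dict, path):
    index, offset = {}, 0
    with open(path, 'wb') as f:
        for name, tensor in state_dict.items():
            arr = tensor.detach().cpu().numpy()  # assumes a numpy-compatible dtype
            # page-align each tensor (numpy tolerates unaligned offsets, but this keeps it tidy)
            offset = -(-offset // mmap.PAGESIZE) * mmap.PAGESIZE
            f.seek(offset)
            f.write(arr.tobytes())
            index[name] = {'offset': offset, 'dtype': str(arr.dtype), 'shape': list(arr.shape)}
            offset += arr.nbytes
    with open(path + '.json', 'w') as f:
        json.dump(index, f)

def load_mappable(path):
    with open(path + '.json') as f:
        index = json.load(f)
    # mode='c' is copy-on-write: pages are only read from disk when touched
    return {
        name: torch.from_numpy(np.memmap(path, dtype=meta['dtype'], mode='c',
                                         offset=meta['offset'],
                                         shape=tuple(meta['shape'])))
        for name, meta in index.items()
    }
```

the real code has to handle a lot more (dtype quirks, the module tree, the workarounds above), but this is the core trick that lets the kernel's pager stand in for RAM.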
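
and here's the shape of the order-of-use preloading idea. this is a hedged illustration written for this post, not the repo's actual code: once the order tensors get touched in a forward pass is known, a background thread can pull the next weights off disk while the cpu is still busy multiplying:

```python
# crude illustration of order-of-use prefetching (not the repo's actual code).
# 'mapped' would be memory-mapped tensors (e.g. from load_mappable above) and
# 'order' the tensor names in the order a forward pass touches them.
import threading

def prefetch(mapped, order):
    def worker():
        for name in order:
            # a reduction touches every page, forcing the kernel to read it in;
            # a real implementation would rather use madvise/posix_fadvise hints
            mapped[name].sum()
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```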
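
for reference, the swap in example.py would look something like the snippet below, assuming it follows the usual transformers loading pattern. any mempickle-specific setup is left out rather than guessed at, and per the note above i haven't tested the larger model this way:

```python
# roughly what the model swap looks like, assuming a standard transformers-style
# example.py; the mempickle-specific setup is omitted here.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'baffo32/gpt-j-6B-ptmap'  # was 'baffo32/gpt2-ptmap'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer('the raspberry pi is', return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```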