[ml] langchain runs local model officially
Undescribed Horrific Abuse, One Victim & Survivor of Many
gmkarl at gmail.com
Thu Apr 6 12:33:32 PDT 2023
also llama.cpp is better in many ways.
but in Python, with the huggingface accelerate and transformers
packages, the model will spread between GPU and CPU RAM, giving more
total RAM, if you pass device_map='auto', and it will use fast mmap
loading if you use a safetensors model.
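A minimal sketch of the loading pattern described above, assuming the transformers and accelerate packages are installed; the model id "gpt2" is illustrative, not from the original post.

```python
# Load a causal LM, letting accelerate place layers across devices.
# Assumes: pip install transformers accelerate torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                # illustrative model id, swap in your own
    device_map="auto",     # accelerate spreads layers across GPU and CPU RAM
    use_safetensors=True,  # mmap the .safetensors weights for fast loading
)
```

With device_map="auto", layers that do not fit in GPU VRAM are kept in CPU RAM, so the total usable memory is the sum of both.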
note that huggingface's libraries do tend to be somewhat crippled,
user-focused things; maybe that's why i know them.
More information about the cypherpunks mailing list