6 Apr 2023
7:33 p.m.
also, llama.cpp is better in many ways, but in python, huggingface accelerate plus the transformers package will spread a model between gpu vram and cpu ram (giving you more total memory to work with) if you pass device_map='auto', and it will use fast mmap loading if the model is stored as safetensors. note that huggingface's libraries do tend to be somewhat crippled, user-focused things; maybe that's why i know them
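a minimal sketch of what that looks like, assuming transformers and accelerate are installed; the model id and the memory caps here are hypothetical placeholders, and max_memory is optional (without it, accelerate fills the gpu first and spills the rest to cpu):

```python
def build_load_kwargs(gpu_gb: int, cpu_gb: int) -> dict:
    """Kwargs for from_pretrained that let accelerate split the model
    across devices. device_map='auto' picks placements automatically;
    max_memory caps each device (keys are gpu index or 'cpu')."""
    return {
        "device_map": "auto",
        "max_memory": {0: f"{gpu_gb}GiB", "cpu": f"{cpu_gb}GiB"},
    }


if __name__ == "__main__":
    # needs: pip install transformers accelerate
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "your-org/your-llama-model",  # hypothetical model id
        **build_load_kwargs(gpu_gb=10, cpu_gb=24),
    )
    # hf_device_map shows which layers landed on gpu vs cpu
    print(model.hf_device_map)
```

if the checkpoint on the hub has a safetensors copy, from_pretrained prefers it and loading goes through mmap rather than a full torch.load deserialize.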