7 Jan 2023
8:16 a.m.
i think it was because i exhausted gpu ram in that torch session. this fits on a 2GB gpu (and gives poor results, which is expected):

import transformers, torch

# load the 560M-parameter bloomz model as a text-generation pipeline
bloomz = transformers.pipeline('text-generation', 'bigscience/bloomz-560m')
# cast the weights to bfloat16 to halve their memory footprint, then move them to the gpu
bloomz.model.to(torch.bfloat16)
bloomz.model.to('cuda')
# keep the pipeline's device attribute in sync with where the model now lives
bloomz.device = bloomz.model.device

There is some code around for finding the largest model that fits in available ram. Some of mine is at https://github.com/xloem/test_matrix_bot/blob/main/module_rwkv.py . It uses this:

import torch, psutil

# bound by free gpu memory or half the free system ram, whichever is smaller
MEMORY_BOUND = min(torch.cuda.mem_get_info()[0], psutil.virtual_memory().available // 2)

You then multiply each model's parameter count by the byte size of its datatype and compare the result to the memory bound.
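A minimal sketch of that comparison, picking the largest model that fits. The candidate list and parameter counts below are illustrative approximations, not taken from the repo:

import torch, psutil

MEMORY_BOUND = min(torch.cuda.mem_get_info()[0], psutil.virtual_memory().available // 2)

# hypothetical candidates, largest first: (model id, approximate parameter count)
CANDIDATES = [
    ('bigscience/bloomz-7b1', 7_100_000_000),
    ('bigscience/bloomz-3b', 3_000_000_000),
    ('bigscience/bloomz-1b7', 1_700_000_000),
    ('bigscience/bloomz-560m', 560_000_000),
]

DTYPE = torch.bfloat16
bytes_per_param = torch.finfo(DTYPE).bits // 8  # 2 bytes for bfloat16

# pick the largest candidate whose raw weights fit under the bound
# (real usage also needs headroom for activations and generation state)
for name, params in CANDIDATES:
    if params * bytes_per_param <= MEMORY_BOUND:
        print('largest fitting model:', name)
        break
else:
    print('nothing fits; fall back to the smallest model or cpu')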