ok it doesn't download all the files; looks like the total size is about 12 GB across 3 files. 1607
now it fails to run because it tries to allocate 8 GB on my 2 GB GPU. maybe i can make it do cpu-only 1608
but i worry i'm out of posts for today
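a hedged sketch of what cpu-only might look like: petals documents AutoDistributedModelForCausalLM as its entry point, and transformers' from_pretrained accepts torch_dtype and device_map; whether petals forwards those kwargs unchanged is an assumption here, and the model id is a placeholder. keeping bfloat16 would also sidestep the float32 widening entirely:

```python
# untested sketch, assuming petals forwards kwargs to transformers
import torch
from petals import AutoDistributedModelForCausalLM

model = AutoDistributedModelForCausalLM.from_pretrained(
    "some-model-id",             # placeholder, not a real model name
    torch_dtype=torch.bfloat16,  # keep weights bfloat16: no float32 widening
    device_map="cpu",            # don't touch the 2 GB GPU at all
)
```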
i seem to be okay so far for today, no bounce messages yet :s might have only a few left 1608 1609

hrm, and https://health.petals.dev shows no available providers or working models for me :s 1610 1624

ok i debugged it: it's not trying to put it on my gpu, it's trying to put it in my normal RAM, and it fails to allocate 8 GB of normal RAM. it looks like the 8 GB weight is the 'lm_head', which i think is the linear layer at the bottom of the model that converts the word-property vectors into logits that predict the next token.

comment on linear layers, which are used a lot: "linear layer" seems to be the machine-learning/neural-network term for "matrix". it's just a plain matrix that takes N inputs and produces M outputs (or vice versa) and contains N*M floats. so it's incredibly simple, and changing it is very easy, because all it does is take a dot product with a constant vector to produce each output. 1625

considering this a little, i realize this shows another bug in the use of safetensors: it shouldn't need to reallocate data that is mmap'd. so i guess i'd better check why this is happening. 1626

the code is trying to convert the bfloat16 weights into float32. i thought this could lose a lot of information (imo bfloat16 is way better at representing log ranges), but that's also why it allocates all the RAM. i wonder why it's float32ing it. oops, i was thinking of float16; widening to float32 is lossless, but it doubles the size. i gotta pee!
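the "linear layer is just a matrix" comment above can be sketched in numpy. the sizes here are toy values i made up; a real lm_head is hidden_size x vocab_size, which is how it can reach the 8 GB seen here (8 GB of float32 is about 2 billion entries):

```python
import numpy as np

# a "linear layer" with n_in inputs and n_out outputs is just an
# n_in x n_out matrix of floats (toy sizes for illustration)
rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = rng.standard_normal((n_in, n_out)).astype(np.float32)  # n_in * n_out floats

x = rng.standard_normal(n_in).astype(np.float32)  # the layer's input vector
y = x @ W  # the entire forward pass: one matrix multiply

# equivalently, each output is a dot product of the input with one
# constant column of the matrix
y_by_hand = np.array([x @ W[:, j] for j in range(n_out)])
assert np.allclose(y, y_by_hand)
```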
1633 1627 srcline is /home/user/.local/lib/python3.12/site-packages/transformers/modeling_utils.py(899)_load_state_dict_into_meta_model(), transformers==4.43.1. here's the backtrace:

/home/user/.local/lib/python3.12/site-packages/petals/utils/auto_config.py(79)from_pretrained()
-> return super().from_pretrained(model_name_or_path, *args, revision=revision, **kwargs)

/home/user/.local/lib/python3.12/site-packages/petals/utils/auto_config.py(52)from_pretrained()
-> return proper_cls.from_pretrained(model_name_or_path, *args, **kwargs)

/home/user/.local/lib/python3.12/site-packages/petals/client/from_pretrained.py(31)from_pretrained()
-> return super().from_pretrained(model_name_or_path, *args, low_cpu_mem_usage=low_cpu_mem_usage, **kwargs)

/home/user/.local/lib/python3.12/site-packages/transformers/modeling_utils.py(3903)from_pretrained()
-> ) = cls._load_pretrained_model(

/home/user/.local/lib/python3.12/site-packages/transformers/modeling_utils.py(4377)_load_pretrained_model()
-> new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(

/home/user/.local/lib/python3.12/site-packages/transformers/modeling_utils.py(899)_load_state_dict_into_meta_model()->None
-> param = param.to(old_param.dtype)