The full weight files weigh in at 308033580802 bytes (286.88 GiB). The slim weight files, which usually means the precision is reduced to float16 (sometimes float8), weigh in at 41112854242 bytes (38.29 GiB).
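The byte-to-GiB conversion above can be checked with a couple of lines; the sizes are the ones quoted, and the divisor is the binary gibibyte (1024³):

```python
# Sanity-check the quoted sizes, converting bytes to GiB (1 GiB = 1024**3 bytes).
GIB = 1024 ** 3

full_bytes = 308_033_580_802
slim_bytes = 41_112_854_242

print(f"full: {full_bytes / GIB:.2f} GiB")  # -> full: 286.88 GiB
print(f"slim: {slim_bytes / GIB:.2f} GiB")  # -> slim: 38.29 GiB
print(f"ratio: {full_bytes / slim_bytes:.1f}x")
```

Note the ratio is about 7.5x, which is more than the 2x you'd expect from a plain float32-to-float16 cast, consistent with my uncertainty about what full and slim actually contain.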
Just a note that I might be wrong here about what full and slim mean.
Traditionally the entire model is loaded into VRAM to evaluate it, although with some hacks it can also be streamed in and out, or distributed across multiple machines. There is overhead beyond just the weights, and significantly more overhead if the model is further being trained for a specific task.
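A rough sketch of that budgeting, with the overhead fraction and parameter count being illustrative assumptions on my part (not measured numbers):

```python
# Back-of-envelope VRAM budgeting. All constants here are assumptions
# for illustration, not properties of any particular model.
GIB = 1024 ** 3

def inference_bytes(n_params, bytes_per_param, overhead_frac=0.2):
    # Weights plus a fudge factor for activations, caches, and runtime buffers.
    return n_params * bytes_per_param * (1 + overhead_frac)

def naive_training_bytes(n_params, bytes_per_param):
    # Naive Adam-style training keeps weights, gradients, and two
    # optimizer moment buffers: roughly 4x the weight memory.
    return n_params * bytes_per_param * 4

n = 20_000_000_000  # hypothetical 20B-parameter model
print(f"fp16 inference: ~{inference_bytes(n, 2) / GIB:.0f} GiB")
print(f"fp32 training:  ~{naive_training_bytes(n, 4) / GIB:.0f} GiB")
```

The point is just that training multiplies the footprint several times over, which is why the tricks in the next paragraph exist.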
I can also add that people have been training models on low-end hardware by tracing and training only a subset of the parameters at a time; traditionally all of them are trained at once. Systems also support a form of checkpointing that discards the intermediate activations needed for the derivatives and recomputes them when needed, as I've mentioned in a spamlog somewhere.
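That checkpointing trade (memory for recompute) can be sketched in a few lines; this is a toy illustration of the idea, not any real autograd system's implementation:

```python
# Toy sketch of activation checkpointing: keep activations only at
# checkpoint boundaries, and recompute the discarded ones from the
# nearest checkpoint when the backward pass asks for them.

def forward(layers, x, checkpoint_every=2):
    """Run the layers, saving activations only at checkpoint boundaries."""
    saved = {0: x}
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % checkpoint_every == 0:
            saved[i + 1] = x  # kept; the rest are discarded
    return x, saved

def activation_at(layers, saved, i):
    """Recompute the activation entering layer i from the nearest checkpoint."""
    start = max(k for k in saved if k <= i)
    x = saved[start]
    for j in range(start, i):
        x = layers[j](x)
    return x

# Usage: four increment layers, checkpoints every 2 layers.
layers = [lambda v: v + 1] * 4
out, saved = forward(layers, 0)
print(out)                              # -> 4
print(sorted(saved))                    # -> [0, 2, 4] (only checkpoints stored)
print(activation_at(layers, saved, 3))  # -> 3 (recomputed, not stored)
```

Storing every kth activation and recomputing the gaps cuts peak activation memory roughly by a factor of k, at the cost of redoing part of the forward pass.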