25 Jan 2022, 9:40 a.m.
- A large T5 model could be compiled for TPU in Colab notebooks by calling pmap() on individual blocks rather than on the whole model.
- Much larger models could be trained by masking which weights are trained, which reduces the memory load on autograd. This has been done for at-home training of large text-generation models.
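Here is a minimal JAX sketch of the per-block idea. It assumes a toy residual MLP block as a stand-in for a T5 layer. The names block_apply, model_apply and replicate are hypothetical and do not come from the original post. Only a single block is wrapped in jax.pmap. XLA therefore compiles one small per-block program and reuses it for every block with matching shapes, rather than compiling one large graph for the whole model. The leading array axis is set to the number of local devices: the core count on a TPU runtime, and usually 1 on CPU.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for one transformer block (not actual T5 code):
# a residual MLP layer with a weight matrix and a bias vector.
def block_apply(params, x):
    w, b = params
    return x + jax.nn.relu(x @ w + b)

# pmap a single block: this small program is compiled once and the cached
# executable is reused for every block that has the same shapes.
block_pmapped = jax.pmap(block_apply)

def model_apply(all_block_params, x):
    # Python loop over blocks: each iteration dispatches the per-block program,
    # so no single compiled graph ever contains the whole model.
    for params in all_block_params:
        x = block_pmapped(params, x)
    return x

def replicate(tree, n):
    # Give every leaf a leading device axis of size n, as pmap expects.
    return jax.tree_util.tree_map(lambda a: jnp.broadcast_to(a, (n,) + a.shape), tree)

n_dev = jax.local_device_count()
d_model, n_blocks, batch_per_dev = 8, 4, 2

keys = jax.random.split(jax.random.PRNGKey(0), n_blocks + 1)
blocks = [
    replicate((0.1 * jax.random.normal(k, (d_model, d_model)), jnp.zeros(d_model)), n_dev)
    for k in keys[:n_blocks]
]
x = jax.random.normal(keys[-1], (n_dev, batch_per_dev, d_model))
y = model_apply(blocks, x)  # shape: (n_dev, batch_per_dev, d_model)
```

This design likely trades some runtime efficiency for much shorter compile times and smaller compiled programs. The Python loop adds per-block dispatch overhead, and XLA cannot fuse operations across block boundaries.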
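The weight-masking idea can be sketched in JAX as well. A boolean mask splits the parameters into trainable and frozen subsets, and only the trainable subset is differentiated. This sketch assumes a toy two-layer model; loss_fn, split_params, sgd_step, LR and the w1/w2 names are all hypothetical. jax.value_and_grad differentiates with respect to its first argument by default, so no gradient arrays are built for the frozen weights, and an optimizer would only need state for the trainable ones. This is one plausible reading of "masking the training weights", not necessarily the exact method used in the at-home projects mentioned above.

```python
import jax
import jax.numpy as jnp

LR = 0.01  # hypothetical learning rate for plain SGD

def loss_fn(trainable, frozen, x, y):
    # Recombine both subsets into one dict for the forward pass.
    params = {**frozen, **trainable}
    h = jnp.tanh(x @ params["w1"])
    pred = h @ params["w2"]
    return jnp.mean((pred - y) ** 2)

def split_params(params, mask):
    # mask[name] is True for weights that should be trained.
    trainable = {k: v for k, v in params.items() if mask[k]}
    frozen = {k: v for k, v in params.items() if not mask[k]}
    return trainable, frozen

@jax.jit
def sgd_step(trainable, frozen, x, y):
    # Gradients are taken w.r.t. `trainable` only (argnums=0 by default),
    # so no gradient arrays are built for the frozen weights.
    loss, grads = jax.value_and_grad(loss_fn)(trainable, frozen, x, y)
    trainable = jax.tree_util.tree_map(lambda p, g: p - LR * g, trainable, grads)
    return trainable, loss

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = {"w1": jax.random.normal(k1, (4, 16)), "w2": jax.random.normal(k2, (16, 1))}
mask = {"w1": False, "w2": True}  # freeze w1, train only w2
trainable, frozen = split_params(params, mask)

x = jax.random.normal(k3, (32, 4))
y = jnp.sum(x, axis=1, keepdims=True)
initial_loss = loss_fn(trainable, frozen, x, y)
for _ in range(50):
    trainable, _ = sgd_step(trainable, frozen, x, y)
final_loss = loss_fn(trainable, frozen, x, y)
```

The savings here cover gradients and optimizer state for the frozen weights. Activations that are needed to backpropagate into trainable layers must still be kept in memory.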