Below is as far as I got. When I tried running it on XSum, I ran out of GPU RAM. I imagine there are a number of ways to address that; the simplest might be to use a smaller model, even if it doesn't have the long-context support. Another idea is to produce a shorter set of encodings of the data, which might mesh with other uses. I didn't try using the model in a forward manner, so I don't know for sure whether training actually succeeded (a possible check is sketched after the commands).

# training

```bash
git clone https://github.com/xloem/adapter-transformers --branch=longt5
pip3 install ./adapter-transformers

# tiny dummy csv; the datasets csv loader reads the first "a,b" row as the header
for ((ct=0; ct<4; ct++)); do echo a,b >> test.csv; done

python3 adapter-transformers/examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path google/long-t5-tglobal-base \
    --do_train --do_eval \
    --output_dir test \
    --per_device_train_batch_size=1 \
    --per_device_eval_batch_size=1 \
    --overwrite_output_dir \
    --predict_with_generate \
    --train_file test.csv --validation_file test.csv \
    --train_adapter True \
    --num_train_epochs 100
```
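For the forward check, something like this might work. It's an untested sketch: it assumes the usual adapter-transformers API (`load_adapter` / `set_active_adapters`) is intact on the longt5 branch, and the `test/summarization` adapter path is a guess; point it at wherever run_summarization.py actually saved the adapter under the output dir.

```python
# untested sketch: load the base model, attach the trained adapter, summarize one input
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM  # adapter-transformers' patched classes

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/long-t5-tglobal-base")

# assumption: the adapter was written somewhere under the output dir "test"; adjust as needed
adapter_name = model.load_adapter("test/summarization")
model.set_active_adapters(adapter_name)
model.eval()

text = "some long document text to summarize ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=64)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

If this generates something non-degenerate for a real article, the adapter at least loads and runs in a forward pass; the output quality from 100 epochs on the dummy csv won't mean much by itself.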