(from the wrong thread, below.) i got adapter fine-tuning to run without crashing by just using the existing example script:

    $ cd adapter-transformers/examples/pytorch/text-classification
    $ { echo input,label; for ((b=0; b<16; b++)); do echo -e 'one,1\ntwo,2\nthree,3'; done; } > test.csv
    $ python3 run_glue.py \
        --model_name_or_path bert-base-uncased \
        --do_train --do_eval \
        --max_seq_length 128 \
        --per_device_train_batch_size 2 \
        --learning_rate 1e-4 \
        --num_train_epochs 10.0 \
        --train_adapter --adapter_config pfeiffer \
        --output_dir test_output --overwrite_output_dir \
        --train_file test.csv --validation_file test.csv

presumably the loss is decreasing as it trains. the parameters are all from the example in their docs, except that i dropped the batch size so it would run quickly on my old gpu.
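side note on the test.csv one-liner: `for ((...))` and `echo -e` are bash-isms, so it misbehaves if it ends up running under a plain POSIX sh like dash. here's a sketch of a more portable version using printf, plus a quick sanity check of what the file should contain (1 header line + 16 × 3 = 48 rows, each of the three labels appearing 16 times). the file name and counts match the command above; the check itself is just my own addition.

```shell
#!/bin/sh
# build the same toy dataset as the bash one-liner, but with POSIX printf
# and a while loop instead of bash-only `for ((...))` and `echo -e`
{
  printf 'input,label\n'
  i=0
  while [ "$i" -lt 16 ]; do
    printf 'one,1\ntwo,2\nthree,3\n'
    i=$((i + 1))
  done
} > test.csv

# sanity check: header + 16 copies of 3 rows = 49 lines
wc -l < test.csv
# count how often each label appears (should be 16 for each of 1, 2, 3)
tail -n +2 test.csv | cut -d, -f2 | sort | uniq -c
```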
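on whether the loss is actually decreasing: as far as i know the Trainer only logs the running loss every `logging_steps` steps (500 by default in the transformers versions i've used), and this run is only 48 rows / 2 per batch × 10 epochs = 240 steps, so you'd probably only see the final train_loss. a sketch, assuming this run_glue.py calls `trainer.save_state()` like the upstream one does, which writes a `trainer_state.json` with a `log_history` list into `--output_dir`. `show_losses` is just a hypothetical helper name, not something from the repo.

```shell
#!/bin/sh
# hypothetical helper: print "step loss" pairs from a Trainer state file.
# assumes a top-level "log_history" list whose periodic training entries
# carry "step" and "loss" keys; eval entries ("eval_loss") and the final
# summary entry ("train_loss") have no "loss" key, so they are skipped.
show_losses() {
  python3 - "$1" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    state = json.load(f)

for entry in state.get("log_history", []):
    if "loss" in entry:
        print(entry["step"], entry["loss"])
EOF
}
```

usage: append `--logging_steps 10` to the run_glue.py command, rerun, then `show_losses test_output/trainer_state.json`. with 240 steps that should print roughly 24 lines, and comparing the first few against the last few tells you whether it's really going down.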