This is research on transfer learning that uses less GPU memory, less data, and lower-end hardware to build new, equally capable transformer models for new tasks from existing large pretrained ones, without giving up accuracy or speed. [1][2]

From the abstract: We propose Ladder Side-Tuning (LST), a new parameter-efficient transfer learning (PETL) technique that reduces training memory requirements by more substantial amounts. Unlike existing PETL methods that insert additional parameters inside backbone networks, we train a ladder side network, a small and separate network that takes intermediate activations as input via shortcut connections (ladders) from backbone networks and makes predictions. On both GLUE and VL tasks, LST saves 2.7x more memory than other PETL methods. To further show the advantage of this better memory efficiency, we also apply LST to larger T5 models (T5-large, T5-3B), attaining better GLUE performance than full fine-tuning and other PETL methods.

References
1. https://arxiv.org/abs/2206.06522
2. https://github.com/ylsung/Ladder-Side-Tuning
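
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea described above. This is one reading of the architecture, not the authors' implementation: the SideBlock and LadderSideTuner names, the sigmoid gating, and the down-projection sizes are illustrative assumptions. What it demonstrates is that the pretrained backbone runs frozen under torch.no_grad(), its intermediate activations are down-projected and fed into a small side network through shortcut ("ladder") connections, and only the side network and task head receive gradients, so no activations have to be stored for backpropagation through the backbone.

import torch
import torch.nn as nn

class SideBlock(nn.Module):
    # One rung of the ladder: fuse a backbone activation into the side state.
    def __init__(self, backbone_dim, side_dim):
        super().__init__()
        self.down = nn.Linear(backbone_dim, side_dim)   # shrink the backbone activation
        self.gate = nn.Parameter(torch.zeros(1))        # learned mixing gate (assumption)
        self.ffn = nn.Sequential(                       # tiny feed-forward block
            nn.LayerNorm(side_dim),
            nn.Linear(side_dim, side_dim * 2),
            nn.GELU(),
            nn.Linear(side_dim * 2, side_dim),
        )

    def forward(self, side_state, backbone_act):
        g = torch.sigmoid(self.gate)
        fused = g * side_state + (1 - g) * self.down(backbone_act)
        return fused + self.ffn(fused)

class LadderSideTuner(nn.Module):
    # Frozen backbone + small trainable side network fed by shortcut connections.
    def __init__(self, backbone_layers, backbone_dim, side_dim, num_classes):
        super().__init__()
        self.backbone_layers = backbone_layers          # e.g. the encoder blocks of T5/BERT
        for p in self.backbone_layers.parameters():
            p.requires_grad = False                     # backbone is never updated
        self.input_down = nn.Linear(backbone_dim, side_dim)
        self.side_blocks = nn.ModuleList(
            [SideBlock(backbone_dim, side_dim) for _ in range(len(backbone_layers))]
        )
        self.head = nn.Linear(side_dim, num_classes)

    def forward(self, hidden):                          # hidden: (batch, seq, backbone_dim)
        acts = []
        with torch.no_grad():                           # forward-only pass, no backprop graph
            h = hidden
            for layer in self.backbone_layers:
                h = layer(h)
                acts.append(h)
        side = self.input_down(hidden)
        for block, act in zip(self.side_blocks, acts):
            side = block(side, act)                     # ladder shortcut from the backbone
        return self.head(side.mean(dim=1))              # pool over the sequence and classify

# Toy usage: plain linear layers stand in for a real pretrained encoder.
backbone = nn.ModuleList([nn.Linear(768, 768) for _ in range(12)])
model = LadderSideTuner(backbone, backbone_dim=768, side_dim=96, num_classes=2)
logits = model(torch.randn(4, 16, 768))                # -> shape (4, 2)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total}")

Only the input down-projection, the side blocks, and the head show up as trainable, which is what keeps both the optimizer state and the stored activations small relative to full fine-tuning or adapter-style PETL methods that still backpropagate through the backbone.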