[data science] Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
This is research on fine-tuning large pretrained transformer models for new tasks with less memory, less data, and faster training, while keeping (or improving) accuracy, which makes it practical on lower-end systems.

https://arxiv.org/abs/2206.06522
https://github.com/ylsung/Ladder-Side-Tuning

We propose Ladder Side-Tuning (LST), a new parameter-efficient transfer learning (PETL) technique that reduces training memory requirements by a substantially larger margin than prior PETL methods. Unlike existing PETL methods that insert additional parameters inside backbone networks, we train a ladder side network: a small, separate network that takes intermediate activations as input via shortcut connections (ladders) from the backbone network and makes the predictions. Because gradients flow only through the small side network and its ladder connections, never through the large frozen backbone, memory use drops sharply: on both GLUE and VL tasks, LST saves 2.7x more memory than other PETL methods. To further show the advantage of this better memory efficiency, we also apply LST to larger T5 models (T5-large, T5-3B), attaining better GLUE performance than full fine-tuning and other PETL methods.
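For a rough sense of the architecture, here is a minimal sketch of the ladder-side idea in PyTorch. It is not the authors' implementation (see the linked repo for that): the backbone below is a toy stack of frozen linear blocks standing in for pretrained transformer layers, and the side width, per-layer gates, and layer count are illustrative assumptions.

import torch
import torch.nn as nn

class LadderSideNetwork(nn.Module):
    def __init__(self, backbone_dim=768, side_dim=96, num_layers=12, num_classes=2):
        super().__init__()
        # Frozen backbone blocks: a stand-in for pretrained transformer layers.
        self.backbone = nn.ModuleList(
            [nn.Linear(backbone_dim, backbone_dim) for _ in range(num_layers)]
        )
        for p in self.backbone.parameters():
            p.requires_grad = False  # pretrained weights are never updated
        # "Ladder" projections: shrink each intermediate activation to the side width.
        self.ladders = nn.ModuleList(
            [nn.Linear(backbone_dim, side_dim) for _ in range(num_layers)]
        )
        # Small trainable side blocks plus learned gates that mix each ladder input
        # with the side network's running hidden state.
        self.side_blocks = nn.ModuleList(
            [nn.Linear(side_dim, side_dim) for _ in range(num_layers)]
        )
        self.gates = nn.Parameter(torch.zeros(num_layers))
        self.head = nn.Linear(side_dim, num_classes)
        self.side_dim = side_dim

    def forward(self, x):
        h_side = x.new_zeros(x.size(0), self.side_dim)
        h_back = x
        for blk, ladder, side, g in zip(self.backbone, self.ladders,
                                        self.side_blocks, self.gates):
            with torch.no_grad():                 # backbone is forward-only, so its
                h_back = torch.relu(blk(h_back))  # activations need no backward graph
            mu = torch.sigmoid(g)                 # learned per-layer gate
            h_side = torch.relu(side(mu * ladder(h_back) + (1 - mu) * h_side))
        return self.head(h_side)

model = LadderSideNetwork()
out = model(torch.randn(4, 768))                  # (batch, backbone_dim) -> (batch, classes)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(out.shape, trainable, "trainable of", total, "total parameters")

Because only the side blocks, ladder projections, gates, and classification head receive gradients, the backward graph and optimizer state cover a small fraction of the full model; that, rather than parameter count alone, is where the memory savings over adapter-style PETL methods come from.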