[data science] Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Thu Jun 16 03:57:48 PDT 2022


This is research on parameter- and memory-efficient transfer learning:
adapting an existing large pretrained transformer to new tasks with
accuracy matching or exceeding full fine-tuning, while using less
training memory, less data, and lower-end hardware.

https://arxiv.org/abs/2206.06522

https://github.com/ylsung/Ladder-Side-Tuning

We propose Ladder Side-Tuning (LST), a new parameter-efficient
transfer learning (PETL) technique that reduces training memory
requirements by more substantial amounts. Unlike existing PETL methods
that insert additional parameters inside backbone networks, we train a
ladder side network, a small and separate network that takes
intermediate activations as input via shortcut connections (ladders)
from backbone networks and makes predictions.
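A rough rendering of that mechanism in code may help. The sketch below is
a minimal, hypothetical PyTorch version, not the authors' implementation:
the class name LadderSideNetwork, the sigmoid gating, the 4-head side
blocks, and the reduction factor of 8 are illustrative assumptions, though
the frozen backbone, the downsampling "ladder" shortcuts, and the small
trainable side network follow the paper's description.

import torch
import torch.nn as nn

class LadderSideNetwork(nn.Module):
    """Minimal LST sketch: a frozen backbone runs forward-only, and a
    small trainable side network consumes downsampled intermediate
    activations through gated shortcut ("ladder") connections."""

    def __init__(self, backbone_layers, d_model=768, reduction=8,
                 num_classes=2):
        super().__init__()
        d_side = d_model // reduction
        self.backbone_layers = nn.ModuleList(backbone_layers)
        for p in self.backbone_layers.parameters():
            p.requires_grad = False        # backbone stays frozen
        n = len(self.backbone_layers)
        # initial projection of the input embeddings to the side width
        self.input_proj = nn.Linear(d_model, d_side)
        # one downsampler per backbone layer (the ladder rungs)
        self.downsamplers = nn.ModuleList(
            nn.Linear(d_model, d_side) for _ in range(n))
        # learned per-layer gates mixing side state with backbone state
        self.gates = nn.Parameter(torch.zeros(n))
        # small trainable side blocks (plain encoder layers here)
        self.side_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(
                d_side, nhead=4, dim_feedforward=2 * d_side,
                batch_first=True)
            for _ in range(n))
        self.head = nn.Linear(d_side, num_classes)

    def forward(self, hidden):
        # hidden: (batch, seq, d_model) token embeddings
        side = self.input_proj(hidden)
        for i, layer in enumerate(self.backbone_layers):
            with torch.no_grad():   # no gradients flow into the backbone
                hidden = layer(hidden)
            g = torch.sigmoid(self.gates[i])
            side = g * side + (1 - g) * self.downsamplers[i](hidden)
            side = self.side_layers[i](side)
        return self.head(side.mean(dim=1))  # mean-pool, then classify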

On both GLUE and VL tasks, LST saves 2.7x more memory than other PETL
methods. To further show the advantage of this better memory
efficiency, we also apply LST to larger T5 models (T5-large, T5-3B),
attaining better GLUE performance than full fine-tuning and other PETL
methods.
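The memory saving follows from the structure above: since the backbone
runs under torch.no_grad(), its activations need not be stored for
backprop, and only the small side network reaches the optimizer. A
hypothetical usage of the sketch, with stand-in encoder layers in place
of real pretrained blocks:

# stand-in for the frozen pretrained encoder blocks
backbone = [nn.TransformerEncoderLayer(768, nhead=12, batch_first=True)
            for _ in range(12)]
model = LadderSideNetwork(backbone)
# only side-network parameters are trainable
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)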