[ot][spam][crazy][data] transformer model 'attention' improvement

k gmkarl at gmail.com
Wed Jan 26 07:04:22 PST 2022


the current mainstream model for very long sequences appears to be
bigbird, and there is a pretrained model for long document
summarization: https://github.com/google-research/bigbird
https://huggingface.co/docs/transformers/model_doc/bigbird
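
as a concrete example, that pretrained checkpoint can be driven through
the transformers api roughly like this (a minimal sketch; the checkpoint
name google/bigbird-pegasus-large-arxiv and the generation parameters
are my assumptions, not something stated in the links above):

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

# assumed checkpoint name from the huggingface hub
model_name = "google/bigbird-pegasus-large-arxiv"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BigBirdPegasusForConditionalGeneration.from_pretrained(model_name)

# bigbird's block-sparse attention scales roughly linearly with sequence
# length, so much longer inputs fit than with full quadratic attention
text = open("long_document.txt").read()
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

summary_ids = model.generate(**inputs, max_length=256, num_beams=5)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))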

on to perceiver.  i'm thinking of actually adding a configuration
directive to huggingface transformers for selecting efficient
attention, and opening a pull request, if there isn't one already, to
see what they say.
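
for what it's worth, bigbird's own config already exposes a directive
of roughly that shape, switching between its efficient attention and
the full quadratic baseline.  a sketch using that existing parameter
(the new part would be generalizing this kind of switch across models):

from transformers import BigBirdConfig, BigBirdModel

# bigbird's config already has this switch: "block_sparse" selects the
# efficient attention, "original_full" the quadratic baseline
config = BigBirdConfig(attention_type="block_sparse", block_size=64)
model = BigBirdModel(config)

# the same architecture flipped to full attention, for comparison
config_full = BigBirdConfig(attention_type="original_full")
model_full = BigBirdModel(config_full)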

