26 Jan
2022
3:04 p.m.
the current mainstream model for very long sequences appears to be bigbird, and there is a pretrained model for long document summarization:
https://github.com/google-research/bigbird
https://huggingface.co/docs/transformers/model_doc/bigbird

moving on to perceiver next. i'm thinking of actually adding a configuration directive to huggingface for using efficient attention, and opening a pull request, if there isn't one already, to see what they say.
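
for reference, a rough sketch of what such a directive already looks like on the bigbird side: the huggingface bigbird config has an `attention_type` field that switches between block-sparse and full attention. this only shows the existing bigbird knobs, not the proposed directive for other models. the block size and random block count below are just the documented defaults, and the `load_summarizer` helper is a hypothetical name of mine; it assumes the `google/bigbird-pegasus-large-arxiv` checkpoint and downloads weights when called.

```python
from transformers import BigBirdConfig

# block-sparse attention: each query block attends to sliding-window,
# global, and a few random key blocks instead of the whole sequence
sparse_config = BigBirdConfig(
    attention_type="block_sparse",
    block_size=64,          # tokens per attention block
    num_random_blocks=3,    # random key blocks per query block
)

# the same model family can fall back to ordinary quadratic attention
full_config = BigBirdConfig(attention_type="original_full")

print(sparse_config.attention_type, sparse_config.block_size)
print(full_config.attention_type)


def load_summarizer(checkpoint="google/bigbird-pegasus-large-arxiv"):
    """hypothetical helper: load the pretrained long-document summarizer.

    not called here because it needs torch and a network download.
    """
    from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # keyword arguments matching config fields override the stored config
    model = BigBirdPegasusForConditionalGeneration.from_pretrained(
        checkpoint, attention_type="block_sparse"
    )
    return tokenizer, model
```

a directive for other models would presumably follow the same shape: one config field that picks the attention implementation, read by the attention layer at construction time.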