[ot][spam][crazy][data] transformer model 'attention' improvement
k
gmkarl at gmail.com
Wed Jan 26 07:04:22 PST 2022
the current mainstream model for very long sequences appears to be
bigbird, and there is a pretrained model for long-document
summarization: https://github.com/google-research/bigbird
https://huggingface.co/docs/transformers/model_doc/bigbird
on to perceiver. i'm thinking of adding a configuration directive to
huggingface for enabling efficient attention, and opening a pull
request (if one isn't already open) to see what they say.
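for context, the "efficient attention" bigbird uses is a sparse pattern
combining a sliding window, a few global tokens, and random connections,
so cost grows roughly linearly with sequence length instead of
quadratically. a toy sketch of building such a mask (illustrative only,
not the real bigbird implementation; all names here are made up):

```python
import numpy as np

def bigbird_style_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    """Toy BigBird-style sparse attention mask:
    sliding window + global tokens + random connections per row.
    Illustrative sketch only, not the actual BigBird code."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        # sliding window: each token attends to nearby tokens
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
        # random attention: a few random columns per row
        mask[i, rng.integers(0, seq_len, size=n_random)] = True
    # global tokens attend everywhere and are attended by everyone
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

mask = bigbird_style_mask(64)
print(f"attention density: {mask.mean():.2f}")
```

the point is that the number of allowed positions per row is fixed (window
plus globals plus randoms), so the mask density falls toward zero as
seq_len grows, which is what makes very long documents tractable.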
More information about the cypherpunks mailing list