Draft summary of PerceiverForMaskedLM:

1. PerceiverTextPreprocessor: Inputs -> Embeddings
2A. PerceiverEncoder: Embeddings + Latent weights + Attention mask -> PerceiverAttention(is_cross_attention=True) -> Hidden states
2B. PerceiverEncoder: Hidden states + Attention mask -> layers of PerceiverAttention(is_cross_attention=False) -> Hidden states
3. PerceiverBasicDecoder: Hidden states + Query weights + Attention mask -> PerceiverAttention(is_cross_attention=True) -> Decoded embeddings
   (There is also a head mask that may be usable to alter properties of the model on a per-layer basis.)
4. PerceiverEmbeddingDecoder: Decoded embeddings -> Log probabilities

This summary came from reading through the code, as a map of where to go to engage with the parts of the model further.
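The four steps above can be sketched as a toy data-flow in NumPy. This is not the real implementation: the shapes, layer count, and single-head attention are hypothetical stand-ins, chosen only to trace how latents cross-attend to inputs, self-attend among themselves, and are then queried back out to sequence length.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_in, kv_in, mask=None):
    # Single-head scaled dot-product attention (stand-in for PerceiverAttention).
    # q_in: (Q, d), kv_in: (K, d), mask: (K,) of 0/1 over the key positions.
    d = q_in.shape[-1]
    scores = q_in @ kv_in.T / np.sqrt(d)          # (Q, K)
    if mask is not None:
        scores = np.where(mask[None, :] == 1, scores, -1e9)
    return softmax(scores) @ kv_in                # (Q, d)

# Toy sizes (hypothetical, just to make the shapes concrete)
seq_len, d_model, n_latents, vocab = 8, 16, 4, 10

# 1. PerceiverTextPreprocessor: token ids -> embeddings
embed = rng.normal(size=(vocab, d_model))
tokens = rng.integers(0, vocab, size=seq_len)
inputs = embed[tokens]                            # (seq_len, d_model)
attn_mask = np.ones(seq_len, dtype=int)

# 2A. Encoder cross-attention: learned latent weights attend to the inputs
latents = rng.normal(size=(n_latents, d_model))
hidden = attention(latents, inputs, attn_mask)    # (n_latents, d_model)

# 2B. Layers of latent self-attention (residual form; 3 layers is arbitrary)
for _ in range(3):
    hidden = hidden + attention(hidden, hidden)

# 3. Decoder cross-attention: output queries attend to the latents
queries = rng.normal(size=(seq_len, d_model))
decoded = attention(queries, hidden)              # (seq_len, d_model)

# 4. Embedding decoder: reuse the embedding matrix to score the vocabulary
logits = decoded @ embed.T                        # (seq_len, vocab)
z = logits - logits.max(-1, keepdims=True)        # stable log-softmax
log_probs = z - np.log(np.exp(z).sum(-1, keepdims=True))
print(log_probs.shape)                            # one distribution per position
```

The point of the structure is visible in the shapes: the self-attention in 2B runs over `n_latents` rather than `seq_len`, which is what makes the Perceiver's cost independent of input length until the decoder queries it back out.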