let's check out perceiver's masked language modeling architecture just a little bit: PerceiverForMaskedLM (in github.com/huggingface/transformers, file src/transformers/models/perceiver/modeling_perceiver.py). i'm skipping down to the implementation of the forward() function, as this gives a quick description of what the model does with its parts. summary in pseudocode:

    outputs = self.perceiver(inputs)
    logits = self.embedding_decoder(<part of outputs selected by the return_dict flag>)
    loss = CrossEntropyLoss()(logits, labels)   # only when labels are given
    return logits, loss

ok so it basically has a 'perceiver' member that does the bulk of the work, and an 'embedding_decoder' member that postprocesses the data. so i flip up to __init__, where these objects are created:

    self.perceiver = PerceiverModel(
        input_preprocessor=PerceiverTextPreprocessor(...),
        decoder=PerceiverBasicDecoder(<lots of dimension information>),
    )
    self.embedding_decoder = PerceiverEmbeddingDecoder(...)

so it looks like there are 4 parts the information likely flows through:

1. Maybe PerceiverTextPreprocessor preprocesses the input.
2. Maybe PerceiverModel processes the preprocessed data.
3. Maybe PerceiverBasicDecoder post-processes the data.
4. Finally, PerceiverEmbeddingDecoder likely converts the high-dimensional outputs to simple byte probabilities.

next is to look at PerceiverModel's forward() function to see how important the non-parameterised behavior of the class is.
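but first, a quick sanity check on that part list. instantiating the model from a default config and printing the submodule types confirms the wiring (a minimal sketch; the attribute names are the ones visible in modeling_perceiver.py, and the default PerceiverConfig is just a convenient stand-in for a real checkpoint):

    from transformers import PerceiverConfig, PerceiverForMaskedLM

    # build an untrained model from the default config, purely to inspect structure
    model = PerceiverForMaskedLM(PerceiverConfig())

    # the 'perceiver' member holds the preprocessor and decoder passed in __init__
    print(type(model.perceiver.input_preprocessor).__name__)  # PerceiverTextPreprocessor
    print(type(model.perceiver.decoder).__name__)             # PerceiverBasicDecoder
    print(type(model.embedding_decoder).__name__)             # PerceiverEmbeddingDecoder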
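and to see the byte-level data flow end to end, here's a usage sketch along the lines of the library's documented example (the deepmind/language-perceiver checkpoint is real; the 52:61 byte offsets for the masked span come from that example and are specific to this exact sentence):

    import torch
    from transformers import PerceiverTokenizer, PerceiverForMaskedLM

    tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
    model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")

    text = "This is an incomplete sentence where some words are missing."
    encoding = tokenizer(text, padding="max_length", return_tensors="pt")

    # the tokenizer works on raw utf-8 bytes, so masking means overwriting
    # a span of byte positions (52:61 covers " missing." here)
    encoding["input_ids"][0, 52:61] = tokenizer.mask_token_id

    with torch.no_grad():
        outputs = model(**encoding)

    # logits: (batch, seq_len, 262) -- per-position scores over bytes + specials
    predictions = outputs.logits[0, 52:61].argmax(dim=-1)
    print(tokenizer.decode(predictions))  # expected: " missing."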