## Figuring out Step 4: Embedding Decoding

```python
logits = self.embedding_decoder(outputs, embedding_layer=perceiver.input_preprocessor.embeddings)
```

The `.embeddings` property of `PerceiverTextPreprocessor` (the `input_preprocessor`) is the embedding matrix that maps tokens or bytes to embedding vectors, without the position data.

```python
# PerceiverEmbeddingDecoder.forward:
def forward(self, hidden_states, embedding_layer):
    batch_size, seq_len, d_model = hidden_states.shape
    # Flatten the batch dimension so a single matmul covers every position
    output = torch.matmul(hidden_states.reshape([-1, d_model]), embedding_layer.weight.T)
    output = output + self.bias
    return output.reshape([batch_size, seq_len, self.vocab_size])
```

Basically, the embedding decoder multiplies its input by the transpose of the embedding matrix and adds a trainable bias. At first it seems strange that transposing the embedding weights "undoes" their indexing property, but the math works out: `E` has shape `(vocab_size, d_model)`, so multiplying a hidden state by `E.T` takes its dot product with every token's embedding at once. Each entry of the result scores how closely the decoder output matches one token's embedding, and those scores are the logits (sketched below).

## Step 4: Postprocessing

Decoder outputs -> EmbeddingDecoder -> matrix of logit vectors, one per output position, with a score for each possible output token
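To convince myself that decoding with `E.T` really inverts the embedding lookup, here's a minimal self-contained sketch. It is not the HuggingFace module itself: it pretends the decoder output is exactly a token's embedding, uses a zero bias, and the sizes are just for illustration (262 is the Perceiver language model's byte-level vocab size; `d_model` here is arbitrary).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, d_model = 262, 128  # illustrative sizes

embedding = nn.Embedding(vocab_size, d_model)  # E: (vocab_size, d_model), the "indexing" direction
bias = torch.zeros(vocab_size)                 # stand-in for the decoder's trained bias

token_ids = torch.tensor([[5, 42, 7]])         # (batch=1, seq_len=3)
hidden_states = embedding(token_ids)           # lookup: rows 5, 42, 7 of E

# Decode the same way PerceiverEmbeddingDecoder does: multiply by E^T, add bias.
batch_size, seq_len, _ = hidden_states.shape
logits = hidden_states.reshape(-1, d_model) @ embedding.weight.T + bias
logits = logits.reshape(batch_size, seq_len, vocab_size)

# Each logit is a dot product <hidden_state, E[token]>. A random vector's
# largest dot product is (with overwhelming probability) with itself, so
# argmax over the vocab dimension should recover the original token ids.
print(logits.argmax(dim=-1))  # expected: tensor([[ 5, 42,  7]])
```

So the transpose doesn't literally invert the lookup; it turns it into a similarity search over the whole vocabulary, which in the trained model is exactly what you want the logits to be.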
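One nuance on naming: the values coming out of the embedding decoder are raw logits, i.e. unnormalized scores, not log-probabilities yet. A `log_softmax` over the vocab dimension turns them into actual log-probabilities; a quick sketch with stand-in values:

```python
import torch

# Stand-in logits with the decoder's output shape: (batch, seq_len, vocab_size)
logits = torch.randn(1, 3, 262)

log_probs = torch.log_softmax(logits, dim=-1)  # proper log-probabilities per position
predicted_ids = log_probs.argmax(dim=-1)       # greedy decode (same argmax as on raw logits)
print(predicted_ids.shape)                     # torch.Size([1, 3])
```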