[spam] [personal] perceiver model notes

k gmkarl at gmail.com
Tue Jan 18 07:40:39 PST 2022


Figuring out Step 4: Embedding Decoding

logits = self.embedding_decoder(
    outputs, embedding_layer=perceiver.input_preprocessor.embeddings)

The .embeddings property of PerceiverTextPreprocessor (the
input_preprocessor) is the matrix mapping tokens or bytes to embedding
vectors, without the position data.
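As a sanity check on that claim, here's a minimal sketch of what such an
embedding layer looks like (the sizes are made up for illustration; the
real model's vocabulary and width differ):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab_size, d_model = 262, 768

embedding_layer = nn.Embedding(vocab_size, d_model)

# .weight is the (vocab_size, d_model) lookup matrix: row i is the
# embedding vector for token id i -- no position information here.
assert embedding_layer.weight.shape == (vocab_size, d_model)

# Indexing with token ids selects the corresponding rows.
token_ids = torch.tensor([[3, 7, 42]])
embedded = embedding_layer(token_ids)
assert embedded.shape == (1, 3, d_model)
```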

# PerceiverEmbeddingDecoder.forward:
    def forward(self, hidden_states, embedding_layer):
        batch_size, seq_len, d_model = hidden_states.shape
        output = torch.matmul(
            hidden_states.reshape([-1, d_model]),
            embedding_layer.weight.T)  # Flatten batch dim
        output = output + self.bias

        return output.reshape([batch_size, seq_len, self.vocab_size])

Basically, the embedding decoder multiplies its input by the transpose
of the embeddings and adds a trainable bias.
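The same computation as that forward method, reproduced standalone with
small made-up tensors, shows that each output logit is just the dot
product of a hidden state with one token's embedding row:

```python
import torch

torch.manual_seed(0)
vocab_size, d_model = 5, 4          # toy sizes for illustration
weight = torch.randn(vocab_size, d_model)   # stand-in embedding matrix
hidden = torch.randn(2, 3, d_model)         # stand-in decoder outputs
bias = torch.zeros(vocab_size)              # trainable bias in the real model

# Same math as PerceiverEmbeddingDecoder.forward:
logits = torch.matmul(hidden.reshape(-1, d_model), weight.T) + bias
logits = logits.reshape(2, 3, vocab_size)

# logits[b, s, t] is the dot product of hidden[b, s] with weight[t],
# i.e. a similarity score against token t's embedding vector.
assert torch.allclose(logits[0, 1, 2], hidden[0, 1] @ weight[2])
```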

I was curious how transposing the embedding weights undoes their
indexing property.  The transpose turns the (vocab_size, d_model)
lookup table into a (d_model, vocab_size) projection, so each output
entry becomes the dot product of the hidden state with one token's
embedding vector, a similarity score that serves as that token's logit.

Step 4: Postprocessing

Decoder outputs -> EmbeddingDecoder -> Matrix of logits (unnormalized
log probabilities) for each possible output token
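A minimal sketch of that last step, turning the logit matrix into
normalized log probabilities and greedy token ids (sizes made up;
torch's log_softmax stands in for whatever postprocessing the model
actually applies):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, seq_len, vocab_size = 2, 3, 5   # toy sizes
logits = torch.randn(batch_size, seq_len, vocab_size)

# log_softmax normalizes each position's logits into log probabilities,
# so exponentiating and summing over the vocab gives 1 everywhere.
log_probs = F.log_softmax(logits, dim=-1)
assert torch.allclose(log_probs.exp().sum(dim=-1),
                      torch.ones(batch_size, seq_len))

# Greedy decoding: pick the highest-scoring token at each position.
token_ids = logits.argmax(dim=-1)
assert token_ids.shape == (batch_size, seq_len)
```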
