[spam] [personal] perceiver model notes
k
gmkarl at gmail.com
Tue Jan 18 07:40:39 PST 2022
Figuring out Step 4: Embedding Decoding
logits = self.embedding_decoder(outputs,
    embedding_layer=perceiver.input_preprocessor.embeddings)
The .embeddings property of PerceiverTextPreprocessor (the
input_preprocessor) is the embedding matrix that maps tokens or bytes
to embedding vectors, without the position data.
# PerceiverEmbeddingDecoder.forward:
def forward(self, hidden_states, embedding_layer):
    batch_size, seq_len, d_model = hidden_states.shape
    # Flatten the batch dimension before the matmul
    output = torch.matmul(hidden_states.reshape([-1, d_model]),
                          embedding_layer.weight.T)
    output = output + self.bias
    return output.reshape([batch_size, seq_len, self.vocab_size])
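As a standalone sketch (not the Hugging Face class itself; the toy
sizes and the minimal decoder class below are made up for
illustration), the same tied-weight decoding can be exercised like
this:

```python
import torch
import torch.nn as nn

class ToyEmbeddingDecoder(nn.Module):
    """Minimal stand-in for PerceiverEmbeddingDecoder: reuses an
    embedding matrix, transposed, to map hidden states to logits."""
    def __init__(self, vocab_size):
        super().__init__()
        self.vocab_size = vocab_size
        self.bias = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, hidden_states, embedding_layer):
        batch_size, seq_len, d_model = hidden_states.shape
        output = torch.matmul(hidden_states.reshape([-1, d_model]),
                              embedding_layer.weight.T)
        output = output + self.bias
        return output.reshape([batch_size, seq_len, self.vocab_size])

vocab_size, d_model = 262, 768            # assumed byte-level sizes
embeddings = nn.Embedding(vocab_size, d_model)
decoder = ToyEmbeddingDecoder(vocab_size)
hidden = torch.randn(2, 5, d_model)       # (batch, seq, d_model)
logits = decoder(hidden, embeddings)
print(logits.shape)                       # torch.Size([2, 5, 262])
```

The decoder has no weight matrix of its own beyond the bias; it
borrows the preprocessor's embedding table, which is the weight-tying
trick.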
Basically, the embedding decoder multiplies its input by the transpose
of the embeddings and adds a trainable bias.
I was curious how transposing the embedding weights undoes their
indexing property: multiplying a hidden state by the transposed
embedding matrix takes its dot product with every token's embedding
vector, so each logit scores how similar the output is to that
token's embedding. A softmax over those similarity scores then gives
the log probability interpretation.
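To see the indexing-undoing concretely (a toy construction, not the
trained model): if the embedding rows are forced to be orthonormal,
decoding a token's own embedding puts the largest logit back at that
token's index.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 8, 16               # tiny assumed sizes
emb = nn.Embedding(vocab_size, d_model)
# Force orthonormal rows so the dot products are exactly one-hot
with torch.no_grad():
    emb.weight.copy_(torch.eye(vocab_size, d_model))

tokens = torch.arange(vocab_size)
hidden = emb(tokens)                      # embed each token once
logits = hidden @ emb.weight.T            # dot product with every embedding
recovered = logits.argmax(dim=-1)         # highest similarity wins
print(torch.equal(recovered, tokens))     # True
```

With trained (non-orthogonal) embeddings the recovery is approximate
rather than exact, but the same dot-product-similarity reading holds.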
Step 4: Postprocessing
Decoder outputs -> EmbeddingDecoder -> Matrix of log probability
vectors for each possible output token
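That last step can be sketched under assumed toy shapes: a log-softmax
turns the logit matrix into per-position log probabilities, and greedy
decoding just takes the argmax at each position.

```python
import torch

batch_size, seq_len, vocab_size = 1, 4, 262   # assumed toy shapes
logits = torch.randn(batch_size, seq_len, vocab_size)

log_probs = torch.log_softmax(logits, dim=-1)  # log probability per token
token_ids = log_probs.argmax(dim=-1)           # greedy pick per position
print(token_ids.shape)                         # torch.Size([1, 4])
# In probability space each position's distribution sums to 1
print(torch.allclose(log_probs.exp().sum(-1),
                     torch.ones(batch_size, seq_len)))
```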
More information about the cypherpunks mailing list