[spam] [personal] perceiver model notes

k gmkarl at gmail.com
Tue Jan 18 06:48:24 PST 2022


I typed Step 2 out almost fully here, but the browser window went away
and the text disappeared.

Anyway, in Step 2 the input data Embedding Vectors are fed into
PerceiverEncoder as "inputs".

PerceiverEncoder combines them, via "cross attention", with the passed
"hidden states", which appear to be the "embeddings" property of
PerceiverModel (left trainable in PerceiverForMaskedLM), and then
passes the result through a sequence of "self attention" layers.
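
For concreteness, here is a minimal PyTorch sketch of that pattern. It
is not the HuggingFace implementation (the real PerceiverEncoder also
has residual connections, layer norms, and MLP blocks); the class name
and all of the sizes below are illustrative assumptions.

import torch
import torch.nn as nn

class TinyPerceiverEncoder(nn.Module):
    def __init__(self, d_inputs=768, d_latents=1280, num_latents=256,
                 num_self_attends=8, num_heads=8):
        super().__init__()
        # Trainable latent array: the analogue of PerceiverModel's
        # "embeddings" property, learned during training.
        self.latents = nn.Parameter(torch.randn(num_latents, d_latents))
        # Cross attention: latents are the queries, inputs the keys/values.
        self.cross_attention = nn.MultiheadAttention(
            d_latents, num_heads, kdim=d_inputs, vdim=d_inputs,
            batch_first=True)
        # Stack of self attention layers over the latents.
        self.self_attends = nn.ModuleList(
            nn.MultiheadAttention(d_latents, num_heads, batch_first=True)
            for _ in range(num_self_attends))

    def forward(self, inputs, attention_mask=None):
        # inputs: (batch, seq_len, d_inputs)
        # attention_mask: (batch, seq_len), 1 = real token, 0 = padding
        hidden_states = self.latents.expand(inputs.shape[0], -1, -1)
        # key_padding_mask is True where input positions are ignored.
        mask = (attention_mask == 0) if attention_mask is not None else None
        hidden_states, _ = self.cross_attention(
            hidden_states, inputs, inputs, key_padding_mask=mask)
        for layer in self.self_attends:
            hidden_states, _ = layer(
                hidden_states, hidden_states, hidden_states)
        return hidden_states  # (batch, num_latents, d_latents)

Because the self attention runs over the fixed-size latent array rather
than the full input sequence, its cost does not grow with input length;
that is the point of the design.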

Step 2: Encoding

Embedding vectors + attention mask -> PerceiverEncoder.cross_attention
-> loop: [hidden states -> PerceiverEncoder.self_attends -> hidden
states] -> hidden states
Hidden states are just tensors, i.e. n-dimensional arrays of numbers.
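
To make that concrete (the sizes here are arbitrary placeholders, not
the model's real configuration):

import torch
hidden_states = torch.zeros(2, 256, 1280)  # (batch, num_latents, d_latents)
print(hidden_states.ndim)   # 3
print(hidden_states.shape)  # torch.Size([2, 256, 1280])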

Simplification:

Masked embedding vectors -> PerceiverEncoder Cross Attention with
Embedding parameters -> PerceiverEncoder Self Attention stacks ->
Encoded hidden states

But really the masking is applied inside PerceiverEncoder: the
attention mask travels in alongside the inputs rather than being
applied to the vectors beforehand.
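
To see the step end to end, here is a usage sketch with the HuggingFace
classes. It assumes the deepmind/language-perceiver checkpoint and
picks bytes 52:61 as an arbitrary span to mask; both are assumptions on
my part, not something established above.

import torch
from transformers import PerceiverTokenizer, PerceiverForMaskedLM

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")

text = "This is an incomplete sentence where some words are missing."
inputs = tokenizer(text, padding="max_length", return_tensors="pt")
# Overwrite a span of byte-level tokens with the mask token. The
# attention mask produced by the tokenizer rides along in `inputs`
# and is applied inside PerceiverEncoder, as noted above.
inputs["input_ids"][0, 52:61] = tokenizer.mask_token_id

with torch.no_grad():
    outputs = model(**inputs)

predicted = outputs.logits[0, 52:61].argmax(dim=-1)
print(tokenizer.decode(predicted))

If I am reading the library right, the trainable latents discussed
earlier live on the underlying PerceiverModel, at
model.perceiver.embeddings.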

