[spam] [personal] perceiver model notes

Thu Jan 6 07:24:15 PST 2022

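- perceiver was made by deepmind (part of google); it's a specific attention
tweak to the transformer, i think: a small latent array cross-attends to
the inputs, instead of every input attending to every other input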
- perceiver is now in huggingface in src/transformers/models/perceiver
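
a rough sketch of that cross-attention idea, to make the shapes concrete.
this is not the huggingface code: the function names are made up, and the
learned query/key/value projections are left out as a simplifying
assumption. the point is only that the output has one row per latent, no
matter how many inputs there are.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, inputs):
    """Each latent row (query) takes a weighted average of the input rows.

    Assumption: no learned projections, so queries, keys and values are
    just the raw rows. A real model would project all three first.
    """
    scale = 1.0 / math.sqrt(len(latents[0]))
    output = []
    for query in latents:
        # One similarity score per input row
        scores = [scale * sum(q * k for q, k in zip(query, row)) for row in inputs]
        weights = softmax(scores)
        # Weighted average of the input rows, channel by channel
        mixed = [sum(w * row[c] for w, row in zip(weights, inputs))
                 for c in range(len(inputs[0]))]
        output.append(mixed)
    return output


# 6 input elements squeezed into 2 latent slots, 3 channels each
inputs = [[float(i), 1.0, 0.0] for i in range(6)]
latents = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
summary = cross_attend(latents, inputs)
print(len(summary), len(summary[0]))  # prints: 2 3
```

the cost grows with (number of latents) x (number of inputs) rather than
(number of inputs) squared, which is why large inputs become affordable.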

some of the classes: decoders (the second half of the pipeline), plus
position encodings, preprocessors and postprocessors

PerceiverAbstractDecoder: decoder base class
PerceiverMultimodalDecoder: decodes model output into several
modalities at once
PerceiverBasicVideoAutoencodingDecoder: for models that produce video
PerceiverOpticalFlowDecoder: for models that produce optical flow information
PerceiverClassificationDecoder: for models that produce labels from a fixed set
PerceiverBasicDecoder: a basic decoder for producing data, using cross attention
PerceiverProjectionDecoder: a decoder that does not use cross attention

Conv2DDownsample: downsamples data 4x using a padded torch.nn.Conv2d
followed by max pooling
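
a quick check of where a 4x factor can come from: two stride-2 stages in
a row, each halving the size. the kernel sizes and paddings below are
assumptions picked for illustration, not values read out of the
huggingface source. conv_out is a made-up helper applying the usual
output-size formula for convolution and pooling windows.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


height = 224
# Assumed stage 1: 7x7 convolution, stride 2, padding 3
after_conv = conv_out(height, kernel=7, stride=2, padding=3)
# Assumed stage 2: 3x3 max pool, stride 2, padding 1
after_pool = conv_out(after_conv, kernel=3, stride=2, padding=1)
print(height, after_conv, after_pool)  # prints: 224 112 56
```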

PerceiverAbstractPositionEncoding: base class for position encoding
PerceiverTrainablePositionEncoding: position encoding that is trained
PerceiverFourierPositionEncoding: position encoding that produces
normal (fourier sinusoid) position embeddings based on each element's
position in the input
people say that without position embeddings, the position (index) at
which data is received is not used as information, since attention on
its own ignores input order
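
a minimal sketch of fourier position features, assuming a 1-d input,
positions scaled to [-1, 1], and frequency bands spaced linearly from 1
up to the nyquist frequency. the function name and the band spacing are
my assumptions, not the huggingface implementation.

```python
import math


def fourier_position_features(num_positions, num_bands):
    """One feature row per position: [pos, sin terms..., cos terms...]."""
    max_freq = num_positions / 2.0  # Nyquist limit for this many samples
    step = (max_freq - 1.0) / max(num_bands - 1, 1)
    freqs = [1.0 + i * step for i in range(num_bands)]
    rows = []
    for i in range(num_positions):
        pos = -1.0 + 2.0 * i / (num_positions - 1)  # scaled to [-1, 1]
        sines = [math.sin(math.pi * f * pos) for f in freqs]
        cosines = [math.cos(math.pi * f * pos) for f in freqs]
        rows.append([pos] + sines + cosines)
    return rows


features = fourier_position_features(num_positions=8, num_bands=4)
print(len(features), len(features[0]))  # prints: 8 9
```

every position gets a different row, so concatenating these features
onto the data gives attention something to tell the positions apart.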

AbstractPreprocessor: preprocessor base class
PerceiverTextPreprocessor: an embedding encoder for perceiver
PerceiverEmbeddingDecoder: an embedding decoder for perceiver
embeddings are matrices that convert between integer ids (words) and
n-dimensional vectors (points in meaning space)
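
a toy version of both directions, with a made-up 4-entry vocabulary.
the encoder direction is a row lookup; the decoder direction scores a
vector against every row and picks the best match. reusing the same
matrix for decoding is an assumption of this sketch (i believe the
embedding decoder shares the matrix, but have not confirmed it).

```python
# Hypothetical 4-token vocabulary with 3-dimensional embeddings
embedding = [
    [1.0, 0.0, 0.0],  # id 0
    [0.0, 1.0, 0.0],  # id 1
    [0.0, 0.0, 1.0],  # id 2
    [0.7, 0.7, 0.0],  # id 3
]


def embed(token_ids):
    """Preprocessor direction: integer ids -> vectors (a row lookup)."""
    return [embedding[i] for i in token_ids]


def decode(vectors):
    """Decoder direction: vectors -> ids, via dot product with every row."""
    ids = []
    for vec in vectors:
        scores = [sum(v * e for v, e in zip(vec, row)) for row in embedding]
        ids.append(scores.index(max(scores)))
    return ids


vectors = embed([2, 0, 3])
print(decode(vectors))  # prints: [2, 0, 3]
```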

PerceiverMultimodalPreprocessor: converts many kinds of data to a
single group of input
PerceiverAudioPreprocessor: converts audio to transformer input
PerceiverOneHotPreprocessor: adds dummy index dimension to input
PerceiverImagePreprocessor: converts image to transformer input,
performs significant transformation
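
what "adds dummy index dimension" means in shape terms, assuming the
model wants input shaped (batch, index, channels): a flat per-example
vector gets an index axis of length 1. nested lists stand in for
tensors here, and add_index_dim is a made-up helper name.

```python
def add_index_dim(batch):
    """(batch, channels) -> (batch, 1, channels), using nested lists."""
    return [[row] for row in batch]


one_hot = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # 2 examples, 3 classes
with_index = add_index_dim(one_hot)
print(len(with_index), len(with_index[0]), len(with_index[0][0]))  # prints: 2 1 3
```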

PerceiverMultimodalPostprocessor: splits decoder output back out by
modality and runs a postprocessor on each
PerceiverProjectionPostprocessor: uses a linear projection to reduce
the channel dimension [training is meant to limit information loss]
PerceiverAudioPostprocessor: linear projection of decoder output to
audio features
PerceiverClassificationPostprocessor: linear projection of decoder
output to classification scores over a set of labels
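
a sketch of the projection-style postprocessing above, assuming it boils
down to one linear layer over the channel dimension. the weights, label
names and the linear helper are invented for illustration; a trained
model would learn the weights.

```python
def linear(vector, weights, bias):
    """Plain linear layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


labels = ["cat", "dog", "bird"]  # hypothetical label set
weights = [  # 3 labels x 4 decoder channels, made-up values
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
]
bias = [0.0, 0.0, 0.0]

decoder_output = [0.2, -1.0, 0.5, 2.0]  # one example, 4 channels
scores = linear(decoder_output, weights, bias)
predicted = labels[scores.index(max(scores))]
print(predicted)  # prints: dog
```

the same shape of operation, with a different output width, covers the
projection and audio cases: many channels in, fewer (or differently
sized) channels out.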