- perceiver was made by google (deepmind); it's a specific attention tweak to encoder/decoder transformer, i think
- perceiver is now in huggingface in src/transformers/models/perceiver
- some of the classes, for the second half of the pipeline, also preprocessors
  - PerceiverAbstractDecoder: decoder base class
  - PerceiverMultimodalDecoder: decodes model results into many forms of simultaneous results
  - PerceiverBasicVideoAutoencodingDecoder: for models that produce video
  - PerceiverOpticalFlowDecoder: for models that produce optical flow information
  - PerceiverClassificationDecoder: for models that produce labels from a fixed set
  - PerceiverBasicDecoder: a basic decoder for producing data, using cross attention
  - PerceiverProjectionDecoder: a decoder that does not use cross attention
  - Conv2DDownsample: downsamples data 4x using torch.nn.Conv2d and padding
  - PerceiverAbstractPositionEncoding: base class for position encoding
  - PerceiverTrainablePositionEncoding: position encoding that is trained
  - PerceiverFourierPositionEncoding: position encoding that produces normal (fourier sinusoid) position embeddings based on channel
    - people say that without position embeddings, the channel on which data is received is not used as information
  - AbstractPreprocessor: preprocessor base class
  - PerceiverTextPreprocessor: an embedding encoder for perceiver
  - PerceiverEmbeddingDecoder: an embedding decoder for perceiver
    - embeddings are matrices that convert between integer ids (words) and n-dimensional vectors (points in meaning space)
  - PerceiverMultimodalPreprocessor: converts many kinds of data to a single group of input
  - PerceiverAudioPreprocessor: converts audio to transformer input
  - PerceiverOneHotPreprocessor: adds dummy index dimension to input
  - PerceiverImagePreprocessor: converts image to transformer input, performs significant transformation
  - PerceiverMultimodalPostprocessor: unconverts data into different kinds of postprocessed data
  - PerceiverProjectionPostprocessor: uses linear combination to downsample data [training prevents information loss]
  - PerceiverAudioPostprocessor: downsampling for audio features
  - PerceiverClassificationPostprocessor: downsampling for classification log probs to a set of labels
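
a rough sketch of the "embeddings are matrices" idea, in plain python so it runs anywhere. this is not the huggingface code: the vocab, the matrix values and the function names `encode` / `decode` are all made up for illustration. encoding is a row lookup; decoding scores a vector against every row with a dot product and keeps the best id. as far as i can tell the real embedding decoder also scores against the embedding weights, but check the source before relying on that.

```python
# toy embedding matrix: one row per token id, one column per dimension.
# the vocab and every number here are hypothetical, chosen so that each
# row has the same length and therefore scores highest against itself.
VOCAB = ["cat", "dog", "car", "bus"]
EMBEDDING = [
    [1.0, 0.2, 0.0],   # id 0: cat
    [0.2, 1.0, 0.0],   # id 1: dog
    [0.0, 0.2, 1.0],   # id 2: car
    [-1.0, 0.0, 0.2],  # id 3: bus
]


def encode(token_ids):
    """ids -> vectors: pick the matrix row for each id."""
    return [EMBEDDING[i] for i in token_ids]


def decode(vectors):
    """vectors -> ids: dot each vector with every row, keep the argmax."""
    ids = []
    for v in vectors:
        scores = [sum(a * b for a, b in zip(v, row)) for row in EMBEDDING]
        ids.append(max(range(len(scores)), key=scores.__getitem__))
    return ids


if __name__ == "__main__":
    ids = [2, 0, 3, 1]
    print([VOCAB[i] for i in decode(encode(ids))])
```

in a trained model the rows are learned, so nearby rows end up being related words; the lookup and the dot-product scoring stay the same.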
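
a minimal sketch of what fourier (sinusoid) position features look like, again in plain python and not the library's implementation. assumptions: "channel" is taken to mean the index of each input element, positions are rescaled to [-1, 1], frequencies are spaced linearly from 1 to `max_freq`, and the raw position is kept next to the sin/cos values. `fourier_features`, `encode_positions`, `num_bands` and `max_freq` are names invented for this sketch.

```python
import math


def fourier_features(position, num_bands, max_freq):
    """sin/cos features for one scalar position in [-1, 1].

    the linear frequency spacing and the pi scaling are assumptions
    of this sketch, not something the notes above specify.
    """
    if num_bands == 1:
        freqs = [1.0]
    else:
        step = (max_freq - 1.0) / (num_bands - 1)
        freqs = [1.0 + k * step for k in range(num_bands)]
    sines = [math.sin(math.pi * f * position) for f in freqs]
    cosines = [math.cos(math.pi * f * position) for f in freqs]
    # raw position first, then the periodic features
    return [position] + sines + cosines


def encode_positions(length, num_bands=4, max_freq=8.0):
    """one feature vector per index of a 1-d input of the given length."""
    if length == 1:
        positions = [0.0]
    else:
        positions = [-1.0 + 2.0 * i / (length - 1) for i in range(length)]
    return [fourier_features(p, num_bands, max_freq) for p in positions]


if __name__ == "__main__":
    for row in encode_positions(5):
        print([round(x, 3) for x in row])
```

each index gets a different vector, and those vectors travel along with the data. that is the sense in which, without position embeddings, where the data came from is not available as information.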