PerceiverEncoder: perceiver_encoder
BasicDecoder: basic_decoder
EmbeddingDecoder: embedding_decoder
so 5 objects at the root level that map to huggingface objects in order.
i'll try to instantiate a huggingface model with the same
configuration and compare the names (sketched in code after the
config notes below):
unclear configuration values, unsure how to map:
encoder.z_index_dim=256
encoder.num_z_channels=d_latents
config values that need to be set, starting from the transformers.PerceiverConfig() defaults:
config.qk_channels = 8 * 32
config.v_channels = config.d_latents
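
a rough sketch of that comparison, assuming the transformers classes are
named PerceiverConfig and PerceiverForMaskedLM (the names in huggingface's
perceiver port); it just applies the two overrides above and dumps the
pytorch parameter names and shapes to compare against:

from transformers import PerceiverConfig, PerceiverForMaskedLM

config = PerceiverConfig()
config.qk_channels = 8 * 32           # the two values noted above
config.v_channels = config.d_latents

model = PerceiverForMaskedLM(config)
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))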
in google's example, the decoder is configurable:
output_num_channels = config.d_latents
position_encoding_type = 'trainable'
output_index_dims = config.max_position_embeddings
num_z_channels = config.d_latents
qk_channels = 8*32
v_channels = config.d_model
num_heads = 8
final_project = False
trainable_position_encoding_kwargs=dict(num_channels=config.d_model)
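
the same list restated as a valid python dict (a sketch only: the values
are taken straight from the notes above, expressed via the huggingface
config fields they appear to correspond to; these would roughly be the
kwargs google's example hands to its decoder):

from transformers import PerceiverConfig

config = PerceiverConfig()
# decoder settings from google's example, mapped onto huggingface config fields
decoder_kwargs = dict(
    output_num_channels=config.d_latents,
    position_encoding_type='trainable',
    output_index_dims=config.max_position_embeddings,
    num_z_channels=config.d_latents,
    qk_channels=8 * 32,
    v_channels=config.d_model,
    num_heads=8,
    final_project=False,
    trainable_position_encoding_kwargs=dict(num_channels=config.d_model),
)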
huggingface likely copied google's code, so i may be able to just
instantiate a model and have it look like google's example already
On 1/20/22, k
pip3 install git+https://github.com/deepmind/dm-haiku
import pickle; params = pickle.load(open('language_perceiver_io_bytes.pickle','rb')); params.keys()
the haiku weights look very similar to pytorch weights: a dictionary where each key is a path to a tensor in the model. they likely map 1-to-1 if the architectures are the same.
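
a sketch of flattening that dictionary for a side-by-side with the pytorch
names; this assumes the pickle is a standard nested haiku params dict
({module_path: {param_name: array}}), which is what the keys suggest:

import pickle

with open('language_perceiver_io_bytes.pickle', 'rb') as f:
    haiku_params = pickle.load(f)

# haiku params are nested one level: module path -> {param name: array},
# so flatten to "module_path/param_name" to line up against state_dict() keys
for module_path, entries in haiku_params.items():
    for param_name, array in entries.items():
        print(f'{module_path}/{param_name}', getattr(array, 'shape', None))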
I typed a bunch more but lost it in more spasms. The above text came back when I reopened the last closed tab.
Parameter name guesses from reviewing the apply_perceiver function that defines the model in masked_language_modelling.ipynb (converted to plain text with jupytext, via pip3 install jupytext):
huggingface's TextPreprocessor: 'embed' and 'trainable_position_encoding'