Adapters do indeed support training embeddings; it is a parameter passed when enabling adapter training: https://github.com/adapter-hub/adapter-transformers/pull/245/files#diff-b31f... It looks like the trained embeddings are then neither saved nor used unless additional functions are called to save and reactivate them (rough sketch of that flow below).

Another option would be using the vanilla tokenizer and simply replacing the missing tokens with unique strings. That would keep RAM usage down, but the training would not be quite as powerful since the embeddings are not included, and it would make it hard to process data containing the missing strings (second sketch below).

I'm thinking the vanilla tokenizer might be the way to go for me, to reduce the areas for delay and error. Additionally, the frankensteined tokenizers have begun decreasing their loss :S
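Rough sketch of what I think the full flow looks like. The `train_embeddings` flag is the parameter from the PR above; `add_embeddings`, `set_active_embeddings`, `save_embeddings`, and `load_embeddings` are my assumptions about the extra save/use functions, and the names and paths are only illustrative, so verify against the adapter-transformers docs for your version:

```python
# Rough sketch, not verified: train_embeddings comes from the linked PR,
# the other embedding methods are assumptions to check against the docs.
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("bert-base-uncased")  # classes patched by adapter-transformers
custom_tokenizer = AutoTokenizer.from_pretrained("./my-extended-tokenizer")  # hypothetical path

# register a separate embedding matrix sized for the custom tokenizer and activate it
model.add_embeddings("custom", custom_tokenizer)
model.set_active_embeddings("custom")

# enable adapter training; the flag also unfreezes the active embedding matrix
model.add_adapter("my_task")
model.train_adapter("my_task", train_embeddings=True)

# ... training loop ...

# the adapter is saved as usual, but the trained embeddings need their own call
model.save_adapter("out/my_task", "my_task")
model.save_embeddings("out/custom_embeddings", "custom")

# at inference time the embeddings likewise have to be loaded and activated explicitly
model.load_embeddings("out/custom_embeddings", "custom")
model.set_active_embeddings("custom")
```

And a minimal sketch of the vanilla-tokenizer alternative, with a made-up placeholder mapping just for illustration:

```python
# Minimal sketch: keep the pretrained tokenizer and substitute each missing
# token with a unique string the existing vocab can already represent.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# hypothetical domain tokens the vanilla vocab does not contain
placeholder_map = {
    "<GENE_A>": "gene alpha one",
    "<GENE_B>": "gene beta two",
}

def encode_with_placeholders(text: str):
    # substitute before tokenizing so the missing tokens do not collapse into [UNK]
    for token, placeholder in placeholder_map.items():
        text = text.replace(token, placeholder)
    return tokenizer(text)

encoded = encode_with_placeholders("expression of <GENE_A> tracks <GENE_B>")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```

No new embeddings get learned for the placeholders, which is the weaker-training tradeoff, and anything downstream that needs the original strings has to reverse the substitution.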