----

I think I can roughly summarise that even if it is, it's not going to be a big return to learn it, since the other model parts would train in a different way. I'm now interested in brainstorming another approach: something to replace the tokenizer. We could use the model's existing output to train the replacement, and then finetune it toward the desired result. Time to move to the other thread; I've overstayed this one a little.

---

Editing the above for the new thread: I think I can roughly summarise that even if the output tokenizer is trainable, it's not going to be a big return to learn it, since the other model parts would train in a different way. I'm now interested in brainstorming another approach: something to replace the tokenizer. We could use the model's existing output to train the replacement, and then finetune it toward the desired result.

---
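Roughly, the two stages I have in mind might look like the sketch below: a small replacement module first trained to match the frozen model's existing output distribution, then finetuned toward the desired result. This is only a minimal sketch under those assumptions; `ReplacementHead`, `distill_step`, `finetune_step`, and all shapes are hypothetical names for illustration, not anything decided yet.

```python
# Minimal sketch (assumptions): the "other approach" is a small replacement
# module trained on the model's existing output (stage 1: distillation),
# then finetuned toward the desired result (stage 2). All names/shapes are
# hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReplacementHead(nn.Module):
    """Hypothetical stand-in for whatever replaces the tokenizer/output head."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.net(hidden_states)

def distill_step(replacement, teacher_logits, hidden_states, optimizer, temperature=2.0):
    """Stage 1: train the replacement to reproduce the frozen model's existing output."""
    student_logits = replacement(hidden_states)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def finetune_step(replacement, hidden_states, target_ids, optimizer):
    """Stage 2: finetune the replacement toward the desired result."""
    logits = replacement(hidden_states)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    hidden_dim, vocab_size = 64, 256
    replacement = ReplacementHead(hidden_dim, vocab_size)
    opt = torch.optim.AdamW(replacement.parameters(), lr=1e-4)
    # Fake stand-ins for the frozen model's hidden states and output logits.
    h = torch.randn(4, 16, hidden_dim)
    teacher = torch.randn(4, 16, vocab_size)
    targets = torch.randint(0, vocab_size, (4, 16))
    print("distill loss:", distill_step(replacement, teacher, h, opt))
    print("finetune loss:", finetune_step(replacement, h, targets, opt))
```

---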