2 Apr
2022
12:55 p.m.
I'm basically thinking of this: I'd train a tokenizer on the expected data (short sentences and a ton of Logo code), replace the old tokenizer with the new one, and then finetune the model until the whole system behaved the same as before. I'd like to fold more goals into the approach, but it's so satisfying to get through the hard part that it might be fine with just that one for now.
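A rough sketch of the first step, just to pin down what "train a tokenizer around the expected data" means. This is a toy BPE merge learner in plain Python; the sample corpus and the merge count are made up for illustration, and in practice an off-the-shelf trainer would do this over the real data.

```python
from collections import Counter

def pair_counts(words):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in words.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, words):
    # Rewrite every word, fusing each occurrence of the chosen pair.
    merged = {}
    for word, freq in words.items():
        syms = word.split()
        out, i = [], 0
        while i < len(syms):
            if i + 1 < len(syms) and (syms[i], syms[i + 1]) == pair:
                out.append(syms[i] + syms[i + 1])
                i += 2
            else:
                out.append(syms[i])
                i += 1
        key = " ".join(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    # Start from characters, then greedily learn the most frequent merges.
    words = Counter()
    for line in corpus:
        for w in line.split():
            words[" ".join(w)] += 1
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        words = merge_pair(best, words)
        merges.append(best)
    return merges

# Hypothetical corpus: short sentences plus Logo-style commands.
corpus = ["fd 100 rt 90", "fd 50 rt 90", "draw a square"]
merges = train_bpe(corpus, 5)
```

Frequent Logo fragments like `fd` and `rt` surface as early merges, which is the whole point: the learned vocabulary ends up shaped by the code the model will actually see.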