1 Jan
2022
10:35 p.m.
i'm looking at https://github.com/huggingface/transformers/blob/master/examples/flax/summar... , which is the flax example for a summarization task, and noting that the decoder input ids are the labels shifted by one. i'm thinking that summarization is basically the same as translation: both are seq2seq. don't really know. here's their flax loss function: https://github.com/huggingface/transformers/blob/master/examples/flax/summar... and at https://github.com/huggingface/transformers/blob/master/examples/flax/summar... i'm thinking that the labels are _not_ passed to the model (they get removed from the batch with 'pop'), which lines up with not seeing a labels parameter in the source code for the flax model
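
to make the pattern concrete for myself, here's a rough sketch in plain python (not the actual transformers code). everything in it is an assumption for illustration: `toy_model` is a made-up stand-in that returns uniform logits, the token ids 0 (pad / decoder start) and 1 (eos) are invented, and `shift_tokens_right` here is my own simplified version. it only shows the two ideas above: decoder inputs are the labels shifted by one, and the labels are popped from the batch so the model never sees them, with the loss computed outside the model.

```python
import math

def shift_tokens_right(labels, decoder_start_token_id):
    # Decoder input at step t is the label from step t-1; step 0 gets the start token.
    return [[decoder_start_token_id] + list(row[:-1]) for row in labels]

def toy_model(input_ids, decoder_input_ids, vocab_size=4):
    # Hypothetical stand-in for a seq2seq model: note there is no `labels` parameter.
    # Returns uniform logits shaped [batch][target_len][vocab_size].
    return [[[0.0] * vocab_size for _ in row] for row in decoder_input_ids]

def cross_entropy(logits, labels, pad_token_id):
    # Mean negative log-likelihood over non-padding target positions.
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for token_logits, label in zip(row_logits, row_labels):
            if label == pad_token_id:
                continue  # padding does not contribute to the loss
            m = max(token_logits)
            log_z = m + math.log(sum(math.exp(x - m) for x in token_logits))
            total += log_z - token_logits[label]
            count += 1
    return total / count

def loss_step(batch, pad_token_id=0):
    # Labels are removed from the batch, so the model call never receives them.
    labels = batch.pop("labels")
    logits = toy_model(**batch)
    return cross_entropy(logits, labels, pad_token_id)

labels = [[2, 3, 1, 0]]  # assumed ids: 1 = eos, 0 = pad
batch = {
    "input_ids": [[3, 2, 3, 1]],
    "decoder_input_ids": shift_tokens_right(labels, decoder_start_token_id=0),
    "labels": labels,
}
loss = loss_step(batch)
```

if the labels were left in the batch, `toy_model(**batch)` would raise a TypeError for the unexpected keyword, which is the same reason the pop is needed when the model's call signature has no labels parameter.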