Re: [crazy][hobby][spam] Automated Reverse Engineering

1 Jan 2022

i'm looking at
https://github.com/huggingface/transformers/blob/master/examples/flax/summar...
which is the flax example for the summarization task, and noting that the
decoder input ids are the labels shifted right by one position.  i'm
thinking that summarization is basically the same kind of problem as
translation: seq2seq.  not sure, though.
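
to make the "shifted by one" part concrete, here's a rough sketch in plain
python.  the function name, the token ids, and the choice of pad/start ids
are all made up for illustration; this is not copied from the linked script.

```python
def shift_tokens_right(labels, decoder_start_token_id):
    """Build decoder inputs by shifting each label row one position right.

    Hypothetical helper for illustration: the start token is prepended and
    the last label is dropped, so position t of the decoder input holds the
    label from position t - 1.
    """
    shifted = []
    for row in labels:
        shifted.append([decoder_start_token_id] + list(row[:-1]))
    return shifted


# toy batch: two label sequences of length 4 (ids are arbitrary)
labels = [[11, 12, 13, 2], [21, 22, 2, 0]]
decoder_input_ids = shift_tokens_right(labels, decoder_start_token_id=2)
```

so at each step the decoder sees the previous target token and is trained to
predict the current one, which would be the same setup for translation.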

here's their flax loss function:
https://github.com/huggingface/transformers/blob/master/examples/flax/summar...
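
my rough mental model of a seq2seq loss like this, sketched in plain python:
per-token cross-entropy over the vocabulary, averaged over non-padding
positions only.  this is an assumption about the general shape, not a copy of
the linked function (which may do more, e.g. label smoothing), and the name
masked_cross_entropy is made up.

```python
import math


def masked_cross_entropy(logits, labels, padding_mask):
    """Mean token-level cross-entropy, ignoring positions where mask is 0.

    logits:       [batch][seq][vocab] floats
    labels:       [batch][seq] integer token ids
    padding_mask: [batch][seq] 1 for real tokens, 0 for padding
    """
    total, count = 0.0, 0
    for row_logits, row_labels, row_mask in zip(logits, labels, padding_mask):
        for tok_logits, label, keep in zip(row_logits, row_labels, row_mask):
            if not keep:
                continue  # padding position: contributes nothing
            # numerically stable log-sum-exp for the softmax normalizer
            m = max(tok_logits)
            log_z = m + math.log(sum(math.exp(x - m) for x in tok_logits))
            total += log_z - tok_logits[label]  # -log softmax(logits)[label]
            count += 1
    return total / max(count, 1)


# toy example: uniform logits over a 4-token vocab, second position is padding
example_loss = masked_cross_entropy(
    logits=[[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]],
    labels=[[1, 3]],
    padding_mask=[[1, 0]],
)
```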

and at https://github.com/huggingface/transformers/blob/master/examples/flax/summar...
i'm thinking that the labels are removed from the batch ('pop') and so are
_not_ passed to the model, which lines up with not seeing a labels parameter
in the source code for the flax model.
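
a tiny sketch of that 'pop' pattern, as i understand it: the labels come out
of the batch before the rest is handed to the model, and are only used
afterwards to compute the loss.  the batch keys and the split_batch helper
are hypothetical, picked for illustration.

```python
def split_batch(batch):
    """Separate labels from the inputs that would be passed to the model."""
    inputs = dict(batch)           # shallow copy, caller's batch is untouched
    labels = inputs.pop("labels")  # the model never receives this key
    return inputs, labels


# toy batch with made-up token ids
batch = {
    "input_ids": [[5, 6, 7]],
    "attention_mask": [[1, 1, 1]],
    "decoder_input_ids": [[2, 11, 12]],
    "labels": [[11, 12, 2]],
}
model_inputs, labels = split_batch(batch)
```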