16 Jan 2022
12:39 p.m.
[after a number of psychotic breaks] the training loop runs now. it's likely not running very effectively. for the notebook to run right now, an uncommitted change is needed:

```diff
  # compute loss
  loss = optax.softmax_cross_entropy(logits, flax.training.common_utils.onehot(labels, logits.shape[-1]))
- padding_mask = decoder_attention_mask
+ padding_mask = batch['decoder_attention_mask']
  loss = (loss * padding_mask).sum() / padding_mask.sum()
```
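for context, a minimal sketch of the masked-loss computation that change slots into, assuming the usual shapes (logits `[batch, seq, vocab]`, labels and mask `[batch, seq]`); the function name and the usage line are mine, not the notebook's:

```python
import optax
from flax.training import common_utils


def masked_cross_entropy(logits, labels, padding_mask):
    # per-token cross-entropy against one-hot labels: [batch, seq]
    onehot_labels = common_utils.onehot(labels, logits.shape[-1])
    loss = optax.softmax_cross_entropy(logits, onehot_labels)
    # zero out padding positions, then average over real tokens only
    return (loss * padding_mask).sum() / padding_mask.sum()


# hypothetical usage inside the train step:
# loss = masked_cross_entropy(logits, batch['labels'], batch['decoder_attention_mask'])
```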