Here's the snippet around greedy generation:

```python
# 10. run greedy search
return self.greedy_search(
    input_ids,
    logits_processor=logits_processor,
    stopping_criteria=stopping_criteria,
    pad_token_id=pad_token_id,
    eos_token_id=eos_token_id,
    output_scores=output_scores,
    return_dict_in_generate=return_dict_in_generate,
    synced_gpus=synced_gpus,
    **model_kwargs,
)
```

Then:

```python
print(inspect.getsource(s2t.model.greedy_search))
```

This chunk is inside a `while True` loop, and is what's of interest:

```python
# forward pass to get next token
outputs = self(
    **model_inputs,
    return_dict=True,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
)
...
next_token_logits = outputs.logits[:, -1, :]
```

This plane of "logits" holds the model's raw, unnormalized scores for what token comes next, one score per token in its vocabulary; pushing them through a softmax (or log-softmax) turns them into (log) probabilities.
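To make the shape of that loop concrete, here's a minimal sketch of greedy decoding built around the same `logits[:, -1, :]` slice. It assumes a hypothetical decoder-only `model(input_ids)` returning logits of shape `(batch, seq_len, vocab_size)`; the names here (`model`, `eos_token_id`, `max_new_tokens`) are illustrative and not the exact machinery of transformers' `greedy_search`.

```python
import torch

@torch.no_grad()
def greedy_decode(model, input_ids, eos_token_id, max_new_tokens=50):
    # Sketch only: a stand-in for the while-True loop in greedy_search.
    for _ in range(max_new_tokens):
        logits = model(input_ids)                       # (batch, seq_len, vocab_size)
        next_token_logits = logits[:, -1, :]            # scores for the next position only
        next_tokens = next_token_logits.argmax(dim=-1)  # greedy: take the highest-scoring token
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
        if (next_tokens == eos_token_id).all():         # stop once every sequence has emitted EOS
            break
    return input_ids
```

The real `greedy_search` does the same argmax-and-append step, plus the bookkeeping you see in its signature above (logits processors, stopping criteria, padding of finished sequences).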