12 Sep
2023
12:39 a.m.
here: the input_ids and running score product take an order of magnitude less memory to store than what's needed to actually run the beams, so you can cache a huge number of them and only run the highest-probability ones. it's quite efficient [they also share prefix sequences]
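rough sketch of what I mean, not a real implementation: the cache is just a heap of (running score product, input_ids) pairs, and each step only pops and runs the top few. `best_first_expand`, `next_token_probs`, `run_width` and `toy_model` are all made-up names for illustration; `toy_model` stands in for an actual model forward pass.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# (negated running score product, input_ids); negated because heapq is a min-heap
Candidate = Tuple[float, Tuple[int, ...]]


def best_first_expand(
    next_token_probs: Callable[[Tuple[int, ...]], Dict[int, float]],
    start_ids: Tuple[int, ...],
    steps: int,
    run_width: int,
) -> List[Tuple[float, Tuple[int, ...]]]:
    # The cache holds only cheap state: token ids and the running score product.
    cache: List[Candidate] = [(-1.0, start_ids)]
    for _ in range(steps):
        # Run only the highest-probability candidates; the rest stay cached.
        batch = [heapq.heappop(cache) for _ in range(min(run_width, len(cache)))]
        for neg_score, ids in batch:
            for token, prob in next_token_probs(ids).items():
                heapq.heappush(cache, (neg_score * prob, ids + (token,)))
    # Return everything still cached, highest score first.
    return sorted(((-neg, ids) for neg, ids in cache), reverse=True)


def toy_model(ids: Tuple[int, ...]) -> Dict[int, float]:
    # Hypothetical stand-in for a model call: fixed two-token distribution.
    return {0: 0.6, 1: 0.4}


results = best_first_expand(toy_model, start_ids=(), steps=2, run_width=1)
for score, ids in results:
    print(f"{score:.2f} {ids}")
```

this version copies the ids into a new tuple per candidate, so it doesn't exploit the shared prefixes; a trie or parent-pointer layout would, and in practice you'd probably sum log-probs instead of multiplying raw probabilities to avoid underflow.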