23 Sep 2023
1:22 a.m.
https://huggingface.co/cerebras/btlm-3b-8k-base/discussions/25
Context length schedule and performance (#25, by baffo32)
Hey,
I’m looking at your chart showing an incredible performance improvement from greatly extending the context length for a smaller portion of training at the end.
It’s quite notable that most of the gains are at the untrained context lengths.
It looks to me like steadily increasing the context length throughout training could possibly flatten the chart, since these relative gains are so big.
Has anyone tried training on steadily increasing context lengths?
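To make the question concrete, here is a rough sketch of what I mean by a steadily increasing schedule. Everything in it is an assumption for illustration: the function name `context_length_at`, the 2048 and 8192 endpoints, the linear ramp, and the rounding to a multiple of 256 are all hypothetical choices, not a description of how this model was actually trained.

```python
def context_length_at(step, total_steps, min_len=2048, max_len=8192, multiple=256):
    """Hypothetical linear context-length ramp.

    Returns the sequence length to use at `step`, growing linearly from
    `min_len` (first step) to `max_len` (last step), rounded down to a
    multiple of `multiple` so batch shapes stay hardware-friendly.
    All default values are illustrative assumptions.
    """
    if total_steps <= 1:
        return max_len
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    raw = min_len + frac * (max_len - min_len)
    return int(raw // multiple) * multiple


if __name__ == "__main__":
    total = 1000
    for step in (0, 250, 500, 750, 999):
        print(step, context_length_at(step, total))
```

A training loop would call this once per step (or once per block of steps) and pack or truncate each batch to the returned length. A stepwise or geometric ramp would work the same way with a different formula for `raw`.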