[ot] 3B / 3GB quantized edge language model

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Fri Sep 22 18:22:37 PDT 2023


https://huggingface.co/cerebras/btlm-3b-8k-base/discussions/25

Context length schedule and performance

by baffo32

> Hey,
>
> I’m looking at your chart showing the large performance improvement from greatly extending the context length during a small portion of training at the end.
>
> It’s quite notable that most of the gains are at context lengths the model was not trained on.
>
> These relative gains are so big that it looks to me like steadily increasing the context length throughout training could possibly flatten the chart entirely.
>
> Has anyone tried training on steadily increasing context lengths?
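The schedule the question proposes could be sketched as a simple curriculum function; the function name, step counts, and length bounds below are illustrative assumptions, not anything from the BTLM-3B-8k training setup:

```python
def context_length_schedule(step, total_steps, min_len=2048, max_len=8192):
    """Illustrative curriculum: linearly interpolate the training context
    length from min_len up to max_len over the course of training,
    instead of switching to the long context only at the very end."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return int(min_len + frac * (max_len - min_len))

# At the start, midpoint, and end of a 10k-step run:
print(context_length_schedule(0, 10_000))      # 2048
print(context_length_schedule(5_000, 10_000))  # 5120
print(context_length_schedule(10_000, 10_000)) # 8192
```

A training loop would then truncate or pack each batch to the length returned for the current step; whether a linear ramp (rather than, say, a staged or exponential one) is the right shape is exactly the open question being asked.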


More information about the cypherpunks mailing list