This is what I bumped into over the last 2 days. I think I've also seen speech-to-text models in model zoos. I have trouble finding my way around what's available online for this.

https://github.com/PaddlePaddle/DeepSpeech
https://github.com/kaldi-asr/kaldi
https://cmusphinx.github.io/