Looks like the Stable Diffusion analogue for voice is WhisperSpeech, which works OK:
https://colab.research.google.com/drive/1xxGlTbwBmaY6GKA24strRixTXGBOlyiw

“quick! to the batporter! the ginseng hives are modulating!” (low quality, baseline voice)