Ask HN: What's the best TTS engine you've heard?

knaik94 · on March 23, 2023

I have had some fun playing with TorToiSe TTS, which is mixed when compared with ElevenLabs: short snippets sometimes sound better, but overall it does not. I mention it because it's openly available and runs locally. I didn't spend more than a weekend on it, and it's popular enough to have a small community collection of voices. You have to search for them, but they're small and it does zero-shot generation. It feels a lot like Stable Diffusion did when it first came out: a lot of trial and error and no consensus on the "right" answers.

The main reason I liked it, even though the bad generations are really bad, is that you have full control of the training data set. I haven't kept up with it for a few weeks, so I'm sure there have been advances I'm not aware of.

https://git.ecker.tech/mrq/ai-voice-cloning

cloudking · on March 23, 2023

https://play.ht/ultra-realistic-voices/

https://beta.elevenlabs.io/

anthonyhn · on March 22, 2023

For offline/local TTS, Coqui TTS [0] is quite good. It's essentially a continuation of Mozilla's TTS engine that Mozilla stopped working on ~2 years ago (and IIRC it's largely the same team that worked on Mozilla TTS).

[0] https://github.com/coqui-ai/TTS

kylebuildsstuff · on March 24, 2023

Maybe https://beepbooply.com? I built it myself, but it combines all the voices from Microsoft, Google, and Amazon into a simple interface. I find it simple, fast, and cheap for doing voiceovers for my own content.

gulabjamuns · on March 23, 2023

Acapela Group's Peter voice is the best British voice I've come across.

You used to be able to just buy the voice pack for a reasonable price; now they have made the purchase process quite a bit more complicated.

gostsamo · on March 22, 2023

Check out the Microsoft TTS voices. They offer them as an API service.

buggy6257 · on March 23, 2023

I have to second this. I was experimenting with a voice assistant, and the Azure Cognitive Services Speech service is by far the most humanlike speech I found. I recently started looking into this topic again and tried Amazon Polly, and it's pretty embarrassingly bad, even their "neural" voice. The only thing Polly has going for it is the ability to modify the speech with tags (Azure may have this now! Haven't checked in a while), but that live modification only fully works with their crappier voices.

*EDIT:* They also have a Python library that's pretty easy to use. It finally has an ARM build too…

*EDIT 2:* Just started tinkering with Azure again, and yep they support SSML for live modification, and it works with their best neural voice. Diving back in again, here we go...
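For anyone curious what that SSML "live modification" looks like, here is a minimal sketch that builds an SSML document with a `<prosody>` tag using only the standard library. Assumptions: the voice name `en-US-JennyNeural` is just an example Azure neural voice, and the `rate`/`pitch` values follow the W3C SSML prosody syntax; check Azure's SSML docs for which attributes each voice actually honors. The helper name `build_ssml` is hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def build_ssml(text, voice="en-US-JennyNeural", rate="+0%", pitch="+0%"):
    """Hypothetical helper: wrap text in a minimal SSML document.

    The voice name is an example only (assumption); rate and pitch are
    relative prosody adjustments such as "+10%" or "-5%".
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,    # default SSML namespace
        "xml:lang": "en-US",
    })
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    prosody = ET.SubElement(voice_el, "prosody", {"rate": rate, "pitch": pitch})
    prosody.text = text      # ElementTree escapes &, < and > for us
    return ET.tostring(speak, encoding="unicode")

print(build_ssml("Diving back in again, here we go...", rate="+10%"))
```

Building the XML with ElementTree instead of string formatting means characters like `&` or `<` in the spoken text get escaped automatically. If I remember the Python SDK correctly, the resulting string is what you would hand to `SpeechSynthesizer.speak_ssml_async(...)`, but treat that as something to verify against the SDK docs.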

qgin · on March 23, 2023

Google Cloud's new Neural2 voices are pretty great. I think they may actually be the Tacotron voices, but I can't say for sure.

tornato7 · on March 22, 2023

The Python TTS package implements Tacotron, or so they claim. I haven't been able to get that package to work myself recently!