Everyone is racing to build an ASR toolkit these days, and this one is actually a good candidate compared to the others. Good modern technology with subword pieces, parallel training, fast decoding. Very competitive decoding accuracy.
Disadvantages are:
1) A somewhat disorganized codebase with fairseq imported directly
2) No online decoding by design, which is a must for real-world applications.
Other toolkits:
1) ESPnet - crazy dual Chainer/PyTorch backend, pretty slow from the beginning, otherwise good.
2) Mozilla DeepSpeech - very lightweight technology, no real accuracy or speed.
3) NVIDIA/NeMo - potentially good performance from GPU experts, but it is not clear how it will develop in the future
4) speechbrain - just announced, no real code
5) facebook/wav2letter - C++ codebase, not within the general NN community
6) tensorflow/lingvo - a playground for Google guys, who uses TensorFlow these days?
7) kaldi - good old one (if 7 years is old for you), still has very important features others do not have (semi-supervised learning, long alignment). But no PyTorch again, not very attractive for the general NN community.
8) didi/delta - did anyone try it at all?
9) PaddlePaddle/DeepSpeech - very old technology too, but Baidu releases very good models trained on their proprietary data
It depends more on which features you have already implemented. Check the arXiv paper: if you do not already have all those features (lookahead LM, proper SentencePiece, label smoothing), consider Espresso.
I think the question was more about which framework should be preferred over TensorFlow now.
Or at least that's my question - as someone who has taken an interest in this area since the DeepDream days, but is only now considering diving in fully, what platforms should I be looking at, if not TensorFlow?
Entirely OT, but I'm getting a bit tired of the coffee-based naming. Some other software named Espresso:
ESPResSo is a highly versatile software package for performing and analyzing scientific Molecular Dynamics many-particle simulations of coarse-grained atomistic or bead-spring models as they are used in soft matter research in physics, chemistry and molecular biology.
Espresso, for people who make delightful, innovative and fast websites — in an app to match. Espresso helps you write, code, design, build and publish with flair and efficiency.
Quantum ESPRESSO is an integrated suite of Open-Source computer code for electronic structure calculations and materials modeling at the nanoscale.
Tech company names in general have completely lost the plot.
Looking ANYTHING up in search engines these days is completely and utterly derailed by tech companies (/programming languages/frameworks/etc) who insist on naming themselves after common words used in everyday language.
Go, Espresso, Vanilla, Box, Square, Stripe, Express, Next, Angular, Feather, Mint, it goes on forever.
Each one insists on branding themselves just with that word, using it on its own and poisoning search results and online content the world over.
I'm pretty sure that is affected by the search engine (Google?) knowing enough about you to guess that you're probably interested in programming languages/frameworks.
But having said that I am suddenly hit with the urge to make something popular and useful that I will name something like God, Sex, or Pizza.