Open Source Alert: Very excited to announce that we are open sourcing Vakyansh, a framework to democratize speech recognition in Indic languages.
Some key features:
1. End-to-end training and experimentation platform built on top of @facebookai's wav2vec 2.0.
2. State-of-the-art pretrained and fine-tuned models in 8 Indic languages, including some low-resource languages.
(Hindi, Indian English, Kannada, Marathi, Odia, Tamil, Telugu and Gujarati)
3. KenLM-based language models, including text data, for all the above languages.
4. Intelligent data pipelines to generate training data for any end-to-end speech recognition framework (recipes include language identification, speaker clustering and gender identification).
5. Inference service to host wav2vec 2.0 models in real time and in batch mode.
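
For anyone curious what the last step of inference can look like: fine-tuned wav2vec 2.0 models are commonly trained with CTC, so the model outputs one score vector per audio frame and a decoder turns those into text. Below is a minimal sketch of greedy CTC decoding for a single utterance and for a batch. It is an illustration only, not code from this release; the function names, the toy vocabulary and the blank index of 0 are all assumptions.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumption: index 0 is the CTC blank token


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = BLANK_ID,
) -> str:
    """Decode one utterance (hypothetical helper, not the project's API)."""
    # 1. Pick the highest-scoring token for every frame.
    best_ids = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Collapse runs of the same token into a single token.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blanks and map the remaining ids to characters.
    return "".join(vocab[i] for i in collapsed if i != blank_id)


def decode_batch(
    batch: Sequence[Sequence[Sequence[float]]],
    vocab: Sequence[str],
) -> List[str]:
    """Batch mode in its simplest form: decode each utterance in turn."""
    return [ctc_greedy_decode(utterance, vocab) for utterance in batch]


# Toy example: 3-token vocabulary, 5 frames of made-up scores.
toy_vocab = ["<blank>", "a", "b"]
toy_frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.8, 0.1],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.1, 0.8],    # b
]
toy_text = ctc_greedy_decode(toy_frames, toy_vocab)
```

A KenLM language model like the ones in point 3 would typically replace the greedy step with a beam search that rescores candidate transcripts; that part is left out here to keep the sketch short.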