Hello 👋,

Thanks for visiting my page.

People usually address me as Praveen. I am currently a Ph.D. student advised by Prof. Mitesh Khapra in the Department of Computer Science and Engineering, AI4Bharat, IIT Madras. I completed my B.Tech in Computer Science and Engineering at IIT Dharwad in 2021.

Research 💡

I'm currently working on building neural speech synthesis systems for the major languages spoken across India, with a focus on making them sound more natural, more intelligible, and richer in prosody while supporting multiple voices and styles.

In 2023, as a first step, we open-sourced SOTA TTS for 13 Indian languages (demo).

In 2024, we focused on (i) making our models more intelligible (IndicOOV, ELAICHI), (ii) releasing the first-ever high-quality multilingual expressive dataset (Rasa) for any Indian language, and (iii) unlocking a massive speech corpus covering over 10,496 speakers (IndicVoices-R) that enables scaling Indian TTS!

In 2025, we scaled our high-quality studio recording efforts to 750 hours, covering 28 speakers across 16 Indian languages (Rasa🤗). We released the first instruction-following dataset for Text-Instruct TTS in Indian languages (RASMALAI) and a multilingual extension of Parler-TTS developed in collaboration with Hugging Face (Indic Parler-TTS). We then released IndicF5, which demonstrates how English-pretrained TTS models can be adapted to Indian languages to achieve near-human polyglot speech and enable voice, style, and zero-resource synthesis.

By 2026, Rasa had expanded further to cover all 22 Indian languages, 40+ speakers, and over 1000 hours of expressive speech, enabling large-scale, high-quality TTS across India.

As we improve our data and models, we note that TTS evaluation remains tricky. In our MUSHRA studies, we find biases, such as reference matching and judgment ambiguity, that can distort progress tracking, and we propose refined variants to mitigate them (Rethinking MUSHRA). While subjective assessments are good indicators of relative progress, they don't test TTS systems the way they are used in the real world: in isolation, where the goal is to sound human. We therefore introduce the Human Fooling Rate (HFR) metric and a fine-grained variant for deployment-centric tracking with actionable insights.

News ⚡

  • Feb 2026 — Grateful to have helped train and release Sarvam Dub and Bulbul V3 as part of the incredible Speech team at Sarvam AI. It is a privilege to see our models operate at national scale, powering Mann Ki Baat dubbing and the live translation of the Union Budget (2026), while setting a new state of the art in India-first text-to-speech.
  • Nov 2024 — Super grateful to receive the Google PhD Fellowship in 2024. Thank you!
  • Nov 2024 — Recognized as an Outstanding Reviewer at EMNLP 2024.

Publications

My undergraduate journey has been detailed here.