Thanks for visiting my page.
People usually address me as Praveen. I am currently a Ph.D. student advised by
Prof. Mitesh Khapra
in the Department of Computer Science and Engineering,
AI4Bharat,
IIT Madras.
I completed my B.Tech in Computer Science and Engineering at
IIT Dharwad,
in 2021.
I'm currently working on building Neural Speech Synthesis systems, with a focus
on making them sound more natural and intelligible, with rich prosody, while
supporting multiple voices and styles,
for the major languages spoken across India.
In 2023, as a first step, we open-sourced SOTA TTS for 13 Indian languages
(demo).
In 2024, we focused on (i) making our models more intelligible
(IndicOOV,
ELAICHI),
(ii) releasing the first-ever high-quality multilingual expressive dataset
(Rasa)
for any Indian language, and (iii) unlocking a massive speech corpus covering 10,496 speakers
(IndicVoices-R)
that enables scaling Indian TTS!
In 2025, we scaled our HQ studio recording efforts to 750 hours, covering 28 speakers across 16
Indian languages
(Rasa🤗).
We released the first instruction-following dataset
(RASMALAI)
for Text-Instruct TTS in Indian languages and a multilingual extension of Parler-TTS developed in
collaboration with HuggingFace
(Indic Parler-TTS).
We then released
IndicF5,
which demonstrates how English-pretrained TTS models can be adapted to
Indian languages to achieve near-human polyglot speech and enable voice, style, and zero-resource
synthesis.
By 2026,
Rasa
expanded further to cover all 22 Indian languages, 40+ speakers, and over 1000 hours
of expressive speech, enabling large-scale, high-quality TTS across India.
As we improve our data and models, we note that TTS evaluations remain tricky.
In our MUSHRA studies, we find biases, such as reference matching and judgment ambiguity, that can
distort progress tracking, and
we propose refined variants to mitigate them
(Rethinking MUSHRA).
While subjective assessments are good indicators of relative progress, they don't test TTS systems
the way they are used in the real world: in isolation, where the goal is to sound human. We introduce the
Human Fooling Rate
(HFR)
metric and a fine-grained variant for deployment-centric tracking with actionable insights.
My undergraduate journey is detailed
here.