Project Vaani

India's Largest Open-Source Speech Dataset

Built from the ground up across India's villages, towns, and cities, Project Vaani captures real voices from tribal belts to urban metros. It spans 165 districts and surfaces dozens of languages that existing datasets have never covered.

Open-sourced via Bhashini and Hugging Face, it gives developers, researchers, and public innovators a foundation for building AI tools in education, health, governance, and beyond that can genuinely reach India's population.