Supporting 600+ languages, voice design, voice cloning, natural speech, and ultra-fast inference
16GB RAM recommended. 15GB+ storage recommended.
macOS 15+: Supports both Intel and M-series chips.
Windows 10/11: Intel/AMD GPUs supported, NVIDIA GPU recommended.
Note: For NVIDIA GPUs, install a newer driver.
OmniVoice is an open-source, massively multilingual zero-shot text-to-speech (TTS) system developed by the k2-fsa team. The core team consists of key original developers behind the well-known open-source speech project Kaldi, led by Dr. Daniel Povey, who currently serves as Chief Speech Scientist at Xiaomi. The project is strongly supported by Xiaomi AI Lab and represents an important open-source achievement in Xiaomi’s intelligent speech technology research.
OmniVoice is designed for a wide range of scenarios, including global voice content generation, multilingual intelligent interaction, accessibility narration, video dubbing, virtual human voice synthesis, and content creation in dialects and low-resource languages. It addresses limitations of traditional TTS systems such as limited language support, poor performance on low-resource languages, unnatural timbre, and slow inference. Built on a novel architecture that combines diffusion models with language models, OmniVoice delivers highly natural speech synthesis with a real-time factor (RTF) as low as 0.025. RTF is processing time divided by the duration of the generated audio, so an RTF of 0.025 means audio is generated about 40 times faster than real time, which makes the system suitable for high-concurrency, low-latency industrial deployment.
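The RTF arithmetic can be checked with a short sketch. The function names below are illustrative helpers, not part of OmniVoice, and 0.025 is the best-case figure quoted above; actual speed will depend on hardware and settings.

```python
def speedup_from_rtf(rtf: float) -> float:
    """Return how many times faster than real time synthesis runs."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate the processing time needed for audio of the given length."""
    if audio_seconds < 0:
        raise ValueError("audio duration must be non-negative")
    return audio_seconds * rtf


if __name__ == "__main__":
    rtf = 0.025  # best-case RTF quoted in the description (assumed here)
    print(f"Speedup: about {speedup_from_rtf(rtf):.0f}x real time")
    print(f"One minute of audio: about {synthesis_seconds(60.0, rtf):.1f} s")
```

At this RTF, one minute of speech would take roughly a second and a half of processing time.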
One of its most notable advantages is its extensive language coverage: it supports more than 600 languages and dialects worldwide, ranging from major global languages to many low-resource and regional varieties.
All languages are supported in a zero-shot manner: no additional fine-tuning is required for any language, and natural speech can be synthesized directly from text input.
In terms of advanced voice capabilities, OmniVoice provides a rich set of professional features, including voice design and voice cloning.
In summary, OmniVoice is one of the most comprehensive open-source multilingual TTS systems available today, offering exceptional language coverage, high audio quality, fast inference, and rich functionality, making it valuable for both academic research and real-world industrial applications.