Install SoulX-Podcast Locally

SoulX-Podcast is an open-source project developed by the Soul AI team, designed to transform text scripts into realistic, high-fidelity podcast-style audio. Think of it as an “AI Podcast Studio”: just input a dialogue script, and it automatically assigns voices to different speakers, adds natural intonations, laughter, sighs, and other expressive elements, generating long-form, multi-turn conversational audio that sounds remarkably human.

It excels not only in single-speaker narration (like audiobooks) but especially in creating multi-speaker, multi-turn dialogues—such as talk shows, interviews, or casual chats—making the output incredibly natural and lifelike.

Key Features & Capabilities

Multi-Speaker Dialogue Generation: Supports dynamic turn-taking between multiple characters, simulating real podcast interactions.
Multilingual & Dialect Support: Works with Mandarin, English, and several Chinese dialects including Sichuanese, Henanese, and Cantonese, enabling culturally rich content.
Zero-Shot Voice Cloning: Generate speech in a specific voice using just a few seconds of reference audio—no training required.
Paralinguistic Controls: Add expressive elements like laughter, sighs, pauses, and emphasis to enhance realism.
Long-Form Speech Synthesis: Capable of generating extended podcast episodes from long scripts.

System Requirements

Minimum 8GB RAM. 21GB+ storage recommended.
macOS 15+: Supports both Intel and M-series chips.
Windows 10/11: Intel/AMD GPUs supported, NVIDIA GPU recommended.
Note: If you're using an NVIDIA GPU, install the NVIDIA drivers to enable GPU acceleration.
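Before installing, it can help to confirm that the machine meets the storage figure above. The following is a minimal sketch, not part of SoulX-Podcast or LM Downloader: the function names are hypothetical, the 21 GB threshold is taken from the requirements listed here, and looking for `nvidia-smi` on the PATH is only a rough heuristic for whether NVIDIA drivers are installed.

```python
import shutil

# Storage figure taken from the requirements above (21GB+ recommended).
REQUIRED_FREE_GB = 21


def free_disk_gb(path: str = ".") -> float:
    """Return the free space, in GiB, on the drive that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_nvidia_smi() -> bool:
    """Heuristic: NVIDIA drivers usually ship the `nvidia-smi` tool."""
    return shutil.which("nvidia-smi") is not None


def check_system(path: str = ".", required_gb: float = REQUIRED_FREE_GB) -> dict:
    """Collect the checks into one dictionary (hypothetical helper)."""
    free = free_disk_gb(path)
    return {
        "free_gb": round(free, 1),
        "enough_disk": free >= required_gb,
        "nvidia_smi_found": has_nvidia_smi(),
    }


if __name__ == "__main__":
    for key, value in check_system().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_found` is false on a machine with an NVIDIA card, the drivers are probably missing or not on the PATH. This sketch does not check RAM, because the Python standard library has no portable call for it.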

How to use

Dialect Format:

To generate speech in a dialect, add one of the following dialect markers, written in the form <|Sichuan|>, right after the speaker tag. To generate speech without a dialect, omit the marker.

Sichuan: Sichuan dialect
Henan: Henan dialect
Yue: Cantonese, Yue dialect

Paralinguistic Controls (Tone, Emotion)

laughter: Laughing sound
sigh: Sighing sound
coughing: Coughing sound
breathing: Breathing sound
throat_clearing: Throat-clearing sound

Examples

Below are examples of using dialects and paralinguistic controls, where [S1], [S2] represent specific speakers, <|Sichuan|> indicates the dialect used, and <|sigh|> represents a sigh:

[S1]<|Sichuan|>Oh no, this is reversed! <|laughter|>  
[S2]<|Henan|>I was just worried you might have trouble on the way! <|sigh|>
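For longer scripts, it may be convenient to build and check lines like these with a small helper. The sketch below is an illustration only and is not part of SoulX-Podcast: `format_turn`, `tag`, and `unknown_markers` are hypothetical names, and the marker lists simply mirror the dialects and paralinguistic controls listed on this page.

```python
import re
from typing import List, Optional

# Marker names copied from the lists on this page.
DIALECTS = {"Sichuan", "Henan", "Yue"}
PARALINGUISTIC = {"laughter", "sigh", "coughing", "breathing", "throat_clearing"}

# Matches markers of the form <|name|> and captures the name.
MARKER_RE = re.compile(r"<\|([^|<>]+)\|>")


def tag(name: str) -> str:
    """Return a paralinguistic marker such as <|sigh|>."""
    if name not in PARALINGUISTIC:
        raise ValueError(f"unknown paralinguistic control: {name}")
    return f"<|{name}|>"


def format_turn(speaker: int, text: str, dialect: Optional[str] = None) -> str:
    """Build one script line: speaker tag, optional dialect marker, then text."""
    if speaker < 1:
        raise ValueError("speaker numbers start at 1")
    if dialect is not None and dialect not in DIALECTS:
        raise ValueError(f"unknown dialect: {dialect}")
    prefix = f"[S{speaker}]"
    if dialect is not None:
        prefix += f"<|{dialect}|>"
    return prefix + text


def unknown_markers(script: str) -> List[str]:
    """List every <|...|> marker in the script that is not on this page."""
    known = DIALECTS | PARALINGUISTIC
    return [name for name in MARKER_RE.findall(script) if name not in known]


if __name__ == "__main__":
    script = "\n".join([
        format_turn(1, "Oh no, this is reversed! " + tag("laughter"), "Sichuan"),
        format_turn(2, "I was just worried you might have trouble on the way! "
                    + tag("sigh"), "Henan"),
    ])
    print(script)
    print("Unknown markers:", unknown_markers(script))
```

Running `unknown_markers` over a finished script before pasting it into the app is a cheap way to catch a mistyped marker such as `<|laugh|>`.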

Find SoulX-Podcast in LM Downloader

Open LM Downloader, then click "Local Apps" in the left menu. You should see SoulX-Podcast in the app list.

Click the SoulX-Podcast icon to go to the introduction page.

Click the Install button to open the install window. If SoulX-Podcast is already installed, the installation is treated as an update and does not affect the models you have previously downloaded.

Close this window after the installation is complete.

Run SoulX-Podcast

Computers with NVIDIA graphics cards and properly installed drivers can use GPU acceleration. If the VRAM is insufficient but the system RAM is ample, you can disable GPU acceleration and use the CPU for generation.
If you want to create a dialect podcast, you can select the "Dialect Model" option.

On the application details page, click the Run button on the right to open the execution window.

Upon successful launch, your browser will open automatically.
