Install SoulX-Podcast Locally 
SoulX-Podcast is an open-source project developed by the Soul AI team, designed to transform text scripts into realistic, high-fidelity podcast-style audio. Think of it as an “AI Podcast Studio”: just input a dialogue script, and it automatically assigns voices to different speakers, adds natural intonations, laughter, sighs, and other expressive elements, generating long-form, multi-turn conversational audio that sounds remarkably human.
It handles single-speaker narration (such as audiobooks), but its main strength is multi-speaker, multi-turn dialogue such as talk shows, interviews, and casual chats.
Key Features & Capabilities 
- Multi-Speaker Dialogue Generation: Supports dynamic turn-taking between multiple characters, simulating real podcast interactions.
- Multilingual & Dialect Support: Works with Mandarin, English, and several Chinese dialects, including Sichuanese, Henanese, and Cantonese.
- Zero-Shot Voice Cloning: Generates speech in a specific voice from just a few seconds of reference audio, with no training required.
- Paralinguistic Controls: Adds expressive elements such as laughter, sighs, pauses, and emphasis to enhance realism.
- Long-Form Speech Synthesis: Generates extended podcast episodes from long scripts.
 
System Requirements 
- Memory and storage: at least 8 GB RAM; 21 GB or more of free storage recommended.
- macOS 15+: supports both Intel and M-series chips.
- Windows 10/11: Intel and AMD GPUs are supported; an NVIDIA GPU is recommended.
- Note: if you are using an NVIDIA GPU, install the NVIDIA drivers to enable GPU acceleration.
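The storage and driver requirements above can be checked before installing. The following is a minimal Python sketch, not part of SoulX-Podcast or LM Downloader: the function names are hypothetical, the 21 GB threshold is the recommended figure from the list above, and looking for the `nvidia-smi` tool on the PATH is only a heuristic for whether an NVIDIA driver is installed. It uses the standard library only, so it does not check RAM.

```python
import shutil

MIN_FREE_GB = 21  # storage figure recommended in the requirements above


def free_disk_gb(path: str = ".") -> float:
    """Return free space on the volume containing `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_nvidia_smi() -> bool:
    """Heuristic: NVIDIA drivers normally ship the `nvidia-smi` command."""
    return shutil.which("nvidia-smi") is not None


def preflight(path: str = ".") -> dict:
    """Collect the checks into one dictionary (hypothetical helper)."""
    free_gb = free_disk_gb(path)
    return {
        "free_disk_gb": round(free_gb, 1),
        "enough_disk": free_gb >= MIN_FREE_GB,
        "nvidia_smi_found": has_nvidia_smi(),
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_found` is false on a machine with an NVIDIA card, the driver is probably missing or not on the PATH, and generation will likely fall back to the CPU.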
 
How to use 
Dialect Format: 
To generate speech in a dialect, place one of the following dialect markers before the text. For standard speech, omit the dialect marker.
- Sichuan: Sichuan dialect
- Henan: Henan dialect
- Yue: Cantonese (Yue dialect)
 
Paralinguistic Controls (Tone, Emotion) 
- laughter: Laughing sound
- sigh: Sighing sound
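Script lines that combine these markers can also be assembled programmatically. The following is a minimal Python sketch, assuming speakers are tagged as [S1], [S2], and so on, and that markers are written as <|name|>, as in the examples in this guide. The helper `format_turn` is hypothetical and not part of SoulX-Podcast, and placing the paralinguistic marker at the end of the text is simply one convention chosen for the sketch.

```python
from typing import Optional

# Marker names listed in this guide; the real model may accept more.
DIALECTS = {"Sichuan", "Henan", "Yue"}
PARALINGUISTIC = {"laughter", "sigh"}


def format_turn(speaker: int, text: str,
                dialect: Optional[str] = None,
                trailing_tag: Optional[str] = None) -> str:
    """Build one script line such as '[S1]<|Sichuan|>Hello <|laughter|>'."""
    if speaker < 1:
        raise ValueError("speaker numbers start at 1")
    if dialect is not None and dialect not in DIALECTS:
        raise ValueError(f"unknown dialect marker: {dialect}")
    if trailing_tag is not None and trailing_tag not in PARALINGUISTIC:
        raise ValueError(f"unknown paralinguistic marker: {trailing_tag}")

    line = f"[S{speaker}]"
    if dialect is not None:
        line += f"<|{dialect}|>"  # omitted entirely for standard speech
    line += text
    if trailing_tag is not None:
        line += f" <|{trailing_tag}|>"
    return line


if __name__ == "__main__":
    print(format_turn(1, "Oh no, this is reversed!", "Sichuan", "laughter"))
    print(format_turn(2, "Welcome back to the show."))
```

Validating marker names up front catches typos such as "Sichuanese" before the script reaches the model.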
 
Examples 
The examples below combine dialect markers and paralinguistic controls. [S1] and [S2] identify the speakers, <|Sichuan|> and <|Henan|> select the dialect, and <|laughter|> and <|sigh|> insert a laugh or a sigh:
[S1]<|Sichuan|>Oh no, this is reversed! <|laughter|>  
[S2]<|Henan|>I was just worried you might have trouble on the way! <|sigh|>
Find SoulX-Podcast in LM Downloader 
Open LM Downloader, then click "Local Apps" in the left menu. SoulX-Podcast should appear in the app list.
Click the SoulX-Podcast icon to go to the introduction page.
Click the Install button to open the install window. If SoulX-Podcast is already installed, this acts as an update and does not affect the models you have previously downloaded.
Close this window after the installation is complete.
Run SoulX-Podcast 
- Computers with an NVIDIA graphics card and properly installed drivers can use GPU acceleration. If VRAM is insufficient but system RAM is ample, you can disable GPU acceleration and generate on the CPU instead.
- To create a dialect podcast, select the "Dialect Model" option.
 
On the application details page, click the Run button on the right to open the execution window.
Upon successful launch, your browser will open automatically.