pyVideoTrans

Automatically handles video translation, subtitle generation and dubbing

Features

Open Source Video Translation

System Requirements

16 GB RAM and at least 15 GB of free storage recommended.
macOS 15+: Supports both Intel and M-series chips.
Windows 10/11: Intel/AMD GPUs supported, NVIDIA GPU recommended.
Note: For NVIDIA GPUs, install an up-to-date driver (GPU acceleration targets CUDA 12.x).

Introduction

1. Project Overview

pyVideoTrans is an open-source tool for video translation, audio transcription, and text-to-speech (TTS) synthesis. It is developed by jianchang512 and licensed under GPL-3.0. The source code and a pre-packaged version are available from its GitHub repository (https://github.com/jianchang512/pyvideotrans), and the official documentation is hosted at pyvideotrans.com. It converts videos across languages by handling subtitle generation, translation, dubbing, and audio-video merging automatically, so it remains accessible to beginners.

2. Core Features

  1. Fully Automatic Video/Audio Translation: Upload a video or audio file containing human speech, and the tool will automatically recognize the speech, generate subtitles in the source language, translate them into the target language, create dubbed audio that matches the lip movements, and finally merge the new audio and subtitles into the original video. The entire process of "translation + dubbing + subtitle embedding" is completed in one step.
  2. Audio/Video to Subtitles: Batch process files to convert human speech in video or audio files into SRT subtitle files with precise timestamps, eliminating the need for manual adjustment.
  3. Text-to-Speech (TTS): Convert text or SRT subtitles into natural-sounding speech using multiple high-quality TTS channels. The voice output is close to human speech, suitable for video dubbing or standalone audio generation.
  4. SRT Subtitle Translation: Batch translate existing SRT subtitle files while preserving original timestamps and formatting. It also supports multiple bilingual subtitle styles (e.g., source language and target language displayed simultaneously).
  5. Real-Time Speech-to-Text: Enable microphone input to convert spoken content into text in real time, ideal for meeting recordings or live subtitle generation.
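
The SRT subtitle translation feature (item 4) can be illustrated with a minimal sketch. It is not pyVideoTrans's own code: `Cue`, `parse_srt`, and `translate_srt` are hypothetical names, and the `translate` callable stands in for whichever translation channel is used. The idea it shows is that each cue's timing line is copied verbatim, so only the text changes, and that a bilingual style can be produced by keeping the source line above the translated one.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    index: int
    timing: str  # e.g. "00:00:01,000 --> 00:00:03,500", kept verbatim
    text: str


def parse_srt(srt: str) -> List[Cue]:
    """Split SRT content into cues; malformed blocks are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        if not lines[0].strip().isdigit() or "-->" not in lines[1]:
            continue
        cues.append(Cue(int(lines[0].strip()), lines[1].strip(), "\n".join(lines[2:])))
    return cues


def translate_srt(srt: str, translate: Callable[[str], str], bilingual: bool = False) -> str:
    """Translate cue text only; indices and timing lines are left untouched.

    `translate` is a placeholder for a real translation backend (assumption).
    """
    out = []
    for cue in parse_srt(srt):
        translated = translate(cue.text)
        # Bilingual style: source text on top, translation underneath.
        text = f"{cue.text}\n{translated}" if bilingual else translated
        out.append(f"{cue.index}\n{cue.timing}\n{text}")
    return "\n\n".join(out) + "\n"
```

Calling `translate_srt(content, my_translator, bilingual=True)` with any function that maps a string to a string returns a new SRT document whose timestamps are identical to the input.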

3. Underlying Technology & Dependencies

The tool’s core capabilities rely on a range of mature open-source projects and technical frameworks to ensure stability and efficiency:

  • Core dependencies: ffmpeg (audio/video processing), PySide6 (GUI development), pydub (audio manipulation);
  • Automatic Speech Recognition (ASR): Integrates models such as openai-whisper, faster-whisper, and sherpa-onnx, supporting local deployment with high recognition accuracy;
  • Translation channels: Supports multiple channels including Microsoft Translator (free) for cross-language translation needs;
  • Dubbing engines (TTS): Includes Edge-TTS (free) and other engines, offering a variety of voice role options;
  • Other technologies: ctranslate2 (model acceleration, compatible with CUDA 12.x for GPU acceleration), rubberband (audio speed adjustment and alignment), libsndfile (audio file handling), etc.
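
To make two of these roles concrete, here is a rough sketch under stated assumptions; it is not taken from the pyVideoTrans source. `extract_audio_cmd` builds an ffmpeg command line that extracts mono 16 kHz WAV audio, a common input format for Whisper-family recognizers. `speed_factor` computes how much a dubbed clip would need to be sped up to fit its subtitle slot, which is the kind of value an audio speed adjustment tool such as rubberband would then apply. Both function names and the 2.0 cap are illustrative choices.

```python
from typing import List


def extract_audio_cmd(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that extracts mono 16 kHz WAV audio."""
    return [
        "ffmpeg", "-y",      # overwrite the output file without asking
        "-i", video_path,    # input video
        "-vn",               # drop the video stream
        "-ac", "1",          # downmix to mono
        "-ar", "16000",      # resample to 16 kHz
        wav_path,
    ]


def speed_factor(dub_seconds: float, slot_seconds: float, max_factor: float = 2.0) -> float:
    """Speed-up needed for dubbed audio to fit the subtitle's time slot.

    max_factor is an assumed cap that keeps speech intelligible.
    """
    if dub_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    factor = dub_seconds / slot_seconds
    # Never slow down (factor < 1) and never exceed the cap.
    return min(max(factor, 1.0), max_factor)
```

For example, a 3-second dubbed clip that must fit a 2-second subtitle slot gives a factor of 1.5. The command list can be passed to `subprocess.run` on a machine where ffmpeg is installed.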

4. Supported & Unsupported Scenarios

  • Supported: Any audio or video file containing human speech (regardless of whether it has embedded subtitles);
  • Unsupported: Videos that contain no human speech, such as those with only background music or only hardcoded subtitles. Hardcoded subtitles are burned into the video frames, and the tool cannot extract them.
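
A quick pre-check along these lines can be sketched with ffprobe, which ships with ffmpeg. The helper names below are hypothetical and this is not how pyVideoTrans itself validates input. It only tests whether a file has an audio stream at all; an audio stream with nothing but music would still pass, so confirming actual speech needs a recognition or voice-activity step.

```python
import subprocess
from typing import List


def probe_audio_cmd(path: str) -> List[str]:
    """Build an ffprobe command that prints one line per audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",               # audio streams only
        "-show_entries", "stream=codec_type",
        "-of", "csv=p=0",                     # bare values, no section prefix
        path,
    ]


def has_audio_stream(ffprobe_output: str) -> bool:
    """True if the ffprobe output lists at least one audio stream."""
    return any(line.strip().startswith("audio") for line in ffprobe_output.splitlines())


def file_has_audio(path: str) -> bool:
    """Run ffprobe on a file (requires ffprobe to be installed and on PATH)."""
    result = subprocess.run(probe_audio_cmd(path), capture_output=True, text=True, check=True)
    return has_audio_stream(result.stdout)
```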