MiniMax-Remover

Taming Bad Noise for Effective Video Object Removal

Features

Open Source, Video

System Requirements

Minimum 16GB RAM. 12GB+ storage recommended.
Windows 10/11: NVIDIA GPU with 8GB+ VRAM required
Note: For NVIDIA GPUs, update to a recent driver version before running.

Introduction

MiniMax-Remover is a tool that can "erase" unwanted things from videos. For example, it can automatically remove accidental passers-by, background clutter, or watermarks and subtitles, leaving the video looking natural, without awkward gaps.

**What can it be used for?**

- **Video retouching magic**: Like photo editing, but for video: it removes unwanted objects from footage. Use it to erase unexpected people in travel videos or reflective items in conference videos.
- **Content creation helper**: Instead of manual frame-by-frame editing, it processes videos in batches, saving creators a great deal of time.
- **Privacy protector**: Remove sensitive information such as faces or license plates from videos to prevent privacy leaks.

**Technical Framework**: A fast and efficient video object removal tool based on minimax optimization, structured in two stages:

1. **Stage 1**: Training a remover using a simplified DiT (Diffusion Transformer) architecture.
2. **Stage 2**: Distilling a robust remover that no longer needs CFG (Classifier-Free Guidance) and uses fewer inference steps.

#### Core Functions and Features

- **High Efficiency**: Requires only 6 inference steps without CFG, ensuring rapid processing.
- **Superior Performance**: Seamlessly removes objects from videos and generates high-quality visual content with natural edge blending.
- **Robustness**: Prevents the regeneration of undesired objects or artifacts in masked regions under varying noise conditions, ensuring stable outputs.

#### Technical Advantages

- **Two-stage Optimization**: Combines the DiT architecture with CFG distillation to balance efficiency and performance.
- **Lightweight Inference**: Reduces inference steps while maintaining high image quality, suitable for real-time or batch video processing.
- **Wide Adaptability**: Supports object removal in various video scenarios, with strong adaptability to complex backgrounds and dynamic changes.
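
To give a rough sense of why "6 inference steps without CFG" matters for speed, here is a minimal Python sketch. It is not MiniMax-Remover's code. It relies only on the standard classifier-free guidance formula, in which each sampling step combines a conditional and an unconditional prediction, so a CFG sampler needs two network evaluations per step. The function names and the 50-step CFG baseline are hypothetical, chosen purely for illustration.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance: start from the unconditional
    prediction and move toward the conditional one by `guidance_scale`."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def network_evaluations(num_steps, uses_cfg):
    """Count denoiser evaluations for one sampling run.

    With CFG, every step needs a conditional and an unconditional
    prediction (two evaluations); without CFG, one is enough.
    """
    return num_steps * (2 if uses_cfg else 1)


# Hypothetical baseline (an assumption, not a figure from the project):
# a 50-step sampler that uses CFG.
baseline = network_evaluations(num_steps=50, uses_cfg=True)

# The configuration described above: 6 steps, no CFG.
distilled = network_evaluations(num_steps=6, uses_cfg=False)

print(baseline, distilled)  # 100 6
```

Under these assumptions the distilled remover evaluates the network 6 times per run instead of 100. Real-world speedups also depend on model size, resolution, and how the two CFG predictions are batched.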