MinerU One-click PC Deployment Tool | One Click to Run AI on Your Own Computer

Features

Open Source, OCR, PDF

System Requirements

16GB RAM recommended. 17GB+ storage recommended.
macOS 15+: Supports both Intel and M-series chips.
Windows 10/11: Intel/AMD GPUs supported, NVIDIA GPU recommended.
Note: For NVIDIA GPUs, update to a recent driver.
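The storage recommendation above can be checked before installing. The following is a minimal sketch, not part of the deployment tool: the function names are hypothetical, and it assumes 1 GB means 1024**3 bytes and that the tool would be installed on the drive holding the current directory. It does not check RAM or GPU drivers.

```python
import platform
import shutil

# 17 GB comes from the storage recommendation above.
REQUIRED_FREE_GB = 17


def has_enough_storage(free_bytes: int, required_gb: int = REQUIRED_FREE_GB) -> bool:
    """Return True if free_bytes covers required_gb (assuming 1 GB = 1024**3 bytes)."""
    return free_bytes >= required_gb * 1024**3


def describe_platform() -> str:
    """Return the OS name, release, and CPU architecture as one string."""
    return f"{platform.system()} {platform.release()} ({platform.machine()})"


if __name__ == "__main__":
    # Free space on the drive that holds the current directory (an assumption
    # about where the tool would be installed).
    free = shutil.disk_usage(".").free
    print(describe_platform())
    if has_enough_storage(free):
        print("Free storage meets the 17 GB recommendation.")
    else:
        print("Less than 17 GB free; consider freeing space first.")
```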

Introduction

MinerU is an all-in-one open-source intelligent document parsing tool developed by the OpenDataLab team at Shanghai AI Laboratory. It is dedicated to solving the challenge of extracting high-quality structured data for large model training, RAG systems, and knowledge base construction.

I. Core Features (User-Friendly)

Deep PDF Parsing: Automatically extracts text, tables, images, and formulas (converted to LaTeX), accurately identifies headings, paragraphs, and lists while preserving original layout; supports OCR for scanned PDFs and automatically filters redundant content like headers, footers, and footnotes.
Multi-Format Compatibility: Supports PDFs, PNG/JPEG images, and EPUB/MOBI/DOCX e-books, and extracts clean main content from web pages.
Multilingual Support: OCR for 109+ languages, ideal for cross-border document processing.
Structured Output: One-click conversion to Markdown (with multimodal elements), JSON, and HTML. Output follows human reading order so large models can use it directly.
Lightweight & Efficient: 0.9B parameter model runs smoothly on consumer-grade GPUs, with fast inference and low deployment costs.
Scientific Data Capability: High-precision extraction of mathematical formulas, chemical molecular structures, and chemical reaction equations for scientific document parsing.
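To illustrate what "structured output in reading order" means in practice, here is a minimal sketch that renders a list of parsed blocks as Markdown. The block schema (`order`, `type`, `text`, `latex`, `items`, `level`) and the function name are hypothetical and chosen only for illustration; they are not MinerU's actual JSON format.

```python
def blocks_to_markdown(blocks):
    """Render blocks as Markdown, sorted by their reading-order index.

    Each block is a dict in a hypothetical schema (not MinerU's real output):
    {"order": int, "type": "heading" | "paragraph" | "formula" | "list", ...}
    """
    parts = []
    for block in sorted(blocks, key=lambda b: b["order"]):
        kind = block["type"]
        if kind == "heading":
            # Heading level defaults to 1 when the block does not specify one.
            parts.append("#" * block.get("level", 1) + " " + block["text"])
        elif kind == "formula":
            # Formulas are assumed to arrive as LaTeX strings.
            parts.append("$$\n" + block["latex"] + "\n$$")
        elif kind == "list":
            parts.append("\n".join("- " + item for item in block["items"]))
        else:
            # Paragraphs and unknown block types fall back to plain text.
            parts.append(block["text"])
    # Blank lines separate blocks so Markdown renderers treat them as distinct.
    return "\n\n".join(parts)


if __name__ == "__main__":
    sample = [
        {"order": 2, "type": "paragraph", "text": "Energy and mass are equivalent."},
        {"order": 1, "type": "heading", "level": 2, "text": "Result"},
        {"order": 3, "type": "formula", "latex": "E = mc^2"},
    ]
    print(blocks_to_markdown(sample))
```

Sorting by an explicit order index keeps the heading ahead of its paragraph even when blocks arrive out of sequence, which is the property that makes the output usable as model input without further cleanup.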

II. Use Cases

Large model training data cleaning and structuring
RAG systems and enterprise knowledge base construction
Academic papers, research reports, and financial statement parsing
Batch e-book conversion and web content extraction
Scanned document digitization and information extraction

III. Underlying Technologies

Vision-Language Models (VLM), LayoutLMv3 (layout analysis)
Custom YOLOv8 (formula detection) + UniMERNet (formula to LaTeX)
PaddleOCR (multilingual text recognition)
SGLang inference optimization, Native-Res ViT native high-resolution vision technology
Multi-module parsing architecture based on PDF-Extract-Kit

GitHub: https://github.com/opendatalab/MinerU

License: AGPL-3.0