MinerU

A lightweight, efficient open-source document parser that accurately converts PDFs, images, and e-books into Markdown/JSON

Features

Open Source, OCR, PDF

System Requirements

16GB RAM recommended. 17GB+ storage recommended.
macOS 15+: Supports both Intel and M-series chips.
Windows 10/11: Intel/AMD GPUs supported, NVIDIA GPU recommended.
Note: For NVIDIA GPUs, install a newer driver.

Introduction

MinerU is an all-in-one, open-source intelligent document parsing tool developed by the OpenDataLab team at Shanghai AI Laboratory. It is dedicated to solving the challenges of high-quality structured data extraction for large model training, RAG systems, and knowledge base construction.

I. Core Features (User-Friendly)

  • Deep PDF Parsing: Automatically extracts text, tables, images, and formulas (converted to LaTeX); accurately identifies headings, paragraphs, and lists while preserving the original layout; supports OCR for scanned PDFs; and automatically filters redundant content such as headers, footers, and footnotes.
  • Multi-Format Compatibility: Supports PDFs, PNG/JPEG images, EPUB/MOBI/DOCX e-books, and extracts clean main content from web pages.
  • Multilingual Support: OCR for 109+ languages, ideal for cross-border document processing.
  • Structured Output: One-click conversion to Markdown (with multimodal elements), JSON, and HTML. Output follows human reading order, so large models can consume it directly.
  • Lightweight & Efficient: 0.9B parameter model runs smoothly on consumer-grade GPUs, with fast inference and low deployment costs.
  • Scientific Data Capability: High-precision extraction of mathematical formulas, chemical molecular structures, and chemical reaction equations for scientific document parsing.
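
To make the "filter redundant content, then emit in reading order" idea above concrete, here is a minimal sketch. It does not use MinerU's actual API or output schema: the `Block` type, its field names, the `DISCARDED` set, and the `to_markdown` function are all hypothetical and exist only to illustrate the behaviour described in the feature list.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical block kinds treated as page furniture; the real schema may differ.
DISCARDED = {"header", "footer", "footnote"}


@dataclass
class Block:
    kind: str     # e.g. "title", "text", "formula", "image", "header"
    order: int    # reading-order index, assumed to come from a layout model
    content: str  # plain text, LaTeX source, or an image path


def to_markdown(blocks: List[Block]) -> str:
    """Render blocks as Markdown in reading order, skipping discarded kinds."""
    kept = sorted(
        (b for b in blocks if b.kind not in DISCARDED),
        key=lambda b: b.order,
    )
    parts = []
    for b in kept:
        if b.kind == "title":
            parts.append(f"# {b.content}")
        elif b.kind == "formula":
            # Formulas are assumed to arrive already converted to LaTeX.
            parts.append(f"$$\n{b.content}\n$$")
        elif b.kind == "image":
            parts.append(f"![]({b.content})")
        else:
            parts.append(b.content)
    return "\n\n".join(parts)
```

Given blocks in arbitrary order, `to_markdown` drops the header, footer, and footnote blocks and joins the rest by their `order` value, so the result reads top to bottom as a person would read the page.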

II. Use Cases

  • Large model training data cleaning and structuring
  • RAG systems and enterprise knowledge base construction
  • Academic papers, research reports, and financial statement parsing
  • Batch e-book conversion and web content extraction
  • Scanned document digitization and information extraction

III. Underlying Technologies

  • Vision-Language Models (VLM), LayoutLMv3 (layout analysis)
  • Custom YOLOv8 (formula detection) + UniMERNet (formula to LaTeX)
  • PaddleOCR (multilingual text recognition)
  • SGLang inference optimization; Native-Res ViT for native high-resolution vision
  • Multi-module parsing architecture based on PDF-Extract-Kit
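
A multi-module architecture of the kind listed above can be pictured as a dispatcher that sends each detected region to a specialised recognizer. The sketch below is illustrative only and assumes nothing about MinerU's internals: `ocr_stub` and `formula_stub` are placeholders standing in for real models (for example an OCR engine and a formula-to-LaTeX model), and `parse_page` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

# (region kind, crop identifier); a hypothetical stand-in for a detected region.
Region = Tuple[str, str]
Recognizer = Callable[[str], str]


def ocr_stub(crop: str) -> str:
    """Placeholder for a text recognizer."""
    return f"text({crop})"


def formula_stub(crop: str) -> str:
    """Placeholder for a formula-to-LaTeX recognizer."""
    return f"latex({crop})"


def parse_page(
    regions: List[Region],
    recognizers: Dict[str, Recognizer],
    default: Recognizer,
) -> List[Tuple[str, str]]:
    """Route each region to the recognizer registered for its kind."""
    results = []
    for kind, crop in regions:
        # Kinds without a dedicated recognizer fall back to the default one.
        recognize = recognizers.get(kind, default)
        results.append((kind, recognize(crop)))
    return results
```

Registering recognizers in a dictionary keeps the modules independent: a new region kind can be supported by adding one entry, without touching the routing loop.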