EdgeParse is a high-performance PDF-to-structured-data extraction engine written in Rust. It converts complex PDFs into clean, structured JSON, Markdown, or HTML in milliseconds without ML dependencies.

How fast is EdgeParse compared to other PDF parsers?

EdgeParse processes 40+ pages per second, 10 to 100× faster than Python-based alternatives like Docling or Marker. Average processing time is 0.026 s per document.
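
Figures like these depend on the documents and hardware involved, so it can be worth measuring on your own corpus. The sketch below is a minimal timing harness, not part of EdgeParse: `benchmark` and `fake_parse` are illustrative names, and `fake_parse` is a stand-in for whatever parser call you want to time. Given a page count for each document, it reports average seconds per document and pages per second.

```python
import time
from typing import Callable, Iterable, Tuple


def benchmark(parse: Callable[[str], object],
              docs: Iterable[Tuple[str, int]]) -> dict:
    """Time `parse` over (path, page_count) pairs and summarize throughput."""
    total_docs = 0
    total_pages = 0
    start = time.perf_counter()
    for path, page_count in docs:
        parse(path)
        total_docs += 1
        total_pages += page_count
    elapsed = time.perf_counter() - start
    return {
        "docs": total_docs,
        "pages": total_pages,
        "seconds": elapsed,
        "avg_seconds_per_doc": elapsed / total_docs if total_docs else 0.0,
        "pages_per_second": total_pages / elapsed if elapsed > 0 else 0.0,
    }


def fake_parse(path: str) -> str:
    # Placeholder (assumption): replace with a real call to the parser under test.
    return path.upper()


if __name__ == "__main__":
    stats = benchmark(fake_parse, [("a.pdf", 3), ("b.pdf", 5)])
    print(stats)
```

Passing the parser in as a callable keeps the harness independent of any particular library, so the same loop can time several parsers on identical inputs.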

What programming languages does EdgeParse support?

EdgeParse provides native bindings for Python (via PyO3) and Node.js (via NAPI-RS), ships a standalone CLI binary, and can be used directly as a Rust library crate.

Does EdgeParse require GPU or ML models?

No. EdgeParse is a rule-based extraction engine with zero ML dependencies. No GPU, no Java, no Poppler, no Tesseract required. Just pip install edgeparse and go.

#1 Non-ML PDF Parser · Leads the current benchmark · 83× faster than Docling · Zero dependencies

The PDF Engine for RAG Pipelines

Best published benchmark score among non-ML parsers. 83× faster than Docling and 2× faster than OpenDataLoader. Zero GPU, zero OCR, zero JVM. Just a 15 MB Rust binary with the best reported scores across reading order, tables, headings, paragraphs, text quality, and speed.

pip install edgeparse

0 ML dependencies

Works with

Python Node.js Rust CLI WebAssembly

Enterprise

EdgeParse for Enterprise

Production-grade PDF extraction for teams that need deployment control, data isolation, reliable operations, and a path from prototype to internal platform.

Self-Hosted

Deploy on your own infrastructure. Air-gapped environments, private clouds, or on-premises — EdgeParse runs wherever you need it.

Docker & Kubernetes ready
No external API calls
No data leaves your network

High Performance

Process thousands of PDFs per minute with constant, predictable resource usage. Rust-native speed, zero ML overhead.

40+ pages/second per core
10–100× faster than Python alternatives
Zero ML & GPU dependencies
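
As a rough sizing aid, the per-core figure above can be turned into a core count for a target workload. This is a back-of-the-envelope sketch that assumes throughput scales linearly across cores and that 40 pages/second holds for your documents; neither is guaranteed. `cores_needed` is an illustrative helper, not an EdgeParse API.

```python
import math


def cores_needed(pdfs_per_minute: float,
                 avg_pages_per_pdf: float,
                 pages_per_second_per_core: float = 40.0) -> int:
    """Estimate cores required, assuming linear scaling across cores."""
    pages_per_second = pdfs_per_minute * avg_pages_per_pdf / 60.0
    return max(1, math.ceil(pages_per_second / pages_per_second_per_core))


# 3,000 PDFs per minute at 10 pages each is 500 pages/second,
# and 500 / 40 = 12.5, which rounds up to 13 cores.
print(cores_needed(3000, 10))  # 13
```

Treat the result as a lower bound: I/O, queueing, and unusually dense documents all push the real number up.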

Multi-Format Output

Structured JSON, Markdown, HTML, or plain text — integrate directly into your RAG pipeline, document workflow, or data lake.

JSON with bounding boxes
Table-aware Markdown
Clean HTML with headings
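
To illustrate how JSON output with bounding boxes might feed a RAG pipeline, here is a small chunking sketch. The exact JSON schema is not shown on this page, so the element shape used below (`type`, `text`, `page`, `bbox`) is an assumption made for illustration, as are the `SAMPLE` data and the `chunk_by_heading` helper. The idea is to group body text under its nearest preceding heading and keep page and box references so retrieved chunks can be traced back to the source PDF.

```python
import json
from typing import Dict, List, Optional

# Hypothetical parser output; real field names may differ.
SAMPLE = json.loads("""
[
  {"type": "heading", "text": "Introduction", "page": 1, "bbox": [72, 700, 540, 720]},
  {"type": "paragraph", "text": "PDFs are hard to parse.", "page": 1, "bbox": [72, 640, 540, 690]},
  {"type": "heading", "text": "Method", "page": 2, "bbox": [72, 700, 540, 720]},
  {"type": "paragraph", "text": "We use rules.", "page": 2, "bbox": [72, 640, 540, 690]},
  {"type": "paragraph", "text": "No models are involved.", "page": 2, "bbox": [72, 580, 540, 630]}
]
""")


def chunk_by_heading(elements: List[Dict]) -> List[Dict]:
    """Group body elements under the most recent heading."""
    chunks: List[Dict] = []
    current: Optional[Dict] = None
    for el in elements:
        if el["type"] == "heading":
            # A heading starts a new chunk and contributes its own page.
            current = {"title": el["text"], "text": [],
                       "pages": {el["page"]}, "sources": []}
            chunks.append(current)
            continue
        if current is None:
            # Content before the first heading gets an untitled chunk.
            current = {"title": "", "text": [], "pages": set(), "sources": []}
            chunks.append(current)
        current["text"].append(el["text"])
        current["pages"].add(el["page"])
        current["sources"].append({"page": el["page"], "bbox": el["bbox"]})
    return [
        {"title": c["title"], "text": " ".join(c["text"]),
         "pages": sorted(c["pages"]), "sources": c["sources"]}
        for c in chunks
    ]


for chunk in chunk_by_heading(SAMPLE):
    print(chunk["title"], "|", chunk["text"], "|", chunk["pages"])
```

Heading-based grouping is only one option; fixed-size or token-based chunking may suit long sections better.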

Taking PDF Extraction Into Production?

EdgeParse can support internal platforms, customer-facing document pipelines, and regulated deployments that need more than a prototype stack.

Enterprise Security

Support for regulated environments that need full data sovereignty, air-gapped deployment, and auditable processing pipelines.

Priority Support

Work directly with the team on architecture reviews, rollout plans, troubleshooting, and production issues.

Custom Integrations

Integrate into internal systems, custom deployment patterns, and proprietary workflows — Python, Node.js, Rust, CLI, or WebAssembly.

Talk to the Team

Apache 2.0 — no license lock-in

On-premise deployment — your data stays on your network

Zero ML/GPU/cloud dependency required

Quick start guide

Install EdgeParse and run your first extraction in 60 seconds.

Docker deployment

Run EdgeParse in containers with pre-built images.

Live demo

Try EdgeParse in your browser: no install, no server, pure WebAssembly.

Contact & support

Talk to the team about architecture reviews, rollout planning, or custom integration work.