EdgeParse is a high-performance PDF-to-structured-data extraction engine written in Rust. It converts complex PDFs into clean, structured JSON, Markdown, or HTML in milliseconds without ML dependencies.

How fast is EdgeParse compared to other PDF parsers?

EdgeParse processes 40+ pages per second — 10 to 100× faster than Python-based alternatives like Docling or Marker. It achieves 0.026s average processing time per document.

What programming languages does EdgeParse support?

EdgeParse provides native bindings for Python (via PyO3), Node.js (via NAPI-RS), a standalone CLI binary, and can be used directly as a Rust library crate.

Does EdgeParse require GPU or ML models?

No. EdgeParse is a rule-based extraction engine with zero ML dependencies. No GPU, no Java, no Poppler, no Tesseract required. Just pip install edgeparse and go.

Changelog

All notable changes to EdgeParse are documented here.

The format follows Keep a Changelog and this project adheres to Semantic Versioning.

[0.1.1] — 2026-03-23

Fixed

Zero Clippy warnings across all crates
Corrected .gitignore to exclude build artifacts cleanly

Added

Comprehensive Rust doc comments on public API surface
Step-by-step tutorials for CLI, Python SDK, Node.js SDK, Rust library, and output formats
CI/CD publishing guide

Changed

Bumped version to 0.1.1 in all crates and SDK manifests

[0.1.0] — 2026-03-22

Added

Core extraction engine (edgeparse-core): Rust-native PDF-to-structured-data pipeline — no ML, no Java, no GPU
Python SDK (edgeparse): PyO3-based bindings, available on PyPI
Node.js SDK (edgeparse): NAPI-RS bindings, available on npm
CLI binary (edgeparse-cli): Zero-dependency binary for all major platforms
Rust library (edgeparse-core): First-class crate published to crates.io
Reading-order reconstruction for multi-column and sidebar layouts
Ruling-line and borderless table detection with cell-span merging
Heading and paragraph classification
AI safety filters (PII scrubbing, content flags)
Tagged PDF support (PDF/UA accessibility structure)
Output formats: JSON (full schema), Markdown, HTML, plain text
Benchmark suite comparing EdgeParse against Docling, Marker, pymupdf4llm, MinerU, MarkItDown, and LiteParse
Docker image for containerised deployment
GitHub Actions CI workflows for Rust, Python, Node.js, and Docker releases
Renamed Node.js package from @edgeparse/pdf → edgeparse

Changelog

[0.1.1] — 2026-03-23

Fixed

Added

Changed

[0.1.0] — 2026-03-22

Added

Links