Skip to content

Changelog

All notable changes to EdgeParse are documented here.

The format follows Keep a Changelog and this project adheres to Semantic Versioning.


  • Zero Clippy warnings across all crates
  • Corrected .gitignore to exclude build artifacts cleanly
  • Comprehensive Rust doc comments on public API surface
  • Step-by-step tutorials for CLI, Python SDK, Node.js SDK, Rust library, and output formats
  • CI/CD publishing guide
  • Bumped version to 0.1.1 in all crates and SDK manifests

  • Core extraction engine (edgeparse-core): Rust-native PDF-to-structured-data pipeline — no ML, no Java, no GPU
  • Python SDK (edgeparse): PyO3-based bindings, available on PyPI
  • Node.js SDK (edgeparse): NAPI-RS bindings, available on npm
  • CLI binary (edgeparse-cli): Zero-dependency binary for all major platforms
  • Rust library (edgeparse-core): First-class crate published to crates.io
  • Reading-order reconstruction for multi-column and sidebar layouts
  • Ruling-line and borderless table detection with cell-span merging
  • Heading and paragraph classification
  • AI safety filters (PII scrubbing, content flags)
  • Tagged PDF support (PDF/UA accessibility structure)
  • Output formats: JSON (full schema), Markdown, HTML, plain text
  • Benchmark suite comparing EdgeParse against Docling, Marker, pymupdf4llm, MinerU, MarkItDown, and LiteParse
  • Docker image for containerised deployment
  • GitHub Actions CI workflows for Rust, Python, Node.js, and Docker releases
  • Renamed Node.js package from @edgeparse/pdfedgeparse