EdgeParse is a high-performance PDF-to-structured-data extraction engine written in Rust. It converts complex PDFs into clean, structured JSON, Markdown, or HTML in milliseconds without ML dependencies.

How fast is EdgeParse compared to other PDF parsers?

EdgeParse processes 40+ pages per second, 10 to 100× faster than Python-based alternatives such as Docling or Marker, with an average processing time of 0.026 s per document.
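To check throughput on your own PDFs and hardware, a small timing harness can help. This is a sketch, not an EdgeParse API: measureThroughput and its parameters are hypothetical. The conversion call is passed in as a callback, so the same harness works with whichever binding you use. The clock is injectable as well: in a browser you can pass () => performance.now() for sub-millisecond resolution.

```typescript
// Hypothetical timing helper (not part of EdgeParse): runs a conversion
// callback several times and reports seconds per document and pages per second.
interface Throughput {
  secondsPerDoc: number;
  pagesPerSecond: number;
}

function measureThroughput(
  convertOnce: () => unknown, // e.g. () => convert_to_string(bytes, 'markdown')
  pagesPerDoc: number,
  iterations = 20,
  now: () => number = () => Date.now(), // clock in milliseconds
): Throughput {
  if (!(pagesPerDoc >= 1) || !(iterations >= 1)) {
    throw new RangeError('pagesPerDoc and iterations must be at least 1');
  }
  const start = now();
  for (let i = 0; i < iterations; i++) {
    convertOnce();
  }
  const elapsedSec = (now() - start) / 1000;
  if (elapsedSec <= 0) {
    // Date.now() has 1 ms resolution, so very fast runs can round to zero.
    throw new Error('elapsed time rounded to 0; increase iterations');
  }
  return {
    secondsPerDoc: elapsedSec / iterations,
    pagesPerSecond: (pagesPerDoc * iterations) / elapsedSec,
  };
}
```

Timing many iterations and dividing keeps the result meaningful even when a single document finishes in a few milliseconds.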

What programming languages does EdgeParse support?

EdgeParse provides native bindings for Python (via PyO3) and Node.js (via NAPI-RS), ships as a standalone CLI binary, compiles to WebAssembly for in-browser use, and can be used directly as a Rust library crate.

Does EdgeParse require GPU or ML models?

No. EdgeParse is a rule-based extraction engine with zero ML dependencies. No GPU, no Java, no Poppler, no Tesseract required. Just pip install edgeparse and go.

Quick Start: WebAssembly

Overview

EdgeParse compiles to WebAssembly, enabling client-side PDF extraction in any modern browser. The same Rust engine that powers the CLI, Python, and Node.js SDKs runs locally in the user’s browser tab.

Key properties:

PDF data never leaves the user’s device
Works offline after initial WASM load (~4 MB)
Same accuracy as the native CLI
Zero backend infrastructure

Build the WASM package

# Install wasm-pack (one-time)
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

# Build for browser use (run from the repository root)
cd crates/edgeparse-wasm
wasm-pack build --target web --release

Output goes to crates/edgeparse-wasm/pkg/.

Install in your project

# Option 1: Link locally
npm install ./path-to/crates/edgeparse-wasm/pkg

# Option 2: Copy pkg/ into your project and import it by relative path
cp -r crates/edgeparse-wasm/pkg/ src/edgeparse-wasm/

Basic usage

import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';

// Load WASM binary (call once at startup)
await init();

// Read a PDF file (from user upload, fetch, etc.)
const response = await fetch('/my-report.pdf');
const bytes = new Uint8Array(await response.arrayBuffer());

// Extract Markdown
const markdown = convert_to_string(bytes, 'markdown');
console.log(markdown);

// Extract structured JSON
const json = convert_to_string(bytes, 'json');

// Extract HTML
const html = convert_to_string(bytes, 'html');

// Extract plain text
const text = convert_to_string(bytes, 'text');
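The example above calls init() once at top level. If conversion can be triggered from several places (for example, more than one upload handler), a small memoizing wrapper keeps initialization to a single call even when callers race. The once helper below is hypothetical, not part of EdgeParse; you pass it the package's default-exported init.

```typescript
// Hypothetical helper (not part of EdgeParse): wrap an async initializer so it
// runs at most once, even when several callers request it concurrently.
function once<T>(fn: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = fn().catch((err) => {
        // Forget a failed attempt so a later call can retry (e.g. after a network error).
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Usage sketch:
// import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';
// const ready = once(() => init());
// async function toMarkdown(bytes: Uint8Array): Promise<string> {
//   await ready(); // loads the WASM binary on first use only
//   return convert_to_string(bytes, 'markdown');
// }
```

Caching the promise rather than a boolean flag means concurrent callers all await the same in-flight load instead of starting a second one.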

Handle user file uploads

import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';

await init();

const fileInput = document.getElementById('pdf-input') as HTMLInputElement;

fileInput.addEventListener('change', async () => {
  const file = fileInput.files?.[0];
  if (!file) return;

  const bytes = new Uint8Array(await file.arrayBuffer());
  const markdown = convert_to_string(bytes, 'markdown');

  document.getElementById('output')!.textContent = markdown;
});
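Because users can pick any file, it can help to reject obvious non-PDFs before handing bytes to the converter. The sketch below checks for the "%PDF-" header that PDF files begin with. looksLikePdf is a hypothetical helper, not an EdgeParse API, and it only checks the header, so a truncated or corrupt PDF can still pass.

```typescript
// Hypothetical pre-flight check (not an EdgeParse API). A PDF file starts with
// the ASCII header "%PDF-" followed by a version such as "1.7". Some readers
// tolerate a little junk before the header, so scan the first searchLimit bytes.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function looksLikePdf(bytes: Uint8Array, searchLimit = 1024): boolean {
  // Last offset at which the full 5-byte header could still start.
  const last = Math.min(bytes.length - PDF_MAGIC.length, searchLimit - 1);
  for (let i = 0; i <= last; i++) {
    if (PDF_MAGIC.every((b, j) => bytes[i + j] === b)) {
      return true;
    }
  }
  return false;
}

// Usage sketch, inside the change handler before converting:
// if (!looksLikePdf(bytes)) {
//   document.getElementById('output')!.textContent = 'That file does not look like a PDF.';
//   return;
// }
```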

Get structured document data

Use convert() instead of convert_to_string() to get a full JavaScript object with pages, elements, and bounding boxes:

import init, { convert } from '@edgeparse/edgeparse-wasm';

await init();

// file: a File object, e.g. taken from the upload handler above
const bytes = new Uint8Array(await file.arrayBuffer());
const doc = convert(bytes, 'json');

// Access structured data
for (const page of doc.pages) {
  console.log(`Page ${page.page_number}:`);
  for (const element of page.elements) {
    console.log(`  [${element.type}] ${element.text}`);
  }
}
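When working in TypeScript it helps to give the returned object a type. The interfaces below are assumptions inferred only from the fields the example reads (pages, page_number, elements, type, text); the real object carries more, such as bounding boxes, whose field names this page doesn't show. pageTexts and countByType are hypothetical helpers, e.g. for turning pages into chunks for client-side search.

```typescript
// ASSUMED shape of the object returned by convert(), based only on the fields
// used in the example above. Extend these interfaces to match the real output.
interface EdgeElement {
  type: string;
  text?: string; // optional here in case some element types carry no text
}

interface EdgePage {
  page_number: number;
  elements: EdgeElement[];
}

interface EdgeDocument {
  pages: EdgePage[];
}

// Hypothetical helper: join each page's non-empty element text, one line per element.
function pageTexts(doc: EdgeDocument): { page: number; text: string }[] {
  return doc.pages.map((page) => ({
    page: page.page_number,
    text: page.elements
      .map((el) => el.text?.trim() ?? '')
      .filter((t) => t.length > 0)
      .join('\n'),
  }));
}

// Hypothetical helper: count elements by type across the whole document.
function countByType(doc: EdgeDocument): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const page of doc.pages) {
    for (const el of page.elements) {
      counts[el.type] = (counts[el.type] ?? 0) + 1;
    }
  }
  return counts;
}

// Usage sketch: const chunks = pageTexts(convert(bytes, 'json') as EdgeDocument);
```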

Page range selection

// Parse only pages 1-5
const markdown = convert_to_string(bytes, 'markdown', '1-5');

// Parse specific pages
const json = convert_to_string(bytes, 'json', '1,3,7');
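If users choose pages in your UI (for example with checkboxes), you need to turn the selection into a range string. The helper below is hypothetical. A single run produces strings like '1-5' and scattered pages produce '1,3,7', matching the examples above; mixed output such as '1-3,5' assumes the page parser accepts both forms in one string, which this page doesn't confirm.

```typescript
// Hypothetical helper (not an EdgeParse API): build a page-range string from
// 1-based page numbers. Consecutive runs collapse to "a-b"; other pages are
// comma-separated. ASSUMPTION: mixed strings such as "1-3,5" are accepted.
function toPageRange(pages: readonly number[]): string {
  const sorted = [...pages]
    .sort((a, b) => a - b)
    .filter((p, i, arr) => i === 0 || p !== arr[i - 1]); // drop duplicates
  if (sorted.length === 0) {
    throw new RangeError('at least one page is required');
  }
  for (const p of sorted) {
    if (!Number.isInteger(p) || p < 1) {
      throw new RangeError(`invalid page number: ${p}`);
    }
  }
  const parts: string[] = [];
  let start = 0;
  while (start < sorted.length) {
    let end = start;
    // Extend the run while the next page follows directly.
    while (end + 1 < sorted.length && sorted[end + 1] === sorted[end] + 1) {
      end++;
    }
    parts.push(end > start ? `${sorted[start]}-${sorted[end]}` : `${sorted[start]}`);
    start = end + 1;
  }
  return parts.join(',');
}
```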

Table extraction methods

// Arguments: bytes, output format, pages ('all'), 'auto', table method
// Default: ruling-line detection (best for tables with borders)
const md1 = convert_to_string(bytes, 'markdown', 'all', 'auto', 'default');

// Cluster method: for borderless tables
const md2 = convert_to_string(bytes, 'markdown', 'all', 'auto', 'cluster');
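Five positional arguments are easy to get out of order. One option is a thin typed wrapper that takes an options object. This is a sketch: convertWith, ConvertOptions, and isOutputFormat are hypothetical names, and the argument order is inferred from the examples on this page. The fourth argument is 'auto' in every example and its meaning isn't covered here, so the wrapper passes it through unchanged.

```typescript
// Hypothetical typed wrapper (not an EdgeParse API) around convert_to_string().
// Inferred argument order: (bytes, format, pages, 'auto', tableMethod).
type OutputFormat = 'markdown' | 'json' | 'html' | 'text';
type TableMethod = 'default' | 'cluster';

interface ConvertOptions {
  format: OutputFormat;
  pages?: string; // e.g. '1-5' or '1,3,7'; defaults to 'all'
  tableMethod?: TableMethod; // 'default' = ruling lines, 'cluster' = borderless tables
}

type ConvertToString = (
  bytes: Uint8Array,
  format: string,
  pages: string,
  fourth: string,
  tableMethod: string,
) => string;

const FORMATS: readonly string[] = ['markdown', 'json', 'html', 'text'];

// Narrow an untyped string (e.g. a <select> value) to a supported format.
function isOutputFormat(value: string): value is OutputFormat {
  return FORMATS.indexOf(value) !== -1;
}

function convertWith(fn: ConvertToString, bytes: Uint8Array, opts: ConvertOptions): string {
  return fn(bytes, opts.format, opts.pages ?? 'all', 'auto', opts.tableMethod ?? 'default');
}

// Usage sketch:
// import { convert_to_string } from '@edgeparse/edgeparse-wasm';
// const md = convertWith(convert_to_string, bytes, { format: 'markdown', tableMethod: 'cluster' });
```

Passing the converter in as a parameter keeps the wrapper independent of how the package is loaded and makes it easy to test with a stub.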

Vite configuration

import { defineConfig } from 'vite';

export default defineConfig({
  optimizeDeps: {
    exclude: ['@edgeparse/edgeparse-wasm'],
  },
  build: {
    target: 'esnext',
  },
});

Webpack configuration

module.exports = {
  experiments: {
    asyncWebAssembly: true,
  },
};

Live demo

Try EdgeParse WASM in your browser: edgeparse.com/demo/

Upload any PDF and see extracted Markdown, JSON, HTML, or text — all processing runs locally.

Next steps

WASM API Reference — full function signatures and parameter details
WASM Use Cases — client-side RAG, offline apps, privacy-first processing