EdgeParse is a high-performance PDF-to-structured-data extraction engine written in Rust. It converts complex PDFs into clean, structured JSON, Markdown, or HTML in milliseconds without ML dependencies.

How fast is EdgeParse compared to other PDF parsers?

EdgeParse processes 40+ pages per second — 10 to 100× faster than Python-based alternatives like Docling or Marker. It achieves 0.026s average processing time per document.

What programming languages does EdgeParse support?

EdgeParse provides native bindings for Python (via PyO3), Node.js (via NAPI-RS), a standalone CLI binary, and can be used directly as a Rust library crate.

Does EdgeParse require GPU or ML models?

No. EdgeParse is a rule-based extraction engine with zero ML dependencies. No GPU, no Java, no Poppler, no Tesseract required. Just pip install edgeparse and go.

WebAssembly API

Package

npm: edgeparse-wasm · Source: crates/edgeparse-wasm/

Initialization

Before calling any conversion function, initialize the WASM module:

import init from 'edgeparse-wasm';

// Async initialization — fetches and compiles the .wasm binary
await init();

Call init() once at application startup. Subsequent calls are no-ops.
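When several parts of an application may need the module before startup has finished, it can help to share a single initialization promise. The following is a minimal sketch, not part of the package: makeInitOnce is a hypothetical helper, and it takes the initializer as a parameter so it does not depend on how init is imported.

```typescript
type InitFn = () => Promise<unknown>;

// Wraps an async initializer so that every caller awaits the same promise.
// If initialization fails, the cached promise is cleared so a later call can retry.
function makeInitOnce(initFn: InitFn): () => Promise<unknown> {
  let pending: Promise<unknown> | null = null;
  return () => {
    if (pending !== null) {
      return pending;
    }
    const attempt = initFn().catch((err) => {
      pending = null; // allow a retry after a failed load
      throw err;
    });
    pending = attempt;
    return attempt;
  };
}

// Usage (assumes the default export shown above):
//   import init from 'edgeparse-wasm';
//   const ensureReady = makeInitOnce(init);
//   await ensureReady();
```

Callers can then await ensureReady() wherever they need the module instead of coordinating startup order.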

`convert()`

Parses PDF bytes and returns a structured JavaScript object representing the full document.

function convert(
  pdf_bytes: Uint8Array,
  format?: string | null,
  pages?: string | null,
  reading_order?: string | null,
  table_method?: string | null,
): any;

Parameters

Parameter	Type	Default	Description
`pdf_bytes`	`Uint8Array`	(required)	Raw PDF file bytes
`format`	`string | null`	`"json"`	Output format hint: `"json"`, `"markdown"`, `"html"`, `"text"`
`pages`	`string | null`	`"all"`	Page range: `"all"`, `"1-5"`, `"1,3,7"`
`reading_order`	`string | null`	`"auto"`	Reading order algorithm: `"auto"` (XY-Cut++) or `"off"`
`table_method`	`string | null`	`"default"`	Table detection: `"default"` (ruling lines) or `"cluster"` (borderless)
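Because the optional parameters are positional, setting only a later one means passing placeholders for the earlier ones. The sketch below shows one way to wrap them in a typed options object. ConvertOptions and toConvertArgs are hypothetical names, and it assumes that passing null selects the default listed in the table.

```typescript
// String-literal unions mirroring the values listed in the parameter table.
type OutputFormat = 'json' | 'markdown' | 'html' | 'text';
type ReadingOrder = 'auto' | 'off';
type TableMethod = 'default' | 'cluster';

interface ConvertOptions {
  format?: OutputFormat;
  pages?: string; // e.g. "all", "1-5", "1,3,7"
  readingOrder?: ReadingOrder;
  tableMethod?: TableMethod;
}

type ConvertArgs = [string | null, string | null, string | null, string | null];

// Maps the options object to the positional order used by convert():
// format, pages, reading_order, table_method. Unset options become null
// (assumed here to mean "use the default").
function toConvertArgs(opts: ConvertOptions = {}): ConvertArgs {
  return [
    opts.format ?? null,
    opts.pages ?? null,
    opts.readingOrder ?? null,
    opts.tableMethod ?? null,
  ];
}

// Usage:
//   const doc = convert(bytes, ...toConvertArgs({ pages: '1-5', tableMethod: 'cluster' }));
```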

Return value

A JavaScript object representing the Rust PdfDocument struct, serialized via serde_wasm_bindgen. The top-level document has a kids array containing ContentElement enum variants (externally-tagged):

{
  file_name: string,
  number_of_pages: number,
  author: string | null,
  title: string | null,
  kids: Array<Record<string, any>>  // externally-tagged Rust enum variants
}

Each element in kids is an externally-tagged enum object like { "Paragraph": { ... } } or { "Heading": { ... } }. For most use cases, convert_to_string(bytes, 'json') + JSON.parse() is simpler and gives the same structured schema as the Python/Node.js SDK.
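If you do work with the object returned by convert(), each element has to be unwrapped before its fields can be read. A minimal sketch of that step follows. unwrapVariant is a hypothetical helper, and treating a bare string as a data-less variant is an assumption based on serde's usual externally-tagged representation rather than something documented here.

```typescript
interface Variant {
  tag: string;    // variant name, e.g. "Paragraph" or "Heading"
  value: unknown; // the variant's payload, or null for a data-less variant
}

// Splits an externally-tagged enum object such as { "Paragraph": { ... } }
// into its tag and payload. Returns null when the value does not look like
// a single-key variant object.
function unwrapVariant(el: unknown): Variant | null {
  if (typeof el === 'string') {
    return { tag: el, value: null }; // assumed form for data-less variants
  }
  if (el !== null && typeof el === 'object' && !Array.isArray(el)) {
    const keys = Object.keys(el as object);
    if (keys.length === 1) {
      const tag = keys[0] as string;
      return { tag, value: (el as Record<string, unknown>)[tag] };
    }
  }
  return null;
}

// Usage:
//   for (const el of doc.kids) {
//     const variant = unwrapVariant(el);
//     if (variant !== null && variant.tag === 'Heading') { /* read variant.value */ }
//   }
```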

Example

import init, { convert_to_string } from 'edgeparse-wasm';

await init();

const bytes = new Uint8Array(await file.arrayBuffer());

// Easiest: parse JSON string — same schema as Python/Node.js SDK
const doc = JSON.parse(convert_to_string(bytes, 'json'));

// Iterate elements (uses same keys as Python/Node.js JSON output)
for (const el of doc.kids) {
  if (el.type === 'table') {
    console.log('Table found on page', el['page number']);
  }
}
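The same loop can be turned into a small reusable query. This sketch relies only on the type and 'page number' keys used in the example above. pagesWithType is a hypothetical helper, and it inspects only the top-level kids array, not nested elements.

```typescript
// Minimal shape of a parsed JSON element; other keys are left untyped.
interface JsonElement {
  type?: string;
  'page number'?: number;
  [key: string]: unknown;
}

// Returns the sorted, de-duplicated page numbers on which an element
// of the given type appears among the top-level elements.
function pagesWithType(kids: JsonElement[], type: string): number[] {
  const pages: number[] = [];
  for (const el of kids) {
    const page = el['page number'];
    if (el.type === type && typeof page === 'number' && pages.indexOf(page) === -1) {
      pages.push(page);
    }
  }
  return pages.sort((a, b) => a - b);
}

// Usage:
//   const tablePages = pagesWithType(doc.kids, 'table');
```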

`convert_to_string()`

Parses PDF bytes and returns a formatted string.

function convert_to_string(
  pdf_bytes: Uint8Array,
  format?: string | null,
  pages?: string | null,
  reading_order?: string | null,
  table_method?: string | null,
): string;

Parameters

Same as convert().

Return value

A string in the requested format:

Format	Output
`"json"`	JSON string with bounding boxes and element types
`"markdown"`	Standard Markdown with GFM tables
`"html"`	HTML5 with semantic elements
`"text"`	Plain UTF-8 text with reading order preserved
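When the output string is saved or offered as a download, each format needs a file extension and a MIME type. The mapping below is an illustrative sketch. FORMAT_INFO and outputFileName are hypothetical names, and the extensions and MIME types are conventional choices rather than anything the package prescribes.

```typescript
// Conventional file extension and MIME type for each documented format.
const FORMAT_INFO = {
  json: { extension: '.json', mime: 'application/json' },
  markdown: { extension: '.md', mime: 'text/markdown' },
  html: { extension: '.html', mime: 'text/html' },
  text: { extension: '.txt', mime: 'text/plain' },
} as const;

type KnownFormat = keyof typeof FORMAT_INFO;

// Derives an output file name from the PDF's name, e.g. "report.pdf" -> "report.md".
function outputFileName(pdfName: string, format: KnownFormat): string {
  const base = pdfName.replace(/\.pdf$/i, '');
  return base + FORMAT_INFO[format].extension;
}

// Usage in a browser:
//   const blob = new Blob([markdown], { type: FORMAT_INFO.markdown.mime });
//   const name = outputFileName(file.name, 'markdown');
```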

Example

import init, { convert_to_string } from 'edgeparse-wasm';

await init();

const bytes = new Uint8Array(await file.arrayBuffer());

const markdown = convert_to_string(bytes, 'markdown');
const html = convert_to_string(bytes, 'html');
const text = convert_to_string(bytes, 'text');
const json = convert_to_string(bytes, 'json');

`version()`

Returns the EdgeParse version string.

function version(): string;

Example

import { version } from 'edgeparse-wasm';
console.log(version()); // e.g. "0.2.3"

Error handling

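Both convert() and convert_to_string() throw on failure (corrupted PDF, invalid page range, etc.). The thrown value is a JavaScript Error created from a Rust JsError, so err.message carries the description: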

try {
  const markdown = convert_to_string(bytes, 'markdown');
} catch (err) {
  console.error('PDF parsing failed:', err.message);
}
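For code that prefers explicit results over exceptions, the try/catch can be wrapped once and reused. The following is a sketch under that assumption. tryConvert and ParseResult are hypothetical names, and the conversion call is passed in as a function so the helper works with either convert() or convert_to_string().

```typescript
// Either the converted value or a human-readable failure message.
type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; message: string };

// Runs a conversion and captures any thrown value as a message
// instead of letting it propagate.
function tryConvert<T>(run: () => T): ParseResult<T> {
  try {
    return { ok: true, value: run() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}

// Usage:
//   const result = tryConvert(() => convert_to_string(bytes, 'markdown'));
//   if (result.ok === true) {
//     render(result.value);   // render is a placeholder for your own code
//   } else {
//     console.error('PDF parsing failed:', result.message);
//   }
```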

Browser compatibility

Browser	Minimum version
Chrome	57+
Firefox	52+
Safari	11+
Edge	16+

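Requires WebAssembly and ES module support. In Vite, setting build.target: 'esnext' keeps the bundler from transpiling the module-loading code for older targets; build.target is a Vite option, so with Webpack apply its equivalent target configuration.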
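An application can check for WebAssembly at runtime before loading the module and fall back to another path if it is missing. The check below is a generic feature-detection sketch, not something the package provides. supportsWebAssembly is a hypothetical helper, and it does not test ES module support.

```typescript
// Returns true when the runtime exposes WebAssembly and accepts a minimal module.
function supportsWebAssembly(): boolean {
  try {
    const wasm = (globalThis as any).WebAssembly;
    if (typeof wasm !== 'object' || wasm === null || typeof wasm.validate !== 'function') {
      return false;
    }
    // Smallest valid module: the "\0asm" magic number followed by version 1.
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return wasm.validate(emptyModule) === true;
  } catch {
    return false;
  }
}

// Usage:
//   if (supportsWebAssembly()) {
//     await init();
//   } else {
//     // show a message or send the file to a server-side parser instead
//   }
```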

TypeScript support

Full .d.ts type definitions ship with the package. Your IDE will provide autocomplete and type checking automatically.

Live demo

Try the interactive demo: edgeparse.com/demo/