WebAssembly API

npm: edgeparse-wasm · Source: crates/edgeparse-wasm/

Before calling any conversion function, initialize the WASM module:

import init from 'edgeparse-wasm';
// Async initialization — fetches and compiles the .wasm binary
await init();

Call init() once at application startup. Subsequent calls are no-ops.
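Because init() is asynchronous, code in several modules may need to wait for the same initialization. One way to handle this, sketched below, is to share a single promise. The names createReadyGate and ensureReady are hypothetical, and a stub stands in for the package's init export so the snippet runs on its own.

```typescript
// Hypothetical helper: wraps any async initializer so it is started at most once.
// `initFn` stands in for the `init` export of edgeparse-wasm.
function createReadyGate(initFn: () => Promise<unknown>): () => Promise<void> {
  let ready: Promise<void> | null = null;
  return () => {
    if (ready === null) {
      // The first caller starts initialization; later callers reuse the promise.
      ready = initFn().then(() => undefined);
    }
    return ready;
  };
}

// Stub initializer (an assumption) so the sketch runs without the package.
let initCalls = 0;
const ensureReady = createReadyGate(async () => {
  initCalls += 1;
});

// Both calls return the same promise, and the initializer is started only once.
const firstWait = ensureReady();
const secondWait = ensureReady();
```

In an application, createReadyGate(init) would presumably be created in one shared module, and each caller would await ensureReady() before converting. If initialization fails, this sketch keeps returning the same rejected promise.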

convert() parses PDF bytes and returns a structured JavaScript object representing the full document.

function convert(
  pdf_bytes: Uint8Array,
  format?: string | null,
  pages?: string | null,
  reading_order?: string | null,
  table_method?: string | null,
): any;

Parameter      Type            Default     Description
pdf_bytes      Uint8Array      (required)  Raw PDF file bytes
format         string | null   "json"      Output format hint: "json", "markdown", "html", "text"
pages          string | null   "all"       Page range: "all", "1-5", "1,3,7"
reading_order  string | null   "auto"      Reading order algorithm: "auto" (XY-Cut++) or "off"
table_method   string | null   "default"   Table detection: "default" (ruling lines) or "cluster" (borderless)
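The pages parameter is a plain string, so callers that hold page numbers in an array have to format them first. The helper below is a minimal sketch of that step. toPagesArg is a hypothetical name, page numbers are assumed to be 1-based, and mapping an empty list to "all" is a design choice of this example, not package behavior.

```typescript
// Hypothetical helper: builds a value for the `pages` parameter from page numbers.
// Only the documented forms are produced: "all" or a comma-separated list like "1,3,7".
// Assumption: page numbers are 1-based.
function toPagesArg(pageNumbers: readonly number[]): string {
  if (pageNumbers.length === 0) {
    // Design choice of this sketch: no explicit selection means every page.
    return "all";
  }
  for (const n of pageNumbers) {
    if (!Number.isInteger(n) || n < 1) {
      throw new RangeError(`Invalid page number: ${n}`);
    }
  }
  // Deduplicate and sort so the result is stable regardless of input order.
  const unique = Array.from(new Set(pageNumbers)).sort((a, b) => a - b);
  return unique.join(",");
}

const pagesArg = toPagesArg([7, 1, 3, 3]);
```

The result could then be passed as the third argument, for example convert_to_string(bytes, 'markdown', pagesArg).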

Returns a JavaScript object representing the Rust PdfDocument struct, serialized via serde_wasm_bindgen. The top-level document has a kids array containing ContentElement enum variants (externally-tagged):

{
  file_name: string,
  number_of_pages: number,
  author: string | null,
  title: string | null,
  kids: Array<Record<string, any>> // externally-tagged Rust enum variants
}

Each element in kids is an externally-tagged enum object like { "Paragraph": { ... } } or { "Heading": { ... } }. For most use cases, convert_to_string(bytes, 'json') + JSON.parse() is simpler and gives the same structured schema as the Python/Node.js SDK.

import init, { convert_to_string } from 'edgeparse-wasm';
await init();
const bytes = new Uint8Array(await file.arrayBuffer());
// Easiest: parse JSON string — same schema as Python/Node.js SDK
const doc = JSON.parse(convert_to_string(bytes, 'json'));
// Iterate elements (uses same keys as Python/Node.js JSON output)
for (const el of doc.kids) {
  if (el.type === 'table') {
    console.log('Table found on page', el['page number']);
  }
}
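When working with the object returned by convert() instead, each entry in kids has to be unwrapped from its externally-tagged form. The following sketch assumes every entry is a plain object with exactly one key, the variant name. unwrapVariant and countVariants are hypothetical helpers, and the sample payloads are placeholders, not the real element schema.

```typescript
// Shape of one externally-tagged variant, e.g. { "Paragraph": { ... } }.
type TaggedVariant = Record<string, unknown>;

// Hypothetical helper: splits a single-key variant object into its tag and payload.
function unwrapVariant(kid: TaggedVariant): { tag: string; payload: unknown } {
  const keys = Object.keys(kid);
  const tag = keys[0];
  if (keys.length !== 1 || tag === undefined) {
    throw new TypeError(`Expected exactly one variant key, got ${keys.length}`);
  }
  return { tag, payload: kid[tag] };
}

// Hypothetical helper: counts how many elements of each variant appear in `kids`.
function countVariants(kids: readonly TaggedVariant[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const kid of kids) {
    const { tag } = unwrapVariant(kid);
    counts.set(tag, (counts.get(tag) ?? 0) + 1);
  }
  return counts;
}

// Hand-written sample data; the payload fields are placeholders, not the real schema.
const sampleKids: TaggedVariant[] = [
  { Heading: { placeholder: true } },
  { Paragraph: { placeholder: true } },
  { Paragraph: { placeholder: true } },
];
const variantCounts = countVariants(sampleKids);
```

With a real document, the same call would be countVariants(doc.kids), where doc is the value returned by convert().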

convert_to_string() parses PDF bytes and returns a formatted string.

function convert_to_string(
  pdf_bytes: Uint8Array,
  format?: string | null,
  pages?: string | null,
  reading_order?: string | null,
  table_method?: string | null,
): string;

Parameters: same as convert().

Returns a string in the requested format:

Format      Output
"json"      JSON string with bounding boxes and element types
"markdown"  Standard Markdown with GFM tables
"html"      HTML5 with semantic elements
"text"      Plain UTF-8 text with reading order preserved

import init, { convert_to_string } from 'edgeparse-wasm';
await init();
const bytes = new Uint8Array(await file.arrayBuffer());
const markdown = convert_to_string(bytes, 'markdown');
const html = convert_to_string(bytes, 'html');
const text = convert_to_string(bytes, 'text');
const json = convert_to_string(bytes, 'json');
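Applications that save or download the converted string usually need a file extension and MIME type for each format. The lookup below is an illustrative sketch. The OutputFormat type, the FORMAT_INFO table, and outputFileName are all assumptions of this example, not part of the package.

```typescript
// The four format strings documented for convert_to_string().
type OutputFormat = "json" | "markdown" | "html" | "text";

// File extension and MIME type for each format.
// This mapping is an illustrative choice, not something the package defines.
const FORMAT_INFO: Record<OutputFormat, { ext: string; mime: string }> = {
  json: { ext: "json", mime: "application/json" },
  markdown: { ext: "md", mime: "text/markdown" },
  html: { ext: "html", mime: "text/html" },
  text: { ext: "txt", mime: "text/plain" },
};

// Hypothetical helper: derives an output file name from the source PDF name.
function outputFileName(pdfName: string, format: OutputFormat): string {
  // Strip a trailing ".pdf" (case-insensitive) before adding the new extension.
  const base = pdfName.replace(/\.pdf$/i, "");
  return `${base}.${FORMAT_INFO[format].ext}`;
}

const exampleName = outputFileName("Report.PDF", "markdown");
```

Typing the format as a union instead of string lets the compiler catch misspelled format names before they reach convert_to_string().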

version() returns the EdgeParse version string.

function version(): string;

import { version } from 'edgeparse-wasm';
console.log(version()); // e.g. "0.2.3"

Both convert() and convert_to_string() throw JsError on failure (corrupted PDF, invalid page range, etc.):

try {
  const markdown = convert_to_string(bytes, 'markdown');
} catch (err) {
  console.error('PDF parsing failed:', err.message);
}
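When many call sites convert documents, repeating try/catch can get noisy. One alternative, sketched here, wraps the call and returns the failure as a value. ConvertResult and tryConvert are hypothetical names, and the stub callbacks stand in for convert_to_string() so the snippet runs without the package.

```typescript
// Result type for this sketch: either the converted value or an error message.
type ConvertResult<T> =
  | { ok: true; value: T }
  | { ok: false; message: string };

// Hypothetical helper: runs a conversion callback and turns a thrown error into a value.
function tryConvert<T>(run: () => T): ConvertResult<T> {
  try {
    return { ok: true, value: run() };
  } catch (err) {
    // Thrown values are not guaranteed to be Error instances, so narrow first.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}

// Stubs (assumptions) standing in for convert_to_string().
const okResult = tryConvert(() => "# Title");
const failedResult = tryConvert<string>(() => {
  throw new Error("invalid page range");
});
```

With the package, the call would look like tryConvert(() => convert_to_string(bytes, 'markdown')), followed by a check of the ok field.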

Minimum supported browser versions:

Browser   Minimum version
Chrome    57+
Firefox   52+
Safari    11+
Edge      16+

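Requires WebAssembly and ES module support. In Vite, setting build.target: 'esnext' keeps the bundler from transpiling modern syntax down to an older target. Note that build.target is a Vite option; Webpack projects need their own equivalent configuration.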
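A runtime check can help show a friendly message in environments without WebAssembly before init() is attempted. The function below is a minimal sketch of such a check. hasWebAssembly is a hypothetical name, and it only tests for the presence of the API, not for any specific browser version.

```typescript
// Hypothetical helper: reports whether a global scope exposes the WebAssembly API.
// `scope` defaults to globalThis; passing it explicitly keeps the check testable.
function hasWebAssembly(scope: unknown = globalThis): boolean {
  if (typeof scope !== "object" || scope === null) {
    return false;
  }
  const wasm = (scope as Record<string, unknown>)["WebAssembly"];
  return (
    typeof wasm === "object" &&
    wasm !== null &&
    typeof (wasm as Record<string, unknown>)["instantiate"] === "function"
  );
}

// Simulated scopes: one without WebAssembly, one with a stand-in object.
const withoutWasm = hasWebAssembly({});
const withWasm = hasWebAssembly({ WebAssembly: { instantiate: () => undefined } });
```

In a page, a call such as hasWebAssembly() with no argument would inspect the real global scope, and the application could skip loading the module when it returns false.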

Full .d.ts type definitions ship with the package. Your IDE will provide autocomplete and type checking automatically.

Try the interactive demo: edgeparse.com/demo/