WebAssembly API

npm: @edgeparse/edgeparse-wasm · Source: crates/edgeparse-wasm/

Before calling any conversion function, initialize the WASM module:

import init from '@edgeparse/edgeparse-wasm';
// Async initialization — fetches and compiles the .wasm binary
await init();

Call init() once at application startup. Subsequent calls are no-ops.
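When several modules each need the parser, it can help to share one initialization promise so that no caller reaches a conversion function before the binary has been compiled. The following is a minimal sketch, not part of the package: `makeEnsureReady` is a hypothetical helper, and the init function is passed in as an argument so the snippet stays self-contained.

```typescript
// Hypothetical helper (not part of @edgeparse/edgeparse-wasm).
// Any async initializer can be passed in, for example the package's default export.
type InitFn = () => Promise<unknown>;

function makeEnsureReady(initFn: InitFn): () => Promise<void> {
  // Cached promise shared by every caller.
  let ready: Promise<void> | null = null;

  return (): Promise<void> => {
    if (ready === null) {
      ready = initFn()
        .then(() => undefined)
        .catch((err: unknown) => {
          // Forget the failed attempt so that a later call can retry.
          ready = null;
          throw err;
        });
    }
    return ready;
  };
}
```

In an application this might be wired up as `const ensureReady = makeEnsureReady(init)`, followed by `await ensureReady()` before each conversion.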

convert() parses PDF bytes and returns a structured JavaScript object representing the full document.

function convert(
  pdf_bytes: Uint8Array,
  format?: string | null,
  pages?: string | null,
  reading_order?: string | null,
  table_method?: string | null,
): any;

Parameters:

Parameter      Type           Default     Description
pdf_bytes      Uint8Array     (required)  Raw PDF file bytes
format         string | null  "json"      Output format hint: "json", "markdown", "html", "text"
pages          string | null  "all"       Page range: "all", "1-5", "1,3,7"
reading_order  string | null  "auto"      Reading order algorithm: "auto" (XY-Cut++) or "off"
table_method   string | null  "default"   Table detection: "default" (ruling lines) or "cluster" (borderless)
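Because every option is positional, setting only a late parameter such as table_method means passing placeholders for the ones before it. The sketch below shows one way to wrap the documented signature in a named-options call. `ConvertFn`, `ConvertOptions`, and `convertWith` are hypothetical names, and the sketch assumes that passing null selects the documented default, as the `string | null` types suggest.

```typescript
// Positional signature as documented for convert() and convert_to_string().
type ConvertFn<T> = (
  pdf_bytes: Uint8Array,
  format?: string | null,
  pages?: string | null,
  reading_order?: string | null,
  table_method?: string | null,
) => T;

// Hypothetical options bag; the allowed values mirror the parameter table.
interface ConvertOptions {
  format?: 'json' | 'markdown' | 'html' | 'text';
  pages?: string;
  readingOrder?: 'auto' | 'off';
  tableMethod?: 'default' | 'cluster';
}

// Hypothetical wrapper: maps named options onto the positional parameters.
function convertWith<T>(
  fn: ConvertFn<T>,
  bytes: Uint8Array,
  opts: ConvertOptions = {},
): T {
  // Assumption: null makes the binding fall back to its default value.
  return fn(
    bytes,
    opts.format ?? null,
    opts.pages ?? null,
    opts.readingOrder ?? null,
    opts.tableMethod ?? null,
  );
}
```

With the real bindings this could be called as `convertWith(convert, bytes, { pages: '1-5', tableMethod: 'cluster' })`.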

Returns a JavaScript object matching the PdfDocument structure:

{
  pages: [
    {
      page_number: 1,
      width: 612.0,
      height: 792.0,
      elements: [
        {
          type: "heading", // "heading" | "paragraph" | "table" | "list" | "image" | ...
          text: "Introduction",
          level: 1, // heading level (1-6)
          bbox: {
            x0: 72.0,
            y0: 700.0,
            x1: 300.0,
            y1: 720.0,
          },
        },
        // ... more elements
      ],
    },
  ],
}

Example:

import init, { convert } from '@edgeparse/edgeparse-wasm';

await init();

// `file` is a File object, e.g. taken from an <input type="file"> element
const bytes = new Uint8Array(await file.arrayBuffer());
const doc = convert(bytes, 'json');

// Iterate pages and elements
for (const page of doc.pages) {
  for (const el of page.elements) {
    if (el.type === 'table') {
      console.log('Table found on page', page.page_number);
    }
  }
}
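Since convert() is declared as returning `any`, it can be useful to narrow the result to local types before walking it. The interfaces below are assumptions inferred only from the sample object above, not the package's shipped definitions, and `collectHeadings` is a hypothetical helper that builds a simple outline from heading elements.

```typescript
// Assumed shapes, inferred from the sample object; the shipped .d.ts may differ.
interface BBox { x0: number; y0: number; x1: number; y1: number; }

interface PdfElement {
  type: string;      // "heading" | "paragraph" | "table" | "list" | "image" | ...
  text?: string;
  level?: number;    // only shown on headings in the sample
  bbox?: BBox;
}

interface PdfPage {
  page_number: number;
  width: number;
  height: number;
  elements: PdfElement[];
}

interface PdfDocument { pages: PdfPage[]; }

interface OutlineEntry { page: number; level: number; text: string; }

// Hypothetical helper: collects every heading with its page number and level.
function collectHeadings(doc: PdfDocument): OutlineEntry[] {
  const outline: OutlineEntry[] = [];
  for (const page of doc.pages) {
    for (const el of page.elements) {
      if (el.type === 'heading' && typeof el.text === 'string') {
        // Assumption: treat a heading without a level as level 1.
        outline.push({ page: page.page_number, level: el.level ?? 1, text: el.text });
      }
    }
  }
  return outline;
}
```

A caller could then write `collectHeadings(convert(bytes, 'json') as PdfDocument)` to get a flat table of contents.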

convert_to_string() parses PDF bytes and returns a formatted string.

function convert_to_string(
  pdf_bytes: Uint8Array,
  format?: string | null,
  pages?: string | null,
  reading_order?: string | null,
  table_method?: string | null,
): string;

The parameters are the same as for convert().

Returns a string in the requested format:

Format      Output
"json"      JSON string with bounding boxes and element types
"markdown"  Standard Markdown with GFM tables
"html"      HTML5 with semantic elements
"text"      Plain UTF-8 text with reading order preserved

Example:

import init, { convert_to_string } from '@edgeparse/edgeparse-wasm';

await init();

const bytes = new Uint8Array(await file.arrayBuffer());
const markdown = convert_to_string(bytes, 'markdown');
const html = convert_to_string(bytes, 'html');
const text = convert_to_string(bytes, 'text');
const json = convert_to_string(bytes, 'json');
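When the returned string is saved or offered as a download, each format needs a file extension and a MIME type. The mapping below is a small sketch of one way to keep those together. `FORMAT_INFO` and `outputFileName` are hypothetical names, and the chosen extensions are conventions rather than anything the package prescribes.

```typescript
type OutputFormat = 'json' | 'markdown' | 'html' | 'text';

// Hypothetical lookup table: conventional extension and MIME type per format.
const FORMAT_INFO: Record<OutputFormat, { extension: string; mime: string }> = {
  json: { extension: 'json', mime: 'application/json' },
  markdown: { extension: 'md', mime: 'text/markdown' },
  html: { extension: 'html', mime: 'text/html' },
  text: { extension: 'txt', mime: 'text/plain' },
};

// Hypothetical helper: derives an output file name from the PDF's name.
function outputFileName(pdfName: string, format: OutputFormat): string {
  // Strip a trailing ".pdf" in any letter case; other names are kept whole.
  const base = pdfName.replace(/\.pdf$/i, '');
  return `${base}.${FORMAT_INFO[format].extension}`;
}
```

In a browser, the string could then be wrapped as `new Blob([markdown], { type: FORMAT_INFO.markdown.mime })` and saved under `outputFileName(file.name, 'markdown')`.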

version() returns the EdgeParse version string.

function version(): string;

Example:

import { version } from '@edgeparse/edgeparse-wasm';
console.log(version()); // "0.1.1"

Both convert() and convert_to_string() throw a JsError on failure (for example, a corrupted PDF or an invalid page range):

try {
  const markdown = convert_to_string(bytes, 'markdown');
} catch (err) {
  console.error('PDF parsing failed:', err.message);
}
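In TypeScript with strict settings, the caught value is typed `unknown`, so `err.message` needs a type check first. One way to handle this is to turn the thrown error into a plain result value. `ParseResult` and `tryParse` below are hypothetical names for illustration, and the sketch assumes only that the thrown value is usually an Error instance.

```typescript
// Hypothetical result type: either the parsed value or a failure message.
type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; message: string };

// Hypothetical helper: runs a conversion and captures any thrown error.
function tryParse<T>(run: () => T): ParseResult<T> {
  try {
    return { ok: true, value: run() };
  } catch (err: unknown) {
    // Fall back to String() in case something other than an Error was thrown.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}
```

A call such as `tryParse(() => convert_to_string(bytes, 'markdown'))` then yields a value that can be checked with `result.ok` instead of a surrounding try/catch.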
Browser support:

Browser  Minimum version
Chrome   57+
Firefox  52+
Safari   11+
Edge     16+

Requires WebAssembly and ES module support. Setting build.target to 'esnext' in Vite (or the equivalent target option in Webpack) stops the bundler from down-leveling the module output for older browsers.
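An application that may run in older browsers can check for WebAssembly before loading the parser and show a fallback message otherwise. The function below is a minimal sketch of such a check. `supportsWebAssembly` is a hypothetical helper, not something the package exports, and it uses only the standard `WebAssembly.validate` call.

```typescript
// Hypothetical helper: reports whether the runtime can validate a WebAssembly module.
function supportsWebAssembly(): boolean {
  try {
    // Look the global up through globalThis so the check compiles without DOM typings.
    const wasm = (globalThis as unknown as {
      WebAssembly?: { validate?: (bytes: Uint8Array) => boolean };
    }).WebAssembly;
    if (typeof wasm !== 'object' || wasm === null || typeof wasm.validate !== 'function') {
      return false;
    }
    // Smallest valid module: the "\0asm" magic number followed by version 1.
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return wasm.validate(emptyModule);
  } catch {
    return false;
  }
}
```

The check covers WebAssembly only; ES module support is implied once a module script is running at all.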

Full .d.ts type definitions ship with the package. Your IDE will provide autocomplete and type checking automatically.

Try the interactive demo: edgeparse.com/demo/