Image Extraction
Overview
EdgeParse can detect and extract images embedded in PDF documents, reporting their bounding boxes and page locations.
Image Detection
Each detected image appears in the JSON output as an element of type "image":

    {
      "type": "image",
      "id": 5,
      "page number": 1,
      "bounding box": [72, 300, 540, 500]
    }

CLI Usage
    # Extract with image metadata
    edgeparse document.pdf -f json

    # Images are reported in the kids array with type "image"

Python Usage
    import edgeparse
    import json

    json_str = edgeparse.convert("document.pdf", format="json")
    data = json.loads(json_str)

    for element in data["kids"]:
        if element["type"] == "image":
            print(f"Image on page {element['page number']}")
            print(f"  Bounding box: {element['bounding box']}")

Markdown Output
In Markdown format, images are represented as placeholders with their page and position information.
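
When image sizes or a per-page summary are needed, the JSON output is easier to work with than Markdown placeholders. The following is a minimal sketch, not part of EdgeParse: the helper name `images_by_page`, the inline sample, and its "paragraph" element are illustrative. It assumes the bounding box is ordered [x0, y0, x1, y1] (the axis order is not stated here) and that images sit at the top level of the kids array.

```python
import json
from collections import defaultdict

# Illustrative sample shaped like the JSON element documented above.
# In practice this string would come from
# edgeparse.convert("document.pdf", format="json").
SAMPLE_JSON = """
{"kids": [
  {"type": "paragraph", "id": 4, "page number": 1},
  {"type": "image", "id": 5, "page number": 1,
   "bounding box": [72, 300, 540, 500]}
]}
"""


def images_by_page(data):
    """Group top-level image elements by page and add their size.

    Assumption: "bounding box" is [x0, y0, x1, y1]. abs() keeps the
    result non-negative whichever corner comes first.
    """
    pages = defaultdict(list)
    for element in data.get("kids", []):
        if element.get("type") != "image":
            continue  # skip text and other non-image elements
        x0, y0, x1, y1 = element["bounding box"]
        pages[element["page number"]].append({
            "id": element.get("id"),
            "width": abs(x1 - x0),
            "height": abs(y1 - y0),
        })
    return dict(pages)


summary = images_by_page(json.loads(SAMPLE_JSON))
for page, images in sorted(summary.items()):
    for image in images:
        print(f"Page {page}: image {image['id']} "
              f"is {image['width']} x {image['height']} units")
```

Working from the parsed dictionary rather than the file keeps the helper independent of how the JSON was produced, so it can be applied to CLI output saved to disk as well as to the string returned from Python.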