Document Processing Tool

A structured document-processing workflow for invoices and receipts, with embedded-text extraction, scanned-document fallback, and fixed-schema output.

Overview

Processes typed and scanned invoices with ~94% extraction accuracy. Handles irregular formats with AI fallback.

This project showcases a Python backend built to process invoices, receipts, and similar financial documents into one dependable output format. Instead of treating every document the same way, the workflow first checks whether a PDF already contains readable text, then switches to a scanned-document vision path only when needed.

The goal is simple: accept messy real-world documents, keep the response shape fixed, and reduce all unstructured information into a single summary field that is still useful downstream.

Try It Out

Upload your own invoice, receipt, scanned PDF, or image to test the live document-processing flow.
If embedded text is missing, the backend automatically falls back to OCR-style vision extraction and still returns the same fixed schema.
The backend is deployed on Render free tier, so the first request can take a moment.

How It Works

The user uploads a PDF or image from the browser.
The backend checks whether the PDF has enough embedded text to extract directly.
If the text is weak or missing, the PDF pages are rendered as images and processed through a vision-based OCR fallback.
The extracted content is sent to the model with a strict Pydantic schema, so the response stays structured.
The API returns one consistent JSON shape with fields like vendor, date, totals, line items, category, and a concise summary.

This makes the system useful for both clean digital invoices and rough scans without forcing the frontend to handle multiple result formats.
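
The routing decision described in the steps above can be sketched as a small pure function. This is only an illustration under stated assumptions: the threshold value, the function name choose_processing_mode, and the mode labels "text" and "vision" are hypothetical, since the project does not say how "enough embedded text" is measured.

```python
from typing import List, Optional

# Hypothetical threshold: the project does not state how "enough" text is defined.
MIN_CHARS_PER_PAGE = 40


def choose_processing_mode(
    page_texts: List[Optional[str]],
    min_chars_per_page: int = MIN_CHARS_PER_PAGE,
) -> str:
    """Return "text" when embedded text looks usable, otherwise "vision".

    The labels "text" and "vision" are illustrative, not the project's values.
    """
    if not page_texts:
        # No pages with extractable text at all: treat the file as a scan.
        return "vision"
    # Count only non-whitespace characters so layout padding is ignored.
    # A page with no text layer may yield None or an empty string.
    counts = [len("".join((text or "").split())) for text in page_texts]
    average = sum(counts) / len(counts)
    return "text" if average >= min_chars_per_page else "vision"


if __name__ == "__main__":
    print(choose_processing_mode(["Invoice 1001 " * 10]))
    print(choose_processing_mode(["", None]))
```

In a pipeline like this one, the per-page strings would plausibly come from pdfplumber's page.extract_text(), and the "vision" branch would render pages to images with pypdfium2 before sending them to the model. Averaging per page rather than summing keeps a long, mostly scanned PDF with one text-bearing cover page from being routed to the text path.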

Fixed Output

The backend always aims to return a single schema, including:

document_type
processing_mode
vendor
document_number
purchase_order_number
date
due_date
currency
subtotal
tax
total
category
payment_method
line_items
summary

That structure is what keeps the tool suitable for financial workflows, Excel pipelines, and downstream automation.
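
As a rough sketch of how the field list above could map onto a Pydantic model: the field names come from the list, but every type, default, and required/optional choice here is an assumption, as are the LineItem fields and the parse_result helper. The project's actual model may differ.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class LineItem(BaseModel):
    # Hypothetical line-item fields; the source only names `line_items`.
    description: str
    quantity: Optional[float] = None
    unit_price: Optional[float] = None
    amount: Optional[float] = None


class DocumentResult(BaseModel):
    # Assumed required: every response says what it is and how it was read.
    document_type: str
    processing_mode: str
    # Assumed optional: real documents often omit these.
    vendor: Optional[str] = None
    document_number: Optional[str] = None
    purchase_order_number: Optional[str] = None
    date: Optional[str] = None       # assumed to be an ISO 8601 string
    due_date: Optional[str] = None   # assumed to be an ISO 8601 string
    currency: Optional[str] = None
    # float keeps the sketch short; Decimal is safer for real money values.
    subtotal: Optional[float] = None
    tax: Optional[float] = None
    total: Optional[float] = None
    category: Optional[str] = None
    payment_method: Optional[str] = None
    line_items: List[LineItem] = Field(default_factory=list)
    summary: str = ""


def parse_result(payload: dict) -> Optional[DocumentResult]:
    """Validate a raw payload; return None when it does not fit the schema."""
    try:
        return DocumentResult(**payload)
    except ValidationError:
        return None


if __name__ == "__main__":
    result = parse_result(
        {"document_type": "invoice", "processing_mode": "text", "total": 120.5}
    )
    print(result)
```

Giving every optional field a default means a sparse receipt and a detailed invoice serialize to the same set of keys, which is what lets a spreadsheet export or other consumer rely on fixed columns.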

Tech Stack

Backend: FastAPI, Python
Parsing: pdfplumber, pypdfium2
AI Extraction: OpenAI Responses API
Schema Enforcement: pydantic
Frontend Demo: Astro + Svelte
