AI Data Extraction: Benchmarking GPT-4o vs. Veryfi
A comparative AI pipeline, extracting structured financial data from irregular scans into Excel using Python.
Overview

This project addresses the challenge of “dark data” trapped in irregular, low-quality scans and PDFs. By building a dual-engine extraction system, I compared OpenAI’s GPT-4o (a general-purpose Multimodal LLM) and Veryfi (a specialized financial OCR API).
The goal was to evaluate which system better handles real-world “noise” like faded text, line-item multipliers, and complex document layouts for downstream financial analysis.
Demo (Video)
Tech Stack
- Language: Python 3.11+
- AI/OCR: OpenAI API (GPT-4o), Veryfi SDK
- Data Orchestration:
instructor(Pydantic-based extraction),pandas - Preprocessing:
pdf2image,Poppler - Storage and Analysis: Microsoft Excel (
openpyxl)
Python Scripts Explanation
OpenAI GPT-4o Integration
The OpenAI script utilizes the instructor library to enforce a strict Pydantic schema. Because GPT-4o is a reasoning model,
the script uses specialized “System Prompts” to handle mathematical extraction.
- Process: Converts PDF to JPEG -> Base64 Encoding -> Vision-to-JSON extraction.
- The Logic: It doesn’t just “read” text; it interprets it. When it sees “Item A 39.9x2”, it calculates the unit price and quantity even if the receipt layout is cluttered.
Veryfi API Integration
The Veryfi script uses a dedicated financial OCR engine designed specifically for receipts and invoices.
- Process: Direct PDF/Image upload via the
veryfiPython SDK. - The Logic: It uses deterministic OCR rules to identify tax, subtotal, and vendor information instantly. It is pre-trained on millions of financial documents, making it extremely fast at identifying standard fields without complex prompting.
Key Learnings & Conclusion

- Reasoning vs. Recognition: GPT-4o showed superior “reasoning” for non-standard multipliers, whereas Veryfi showed superior “recognition” for standard tax and total fields.
- Accuracy vs. Cost:
- OpenAI GPT-4o is the more economical choice with slightly lower raw OCR precision but higher adaptability.
- Veryfi is highly accurate and purpose-built for finance, but comes at a significantly higher operational cost per document.
- The Verdict: GPT-4o is the optimal solution for cost-sensitive, high-volume projects, while Veryfi remains the benchmark for mission-critical financial auditing.
Need something similar?
I help startups, agencies, and small remote teams automate workflows, improve reporting, and build internal tools around real operational problems.
If this project looks close to what your team needs, feel free to reach out and I can suggest a practical approach.