AI Document Processing
Invoices, contracts, receipts, forms — extracted, validated, and pushed straight into your system of record.
What document processing actually is
You receive PDFs, scans, photos, or emails containing documents — invoices, contracts, receipts, forms. Someone reads each one, types the data into your system, and files the original. We replace the "reads and types" part with code that does it more accurately and at any scale, while flagging the documents that need a human's attention.
Modern document processing is not OCR. OCR returns characters; document agents return decisions.
A modern document agent:
- Receives the document.
- Runs vision-capable LLM extraction against a typed schema (Zod / Pydantic).
- Applies business rules (PO matching, duplicate detection, vendor whitelist, tax validation).
- Routes by confidence tier — auto-post, review queue, or reject.
- Writes the structured data to your ERP / CRM / Drive with full audit trail and the original file attached.
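The extraction-against-a-schema step above can be sketched roughly as follows. This is a dependency-free stand-in for what a Zod or Pydantic schema would do. The field names (`vendorName`, `invoiceNumber`, `netTotal`, `lines`), the rule that `netTotal` equals the sum of line amounts before tax, and the 0.01 rounding tolerance are all illustrative assumptions, not a fixed schema.

```typescript
// Hypothetical invoice shape; the real fields come out of schema definition.
interface InvoiceLine {
  description: string;
  quantity: number;
  unitPrice: number;
}

interface Invoice {
  vendorName: string;
  invoiceNumber: string;
  currency: string; // three-letter code such as "EUR"
  netTotal: number; // assumed to equal the sum of line amounts, before tax
  lines: InvoiceLine[];
}

type ValidationResult =
  | { ok: true; invoice: Invoice }
  | { ok: false; reasons: string[] };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Read a non-empty string field, recording a reason when it is absent.
function readString(obj: Record<string, unknown>, key: string, reasons: string[], prefix = ""): string {
  const v = obj[key];
  if (typeof v === "string" && v.trim() !== "") return v;
  reasons.push(`${prefix}${key}: missing or empty`);
  return "";
}

// Read a finite number field, recording a reason when it is absent.
function readNumber(obj: Record<string, unknown>, key: string, reasons: string[], prefix = ""): number {
  const v = obj[key];
  if (typeof v === "number" && Number.isFinite(v)) return v;
  reasons.push(`${prefix}${key}: expected a finite number`);
  return 0;
}

// Validate raw model output. Every problem is collected rather than stopping
// at the first, so a reviewer sees the full list of reasons.
function validateInvoice(raw: unknown): ValidationResult {
  if (!isRecord(raw)) return { ok: false, reasons: ["payload: expected a JSON object"] };
  const reasons: string[] = [];

  const vendorName = readString(raw, "vendorName", reasons);
  const invoiceNumber = readString(raw, "invoiceNumber", reasons);
  const currency = readString(raw, "currency", reasons);
  if (currency !== "" && !/^[A-Z]{3}$/.test(currency)) {
    reasons.push("currency: expected a three-letter uppercase code");
  }
  const netTotal = readNumber(raw, "netTotal", reasons);

  const lines: InvoiceLine[] = [];
  const rawLines = raw["lines"];
  if (!Array.isArray(rawLines) || rawLines.length === 0) {
    reasons.push("lines: expected a non-empty array");
  } else {
    rawLines.forEach((item: unknown, i: number) => {
      if (!isRecord(item)) {
        reasons.push(`lines[${i}]: expected an object`);
        return;
      }
      const prefix = `lines[${i}].`;
      lines.push({
        description: readString(item, "description", reasons, prefix),
        quantity: readNumber(item, "quantity", reasons, prefix),
        unitPrice: readNumber(item, "unitPrice", reasons, prefix),
      });
    });
  }

  // Cross-field check, only meaningful once every field parsed cleanly.
  if (reasons.length === 0) {
    const lineSum = lines.reduce((sum, l) => sum + l.quantity * l.unitPrice, 0);
    if (Math.abs(netTotal - lineSum) > 0.01) {
      reasons.push(`netTotal: ${netTotal} does not match line sum ${lineSum.toFixed(2)}`);
    }
  }

  if (reasons.length > 0) return { ok: false, reasons };
  return { ok: true, invoice: { vendorName, invoiceNumber, currency, netTotal, lines } };
}
```

Returning the full list of reasons, rather than throwing on the first failure, is what lets an invalid document land in the review queue with a structured explanation.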
For a deep walkthrough, see our post "How AI invoice processing actually works (and where it breaks)".
When it pays off
Document processing pays back quickly when:
- Volume is non-trivial. At least a few hundred documents per month, ideally a few thousand.
- Documents are reasonably standardised. Invoices, receipts, POs, claims — yes. Free-form contracts — possible but harder.
- There's a system of record. ERP, CRM, finance system. We need somewhere to write.
- The cost of errors is recoverable. A wrong line item is fixable; a wrong wire transfer is not. We build in human review proportional to the cost of error.
It doesn't pay off when volume is low (under ~100 docs/month) or when documents are too varied for a useful schema. We will tell you which side of the line you're on after a discovery sprint.
The pipeline
A typical AP invoice pipeline looks like this:
[Email inbox / portal / scanner]
↓
[Ingestion queue with deduplication]
↓
[Vision extraction (Claude / Gemini / GPT-4o)] → typed schema
↓
[Business rules: PO match, vendor check, tax validation, duplicate check]
↓
[Confidence routing]
├─ High confidence + matched PO → auto-post to ERP
├─ Medium confidence → review queue (human, 60 seconds per doc)
└─ Low confidence / unknown → reject with structured reason
↓
[ERP write + original file attached + audit log]
Every step is observable. Every transition is logged. The reviewer UI is a single-page app that shows the document side-by-side with the extracted fields so the reviewer can correct in seconds, not minutes.
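The confidence-routing step in the diagram can be sketched as a small pure function. The threshold values (0.95 and 0.7) and the reason codes are hypothetical. Sending a high-confidence invoice with no matched PO to review is also an assumption, since the diagram does not cover that case.

```typescript
type Route =
  | { action: "auto_post" }
  | { action: "review"; reason: string }
  | { action: "reject"; reason: string };

interface ExtractionOutcome {
  confidence: number; // 0 to 1, however the pipeline scores an extraction
  poMatched: boolean;
  ruleViolations: string[]; // output of the business-rules step
}

// Hypothetical thresholds; real values would be tuned on shadow-mode data.
const AUTO_POST_MIN = 0.95;
const REVIEW_MIN = 0.7;

function routeDocument(outcome: ExtractionOutcome): Route {
  const { confidence, poMatched, ruleViolations } = outcome;

  // Low or unusable confidence: reject with a structured reason.
  if (!Number.isFinite(confidence) || confidence < REVIEW_MIN) {
    return { action: "reject", reason: "low_confidence" };
  }
  // Any rule violation goes to a human, regardless of confidence.
  if (ruleViolations.length > 0) {
    return { action: "review", reason: `rule_violations:${ruleViolations.join(",")}` };
  }
  // Only high confidence with a matched PO is posted without review.
  if (confidence >= AUTO_POST_MIN && poMatched) {
    return { action: "auto_post" };
  }
  return { action: "review", reason: poMatched ? "medium_confidence" : "po_not_matched" };
}
```

Keeping the router free of I/O means every transition can be logged by the caller and the thresholds can be replayed against historical documents.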
Where the wins come from
Each of the five layers contributes:
| Layer | Without it | With it |
|---|---|---|
| Vision extraction | Manual keying, ~3 min/doc | LLM extraction, ~5 sec/doc at ~€0.02/doc |
| Schema validation | Garbage in, garbage to ERP | Invalid → review queue with structured reason |
| Business rules | Errors discovered downstream | Errors caught at the boundary |
| Confidence routing | All-or-nothing automation | Reviewer only sees the 10–15% that need attention |
| Audit trail | Compliance pain | Full provenance per posted record |
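To make the business-rules layer concrete, here is a minimal sketch of three of the checks: vendor whitelist, duplicate detection, and PO matching. The in-memory `Set` and `Map` stand in for whatever store holds approved vendors, posted invoices, and open purchase orders. The violation codes, the key normalisation, and the 1% amount tolerance are assumptions for illustration. Tax validation is left out.

```typescript
interface InvoiceFacts {
  vendorId: string;
  invoiceNumber: string;
  poNumber: string | null;
  totalCents: number; // integer cents avoid floating-point drift
}

interface PurchaseOrder {
  vendorId: string;
  totalCents: number;
}

// Normalise so "inv-0042" and "INV 0042" produce the same duplicate key.
function duplicateKey(inv: InvoiceFacts): string {
  const num = inv.invoiceNumber.toUpperCase().replace(/[^A-Z0-9]/g, "");
  return `${inv.vendorId}|${num}|${inv.totalCents}`;
}

// Returns violation codes; an empty array means every rule passed.
function checkBusinessRules(
  inv: InvoiceFacts,
  approvedVendors: Set<string>,
  postedKeys: Set<string>,
  openOrders: Map<string, PurchaseOrder>,
  amountTolerance = 0.01 // hypothetical: allow 1% deviation from the PO total
): string[] {
  const violations: string[] = [];
  if (!approvedVendors.has(inv.vendorId)) violations.push("vendor_not_whitelisted");
  if (postedKeys.has(duplicateKey(inv))) violations.push("duplicate_invoice");

  const po = inv.poNumber === null ? undefined : openOrders.get(inv.poNumber);
  if (po === undefined) {
    violations.push("po_not_found");
  } else if (po.vendorId !== inv.vendorId) {
    violations.push("po_vendor_mismatch");
  } else if (Math.abs(inv.totalCents - po.totalCents) > po.totalCents * amountTolerance) {
    violations.push("po_amount_mismatch");
  }
  return violations;
}
```

Returning codes instead of a boolean is what allows errors to be caught at the boundary with a reason the reviewer can act on.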
Real numbers from a real engagement
From a mid-market distributor we worked with (anonymised), processing ~400 invoices per week before automation:
| Metric | Before | After (week 8) |
|---|---|---|
| Human time per invoice | 4–6 min | 30 sec (reviewer only) |
| % invoices keyed manually | 100% | 13% |
| Cycle time (receipt → posted) | 2–4 days | 4 hours |
| Visible error rate | 3.2% (estimated) | 0.8% |
| Cost per invoice | ~€3.80 | ~€0.18 |
Payback hit at month 4. Full details in the Document Intake Agent case study.
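As a rough cross-check of the payback figure, the per-invoice costs in the table can be turned into a monthly saving. The €25,000 build cost below is an assumed input (the low end of the single doc-type price band), not the engagement's actual fee. The sketch also ignores retainer fees, pass-through costs beyond the per-invoice figure, and the rollout ramp.

```typescript
// Monthly saving from moving every invoice from the old to the new unit cost.
function monthlySavings(invoicesPerWeek: number, costBefore: number, costAfter: number): number {
  const invoicesPerMonth = (invoicesPerWeek * 52) / 12;
  return invoicesPerMonth * (costBefore - costAfter);
}

// Whole months until cumulative savings cover the one-off build cost.
function paybackMonths(buildCost: number, savingsPerMonth: number): number {
  if (savingsPerMonth <= 0) return Infinity;
  return Math.ceil(buildCost / savingsPerMonth);
}

// Figures from the table: ~400 invoices/week, ~€3.80 before, ~€0.18 after.
const savings = monthlySavings(400, 3.8, 0.18); // about €6,275 per month
const months = paybackMonths(25000, savings); // 4, under the assumed build cost
```

A higher build cost or a slower ramp to full volume pushes the break-even point out proportionally.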
How we build it
Phase 1 — Schema definition (1 week)
We sit with the AP / ops team and define what "structured invoice" means for your business: not the generic ANSI X12 810 EDI invoice format, but the actual fields you care about, the tax codes you use, the cost centre mappings, the approval thresholds. The output is a Zod / Pydantic schema and a stack of 50–100 sample documents to test against.
Phase 2 — Prototype (2 weeks)
We build the extraction + validation + simple review UI. We run the prototype against the sample documents and against a fresh week of real documents in shadow mode (the human team still does the actual work; we compare). We tune.
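The shadow-mode comparison can be pictured as a field-level diff between what the human team keyed and what the prototype extracted. This is a minimal sketch. The record shape, the loose normalisation (trimmed, lowercased strings and numbers rounded to two decimals), and the function names are assumptions.

```typescript
type FieldValue = string | number;
type KeyedRecord = Record<string, FieldValue>;

interface FieldMismatch {
  field: string;
  human: FieldValue | undefined;
  extracted: FieldValue | undefined;
}

// Compare loosely: trim and lowercase strings, round numbers to 2 decimals.
function canonical(v: FieldValue | undefined): string {
  if (v === undefined) return "";
  return typeof v === "number" ? v.toFixed(2) : v.trim().toLowerCase();
}

// Field-level diff between what the human keyed and what the model extracted.
function diffRecords(human: KeyedRecord, extracted: KeyedRecord): FieldMismatch[] {
  const fields = Object.keys(human).concat(
    Object.keys(extracted).filter((k) => !(k in human))
  );
  return fields
    .filter((f) => canonical(human[f]) !== canonical(extracted[f]))
    .map((f) => ({ field: f, human: human[f], extracted: extracted[f] }));
}

// Share of shadow-mode documents where every field agreed.
function agreementRate(pairs: Array<[KeyedRecord, KeyedRecord]>): number {
  if (pairs.length === 0) return 0;
  const clean = pairs.filter(([h, e]) => diffRecords(h, e).length === 0).length;
  return clean / pairs.length;
}
```

A mismatch is not automatically a model error: in shadow mode the human entry can be the wrong one, so each diff is worth a look before it drives tuning.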
Phase 3 — Production build (2–4 weeks)
Confidence routing, ERP integration, audit trail, observability. Vendor-specific templates for high-volume vendors where we know the layouts. Eval suite that runs on every prompt change.
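An eval suite that gates prompt changes might look roughly like this: compute per-field accuracy over labelled samples, then flag any field that got worse than the stored baseline. The `EvalCase` shape, exact-match scoring, and the one-point tolerance are assumptions. A real suite would likely score amounts and dates more leniently.

```typescript
interface EvalCase {
  expected: Record<string, string>; // labelled ground truth
  actual: Record<string, string>; // extraction under the candidate prompt
}

// Per-field accuracy: share of cases where the extracted value equals the label.
function fieldAccuracy(cases: EvalCase[]): Record<string, number> {
  const hits: Record<string, number> = {};
  const totals: Record<string, number> = {};
  cases.forEach((c) => {
    Object.keys(c.expected).forEach((field) => {
      totals[field] = (totals[field] ?? 0) + 1;
      if (c.actual[field] === c.expected[field]) {
        hits[field] = (hits[field] ?? 0) + 1;
      }
    });
  });
  const accuracy: Record<string, number> = {};
  Object.keys(totals).forEach((field) => {
    accuracy[field] = (hits[field] ?? 0) / (totals[field] ?? 1);
  });
  return accuracy;
}

// Fields where the candidate is worse than baseline by more than `tolerance`.
function regressions(
  baseline: Record<string, number>,
  candidate: Record<string, number>,
  tolerance = 0.01
): string[] {
  return Object.keys(baseline).filter(
    (field) => (candidate[field] ?? 0) < (baseline[field] ?? 0) - tolerance
  );
}
```

A non-empty result from `regressions` would block the prompt change until someone has looked at the failing fields.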
Phase 4 — Phased rollout (1–2 weeks)
10% of inbound → 50% → 100%. The reviewer queue is staffed for typical volume from day one so the review workflow is exercised under real conditions. Daily standup with your team during rollout.
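One way to implement the percentage split is deterministic bucketing on a document ID, so a given document always lands on the same side and raising the percentage only ever adds documents. This sketch assumes each inbound document has a stable string ID. The FNV-1a-style hash (applied to UTF-16 code units) is an illustrative choice, not a requirement.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units; returns an unsigned integer.
function hash32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// True when the document falls inside the current rollout percentage (0 to 100).
function inRollout(documentId: string, percent: number): boolean {
  const bucket = hash32(documentId) % 100; // stable bucket in 0..99
  return bucket < percent;
}

// Documents outside the rollout stay on the existing manual process.
function chooseLane(documentId: string, percent: number): "automated" | "manual" {
  return inRollout(documentId, percent) ? "automated" : "manual";
}
```

Because the bucket depends only on the ID, moving from 10% to 50% keeps every document that was already in the automated lane there.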
Phase 5 — Iterate (ongoing)
Documents drift. Vendors change layouts. New document types arrive. We keep the eval cadence on retainer and add new schemas as needed.
The stack
| Layer | Default |
|---|---|
| Ingestion | Email (Resend webhook) / SFTP / portal upload |
| Storage | Firestore + Cloud Storage |
| Vision extraction | Claude Sonnet 4.6, or Gemini 2.0 Flash for cost-sensitive workloads |
| Schema validation | Zod (TypeScript) end-to-end |
| Business rules | TypeScript on Cloud Functions |
| Reviewer UI | Next.js 16 + shadcn/ui + Firebase Auth |
| ERP integration | Custom — depends on your ERP (we have shipped NetSuite, Dynamics 365, Xero, Sage) |
| Observability | Langfuse + Sentry + custom dashboard |
| Cost attribution | Per-document trace |
Pricing — the honest version
| Engagement | Scope | Investment |
|---|---|---|
| Discovery + schema spec | 1 week | €4,000–7,000 |
| Single doc-type pipeline (e.g. invoices) | 4–6 weeks | €25,000–45,000 |
| Multi-doc pipeline | 8–14 weeks | €60,000–120,000 |
| Multi-ERP / multi-tenant | 12–20 weeks | €100,000–200,000 |
| Retainer (operations, evals, new schemas) | Monthly | from €2,000/month |
Pass-through LLM costs typically run €0.01–€0.05 per document depending on length and vision usage.
What we won't do
- Promise a specific accuracy number before the prototype phase. Accuracy depends on your documents, and we don't bluff.
- Skip the reviewer queue. Document automation without human review is how clients end up with €50,000 of duplicate payments. We always build the queue, even if the eventual auto-post rate is 95%+.
- Hide failures. Every document the model couldn't confidently process gets routed to a human with the reason for the routing. Silent failures are the failure mode that breaks trust.
If you have a document workflow that costs your team hours per week, send a note. We respond within one business day.
Related work
Document Processing Agent
Invoices, contracts, receipts, and forms → structured data with confidence-tier human review
Document Intake Agent
Supplier invoices end-to-end with an agentic pipeline
How AI invoice processing actually works (and where it breaks)
Modern AI invoice processing uses vision LLMs (Claude, GPT-4o, Gemini) to extract structured data from PDFs and images, then validates against business rules and routes by confidence — auto-post, review queue, or reject. The model is not the hard part; the schema, the reviewer UI, and the eval suite are.
AI agents for accounts payable: a deployment guide
AI agents in AP automate the high-volume, low-margin work of invoice keying and PO matching. Honest savings: €3-5 per invoice in loaded cost, 70-90% reduction in human handling time, payback typically 4-8 months on €25-50k builds. The agent isn't the hard part — the reviewer UI and the ERP integration are.
RAG done right: the patterns that survive production
Production RAG is engineering, not magic. The patterns that survive: hybrid retrieval (vector + BM25), rerank top-k with a cross-encoder, metadata filtering, source dating, citation rendering, sampled human review. Without these, your retrieval is good in the demo and broken in production.
Ready to scope AI document processing?
A discovery call is the fastest way to know if there's a fit.