Case study · Accounts payable

Document Intake Agent

Supplier invoices end-to-end with an agentic pipeline

Client

Coming soon — demo case

Timeline

Demo build

Stack

Claude vision, Zod, Firestore, Cloud Functions, NetSuite integration

At a glance


Client	Demo build (template for our AP engagements)
Industry	Accounts payable / mid-market distribution
Engagement	Demo / template — full client builds typically 4–6 weeks
Stack	Claude vision, Zod, Firestore, Cloud Functions, NetSuite integration
Status	Template, ready to instantiate for client engagements

Mid-market finance teams routinely spend dozens of hours per week keying supplier invoices into their ERP. The work is high-volume, low-margin, and prone to errors that surface downstream when accounts don't reconcile.

Existing tools each fall short in a different way:

OCR-only tools miss line-item logic. They return characters, not decisions. They get confused by varied layouts and never improve.
Full ERP AP modules (NetSuite AP, Dynamics 365 AP) are heavyweight, require dedicated implementations, and still need a human to key in the fields OCR couldn't read.
Off-the-shelf "AI invoice processing" SaaS products ship impressive demos against their own clean test invoices, then meet your real-world supplier mix and disappoint.

The demo target: a pipeline that works on real-world supplier invoices, measures its error rates honestly, and includes a human-in-the-loop step that handles the edge cases gracefully.

What we built

This is the reference pipeline we now use as the starting point for every AP automation engagement we ship.

1. Ingestion

Supplier invoices arrive via three channels: email-to-AP inbox, supplier portal upload, and the occasional SFTP drop. A Resend webhook captures email attachments, a portal route handles uploads, and a scheduled job watches the SFTP folder. All three converge on a single Firestore queue with content-hash deduplication.
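A minimal sketch of the content-hash deduplication step, assuming a SHA-256 digest over the raw file bytes. The `QueueItem` shape, the `enqueue` helper, and the in-memory `Map` standing in for the Firestore queue are all hypothetical and chosen for illustration only.

```typescript
import { createHash } from "node:crypto";

// Hypothetical queue item shape; the real Firestore document layout is not
// described in this case study.
interface QueueItem {
  contentHash: string;
  channel: "email" | "portal" | "sftp";
  receivedAt: string; // ISO 8601 timestamp
}

// In-memory stand-in for the Firestore queue, keyed by content hash.
const queue = new Map<string, QueueItem>();

// SHA-256 over the raw bytes: identical files hash identically regardless of
// file name or arrival channel.
function contentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Enqueue a document unless the same bytes were already seen.
// Returns true if the item was added, false if it was a duplicate.
function enqueue(bytes: Uint8Array, channel: QueueItem["channel"]): boolean {
  const hash = contentHash(bytes);
  if (queue.has(hash)) return false;
  queue.set(hash, { contentHash: hash, channel, receivedAt: new Date().toISOString() });
  return true;
}
```

This only catches byte-identical files. The same invoice re-scanned or re-exported by the supplier produces a different hash, which is one reason a separate duplicate check appears later in the tier criteria.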

2. Vision extraction

The agent picks up a queue item and runs Claude Sonnet vision with a typed Zod schema for the invoice structure:

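import { z } from "zod";

// Typed shape the vision model's output must satisfy before any downstream step runs.
const InvoiceSchema = z.object({
  vendor: z.object({
    name: z.string(),
    taxId: z.string().optional(), // not every invoice prints a tax ID
    addressLines: z.array(z.string()),
  }),
  invoice: z.object({
    number: z.string(),
    date: z.string(), // kept as a string; this schema enforces no date format
    dueDate: z.string().optional(),
    currency: z.string(),
  }),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number(),
    unitPrice: z.number(),
    taxRate: z.number(),
    lineTotal: z.number(),
    poRef: z.string().optional(), // PO reference, when the invoice carries one
  })),
  totals: z.object({
    subtotal: z.number(),
    tax: z.number(),
    total: z.number(),
  }),
  paymentTerms: z.string().optional(),
});

// Static type derived from the schema, for use in downstream code.
type Invoice = z.infer<typeof InvoiceSchema>;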

Schema-validated output means downstream code can rely on structure. Validation failures route the document to a human review queue with the structured error.
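The routing on validation failure could be sketched as follows. To keep the snippet dependency-free, `ParseResult` is a hand-written type that loosely mirrors the success/failure shape returned by Zod's `safeParse`; it is an assumption for illustration, not Zod's own type. The `Routed` type and the `routeAfterValidation` name are hypothetical as well.

```typescript
// Loose, hand-written mirror of a safeParse-style result (assumption, not
// Zod's actual type definitions).
interface Issue {
  path: (string | number)[];
  message: string;
}
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: Issue[] } };

// Hypothetical routing outcome: either continue to PO matching or go to the
// human review queue with field-level errors attached.
type Routed<T> =
  | { queue: "matching"; invoice: T }
  | { queue: "review"; errors: { field: string; message: string }[] };

function routeAfterValidation<T>(result: ParseResult<T>): Routed<T> {
  if (result.success) {
    return { queue: "matching", invoice: result.data };
  }
  return {
    queue: "review",
    // Flatten each issue path (e.g. ["lineItems", 0, "quantity"]) into a
    // dotted field name a reviewer can read at a glance.
    errors: result.error.issues.map((issue) => ({
      field: issue.path.join("."),
      message: issue.message,
    })),
  };
}
```

With the real schema, the input would come from something like `InvoiceSchema.safeParse(modelOutput)`, adapted to this shape as needed.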

3. PO matching with tolerance

For each extracted line item, the agent queries NetSuite for candidate POs from the same vendor in a reasonable date window. Matching logic applies tolerance rules:

Line total within €5 or 1% (whichever is greater) is a match.
Description fuzzy-match against PO line description.
Tax rate must match exactly.
Currency must match.

Matched line items get a PO reference attached. Unmatched ones are flagged for human review.
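The tolerance rules above can be sketched as a single predicate. Several details are assumptions made for illustration: the 1% is taken relative to the PO line total, the fuzzy match is a simple token-overlap (Jaccard) score with a hypothetical 0.5 threshold, and currency is carried on each line for convenience. The names `Line`, `withinTolerance`, `similarity`, and `lineMatches` are not from the production code.

```typescript
interface Line {
  description: string;
  lineTotal: number;
  taxRate: number;
  currency: string;
}

const ABS_TOLERANCE = 5; // currency units (EUR in the rule above)
const REL_TOLERANCE = 0.01; // 1%, assumed relative to the PO line total
const MIN_SIMILARITY = 0.5; // hypothetical threshold for the fuzzy match

// "Within 5 or 1%, whichever is greater": the larger of the two bounds wins.
function withinTolerance(invoiceTotal: number, poTotal: number): boolean {
  const allowed = Math.max(ABS_TOLERANCE, Math.abs(poTotal) * REL_TOLERANCE);
  return Math.abs(invoiceTotal - poTotal) <= allowed;
}

// Lowercased alphanumeric tokens, de-duplicated.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function similarity(a: string, b: string): number {
  const ta = tokens(a);
  const tb = tokens(b);
  const shared = Array.from(ta).filter((t) => tb.has(t)).length;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// A line matches only if all four rules hold.
function lineMatches(invoice: Line, po: Line): boolean {
  return (
    invoice.currency === po.currency &&
    invoice.taxRate === po.taxRate &&
    withinTolerance(invoice.lineTotal, po.lineTotal) &&
    similarity(invoice.description, po.description) >= MIN_SIMILARITY
  );
}
```

On a 100.00 PO line the absolute bound applies (up to 5.00 off), while on a 1,000.00 line the relative bound takes over (up to 10.00 off).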

4. Confidence-tier routing

The agent assigns each document one of three tiers:

Tier	Criteria	Action
High	All line items matched to PO, all schema-valid, vendor on whitelist, no duplicate	Auto-post to NetSuite
Medium	Some matches, minor mismatches, or new vendor	Route to human review queue
Low	Schema validation failed, no PO matches, suspicious patterns	Reject with structured reason; notify supplier

The tier thresholds are tunable per client.
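A sketch of how the tier table might translate into code, following the criteria literally. The `DocumentSignals` shape and the `suspicious` flag are hypothetical (what counts as a suspicious pattern is not specified here), and treating a duplicate as a review case rather than a rejection is an assumption.

```typescript
type Tier = "high" | "medium" | "low";

// Hypothetical summary of what the earlier pipeline steps found.
interface DocumentSignals {
  schemaValid: boolean;
  lineCount: number;
  matchedLineCount: number;
  vendorOnWhitelist: boolean;
  isDuplicate: boolean;
  suspicious: boolean; // placeholder for client-specific fraud or anomaly checks
}

// Action per tier, as listed in the table.
const ACTIONS: Record<Tier, string> = {
  high: "auto-post",
  medium: "review",
  low: "reject",
};

function assignTier(s: DocumentSignals): Tier {
  // Low: schema failure, no PO matches at all, or suspicious patterns.
  if (!s.schemaValid || s.matchedLineCount === 0 || s.suspicious) return "low";

  // High: every line matched, known vendor, and not a duplicate.
  const allMatched = s.matchedLineCount === s.lineCount;
  if (allMatched && s.vendorOnWhitelist && !s.isDuplicate) return "high";

  // Everything else (partial matches, new vendor, possible duplicate)
  // goes to a human. Sending duplicates here is an assumption.
  return "medium";
}
```

Keeping the criteria in one small pure function makes per-client tuning a matter of swapping this function or parameterising it, and makes it easy to cover with the eval suite.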

5. NetSuite posting + audit

For documents that auto-post or pass review:

NetSuite bill record created with line items.
Original PDF attached to the bill record.
Audit log entry written: document ID, extraction trace, decision history, reviewer (if applicable), NetSuite record reference.
Email marked processed in the source mailbox.

For documents that reject:

Reject reason logged.
Supplier auto-notified with the structured reason and a request to resubmit.
Document held in the queue for resubmission tracking.
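One possible shape for the audit log entry, built from the fields listed above. The field names, the `Decision` type, and the `recordDecision` helper are illustrative assumptions; the storage format is not described in this case study.

```typescript
// A single step in a document's decision history.
interface Decision {
  at: string; // ISO 8601 timestamp
  actor: "agent" | "reviewer";
  action: "auto-post" | "approve" | "correct" | "reject" | "escalate";
  reason?: string; // e.g. the structured reject reason
}

interface AuditEntry {
  documentId: string;
  extractionTraceRef: string; // hypothetical pointer to the stored extraction trace
  decisions: readonly Decision[];
  reviewer?: string; // present only when a human was involved
  netsuiteRecordRef?: string; // present only once a bill has been posted
}

// Append-only update: returns a new entry and leaves the original untouched,
// so earlier decisions are never rewritten.
function recordDecision(entry: AuditEntry, decision: Decision): AuditEntry {
  return { ...entry, decisions: [...entry.decisions, decision] };
}
```

Treating the history as append-only keeps the trail trustworthy when it is later queried per document, per supplier, or per period.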

Architecture

[Email inbox] [Portal upload] [SFTP]
            \   |   /
             [Firestore ingestion queue with content-hash dedup]
                       ↓
             [Claude vision extraction]
                       ↓
             [Zod schema validation]
                       ↓
             [PO match against NetSuite]
                       ↓
             [Confidence tier]
              ├─ high → NetSuite post
              ├─ med → Review queue (Next.js admin UI)
              └─ low → Reject + supplier notify
                       ↓
             [Audit log + dashboard]

The reviewer UI

A single-page Next.js app shows the document beside the extracted fields. A reviewer typically corrects a document in about 30 seconds (type, click, confirm), and each correction is stored as a future training/eval example so the pipeline can improve over time.

Per-document review actions:

Approve (auto-post on approval)
Correct field and approve
Reject (with reason)
Escalate to manager

Key features shipped

Multi-channel ingestion (email, portal, SFTP) with deduplication.
Claude vision extraction with typed Zod schema.
PO matching with configurable tolerance rules.
Confidence-tier routing (auto-post / review / reject).
NetSuite integration with custom field mapping.
Reviewer UI — fast keyboard-driven correction workflow.
Audit trail queryable per document, per supplier, per period.
Eval suite with sample documents per supplier.
Cost-per-document attribution on the dashboard.

Target outcomes for a typical engagement

Metric	Before	After (week 8)
Human time per invoice	4–6 minutes	~30 seconds (reviewer only)
% keyed manually	100%	~13%
Cycle time (receipt → posted)	2–4 days	~4 hours
Visible error rate	~3% (estimated)	<1%
Cost per invoice (loaded)	~€3.80	~€0.18

What we learned

The schema sprint is the highest-leverage step. Spending the first week sitting with the AP team and defining what "structured invoice" means for their business (rather than defaulting to the generic ANSI X12 810 invoice standard) is what makes the rest of the pipeline work.

General extraction beats per-vendor templates 80% of the time. Vision LLMs handle layout variation better than template-based OCR ever did. We reserve per-vendor templates for the highest-volume suppliers where they shave cost and latency.

The reviewer UI matters more than the model. Reviewer ergonomics dictate whether the system actually saves time. We spent more engineering on the review UX than on the extraction prompt.

Shadow mode is non-negotiable. Two weeks of running the agent in parallel with the human team before cutover catches a class of bugs that no eval suite can find.
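A shadow-mode comparison can be as simple as diffing the agent's extracted fields against what the human team keyed for the same document. The sketch below assumes flat field maps and exact equality. The `Fields` type, `compareDocument`, and `agreementRate` are hypothetical names, and a real comparison would likely need per-field tolerances.

```typescript
type FieldValue = string | number;
type Fields = Record<string, FieldValue>;

interface Disagreement {
  documentId: string;
  field: string;
  agent: FieldValue | undefined;
  human: FieldValue | undefined;
}

// List every field where the agent and the human disagree, including fields
// present on only one side.
function compareDocument(documentId: string, agent: Fields, human: Fields): Disagreement[] {
  const names = Array.from(new Set([...Object.keys(agent), ...Object.keys(human)]));
  return names
    .filter((field) => agent[field] !== human[field])
    .map((field) => ({ documentId, field, agent: agent[field], human: human[field] }));
}

// Share of documents on which the agent and the human agreed on every field.
function agreementRate(results: Disagreement[][]): number {
  if (results.length === 0) return 0;
  const clean = results.filter((diffs) => diffs.length === 0).length;
  return clean / results.length;
}
```

Reviewing the disagreement list by hand matters as much as the rate itself, since some differences turn out to be human keying errors rather than agent errors.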

Where to go next

For the full technical walkthrough of how this pipeline works, see our posts "How AI invoice processing works" and "AI agents for accounts payable".

If you have a real-world AP volume problem, drop us a note. We'll come back within a business day with a feasibility take and a discovery-phase quote.
