Case study · Accounts payable

Document Intake Agent

Supplier invoices processed end-to-end with an agentic pipeline

Client: Coming soon (demo case)
Timeline: Demo build
Stack: Claude vision, Zod, Firestore, Cloud Functions, NetSuite integration

At a glance

Client: Demo build (template for our AP engagements)
Industry: Accounts payable / mid-market distribution
Engagement: Demo / template; full client builds typically take 4–6 weeks
Stack: Claude vision, Zod, Firestore, Cloud Functions, NetSuite integration
Status: Template, ready to instantiate for client engagements

The challenge

Mid-market finance teams routinely spend dozens of hours per week keying supplier invoices into their ERP. The work is high-volume, low-margin, and prone to errors that surface downstream when accounts don't reconcile.

Existing tools have problems:

  • OCR-only tools miss line-item logic. They return characters, not decisions. They get confused by varied layouts and never improve.
  • Full ERP AP modules (NetSuite AP, Dynamics 365 AP) are heavyweight, require dedicated implementations, and still need a human to key in the fields OCR couldn't read.
  • Off-the-shelf "AI invoice processing" SaaS products ship impressive demos against their own clean test invoices, then disappoint once they meet your real-world supplier mix.

The demo target: a pipeline that actually works on real-world supplier invoices, with honest measurement of error rates and a human-in-the-loop step that handles the edge cases gracefully.

What we built

We built the reference pipeline that we now use as the starting point for every AP automation engagement we ship.

1. Ingestion

Supplier invoices arrive via three channels: email-to-AP inbox, supplier portal upload, and the occasional SFTP drop. A Resend webhook captures email attachments, a portal route handles uploads, and a scheduled job watches the SFTP folder. All three converge on a single Firestore queue with content-hash deduplication.
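As a minimal sketch of content-hash deduplication, the snippet below hashes the raw file bytes and refuses to enqueue a document whose hash has already been seen. It assumes SHA-256 as the hash and uses an in-memory Map as a stand-in for the Firestore queue; `QueueItem`, `contentHash`, and `IngestionQueue` are hypothetical names, not the pipeline's actual code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical queue item; field names are illustrative only.
interface QueueItem {
  contentHash: string;
  channel: "email" | "portal" | "sftp";
  receivedAt: string;
}

// SHA-256 of the raw file bytes: identical files produce the same hash
// no matter which channel delivered them.
function contentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// In-memory stand-in for the ingestion queue, keyed by content hash.
class IngestionQueue {
  private items = new Map<string, QueueItem>();

  // Returns true if the document was enqueued, false if it was a duplicate.
  enqueue(bytes: Uint8Array, channel: QueueItem["channel"]): boolean {
    const hash = contentHash(bytes);
    if (this.items.has(hash)) return false;
    this.items.set(hash, {
      contentHash: hash,
      channel,
      receivedAt: new Date().toISOString(),
    });
    return true;
  }

  get size(): number {
    return this.items.size;
  }
}
```

Keying the queue by the hash makes the duplicate check a single lookup. Note that a byte-level hash only catches identical files; a rescanned copy of the same invoice would hash differently and would need a separate check, for example on vendor and invoice number.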

2. Vision extraction

The agent picks up a queue item and runs Claude Sonnet vision with a typed Zod schema for the invoice structure:

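import { z } from "zod";

// Shape of one extracted supplier invoice; optional fields may be absent on the document.
const InvoiceSchema = z.object({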
  vendor: z.object({
    name: z.string(),
    taxId: z.string().optional(),
    addressLines: z.array(z.string()),
  }),
  invoice: z.object({
    number: z.string(),
    date: z.string(),
    dueDate: z.string().optional(),
    currency: z.string(),
  }),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number(),
    unitPrice: z.number(),
    taxRate: z.number(),
    lineTotal: z.number(),
    poRef: z.string().optional(),
  })),
  totals: z.object({
    subtotal: z.number(),
    tax: z.number(),
    total: z.number(),
  }),
  paymentTerms: z.string().optional(),
});

Schema-validated output means downstream code can rely on structure. Validation failures route the document to a human review queue with the structured error.
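One way the validation step could hand off to the review queue is sketched below. To stay dependency-free, it declares a local `ParseResult` type that mirrors the general shape of a Zod `safeParse` result (a `success` flag plus either `data` or an `error` with `issues`); that mirroring is an assumption and should be checked against the Zod version in use. `Routed` and `routeExtraction` are hypothetical names.

```typescript
// Local stand-in for the shape of a Zod safeParse result (assumed, v3-style).
interface Issue {
  path: (string | number)[];
  message: string;
}
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: Issue[] } };

// Hypothetical routing outcome: valid documents continue to PO matching,
// invalid ones go to human review with a structured error list.
type Routed<T> =
  | { queue: "matching"; invoice: T }
  | { queue: "review"; errors: { field: string; message: string }[] };

function routeExtraction<T>(result: ParseResult<T>): Routed<T> {
  if (result.success === false) {
    // Flatten each issue path to a dotted field name such as "lineItems.0.quantity".
    const errors = result.error.issues.map((issue) => ({
      field: issue.path.join("."),
      message: issue.message,
    }));
    return { queue: "review", errors };
  }
  return { queue: "matching", invoice: result.data };
}
```

In a real build the argument would presumably be the result of `InvoiceSchema.safeParse(rawModelOutput)`, so the reviewer sees exactly which field failed and why.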

3. PO matching with tolerance

For each extracted line item, the agent queries NetSuite for candidate POs from the same vendor in a reasonable date window. Matching logic applies tolerance rules:

  • Line total within €5 or 1% (whichever is greater) is a match.
  • Description fuzzy-match against PO line description.
  • Tax rate must match exactly.
  • Currency must match.

Matched line items get a PO reference attached. Unmatched ones are flagged for human review.
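The tolerance rules above can be sketched as a single predicate. Several details here are assumptions rather than the pipeline's actual logic: the 1% is taken relative to the PO line total, the fuzzy description match is a simple word-token Jaccard similarity with an assumed 0.5 threshold, and currency is carried on each line for convenience. All names are hypothetical.

```typescript
interface Line {
  description: string;
  lineTotal: number;
  taxRate: number;
  currency: string;
}

interface Tolerance {
  absolute: number; // flat allowance in currency units
  relative: number; // fraction of the PO line total, 0.01 = 1%
  minDescriptionSimilarity: number; // assumed threshold, not from the source
}

const DEFAULT_TOLERANCE: Tolerance = {
  absolute: 5,
  relative: 0.01,
  minDescriptionSimilarity: 0.5,
};

function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity over word tokens: shared tokens divided by all distinct tokens.
function similarity(a: string, b: string): number {
  const ta = tokens(a);
  const tb = tokens(b);
  if (ta.size === 0 && tb.size === 0) return 1;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

// The allowed difference is the greater of the flat and the relative allowance.
function withinTolerance(invoiceTotal: number, poTotal: number, tol: Tolerance): boolean {
  const allowed = Math.max(tol.absolute, Math.abs(poTotal) * tol.relative);
  return Math.abs(invoiceTotal - poTotal) <= allowed;
}

function matchesPoLine(inv: Line, po: Line, tol: Tolerance = DEFAULT_TOLERANCE): boolean {
  return (
    inv.currency === po.currency && // currency must match
    inv.taxRate === po.taxRate && // tax rate must match exactly
    withinTolerance(inv.lineTotal, po.lineTotal, tol) &&
    similarity(inv.description, po.description) >= tol.minDescriptionSimilarity
  );
}
```

With these defaults, a €1,008 invoice line matches a €1,000 PO line (allowance €10), while a €106 line does not match a €100 PO line (allowance €5).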

4. Confidence-tier routing

The agent assigns each document one of three tiers:

Tier | Criteria | Action
High | All line items matched to PO, all schema-valid, vendor on whitelist, no duplicate | Auto-post to NetSuite
Medium | Some matches, minor mismatches, or new vendor | Route to human review queue
Low | Schema validation failed, no PO matches, suspicious patterns | Reject with structured reason; notify supplier

The tier thresholds are tunable per client.
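A rough sketch of how the tier table might translate into code follows. The `DocumentSignals` fields are hypothetical inputs assumed to be produced by earlier stages, and the way borderline cases are resolved (for example, a fully matched duplicate landing in Medium) is an assumption, not the shipped rule set.

```typescript
type Tier = "high" | "medium" | "low";

// Hypothetical signals gathered by earlier pipeline stages.
interface DocumentSignals {
  schemaValid: boolean;
  lineItemCount: number;
  matchedLineItems: number;
  vendorOnWhitelist: boolean;
  isDuplicate: boolean;
  suspicious: boolean;
}

function assignTier(s: DocumentSignals): Tier {
  // Low: schema validation failed, no PO matches at all, or suspicious patterns.
  if (!s.schemaValid || s.matchedLineItems === 0 || s.suspicious) return "low";

  // High: every line matched, known vendor, and not a duplicate.
  const allMatched = s.matchedLineItems === s.lineItemCount;
  if (allMatched && s.vendorOnWhitelist && !s.isDuplicate) return "high";

  // Everything in between goes to a human reviewer.
  return "medium";
}

// Action taken for each tier, as in the table above.
const ACTIONS: Record<Tier, string> = {
  high: "auto-post",
  medium: "review",
  low: "reject",
};
```

Checking the Low conditions first keeps the function conservative: a document has to clear every rejection rule before it can be considered for auto-posting.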

5. NetSuite posting + audit

For documents that auto-post or pass review:

  1. NetSuite bill record created with line items.
  2. Original PDF attached to the bill record.
  3. Audit log entry written: document ID, extraction trace, decision history, reviewer (if applicable), NetSuite record reference.
  4. Email marked processed in the source mailbox.

For documents that are rejected:

  1. Reject reason logged.
  2. Supplier auto-notified with the structured reason and a request to resubmit.
  3. Document held in the queue for resubmission tracking.
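For illustration, the audit log entry written on the posting path could be shaped roughly as follows. The field list comes from the steps above; the field names, the representation of the extraction trace as a list of strings, and the `buildAuditEntry` helper are all assumptions.

```typescript
// Hypothetical audit record; one entry per posted document.
interface AuditEntry {
  documentId: string;
  extractionTrace: string[]; // assumed representation, e.g. step names or call IDs
  decisions: { at: string; actor: string; action: string }[];
  reviewer?: string; // present only when a human reviewed the document
  netsuiteRecordRef: string;
  writtenAt: string; // ISO 8601 timestamp
}

// Stamps the entry with a write time and freezes the top-level object
// (a shallow freeze; nested arrays stay mutable).
function buildAuditEntry(
  input: Omit<AuditEntry, "writtenAt">,
  now: Date = new Date(),
): Readonly<AuditEntry> {
  return Object.freeze({ ...input, writtenAt: now.toISOString() });
}
```

Passing the clock in as a parameter keeps the helper deterministic in tests.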

Architecture

[Email inbox] [Portal upload] [SFTP]
            \   |   /
             [Firestore ingestion queue with content-hash dedup]
                       ↓
             [Claude vision extraction]
                       ↓
             [Zod schema validation]
                       ↓
             [PO match against NetSuite]
                       ↓
             [Confidence tier]
              ├─ high → NetSuite post
              ├─ med → Review queue (Next.js admin UI)
              └─ low → Reject + supplier notify
                       ↓
             [Audit log + dashboard]

The reviewer UI

A single-page Next.js app shows the document beside the extracted fields. A reviewer can correct a document in about 30 seconds (type, click, confirm), and each correction is stored as a future training/eval example so the pipeline can be measured and improved against it.

Per-document review actions:

  • Approve (auto-post on approval)
  • Correct field and approve
  • Reject (with reason)
  • Escalate to manager
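Storing a correction as a future eval example might look like the sketch below: the reviewer-approved values become the expected output, and the fields that changed are recorded alongside. It assumes fields are flattened to dotted keys; `Fields`, `CorrectionExample`, and `toEvalExample` are hypothetical names.

```typescript
// Flattened field map, e.g. { "invoice.number": "INV-101", "totals.total": 120 }.
type Fields = Record<string, string | number>;

interface CorrectionExample {
  documentId: string;
  expected: Fields; // reviewer-approved values, used as eval ground truth
  changed: {
    field: string;
    extracted: string | number | undefined;
    corrected: string | number;
  }[];
}

// Compares the model's extraction with the reviewer's version and keeps
// only the fields whose values differ.
function toEvalExample(
  documentId: string,
  extracted: Fields,
  corrected: Fields,
): CorrectionExample {
  const changed = Object.keys(corrected)
    .filter((field) => extracted[field] !== corrected[field])
    .map((field) => ({
      field,
      extracted: extracted[field],
      corrected: corrected[field],
    }));
  return { documentId, expected: corrected, changed };
}
```

Keeping the changed-field list makes it easy to see which fields the extractor gets wrong most often for a given supplier.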

Key features shipped

  • Multi-channel ingestion (email, portal, SFTP) with deduplication.
  • Claude vision extraction with typed Zod schema.
  • PO matching with configurable tolerance rules.
  • Confidence-tier routing (auto-post / review / reject).
  • NetSuite integration with custom field mapping.
  • Reviewer UI — fast keyboard-driven correction workflow.
  • Audit trail queryable per document, per supplier, per period.
  • Eval suite with sample documents per supplier.
  • Cost-per-document attribution on the dashboard.

Target outcomes for a typical engagement

Metric | Before | After (week 8)
Human time per invoice | 4–6 minutes | ~30 seconds (reviewer only)
% keyed manually | 100% | ~13%
Cycle time (receipt → posted) | 2–4 days | ~4 hours
Visible error rate | ~3% (estimated) | <1%
Cost per invoice (loaded) | ~€3.80 | ~€0.18
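To put the cost row in context, the small sketch below turns the two per-invoice targets into a projected monthly saving. The per-invoice figures come from the table above and are targets, not measured results; the monthly volume is a hypothetical input, and `projectedSavings` is an illustrative name.

```typescript
// Target figures from the table, in euro cents to avoid floating-point drift.
const COST_BEFORE_CENTS = 380; // ~€3.80 per invoice
const COST_AFTER_CENTS = 18; // ~€0.18 per invoice

function projectedSavings(invoicesPerMonth: number): {
  monthlyEuros: number;
  reductionPct: number;
} {
  const savedPerInvoice = COST_BEFORE_CENTS - COST_AFTER_CENTS;
  // Percentage reduction per invoice, rounded to one decimal place.
  const reductionPct = Math.round((savedPerInvoice / COST_BEFORE_CENTS) * 1000) / 10;
  return {
    monthlyEuros: (savedPerInvoice * invoicesPerMonth) / 100,
    reductionPct,
  };
}
```

For a hypothetical volume of 2,000 invoices per month, this gives a saving of €7,240 per month, a reduction of about 95.3% per invoice.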

What we learned

The schema sprint is the highest-leverage step. Spending the first week sitting with the AP team and defining what "structured invoice" means for their business, rather than adopting the generic ANSI X12 810 invoice standard, is what makes the rest of the pipeline work.

General extraction beats per-vendor templates 80% of the time. Vision LLMs handle layout variation better than template-based OCR ever did. We reserve per-vendor templates for the highest-volume suppliers where they shave cost and latency.

The reviewer UI matters more than the model. Reviewer ergonomics dictate whether the system actually saves time. We spent more engineering on the review UX than on the extraction prompt.

Shadow mode is non-negotiable. Two weeks of running the agent in parallel with the human team before cutover catches a class of bugs that no eval suite can find.

Where to go next

For the full technical walkthrough of how this pipeline works, see our posts "How AI invoice processing works" and "AI agents for accounts payable".

If you have a real-world AP volume problem, drop us a note. We'll come back within a business day with a feasibility take and a discovery-phase quote.

Have a similar problem?

A 30-minute call will tell us if there's a fit. No prep needed — just bring the messy version of the workflow.