The vendor demo extracts 50 fields. Your problem is the 51st.
Picture a tech-enabled services firm that ingests a few thousand inbound documents a month — signed order forms, vendor COIs, customer onboarding packets, supporting docs for invoices. A vendor runs the demo, the model pulls every field off a clean PDF, and someone in the room does the math: if it reads each doc in two seconds instead of four minutes, that's hundreds of hours a month. The deal practically signs itself.
Then the first week of production arrives and the queue doesn't empty. It just changes shape. The documents that used to wait on a person now wait in an exception bucket — wrong vendor name, a date the model read off the fax header instead of the contract, a packet that's missing page three. The two-second extraction was real. It just moved the bottleneck downstream instead of removing it, and the person who used to key the whole form now spends their day adjudicating the model's guesses.
This is why "hours saved" fails the moment a CFO pokes at it. The economic event in document intake isn't reading the page. It's the clean handoff into whatever comes next — billing, onboarding, fulfillment, compliance. McKinsey's State of AI 2025 keeps landing on the same point: value shows up when the surrounding workflow is redesigned, not when a model is bolted onto the old one. A faster reader feeding the same broken handoff produces a faster pile, not a faster business.
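To see why the demo math falls apart, it helps to put the exception and rework time back into the calculation. The sketch below is a rough illustration, not a pricing model: the function names and every input (3,000 documents, a 30% exception rate, six minutes per exception, and so on) are hypothetical placeholders to be replaced with your own measured values.

```python
def naive_hours_saved(docs_per_month: int, manual_minutes: float, model_seconds: float) -> float:
    """Demo math: every document simply gets faster to read."""
    return docs_per_month * (manual_minutes * 60 - model_seconds) / 3600


def net_hours_saved(
    docs_per_month: int,
    manual_minutes: float,
    model_seconds: float,
    exception_rate: float,
    exception_minutes: float,
    rework_rate: float,
    rework_minutes: float,
) -> float:
    """Same volume, but exceptions and rework put human time back in."""
    baseline_hours = docs_per_month * manual_minutes / 60
    # Expected human-plus-model time per document after automation.
    per_doc_hours = (
        model_seconds / 3600
        + exception_rate * exception_minutes / 60
        + rework_rate * rework_minutes / 60
    )
    return baseline_hours - docs_per_month * per_doc_hours


# Hypothetical inputs, for illustration only.
naive = naive_hours_saved(3000, manual_minutes=4, model_seconds=2)
net = net_hours_saved(
    3000, manual_minutes=4, model_seconds=2,
    exception_rate=0.30, exception_minutes=6,
    rework_rate=0.05, rework_minutes=15,
)
print(round(naive, 1), round(net, 1))
```

With those placeholder inputs the naive figure is about 198 hours a month and the net figure is about 71. The gap between the two is the part a CFO will ask about.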
Five numbers, and the order you collect them in matters
Before you claim a dollar of value, write down the baseline for the workflow as it runs today — not the per-document time, the whole pipe. Five measures tell the real story:
1. Cycle time, intake to usable. How long from a document landing in the queue to it being clean enough for the next step to act on. Measure end-to-end, including the wait. A doc the model reads in two seconds but a human re-touches three days later didn't get processed in two seconds — it got postponed.
2. Exception rate. What fraction of documents the model can't fully clear on its own. This is the number that quietly eats your savings, and it's the one demos never show.
3. Rework. How often a "done" document comes back — wrong field caught downstream, a re-key, a customer correction. Rework after the handoff is more expensive than the original keystroke, because now two teams are involved.
4. Handoff quality. Does the next team trust the output enough to act without re-checking it? If billing still eyeballs every record the model touched, you automated the typing and kept the reviewing.
5. Adoption. Are people actually routing work through the system, or quietly maintaining the manual path "just to be safe"? A shadow process is a negative ROI line nobody put on the slide.
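One way to capture the five measures above is to log one record per document and compute the baseline from that log. This is a minimal sketch under stated assumptions: the `IntakeRecord` fields, the `baseline` function, and the sample data are all hypothetical, and your own system will likely record these events differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class IntakeRecord:
    received_at: datetime  # document landed in the queue
    usable_at: datetime    # next step could act on it
    via_system: bool       # routed through the system, not the manual path
    exception: bool        # needed a human to clear it
    reworked: bool         # came back after being marked done
    rechecked: bool        # downstream team re-verified before acting


def share(pool, flag):
    """Fraction of records in pool for which flag(record) is true."""
    return sum(1 for r in pool if flag(r)) / len(pool) if pool else 0.0


def baseline(records):
    if not records:
        raise ValueError("need at least one record")
    routed = [r for r in records if r.via_system]
    # Cycle time is end-to-end, so any waiting is included.
    cycle_hours = [
        (r.usable_at - r.received_at).total_seconds() / 3600 for r in records
    ]
    return {
        "median_cycle_hours": median(cycle_hours),
        "exception_rate": share(routed, lambda r: r.exception),
        "rework_rate": share(records, lambda r: r.reworked),
        "handoff_trust": share(routed, lambda r: not r.rechecked),
        "adoption": len(routed) / len(records),
    }


# Hypothetical sample: three documents through the system, one kept manual.
t0 = datetime(2025, 1, 6, 9, 0)
SAMPLE = [
    IntakeRecord(t0, t0 + timedelta(hours=2), True, False, False, False),
    IntakeRecord(t0, t0 + timedelta(hours=72), True, True, False, True),
    IntakeRecord(t0, t0 + timedelta(hours=4), True, False, True, True),
    IntakeRecord(t0, t0 + timedelta(hours=6), False, False, False, False),
]
print(baseline(SAMPLE))
```

The sketch measures exception rate and handoff trust only over documents that went through the system, and rework and adoption over everything. That split is a design choice, not a rule; what matters is that the same definition is used before and after go-live.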
Capture exceptions first. Until you know your exception rate and what's in those buckets, every savings figure is a guess dressed up as a forecast. And because intake documents carry customer data, contract terms, claims, and pricing, the exception design is also a control question — NIST's AI Risk Management Framework gives you the structure for that: map the context, measure the failure modes, manage the controls, name who's accountable. The IBM Institute for Business Value research makes the same case from the capability side — the model is the easy part; data quality, operating model, and adoption are where the return actually lives or dies.
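Knowing what is in the exception buckets can start as a simple tally. The sketch below assumes each exception is logged with a single reason label; the labels and the `exception_buckets` function are hypothetical, loosely modeled on the failure examples earlier in this piece.

```python
from collections import Counter


def exception_buckets(reasons):
    """Return (reason, count, share of all exceptions), largest bucket first."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return [(reason, n, n / total) for reason, n in counts.most_common()]


# Hypothetical week of exception reasons.
week = (
    ["wrong_vendor_name"] * 3
    + ["date_from_fax_header"] * 2
    + ["missing_page"]
)
for reason, n, fraction in exception_buckets(week):
    print(f"{reason}: {n} ({fraction:.0%})")
```

Even a tally this crude tends to show whether the exceptions are one fixable pattern or a long tail, which changes what the savings forecast can honestly claim.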
Turn the pilot into a case the CFO will sign
A pilot that produces only an accuracy percentage produces nothing you can take to a budget meeting. A pilot worth funding produces four things: a documented baseline (the five numbers above, measured before go-live), a production acceptance threshold (the exception rate and handoff quality you'll actually accept), a standing exception-review cadence (someone looks at what's failing every week, not once at launch), and a named owner for benefits realization — one person accountable for the number, not a committee that admires it.
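A production acceptance threshold is easier to hold people to when it is written down as a check rather than a slide. As a rough sketch, assuming the pilot reports an exception rate and a handoff-trust share (the fraction of records downstream teams act on without re-checking), something like the following would do. The class, function, and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceThreshold:
    max_exception_rate: float  # highest exception rate you will accept
    min_handoff_trust: float   # lowest share acted on without a re-check


def acceptance_gaps(exception_rate, handoff_trust, threshold):
    """List every way the pilot misses the threshold; empty means it passes."""
    gaps = []
    if exception_rate > threshold.max_exception_rate:
        gaps.append(
            f"exception rate {exception_rate:.0%} exceeds "
            f"{threshold.max_exception_rate:.0%}"
        )
    if handoff_trust < threshold.min_handoff_trust:
        gaps.append(
            f"handoff trust {handoff_trust:.0%} is below "
            f"{threshold.min_handoff_trust:.0%}"
        )
    return gaps


# Hypothetical threshold and pilot results.
threshold = AcceptanceThreshold(max_exception_rate=0.15, min_handoff_trust=0.90)
print(acceptance_gaps(0.22, 0.95, threshold))
```

Running the same check at each weekly exception review gives the named owner a consistent pass-or-fail signal instead of a fresh debate about what "good enough" means.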
One control item people skip until an auditor finds it: where these documents live. If your intake pulls from shared drives, email, and collaboration tools, the permissions on those sources are now part of your AI surface. Microsoft's documentation of the Microsoft 365 Copilot data protection architecture shows why access cleanup and audit logging belong in the ROI estimate, not the post-mortem: a fast intake workflow with loose access control isn't a win, it's an incident waiting for a date. PwC's 2025 Responsible AI survey reinforces the same discipline: governance has to sit where the build and rollout decisions are made, not get stapled on afterward.
If you want to model this without leaning on soft savings, start with the AI ROI Calculator and run your real intake volume, exception rate, and rework through it. When you're ready to build the measurement into the workflow itself, that's what our AI transformation work is for.