A reminder sent is not a dollar collected
Picture the AR ledger at a 60-person B2B company on the last Tuesday of the quarter. There are 340 open invoices. Two people working collections can meaningfully touch maybe 40 of them before close. So they chase the biggest balances, fire off a templated batch to everyone else, and hope. The 300 untouched invoices are not random: buried in them are three customers who would have paid this week with a nudge, and one who is quietly 90 days from disputing a $48k invoice nobody flagged.
This is where most teams measure AI wrong. They count messages sent and call it productivity. But a reminder sent is not a dollar collected, and a faster dunning cadence on the wrong account just trains good customers to ignore you. The only ROI that matters in collections is whether cash arrives sooner and whether the relationship survives the asking. That means your scoreboard is days sales outstanding, the percentage of promises-to-pay that actually clear, and how many disputes got caught before they aged into write-offs. It is not email volume.
AI earns its keep upstream of the message: reading the full account history in seconds, surfacing the dispute hiding in a comment field, ranking which of the 300 untouched invoices will move with a touch this week, and drafting context-rich follow-ups so a human spends judgment, not keystrokes. The patterns in McKinsey's 2025 State of AI and the IBM Institute for Business Value work on tying AI to operating capabilities both point the same direction: value shows up when the workflow is redesigned, not when a chatbot is bolted onto the old one.
Model the ROI as a DSO bridge, not a productivity stat
Here is the model that survives a CFO's scrutiny. Start with the account path: which invoices are eligible (say, $5k+ and 15+ days past terms), what signal sets priority (balance times probability-to-pay, not balance alone), what customer context the message must carry, who approves anything relationship-sensitive, and how a promise or dispute gets captured the moment it comes back. Then build a before-and-after bridge on the metrics that connect to cash — not to activity.
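The eligibility and priority rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production scoring model: the `Invoice` fields, the `p_pay` estimate (wherever your probability-to-pay signal comes from), and the sample data are all hypothetical, and the thresholds simply reuse the "$5k+ and 15+ days past terms" example.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: str
    balance: float        # open amount in dollars
    days_past_terms: int  # days beyond the agreed payment terms
    p_pay: float          # assumed 0..1 estimate that a touch this week produces payment


MIN_BALANCE = 5_000       # "$5k+" example threshold from the text
MIN_DAYS_PAST_TERMS = 15  # "15+ days past terms" example threshold from the text


def is_eligible(inv: Invoice) -> bool:
    """An invoice enters the worklist only if it clears both thresholds."""
    return inv.balance >= MIN_BALANCE and inv.days_past_terms >= MIN_DAYS_PAST_TERMS


def priority(inv: Invoice) -> float:
    """Balance times probability-to-pay, not balance alone."""
    return inv.balance * inv.p_pay


def build_worklist(invoices: list[Invoice], capacity: int) -> list[Invoice]:
    """Rank eligible invoices by priority and keep what the team can actually work."""
    eligible = [inv for inv in invoices if is_eligible(inv)]
    return sorted(eligible, key=priority, reverse=True)[:capacity]


# Made-up sample data for illustration only.
invoices = [
    Invoice("A-100", 48_000, 30, 0.10),  # big balance, unlikely to move
    Invoice("A-101", 12_000, 20, 0.80),  # smaller balance, likely to pay
    Invoice("A-102", 4_000, 60, 0.90),   # below the balance threshold
    Invoice("A-103", 9_000, 10, 0.95),   # not yet 15 days past terms
    Invoice("A-104", 7_000, 45, 0.60),
]
worklist = build_worklist(invoices, capacity=2)
print([inv.invoice_id for inv in worklist])  # ['A-101', 'A-100']
```

Notice that the $12k invoice outranks the $48k one. That is the whole point of weighting by probability-to-pay: the biggest balance is not automatically the best use of a collector's next call.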
Concretely, track five things across a baseline quarter and a piloted quarter: weighted DSO, coverage (share of eligible accounts actually worked), promise-kept rate, dispute time-to-route, and collector hours per dollar recovered. The temptation is to credit AI with every day you pulled out of DSO. Resist it. If you also tightened credit terms or hired a second collector that quarter, those moved the number too. Attribute honestly — the IBM Institute for Business Value AI ROI research is a useful frame for separating tool effect from process effect, and a CFO who catches you over-crediting will discount everything else you claim.
The dollar math is unforgiving in a good way. A company carrying $4M in receivables at 52-day DSO has roughly $77k of cash tied up for every single day of DSO ($4M divided by 52), which works out to a little over half a million dollars a week. Shave seven days off and that is real working capital, valued at whatever your line of credit costs. Run the actual numbers for your book in the AI ROI Calculator before you commit to a vendor's projection.
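To make that arithmetic concrete, here is a rough sketch using the figures above. The 8% cost of credit is an assumption for illustration only (substitute your own line-of-credit rate), and the function names are hypothetical.

```python
def cash_per_dso_day(receivables: float, dso_days: float) -> float:
    """Receivables divided by DSO approximates the cash tied up per day of DSO."""
    return receivables / dso_days


def annual_carrying_value(cash_freed: float, annual_credit_rate: float) -> float:
    """Value freed cash at the annual cost of the credit line it would replace."""
    return cash_freed * annual_credit_rate


receivables = 4_000_000      # from the example in the text
dso = 52                     # from the example in the text
days_removed = 7             # from the example in the text
assumed_credit_rate = 0.08   # ASSUMPTION: illustrative 8% line-of-credit cost

per_day = cash_per_dso_day(receivables, dso)
freed = per_day * days_removed
saving = annual_carrying_value(freed, assumed_credit_rate)

print(f"Cash per DSO day:      ${per_day:,.0f}")  # $76,923
print(f"Freed by 7 days:       ${freed:,.0f}")    # $538,462
print(f"Annual value at 8%:    ${saving:,.0f}")   # $43,077
```

The freed working capital is a one-time release of cash; the annual figure is only what that cash is worth at the assumed borrowing rate, which is the more conservative number to show a CFO.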
Start with one invoice category, and keep a human on the trigger
Collections is the one finance workflow where a wrong automated message can cost you the customer, not just the invoice. A breezy AI reminder to an account that is late because your team shipped the wrong product turns a service recovery into a fight. So the first build should never auto-send anything relationship-sensitive. It prepares the account summary and a drafted message, flags whatever it could not reconcile, and routes to a collector or account owner for the send. The human keeps the trigger; the AI removes the 20 minutes of context-gathering that used to precede every thoughtful follow-up.
Scope it tight to start. Pick one slice — say, recurring SaaS invoices 15 to 45 days past due under $25k, where the language is fairly standard and the risk is contained. Run it for a quarter against a control group of similar accounts worked the old way. If weighted DSO drops, promise-kept rate holds or climbs, and your customer-complaint count does not move, you have proof, not a hunch. Then widen to larger balances and thornier categories where judgment matters more.
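The pass condition for the pilot can be written down before the quarter starts, so nobody moves the goalposts afterward. The sketch below is one possible reading of the three tests above: it treats "does not move" as "no higher than the control group", and every name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuarterResult:
    weighted_dso: float       # days
    promise_kept_rate: float  # share of promises-to-pay that cleared, 0..1
    complaints: int           # customer complaints tied to collections contact


def pilot_passes(pilot: QuarterResult, control: QuarterResult) -> bool:
    """All three conditions must hold; a DSO win with more complaints is not a win."""
    dso_dropped = pilot.weighted_dso < control.weighted_dso
    promises_held = pilot.promise_kept_rate >= control.promise_kept_rate
    complaints_flat = pilot.complaints <= control.complaints  # assumed interpretation
    return dso_dropped and promises_held and complaints_flat


# Made-up numbers for illustration only.
control = QuarterResult(weighted_dso=52.0, promise_kept_rate=0.70, complaints=3)
pilot = QuarterResult(weighted_dso=47.5, promise_kept_rate=0.74, complaints=3)
noisy_pilot = QuarterResult(weighted_dso=45.0, promise_kept_rate=0.74, complaints=6)

print(pilot_passes(pilot, control))        # True
print(pilot_passes(noisy_pilot, control))  # False
```

The second case is the one to watch for: a bigger DSO improvement that arrives with twice the complaints fails the check, because the relationship cost has not been priced in.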
What to do Monday: pull last quarter's aging report and tag every account that was eligible but never got touched before close. That untouched pile is your ROI ceiling. Size it with the AI ROI Calculator, then use AI for Operations and Finance to design the approval-gated workflow around it.
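If your aging report exports to CSV, sizing the untouched pile takes a few lines. This is a sketch under assumptions: the column names (`account`, `balance`, `days_past_terms`, `touched_before_close`) are hypothetical and will differ in your system, the eligibility thresholds reuse the $5k and 15-day example from earlier, and the rows are made up.

```python
import csv
import io

# Stand-in for an exported aging report; column names are hypothetical.
AGING_CSV = """account,balance,days_past_terms,touched_before_close
Acme,48000,30,no
Birch,12000,20,yes
Cobalt,4000,60,no
Dune,9000,10,no
Ember,7000,45,no
"""


def untouched_eligible(rows, min_balance=5_000, min_days=15):
    """Return rows that met the eligibility thresholds but were never worked."""
    pile = []
    for row in rows:
        eligible = (
            float(row["balance"]) >= min_balance
            and int(row["days_past_terms"]) >= min_days
        )
        touched = row["touched_before_close"].strip().lower() == "yes"
        if eligible and not touched:
            pile.append(row)
    return pile


rows = list(csv.DictReader(io.StringIO(AGING_CSV)))
pile = untouched_eligible(rows)
ceiling = sum(float(r["balance"]) for r in pile)

print([r["account"] for r in pile])           # ['Acme', 'Ember']
print(f"Untouched eligible balance: ${ceiling:,.0f}")  # $55,000
```

Treat the total as a ceiling, not a forecast: it is the balance that could have been worked, not the balance that would have paid.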