The PO that slipped while everyone was busy
Picture a buyer at a 120-person manufacturer on a Tuesday. She has 60-some open purchase orders. Most are fine. But PO 4471 was promised for the 14th, a partial shipment arrived on the 9th covering 40% of the line, the supplier's last email said "rest to follow shortly," and nobody has touched it since. The first anyone hears about the gap is when the floor runs short and a production order stalls. The follow-up didn't fail because the buyer is bad at her job. It failed because purchase-order follow-up is a watching problem, and humans don't watch 60 things evenly.
That is the real question underneath "Copilot or custom AI" for PO follow-up. It isn't whether AI can write a status-request email — it obviously can. It's whether the thing you're buying actually notices the PO that's quietly drifting toward a stockout, or whether it just makes you faster at the ones you already remembered to chase. RSM's middle-market AI survey shows plenty of companies this size adopting AI; far fewer are using it to change an operating cadence rather than to draft text faster. PO follow-up is squarely an operating-cadence problem.
Two tools, two different jobs
Microsoft 365 Copilot is genuinely good at the part your buyer is already doing. Drop it on a tangled supplier email thread and it'll summarize where the commitment landed, draft a status request in her voice, and pull the relevant context from her mailbox and the attached PO. Because it works inside her permissioned Microsoft 365 content — the email, the SharePoint files she can already see — the privacy and data boundary stays clean and the architecture doesn't require you to wire anything new together. If your bottleneck is "my buyers spend too long rereading threads and rewriting the same email," Copilot is the cheaper, faster answer. Don't overbuild.
But Copilot has a structural blind spot for this workflow: it acts when she asks. It does not sit on the PO ledger overnight watching for the order that crossed its promised date with no receipt and no supplier reply. That's the custom-workflow job. A purpose-built workflow polls open POs against your ERP, flags the line that's aging past its commitment, spots the partial receipt that never got its balance, queues the exceptions in priority order, sends the approved follow-up, and writes the status back so the next buyer isn't reconstructing it from scratch. That reach across systems is exactly why governance stops being optional: supplier pricing, inventory positions, and purchasing records now move between tools, so CISA's data-security guidance is the reference for securing the pipes, and the NIST AI Risk Management Framework is the reference for deciding where the system acts on its own versus where it must hand a buyer the decision. A workflow that emails a supplier "where's the rest of PO 4471" is fine to automate. One that changes a promised date in the ERP is not: that's a human approval, every time.
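To make the watching job concrete, here is a minimal sketch of the overnight pass described above: scan open POs, keep the ones past their promised date with a balance still outstanding, and sort them most-overdue first. Everything in it is an assumption for illustration: the `OpenPO` fields, the function names, the action names, and the sample dates and quantities do not come from any particular ERP.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OpenPO:
    # Hypothetical shape of one open PO line pulled from the ERP.
    po_number: str
    promised: date
    ordered_qty: int
    received_qty: int


def find_exceptions(pos, today):
    """Return (days_late, po_number, reason) tuples, most overdue first."""
    exceptions = []
    for po in pos:
        balance = po.ordered_qty - po.received_qty
        if balance <= 0 or po.promised >= today:
            continue  # fully received, or not yet past its promised date
        if po.received_qty > 0:
            reason = "partial receipt, balance outstanding"
        else:
            reason = "no receipt"
        exceptions.append(((today - po.promised).days, po.po_number, reason))
    exceptions.sort(key=lambda e: e[0], reverse=True)
    return exceptions


# Assumed policy: only actions listed here run without a buyer's sign-off.
AUTOMATABLE_ACTIONS = {"send_status_request"}


def requires_human_approval(action):
    """Default-deny: anything not explicitly automatable goes to a buyer."""
    return action not in AUTOMATABLE_ACTIONS


# Illustrative data only; the year and quantities are made up.
open_pos = [
    OpenPO("4471", date(2024, 5, 14), ordered_qty=100, received_qty=40),
    OpenPO("4459", date(2024, 5, 10), ordered_qty=50, received_qty=0),
    OpenPO("4480", date(2024, 5, 20), ordered_qty=30, received_qty=0),
]
exceptions = find_exceptions(open_pos, today=date(2024, 5, 17))
for days_late, po_number, reason in exceptions:
    print(f"PO {po_number}: {days_late} days late ({reason})")
```

The default-deny check is the design choice worth keeping even if nothing else survives: a new action type, such as changing a promised date, lands in the approval queue unless someone deliberately adds it to the automatable set.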
Decide it with one number: time-to-notice
Skip the feature-by-feature bake-off. The metric that separates these two purchases is time-to-notice: from the moment a PO crosses its promised date with no receipt and no supplier commitment, how long until a human knows? With Copilot, that clock runs as long as the gap between the buyer's manual reviews — days, sometimes the length of a vacation. A custom workflow drives it toward overnight. If your stockout exposure and project-delay risk live in that gap, you've found your build case. If they don't, you haven't, and a better email assistant is the honest answer.
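As a rough illustration of how review cadence drives that clock, the sketch below computes time-to-notice for a single PO as the number of days from the date it went late to the first review on or after that date. The function name, the review schedules, and the dates are hypothetical, and it assumes a review always catches every late PO, which is optimistic for a manual pass.

```python
from datetime import date, timedelta


def time_to_notice(went_late, review_dates):
    """Days from going late to the first review on or after that date.

    Returns None if no review happens on or after the late date.
    """
    later = [d for d in review_dates if d >= went_late]
    if not later:
        return None
    return (min(later) - went_late).days


went_late = date(2024, 5, 15)  # illustrative: first day past the promised date

# Hypothetical manual cadence: the buyer reviews open POs once a week.
weekly_reviews = [date(2024, 5, 13) + timedelta(weeks=i) for i in range(4)]

# Hypothetical automated cadence: a check runs every night.
nightly_checks = [date(2024, 5, 13) + timedelta(days=i) for i in range(28)]

weekly_gap = time_to_notice(went_late, weekly_reviews)
nightly_gap = time_to_notice(went_late, nightly_checks)
print(f"weekly review: {weekly_gap} days, nightly check: {nightly_gap} days")
```

With these made-up dates the weekly cadence notices five days after the PO goes late, while the nightly check notices the same day. Skip one weekly review for a vacation and the first number grows by a week.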
So do this Monday: pull last quarter's late or short POs and, for each, write down when it actually went late versus when someone first noticed. If that gap is routinely measured in days and it's costing you expedite fees or stalled production, the watching is the value, and Copilot won't give you watching. Deloitte's State of AI report keeps landing on the same point: the hard part isn't the demo, it's making the thing run reliably in production. For PO follow-up, "production" means the exception queue is trustworthy enough that your buyers stop double-checking it, and the workflow earns its build cost only on the POs where the cost of noticing late is real. Start with one aging-PO class, prove the time-to-notice drop, then expand.
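The Monday exercise fits in a spreadsheet, but a short script works too. This is a sketch under stated assumptions: the record layout (PO number, date it went late, date someone first noticed), the two-day threshold, and every sample row are invented for illustration and should be replaced with your own export.

```python
from datetime import date
from statistics import median


def notice_gaps(records):
    """Map each PO number to days between going late and being noticed."""
    return {po: (noticed - late).days for po, late, noticed in records}


def summarize(gaps, threshold_days=2):
    """Count, median gap, and how many POs exceeded the threshold."""
    values = list(gaps.values())
    return {
        "count": len(values),
        "median_days": median(values),
        "over_threshold": sum(1 for v in values if v > threshold_days),
    }


# Hypothetical rows: (PO number, went late, first noticed).
last_quarter = [
    ("4471", date(2024, 5, 15), date(2024, 5, 22)),
    ("4390", date(2024, 4, 3), date(2024, 4, 4)),
    ("4412", date(2024, 4, 18), date(2024, 4, 23)),
]

gaps = notice_gaps(last_quarter)
summary = summarize(gaps)
print(gaps)
print(summary)
```

If most rows sit over the threshold and those same POs carried expedite fees or stalled production, that is the aging-PO class to start with.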