The same SOP, saved four different ways
Ask the warehouse lead how returns get processed and you'll get one answer. Ask the new hire who started in March and you'll get the version from the training deck. Ask the customer-service manager and you'll get the one she keeps in a pinned Slack message because the "official" doc on SharePoint is two reorgs out of date. None of them are lying. They're all following a real SOP. It's just not the same SOP.
This is the actual problem at a 50-to-300-person company, and it's worth naming precisely because it's not a writing problem. The documents usually exist. What's missing is a single answer to four questions: who owns this process, who approves a change to it, where does the current version live, and how does a person learn that it changed. When those four answers are fuzzy, AI that "writes SOPs faster" makes things worse — now you have more documents, all confidently formatted, none of them authoritative.
RSM's middle-market AI survey captures why so many mid-market teams want to point AI at their documentation backlog first: it feels like low-risk grunt work. It isn't low-risk. An SOP is the thing an auditor reads, the thing that gets subpoenaed in a lawsuit, and the thing a brand-new employee follows literally. Before you decide between Copilot and something custom, settle the ownership question for one process family. Pick returns, or month-end close, or customer onboarding. Name the human who owns it. Decide how a change gets approved. Then, and only then, decide what AI does inside that boundary.
Copilot is a fast drafter. It is not a system of record.
Here's the honest division of labor. Microsoft 365 Copilot is genuinely good at the part everyone hates: turning a Loom recording of someone doing the task, a messy email thread, and last year's half-finished doc into a clean first draft of the steps. It grounds that draft in content the user already has permission to see — the Copilot architecture retrieves from your tenant rather than inventing from scratch, and the privacy and data protection model keeps that retrieval inside your permission boundary. For a 40-person professional-services firm with maybe two dozen SOPs that rarely change, that's often the whole job. Draft in Copilot, have the process owner edit, post it, done. Spending engineering money on anything more would be malpractice.
The picture changes the moment your SOPs have to do more than read well. A custom workflow earns its cost when you need: role-specific branches (the AP clerk sees different steps than the controller), an approval route before a change goes live, a version history you can point an auditor to, a way to capture the exceptions people hit in the field, a published-vs-draft status, and, the one almost everyone forgets, a record that the people who run the process actually acknowledged the new version. Copilot can draft a beautiful SOP. It will not stop someone from running last quarter's version because nobody told them it changed.
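To make that list concrete, here is a minimal sketch of the record such a workflow would have to keep. It is an illustration under stated assumptions, not a description of any product: the class names (`Sop`, `SopVersion`), the field names, the people, and the date are all hypothetical, and role-specific branches and field-exception capture are left out to keep it short.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SopVersion:
    """One version of an SOP. All names here are hypothetical."""
    number: int
    body: str
    status: str = "draft"                      # "draft" or "published"
    approved_by: Optional[str] = None
    published_on: Optional[date] = None
    acknowledged_by: set[str] = field(default_factory=set)


@dataclass
class Sop:
    title: str
    owner: str                                 # who owns the process
    approver: str                              # who approves a change
    audience: set[str]                         # who must acknowledge it
    versions: list[SopVersion] = field(default_factory=list)

    def draft(self, body: str) -> SopVersion:
        """Add a new draft; it is not live until published."""
        version = SopVersion(number=len(self.versions) + 1, body=body)
        self.versions.append(version)
        return version

    def publish(self, version: SopVersion, approved_by: str, on: date) -> None:
        """Only the named approver can make a version live."""
        if approved_by != self.approver:
            raise PermissionError(f"{approved_by} cannot approve '{self.title}'")
        version.status = "published"
        version.approved_by = approved_by
        version.published_on = on

    def current(self) -> Optional[SopVersion]:
        """The most recently drafted version that has been published."""
        published = [v for v in self.versions if v.status == "published"]
        return published[-1] if published else None

    def acknowledge(self, person: str) -> None:
        """Record that someone in the audience has seen the current version."""
        version = self.current()
        if version is None:
            raise ValueError("nothing has been published yet")
        if person not in self.audience:
            raise ValueError(f"{person} is not in the audience for '{self.title}'")
        version.acknowledged_by.add(person)

    def unacknowledged(self) -> set[str]:
        """Who could still be running an older version."""
        version = self.current()
        return set() if version is None else self.audience - version.acknowledged_by


# Illustrative data only.
returns = Sop(
    title="Returns processing",
    owner="warehouse_lead",
    approver="ops_director",
    audience={"ana", "ben"},
)
v1 = returns.draft("1. Inspect the item. 2. Issue the refund.")
returns.publish(v1, approved_by="ops_director", on=date(2025, 1, 6))
returns.acknowledge("ana")
print(returns.unacknowledged())  # {'ben'}
```

The design point is that acknowledgement hangs off the version, not the document. Publishing a new version automatically resets who has acknowledged it, which is exactly the check a drafting tool does not perform.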
This is where governance stops being a buzzword. The NIST AI Risk Management Framework gives you useful structure for three decisions: how often an SOP gets re-validated, who signs off, and what the fallback is when the AI-assisted process is wrong. And for any SOP that touches customer data, security steps, or proprietary operating detail, CISA's AI data security guidance should shape who can even see the draft, let alone edit it. A returns SOP can live wide open. The SOP for handling a security incident or a payment dispute cannot.
Run a pilot that measures drift, not page count
The trap is measuring the wrong thing. It is trivially easy to prove AI "wrote 40 SOPs this month." It is the wrong proof. Deloitte's 2026 State of AI research keeps landing on the same point: value shows up when the operating behavior changes, not when output volume goes up. For SOPs, behavior changing means people stop running the wrong version.
So scope a six-week pilot around one process family and instrument it for drift, not volume. Track time to publish or update an SOP. Track the percentage of your live SOPs that have a named owner — at the start of a real pilot this number is often shockingly low. Track how many "wait, which version is current?" questions hit the process owner per week, and watch whether that number falls. Track training-acknowledgement completion, and track the field exceptions people log against the documented steps, because that's your signal an SOP describes a fantasy version of the work. The OECD's research on AI adoption among smaller enterprises and the San Francisco Fed's analysis of AI and small businesses both point at the same gap — adoption is easy, durable operating gains are not — which is exactly why your metric has to be drift, not draft count.
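The drift metrics above are simple enough to compute from a weekly spreadsheet export. The sketch below is one way to do it, assuming a hypothetical `SopSnapshot` record per SOP; the field names and every number in the sample data are made up for illustration, and time-to-publish is omitted because it needs timestamps rather than counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SopSnapshot:
    """One SOP's state in a given week. Hypothetical structure."""
    title: str
    owner: Optional[str]        # None means nobody is named as owner
    audience_size: int          # people who run this process
    acknowledged: int           # of those, how many acknowledged the live version
    version_questions: int      # "which version is current?" questions this week
    field_exceptions: int       # exceptions logged against the documented steps


def owner_coverage(sops: list[SopSnapshot]) -> float:
    """Percentage of live SOPs with a named owner."""
    if not sops:
        return 0.0
    return 100 * sum(1 for s in sops if s.owner) / len(sops)


def acknowledgement_rate(sops: list[SopSnapshot]) -> float:
    """Percentage of required acknowledgements that are complete."""
    audience = sum(s.audience_size for s in sops)
    if audience == 0:
        return 0.0
    return 100 * sum(s.acknowledged for s in sops) / audience


def pilot_report(sops: list[SopSnapshot]) -> dict:
    """Weekly drift numbers for one process family."""
    return {
        "owner_coverage_pct": owner_coverage(sops),
        "acknowledgement_pct": acknowledgement_rate(sops),
        "version_questions": sum(s.version_questions for s in sops),
        "field_exceptions": sum(s.field_exceptions for s in sops),
    }


# Made-up week-one numbers for a returns process family.
week_one = [
    SopSnapshot("Returns intake", "warehouse_lead", 10, 5, 3, 2),
    SopSnapshot("Refund approval", None, 10, 0, 4, 1),
    SopSnapshot("Restocking", None, 5, 5, 1, 0),
    SopSnapshot("Damaged goods", None, 15, 10, 2, 3),
]
report = pilot_report(week_one)
print(report)
# {'owner_coverage_pct': 25.0, 'acknowledgement_pct': 50.0,
#  'version_questions': 10, 'field_exceptions': 6}
```

Run the same report every week of the pilot and compare. Rising owner coverage and acknowledgement, alongside falling version questions, is the behavior change you are looking for. Nothing in the report counts documents produced.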
The decision rule that falls out of this is clean. If your SOPs are few, stable, and seen by everyone, keep Copilot for drafting and stop there. The minute versioning, approval routing, and acknowledgement have to be enforced rather than hoped for, that's a workflow you build and govern. If you want the build-vs-buy line drawn against your actual process inventory before you spend a dollar of engineering time, start with an AI roadmap that scopes it to your real SOPs.