The tell that you hired the wrong consultant
Say a 60-person B2B services firm brings in an AI consultant. Two weeks later they get a 22-slide deck with 14 "high-potential" use cases: agents for support, drafting for proposals, a forecasting copilot, an onboarding assistant, a knowledge bot, automated QA, and seven more. Everyone leaves the readout energized. Nine months later, zero of them are running in production, and the firm has spent $90K learning that energy is not a roadmap.
Here is the tell. A use-case consultant who leaves you with more ideas than you walked in with did the easy half of the job. The hard half — the half worth paying for — is subtraction. You already have a surplus of AI suggestions: from vendors, from the board, from the one engineer who reads every model release. What you lack is someone willing to look at 14 candidates and say "fund two, shelve nine, and three of these are process problems that AI will only make faster and wronger."
That subtraction is where the money is. Most AI waste is committed before a line of code ships — at the moment a team picks the workflow that looks modern over the one with clean inputs and a metric finance already tracks. The pattern across McKinsey's State of AI research, the IBM Institute for Business Value, and PwC's responsible AI work is consistent: returns come from redesigned workflows, real adoption, and governance you can defend — not from buying a better model. A consultant who skips the subtraction is selling you the expensive part of the failure.
What a real evaluation scores — and what it refuses to score
Demand that every candidate workflow be scored on six dimensions, not described in prose: business value tied to a metric you already report, technical feasibility, data quality and access, process stability, compliance exposure, and change-management load. If those six aren't numbers on a page, what you have is a preference list dressed up as analysis, and preference lists always rank the loudest executive's favorite workflow first.
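To make "numbers on a page" concrete, here is a minimal scoring sketch. It is not the AI Opportunity Score or any consultant's rubric. It assumes a hypothetical 1-5 scale for every dimension, equal weights, and hypothetical dimension names. Compliance exposure and change-management load are treated as costs, so they are inverted before summing. A candidate with any dimension left unscored raises an error instead of quietly ranking.

```python
from dataclasses import dataclass

# Hypothetical dimension names. Benefits: 5 is best.
# Costs: 5 means the most exposure or load, so they are inverted when summed.
BENEFITS = ("business_value", "feasibility", "data_quality", "process_stability")
COSTS = ("compliance_exposure", "change_load")


@dataclass
class Candidate:
    name: str
    scores: dict[str, int]  # dimension name -> score from 1 to 5


def total_score(c: Candidate) -> int:
    """Equal-weight total from 6 to 30: benefits add as-is, costs add as (6 - score)."""
    missing = [d for d in BENEFITS + COSTS if d not in c.scores]
    if missing:
        # Prose is not a score: refuse to rank anything with a blank dimension.
        raise ValueError(f"{c.name} is unscored on: {', '.join(missing)}")
    for dim, score in c.scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{c.name}: {dim}={score} is outside the 1-5 scale")
    return sum(c.scores[d] for d in BENEFITS) + sum(6 - c.scores[d] for d in COSTS)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Highest total first."""
    return sorted(candidates, key=total_score, reverse=True)


collections = Candidate("collections follow-up", {
    "business_value": 5, "feasibility": 4, "data_quality": 4,
    "process_stability": 4, "compliance_exposure": 2, "change_load": 2,
})
forecasting = Candidate("forecasting copilot", {
    "business_value": 3, "feasibility": 2, "data_quality": 2,
    "process_stability": 2, "compliance_exposure": 3, "change_load": 4,
})

for c in rank([forecasting, collections]):
    print(c.name, total_score(c))
```

Equal weights are the simplest defensible default; a firm could weight business value more heavily, but the weights should be agreed before anyone scores a single candidate, not tuned afterward to favor one.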
Watch for the four traits that separate a fundable first use case from a demo that dies in month two. The work is repetitive enough to systematize. The current process hurts enough that people will actually switch to the new path instead of quietly reverting. The data is reachable and trustworthy without a six-month cleanup project. And the outcome moves a number the business tracks today, such as collections follow-up, proposal turnaround, or support resolution time, so you're not inventing fake savings to justify the spend. That's exactly why finding the manual work worth fixing has to come before anyone talks to a vendor.
The step most consultants skip: failure conditions, written down before the enthusiasm sets in. When does a pilot get killed? What error rate is unacceptable? Which actions require a human signature? Which data is off-limits? And what specific number proves the workflow is ready for production rather than ready for another month of "almost"? Defining decision rights up front is unglamorous, which is precisely why a serious operator does it and a deck-driven consultant doesn't. It's the difference between a controlled experiment and sunk cost with a logo on it.
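Kill criteria are easiest to enforce when they are fixed values rather than a feeling in a steering meeting. The sketch below covers only the numeric questions (unacceptable error rate, production-ready threshold, deadline); signature rules and off-limits data belong in the same signed document but aren't modeled here. All field names and threshold values are hypothetical, and "adoption rate" is an assumed stand-in for whether people actually use the new path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillCriteria:
    """Agreed before the pilot starts. Every value here is a hypothetical example."""
    max_error_rate: float    # above this, the pilot is killed immediately
    ready_error_rate: float  # at or below this (with enough adoption), promote
    min_adoption_rate: float # share of eligible work routed through the pilot
    deadline_weeks: int      # no open-ended "another month of almost"

    def __post_init__(self) -> None:
        if self.ready_error_rate > self.max_error_rate:
            raise ValueError("ready threshold must not be looser than the kill threshold")


def pilot_decision(c: KillCriteria, error_rate: float,
                   adoption_rate: float, weeks_elapsed: int) -> str:
    """Return 'kill', 'promote', or 'continue' from the pre-agreed criteria."""
    if error_rate > c.max_error_rate:
        return "kill"
    if error_rate <= c.ready_error_rate and adoption_rate >= c.min_adoption_rate:
        return "promote"
    if weeks_elapsed >= c.deadline_weeks:
        # Past the deadline without meeting the bar: no extension by default.
        return "kill"
    return "continue"


criteria = KillCriteria(max_error_rate=0.10, ready_error_rate=0.03,
                        min_adoption_rate=0.60, deadline_weeks=12)
print(pilot_decision(criteria, error_rate=0.05, adoption_rate=0.70, weeks_elapsed=6))
```

The design choice worth copying is the deadline branch: a pilot that is neither failing nor ready by the agreed date is killed, which is what keeps "almost" from becoming a permanent budget line.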
The deliverable to demand, and the ideas they should send back to manual
Ask for a use-case portfolio, not a trends briefing. Concretely: a ranked shortlist, a baseline metric for each candidate, the data sources and integration dependencies each one needs, the governance controls that apply, a build-versus-buy call, and a 90-day path from shortlist to a governed pilot plan. The best version also includes a short kill list: the workflows that should be fixed by hand or by process before any model touches them. If a workflow is broken upstream, AI just produces the same wrong answers faster and at scale, which is worse, not better.
For a growing B2B services firm specifically, anchor the first use case to a metric a manager already loses sleep over: revenue cycle time, support resolution, onboarding ramp, proposal throughput, collections follow-up, reporting accuracy. When the number already matters to someone with a quota or a board seat, the pilot has a sponsor who fights for adoption past the honeymoon. When it's a "neat capability" with no metric owner, it's abandoned by the second sprint no matter how clean the demo was.
On Monday, before you sign anything, do one thing: list every AI idea floating around your firm and put a tracked metric next to each. The ones with no metric are your kill list, already half-written. Then pressure-test the survivors with the AI Opportunity Score, and bring in the QuickStart AI Audit when you want an operator-led shortlist instead of another wish list. A good use-case consultant makes your next AI investment smaller, clearer, and easier to defend in front of the people who control the budget.
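The Monday exercise is simple enough to run in a shared spreadsheet, but spelled out as a rule it looks like this minimal sketch. The idea names and metrics are hypothetical examples; an idea survives only if someone can name a metric the business already tracks, and a blank or missing metric sends it straight to the kill list.

```python
from typing import Optional


def split_ideas(ideas: dict[str, Optional[str]]) -> tuple[list[str], list[str]]:
    """Return (survivors, kill_list). Ideas with no tracked metric go on the kill list."""
    survivors: list[str] = []
    kill_list: list[str] = []
    for idea, metric in ideas.items():
        # Treat None, empty, and whitespace-only metrics the same way: no metric.
        if metric and metric.strip():
            survivors.append(idea)
        else:
            kill_list.append(idea)
    return survivors, kill_list


ideas = {
    "proposal drafting assistant": "proposal turnaround",
    "support triage agent": "support resolution time",
    "knowledge bot": None,
    "onboarding assistant": "",
}
survivors, kill_list = split_ideas(ideas)
print("survivors:", survivors)
print("kill list:", kill_list)
```

The survivors are the only candidates worth scoring on the six dimensions; the kill list is a starting point for the conversation, not a verdict, since an idea can come back once someone owns a metric for it.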