The demo always works. That is the problem.
Picture the meeting. A consultant turns their laptop around and an AI tool drafts a customer email, summarizes a call, or pulls an answer out of a pile of documents in four seconds. Heads nod. Someone says "that would save us hours." A proposal lands two days later for $22,000.
Here is what the demo did not show you. It ran on clean data the consultant chose. It answered a question they knew it could answer. Nobody on your staff touched it, nobody questioned the output, and no customer ever saw the result. A demo proves the consultant can build something impressive. It tells you almost nothing about whether the tool survives contact with your actual business on a normal week.
For a small business, that gap is expensive in a way it is not for an enterprise. You do not have a pilot budget to burn or an innovation team to absorb a miss. So evaluate the consultant, not the trick. The research that matters here is not about which model is hottest — it is about whether the work gets adopted and measured. McKinsey's State of AI and the IBM Institute for Business Value both point to the same unglamorous truth: value comes from redesigning a real workflow and getting people to actually use it, not from the capability sitting in a slide.
Make them run it on your worst Tuesday
The single best evaluation move costs you nothing: ask the consultant to demo on your data, not theirs. Hand them five real customer records — including the messy one with the duplicate entry and the note that says "do not call before noon." Ask the tool to handle the support ticket that is half-vented frustration and half-question. Watch what happens when the input is not the polished sample they rehearsed with.
Say you run a 25-person HVAC company and a consultant wants to automate your scheduling follow-up. The demo handled a tidy "your appointment is confirmed" message beautifully. Now ask: what does it do when a customer replies "actually my furnace died last night, can someone come today"? If the consultant can show you that path — escalation to a human, a clear handoff, a record of who owns the reply — they understand your business. If they pivot back to the happy-path slide, the scope is not ready.
You do not need an enterprise compliance department to make this safe, but you do need rules. The NIST AI Risk Management Framework reduces to four plain questions a small-business owner can actually use: what is this for, what could go wrong, who reviews the output, and who owns it when it breaks. A consultant who cannot answer those in plain language about your specific workflow is selling you the demo, not the system. And watch where they want to start — HubSpot's State of Marketing shows most small businesses first reach for AI inside sales and marketing, which is fine, as long as it becomes an owned, checked workflow and not a drawer full of clever prompts nobody is accountable for.
Five questions that separate a builder from a vendor
Before you sign anything, make the consultant answer these out loud, about one specific workflow, with no slides:
1. Which one workflow goes first, and why that one before the others? A real answer names a single recurring task (proposal drafting, invoice follow-up, ticket triage) and explains why the rest should wait.
2. What data does it need, and do we already have it in usable shape? If the honest answer is "we'd need to clean up your records first," that is not a reason to walk; it is a cost they should name now, not surface in month two.
3. Who reviews the output before it reaches a customer? By name, on your staff.
4. What is the adoption plan when my team ignores it? Because they will, at first.
5. What single number will tell us in 60 days whether this helped? Hours saved, response time, follow-ups that didn't slip: one metric, agreed up front.
If a consultant leads with model names and tool enthusiasm instead of these five, you are paying for their excitement. The one you want makes the first project smaller, not bigger. Before you take a single sales call, run the AI Opportunity Score to find your own first workflow, then use the QuickStart AI Audit to pressure-test it — so you walk into every demo knowing the answers before they do.