Ask the question your bot will fail on
Here is the five-minute test before anyone scopes an internal knowledge-search project. Pick a question your team actually asks — "what's our refund policy for a client who cancels mid-engagement," "which MSA template do we use for a reseller deal," "what's the approved PTO carryover." Now ask two of your most senior people, separately, without letting them confer. If you get two different answers, stop. You do not have a search problem. You have a source-of-truth problem, and an AI layer will not solve it — it will pick one of the two answers, or worse, blend them, and deliver the result to a junior employee with total confidence and no footnote saying "the partners disagree on this."
That is the specific failure mode of knowledge search, and it is different from every other AI workflow. A ticket-triage bot that misroutes gets caught at the next step. A search bot that confidently cites a superseded 2023 engagement letter buried three folders deep in SharePoint gets believed, acted on, and discovered three weeks later when the client points at the contract. The RSM middle-market AI survey and Deloitte's State of AI in the Enterprise 2026 keep landing on the same point: the firms getting value run AI on top of governed, owned information — not on top of whatever has accumulated in the shared drive since the last office move.
For a professional-services or B2B-services firm, that shared drive is the problem. It is where the real institutional knowledge lives — and where four versions of the same proposal template, two contradictory expense policies, and a folder named "FINAL_v3_USE_THIS_ONE" all sit with equal authority. A retrieval model reads all of it as equally true.
What the model can't do that your senior people quietly do
When a new analyst asks a partner "which scoping template do I use," the partner does three things in two seconds without noticing: recalls that the template changed after the Q1 pricing reset, knows the old one is still floating around, and answers from the current one. That layer of "yes, but the recent version is the one that counts" is invisible governance, and it is exactly what a knowledge bot lacks unless you build it in. The model returns the document that scores highest on text similarity, which is just as likely to be the deprecated one with three years of edits making it look thorough.
So the real precondition isn't "do we have a lot of documents." Most firms have far too many. It's whether you can answer four questions in writing for the corpus you'd point the bot at. First, what is the single approved source for this topic, and who said so? Second, who owns retiring the stale copies? Third, who sees what — does the model respect the same permissions a person would, so an associate can't surface partner comp memos? Fourth, when the answer is genuinely contested or confidential, what does the system do instead of guessing? If those answers don't exist yet, the NIST AI Risk Management Framework and CISA's AI Data Security Best Practices both push the same direction — define the controls first, because a search tool inherits every permission gap and stale record underneath it and serves them faster.
The fix is not a year of cleanup. Pick one high-traffic topic — say, the questions your delivery team Slacks each other about most. Name the one canonical document. Assign one person to keep it current. Delete or clearly archive the duplicates. Decide what the system says when it doesn't know. That's a readiness sprint, not a transformation program, and it makes a future bot honest instead of confidently wrong.
Restart narrow — and measure the thing people actually feel
When you do build, resist the all-firm assistant. Launch one topic, for one team, with one owner and one weekly review of what the bot got asked and what it answered. Read the actual transcript for the first month. You are not grading the AI on a benchmark; you are watching whether the answers it gives are the answers your best person would have given, and whether people trust it enough to stop asking each other.
The metric that matters for knowledge search is not "queries answered." It's whether the time-to-correct-answer dropped and whether the senior people stopped getting interrupted with questions the bot now handles. Track exception rate — how often someone has to override or correct it — because a search tool that's wrong even five percent of the time on policy questions trains people to distrust all of it. Watch adoption honestly: if your team quietly goes back to DMing the office manager, the bot failed regardless of its eval scores.
If two senior people gave you two answers in the test above, your Monday is clear: pick one topic, settle the canonical answer, name the owner. That single act is worth more than the model. When the corpus is governed and the workflow is narrow, you can scale with evidence instead of optimism — and that's the moment to measure ROI without fake savings and lay out a proper 90-day rollout.