The assistant will confidently answer with your worst document
Picture a 60-person professional-services firm. A new hire asks the shiny internal assistant, "What's our refund policy for paused retainers?" It answers in two clean sentences, cites a document, sounds authoritative — and quotes a 2023 policy that the partners overruled in a Slack thread nobody ever wrote down. The new hire repeats it to a client. Now you have a billing dispute that the chatbot started.
This is the thing the demo never shows you. A retrieval assistant doesn't reason its way to the right answer; it finds the closest-matching text in whatever you fed it and dresses it up fluently. If your shared drive holds three versions of the same SOW template, two contradicting expense policies, and an "old_FINAL_v2" folder everyone forgot to delete, the assistant retrieves and recites all of it with equal confidence. Speed turns a quiet documentation mess into a fast, scaled, customer-facing one.
So the first question to ask any consultant is not "which model?" It's "how will you decide which documents are allowed to answer?" If the answer is a project plan that starts with source ownership, permissions, and a definition of the authoritative version of each document type, you're talking to a builder. If it starts with a model comparison and a vector database brand name, you're talking to someone who's about to ship your mess back to you at conversational speed.
Four things a real engagement leaves behind
A serious consultant hands you operating assets, not a chat window. Demand four. First, a source inventory: which repositories are approved, which document types live where, and a named human who owns updates for each. "The wiki" is not an owner. A person is. Second, a permissions model that inherits what you already have — if a coordinator can't open the partner comp doc today, the assistant must not summarize it for them tomorrow. The fastest way to lose executive trust is the assistant leaking restricted content to someone who asked an innocent question. Third, an answer standard: every response cites the source document, and the assistant says "I don't have an approved source for that" instead of improvising. Fourth, a maintenance cadence — a standing review where disputed or outdated material gets fixed at the source, not patched in a prompt.
The common architecture under all of this is retrieval-augmented generation, or RAG: source documents sit in a searchable layer, and the system pulls relevant excerpts to ground each answer (Gartner and IBM both have plain explainers). RAG is the right pattern — but it's only as honest as the documents behind it, which is exactly why Deloitte and McKinsey keep finding that enterprise pilots stall on data readiness and governance, not model quality.
Use RAG for SMB knowledge systems to decide whether the architecture is even worth building for your document volume. And if nobody internally owns the AI operating model after the consultant leaves, compare the roles in fractional chief AI officer vs. AI consultant before you hire a builder who'll disappear with the only mental model of how it works. One last test: ask what happens when a user clicks thumbs-down. If the answer stops at "we log it," the loop is broken. Negative feedback has to route to the document owner who fixes the source — otherwise you're collecting complaints, not improving anything.
Pick one domain, and measure whether it stays true
Resist the company-wide launch. The instinct is to point the assistant at everything; the discipline is to prove one domain can stay accurate for ninety days before you expand. For a services firm, the cleanest first domain is usually one with a single obvious owner and high repeat-question volume: implementation playbooks owned by delivery, or HR policy owned by one person who actually maintains it. Narrow scope is what makes the maintenance cadence survivable.
Then measure the right things. Prompt volume is a vanity number — a busy assistant can be a busy liar. Track answer acceptance, how often it cites a real source versus dodging, the count of questions it couldn't answer (your source-gap list, which is gold), and how many corrections got routed back to document owners and actually closed. That last metric tells you whether you built a living system or a static index that's quietly rotting. A real-world productivity lift from these tools is well documented — the NBER study on generative AI at work found meaningful gains for support agents — but the gains came from grounding the assistant in vetted, correct material, not from turning it loose on a drive.
Monday move: open your shared drive, search the single most-asked internal question, and count how many conflicting answers come back. That number — not a vendor pitch — tells you whether you're ready. For the build itself, use how to build an internal AI knowledge assistant as the blueprint, and if you want a structured read on source, permission, and adoption risk first, route the team through the QuickStart AI Audit. The job is to leave behind a maintained source of truth with a conversation layer on top — not a novelty search box that confidently quotes documents you forgot you had.