The assistant works in the demo. Then it cites the deprecated SOP.
Here is the failure that plays out in professional services and technology firms over and over. A partner asks the new AI assistant for the current data-retention policy. It answers in two seconds, beautifully formatted, with a citation. The citation points to a 2023 draft someone parked in a "Working" folder and never deleted. The 2025 approved version is sitting right next to it. The assistant had no way to tell them apart, so it picked the one that scored highest on text similarity. That is not a model problem. That is an intake problem.
Most knowledge teams want to start with the chatbot because it is the visible, fundable thing. The unglamorous move that actually determines whether the chatbot is trustworthy is document intake: classifying each new file as it arrives, pulling metadata, catching duplicates, identifying the owner, and routing it for approval before it ever reaches a search index. Microsoft's own Copilot architecture and data-protection documentation makes the dependency explicit: assistant quality rides on permissions, indexing, and auditability — all of which are decided at the moment content enters the system, not at query time. Skip intake and every answer inherits the mess underneath it.
Decide what is allowed to become an answer
Picture a 90-person consulting firm whose shared drive holds engagement letters, half-finished proposals, expired NDAs, and the one methodology deck everyone actually relies on — all in the same folder tree, all equally indexable. The real question for the intake layer is not "can we read this file." It is "should this file be allowed to speak on the firm's behalf."
The NIST AI Risk Management Framework supplies a usable interrogation for each incoming document: what is its context, what risk does it carry if it surfaces wrong, how will we measure its quality, and who handles the exceptions. Translate that into intake fields the system captures automatically — source system, permission scope, retention clock, last-modified freshness, and approval status — and you have drawn a hard line between draft and authoritative before anything is searchable. AI is genuinely good at suggesting here: it can propose the document type, draft a summary, and flag a likely owner. It should not be the one that promotes a file to "approved." A named human does that. IBM's Institute for Business Value research on AI capabilities lands on the same sequence — trusted data and an adopted workflow come before the flashy capability — and intake is the workflow that makes the trust auditable rather than assumed.
What to put on the dashboard Monday
You can prove the intake layer is working without waiting for the assistant to launch. Track five numbers from week one: duplicate documents collapsed, percentage of new files arriving with complete metadata, stale documents flagged past their retention date, reviewer throughput (items approved or rejected per day), and — once search is live — how often the top answer cites an approved source versus a draft. If duplicate count is falling and metadata completion is climbing, the knowledge base is getting more trustworthy, not just bigger. If ingestion sped up but those numbers are flat, you automated the mess. PwC's Responsible AI survey is a useful reminder that the controls are the point, not the throughput.
If your repositories look like the consulting-firm example above — overlapping versions, orphaned owners, no retention logic — start with a QuickStart AI Audit to map what is actually in there before you index a thing. If leadership is weighing intake against three other AI ideas and wants an honest ranking, run the AI Opportunity Score first. Either way, fix the front door before you hand anyone the assistant.