The second-year associate is the bottleneck, and everyone knows it
Picture a 60-person consulting shop on a Tuesday. A new analyst is three weeks in and has a question about how the firm handles a scope-change memo. She doesn't open the wiki, because the wiki has four versions of the answer and she can't tell which one survived the last methodology refresh. So she pings a senior associate, who is mid-deliverable, who answers from memory, who is sometimes wrong. Multiply that by every new hire and every "quick question," and your most billable people are spending their week as a human FAQ.
That is the problem AI built on your training documentation is actually supposed to solve. Not "let's have a chatbot." The job is to let a new consultant get a reliable answer about your delivery method without taxing the one person who knows it. The trap is putting a model on top of the same four-versions-of-everything document pile and shipping confident answers drawn from whichever version is out of date. Now you've automated the wrong onboarding at machine speed.
So the first move isn't model selection. It's deciding which documents are allowed to answer. Pick a narrow set: the current delivery playbook, the active QA checklist, the onboarding task list, and the proposal-to-kickoff handoff notes. Performance reviews, client-specific judgment calls, and anything a partner would want to phrase carefully stay off the source list entirely. The Census Bureau's AI Use at U.S. Businesses data and the OECD's work on AI adoption by SMEs both point to the same thing: adoption is easy, but firms your size win or lose on whether the underlying content is actually trustworthy.
The test isn't "did it answer," it's "can she show a partner the source"
Here is the difference between a useful onboarding assistant and a liability. When the analyst asks how scope changes are handled, a useful system returns the answer along with the exact playbook section it came from, that section's date, and the practice lead who owns it. She can paste that into a client doc and defend it. A liability returns a fluent paragraph with no provenance, and three weeks later a partner is unwinding a kickoff deck that quoted a process the firm retired last fall.
So build the review around provenance, not cleverness. Every source document should carry a sign-off from the practice lead who owns that domain before it enters the index, and every answer should surface its source so the consultant can verify it herself. Deloitte's State of AI in the Enterprise 2026 keeps returning to this point: the value shows up in the boring discipline after the demo, not in the demo itself. The NIST AI Risk Management Framework is useful here precisely because it treats risk as dependent on context, and onboarding risk is almost entirely contextual: a loose sentence in an internal draft is harmless until a brand-new consultant treats it as the firm's official method in front of a client. And because these documents describe how you actually win and run engagements, the CISA guidance on securing AI training data should shape who can query what, what gets logged, and how long it's retained.
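A minimal sketch of that provenance rule, in Python, assuming a hypothetical index schema: `SourceDoc`, `Answer`, `build_index`, and `answer_with_provenance` are illustrative names, not any product's API, and the retrieval and generation step that produces the answer text and picks a `doc_id` is left out. What it shows is structural: unsigned documents never enter the index, and an answer that can't point at an approved source is refused instead of returned.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceDoc:
    """One approved entry in the assistant's index (hypothetical schema)."""
    doc_id: str
    title: str
    section: str
    owner: str            # practice lead who owns this domain
    last_reviewed: date   # date shown to the consultant alongside the answer
    signed_off: bool      # owner has approved this document for the index


@dataclass(frozen=True)
class Answer:
    text: str
    source: SourceDoc

    def citation(self) -> str:
        # The line a consultant can paste into a client doc and defend.
        s = self.source
        return (f"{s.title}, {s.section} "
                f"(reviewed {s.last_reviewed.isoformat()}, owner: {s.owner})")


def build_index(docs: list[SourceDoc]) -> dict[str, SourceDoc]:
    # Only documents the owning practice lead signed off on enter the index.
    return {d.doc_id: d for d in docs if d.signed_off}


def answer_with_provenance(text: str, doc_id: str,
                           index: dict[str, SourceDoc]) -> Optional[Answer]:
    # Refuse (return None) instead of answering from a source outside the index.
    source = index.get(doc_id)
    if source is None:
        return None
    return Answer(text=text, source=source)
```

Returning `None` rather than a best-guess paragraph is the design choice that matters: the interface can route the question to a person instead of handing a new hire an unsourced answer.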
Measure four things in the first eight weeks: how long it takes a new hire to reach billable confidence, how many "quick question" interrupts your seniors absorb per week, the QA correction rate on new-hire work product, and how often the assistant gets a method question wrong. If those numbers don't move, do not buy a bigger model; the problem is your document pile, not your retrieval. When the same wrong answer keeps surfacing, it usually means two versions of a playbook are both live and somebody needs to retire one.
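One way to make "if those don't move" concrete is to compare an eight-week reading against a pre-assistant baseline. The sketch below assumes all four measures are lower-is-better and uses a hypothetical 10% relative-improvement threshold; `OnboardingMetrics` and `metrics_moved` are illustrative names, and the threshold is an assumption to agree with your own partners, not a benchmark.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OnboardingMetrics:
    # All four measures are "lower is better".
    days_to_billable_confidence: float
    senior_interrupts_per_week: float
    qa_correction_rate: float      # share of new-hire work product needing method fixes
    assistant_error_rate: float    # share of sampled method questions answered wrong


def relative_improvement(baseline: float, current: float) -> float:
    # Positive when the measure went down; 0.0 when there is no baseline to compare to.
    return (baseline - current) / baseline if baseline else 0.0


def metrics_moved(baseline: OnboardingMetrics, current: OnboardingMetrics,
                  threshold: float = 0.10) -> dict[str, bool]:
    # threshold is a hypothetical 10% relative improvement, not a benchmark.
    return {
        f.name: relative_improvement(getattr(baseline, f.name),
                                     getattr(current, f.name)) >= threshold
        for f in fields(baseline)
    }
```

If most entries come back `False` after eight weeks, that is the cue to look at the document pile rather than the model.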
A 90-day plan a managing partner can actually run
Days 1 to 30: have each practice lead point at the documents they would defend in front of a client, and archive everything else out of the index. This is unglamorous and it's the whole game. If your delivery playbook has a 2024 version and a 2025 version both sitting in the folder, the cleanup is the project. The model can't pick the right one; a person has to.
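The cleanup can start with a crude inventory check. This sketch assumes a hypothetical naming convention where versions differ only by a year in the filename (`Delivery_Playbook_2024.docx` vs `delivery_playbook_2025.docx`); `family_key` and `conflicting_versions` are illustrative helpers. It deliberately flags conflicts for a practice lead rather than choosing a winner.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Year-like tokens such as "_2024" or " 2025", plus any separator before them.
YEAR_TOKEN = re.compile(r"[_\-\s]*(?:19|20)\d{2}")


def family_key(filename: str) -> str:
    # Crude heuristic: drop the extension and year tokens, then normalize case,
    # so "Delivery_Playbook_2024.docx" and "delivery_playbook_2025.docx" collide.
    stem = PurePath(filename).stem
    return YEAR_TOKEN.sub("", stem).strip(" _-").lower()


def conflicting_versions(filenames: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        groups[family_key(name)].append(name)
    # Report every family with more than one live file; a person picks the survivor.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

Anything it returns goes on a practice lead's list. Files versioned some other way ("v2", "FINAL", dates in the folder name) will slip past this heuristic, so treat it as a first pass, not an audit.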
Days 31 to 60: run the assistant in shadow mode. When a new consultant asks something, compare the AI answer against what a second-year associate would actually say. Where they disagree, you've found either a bad source or a gap in the documentation, and either finding is worth more than the bot.
Days 61 to 90: decide. The "scale it" signal is quiet and operational: new hires reach billable faster, seniors get interrupted less, and QA finds fewer method errors in junior work. The "don't scale it" signal looks polished but leaves your managers still spot-checking every onboarding answer by hand, which is just a new review queue wearing a nicer interface. The Federal Reserve Bank of San Francisco's early findings on small business AI are a useful gut check on what realistic gains look like at your size.
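The days 31 to 60 shadow comparison only pays off if the disagreements are written down. A minimal sketch, assuming a hypothetical log where a reviewer records whether the AI answer and the associate's answer agree and, when they don't, whether the cause was a bad source or a documentation gap; `ShadowCase` and `shadow_summary` are illustrative names. Agreement here is a human judgment, not string matching.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShadowCase:
    question: str
    agrees: bool                  # reviewer judged the AI and associate answers equivalent
    cause: Optional[str] = None   # on disagreement: "bad_source" or "doc_gap"


def shadow_summary(cases: list[ShadowCase]) -> dict:
    disagreements = [c for c in cases if not c.agrees]
    rate = len(disagreements) / len(cases) if cases else 0.0
    # Unlabeled disagreements are counted separately so they don't vanish.
    by_cause = Counter(c.cause or "unclassified" for c in disagreements)
    return {
        "cases": len(cases),
        "disagreement_rate": rate,
        "by_cause": dict(by_cause),
    }
```

Rows tagged "bad_source" go back to the practice lead who owns that document; rows tagged "doc_gap" are documentation the firm hasn't written yet.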
If you're weighing this against other places AI could earn its keep, run the AI Opportunity Score first, then the AI ROI Calculator once you have real ramp-time numbers rather than a vendor's slide. We package that sequence inside the AI Transformation Blueprint so a firm can go from "the onboarding docs answer questions correctly" to the next workflow without ever losing track of which document said what.