Your readiness signal is sitting in the PSA, not in a vendor demo
Picture the dispatch board at a 200-person IT services firm on a Tuesday morning: 340 open tickets, half of them tier-1 password resets and printer escalations, a backlog of client documentation that's three-quarters stale, and a service desk manager triaging by gut because the PSA priority field has been wrong since the last RMM integration broke. That mess is the AI readiness assessment. You don't need a tool inventory or a "GenAI strategy offsite." You need to know which of those repeated motions has clean enough data, enough volume, and a human who can catch a bad output before it reaches a client.
Most assessments invert this. They start with the model and ask what it could touch. The better order drags the conversation out of enthusiasm and into production reality, and it's the direction that the U.S. Census Bureau's AI business adoption analysis, the OECD's report on AI adoption by small and medium-sized enterprises, and Deloitte's State of AI in the Enterprise 2026 all point toward. Ask three questions: where does the work already repeat, where is the delay measurable, and who owns the source?
For a firm this size, the candidate surfaces are obvious once you stop looking at AI and start looking at your own systems: PSA ticket categorization, RMM alert triage, the knowledge base your techs ignore because it's never current, client runbooks, and the finance handoff where billable hours go to die. Rank those by source quality, client-data sensitivity, how loaded the reviewer is, and the hours you'd actually claw back. That ranked list, with the system of record and the named approver attached to each row, is the only readiness deliverable worth presenting to the managing partner.
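The ranking above can be sketched as a weighted score. This is a minimal sketch, not any particular scoring tool: the 1-to-5 rating scales, the field names, and the weights in `WEIGHTS` are all hypothetical assumptions you would replace with your own. Sensitivity and reviewer load are inverted so that a higher total always means a better first candidate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    system_of_record: str
    approver: str
    source_quality: int               # 1 (stale, inconsistent) to 5 (current, well tagged)
    client_data_sensitivity: int      # 1 (internal only) to 5 (credentials, architecture)
    reviewer_load: int                # 1 (reviewer has slack) to 5 (reviewer already overloaded)
    hours_recovered_per_month: float  # your own estimate, not a measured figure


# Hypothetical weights; they sum to 1.0 so the total stays on a 0-to-1 scale.
WEIGHTS = {"source": 0.35, "sensitivity": 0.25, "reviewer": 0.20, "hours": 0.20}


def score(c: Candidate, max_hours: float) -> float:
    """Weighted 0-to-1 score; higher means a better first candidate."""
    source = (c.source_quality - 1) / 4
    sensitivity = (5 - c.client_data_sensitivity) / 4  # less sensitive data scores higher
    reviewer = (5 - c.reviewer_load) / 4               # a less loaded reviewer scores higher
    hours = c.hours_recovered_per_month / max_hours if max_hours > 0 else 0.0
    return (WEIGHTS["source"] * source
            + WEIGHTS["sensitivity"] * sensitivity
            + WEIGHTS["reviewer"] * reviewer
            + WEIGHTS["hours"] * hours)


def rank(candidates: list[Candidate]) -> list[tuple[float, Candidate]]:
    """Return (score, candidate) pairs, best first."""
    max_hours = max((c.hours_recovered_per_month for c in candidates), default=0.0)
    scored = [(score(c, max_hours), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical ratings for three of the surfaces named above.
    candidates = [
        Candidate("PSA ticket categorization", "PSA", "Service desk manager", 4, 2, 3, 60),
        Candidate("RMM alert triage", "RMM", "NOC lead", 3, 4, 4, 80),
        Candidate("Knowledge base retrieval", "KB", "Documentation owner", 2, 2, 2, 30),
    ]
    for s, c in rank(candidates):
        print(f"{s:.2f}  {c.name}  [{c.system_of_record}, approver: {c.approver}]")
```

Dividing recovered hours by the largest estimate puts them on the same 0-to-1 scale as the ratings, and the system of record and approver travel with each row so the printed list is the deliverable, not just a number.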
Score the workflow before you fall in love with the model
Here's the trap that catches IT shops specifically: you're a technical team, so you skip straight to evaluating models and integrations. Resist it. For each candidate, first write down the exact artifact the assistant would consume (the PSA ticket body, the RMM alert payload, a specific knowledge article, the finance approval record), then name the shape of the output. Is it a draft reply? A category guess? A summary? A routing suggestion? A risk flag on a client environment? Those are not interchangeable. A drafted ticket response sits in a queue and waits for a tech to send it. An RMM remediation that auto-executes against a client's server is a different risk class entirely, and you should treat it that way.
The NIST AI Risk Management Framework earns its place here because it keeps the sequence honest: its core functions (Govern, Map, Measure, Manage) have you map the context and measure the risk, with governance in place, before you scale. Translate that into a scorecard your service leader can actually fill out: how fresh is the source data, where's the permission boundary (can this thing see Client A's documentation while working Client B's ticket?), how much review time does each output cost, what's the blast radius if it's wrong, and what's the exception rate you're willing to tolerate. The pilot you want is the one with real ticket volume, sources a human can inspect in seconds, and a reviewer who can accept or kill every single output during the first few weeks.
And if the team can't name who owns the knowledge base, or can't describe what a correct output even looks like for a given workflow, the honest readiness verdict is not "automate." It's "fix the source first." A 200-person firm with a stale CMDB and inconsistent ticket tagging will get faster, more confident garbage out of an assistant. Repairing the source systems is a better use of a quarter than launching something nobody can govern.
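The scorecard and the "fix the source first" verdict can be sketched as a gate that returns a verdict rather than a number. Everything here is a hypothetical illustration: the field names, the `BlastRadius` levels, and the thresholds `MAX_SOURCE_AGE_DAYS` and `MAX_REVIEW_MINUTES` are assumptions, not values taken from NIST or any other framework.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class BlastRadius(IntEnum):
    """Ordered by how much damage a wrong output can do before a human sees it."""
    DRAFT_IN_QUEUE = 1    # e.g. a drafted ticket reply that a tech must send
    INTERNAL_ROUTING = 2  # e.g. a category guess or routing suggestion
    CLIENT_VISIBLE = 3    # e.g. a summary that reaches a client
    AUTO_EXECUTES = 4     # e.g. an RMM remediation run against a client server


@dataclass
class Scorecard:
    workflow: str
    source_owner: Optional[str]      # named owner of the source, or None if nobody owns it
    correct_output_defined: bool     # can the team describe what a correct output looks like?
    source_age_days: int             # age of the stalest source the assistant would read
    can_cross_tenants: bool          # could it see Client A's data while working Client B's ticket?
    review_minutes_per_output: float
    blast_radius: BlastRadius
    tolerated_exception_rate: float  # compared with the observed rejection rate once the pilot runs; not used by verdict()


# Hypothetical thresholds for illustration only.
MAX_SOURCE_AGE_DAYS = 90
MAX_REVIEW_MINUTES = 5.0


def verdict(card: Scorecard) -> str:
    """Return a readiness verdict; source problems are checked before anything else."""
    if card.source_owner is None or not card.correct_output_defined:
        return "fix the source first"
    if card.source_age_days > MAX_SOURCE_AGE_DAYS:
        return "fix the source first"
    if card.can_cross_tenants:
        return "fix the permission boundary first"
    if card.blast_radius >= BlastRadius.AUTO_EXECUTES:
        return "not a first pilot"
    if card.review_minutes_per_output > MAX_REVIEW_MINUTES:
        return "not a first pilot"
    return "pilot candidate"
```

The order of the checks is the point: a workflow with no owner or no definition of "correct" never reaches the questions about blast radius or review cost.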
Ship one governed sprint, and watch whether it actually shrinks the queue
An IT services firm carries the kind of data you do not casually feed a model: client architecture diagrams, credential-adjacent notes, contract obligations, ticket histories that name specific environments. CISA's AI data security best practices should set the rails for your first sprint before a single ticket gets summarized: which client sources are in scope, which are explicitly walled off, how outputs get logged, and what triggers a human escalation. Multi-tenant separation isn't a nice-to-have here; it's the whole ballgame. One leaked cross-client detail and the pilot is over.
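Those rails can be sketched as a scope filter that sits between retrieval and the model. This is a minimal sketch under assumed names: `Document`, the source labels, the `IN_SCOPE_SOURCES` and `WALLED_OFF_SOURCES` sets, and `EscalationRequired` are hypothetical, and none of it is taken from the CISA guidance itself. It uses only Python's standard `logging` module for the audit trail.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_pilot_scope")


@dataclass(frozen=True)
class Document:
    doc_id: str
    client_id: str
    source: str  # hypothetical labels, e.g. "kb_playbook", "ticket_history", "credential_notes"


# Hypothetical scope rules for a first sprint.
IN_SCOPE_SOURCES = {"kb_playbook", "ticket_history"}
WALLED_OFF_SOURCES = {"credential_notes", "architecture_diagram", "contract"}


class EscalationRequired(Exception):
    """Raised when a request must go to a human instead of the assistant."""


def scoped_context(ticket_client_id: str, candidates: list[Document]) -> list[Document]:
    """Keep only in-scope documents that belong to the ticket's own client."""
    allowed = []
    for doc in candidates:
        if doc.client_id != ticket_client_id:
            # Another client's document reached this point, so the boundary upstream is broken.
            log.warning("cross-tenant doc %s blocked for client %s", doc.doc_id, ticket_client_id)
            raise EscalationRequired(f"{doc.doc_id} belongs to another client")
        # The deny list is checked first, so it wins even if a source is later added to the allow list.
        if doc.source in WALLED_OFF_SOURCES or doc.source not in IN_SCOPE_SOURCES:
            log.info("out-of-scope doc %s skipped (%s)", doc.doc_id, doc.source)
            continue
        allowed.append(doc)
    log.info("client %s: %d of %d docs passed scope", ticket_client_id, len(allowed), len(candidates))
    return allowed
```

A cross-tenant document raises instead of being quietly dropped: if retrieval surfaced another client's data at all, that is a human's problem, not something the assistant should route around.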
So move exactly one workflow into production testing. Not three. The strongest first candidates for a shop this size are usually narrow and reviewable: summarizing the long tail of resolved tickets in a single queue so the next tech has context, reviewing implementation handoff notes for missing steps, or knowledge retrieval restricted to approved, current playbooks. For four weeks, log it ruthlessly: outputs the techs accepted, outputs they rejected, sources that turned up missing, and minutes of review time per output. Now you can answer the only question that matters: did this drain the queue, or did it just create a second queue of things to check?
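The four-week log reduces to a few numbers. A minimal sketch, assuming each logged output records whether it was accepted, its review minutes, and whether a source turned up missing; `baseline_minutes` (how long the task takes a tech without the assistant) and `tolerated_rejection_rate` are hypothetical inputs you would estimate yourself. The key line is the net: accepted outputs save the manual work, but every output, accepted or rejected, costs review time.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    accepted: bool
    review_minutes: float
    missing_source: bool  # the reviewer found a source the output should have used


def summarize(entries: list[LogEntry], baseline_minutes: float,
              tolerated_rejection_rate: float) -> dict:
    """Summarize a pilot log and answer: did it drain the queue?"""
    if not entries:
        raise ValueError("no pilot outputs logged")
    n = len(entries)
    accepted = sum(1 for e in entries if e.accepted)
    rejection_rate = (n - accepted) / n
    total_review = sum(e.review_minutes for e in entries)
    # Accepted outputs replace the manual work; every output costs its review time.
    net_minutes = accepted * baseline_minutes - total_review
    return {
        "outputs": n,
        "acceptance_rate": accepted / n,
        "missing_source_count": sum(1 for e in entries if e.missing_source),
        "mean_review_minutes": total_review / n,
        "net_minutes_saved": net_minutes,
        "within_tolerance": rejection_rate <= tolerated_rejection_rate,
        "drains_queue": net_minutes > 0 and rejection_rate <= tolerated_rejection_rate,
    }
```

If `net_minutes_saved` comes out negative, the pilot built a second queue of things to check rather than draining the first one.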
Use the AI Opportunity Score to rank your top three candidates against each other, and the AI ROI Calculator to tie the case to recovered service hours and review burden rather than vibes. You'll know the assessment is done when leadership can say, in one sentence each: this workflow ships first, this one waits, and these data repairs happen before anything broader goes live.