Ask what the agent can touch
Evaluate an AI agent consultant by the workflow, data, tools, and controls they design around the model. Bain's research on agentic AI transformation frames the shift as an operating challenge, and the NIST AI Risk Management Framework gives a useful risk-management vocabulary for mapping intended use, measuring risk, and managing the system after launch.
The first question is not whether the demo looks impressive. The first question is what systems the agent can access, what actions it can take, what evidence it shows the reviewer, and what it does when confidence is low or source data conflicts.
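One way to make the low-confidence and conflicting-source question concrete is to ask the consultant for the routing rule itself. The following is a minimal sketch, not a reference implementation: the `Draft` type, the `route` function, and the 0.8 confidence floor are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real deployment would tune this per workflow.
CONFIDENCE_FLOOR = 0.8


@dataclass
class Draft:
    answer: str
    confidence: float  # model- or heuristic-derived score in [0, 1]
    sources: dict      # source name -> value that source reports


def route(draft: Draft) -> str:
    """Decide whether the agent may proceed or must hand off to a reviewer."""
    # Conflicting source data is checked first: a confident answer built
    # on sources that disagree still needs a human decision.
    if len(set(draft.sources.values())) > 1:
        return "escalate: sources conflict"
    if draft.confidence < CONFIDENCE_FLOOR:
        return "escalate: low confidence"
    return "proceed with evidence attached"


# Two systems disagree about the same invoice, so the agent escalates.
conflicted = Draft("Net 30", 0.95, {"crm": "Net 30", "erp": "Net 45"})
print(route(conflicted))
```

The point of the sketch is the ordering: the agent never acts silently when its evidence disagrees, and every path that proceeds carries the sources the reviewer can inspect.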
Inspect controls before features
Microsoft Learn's documentation on Copilot architecture, data protection, and auditing is a useful reference because it explains why tenant data boundaries, permissions, and auditing matter in an enterprise AI assistant. Even when you are not buying Copilot, the same evaluation logic applies: identity, access, logging, review, and retention need to be designed before the agent performs business work.
A credible consultant should describe the agent as a constrained workflow participant, not an autonomous employee. Look for a tool allowlist, action limits, audit logs, approval queues, prompt and retrieval governance, and a plan for testing against real edge cases.
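The controls listed above can be sketched as a single gate that every tool call passes through. This is an illustrative sketch under stated assumptions, not any vendor's API: `AgentGate`, `TOOL_POLICY`, the tool names, and the call limits are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical policy table: tool names and limits are illustrative only.
TOOL_POLICY = {
    "lookup_invoice": {"needs_approval": False, "max_calls": 20},
    "issue_refund": {"needs_approval": True, "max_calls": 3},
}


@dataclass
class AgentGate:
    audit_log: list = field(default_factory=list)
    approval_queue: list = field(default_factory=list)
    call_counts: dict = field(default_factory=dict)

    def request(self, tool: str, args: dict) -> str:
        """Check one tool call against the policy and log the decision."""
        policy = TOOL_POLICY.get(tool)
        used = self.call_counts.get(tool, 0)
        if policy is None:
            decision = "denied: not on allowlist"
        elif used >= policy["max_calls"]:
            decision = "denied: action limit reached"
        elif policy["needs_approval"]:
            # Sensitive actions wait for a human instead of executing.
            decision = "queued for approval"
            self.approval_queue.append((tool, args))
        else:
            decision = "allowed"
        if not decision.startswith("denied"):
            self.call_counts[tool] = used + 1
        # Every request is logged, including the denied ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "args": args,
            "decision": decision,
        })
        return decision


gate = AgentGate()
print(gate.request("lookup_invoice", {"invoice_id": "INV-1"}))  # allowed
print(gate.request("issue_refund", {"amount": 40}))    # queued for approval
print(gate.request("delete_customer", {"id": 7}))      # denied: not on allowlist
```

A consultant's real design will differ in detail, but it should be possible to point at the equivalent of each piece: where the allowlist lives, who can change the limits, where the log is stored, and who works the approval queue.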
Require operating evidence
McKinsey's State of AI research and PwC's Responsible AI survey both point to adoption, governance, and accountable redesign as value drivers. Ask the consultant to show a measured workflow improvement, not a model benchmark. Useful evidence includes cycle-time reduction, quality review results, fewer missed handoffs, clearer escalation paths, and adoption by the people who own the workflow.
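As a rough illustration of what a measured improvement looks like, cycle-time reduction can be computed from before-and-after samples of the same workflow. The function name and the sample hours below are made up for the example and are not drawn from any of the cited research.

```python
from statistics import median


def cycle_time_reduction(before_hours, after_hours):
    """Fractional drop in median cycle time between two samples."""
    baseline = median(before_hours)
    current = median(after_hours)
    return (baseline - current) / baseline


# Illustrative samples only: hours per case before and after the agent.
before = [10, 12, 8]
after = [5, 6, 4]
print(f"{cycle_time_reduction(before, after):.0%}")  # 50%
```

Using the median rather than the mean is a design choice made here so that a single stalled case does not dominate the result. Whatever statistic the consultant reports, ask how the baseline was sampled and over what period.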
Use AI agents and internal copilots when the work needs a governed assistant, and use workflow automation when the real bottleneck is process routing and data orchestration.