The quote that ate your gross margin
Picture a prospect who wants one number to deploy an AI support agent across their helpdesk. You scope it like the integration projects you've shipped for a decade: discovery, build, go-live, done. You quote $180K fixed. Eight weeks in, you discover their knowledge base is four years of unversioned Confluence pages, half of which contradict each other. The model hallucinates against the bad data, so you rewrite prompts. The rewrites surface more bad data, so your team starts cleaning it. None of that cleanup was in the statement of work, and all of it is now coming out of your margin.
This is the specific failure mode of pricing a generative AI engagement like a SaaS rollout. Deterministic software has a terminal state — it works, you sign off, you leave. A probabilistic system doesn't converge to "done"; it converges to "acceptable, for now," and getting there is iterative in a way you cannot fully estimate at quote time. When you collapse that uncertainty into a single fixed fee, you are not pricing software. You are underwriting the client's data debt and quietly volunteering to absorb every loop of prompt tuning and model testing the build turns out to need.
The advisory firms have been circling this for two years. Gartner's 2024 IT services growth forecast shows the spend is real and rising, while McKinsey's analysis of generative AI's productivity frontier is blunt that the value lives in workflow change and operating economics — not in the model dropping in clean. The lesson for an integrator: the part of the work that's hardest to bound is the part a fixed fee forces you to eat.
So flip to outcome-based — and inherit a fight you can't win
The reflex is to swing the other way: "Pay us 20% of the headcount cost we take out." It clears procurement fast and it sounds founder-aligned. It also hands you 100% of the operational risk over an environment you don't control, and it sets up two fights you will lose.
The first is attribution. Say the agent deflects 40% of Level 1 tickets. Before your invoice clears, the client's finance team will argue the win came from their refreshed help center, a slow quarter, or the new onboarding flow that cut inbound volume. They're not wrong, exactly — and that's the problem. Isolating the model's specific contribution from everything else moving in the operation is genuinely hard, which is why the McKinsey work cited above treats attribution as a structural challenge rather than a measurement detail. You'll spend more hours forensically auditing their P&L than tuning the model you were hired to build.
The second fight is compute. Every query burns inference, so your cost scales with usage while your fee is pegged to savings. Bain's 2024 Technology Report frames exactly this exposure: uncapped consumption pushes infrastructure-cost volatility onto whoever owns the deployment. If real usage runs 10x your projection, your token bill detonates while your outcome fee sits flat. It is also why an acquirer will discount the revenue. As I covered in the services valuation matrix, buyers pay for predictable unit economics, not contested upside. A revenue line built on disputed savings and uncapped compute is a quality-of-earnings headache waiting to surface in diligence.
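To make the compute exposure concrete, here is a minimal sketch of gross margin on a flat outcome fee as real usage outruns the projection. Every number in it (the fee, the query volume, the cost per query) is a made-up assumption for illustration, not a benchmark.

```python
def outcome_fee_margin(fee, projected_queries, cost_per_query, usage_multiplier):
    """Gross margin on a flat outcome fee when real usage is a multiple of the projection.

    The fee does not move with usage; the inference cost does.
    """
    inference_cost = projected_queries * usage_multiplier * cost_per_query
    return (fee - inference_cost) / fee


# Hypothetical inputs, chosen only to show the shape of the problem.
FEE = 240_000                  # annual outcome fee, fixed once savings are agreed
PROJECTED_QUERIES = 1_000_000  # queries per year assumed at quote time
COST_PER_QUERY = 0.04          # blended inference cost per query, in dollars

for multiplier in (1, 3, 10):
    margin = outcome_fee_margin(FEE, PROJECTED_QUERIES, COST_PER_QUERY, multiplier)
    print(f"{multiplier:>2}x usage -> gross margin {margin:.0%}")
```

Under these assumed inputs the margin is healthy at projected usage, halves at 3x, and turns negative at 10x, because the inference cost overtakes a fee that never moved.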
Price the three things separately, because they carry different risk
Both single-number models fail for the same reason: they price one engagement when you're actually selling three jobs with three different risk shapes. Split them.
Data readiness: time and materials, no exceptions
Ingestion, cleansing, and pipeline work is where the unknowns live, so it's the one place you must never quote fixed. Bill it on capacity until you've seen the actual state of their data — then you have the facts to scope the rest. Treat this as a paid discovery, not a loss leader. The firms that win here use it to right-size the engagement; the ones that lose here gave it away to land the build, then watched utilization on the build phase collapse under unbilled cleanup hours.
The build: a bounded fixed fee with hard SLAs
Once the data is structured, a fixed price is defensible, but only with inference caps, API-consumption limits, and explicit error thresholds written into the SLA. The cap is what converts an open-ended R&D loop into a deliverable. Forrester's 2024 AI services landscape found that hybrid structures pairing baseline capacity with capped performance bonuses delivered a 24% higher realization rate for integrators than pure fixed-fee deals.
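One way to picture a bounded build is as a small contract object with a cap and a threshold. The sketch below is an illustration only: the `BuildSLA` fields, the overage rate, and the helper names are hypothetical, and a real SLA would define its own units and measurement windows.

```python
from dataclasses import dataclass


@dataclass
class BuildSLA:
    fixed_fee: float            # price of the build, in dollars
    included_tokens: int        # inference cap covered by the fixed fee
    overage_rate_per_1k: float  # dollars billed per 1,000 tokens above the cap
    max_error_rate: float       # agreed error threshold, e.g. 0.05 for 5%


def overage_charge(sla, tokens_used):
    """Amount billed to the client for consumption above the inference cap."""
    overage_tokens = max(0, tokens_used - sla.included_tokens)
    return overage_tokens / 1000 * sla.overage_rate_per_1k


def within_error_threshold(sla, errors, responses):
    """True when the observed error rate is at or below the agreed threshold."""
    if responses <= 0:
        raise ValueError("responses must be positive")
    return errors / responses <= sla.max_error_rate


# Hypothetical terms for illustration.
sla = BuildSLA(fixed_fee=150_000, included_tokens=50_000_000,
               overage_rate_per_1k=0.02, max_error_rate=0.05)
print(overage_charge(sla, 60_000_000))       # consumption above the cap is billed
print(within_error_threshold(sla, 3, 100))   # 3% observed against a 5% threshold
```

The design point is that usage beyond the cap becomes the client's bill instead of your cost, and the error threshold gives both sides a testable definition of acceptance.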
Tuning and governance: the recurring line that's actually worth the most
A model starts decaying the day it ships. Data drifts, source APIs change, edge cases pile up. BCG's research on maximizing AI ROI makes the case for treating deployed models as depreciating assets that need ongoing tuning — which is exactly why this is your highest-value layer, not an afterthought. A managed-services contract turns a one-time project into recurring revenue that compounds as the client's AI footprint grows, and recurring beats project every time on valuation. The mechanics of that gap are in our breakdown of managed versus professional services margins.
Do one thing Monday: take your last AI proposal and physically split the price into those three lines. If discovery isn't T&M, the build has no inference cap, or there's no tuning retainer, you're funding your client's experiment on your own balance sheet and giving away the most valuable revenue you could be earning.
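If your proposals live in a spreadsheet or CRM export, the three-line check can be run as a quick audit. This is a minimal sketch; the dictionary keys (`data_readiness`, `build`, `tuning_retainer`, and so on) are a hypothetical schema invented for the example, so map them to whatever fields you actually track.

```python
def audit_proposal(proposal):
    """Return the gaps in a proposal against the three-line pricing structure."""
    gaps = []
    # Line 1: data readiness should be billed as time and materials.
    if proposal.get("data_readiness", {}).get("pricing") != "t&m":
        gaps.append("data readiness is not time and materials")
    # Line 2: the fixed-fee build needs an inference cap.
    if proposal.get("build", {}).get("inference_cap") is None:
        gaps.append("build has no inference cap")
    # Line 3: tuning and governance should be a recurring retainer.
    if not proposal.get("tuning_retainer"):
        gaps.append("no tuning and governance retainer")
    return gaps


# Hypothetical single-number proposal: one fixed price, nothing else.
legacy = {"build": {"fixed_fee": 180_000}}
for gap in audit_proposal(legacy):
    print("GAP:", gap)
```

An empty list means the proposal carries all three lines; anything else is a line you are currently funding yourself.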