Unit Economics · 4 min

How to Price a Generative AI Build: Why One Number Wrecks Your Margin

Quoting one fixed number for a generative AI build means absorbing the client's data debt and uncapped inference costs. Here's the three-layer pricing fix.

Figure 01 A chart showing the unit economics of hybrid AI pricing models compared to fixed fee and outcome-based structures.
Answer summary

Best fit
Industry: IT Services & Consulting. Function: Unit Economics
Key metric
Three pricing layers to separate: data readiness, bounded build, and managed optimization.

The quote that ate your gross margin

Picture a prospect who wants one number to deploy an AI support agent across their helpdesk. You scope it like the integration projects you've shipped for a decade: discovery, build, go-live, done. You quote $180K fixed. Eight weeks in, you discover their knowledge base is four years of unversioned Confluence pages, half of which contradict each other. The model hallucinates against the bad data, so you rewrite prompts. The rewrites surface more bad data, so your team starts cleaning it. None of that cleanup was in the statement of work, and all of it is now coming out of your margin.

This is the specific failure mode of pricing a generative AI engagement like a SaaS rollout. Deterministic software has a terminal state — it works, you sign off, you leave. A probabilistic system doesn't converge to "done"; it converges to "acceptable, for now," and getting there is iterative in a way you cannot fully estimate at quote time. When you collapse that uncertainty into a single fixed fee, you are not pricing software. You are underwriting the client's data debt and quietly volunteering to absorb every loop of prompt tuning and model testing the build turns out to need.
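The arithmetic behind that exposure can be sketched in a few lines. This is a minimal illustration, not a pricing model: the $180K fee comes from the example above, while the planned hours, the blended cost per hour, and the `FixedFeeBuild` class itself are assumptions made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class FixedFeeBuild:
    fee: float            # fixed price quoted to the client
    planned_hours: float  # hours assumed when the quote was written (hypothetical)
    cost_per_hour: float  # blended delivery cost per hour (hypothetical)

    def gross_margin(self, unplanned_hours: float = 0.0) -> float:
        """Gross margin as a fraction of the fee after absorbing extra hours."""
        total_cost = (self.planned_hours + unplanned_hours) * self.cost_per_hour
        return (self.fee - total_cost) / self.fee


# $180K is the quote from the example; hours and cost rate are illustrative only.
build = FixedFeeBuild(fee=180_000, planned_hours=1_000, cost_per_hour=100)

# Each loop of unscoped cleanup and prompt tuning adds hours but no revenue.
for extra in (0, 200, 400, 800):
    print(f"{extra:>4} unplanned hours -> margin {build.gross_margin(extra):.0%}")
```

With these illustrative inputs, 800 unplanned hours take the margin from roughly 44% to zero. The point is the shape, not the numbers: under a fixed fee, every unscoped hour comes straight out of margin.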

The advisory firms have been circling this for two years. Gartner's 2024 IT services growth forecast shows the spend is real and rising, while McKinsey's analysis of generative AI's productivity frontier is blunt that the value lives in workflow change and operating economics — not in the model dropping in clean. The lesson for an integrator: the part of the work that's hardest to bound is the part a fixed fee forces you to eat.

So flip to outcome-based — and inherit a fight you can't win

The reflex is to swing the other way: "Pay us 20% of the headcount cost we take out." It clears procurement fast and it sounds founder-aligned. It also hands you 100% of the operational risk over an environment you don't control, and it sets up two fights you will lose.

The first is attribution. Say the agent deflects 40% of Level 1 tickets. Before your invoice clears, the client's finance team will argue the win came from their refreshed help center, a slow quarter, or the new onboarding flow that cut inbound volume. They're not wrong, exactly — and that's the problem. Isolating the model's specific contribution from everything else moving in the operation is genuinely hard, which is why the McKinsey work cited above treats attribution as a structural challenge rather than a measurement detail. You'll spend more hours forensically auditing their P&L than tuning the model you were hired to build.

The second fight is compute. Every query burns inference, and you've fixed your revenue while leaving your cost variable. Bain's 2024 Technology Report frames exactly this exposure: uncapped consumption pushes infrastructure-cost volatility onto whoever owns the deployment. If real usage runs 10x your projection, your token bill detonates while your outcome fee sits flat. That combination is precisely what an acquirer will discount. As I covered in the services valuation matrix, buyers pay for predictable unit economics, not contested upside. A revenue line built on disputed savings and uncapped compute is a quality-of-earnings headache waiting to surface in diligence.
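A rough sketch of the compute exposure, assuming a flat monthly outcome fee and a blended inference cost per 1,000 queries. Every figure and function name here is a hypothetical placeholder chosen for illustration, not a benchmark or a vendor price.

```python
def inference_cost(queries: int, cost_per_1k_queries: float) -> float:
    """Inference spend for a month at a blended cost per 1,000 queries."""
    return queries / 1_000 * cost_per_1k_queries


def monthly_margin(outcome_fee: float, queries: int, cost_per_1k_queries: float) -> float:
    """Fee minus inference spend; the fee does not scale with usage."""
    return outcome_fee - inference_cost(queries, cost_per_1k_queries)


def breakeven_queries(outcome_fee: float, cost_per_1k_queries: float) -> int:
    """Monthly query volume at which inference spend equals the fee."""
    return int(outcome_fee / cost_per_1k_queries * 1_000)


# Hypothetical inputs for illustration.
OUTCOME_FEE = 20_000.0       # flat monthly share of savings, in dollars
PROJECTED_QUERIES = 50_000   # usage assumed at signing
COST_PER_1K = 50.0           # blended inference cost per 1,000 queries, in dollars

for multiple in (1, 3, 10):
    queries = PROJECTED_QUERIES * multiple
    cost = inference_cost(queries, COST_PER_1K)
    margin = monthly_margin(OUTCOME_FEE, queries, COST_PER_1K)
    print(f"{multiple:>2}x usage: inference ${cost:,.0f}, margin ${margin:,.0f}")

print("breakeven at", breakeven_queries(OUTCOME_FEE, COST_PER_1K), "queries per month")
```

With these placeholder inputs the fee covers inference up to 400,000 queries a month, which is 8x the projection, and the 10x case runs at a loss. Different inputs move the breakeven, but revenue stays flat while cost scales with usage either way.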

A diagram illustrating the margin erosion risk associated with uncapped inference and API costs in outcome-based AI engagements.

Price the three things separately, because they carry different risk

Both single-number models fail for the same reason: they price one engagement when you're actually selling three jobs with three different risk shapes. Split them.

Data readiness: time and materials, no exceptions

Ingestion, cleansing, and pipeline work is where the unknowns live, so it's the one place you must never quote fixed. Bill it on capacity until you've seen the actual state of their data — then you have the facts to scope the rest. Treat this as a paid discovery, not a loss leader. The firms that win here use it to right-size the engagement; the ones that lose here gave it away to land the build, then watched utilization on the build phase collapse under unbilled cleanup hours.

The build: a bounded fixed fee with hard SLAs

Once the data is structured, a fixed price is defensible, but only with inference caps, API-consumption limits, and explicit error thresholds written into the SLA. The cap is what converts an open-ended R&D loop into a deliverable. Forrester's 2024 AI services landscape found that hybrid structures pairing baseline capacity with capped performance bonuses delivered a 24% higher realization rate for integrators than pure fixed-fee deals.
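One way to make those SLA terms concrete is to write them down as data and check each month's usage against them. The sketch below is an assumption about how that might look; the `BuildSla` and `MonthlyUsage` types, the field names, and all the limits are hypothetical, not contract language from any real engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildSla:
    monthly_token_cap: int     # inference cap included in the fixed fee
    monthly_api_call_cap: int  # API-consumption limit
    max_error_rate: float      # explicit error threshold, as a fraction


@dataclass(frozen=True)
class MonthlyUsage:
    tokens: int
    api_calls: int
    error_rate: float


def sla_breaches(sla: BuildSla, usage: MonthlyUsage) -> list[str]:
    """Return the SLA terms that this month's usage exceeded."""
    breaches = []
    if usage.tokens > sla.monthly_token_cap:
        breaches.append("token cap")
    if usage.api_calls > sla.monthly_api_call_cap:
        breaches.append("API call cap")
    if usage.error_rate > sla.max_error_rate:
        breaches.append("error threshold")
    return breaches


# Hypothetical limits and usage, for illustration only.
sla = BuildSla(monthly_token_cap=50_000_000, monthly_api_call_cap=200_000, max_error_rate=0.02)
usage = MonthlyUsage(tokens=62_000_000, api_calls=150_000, error_rate=0.035)

print(sla_breaches(sla, usage))
```

What happens on a breach (overage billing, a change order, a remediation window) is a commercial decision the SLA itself has to spell out. The sketch only shows that each term is a measurable limit rather than a judgment call.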

Tuning and governance: the recurring line that's actually worth the most

A model starts decaying the day it ships. Data drifts, source APIs change, edge cases pile up. BCG's research on maximizing AI ROI makes the case for treating deployed models as depreciating assets that need ongoing tuning — which is exactly why this is your highest-value layer, not an afterthought. A managed-services contract turns a one-time project into recurring revenue that compounds as the client's AI footprint grows, and recurring beats project every time on valuation. The mechanics of that gap are in our breakdown of managed versus professional services margins.

Do one thing Monday: take your last AI proposal and physically split the price into those three lines. If discovery isn't T&M, if the build has no inference cap, and if there's no tuning retainer, you're funding your client's experiment on your own balance sheet — and giving away the most valuable revenue you could be earning.
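That Monday check is mechanical enough to sketch as a small script. The `Proposal` fields and the example values below are hypothetical, a minimal stand-in for however your proposals are actually structured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    discovery_billing: str               # "t&m" or "fixed"
    build_inference_cap: Optional[int]   # token cap in the build SLA, None if absent
    tuning_retainer_monthly: float       # 0 if there is no managed-services line


def pricing_gaps(p: Proposal) -> list[str]:
    """List which of the three pricing layers the proposal leaves exposed."""
    gaps = []
    if p.discovery_billing.lower() != "t&m":
        gaps.append("discovery is not time and materials")
    if p.build_inference_cap is None:
        gaps.append("build has no inference cap")
    if p.tuning_retainer_monthly <= 0:
        gaps.append("no tuning retainer")
    return gaps


# A single-number proposal recast into the three lines (values are illustrative).
last_proposal = Proposal(
    discovery_billing="fixed",
    build_inference_cap=None,
    tuning_retainer_monthly=0.0,
)

for gap in pricing_gaps(last_proposal):
    print("-", gap)
```

An empty list means all three layers are priced separately. Anything else is a line where you are carrying the risk yourself.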

Sources
  1. Gartner's 2024 IT Services Growth Forecast
  2. McKinsey's Generative AI Productivity Frontier analysis
  3. Bain's 2024 Technology Report
  4. Forrester's 2024 AI Services Landscape
  5. BCG's Maximizing AI ROI research