The CIM said "proprietary machine learning." The cloud bill said OpenAI.
On a $150M SaaS target last year, the Confidential Information Memorandum led with "proprietary machine learning" across four slides. The product demo backed it up — clean output, fast responses, a confident sales engineer. Then the diligence team pulled the vendor invoices. The "proprietary intelligence" was a roughly $25-a-month OpenAI integration with no fine-tuning, no retained training data, and a prompt-injection surface wide enough to drive a truck through. The moat was a system prompt and a UI. That is the single most expensive gap in tech M&A right now: the distance between what the CIM claims about AI and what the infrastructure actually is.
The macro pressure makes this gap worse, not better. PwC's 2026 Global M&A Industry Trends finds roughly a third of the 100 largest corporate deals now cite AI as core strategic rationale — so sellers know "AI-native" earns a premium and they write the CIM accordingly. But MIT's 2025 Enterprise GenAI Failure Analysis puts the share of enterprise generative-AI pilots that never reach production or measurable value at 95%. You are routinely being asked to pay a full premium for a capability with a one-in-twenty chance of becoming durable.
So the demo is not your diligence. The CIM is the seller's argument, not your evidence. The job in the data room is to replace the slide narrative with three artifacts the deck never includes: the actual cloud and API invoices, the commit history of the model code, and the data-flow diagram for where customer text goes after a user hits enter. Everything that matters about a GenAI target lives in those three documents — and Gartner's 2026 AI Technical Debt Projections warn that buyers skipping the structural read are quietly absorbing liabilities the seller never priced.
Two failure modes you can find before you sign
The market has split sellers into winners and losers, and you can tell which you are buying. Bain & Company's 2026 Tech M&A Valuation Analysis shows AI-native companies commanding roughly double the ARR multiple of legacy SaaS — but the same report notes about 20% of strategic acquirers walked away from deals specifically because AI was expected to erode the target's core business. The premium and the haircut sit in the same dataset. Your job is to figure out which side of it your target is on, and there are two failure modes that decide it.
The first is the IP chain. Before you accept that the target owns its codebase, ask how the code got written. Gartner's 2025 Shadow AI Cybersecurity Survey reports 69% of organizations suspect or have evidence of employees using prohibited public GenAI tools. If a meaningful share of the target's core application logic was generated by a public model with murky output ownership, your exclusive copyright over that asset is not what the reps and warranties claim. That is not a footnote — it reopens purchase-price allocation and turns a clean acquisition into a contested one. Ask for the engineering team's AI tooling policy and whether anyone enforced it; the gap between the written policy and the actual commit pattern is the real answer.
The second is multi-tenant isolation. An LLM feature bolted onto a shared database is a different security object than the CRUD app it sits on, and most diligence misses it: PwC's 2025 Cyber Due Diligence Benchmarks indicate only about 10% of acquirers perform adequate cybersecurity diligence at all. The specific thing to test here is whether one tenant's adversarial prompt can reach another tenant's data through a shared vector store. We have watched red flags in technology due diligence end nine-figure conversations the week they surfaced, because a missing tenant boundary in a vector database is not a patch — it is a re-architecture the seller wants you to pay to discover.
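The cross-tenant test described above can be sketched as a canary probe: plant a unique marker string in one tenant's documents, then query as a different tenant and check whether the marker ever comes back. The following is a minimal illustration under stated assumptions, not the target's actual stack. `SharedVectorStore`, `Chunk`, `build_demo_store`, and `leaks_across_tenants` are hypothetical names, the store is a toy in-memory list, and the two-dimensional embeddings are hand-picked so the example stays self-contained.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

CANARY = "CANARY-7f3a-tenant-b-only"  # hypothetical marker planted in the victim tenant


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: Tuple[float, ...]


def cosine(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SharedVectorStore:
    """Toy stand-in for a vector index shared by every tenant."""

    def __init__(self, enforce_tenant_filter: bool) -> None:
        self.enforce_tenant_filter = enforce_tenant_filter
        self._chunks: List[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self._chunks.append(chunk)

    def search(self, tenant_id: str, query: Tuple[float, ...], k: int = 3) -> List[Chunk]:
        candidates = self._chunks
        if self.enforce_tenant_filter:
            # The boundary under test: restrict candidates to the caller's tenant
            # before ranking, not after the results are assembled.
            candidates = [c for c in candidates if c.tenant_id == tenant_id]
        ranked = sorted(candidates, key=lambda c: cosine(c.embedding, query), reverse=True)
        return ranked[:k]


def build_demo_store(enforce_tenant_filter: bool) -> SharedVectorStore:
    store = SharedVectorStore(enforce_tenant_filter)
    store.add(Chunk("tenant-a", "Tenant A onboarding notes", (1.0, 0.0)))
    store.add(Chunk("tenant-b", f"Tenant B contract terms {CANARY}", (0.0, 1.0)))
    return store


def leaks_across_tenants(store: SharedVectorStore, attacker_tenant: str,
                         probe: Tuple[float, ...]) -> bool:
    """True if a query issued as attacker_tenant retrieves the victim's canary."""
    return any(CANARY in hit.text for hit in store.search(attacker_tenant, probe))


if __name__ == "__main__":
    probe = (0.0, 1.0)  # crafted to sit close to tenant B's content
    print("unfiltered store leaks:", leaks_across_tenants(build_demo_store(False), "tenant-a", probe))
    print("filtered store leaks:", leaks_across_tenants(build_demo_store(True), "tenant-a", probe))
```

In a real engagement the probe would run against the target's staging environment through the product's own prompt interface, with the seller's consent. The toy version only shows the shape of the check: if the filter is applied after retrieval or not at all, the canary surfaces.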
The three numbers that price the AI
Translate the AI claim into three hard numbers before the next IC meeting, and let the answers move the price.
Number one: cost per paid query. Pull the target's real inference invoice for the trailing three months and divide by the volume of revenue-generating queries: not total queries, only the ones a customer actually pays for. If a feature burns 4 cents of compute per query against 2 cents of allocated subscription revenue, the AI is not expanding margin; it is a subsidy you are about to own. Map every AI feature to its compute line and discount anything that drags gross margin below the 80% level a healthy SaaS asset should clear; the technical-debt quantification method for pre-acquisition pricing gives you the mechanics to fold that straight into the model.
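The cost-per-paid-query arithmetic is simple enough to sketch in a few lines. The example below is illustrative only: `FeatureEconomics` and `below_margin_floor` are hypothetical names, the dollar figures are invented to reproduce the 4-cent versus 2-cent case, and gross margin is computed on the assumption that inference spend is the only cost of revenue attributed to the feature. A real model would add hosting, support, and third-party licence costs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureEconomics:
    name: str
    inference_cost_usd: float     # trailing-three-month inference invoice for this feature
    paid_queries: int             # revenue-generating queries only, not total queries
    allocated_revenue_usd: float  # subscription revenue allocated to this feature

    def __post_init__(self) -> None:
        if self.paid_queries <= 0 or self.allocated_revenue_usd <= 0:
            raise ValueError("paid_queries and allocated_revenue_usd must be positive")

    @property
    def cost_per_paid_query(self) -> float:
        return self.inference_cost_usd / self.paid_queries

    @property
    def revenue_per_paid_query(self) -> float:
        return self.allocated_revenue_usd / self.paid_queries

    @property
    def gross_margin(self) -> float:
        # Assumption: inference is the only cost of revenue counted here.
        return 1.0 - self.inference_cost_usd / self.allocated_revenue_usd


def below_margin_floor(features: List[FeatureEconomics], floor: float = 0.80) -> List[str]:
    """Names of features whose gross margin falls under the floor."""
    return [f.name for f in features if f.gross_margin < floor]


# Invented figures for illustration; substitute the target's real invoices.
features = [
    FeatureEconomics("ai_summarizer", 40_000.0, 1_000_000, 20_000.0),
    FeatureEconomics("ai_search", 5_000.0, 500_000, 50_000.0),
]

if __name__ == "__main__":
    for f in features:
        print(f"{f.name}: cost/query ${f.cost_per_paid_query:.3f}, "
              f"revenue/query ${f.revenue_per_paid_query:.3f}, "
              f"gross margin {f.gross_margin:.0%}")
    print("below 80% floor:", below_margin_floor(features))
```

With these invented inputs the first feature costs twice what it earns per query, so its margin is negative and it is the only one flagged against the 80% floor.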
Number two: who holds the data. Trace where customer text goes after submission. If prompts and customer content are flowing back into a foundation model without explicit opt-in consent, you are buying latent GDPR and CCPA exposure that no one in the room has reserved against. The defensible posture is zero-retention commercial APIs or self-hosted models where the data stays inside the target's own cloud boundary — and you confirm that from the architecture, not from the CIM's reassurance.
Number three: the defect surface from AI-written code. Speed of generation outran architectural review at most of these companies, and Gartner's 2026 AI Technical Debt Projections forecast a steep rise in GenAI-introduced software defects. Velocity dashboards will not show it, so read the codebase directly: a focused five-day codebase audit aimed specifically at machine-generated debt will tell you, before you bid, how much of your first-year engineering capacity will be eaten by refactoring.
On Monday, request three things from the data room: trailing inference invoices, the data-flow diagram, and the model-code commit history. If the seller cannot produce them quickly, you have learned something the slides were built to hide.