AI
What Is the Unit of Value in AI Spend?
Cloud cost has always had a natural unit: a virtual machine running for an hour, a storage bucket holding a gigabyte, a data transfer event moving a measurable number of bytes from one region to another. Finance could read a bill, trace a line item back to a workload, and connect that workload to a team or a product. It was not always clean, but the cost object was recognizable.
AI workloads break that cost object. When an application calls a large language model (LLM), the billing unit is the token, a fragment of text representing a word, part of a word, or a punctuation character. A single user request can consume hundreds or thousands of tokens depending on prompt length, context window size, and how much the model needs to reason before responding. Inference can cost anywhere from fractions of a cent to several dollars depending on the model tier and input complexity. That range does not exist in traditional compute billing, and organizations have not yet built the instrumentation to handle it.
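To make the new cost object concrete, here is a minimal sketch of how a single request's model-API cost falls out of token counts. The prices are illustrative placeholders, not any provider's actual rates:

```python
# Sketch: estimating the cost of one LLM request from token counts.
# Both per-million-token prices below are assumed for illustration.

PRICE_PER_1M_INPUT = 3.00    # USD per million input tokens (assumed)
PRICE_PER_1M_OUTPUT = 15.00  # USD per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the model-API cost of one call, in USD."""
    return (input_tokens * PRICE_PER_1M_INPUT
            + output_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000

# A short chat turn versus a context-heavy request:
print(request_cost(500, 200))       # 0.0045
print(request_cost(12_000, 2_000))  # 0.066
```

Under these assumed rates, two calls to the same feature differ in cost by more than an order of magnitude, which is exactly the variability traditional per-hour billing never had.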
This is where a familiar FinOps problem surfaces in a new form. Finance can see aggregate AI spend climbing. Engineering can see token counts in logs. Neither team can easily answer the question that matters most to the business: what did it cost to produce one unit of output?
The Measurement Gap
The challenge is that the relevant cost object does not map to any existing financial structure most teams have. Consider what happens when a user interacts with an AI-powered feature:
The application sends a prompt to a model API, incurring token-based input costs
The model generates a response, incurring output costs at a different and typically higher rate
If the system uses retrieval-augmented generation (RAG) [a technique that pulls relevant context into the prompt before the model responds], vector search and storage costs are incurred separately
Retry logic may fire when a response fails validation, silently doubling or tripling the call cost
Networking and egress fees accumulate, especially across multi-region deployments
Logging and observability infrastructure captures each interaction at additional cost
The posted per-call price reflects only a fraction of what a resolved AI task costs the business. Each of those surrounding costs is distributed across services, accounts, and billing categories built for different cost shapes, and none of them shows up on a single line item.
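The gap between the posted per-call price and the resolved-task cost can be sketched as a simple rollup. Every line item below is a hypothetical number chosen to illustrate the shape of the problem, not a measured figure:

```python
# Sketch: rolling up everything one "resolved AI task" actually costs,
# beyond the posted per-call model price. All component costs are
# hypothetical line items for illustration.

from dataclasses import dataclass, field

@dataclass
class TaskCost:
    model_calls: list = field(default_factory=list)  # USD per call, incl. retries
    rag_search: float = 0.0      # vector search and storage
    egress: float = 0.0          # networking / cross-region transfer
    observability: float = 0.0   # logging each interaction

    def total(self) -> float:
        return (sum(self.model_calls) + self.rag_search
                + self.egress + self.observability)

# One user request that failed validation once and silently retried:
task = TaskCost(model_calls=[0.012, 0.012], rag_search=0.003,
                egress=0.001, observability=0.002)
print(round(task.total(), 6))  # 0.03, i.e. 2.5x the posted per-call price
```

The point of the structure is that each field lives in a different billing category in practice, so no single invoice line ever shows the 0.03.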

Why the Unit Matters Financially
There is a pricing dynamic that makes this harder to ignore over time. Per-token costs fell more than 280-fold for comparable model performance between 2022 and 2024, according to the Stanford HAI 2025 AI Index. Enterprise AI spending still more than tripled between 2024 and 2025, according to Menlo Ventures' 2025 State of Generative AI in the Enterprise report. The unit got cheaper while volume expanded faster, and teams reading falling per-token prices as a sign that AI costs are under control are working from an incomplete signal.
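The arithmetic behind that incomplete signal is worth making explicit. In this sketch the 280-fold price drop comes from the figure cited above, while the 1,000-fold volume growth is an assumed number chosen only to show how total spend can rise anyway:

```python
# Sketch: why a falling per-token price can coexist with rising total spend.
# The 280-fold price decline is the cited figure; the 1,000-fold volume
# growth is an assumption for illustration.

price_2022 = 1.0                 # normalized per-token price
price_2024 = price_2022 / 280    # roughly 280-fold cheaper per token

volume_2022 = 1.0
volume_2024 = 1000.0             # assumed consumption growth

spend_2022 = price_2022 * volume_2022
spend_2024 = price_2024 * volume_2024
print(spend_2024 / spend_2022)   # ~3.57: total spend still more than triples
```

Any volume growth above 280-fold produces rising spend despite the cheaper unit, which is the pattern the two reports describe.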
The gross margin implication is more direct than most finance teams currently appreciate. Traditional SaaS companies typically operate with gross margins well above 80 percent. AI-centric products compress that number considerably because inference cost functions as a direct input into cost of goods sold (COGS) in a way that traditional software compute never did. Engineers building AI features are making gross margin decisions whether finance knows it or not. As one FinOps practitioner put it in the State of FinOps 2026 report: "Is your AI providing value? No one can answer that question yet."
Where FinOps Becomes the Connective Layer
FinOps is positioned to close this gap because its function is to translate technical activity into financial context. The same practices that connect compute usage to cost centers can extend to AI workloads. The extension requires deliberate instrumentation decisions that most teams have not yet made, but the organizational pattern is familiar. Some teams are beginning to build toward three meaningful metrics:
Cost per inference call: the total cost of one model response, including all downstream services it triggers
Cost per token: the per-unit billing rate adjusted for consumption patterns, tracked over time to surface prompt inefficiencies before they become expensive habits at scale
Cost per AI-assisted transaction: the metric that connects model usage to a business outcome [a completed support resolution, a generated document, a processed order]
Cost per AI-assisted transaction is where FinOps and product economics finally meet. When the cost is justifiable against the outcome it produces, the number becomes a planning input rather than a budget anomaly. When it cannot be explained in those terms, it will eventually surface in a finance conversation with no clean answer.
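The three metrics above reduce to simple ratios once spend is attributed at the feature level; the hard part is the attribution, not the math. A minimal sketch, with a wholly hypothetical monthly rollup for a support feature:

```python
# Sketch of the three unit metrics, given per-feature cost and usage
# rollups. All numbers below are assumptions for illustration.

def cost_per_inference_call(total_cost: float, calls: int) -> float:
    return total_cost / calls

def cost_per_token(total_cost: float, tokens: int) -> float:
    return total_cost / tokens

def cost_per_transaction(total_cost: float, transactions: int) -> float:
    """Connects model spend to a business outcome, e.g. a resolved ticket."""
    return total_cost / transactions

# Hypothetical one-month rollup for an AI support feature:
month_cost = 4_200.0   # all attributed AI spend, USD
calls, tokens, resolutions = 350_000, 900_000_000, 60_000

print(round(cost_per_inference_call(month_cost, calls), 4))    # 0.012
print(cost_per_token(month_cost, tokens))
print(round(cost_per_transaction(month_cost, resolutions), 2)) # 0.07
```

In this sketch, seven cents per resolved ticket is the number that can sit next to the ticket's business value in a planning conversation; the other two ratios exist to explain movements in it.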
What often gets missed here is that the instrumentation problem is primarily organizational. Finance, engineering, and product teams are all measuring different things, and FinOps has not yet been consistently asked to bridge them in the context of AI workloads.
Getting to a reliable cost-per-output number requires shared definitions, consistent tagging of AI spend at the feature or product level, and a shared agreement to treat inference spend as a cost of goods rather than an open-ended experiment. That last shift carries more financial weight than it might sound.
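One shape that shared tagging convention could take is sketched below. The tag keys, values, and cost figure are assumptions, not a standard; the point is only that every AI line item carries enough metadata to roll up to a feature and to be classified as COGS or experiment:

```python
# Sketch: a hypothetical tagging convention for AI spend, so each cost
# line item rolls up to a feature and a COGS-vs-experiment classification.
# Keys, values, and the cost figure are illustrative assumptions.

line_item = {
    "service": "model-api",
    "cost_usd": 0.012,
    "tags": {
        "ai:feature": "support-copilot",  # product-level attribution
        "ai:model_tier": "frontier",
        "ai:cost_class": "cogs",          # vs. "experiment"
        "team": "customer-success",
    },
}

def is_cogs(item: dict) -> bool:
    """Classify a tagged line item as cost of goods sold."""
    return item["tags"].get("ai:cost_class") == "cogs"

print(is_cogs(line_item))  # True
```

Whatever the actual schema, the COGS-versus-experiment tag is the one that encodes the organizational agreement the paragraph above describes.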
RESOURCES
The Burn-Down Bulletin: More Things to Know
Inference Economics: What It Is And Why It Matters Now
CloudZero (2026): A detailed breakdown of AI inference unit economics, including why per-token price declines do not translate to falling total spend and what a cost-per-unit framework looks like in practice.
Cost-to-Serve in AI: The Most Overlooked Metric for Sustainable Margins
Mavvrik (2025): Covers why cost-to-serve attribution is the missing link between AI spend and margin management, and how the accountability gap widens when finance and engineering are working from different cost signals.
Cloud Unit Economics: How AI Is Making It a FinOps Must-Have
Holori (2026): Covers the unit metrics needed to track AI workload costs and explains why cost per transaction is the shared language that bridges engineering decisions and financial outcomes.
AI Cost Observability: Measuring and Justifying Token Spend in 2026
Vantage (2026): A practitioner-focused breakdown of where AI provider billing data is granular enough for attribution and where the gaps are, with specific coverage of AWS Bedrock, OpenAI, and Anthropic API structures.
That’s all for this week. See you next Tuesday!



