Why is AI so hard to cost? FinOps Foundation codifies ‘price mechanics’ of AI

One of the Linux Foundation’s more vibrant project arms these days, the nonprofit FinOps Foundation has already been vocal about extending the scope of FinOps as we have known it thus far. The foundation explains that the FinOps Framework has been updated to reflect the “present-day practice” of FinOps as it evolves towards a Cloud+ approach to managing technology spend. Cloud+ describes the point where FinOps practices begin to apply their capabilities to private cloud and datacenter “scopes of spending” and the wider technologies associated with them. The FinOps Foundation has also laid down its approach to FinOps for AI, so what does this mean and how do we calculate the price mechanics of AI itself?

What is FinOps for AI?

Because AI is (obviously) advancing so fast, it creates a corresponding (if not perhaps even equal and opposite) set of FinOps challenges. The FinOps Foundation thinks it has an answer to these challenges with the launch of FinOps Certified: FinOps for AI. This is a new education series and certification built to help DevOps (and indeed FinOps) practitioners understand, manage and optimise AI spend. Educational content in this stream will be released in stages, so individuals can start learning right away and stay current as the field evolves.

This certification pathway from the foundation is meant to signify a FinOps practice that is suited to, attuned towards and aligned with the age of AI.

The cost of AI

“AI adoption is exploding, but with innovation comes complexity… and managing the cost of AI services is quickly becoming one of the biggest challenges facing FinOps teams today (the State of FinOps survey revealed 63% are already being asked to manage AI costs),” detailed the FinOps Foundation, in a technical statement. “We’re taking an agile, iterative approach by releasing content in phases throughout the year so [users] can start learning right away, build skills as [they] go and keep pace with an industry that’s constantly evolving. Earn badges as you progress through levels and work toward full certification by March 2026.”

The foundation says it has structured its learning series as follows:

Level 1: Education services designed to give practitioners a strong foundation in AI cost allocation, data ingestion, anomaly detection, so-called “chargeback models” and how the scope of AI differs from traditional cloud services. An increasingly popularised term in cloud computing, a chargeback model is a cost allocation method that sees costs incurred by the IT department then assigned and billed to the appropriate business units or departments that utilise those resources.

Level 2: Users can learn how to plan, forecast and govern AI investments with cost visibility and accountability in mind.

Level 3: FinOps fans can dive into advanced topics like workload and rate optimisation, unit economics, sustainability and architecting AI systems for cost efficiency. 

In classical economics, the three “units” are the firm, the household and the government. But here, in the case of unit economics for cloud computing (and wider SaaS and ITAM cost management) we are talking about the direct revenues and costs of a particular business team, department or organisation measured on a per-unit basis, where a unit can be any quantifiable item that brings value to the business. In the ever-more clinical world of FinOps, being able to detail IT costs down to the per-unit level is fundamental.
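To make the per-unit idea concrete, here is a minimal sketch of unit economics and chargeback in practice; the team names, spend figure and usage counts below are invented for illustration, and the chosen unit (AI-assisted support conversations) is an assumption rather than anything prescribed by the foundation.

```python
# Toy unit economics / chargeback illustration. All names and numbers are
# invented; real FinOps tooling would ingest actual billing data instead.

monthly_ai_spend = 48_000.00  # shared AI platform bill for the month (assumed)

# Usage per business unit, measured in the "unit" that brings value -- here,
# AI-assisted customer support conversations (assumed figures).
usage_by_team = {
    "support": 120_000,
    "sales": 30_000,
    "marketing": 10_000,
}

total_units = sum(usage_by_team.values())
cost_per_unit = monthly_ai_spend / total_units
print(f"Cost per AI-assisted conversation: ${cost_per_unit:.4f}")

# Chargeback: each team is billed in proportion to the units it consumed.
for team, units in usage_by_team.items():
    print(f"{team}: {units} units -> ${units * cost_per_unit:,.2f}")
```

The mechanics are simple; the hard FinOps work is agreeing what the “unit” should be and tagging AI usage accurately enough to attribute it.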

The FinOps Foundation says its certification will enable professionals to “demonstrate a full understanding” of FinOps for AI by completing the final exam and earning the FinOps Certified: FinOps for AI credential.

Digital badges, gotta get ’em all

As we know, developers like stickers, pins and collectables of all kinds. The FinOps Foundation has catered for an element of gamification here and will provide “digital badges” for each level, so users can show their competency across FinOps Trained: FinOps for AI Levels 1, 2 and 3 to recognise growing expertise on the road to a final exam pass and a FinOps Certified: FinOps for AI badge and certificate.

“This is more than just another training initiative; it’s a forward-looking learning path for FinOps practitioners who want to stay ahead of the curve and lead their organisations through the next wave of AI growth with cost management and business value in mind,” explained the FinOps Foundation, in a launch statement.

A “build fundamentals” foundational course section introduces the relationship between FinOps and AI, why AI-specific cost management matters and how to begin thinking about AI usage and spend within an organisation. This includes recognising why AI cost management is uniquely important; it also codifies and teaches the skills needed to identify initial steps for ingesting, tracking and managing AI cost and usage data.

This is a dynamic that surely needs tabling and explaining more openly; the foundation promises to help users build practical skills to understand and manage the “cost mechanics of AI services” and to interpret and allocate AI costs effectively. Users will need to appreciate how AI services differ from traditional cloud in terms of their cost behaviour.

Why is AI so hard to cost?

All of which raises the question then… why are AI cloud costs so different from “traditional” (i.e. pre-AI hype cycle) cloud costs?

To explain this “phenomenon”, we need to remind ourselves that AI workloads are nonlinear and more complex in the topography of the compute they consume. They are nonlinear because they must accommodate the non-deterministic nature of agentic AI functions… and, further, because of the diversity of data input size, data batch size and the complexity of the AI model being executed. But, perhaps even more importantly, AI compute functions span a topography that touches large and small language models, high-throughput storage, lightning-fast interconnects (especially in the realm of real-time data processing for real-time AI) and massive memory reserves. This is not CPU-RAM I/O GUI bingo; this is a different beast with a higher grade price list and, therefore, a complex FinOps discipline is needed to serve it.
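To make that nonlinearity concrete, here is a rough back-of-envelope sketch, assuming the widely used rule of thumb that transformer inference burns roughly two FLOPs per model parameter per token processed; the model sizes, GPU throughput and hourly rate are illustrative assumptions rather than vendor figures.

```python
# Rough, illustrative sketch only: per-request inference cost moves with both
# model size and token volume, so the "same" workload can vary wildly in price.
# Model sizes, $/GPU-hour and effective throughput below are assumptions.

APPROX_FLOPS_PER_TOKEN_FACTOR = 2  # rule of thumb: ~2 * params FLOPs per token

def request_cost(params_billion: float, prompt_tokens: int, output_tokens: int,
                 gpu_flops_per_sec: float = 300e12,   # assumed effective throughput
                 gpu_dollars_per_hour: float = 4.0) -> float:
    """Estimate compute cost of one request from model size and token counts."""
    total_tokens = prompt_tokens + output_tokens
    flops = APPROX_FLOPS_PER_TOKEN_FACTOR * params_billion * 1e9 * total_tokens
    gpu_seconds = flops / gpu_flops_per_sec
    return gpu_seconds / 3600 * gpu_dollars_per_hour

# Same prompt, different model sizes: the bill scales with the model, not the request count.
for size in (7, 70, 400):
    print(f"{size}B model: ${request_cost(size, 2_000, 500):.6f} per request")
```

The absolute numbers are meaningless; the point is that cost per request swings with model choice and prompt length, which per-VM-hour pricing never did.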

“The difficulties behind calculating costs in AI start with the fact that generative AI software and development is fundamentally different from traditional software development. Gen-AI is probabilistic, meaning you’ll get different answers at runtime even if you’re asking the same question. If the base layer of what is being charged changes without notice (or without your visibility), the problem becomes – how do you manage your costs if the bill is just as variable and probabilistic as the responses? The margins are inherently dynamic and unpredictable, making forecasting at any realistic timeline (even for 6 months) very difficult and in many cases impossible to do,” said Tracy Woo, principal analyst at Forrester.

Danger of discount deals

Woo, who has a dedicated focus on FinOps as well as practical hands-on experience with its tools, says that we can break it down further if we look at areas where we might want to save when using a public cloud generative AI service. In this scenario, she says, we may have some reserved capacity bought at a discount (such as what Azure provides with its OpenAI reservations), so if we get a usage spike that bumps requests per minute over the amount an organisation has agreed to pay for, the IT team is going to get rate limited and won’t have enough capacity to handle the new usage demand.

“So, in this predicament, the organisation is then making tradeoffs between cost and throughput. Do you pay for what you most of the time won’t use in order to handle the few times where there is a usage spike, just to guarantee performance and resilience? Also, if you look at models… many charge by tokens, each using a different tokenizer and each with their own variable costs,” highlighted Woo. “None of these models tell us what they are. So if you have the same prompt and the exact same words in the exact same order, the costs won’t be identical. Which means that if you’re switching between models for different capabilities or to save money, it gets very difficult to predict if that’s a good idea cost-wise. Also model costs are calculated with a lot of different variables. These variables are constantly changing under the covers where you don’t get any visibility. You won’t know how to divvy up your costs and what parts are actually driving up the costs because you won’t know the difference between what is an image cost versus what is a text cost.”
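To illustrate Woo’s point about token-based pricing, here is a minimal sketch; the tokens-per-word ratios and per-million-token rates are invented placeholders, since (as she notes) vendors do not disclose how their tokenizers behave.

```python
# Minimal illustration of the token-pricing problem: identical text, different
# tokenizers and rate cards, different bills. All figures are made up.

PROMPT_WORDS = 1_200          # same prompt text each time
OUTPUT_WORDS = 400

# (tokens-per-word ratio, $ per 1M input tokens, $ per 1M output tokens) -- assumptions
models = {
    "model_a": (1.3, 3.00, 15.00),
    "model_b": (1.5, 0.50, 1.50),
    "model_c": (1.1, 10.00, 30.00),
}

for name, (tok_per_word, in_rate, out_rate) in models.items():
    in_tokens = PROMPT_WORDS * tok_per_word
    out_tokens = OUTPUT_WORDS * tok_per_word
    cost = in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate
    print(f"{name}: ~{in_tokens + out_tokens:.0f} tokens, ${cost:.4f} per request")
```

Switching models to save money therefore changes two variables at once, the tokenizer and the rate card, which is exactly why like-for-like cost comparisons are so slippery.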

FLOPs costs flip-flop

In a world where GPU compute pricing is complex, convoluted and prone to cumbersome exponential convolutions, cloud-driven AI is even more difficult to cost. The cost of cabbage might be fairly stable from one month to the next, but the cost of floating point operations (FLOPs) is not so immovably anchored. We might even suggest that FLOPs costs often flip-flop, but only so we can use it as a fabulous subheadline here.

In the simplest terms possible, traditional cloud IT can be costed as € x per hour multiplied by data throughput + compute processing to the power of input/output divided by data ingestion. FinOps for AI on the other hand needs to be costed as € x per hour, per GPU cycle, per language model token call workload, multiplied by model complexity (plus all of our previous equations) and then further multiplied by machine learning configuration requirements.
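As a loose sketch of that contrast (every rate and multiplier below is an assumption made for illustration, not a real pricing model), the difference lies in how many moving parts each cost line has to multiply through.

```python
# Illustrative only: a toy contrast between a traditional cloud cost line and an
# AI-flavoured one. Every rate and multiplier here is an assumption for the sketch.

def traditional_cloud_cost(hours: float, rate_per_hour: float,
                           gb_transferred: float, rate_per_gb: float) -> float:
    """Compute plus data transfer: roughly linear in hours and gigabytes."""
    return hours * rate_per_hour + gb_transferred * rate_per_gb

def ai_workload_cost(gpu_hours: float, gpu_rate_per_hour: float,
                     million_tokens: float, rate_per_million_tokens: float,
                     model_complexity_factor: float = 1.0,
                     ml_config_overhead: float = 0.0) -> float:
    """GPU time plus token charges, scaled by model complexity, plus ML plumbing
    (fine-tuning runs, vector storage, evaluation) as a flat overhead."""
    compute = gpu_hours * gpu_rate_per_hour
    tokens = million_tokens * rate_per_million_tokens
    return (compute + tokens) * model_complexity_factor + ml_config_overhead

print(traditional_cloud_cost(hours=720, rate_per_hour=0.40,
                             gb_transferred=500, rate_per_gb=0.09))
print(ai_workload_cost(gpu_hours=720, gpu_rate_per_hour=4.10, million_tokens=300,
                       rate_per_million_tokens=2.50, model_complexity_factor=1.4,
                       ml_config_overhead=1_500))
```

The traditional line has two inputs that finance teams can forecast; the AI line has half a dozen, several of which change without notice.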

The discussion thus far shows why the FinOps Foundation wants to help develop strategic oversight of these processes and help “learners” (software engineers and businesspeople demanding AI services) move beyond cost analysis into forecasting, governance and operational control. All of this is designed to ensure AI spend is aligned with business goals and financial accountability.

“Practitioners need to learn to plan, estimate and forecast AI costs. They must learn to apply governance and policy structures to AI service use; use metrics and KPIs to evaluate the business value of AI investments; select tools and practices suited to AI-specific cost management and make informed architectural decisions that balance performance and cost. This means developing an understanding of how to optimise AI workloads for efficiency, licensing and rate structures; being able to integrate AI services into a mature, sustainable FinOps practice… and align FinOps roles and responsibilities with AI priorities across teams.”

How much is that AI doggy?

All of which analysis perhaps gives us some big-picture takeaways here. We can’t these days just ask: how much is that AI doggy in the window? We need to ask a far more reasoned, rounded, architecturally and mechanically aware question of the AI services intended for deployment and then be able to underpin that request with a specifically trained set of FinOps for AI practices in order to ring up the bill accordingly.

If we can do all that, then we can keep the C-suite happy in the realm of cost control and – hopefully – we can also actually deliver AI services to end users that are performant and cost-effective. Let’s not kiss on that, let’s just shake on the agreement.
