Enterprise AI budget: controlling token costs in 2025

When Microsoft Loses Control of Its AI Bill

In May 2025, news shook the enterprise artificial intelligence world: Microsoft cancelled its internal Anthropic licenses. The reason? An uncontrolled explosion of costs linked to token-based billing. What was supposed to be an annual budget was consumed in just a few months.

If a tech giant like Microsoft, with dedicated teams and considerable internal expertise, can be caught off guard by escalating AI costs, what about French and Belgian SMEs and mid-market companies? This situation reveals a structural problem that many executives discover too late: the usage-based billing model for LLMs is a budgetary time bomb.

This article gives you the keys to understanding what happened, anticipating the risks for your business, and implementing an AI cost control strategy without sacrificing your visibility in generative search engines.

Understanding the Token-Based Billing Model

What is a token and why is it expensive?

A token represents approximately 4 characters in English, or about 0.75 words. In French, this ratio is often less favorable (the same text requires more tokens) because of accents and grammatical structure. Each interaction with an LLM like Claude (Anthropic), GPT-4 (OpenAI), or Gemini (Google) consumes tokens on input (your query) and output (the generated response).

Pricing varies considerably across models:

GPT-4 Turbo: approximately $10 per million input tokens, $30 for output
Claude 3 Opus: $15 per million input tokens, $75 for output
Claude 3.5 Sonnet: $3 per million input tokens, $15 for output
Gemini 1.5 Pro: $3.50 per million input tokens, $10.50 for output

These figures seem modest. But a single complex conversation can consume 10,000 to 50,000 tokens. Multiply this by hundreds of employees using these tools daily, and the amounts become astronomical.
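To make the arithmetic concrete, here is a minimal sketch that turns the per-million prices listed above into a monthly estimate. The headcount, the number of conversations per day, the input/output split, and the 21 working days are hypothetical values chosen for illustration, not figures from any real deployment.

```python
# Prices in USD per million tokens (input, output), copied from the list above.
PRICES_PER_MILLION = {
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3-opus": (15.00, 75.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-1.5-pro": (3.50, 10.50),
}


def conversation_cost(model, input_tokens, output_tokens):
    """Cost in USD of one conversation for the given model."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(model, employees, conversations_per_day,
                 input_tokens, output_tokens, working_days=21):
    """Monthly cost in USD if every employee behaves the same way (an assumption)."""
    per_conversation = conversation_cost(model, input_tokens, output_tokens)
    return per_conversation * employees * conversations_per_day * working_days


# Hypothetical scenario: 300 employees, 5 conversations per day,
# 25,000 input tokens and 5,000 output tokens per conversation.
for model in PRICES_PER_MILLION:
    cost = monthly_cost(model, employees=300, conversations_per_day=5,
                        input_tokens=25_000, output_tokens=5_000)
    print(f"{model}: ${cost:,.0f} per month")
```

Under these assumptions, Claude 3 Opus comes to about $23,625 per month and Claude 3.5 Sonnet to about $4,725, for exactly the same usage.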

The multiplier effect nobody anticipates

The Microsoft case illustrates a phenomenon we regularly observe at AISOS during our audits: actual consumption systematically exceeds initial projections by 300 to 800%. Why?

Users reformulate their queries multiple times to achieve satisfaction
Long contexts (attached documents, conversation histories) multiply tokens
Automated integrations generate invisible but costly API calls
The absence of per-user limits encourages extensive usage

Five Strategies to Control Your Enterprise AI Budget

1. Audit and map your current usage

Before reducing costs, you need to understand them. Identify precisely:

Which departments use AI tools, and which tools they use
The volume of queries per day, week, and month
High-value use cases versus accessory usage
Duplicates (multiple teams paying for similar tools)

This mapping often reveals surprises. A 200-person company may discover it's simultaneously paying for ChatGPT Team licenses, OpenAI API access, Perplexity Pro subscriptions, and Claude credits, with no coordination between teams.
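A mapping like this can start from a simple export of subscriptions and invoices. The sketch below assumes a hypothetical record format (the field names department, tool, category, and monthly_cost_eur are invented for illustration) and made-up amounts; it totals spend per department and flags categories where several tools are paid for in parallel.

```python
from collections import defaultdict


def summarize_usage(records):
    """Return (spend per department, categories covered by more than one tool)."""
    spend_by_department = defaultdict(float)
    tools_by_category = defaultdict(set)
    for record in records:
        spend_by_department[record["department"]] += record["monthly_cost_eur"]
        tools_by_category[record["category"]].add(record["tool"])
    # A category with several distinct tools is a candidate duplicate.
    overlaps = {
        category: sorted(tools)
        for category, tools in tools_by_category.items()
        if len(tools) > 1
    }
    return dict(spend_by_department), overlaps


# Hypothetical inventory, not real data.
records = [
    {"department": "Marketing", "tool": "ChatGPT Team",
     "category": "chat assistant", "monthly_cost_eur": 500.0},
    {"department": "Sales", "tool": "Perplexity Pro",
     "category": "chat assistant", "monthly_cost_eur": 300.0},
    {"department": "R&D", "tool": "OpenAI API",
     "category": "api", "monthly_cost_eur": 3200.0},
    {"department": "R&D", "tool": "Claude credits",
     "category": "api", "monthly_cost_eur": 1800.0},
]

spend, overlaps = summarize_usage(records)
for department, total in sorted(spend.items()):
    print(f"{department}: EUR {total:,.0f}/month")
for category, tools in overlaps.items():
    print(f"Possible duplicate in '{category}': {', '.join(tools)}")
```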

2. Implement tiered governance

Not all use cases require the most powerful models. Structure your AI access in three levels:

Tier 1 (80% of usage): economical models like GPT-3.5 Turbo, Claude 3 Haiku, or Gemini 1.0 Pro. Cost per token is typically 10 to 20 times lower than for premium models.
Tier 2 (15% of usage): intermediate models for complex but non-critical tasks.
Tier 3 (5% of usage): premium models reserved for validated strategic projects.

This tiered approach can reduce your bill by 60 to 75% with no perceptible impact on productivity.
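In practice, tiering comes down to a routing rule applied before each request. The sketch below is one possible rule, not a standard: the Task fields and the criteria (a complexity flag and an approval flag) are hypothetical and would need to reflect your own governance.

```python
from collections import Counter
from dataclasses import dataclass

TIER_LABELS = {1: "economical", 2: "intermediate", 3: "premium"}


@dataclass
class Task:
    description: str
    is_complex: bool = False
    strategic_project_approved: bool = False


def choose_tier(task):
    """Pick the cheapest tier allowed for the task."""
    if task.strategic_project_approved:
        return 3  # premium models only for validated strategic projects
    if task.is_complex:
        return 2  # complex but non-critical work
    return 1      # default: economical model


# Hypothetical workload used to check the resulting distribution.
tasks = [
    Task("Rewrite an email"),
    Task("Summarize meeting notes"),
    Task("Translate a product sheet"),
    Task("Analyze a supplier contract", is_complex=True),
    Task("Draft the annual R&D roadmap", strategic_project_approved=True),
]

distribution = Counter(choose_tier(task) for task in tasks)
for tier in sorted(distribution):
    print(f"Tier {tier} ({TIER_LABELS[tier]}): {distribution[tier]} task(s)")
```

Comparing the distribution this produces against the 80/15/5 target is a quick way to see whether the rule is too permissive.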

3. Optimize your prompts to reduce consumption

A well-designed prompt consumes fewer tokens and produces better results. Key principles:

Be precise from the first query rather than iterating
Limit context to what's strictly necessary
Request concise responses when the situation allows
Use standardized templates for recurring tasks

Training your teams in prompt engineering represents a minimal investment with immediate cost returns.
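The effect of these principles can be approximated before sending anything to a model. The sketch below uses the rough ratio of 4 characters per token mentioned earlier, which is only an approximation and not a real tokenizer; the template wording and both example prompts are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # rough English average; a real tokenizer will differ


def estimate_tokens(text):
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


# A standardized template for a recurring task: precise, with a length limit.
SUMMARY_TEMPLATE = (
    "Summarize the text below in at most {max_words} words. "
    "Answer in plain text, no preamble.\n\n{text}"
)


def build_summary_prompt(text, max_words=100):
    return SUMMARY_TEMPLATE.format(text=text, max_words=max_words)


verbose_prompt = (
    "Hello, I hope you are doing well. I have a document here and I was "
    "wondering whether you could possibly take a look at it and tell me "
    "what it is about, maybe with some details, whatever you think is best. "
    "Here it is: Quarterly sales rose in the north region."
)
concise_prompt = build_summary_prompt(
    "Quarterly sales rose in the north region.", max_words=30
)

print("verbose:", estimate_tokens(verbose_prompt), "tokens (estimated)")
print("concise:", estimate_tokens(concise_prompt), "tokens (estimated)")
```

The larger saving usually comes from the output side: capping the response length and avoiding a second or third reformulation.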

4. Set up limits and alerts

Professional platforms allow you to configure:

Spending caps per user, team, or project
Alerts at 50%, 75%, and 90% of allocated budget
Weekly consumption reports
Automatic blocking beyond a critical threshold

Safeguards of this kind might well have spared Microsoft its Anthropic mishap. Many companies neglect these features even though they are available.
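The logic behind such safeguards is simple enough to sketch. The example below uses the 50%, 75%, and 90% alert levels listed above; the blocking threshold of 100% and the function name are assumptions, since each platform defines its own settings.

```python
ALERT_THRESHOLDS = (0.50, 0.75, 0.90)  # alert levels from the list above
BLOCK_THRESHOLD = 1.00                 # assumption: block once the budget is fully spent


def budget_status(spent, budget):
    """Return the consumption ratio, the alert levels crossed, and whether to block."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    return {
        "ratio": ratio,
        "alerts": [level for level in ALERT_THRESHOLDS if ratio >= level],
        "blocked": ratio >= BLOCK_THRESHOLD,
    }


# Hypothetical team with a EUR 1,000 monthly budget.
for spent in (400, 800, 950, 1200):
    status = budget_status(spent, budget=1000)
    levels = ", ".join(f"{level:.0%}" for level in status["alerts"]) or "none"
    action = "BLOCK" if status["blocked"] else "allow"
    print(f"EUR {spent}: {status['ratio']:.0%} used, alerts crossed: {levels}, {action}")
```

The same check can be run per user, per team, or per project by keeping one budget per scope.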

5. Evaluate open source and local alternatives

For certain internal use cases, open source models like Llama 3, Mistral, or Qwen may suffice. Advantages:

Marginal cost nearly zero after initial investment
Data remains within your infrastructure
Customization possible according to your business needs

The trade-off: generally inferior performance compared to cutting-edge commercial models and technical skills required for deployment.
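Whether a local deployment pays off depends on how quickly the initial investment is recovered. The break-even sketch below is deliberately simple, and all three inputs (hardware and setup cost, current API bill, ongoing running cost) are hypothetical figures.

```python
def break_even_months(initial_investment, monthly_api_cost, monthly_running_cost):
    """Months needed to recover the investment, or None if it never pays off."""
    monthly_saving = monthly_api_cost - monthly_running_cost
    if monthly_saving <= 0:
        return None  # running the model locally costs as much as the API, or more
    return initial_investment / monthly_saving


# Hypothetical figures in EUR: servers and integration, current API bill,
# and ongoing costs (energy, maintenance, staff time).
months = break_even_months(
    initial_investment=30_000,
    monthly_api_cost=5_000,
    monthly_running_cost=1_000,
)
if months is None:
    print("No break-even: local hosting is not cheaper under these assumptions.")
else:
    print(f"Break-even after {months:.1f} months")
```

Note that running costs are rarely zero in practice, so "marginal cost nearly zero" should be checked against your own maintenance and staffing estimates.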

AI Budget and GEO Visibility: Two Compatible Objectives

The trap of going all-free for visibility

Some companies, frightened by costs, give up on AI investment. This is a strategic mistake. In 2025, nearly 40% of B2B searches go through conversational interfaces like ChatGPT, Perplexity, or Google AI Overviews.

Not appearing in these generative engines' responses means becoming invisible to a growing portion of your prospects. The challenge isn't to spend less on AI, but to spend intelligently.

Prioritize high-impact investments

To maximize your AI ROI, focus your resources on:

Optimizing your content for LLMs: structuring your web pages to be cited by generative engines
Creating proprietary data: LLMs value original and expert sources
Automating repetitive tasks: where ROI is measurable and rapid

AISOS audits regularly reveal that companies invest in generic AI tools when their priority should be optimizing their presence in generative engine responses.

Case Study: A Mid-Market Industrial Company Masters Its Costs

Take the example of a 450-employee manufacturing company in the Lyon region. In January 2025, the company was paying for:

85 ChatGPT Team licenses at EUR 25/month: EUR 2,125/month
OpenAI API credits for the R&D department: EUR 3,200/month on average
15 Perplexity Pro subscriptions: EUR 300/month
A pilot project with Claude API: EUR 1,800/month

Total: approximately EUR 7,400/month, or EUR 89,000 annually, with a 15% monthly growth trend.
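The annual figure above assumes spending stays flat. The sketch below recomputes the total from the four line items and shows what a 15% monthly growth trend would mean if it compounded, which is an assumption about how the trend should be read.

```python
# Monthly line items in EUR, taken from the list above.
LINE_ITEMS = {
    "ChatGPT Team (85 licenses x EUR 25)": 85 * 25,
    "OpenAI API credits (R&D)": 3200,
    "Perplexity Pro (15 subscriptions)": 300,
    "Claude API pilot": 1800,
}


def project_monthly_spend(start, monthly_growth, months):
    """Monthly spend after `months` months of compound growth."""
    return start * (1 + monthly_growth) ** months


total = sum(LINE_ITEMS.values())
flat_annual = total * 12
print(f"Current monthly total: EUR {total:,}")
print(f"Annual cost if spending stays flat: EUR {flat_annual:,}")

# Assumption: the 15% trend compounds month over month.
for months in (3, 6, 12):
    projected = project_monthly_spend(total, 0.15, months)
    print(f"Monthly spend after {months} months at +15%/month: EUR {projected:,.0f}")
```

At that rate the monthly bill would be roughly five times higher after a year, which is why the growth trend matters more than the current total.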

Actions implemented

After a three-week audit:

Reduction of ChatGPT Team licenses from 85 to 40 (truly active users)
Migration of 70% of API calls to GPT-3.5 Turbo
Centralization of access via a single platform with quotas
Training of 20 power users in prompt optimization
Abandonment of the Claude pilot in favor of targeted use of OpenAI's premium tier

Results at 6 months

New monthly budget: EUR 3,100, a 58% reduction. Measured productivity hasn't decreased. The heaviest user teams even report improvement thanks to the training received.
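The reduction can be checked in a few lines. The starting figure below is the exact sum of the four line items (EUR 7,425, which the article rounds to 7,400); the annualized saving assumes the old spend would otherwise have stayed flat, which is a conservative assumption given the growth trend.

```python
def percent_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100


before_eur = 7425  # monthly spend before the audit
after_eur = 3100   # monthly spend six months later

reduction = percent_reduction(before_eur, after_eur)
annual_saving = (before_eur - after_eur) * 12

print(f"Reduction: {reduction:.0f}%")
print(f"Annualized saving versus a flat baseline: EUR {annual_saving:,}")
```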

Mistakes to Absolutely Avoid

Signing annual commitments without visibility

Enterprise contracts with annual or multi-year commitments may seem attractive. But in a market where prices and technologies evolve every quarter, they quickly become burdens. Prioritize flexibility, even if it means paying slightly more in the short term.

Ignoring token billing in favor of unlimited plans

Unlimited offers often hide limitations: query quotas, restrictions on premium models, response length throttling. Read the terms carefully.

Centralizing without consulting

Imposing a single tool across the enterprise without consulting teams generates shadow IT. Employees find workarounds, often more expensive and less secure. Involve key users in the decision.

Neglecting security and compliance dimensions

GDPR and sector-specific requirements impose constraints on data processing by LLMs. A compliance incident costs far more than any savings realized on a subscription.

Building a Sustainable AI Policy for 2025-2026

Microsoft's mishap with Anthropic isn't an isolated case. It foreshadows what many companies will experience in the coming months if they don't plan ahead.

A sustainable AI policy rests on four pillars:

Visibility: knowing precisely who uses what and how much it costs
Governance: defining clear access and usage rules
Optimization: choosing the right tool for the right use at the right price
Measurement: regularly evaluating ROI and adjusting

Companies that master these four dimensions transform AI from an unpredictable cost center into a measurable competitive advantage.

Conclusion: Act Before You Suffer

AI budget explosion isn't inevitable. The Microsoft-Anthropic case simply demonstrates that even the biggest players can be caught off guard by an economic model that's still young and poorly understood.

For SME and mid-market executives, the lesson is clear: implement AI usage governance now. Audit, structure, train, measure. The tools and methods exist.

The challenge isn't to slow AI adoption in your company. It's to accelerate it in a controlled manner, maximizing the return on every euro invested, including your visibility in generative engines, where an increasing share of your customer acquisition will be decided.

Want to assess your exposure to AI budget drift risks and optimize your presence in LLMs? Contact AISOS for a personalized diagnosis.
