Microsoft cancels its Anthropic licenses following skyrocketing AI token costs. Learn how to optimize your budget without sacrificing performance.


In May 2025, Microsoft made a radical decision: it canceled its internal Anthropic licenses. The reason? The switch to token-based billing caused annual budgets to explode within just a few months. What was supposed to be a predictable expense turned into an uncontrollable financial drain.
This is not an isolated case. It reveals a systemic crisis affecting every company that uses generative AI, from large corporations to SMEs. The token-based pricing model, opaque and hard to predict, traps organizations in a spiral of rising costs.
This article explains how this billing mechanism works, how to anticipate its impact on your budget and, above all, how to optimize your AI spending without sacrificing your competitiveness or your visibility in generative search engines.
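A token isn't a word. It's a unit of text that the model processes: on average, one token corresponds to about 0.75 English words, and French text usually needs more tokens for the same content. A word like "optimization" may be split into several tokens, depending on the model's tokenizer, and a 20-word sentence can consume 30 tokens or more.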
Each interaction with an AI model consumes input tokens (your query and the context you provide) and output tokens (the generated response). The two are billed at different rates, and output tokens typically cost 3 to 5 times more than input tokens.
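To make the input/output split concrete, here is a minimal Python sketch of the cost of a single request. It reuses the 0.75 words-per-token rule of thumb for a rough estimate; the function names and the example prices (3 and 15 per million tokens) are hypothetical placeholders, so substitute your provider's current rates.

```python
import math


def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (rule of thumb, not a real tokenizer)."""
    return math.ceil(word_count / words_per_token)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call: input and output tokens are billed at separate per-million rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    print(estimate_tokens(20))  # 20 words -> about 27 tokens
    # Hypothetical prices: 3 per million input tokens, 15 per million output tokens.
    cost = request_cost(2_000, 500, 3.0, 15.0)
    output_share = 500 * 15.0 / 1_000_000 / cost
    print(f"{cost:.4f} per request, {output_share:.0%} of it from output")
```

With these placeholder prices, output makes up only 20% of the tokens in this request but more than half of its cost.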
Here are the costs per 1 million tokens from major players as of May 2025:
These prices look modest. But a 500-employee company using AI daily can easily consume 500 million tokens per month, or 6 billion tokens per year. Depending on the models used, that adds up to 50,000 to 200,000 EUR per year.
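The annual figure comes from one multiplication: monthly volume × 12 × a blended price per million tokens (input and output mixed). The sketch below assumes a single blended rate, which in practice depends on your model mix and input/output ratio; working backwards from the range above shows which rates it implies.

```python
def annual_cost(monthly_tokens: float, blended_price_per_m: float) -> float:
    """Yearly spend for a steady monthly volume at one blended price per million tokens."""
    return monthly_tokens * 12 / 1_000_000 * blended_price_per_m


def implied_blended_price(monthly_tokens: float, annual_budget: float) -> float:
    """Blended price per million tokens that a given annual bill implies."""
    return annual_budget / (monthly_tokens * 12 / 1_000_000)


if __name__ == "__main__":
    monthly = 500_000_000  # 500 million tokens per month, i.e. 6 billion per year
    for budget in (50_000, 200_000):
        rate = implied_blended_price(monthly, budget)
        print(f"{budget} EUR/year -> {rate:.2f} EUR per million tokens")
```

The 50,000 to 200,000 EUR range therefore corresponds to a blended rate of roughly 8 to 33 EUR per million tokens.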
Three factors transform a controlled budget into a financial sinkhole:
A French mid-sized industrial company with 800 employees deployed an AI assistant for customer service in January 2025. Projected budget: 24,000 EUR per year. Actual first-quarter bill: 47,000 EUR. Annualized, that is nearly 190,000 EUR, almost eight times the initial budget.
The problem? Each customer conversation included the complete history of previous exchanges as context. A loyal customer with 50 past interactions consumed roughly 50 times more input tokens than a new customer asking the same question.
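The mechanism is easy to model. In this minimal sketch, every past interaction resent as context adds a fixed number of input tokens; the 400-token sizes are hypothetical, chosen only to illustrate the growth.

```python
def context_tokens(past_interactions: int, tokens_per_interaction: int,
                   question_tokens: int) -> int:
    """Input tokens for one request when the full history is resent as context."""
    return question_tokens + past_interactions * tokens_per_interaction


def lifetime_input_tokens(total_interactions: int, tokens_per_interaction: int,
                          question_tokens: int) -> int:
    """Input tokens billed over all of a customer's requests (grows quadratically)."""
    return sum(context_tokens(k, tokens_per_interaction, question_tokens)
               for k in range(total_interactions))


if __name__ == "__main__":
    new_customer = context_tokens(0, 400, 400)
    loyal_customer = context_tokens(50, 400, 400)
    print(new_customer, loyal_customer, loyal_customer // new_customer)
    print(lifetime_input_tokens(51, 400, 400))
```

With these numbers, the loyal customer's 51st question costs 51 times the input tokens of a new customer's first one, and the 51 requests together bill 530,400 input tokens instead of 20,400 (51 × 400) without history: the total grows with the square of the number of interactions.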
At AISOS, we observe this pattern in 70% of audits of companies that deployed AI without an optimization strategy. The ratio between projected cost and actual cost ranges from 3 to 12 depending on the case.
Analysis of token bills reveals systematically underestimated categories:
Using Claude Opus or GPT-4 for all tasks is a costly mistake. The rule: match model power to the actual complexity of the task.
Recommended distribution:
An automatic routing system can analyze each query and send it to the appropriate model. This optimization alone can cut costs by 40 to 60%.
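A production router would typically rely on a trained classifier or a cheap model to grade each query. The sketch below only illustrates the principle with keyword and length heuristics; the tier names, hint words and thresholds are all hypothetical and should be tuned on your own traffic.

```python
# Hypothetical words suggesting a query needs a more capable model.
COMPLEX_HINTS = ("analyze", "compare", "strategy", "contract", "architecture")


def route(query: str) -> str:
    """Return a hypothetical model tier ("small", "medium" or "large") for a query."""
    text = query.lower()
    word_count = len(text.split())
    hits = sum(hint in text for hint in COMPLEX_HINTS)  # substring matches
    if hits >= 2 or word_count > 300:
        return "large"   # several complexity hints, or a very long query
    if hits == 1 or word_count > 60:
        return "medium"  # one hint, or a medium-length query
    return "small"       # short query with no complexity hint


if __name__ == "__main__":
    for q in ("What are your opening hours?",
              "Can you summarize our strategy memo?",
              "Analyze this contract and compare it with our standard terms"):
        print(route(q), "<-", q)
```

Misrouted queries are the main risk of this approach, so it is worth logging the tier chosen for each request and spot-checking answers from the cheapest tier.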
Context often represents 80% of tokens consumed. Three techniques to reduce it:
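A common technique for reducing context is a sliding window: resend only the most recent messages that fit within a fixed token budget, optionally preceded by a short summary of older exchanges. Here is a minimal sketch, in which the word-based counter is a hypothetical stand-in for your provider's tokenizer:

```python
import math
from collections.abc import Callable


def rough_token_count(text: str) -> int:
    """Hypothetical estimate (about 0.75 words per token); use a real tokenizer in production."""
    return math.ceil(len(text.split()) / 0.75)


def trim_history(messages: list[str], budget_tokens: int,
                 count_tokens: Callable[[str], int] = rough_token_count) -> list[str]:
    """Keep the most recent messages whose combined size fits in budget_tokens."""
    kept: list[str] = []
    total = 0
    for message in reversed(messages):  # walk from newest to oldest
        size = count_tokens(message)
        if total + size > budget_tokens:
            break  # this message and everything older are left out
        kept.append(message)
        total += size
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["order placed last week", "delivery delayed by carrier", "where is my parcel now"]
    print(trim_history(history, budget_tokens=12))
```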
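Anthropic and OpenAI now offer prompt caching: a repeated prompt prefix (system instructions, reference documents) is stored and reused across requests. Depending on the provider and model, cached input tokens cost 50 to 90% less than standard input tokens.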
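Whether caching pays off depends on how much of each prompt is a stable prefix and how often it is reused. The minimal sketch below compares the input cost of a batch of requests with and without caching. It assumes the prefix is written to the cache once and then read by every later request (real caches expire, so expect some rewrites). `read_discount` and `write_premium` are parameters because some providers bill cache writes above the base input rate while others do not; all figures in the example are hypothetical.

```python
def input_cost(requests: int, prefix_tokens: int, dynamic_tokens: int,
               price_per_m: float, cached: bool = False,
               read_discount: float = 0.9, write_premium: float = 0.25) -> float:
    """Input-token cost of a batch of requests sharing the same static prefix.

    read_discount and write_premium are placeholders; check your provider's pricing.
    """
    if not cached or requests == 0:
        return requests * (prefix_tokens + dynamic_tokens) * price_per_m / 1_000_000
    # The first request writes the prefix to the cache.
    write = prefix_tokens * price_per_m * (1 + write_premium)
    # Every later request reads the prefix at the discounted rate.
    reads = (requests - 1) * prefix_tokens * price_per_m * (1 - read_discount)
    # The per-request part of the prompt is never cached.
    dynamic = requests * dynamic_tokens * price_per_m
    return (write + reads + dynamic) / 1_000_000


if __name__ == "__main__":
    # Hypothetical: 10,000-token shared prefix, 500 new tokens per request.
    without = input_cost(1_000, 10_000, 500, 3.0)
    with_cache = input_cost(1_000, 10_000, 500, 3.0, cached=True)
    print(f"{without:.2f} vs {with_cache:.2f}")
```

With these placeholder values, input spend drops by roughly 85%.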
Practical applications:
Without governance, costs drift. Put in place:
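One governance control that is easy to automate is a monthly budget per team with alert thresholds. Here is a minimal sketch, assuming you can attribute each request's cost to a team (for example through separate API keys); the `TeamBudget` class and the 80% and 100% thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TeamBudget:
    """Hypothetical per-team monthly budget that raises each alert threshold once."""
    team: str
    monthly_budget: float
    alert_ratios: tuple[float, ...] = (0.8, 1.0)
    spent: float = 0.0
    fired: set[float] = field(default_factory=set)

    def record(self, cost: float) -> list[str]:
        """Add a request's cost and return any newly crossed alert messages."""
        self.spent += cost
        alerts = []
        for ratio in self.alert_ratios:
            if ratio not in self.fired and self.spent >= ratio * self.monthly_budget:
                self.fired.add(ratio)
                alerts.append(f"{self.team}: {self.spent:.2f} spent, "
                              f"{ratio:.0%} of the {self.monthly_budget:.2f} budget reached")
        return alerts


if __name__ == "__main__":
    support = TeamBudget("support", 1_000.0)
    for cost in (500.0, 350.0, 200.0):
        for alert in support.record(cost):
            print(alert)
```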
For certain uses, open-source models deployed on your own infrastructure eliminate per-token API fees:
The initial infrastructure investment pays for itself in 3 to 6 months for high-consumption companies (more than 100 million monthly tokens).
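The payback period is the upfront investment divided by the net monthly saving: the API bill you stop paying minus what the self-hosted setup costs to run each month (energy, hosting, operations staff). Here is a minimal sketch with hypothetical figures:

```python
from typing import Optional


def break_even_months(upfront: float, monthly_api_bill: float,
                      monthly_running_cost: float) -> Optional[float]:
    """Months until self-hosting recoups its upfront cost; None if it never does."""
    net_saving = monthly_api_bill - monthly_running_cost
    if net_saving <= 0:
        return None  # self-hosting costs as much as, or more than, the API
    return upfront / net_saving


if __name__ == "__main__":
    # Hypothetical figures in EUR.
    print(break_even_months(60_000, 15_000, 3_000))
    print(break_even_months(60_000, 2_000, 3_000))
```

The second call shows why this option suits high-consumption companies: when the API bill is small, running costs can exceed it and the investment never pays back.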
Cutting AI budgets indiscriminately threatens your visibility. Generative search engines such as ChatGPT, Perplexity, and Google AI Overviews favor rich, structured, regularly updated content. Cut back on content production and you disappear from generative answers.
Focus your AI resources on high-impact content:
The goal: produce less but better, with a content strategy aligned with LLM citation criteria.
Priority actions:
Rapid gains to implement:
For sustainable control:
The Microsoft-Anthropic case marks a turning point. The period when companies deployed AI without counting costs is over. Token costs have become a strategic budget item that requires the same level of rigor as other technology expenses.
But this constraint is also an opportunity. Companies that master their AI costs can invest more intelligently, focusing their resources on high-value use cases. Those who optimize their token consumption without sacrificing their presence in generative search engines gain a decisive advantage over less rigorous competitors.
AISOS audits systematically reveal savings opportunities of 40 to 70% in companies that haven't yet optimized their token usage. The question is no longer whether you should act, but how quickly you can implement these optimizations before your competitors do.
Start by auditing your bills this week. The results will probably surprise you as much as they surprised Microsoft.