An experiment in making visible something that pricing pages hide: the carbon cost of AI inference can vary by orders of magnitude depending on where and when you make the call. The same model, same provider, same price: 4 gCO2 per million tokens in the Netherlands, 530 in Singapore. Built a leaderboard and API to track it.

What it does

Leaderboard ranking every model-provider-region combination by carbon intensity, cost, and speed. 85 models across 10 families — Llama, GPT, Claude, Mistral, Gemma, Qwen, DeepSeek, Phi, Falcon, Cohere. Filter by family, filter by provider, sort by what matters. Live carbon intensity charts showing 24-hour curves for each region.
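A minimal sketch of the filter-and-sort the leaderboard performs. The row shape and field names here are assumptions for illustration, not the real schema.

```typescript
// Hypothetical leaderboard row; field names are illustrative, not the real schema.
type Row = {
  model: string;
  family: string;
  provider: string;
  region: string;
  gco2PerMTok: number;  // carbon per million tokens
  usdPerMTok: number;   // price per million input tokens
  tokensPerSec: number; // throughput
};

// Filter to one model family, then rank by carbon intensity, lowest first.
function rankByCarbon(rows: Row[], family: string): Row[] {
  return rows
    .filter((r) => r.family === family)
    .sort((a, b) => a.gco2PerMTok - b.gco2PerMTok);
}
```

Swapping the comparator to `usdPerMTok` or `tokensPerSec` gives the cost and speed rankings.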

The /api/recommend endpoint answers one question: what’s the lowest-carbon way to run this model right now? Returns the best option, four alternatives, and a human-readable insight explaining the carbon savings.
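The endpoint's actual contract isn't shown in this post, but the core selection logic can be sketched roughly like this. The type names, field names, and insight wording are all assumptions.

```typescript
// Hypothetical shapes for /api/recommend (illustrative only, not the real contract).
type Option = { provider: string; region: string; gco2PerMTok: number };
type Recommendation = { best: Option; alternatives: Option[]; insight: string };

// Pick the lowest-carbon option, keep up to four runners-up,
// and phrase the savings relative to the worst option.
function recommend(options: Option[]): Recommendation {
  const sorted = [...options].sort((a, b) => a.gco2PerMTok - b.gco2PerMTok);
  const best = sorted[0];
  const worst = sorted[sorted.length - 1];
  const savings = Math.round((1 - best.gco2PerMTok / worst.gco2PerMTok) * 100);
  return {
    best,
    alternatives: sorted.slice(1, 5),
    insight: `Running in ${best.region} emits ${savings}% less CO2 than ${worst.region}.`,
  };
}
```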

How it works

Carbon per million tokens = GPU energy per million tokens × grid carbon intensity. GPU energy comes from standardised benchmarks — the AI Energy Score project on Hugging Face. Grid carbon intensity comes from Electricity Maps, updated daily. Provider pricing comes from AWS Bedrock, GCP Vertex AI, Azure OpenAI, Together, Groq, and Fireworks.
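The formula in code. The 0.03 kWh per million tokens figure below is an illustrative assumption, not an AI Energy Score value; the grid intensities are the Netherlands and Singapore numbers from this post.

```typescript
// carbon (gCO2 per million tokens) =
//   energy per million tokens (kWh) × grid intensity (gCO2/kWh)
function carbonPerMillionTokens(kwhPerMTok: number, gridGco2PerKwh: number): number {
  return kwhPerMTok * gridGco2PerKwh;
}

// Illustrative: assume a model draws 0.03 kWh per million tokens.
const nl = carbonPerMillionTokens(0.03, 129); // ≈ 3.9 gCO2/MTok on the Dutch grid
const sg = carbonPerMillionTokens(0.03, 530); // ≈ 15.9 gCO2/MTok in Singapore
```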

The interesting finding: carbon-aware routing might be mostly free. The biggest carbon differences come from region choice, not provider choice, and providers typically charge the same price regardless of region — so picking the cleaner region usually costs nothing extra.

A disconnect worth noting

Groq charges $0.05 per million input tokens for Llama 3 8B. AWS charges $0.30. But they’re on similar grids. Meanwhile the difference between Netherlands (129 gCO2/kWh) and Singapore (530 gCO2/kWh) is 4x — and the price is the same. The cost optimisation and the carbon optimisation are almost completely decoupled.
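Putting numbers on the decoupling. The price is the post's AWS Llama 3 8B figure; the per-token energy draw is an assumed illustrative constant, used only to show that the carbon ratio tracks the grid ratio while the invoice stays flat.

```typescript
// Price per million input tokens is region-independent (per the post),
// while carbon scales linearly with grid intensity.
const PRICE_USD_PER_MTOK = 0.3; // AWS, Llama 3 8B (from the post)
const KWH_PER_MTOK = 0.03;      // assumed energy draw, illustrative only

const carbonNl = KWH_PER_MTOK * 129; // Netherlands grid, ≈ 3.9 gCO2/MTok
const carbonSg = KWH_PER_MTOK * 530; // Singapore grid, ≈ 15.9 gCO2/MTok

// Same dollar cost in either region; roughly 4x the emissions in Singapore.
const carbonRatio = carbonSg / carbonNl; // ≈ 4.1, the 530/129 grid ratio
```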

Stack: Next.js 14 · Railway Postgres · Vercel · TypeScript · Electricity Maps API · Liveline charts

85 models · 6 providers · 9 regions · Updated daily

carbonbench.ai · GitHub