Logiciel Solutions · AI Engineering

AI Inference Cost Calculator

Token counts and per-request pricing look small until you multiply by daily request volume. Put in your numbers to see what your AI inference actually costs per day, month, and year.

Your numbers

Six inputs. Results update as you type.

Requests per day

Total API calls per day across all users and use cases.

Avg input tokens (per request)

System prompt + user message + any retrieved context. ~750 words ≈ 1,000 tokens.
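As a rough illustration of the rule of thumb above, the sketch below estimates average input tokens from word counts. The function name `estimate_tokens` and the sample word counts are hypothetical, and the 750-words-per-1,000-tokens ratio is only an approximation for English text; a real tokenizer will give different counts depending on model and content.

```python
# Rule of thumb from the text: ~750 words is roughly 1,000 tokens.
# This is an approximation, not a tokenizer.
WORDS_PER_1000_TOKENS = 750


def estimate_tokens(word_count: int) -> int:
    """Estimate a token count from a word count using the 750:1000 rule."""
    return round(word_count * 1000 / WORDS_PER_1000_TOKENS)


# Hypothetical request: input = system prompt + user message + retrieved context.
system_prompt_words = 300
user_message_words = 75
retrieved_context_words = 1125

total_words = system_prompt_words + user_message_words + retrieved_context_words
avg_input_tokens = estimate_tokens(total_words)
print(avg_input_tokens)
```

For precise figures, count tokens with the tokenizer that matches your model rather than relying on the word-based estimate.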

Avg output tokens (per request)

Generated response length. Output tokens typically cost 3–5× as much per token as input tokens.

Prompt cache hit rate 40%

Percentage of input tokens served from the prompt cache (free or heavily discounted). Set to 0% if caching is not enabled.

Input price ($/1M tokens)

GPT-4o: $2.50 • Claude Sonnet: $3.00 • Gemini 1.5 Pro: $1.25 • Llama 3 (self-hosted): ~$0.20

Output price ($/1M tokens)

GPT-4o: $10.00 • Claude Sonnet: $15.00 • Gemini 1.5 Pro: $5.00 • Llama 3 (self-hosted): ~$0.80
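To see how the example prices above relate to each other, the sketch below computes the output-to-input price ratio for each listed model. The dictionary name `LISTED_PRICES` is hypothetical, the figures are simply the ones quoted on this page (the self-hosted Llama 3 numbers are approximate), and provider pricing changes over time, so check current rates before relying on them.

```python
# Example prices quoted on this page, in $ per 1M tokens: (input, output).
# These are illustrative and may not match current provider pricing.
LISTED_PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet": (3.00, 15.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "Llama 3 (self-hosted)": (0.20, 0.80),  # approximate
}

# Ratio of output price to input price for each model.
price_ratios = {
    model: output_price / input_price
    for model, (input_price, output_price) in LISTED_PRICES.items()
}

for model, ratio in price_ratios.items():
    print(f"{model}: output costs {ratio:.1f}x input per token")
```

With these figures every listed model falls in the 4–5× range, which is why response length matters so much to the total.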

The verdict

Adjust the inputs — results unlock when you submit.

Enter your token counts and pricing, then submit your email to see the full cost breakdown.

Cost per request

Per day

Per month

Per year

Input cost / day

Output cost / day

Get the full inference analysis

We’ll send the detailed cost model and have an AI engineer review your token usage patterns, caching strategy, and model selection for your scale.

No spam. We’ll follow up only if it’s relevant.

How the math works. Effective input tokens = avg input tokens × (1 − cache hit rate), with the cache hit rate expressed as a fraction (40% = 0.4). Cached tokens are treated as free in this model; if your provider charges a discounted rate for cached tokens, the actual cost will be slightly higher. Cost per request = (effective input tokens × input price + output tokens × output price) ÷ 1,000,000, where both prices are in $ per 1M tokens. Daily cost = cost per request × requests per day; monthly (30 days) and annual (365 days) costs scale linearly from there.

Output tokens cost more per token than input tokens for most LLMs because they are generated sequentially, so they can dominate the bill even when responses are shorter than prompts. Prompt caching and reducing output verbosity are the two highest-leverage ways to reduce per-request cost. This model assumes flat daily volume; seasonal or growth-driven changes will shift the annual figure.
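The formulas above can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (cached tokens are free, 30-day months, 365-day years, flat daily volume), not the calculator's actual source; the names `InferenceCost` and `inference_cost` and the sample workload are hypothetical.

```python
from dataclasses import dataclass

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted in $ per 1M tokens


@dataclass
class InferenceCost:
    per_request: float
    per_day: float
    per_month: float
    per_year: float
    input_per_day: float
    output_per_day: float


def inference_cost(
    requests_per_day: float,
    avg_input_tokens: float,
    avg_output_tokens: float,
    cache_hit_rate: float,   # fraction in [0, 1], e.g. 0.4 for 40%
    input_price: float,      # $ per 1M input tokens
    output_price: float,     # $ per 1M output tokens
) -> InferenceCost:
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")

    # Cached input tokens are treated as free in this model.
    effective_input_tokens = avg_input_tokens * (1.0 - cache_hit_rate)

    input_cost = effective_input_tokens * input_price / TOKENS_PER_PRICE_UNIT
    output_cost = avg_output_tokens * output_price / TOKENS_PER_PRICE_UNIT
    per_request = input_cost + output_cost
    per_day = per_request * requests_per_day

    return InferenceCost(
        per_request=per_request,
        per_day=per_day,
        per_month=per_day * 30,   # 30-day month
        per_year=per_day * 365,   # 365-day year, flat daily volume
        input_per_day=input_cost * requests_per_day,
        output_per_day=output_cost * requests_per_day,
    )


# Hypothetical workload: 10,000 requests/day, 1,000 input and 500 output
# tokens per request, 40% cache hit rate, at the GPT-4o example prices.
cost = inference_cost(10_000, 1_000, 500, 0.40, 2.50, 10.00)
print(f"per request: ${cost.per_request:.4f}")
print(f"per day:     ${cost.per_day:,.2f}")
print(f"per month:   ${cost.per_month:,.2f}")
print(f"per year:    ${cost.per_year:,.2f}")
```

For this sample workload the model gives about $0.0065 per request, $65 per day, $1,950 per month, and $23,725 per year, with roughly $15 of the daily figure from input and $50 from output. If your provider bills cached tokens at a discount rather than for free, add that cached portion back in at the discounted rate.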

Logiciel Solutions · logiciel.io