Logiciel Solutions · AI Engineering
Token counts and per-request pricing look small until you multiply by daily request volume. Put in your numbers to see what your AI inference actually costs per day, month, and year.
Your numbers
Six inputs. Results update as you type.
Requests per day. Total API calls per day across all users and use cases.
Average input tokens per request. System prompt + user message + any retrieved context; ~750 words ≈ 1,000 tokens.
Average output tokens per request. Generated response length. Output tokens are typically priced 3–5× higher than input tokens.
Cache hit rate (%). Share of input tokens served from the prompt cache, which providers bill at a heavy discount or not at all. Use 0% if caching is not enabled.
Input price (USD per 1M tokens). GPT-4o: $2.50 • Claude Sonnet: $3.00 • Gemini 1.5 Pro: $1.25 • Llama 3 (self-hosted): ~$0.20
Output price (USD per 1M tokens). GPT-4o: $10.00 • Claude Sonnet: $15.00 • Gemini 1.5 Pro: $5.00 • Llama 3 (self-hosted): ~$0.80
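As a rough check on how the preset prices translate into spend, the following Python sketch computes each preset's output-to-input price ratio and the cost of a single request. It assumes an illustrative request of 1,000 input and 300 output tokens with no prompt caching; those token counts and the `per_request_cost` helper are hypothetical, and the Llama 3 figures are the approximate self-hosted estimates listed above.

```python
# Preset prices in USD per 1M tokens, (input, output), as listed in the input hints.
PRESETS = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet": (3.00, 15.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "Llama 3 (self-hosted, approx.)": (0.20, 0.80),
}


def per_request_cost(input_tokens: float, output_tokens: float,
                     input_price: float, output_price: float) -> float:
    """Hypothetical helper: USD cost of one request with no caching applied."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Illustrative request shape (assumption): 1,000 input tokens, 300 output tokens.
for name, (inp, out) in PRESETS.items():
    ratio = out / inp
    cost = per_request_cost(1_000, 300, inp, out)
    print(f"{name}: output/input = {ratio:.0f}x, ${cost:.5f} per request")
# GPT-4o: output/input = 4x, $0.00550 per request
# Claude Sonnet: output/input = 5x, $0.00750 per request
# Gemini 1.5 Pro: output/input = 4x, $0.00275 per request
# Llama 3 (self-hosted, approx.): output/input = 4x, $0.00044 per request
```

All four presets fall within the 3–5× output-to-input ratio quoted for output tokens, and at this request mix output accounts for more than half of each preset's per-request cost.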
The verdict
We’ll send the detailed cost model and have an AI engineer review your token usage patterns, caching strategy, and model selection for your scale.
No spam. We’ll follow up only if it’s relevant.
How the math works.
Effective input tokens = avg input tokens × (1 − cache hit rate). Cached tokens are treated as free in this model; if your provider bills cached tokens at a discounted rate, the actual cost will be higher.
Cost per request = (effective input tokens × input price + output tokens × output price) ÷ 1,000,000, with both prices in USD per 1M tokens.
Daily cost = cost per request × requests per day. Monthly (30 days) and annual (365 days) costs scale linearly from the daily figure.
Most LLM providers price output tokens higher than input tokens because output is generated sequentially, one token at a time, while input tokens are processed in parallel. As a result, output can dominate cost even when responses are much shorter than prompts. Prompt caching and reducing output verbosity are usually the two highest-leverage levers for lowering per-request cost.
This model assumes flat daily volume; seasonal or growth-driven changes will shift the annual figure.
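The formulas above can be sketched in Python as follows. This is a minimal illustration, not the calculator's own code: the names `Workload`, `cost_per_request`, and `project` are hypothetical, and the cache hit rate is taken as a fraction (0.5) rather than a percentage (50%). The optional `cached_price_fraction` field is an assumed extension for providers that bill cached tokens at a discount; leaving it at 0.0 reproduces the free-cache assumption stated above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """The calculator's six inputs (field names are illustrative)."""
    requests_per_day: float
    avg_input_tokens: float
    avg_output_tokens: float
    cache_hit_rate: float       # fraction 0.0-1.0, e.g. 0.5 for 50%
    input_price_per_m: float    # USD per 1M input tokens
    output_price_per_m: float   # USD per 1M output tokens
    # Assumed extension: cached-token price as a fraction of the input price.
    # 0.0 means cached tokens are free, matching the model described above.
    cached_price_fraction: float = 0.0


def cost_per_request(w: Workload) -> float:
    """USD cost of one request under the effective-input-token formula."""
    if not 0.0 <= w.cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    uncached_tokens = w.avg_input_tokens * (1 - w.cache_hit_rate)
    cached_tokens = w.avg_input_tokens * w.cache_hit_rate
    billed_input = uncached_tokens + cached_tokens * w.cached_price_fraction
    return (billed_input * w.input_price_per_m
            + w.avg_output_tokens * w.output_price_per_m) / 1_000_000


def project(w: Workload) -> dict[str, float]:
    """Daily cost, then linear scaling to 30-day months and 365-day years."""
    daily = cost_per_request(w) * w.requests_per_day
    return {"daily": daily, "monthly": daily * 30, "annual": daily * 365}


# Illustrative workload (assumption): GPT-4o preset prices, 100,000 requests/day,
# 1,000 input tokens, 300 output tokens, 50% cache hit rate.
w = Workload(100_000, 1_000, 300, 0.5, 2.50, 10.00)
for period, usd in project(w).items():
    print(f"{period}: ${usd:,.2f}")
# daily: $425.00
# monthly: $12,750.00
# annual: $155,125.00
```

In this example, output tokens account for about 71% of the per-request cost ($0.00300 of $0.00425), which shows why trimming response length matters. Setting `cached_price_fraction=0.1` (cached reads billed at 10% of the input rate) raises the daily figure from $425.00 to $437.50.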
Logiciel Solutions · logiciel.io