Start free.
Upgrade when it matters.
The open-source layer gets you running in minutes. The Cloud layer makes it smart.
Open source
Free
Self-hosted - BYO provider keys
- Drop-in OpenAI SDK replacement
- Waterfall routing - free models first
- Ollama and vLLM (local models) support
- 10+ provider adapters built in
- Weekly model pool auto-discovery
- Self-hosted, BYO API keys
- MIT licensed - use it anywhere
Compare
| Feature | OSS | Cloud |
|---|---|---|
| OpenAI-compatible API | yes | yes |
| Groq, Gemini, Ollama support | yes | yes |
| Waterfall / fallback routing | yes | yes |
| BYO provider API keys | yes | - |
| Smart scoring engine | - | yes |
| Semantic response cache | - | yes |
| Console + cost analytics | - | yes |
| Weekly model auto-discovery | yes | yes |
| Hosted - no infra to run | - | yes |
Questions
What is the difference between OSS and Cloud?
The OSS version is a self-hosted proxy you run yourself using your own provider API keys. It handles routing, fallback, and provider abstraction. The Cloud version adds our optimization engine - the part that makes real cost decisions, caches intelligently, and learns over time. That part stays closed.
What does 'BYO keys' mean?
Bring Your Own Keys. With the OSS version you add your own API keys for OpenAI, Anthropic, Groq, Gemini, and any other providers you use to your environment. Tokaroo routes between them. You pay the providers directly - we do not sit in the middle of the money.
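In practice this means the proxy routes only to providers whose keys it finds in your environment. A minimal sketch, assuming conventional environment-variable names (the variable names and helper below are illustrative, not Tokaroo's documented config):

```python
import os

# Illustrative mapping - not Tokaroo's actual config format.
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

def available_providers(env=os.environ):
    """A provider is routable simply if its key is present in the environment."""
    return [name for name, var in PROVIDER_ENV_KEYS.items() if env.get(var)]
```

Set only the keys you have; routing adapts to whatever is available.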
How does Cloud billing work?
Cloud prices each request dynamically based on the tier you choose, the market baseline for that request, and how efficiently Tokaroo routed it. You pre-load credits, and each request deducts from your balance. No monthly seat fees - you only pay for what you use.
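The deduction model can be sketched as simple arithmetic. Everything below - the function name, the efficiency factor, the tier multiplier - is a hypothetical illustration of "baseline times routing efficiency times tier", not Tokaroo's actual pricing formula:

```python
def charge(balance, baseline_cost, efficiency, tier_multiplier):
    """Deduct one dynamically priced request from a prepaid credit balance.

    baseline_cost: market baseline for this request, in credits
    efficiency: 0..1, fraction saved by Tokaroo's routing (0 = no savings)
    tier_multiplier: pricing factor for the chosen tier
    Returns (new_balance, price_charged).
    """
    price = baseline_cost * (1.0 - efficiency) * tier_multiplier
    if price > balance:
        raise RuntimeError("insufficient credits")
    return balance - price, price
```

So a request with a 10-credit baseline, routed 50% more efficiently, on a 1.2x tier, costs 6 credits.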
Can I start on OSS and switch to Cloud?
Yes. The API is identical - the only change is one line pointing baseURL at our servers instead of yours. The rest of your code stays the same.
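With the OpenAI Python SDK, the switch looks like this (the URLs are illustrative placeholders - substitute your actual proxy address and Tokaroo key):

```python
from openai import OpenAI

# OSS: point the standard OpenAI client at your self-hosted proxy.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

# Cloud: same client, same code - only the base_url (and key) change.
# client = OpenAI(base_url="https://api.tokaroo.example/v1", api_key="tk-...")
```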
What's the 'optimization engine'?
It is the part we keep closed. First, a scoring model that decides which provider to use per request based on task type, cost, latency, and recent performance - and that gets smarter over time. Second, a semantic cache, which checks whether a semantically similar question was already answered and returns the cached result instead of paying for a new generation.
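The semantic-cache idea can be sketched in a few lines. This is a toy illustration, not Tokaroo's implementation: real systems use learned embedding models, whereas here a bag-of-words vector and cosine similarity stand in:

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []          # list of (embedding, answer)
        self.threshold = threshold

    def get(self, question):
        q = embed(question)
        for emb, answer in self.entries:
            if cosine(q, emb) >= self.threshold:
                return answer      # similar enough: skip a paid generation
        return None                # cache miss: pay for a fresh generation

    def put(self, question, answer):
        self.entries.append((embed(question), answer))
```

A rephrased question that lands above the similarity threshold returns the stored answer for free; an unrelated one falls through to a real provider call.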
Is the OSS version useful on its own?
Absolutely. Groq, Gemini Flash, and local models are free - and Tokaroo OSS will use them first. For many workloads this alone cuts AI costs dramatically. Cloud adds intelligence on top of that foundation.
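The free-first waterfall described above is conceptually simple: order providers so free tiers come first, take the first success, and fall back down the list on failure. A hedged sketch - the provider tuple shape and names are illustrative, not Tokaroo's adapter API:

```python
def waterfall(prompt, providers):
    """providers: list of (name, is_free, call) tuples.

    Free providers are tried first; paid ones are only reached
    when every free option has failed.
    """
    errors = []
    # Stable sort: free (is_free=True) providers first, original order kept.
    for name, is_free, call in sorted(providers, key=lambda p: not p[1]):
        try:
            return name, call(prompt)
        except Exception as exc:
            errors.append((name, exc))   # note the failure, keep cascading
    raise RuntimeError(f"all providers failed: {errors}")
```

If a free provider is rate-limited, the next free one is tried before any paid provider spends a cent.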