# API Reference

Base URL: `https://api.tokaroo.com`
## Quickstart
Tokaroo is an OpenAI-compatible API gateway that automatically routes every request to the cheapest model that meets your requirements.
### Install the SDK

```sh
npm install tokaroo
```
### Make your first request

```ts
import { Tokaroo } from "tokaroo";

const client = new Tokaroo({ apiKey: "tok_..." });

const res = await client.chat.completions.create({
  model: "auto", // Tokaroo picks the route
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(res.choices[0].message.content);
```
### Or swap one line in your existing OpenAI code

```ts
const openai = new OpenAI({
  baseURL: "https://api.tokaroo.com/v1",
  apiKey: process.env.TOKAROO_KEY,
});
// Everything else stays the same
```

## Chat completions
`POST /v1/chat/completions`

OpenAI-compatible. Supports streaming, tools, and vision.
```ts
// Non-streaming
const res = await client.chat.completions.create({
  model: "auto", // or any specific model
  messages: [{ role: "user", content: "Summarize this text: ..." }],
  max_tokens: 500,
  temperature: 0.7,
});

// Streaming
for await (const chunk of client.chat.completions.stream({
  model: "auto",
  messages: [{ role: "user", content: "Tell me a story" }],
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```
### `model` values

| Value | Behaviour | Pricing profile |
| --- | --- | --- |
| `"auto"` | Tokaroo owns all decisions: cheapest capable model, full caching | value-first |
| `"fast"` | Speed priority: low-latency tier | speed-first |
| `"max"` | Maximum capability: frontier models only, no shortcuts | premium capability |
### Response headers

| Header | Value |
| --- | --- |
| `x-tokaroo-cache` | `hit` or `miss` |
| `x-tokaroo-cost` | USD charged for this request |
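When calling the API with raw `fetch`, these headers can be read off the response. The helper below is a sketch, not part of the SDK; only the two header names come from the table above, everything else is illustrative:

```ts
// Hypothetical helper: read Tokaroo's metadata headers from a fetch() Response.
// Header names are from the table above; the return shape is our own.
function readTokarooHeaders(headers: Headers): { cache: "hit" | "miss" | null; costUsd: number | null } {
  const cache = headers.get("x-tokaroo-cache") as "hit" | "miss" | null;
  const cost = headers.get("x-tokaroo-cost");
  return { cache, costUsd: cost === null ? null : Number(cost) };
}

// Example with a locally constructed Headers object:
const meta = readTokarooHeaders(
  new Headers({ "x-tokaroo-cache": "hit", "x-tokaroo-cost": "0.00042" })
);
console.log(meta); // e.g. { cache: 'hit', costUsd: 0.00042 }
```

In practice you would pass `response.headers` from the `fetch` call instead of a constructed object.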
## Embeddings

`POST /v1/embeddings`
```ts
const res = await client.embeddings.create({
  input: "The quick brown fox",
  // model: "text-embedding-3-small" // optional
});
const vector = res.data[0].embedding; // number[]
```
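Embedding vectors are typically compared with cosine similarity. A self-contained helper (plain utility, not part of the Tokaroo SDK):

```ts
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero inputs.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
```

Two `vector` values returned by the endpoint above can be compared directly this way.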
Routes to OpenAI `text-embedding-3-small` (1536 dims) by default, with Google `text-embedding-004` as fallback.

## Models

`GET /v1/models`
Returns all available models with reference retail pricing.
```ts
const { data } = await client._fetch("GET", "/v1/models");
// [{ id: "gpt-4o", owned_by: "openai", pricing: { input_per_1m: 2.5, output_per_1m: 10 } }, ...]
```
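The `pricing` fields can be used to sort the pool client-side. A sketch over the response shape shown above; the model entries here are sample data, not live pricing:

```ts
// Typed view of one entry from GET /v1/models (shape from the comment above).
type ModelInfo = {
  id: string;
  owned_by: string;
  pricing: { input_per_1m: number; output_per_1m: number };
};

// Return a copy of the list ordered by input price, cheapest first.
function cheapestFirst(models: ModelInfo[]): ModelInfo[] {
  return [...models].sort((a, b) => a.pricing.input_per_1m - b.pricing.input_per_1m);
}

const sample: ModelInfo[] = [
  { id: "gpt-4o", owned_by: "openai", pricing: { input_per_1m: 2.5, output_per_1m: 10 } },
  { id: "gemini-2.5-flash", owned_by: "google", pricing: { input_per_1m: 0.3, output_per_1m: 2.5 } },
];
console.log(cheapestFirst(sample).map((m) => m.id)); // cheapest model id first
```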
The model pool is updated weekly by Tokaroo's market research job.

## Balance & billing

`GET /v1/balance`

```ts
const { balance_usd } = await client.balance.get();
```
### Add credits

Redirect the user to a Stripe Checkout session:
```ts
const res = await fetch("https://api.tokaroo.com/v1/billing/checkout", {
  method: "POST",
  headers: { Authorization: `Bearer ${key}`, "Content-Type": "application/json" },
  body: JSON.stringify({ amount_usd: 10 }),
});
const { url } = await res.json();
window.location.href = url; // Stripe-hosted checkout
```
### Billing model

You pre-load credits and pay per request. No subscriptions, no seat fees.

Tokaroo prices requests dynamically based on:

- the selected tier,
- the market baseline for that request,
- and how efficiently the router executed it.

The exact pricing function is internal.

## Usage history
`GET /v1/usage`

```ts
const { data } = await client.usage.list({ limit: 50 });
// [{
//   id, tier,       // "instant" | "standard" | "complex"
//   input_tokens,
//   output_tokens,
//   charged_usd,    // what you paid
//   latency_ms,
//   created_at
// }]
```
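Usage records can be aggregated client-side, for example to see spend per tier. A sketch using the field names from the response shape above; the records below are sample data:

```ts
// Typed view of one usage record (fields from the response sketch above).
type UsageRecord = {
  id: string;
  tier: "instant" | "standard" | "complex";
  input_tokens: number;
  output_tokens: number;
  charged_usd: number;
  latency_ms: number;
  created_at: string;
};

// Sum charged_usd per tier across a page of usage records.
function spendByTier(records: UsageRecord[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    totals[r.tier] = (totals[r.tier] ?? 0) + r.charged_usd;
  }
  return totals;
}

const sampleUsage: UsageRecord[] = [
  { id: "u1", tier: "instant", input_tokens: 120, output_tokens: 40, charged_usd: 0.001, latency_ms: 180, created_at: "2025-01-01T00:00:00Z" },
  { id: "u2", tier: "complex", input_tokens: 4000, output_tokens: 900, charged_usd: 0.05, latency_ms: 2200, created_at: "2025-01-01T00:01:00Z" },
  { id: "u3", tier: "instant", input_tokens: 80, output_tokens: 20, charged_usd: 0.001, latency_ms: 150, created_at: "2025-01-01T00:02:00Z" },
];
console.log(spendByTier(sampleUsage)); // totals keyed by tier
```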
Tokaroo never exposes which model or provider handled your request: that's the black box.
What you see is what matters: tokens consumed, cost, and latency.

## API keys
`POST /v1/keys`

```ts
const { key } = await client.keys.create("production");
// key is shown only once - save it
```

- `GET /v1/keys` - list keys (no secrets returned)
- `DELETE /v1/keys/:id` - revoke a key

## Local models (Ollama / vLLM)
Connect your own inference server:

```sh
# .env
LOCAL_AI_BASE_URL=http://localhost:11434/v1  # Ollama
LOCAL_AI_API_KEY=local                       # any string
LOCAL_DEFAULT_MODEL=llama3.2
```
Tokaroo treats local models as near-zero marginal cost and routes to them first when `model: "auto"` is set.
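For vLLM, the same variables point at its OpenAI-compatible server instead; the port below is vLLM's default, and the model name is an assumption (it should match whatever you pass to `--served-model-name`):

```sh
# .env - vLLM example (assumed defaults)
LOCAL_AI_BASE_URL=http://localhost:8000/v1
LOCAL_AI_API_KEY=local
LOCAL_DEFAULT_MODEL=llama3.2
```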
### Supported local runtimes

- Ollama (`/v1` OpenAI-compatible endpoint)
- vLLM (`--served-model-name` flag)
- LocalAI
- Any OpenAI-compatible server

## OpenClaw integration
[OpenClaw](https://openclaw.ai) is the fastest-growing open-source AI agent framework (200k+ GitHub stars).
Add Tokaroo as a provider in under a minute.
1. Add to your `openclaw.json` config:

```jsonc
// ~/.openclaw/openclaw.json
{
  models: {
    mode: "merge",
    providers: {
      tokaroo: {
        baseUrl: "https://api.tokaroo.com/v1",
        apiKey: "${TOKAROO_API_KEY}",
        api: "openai-completions",
        models: [
          { id: "auto", name: "Tokaroo Auto (best price)", contextWindow: 200000, maxTokens: 8192 },
          { id: "gpt-4.1", name: "GPT-4.1", cost: { input: 2.00, output: 8.00 } },
          { id: "claude-sonnet-4-6", name: "Claude Sonnet 4.6", cost: { input: 3.00, output: 15.00 } },
          { id: "gemini-2.5-flash", name: "Gemini 2.5 Flash", cost: { input: 0.30, output: 2.50 } },
        ]
      }
    }
  },
  agents: {
    defaults: { model: "tokaroo/auto" }
  }
}
```
2. Set your key:

```sh
export TOKAROO_API_KEY=tok_...
```

Every OpenClaw LLM call now routes through Tokaroo: cheapest capable model per request,
semantic cache, automatic fallback, and cost analytics.
### With local models (Ollama)

```jsonc
{
  models: {
    mode: "merge",
    providers: {
      tokaroo: {
        baseUrl: "https://api.tokaroo.com/v1",
        apiKey: "${TOKAROO_API_KEY}",
        api: "openai-completions",
        models: [{ id: "auto", name: "Tokaroo Auto" }]
      },
      local: {
        baseUrl: "http://localhost:11434/v1",
        apiKey: "local",
        api: "openai-completions",
        models: [{ id: "llama3.2", name: "Llama 3.2 (local)" }]
      }
    }
  },
  agents: { defaults: { model: "tokaroo/auto" } }
}
```