API Reference

Base URL: https://api.tokaroo.com

Quickstart

Tokaroo is an OpenAI-compatible API gateway that routes every request to the cheapest model that fits your requirements - automatically.

Install the SDK:

```bash
npm install tokaroo
```

Make your first request:

```typescript
import { Tokaroo } from "tokaroo";

const client = new Tokaroo({ apiKey: "tok_..." });

const res = await client.chat.completions.create({
  model: "auto",   // Tokaroo picks the route
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(res.choices[0].message.content);
```

Or swap one line in your existing OpenAI code:

```typescript
const openai = new OpenAI({
  baseURL: "https://api.tokaroo.com/v1",
  apiKey:  process.env.TOKAROO_KEY,
});
// Everything else stays the same
```

Chat completions

POST /v1/chat/completions

OpenAI-compatible. Supports streaming, tools, and vision.

```typescript
// Non-streaming
const res = await client.chat.completions.create({
  model: "auto",           // or any specific model
  messages: [{ role: "user", content: "Summarize this text: ..." }],
  max_tokens: 500,
  temperature: 0.7,
});

// Streaming
for await (const chunk of client.chat.completions.stream({
  model: "auto",
  messages: [{ role: "user", content: "Tell me a story" }],
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```
model values

| Value | Behaviour | Pricing profile |
| --- | --- | --- |
| `"auto"` | Tokaroo owns all decisions - cheapest capable model, full caching | value-first |
| `"fast"` | Speed priority - low-latency tier | speed-first |
| `"max"` | Maximum capability - frontier models only, no shortcuts | premium capability |
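For illustration, a caller might map task types onto these tiers before sending a request. `pickTier` and the task labels below are hypothetical, not part of the Tokaroo SDK:

```typescript
// Hypothetical helper: map a task type onto a Tokaroo `model` value.
type Tier = "auto" | "fast" | "max";

function pickTier(task: "autocomplete" | "chat" | "deep-analysis"): Tier {
  switch (task) {
    case "autocomplete":  return "fast"; // latency matters most
    case "deep-analysis": return "max";  // capability matters most
    default:              return "auto"; // let the router minimize cost
  }
}
```

The returned value is passed straight through as `model` in `chat.completions.create`.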
Response headers

| Header | Value |
| --- | --- |
| `x-tokaroo-cache` | `hit` or `miss` |
| `x-tokaroo-cost` | USD charged for this request |
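When calling the API with raw `fetch`, these headers can be read off the response object. `tokarooMeta` is a hypothetical convenience helper, not part of the SDK; `Headers` is the standard Fetch API class (built into Node 18+ and browsers):

```typescript
// Hypothetical helper: pull Tokaroo metadata out of a fetch Response's headers.
function tokarooMeta(headers: Headers): { cache: string | null; costUsd: number | null } {
  const cost = headers.get("x-tokaroo-cost");
  return {
    cache: headers.get("x-tokaroo-cache"),        // "hit" | "miss" | null
    costUsd: cost !== null ? Number(cost) : null, // parsed USD amount
  };
}
```

Usage: `const res = await fetch(url, opts); const { cache, costUsd } = tokarooMeta(res.headers);`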

Embeddings

POST /v1/embeddings

```typescript
const res = await client.embeddings.create({
  input: "The quick brown fox",
  // model: "text-embedding-3-small"  // optional
});

const vector = res.data[0].embedding; // number[]
```

Routes to OpenAI text-embedding-3-small (1536 dims) by default, falling back to Google text-embedding-004.
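A common next step with embeddings is comparing two vectors by cosine similarity. This plain TypeScript helper works on any pair of equal-length vectors, such as two `embedding` arrays returned above:

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```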

Models

GET /v1/models

Returns all available models with reference retail pricing.

```typescript
const { data } = await client._fetch("GET", "/v1/models");
// [{ id: "gpt-4o", owned_by: "openai", pricing: { input_per_1m: 2.5, output_per_1m: 10 } }, ...]
```

The model pool is updated weekly by Tokaroo's market research job.
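Since `/v1/models` returns retail pricing per million tokens, you can estimate what a given request would cost on a specific model. `estimateCostUsd` and the `ModelInfo` interface are illustrative, derived from the response shape shown above:

```typescript
// Shape of one entry from GET /v1/models, as shown in the example response.
interface ModelInfo {
  id: string;
  owned_by: string;
  pricing: { input_per_1m: number; output_per_1m: number };
}

// Hypothetical helper: estimate USD cost of a request at one model's retail rates.
function estimateCostUsd(m: ModelInfo, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * m.pricing.input_per_1m
       + (outputTokens / 1_000_000) * m.pricing.output_per_1m;
}
```

Note this is the reference retail price only; what Tokaroo actually charges is tier-dependent (see Balance & billing).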

Balance & billing

GET /v1/balance

```typescript
const { balance_usd } = await client.balance.get();
```

Add credits

Redirect the user to a Stripe Checkout session:

```typescript
const res = await fetch("https://api.tokaroo.com/v1/billing/checkout", {
  method: "POST",
  headers: { Authorization: `Bearer ${key}`, "Content-Type": "application/json" },
  body: JSON.stringify({ amount_usd: 10 }),
});
const { url } = await res.json();
window.location.href = url; // Stripe-hosted checkout
```

Billing model

You pre-load credits and pay per request. No subscriptions, no seat fees. Tokaroo prices requests dynamically based on:

- the selected tier,
- the market baseline for that request,
- and how efficiently the router executed it.

The exact pricing function is internal.

Usage history

GET /v1/usage

```typescript
const { data } = await client.usage.list({ limit: 50 });
// [{
//   id, tier,              // "instant" | "standard" | "complex"
//   input_tokens,
//   output_tokens,
//   charged_usd,           // what you paid
//   latency_ms,
//   created_at
// }]
```

Tokaroo never exposes which model or provider handled your request - that's the black box. What you see is what matters: tokens consumed, cost, and latency.
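Those usage records are easy to aggregate client-side. A sketch that totals spend per tier; the `UsageRecord` interface mirrors the fields shown above, while `spendByTier` is a hypothetical helper:

```typescript
// Shape of one record from GET /v1/usage, as documented above.
interface UsageRecord {
  id: string;
  tier: "instant" | "standard" | "complex";
  input_tokens: number;
  output_tokens: number;
  charged_usd: number;
  latency_ms: number;
  created_at: string;
}

// Hypothetical helper: sum charged_usd per tier across a page of records.
function spendByTier(records: UsageRecord[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    totals[r.tier] = (totals[r.tier] ?? 0) + r.charged_usd;
  }
  return totals;
}
```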

API keys

POST /v1/keys

```typescript
const { key } = await client.keys.create("production");
// key is shown only once - save it
```

GET /v1/keys - list keys (no secrets returned)
DELETE /v1/keys/:id - revoke a key
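Since the secret is shown only once, avoid writing it to logs verbatim. A small hypothetical masking helper (the `tok_` prefix shape is assumed from the examples above):

```typescript
// Hypothetical helper: mask an API key for safe logging.
// Keeps the prefix and last 4 characters, hides the rest.
function maskKey(key: string): string {
  if (key.length <= 8) return "****";
  return `${key.slice(0, 8)}...${key.slice(-4)}`;
}
```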

Local models (Ollama / vLLM)

Connect your own inference server:

```bash
# .env
LOCAL_AI_BASE_URL=http://localhost:11434/v1   # Ollama
LOCAL_AI_API_KEY=local                        # any string
LOCAL_DEFAULT_MODEL=llama3.2
```

Tokaroo treats local models as near-zero marginal cost and routes to them first when model: "auto".

Supported local runtimes:

- Ollama (/v1 OpenAI-compatible endpoint)
- vLLM (--served-model-name flag)
- LocalAI
- Any OpenAI-compatible server

OpenClaw integration

[OpenClaw](https://openclaw.ai) is the fastest-growing open-source AI agent framework (200k+ GitHub stars). Add Tokaroo as a provider in under a minute.

1. Add to your openclaw.json config

```jsonc
// ~/.openclaw/openclaw.json
{
  models: {
    mode: "merge",
    providers: {
      tokaroo: {
        baseUrl: "https://api.tokaroo.com/v1",
        apiKey: "${TOKAROO_API_KEY}",
        api: "openai-completions",
        models: [
          { id: "auto",               name: "Tokaroo Auto (best price)", contextWindow: 200000, maxTokens: 8192 },
          { id: "gpt-4.1",            name: "GPT-4.1",            cost: { input: 2.00, output: 8.00 } },
          { id: "claude-sonnet-4-6",  name: "Claude Sonnet 4.6",  cost: { input: 3.00, output: 15.00 } },
          { id: "gemini-2.5-flash",   name: "Gemini 2.5 Flash",   cost: { input: 0.30, output: 2.50 } },
        ]
      }
    }
  },
  agents: {
    defaults: { model: "tokaroo/auto" }
  }
}
```

2. Set your key

```bash
export TOKAROO_API_KEY=tok_...
```

Every OpenClaw LLM call now routes through Tokaroo - cheapest capable model per request, semantic cache, automatic fallback, and cost analytics.

With local models (Ollama)
```jsonc
{
  models: {
    mode: "merge",
    providers: {
      tokaroo: {
        baseUrl: "https://api.tokaroo.com/v1",
        apiKey: "${TOKAROO_API_KEY}",
        api: "openai-completions",
        models: [{ id: "auto", name: "Tokaroo Auto" }]
      },
      local: {
        baseUrl: "http://localhost:11434/v1",
        apiKey: "local",
        api: "openai-completions",
        models: [{ id: "llama3.2", name: "Llama 3.2 (local)" }]
      }
    }
  },
  agents: { defaults: { model: "tokaroo/auto" } }
}
```