Real advice for every tool your agent considers.
AI agents burn tokens retrying flaky, slow, or non-compliant tools. ToolRate delivers objective reliability ratings and smart recommendations from thousands of real agent executions in production.
Know before you call.
Agents burn cycles on failing tools
Stripe times out. LemonSqueezy rejects auth. PayPal finally works. Three attempts, wasted tokens, degraded UX — and no record of why any of it happened.
One assessment before every call
ToolRate scores every tool in real time from the collective experience of thousands of production agents. Pick the best option first, fall back intelligently. Every decision is logged with a confidence score attached.
Jurisdiction, Made Visible
Exclusive to ToolRate
Every tool in ToolRate comes with clear, real-world jurisdiction data: hosting location, GDPR risk level, and confidence score. That way your agents decide based on facts, not assumptions.
- Reliability first — every tool is scored neutrally, with no geographic penalty.
- GDPR risk made explicit — clear signals for data-residency compliance when it matters.
- Region as a choice, never a default — EU, US, and global tools ranked by performance.
- Your rules, your ranking: pass preferences once via the SDK and ToolRate weighs every recommendation against your policy (e.g. “prefer EU for European data” or “optimize for lowest latency worldwide”).
From San Francisco to Berlin to Singapore, every agent builder gets the same transparent view — with full control to match how you actually build.
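As a rough illustration of "region as a choice, never a default", the sketch below ranks candidates by reliability alone and applies a regional bonus only when the caller asks for one. The `Candidate` fields, the `rank` function, and the bonus value are assumptions made for this example. They are not ToolRate's actual scoring or SDK surface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    reliability: float  # 0-100, higher is better
    region: str         # e.g. "EU", "US", "global"
    latency_ms: float


def rank(candidates, preferred_region=None, region_bonus=5.0):
    """Sort by reliability; add a bonus only if a region was requested."""
    def score(c):
        matches = preferred_region is not None and c.region == preferred_region
        return c.reliability + (region_bonus if matches else 0.0)

    # Higher score first; ties are broken by lower latency.
    return sorted(candidates, key=lambda c: (-score(c), c.latency_ms))


# Hypothetical tools and scores, for illustration only.
tools = [
    Candidate("tool-a", reliability=94.0, region="US", latency_ms=300),
    Candidate("tool-b", reliability=91.0, region="EU", latency_ms=350),
]

neutral = [c.name for c in rank(tools)]
eu_first = [c.name for c in rank(tools, preferred_region="EU")]
print(neutral)   # no policy: ordered purely by reliability
print(eu_first)  # EU policy: the EU tool's bonus lifts it to the top
```

With no policy the ranking is geographically neutral. The preference changes the order only when it is stated explicitly.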
Install ToolRate
Beginner-friendly in two commands. Works on every platform — no PEP 668 drama, no virtualenv archaeology.
    # Install uv (one-time)
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # Add ToolRate to your project
    uv add toolrate
    # Or install with pip inside a virtual environment
    python3 -m venv .venv
    source .venv/bin/activate
    pip install toolrate
If plain pip fails with a PEP 668 “externally-managed-environment” error, the usual cause is an externally managed Python such as Homebrew's. Use one of the methods above instead.
For TypeScript / Node 18+: npm install toolrate
Three lines to get started
    # Python. `stripe` and `lemon` stand for your own, already-configured payment clients.
    from toolrate import ToolRate, guard

    client = ToolRate("nf_live_...")

    # Check reliability before calling
    score = client.assess("https://api.stripe.com/v1/charges")
    # => { reliability_score: 94.2, failure_risk: "low", ... }

    # Or use guard() for auto-fallback
    result = guard(
        client,
        "https://api.stripe.com/v1/charges",
        lambda: stripe.Charge.create(...),
        fallbacks=[
            ("https://api.lemonsqueezy.com/v1/checkouts", lambda: lemon.create_checkout(...)),
        ],
    )
    // TypeScript. `stripe` and `lemon` stand for your own, already-configured payment clients.
    import { ToolRate } from "toolrate";

    const client = new ToolRate("nf_live_...");

    // Check reliability before calling
    const score = await client.assess("https://api.stripe.com/v1/charges");

    // Or use guard() for auto-fallback
    const result = await client.guard(
      "https://api.stripe.com/v1/charges",
      () => stripe.charges.create({...}),
      {
        fallbacks: [
          ["https://api.lemonsqueezy.com/v1/checkouts", () => lemon.createCheckout({...})],
        ],
      }
    );
    # Assess a tool
    curl -X POST https://api.toolrate.ai/v1/assess \
      -H "X-Api-Key: nf_live_..." \
      -H "Content-Type: application/json" \
      -d '{"tool_identifier": "https://api.stripe.com/v1/charges"}'

    # Report a result
    curl -X POST https://api.toolrate.ai/v1/report \
      -H "X-Api-Key: nf_live_..." \
      -H "Content-Type: application/json" \
      -d '{"tool_identifier": "https://api.stripe.com/v1/charges", "success": true, "latency_ms": 420}'
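For agents that cannot use the SDK, the report call from the curl example can be mirrored with the Python standard library alone. This is a minimal sketch: the URL, header names, and JSON fields are copied from the curl example, while the helper name `build_report_request` is hypothetical. The response format is not documented here, so the sketch only builds the request and leaves sending it to the caller.

```python
import json
import urllib.request

API_BASE = "https://api.toolrate.ai/v1"  # taken from the curl examples


def build_report_request(api_key, tool_identifier, success, latency_ms):
    """Build (but do not send) a POST to /v1/report."""
    body = json.dumps({
        "tool_identifier": tool_identifier,
        "success": success,
        "latency_ms": latency_ms,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/report",
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_report_request(
    "nf_live_...", "https://api.stripe.com/v1/charges", True, 420
)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```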
Built for production agents
Reliability intelligence for the developers, enterprises, and agents running production AI workloads.
Reliability Scoring
Real-world success rates, common failure modes, and recommended mitigations — so agents know exactly how much to trust the tool, and auditors know precisely how the score was calculated.
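To make "real-world success rates" concrete, here is a deliberately simple sketch that summarizes a batch of reports shaped like the report payload (`success`, `latency_ms`). It assumes a plain success ratio and a median latency. ToolRate's actual scoring method is not described here and is likely more involved.

```python
from statistics import median


def summarize(reports):
    """Return the success rate (0-100) and median latency of successful calls."""
    if not reports:
        raise ValueError("no reports to summarize")
    successes = [r for r in reports if r["success"]]
    rate = 100.0 * len(successes) / len(reports)
    latency = median(r["latency_ms"] for r in successes) if successes else None
    return {"success_rate": rate, "median_latency_ms": latency}


# Hypothetical sample data, for illustration only.
reports = [
    {"success": True, "latency_ms": 420},
    {"success": True, "latency_ms": 380},
    {"success": False, "latency_ms": 5000},
    {"success": True, "latency_ms": 450},
]
summary = summarize(reports)
print(summary)
```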
One-Line Guard
result = toolrate.guard(tool="stripe/charges", context=plan)
Zero branching logic. Zero retry boilerplate. Production-ready in one line.
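For a sense of the boilerplate a guard call replaces, the sketch below hand-rolls the same idea: try the primary tool, fall through the alternatives in order, and record each outcome. The names `guarded_call` and `on_result` are hypothetical and are not part of the ToolRate SDK.

```python
import time


def guarded_call(primary, fallbacks=(), on_result=None):
    """Try `primary`, then each fallback in order; return the first success.

    Each entry is a (name, zero-argument callable) pair.
    """
    failed = []
    for name, fn in [primary, *fallbacks]:
        start = time.monotonic()
        try:
            result = fn()
            ok = True
        except Exception:  # a real wrapper would likely catch narrower errors
            failed.append(name)
            ok = False
        if on_result is not None:
            # Report the tool name, the outcome, and the elapsed milliseconds.
            on_result(name, ok, (time.monotonic() - start) * 1000)
        if ok:
            return result
    raise RuntimeError(f"all tools failed: {failed}")


def flaky_primary():
    raise TimeoutError("primary timed out")


log = []
result = guarded_call(
    ("primary", flaky_primary),
    fallbacks=[("backup", lambda: "charged")],
    on_result=lambda name, ok, ms: log.append((name, ok)),
)
print(result, log)
```

The `on_result` hook is where a reporting call would go, so every attempt, failed or not, feeds back into the scores.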
Hidden Gems
The tools nobody pitches but production agents quietly rely on — surfaced from real fallback patterns across thousands of sessions and ranked by recovery rate.
Fallback Chains
When OpenAI, Stripe, or SendGrid drops, what do production agents actually switch to? Live journey data, ranked by downstream completion rate.
Reliability Webhooks
Get paged the moment a tool's reliability crosses a threshold you define. HMAC-signed, per-tool, exponential-backoff delivery — wired into PagerDuty or Slack in seconds.
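On the receiving side, an HMAC-signed webhook should be verified before it is trusted. The sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded signature. The exact algorithm, encoding, and header name ToolRate uses are not stated here, so check them before relying on this.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(sign(secret, body), signature_hex)


# Hypothetical secret and payload, for illustration only.
secret = b"whsec_example"
body = b'{"tool_identifier": "https://api.stripe.com/v1/charges"}'
signature = sign(secret, body)

print(is_valid(secret, body, signature))         # untouched payload
print(is_valid(secret, body + b" ", signature))  # tampered payload
```

Verify against the raw bytes as received. Re-serializing parsed JSON can change whitespace or key order and break the comparison.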
MCP Server
Drop ToolRate into Claude Code, Cursor, or Zed in one line — npx -y @toolrate/mcp-server or uvx toolrate-mcp. Nine native tools — assess, route_llm, report, fallback chains — live on npm and PyPI.
LLM Router — one call, the right model.
The ToolRate LLM Router picks the optimal model for each task — combining real-time reliability, exact per-token pricing, and latency awareness across all major providers, plus Ollama for local and free.
    # Tell ToolRate your constraints, get the right model back.
    result = client.assess(
        tool_identifier="https://api.anthropic.com/v1/messages",
        task_complexity="low",
        expected_tokens=500,
        max_price_per_call=0.01,
        budget_strategy="cost_first",
    )
    # → recommended_model: "claude-haiku-4-5"
    # → price_per_call: $0.00152 (exact per-token math)
    # → within_budget: true
    # → reasoning: "Anthropic Messages scored 91.7/100
    #    for reliability (low risk). Recommended
    #    model: claude-haiku-4-5. Cost: $0.0015/call.
    #    Typical latency ~500ms. Strategy: cost-first;
    #    task complexity: low. Fits within your budget."
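The per-token arithmetic behind a figure like price_per_call can be sketched as follows. The input/output split of the 500 expected tokens and the per-million-token prices are placeholders chosen for the example, not any provider's real pricing, and the function name is hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def price_per_call(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Cost of one call, given prices quoted per million tokens."""
    return (
        Decimal(input_tokens) * Decimal(input_per_mtok)
        + Decimal(output_tokens) * Decimal(output_per_mtok)
    ) / MILLION


# Placeholder numbers: 400 input + 100 output tokens,
# $2.00 per million input tokens and $8.00 per million output tokens.
cost = price_per_call(400, 100, "2.00", "8.00")
within_budget = cost <= Decimal("0.01")
print(cost, within_budget)
```

Using `Decimal` instead of floats keeps sub-cent amounts exact, which matters once thousands of calls are summed.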
What you get per call
- Exact per-million-token cost at your expected volume
- Specific model inside each provider (Haiku for low, Opus for reasoning)
- Human-readable reasoning string, ready to drop into your logs
- Over-budget tools flagged, never silently filtered
- Drop-in LLMRouter class with automatic fallback cascade
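"Flagged, never silently filtered" can be pictured as annotating candidates instead of dropping them. In this sketch the dictionary keys, model names, and prices are hypothetical. Only the `within_budget` and `price_per_call` field names echo the example response.

```python
def annotate(candidates, max_price_per_call):
    """Keep every candidate; mark the ones that exceed the budget."""
    return [
        {**c, "within_budget": c["price_per_call"] <= max_price_per_call}
        for c in candidates
    ]


# Hypothetical models and prices, for illustration only.
candidates = [
    {"model": "model-large", "price_per_call": 0.03},
    {"model": "model-small", "price_per_call": 0.002},
]
flagged = annotate(candidates, max_price_per_call=0.01)

# A cost-first cascade: the in-budget models, cheapest first.
cascade = [
    c["model"]
    for c in sorted(flagged, key=lambda c: c["price_per_call"])
    if c["within_budget"]
]
print(flagged)
print(cascade)
```

Because the over-budget entry stays in the list, a caller can still see what was excluded and why, or decide to raise the budget.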
Pricing that scales with your agents
Start free. Scale with pay-as-you-go. Flat-rate when you need it. See all plans →
For testing and side projects
- 100 assessments / day
- Public data pool
- Python & TypeScript SDKs
- Standard support
Best for autonomous agents and bots
- First 100 / day free
- $0.008 per assessment after
- No monthly commitment
- Webhook alerts included
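The pay-as-you-go arithmetic is simple enough to estimate up front. This sketch assumes the 100 free assessments apply per day and that unused ones do not carry over. The plan text does not spell that out.

```python
from decimal import Decimal

FREE_PER_DAY = 100
PRICE_PER_ASSESSMENT = Decimal("0.008")  # USD, after the free allowance


def daily_cost(assessments: int) -> Decimal:
    """Cost in USD for one day's assessments."""
    billable = max(0, assessments - FREE_PER_DAY)
    return billable * PRICE_PER_ASSESSMENT


print(daily_cost(80))    # under the free allowance
print(daily_cost(1000))  # 900 billable assessments
```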
Flat rate for heavy usage
- 10,000 assessments / month
- Priority support
- Higher rate limits
- Webhook alerts
Building an AI platform? Talk to sales about Enterprise →