Real advice for every tool your agent considers.

AI agents burn tokens retrying flaky, slow, or non-compliant tools. ToolRate delivers objective reliability ratings and smart recommendations from thousands of real agent executions in production.

Know before you call.

832 Tools Rated
100K Data Points
<8ms Avg Response
10 LLM Sources

The Problem

Agents burn cycles on failing tools

Stripe times out. LemonSqueezy rejects auth. PayPal finally works. Three attempts, wasted tokens, degraded UX — and no record of why any of it happened.

The Solution

One assessment before every call

ToolRate scores every tool in real time from the collective experience of thousands of production agents. Pick the best option first, fall back intelligently. Every decision is logged with a confidence score attached.
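
The "best option first, fall back intelligently" flow can be sketched in plain Python. This is a minimal illustration, not the ToolRate SDK: the `call_best_first` helper, the tool names, and the scores are all hypothetical stand-ins for what an assessment call would return.

```python
from typing import Callable, Dict, List, Tuple


def call_best_first(
    candidates: Dict[str, Callable[[], object]],
    scores: Dict[str, float],
) -> Tuple[object, List[dict]]:
    """Try tools in descending score order; return the first result plus a decision log."""
    log: List[dict] = []
    ordered = sorted(candidates, key=lambda name: scores.get(name, 0.0), reverse=True)
    for name in ordered:
        score = scores.get(name, 0.0)
        try:
            result = candidates[name]()
        except Exception as exc:  # a failed tool moves us to the next candidate
            log.append({"tool": name, "score": score, "ok": False, "error": str(exc)})
            continue
        log.append({"tool": name, "score": score, "ok": True, "error": None})
        return result, log
    raise RuntimeError("all candidate tools failed")


def flaky_charge() -> str:
    """Simulates a tool that times out."""
    raise TimeoutError("timed out")


# Hypothetical scores; in practice these would come from an assessment call.
scores = {"stripe": 94.2, "lemonsqueezy": 88.0, "paypal": 81.5}
result, log = call_best_first(
    {
        "stripe": flaky_charge,
        "lemonsqueezy": lambda: "checkout-ok",
        "paypal": lambda: "paid",
    },
    scores,
)
print(result)
for entry in log:
    print(entry)
```

Because every attempt is appended to `log` with its score, the record of why a fallback happened survives the call.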

Global Compliance Layer
🌍

Jurisdiction, Made Visible

Exclusive to ToolRate

Every tool in ToolRate comes with clear, real-world jurisdiction data — hosting location, GDPR risk level, and confidence score. So your agents decide based on facts, not assumptions.

  • Reliability first — every tool is scored neutrally, with no geographic penalty.
  • GDPR risk made explicit — clear signals for data-residency compliance when it matters.
  • Region as a choice, never a default — EU, US, and global tools ranked by performance.
  • Your rules, your ranking — pass preferences once via the SDK and ToolRate weighs every recommendation against your policy (e.g. “prefer EU for European data” or “optimize for lowest latency worldwide”).

From San Francisco to Berlin to Singapore, every agent builder gets the same transparent view — with full control to match how you actually build.
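
One way to picture "region as a choice, never a default" is a ranking that stays neutral until a preference is passed in. The sketch below is an assumption-laden illustration: `ToolInfo`, `rank_tools`, the `region_bonus` value, and the sample tools are hypothetical and not part of the ToolRate SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToolInfo:
    name: str
    reliability: float  # 0-100 reliability score
    region: str         # hosting location, e.g. "EU" or "US"
    gdpr_risk: str      # "low", "medium", or "high"


def rank_tools(
    tools: List[ToolInfo],
    prefer_region: Optional[str] = None,
    region_bonus: float = 5.0,
) -> List[ToolInfo]:
    """Rank by reliability alone unless the caller states a region preference."""

    def key(tool: ToolInfo) -> float:
        preferred = prefer_region is not None and tool.region == prefer_region
        return tool.reliability + (region_bonus if preferred else 0.0)

    return sorted(tools, key=key, reverse=True)


tools = [
    ToolInfo("us-pay", 95.0, "US", "medium"),
    ToolInfo("eu-pay", 92.0, "EU", "low"),
    ToolInfo("global-pay", 85.0, "GLOBAL", "medium"),
]

neutral = [t.name for t in rank_tools(tools)]
eu_first = [t.name for t in rank_tools(tools, prefer_region="EU")]
print(neutral)
print(eu_first)
```

With no preference the order follows reliability only; the bonus applies solely when the caller opts in.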

Install ToolRate

Beginner-friendly in two commands. Works on every platform — no PEP 668 drama, no virtualenv archaeology.

Alternative: without uv
python3 -m venv .venv
source .venv/bin/activate
pip install toolrate
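Note: If plain pip fails with a PEP 668 “externally-managed-environment” error, your Python installation is marked as externally managed. This is common with Homebrew Python and with recent Debian and Ubuntu system Pythons. Use one of the methods above instead.

For TypeScript / Node 18+: npm install toolrate.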

Three lines to get started

Python
from toolrate import ToolRate, guard

client = ToolRate("nf_live_...")

# Check reliability before calling
score = client.assess("https://api.stripe.com/v1/charges")
# => { reliability_score: 94.2, failure_risk: "low", ... }

# Or use guard() for auto-fallback
result = guard(client, "https://api.stripe.com/v1/charges",
               lambda: stripe.Charge.create(...),
               fallbacks=[
                   ("https://api.lemonsqueezy.com/v1/checkouts",
                    lambda: lemon.create_checkout(...)),
               ])

TypeScript
import { ToolRate } from "toolrate";

const client = new ToolRate("nf_live_...");

// Check reliability before calling
const score = await client.assess("https://api.stripe.com/v1/charges");

// Or use guard() for auto-fallback
const result = await client.guard(
  "https://api.stripe.com/v1/charges",
  () => stripe.charges.create({...}),
  { fallbacks: [
    ["https://api.lemonsqueezy.com/v1/checkouts",
     () => lemon.createCheckout({...})],
  ]}
);

cURL
# Assess a tool
curl -X POST https://api.toolrate.ai/v1/assess \
  -H "X-Api-Key: nf_live_..." \
  -H "Content-Type: application/json" \
  -d '{"tool_identifier": "https://api.stripe.com/v1/charges"}'

# Report a result
curl -X POST https://api.toolrate.ai/v1/report \
  -H "X-Api-Key: nf_live_..." \
  -H "Content-Type: application/json" \
  -d '{"tool_identifier": "https://api.stripe.com/v1/charges",
    "success": true, "latency_ms": 420}'

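The report call above takes a tool identifier, a success flag, and a latency in milliseconds. A small wrapper can produce that payload while running the tool. This is a sketch only: `run_and_build_report` is a hypothetical helper, it builds the JSON body shown in the cURL example, and it leaves the actual HTTP request to you.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


def run_and_build_report(
    tool_identifier: str,
    call: Callable[[], Any],
) -> Tuple[Any, Dict[str, Any]]:
    """Run a tool call, time it, and build a report payload for it."""
    start = time.perf_counter()
    try:
        result = call()
        success = True
    except Exception:
        # The failure is recorded in the payload; re-raise here if callers need it.
        result = None
        success = False
    latency_ms = int((time.perf_counter() - start) * 1000)
    payload = {
        "tool_identifier": tool_identifier,
        "success": success,
        "latency_ms": latency_ms,
    }
    return result, payload


result, payload = run_and_build_report(
    "https://api.stripe.com/v1/charges",
    lambda: {"id": "ch_example"},  # stand-in for a real API call
)
print(json.dumps(payload))
```
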
Built for production agents

Reliability intelligence for the developers, enterprises, and agents running production AI workloads.

01

Reliability Scoring

Real-world success rates, common failure modes, and recommended mitigations — so agents know exactly how much to trust the tool, and auditors know precisely how the score was calculated.

02

One-Line Guard

result = toolrate.guard(tool="stripe/charges", context=plan)

Zero branching logic. Zero retry boilerplate. Production-ready in one line.

03

Hidden Gems

The tools nobody pitches but production agents quietly rely on — surfaced from real fallback patterns across thousands of sessions and ranked by recovery rate.

04

Fallback Chains

When OpenAI, Stripe, or SendGrid drops, what do production agents actually switch to? Live journey data, ranked by downstream completion rate.

05

Reliability Webhooks

Get paged the moment a tool's reliability crosses a threshold you define. HMAC-signed, per-tool, exponential-backoff delivery — wired into PagerDuty or Slack in seconds.
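
On the receiving side, an HMAC-signed webhook is usually checked by recomputing the signature over the raw body. The sketch below assumes HMAC-SHA256 with a hex-encoded signature, which this page does not specify, so check the ToolRate docs for the real scheme. `sign`, `verify`, and `backoff_delays` are hypothetical helper names.

```python
import hashlib
import hmac
from typing import List


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0) -> List[float]:
    """Delay in seconds before each retry under exponential backoff."""
    return [base * factor ** i for i in range(attempts)]


secret = b"whsec_example"  # placeholder secret
body = b'{"tool": "https://api.stripe.com/v1/charges", "reliability_score": 71.0}'
signature = sign(secret, body)

print(verify(secret, body, signature))
print(verify(secret, body + b" ", signature))
print(backoff_delays(4))
```

Verify against the exact bytes received, before any JSON parsing, since re-serialising the body can change it and break the signature.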

06

MCP Server

Drop ToolRate into Claude Code, Cursor, or Zed in one line: npx -y @toolrate/mcp-server or uvx toolrate-mcp. The server exposes nine native tools, including assess, route_llm, report, and fallback chains, and is published on npm and PyPI.

NEW · LLM Router

LLM Router — one call, the right model.

The ToolRate LLM Router picks the optimal model for each task — combining real-time reliability, exact per-token pricing, and latency awareness across all major providers, plus Ollama for local and free.

reliability_first  80 / 20
balanced  55 / 45
cost_first  25 / 75
speed_first  35 / 45 / 20

# Tell ToolRate your constraints, get the right model back.
result = client.assess(
  tool_identifier="https://api.anthropic.com/v1/messages",
  task_complexity="low",
  expected_tokens=500,
  max_price_per_call=0.01,
  budget_strategy="cost_first",
)

# → recommended_model: "claude-haiku-4-5"
# → price_per_call:    $0.00152  (exact per-token math)
# → within_budget:     true
# → reasoning:         "Anthropic Messages scored 91.7/100
#                       for reliability (low risk). Recommended
#                       model: claude-haiku-4-5. Cost: $0.0015/call.
#                       Typical latency ~500ms. Strategy: cost-first;
#                       task complexity: low. Fits within your budget."
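
The `price_per_call` and `within_budget` fields come down to simple per-token arithmetic. Here is a minimal sketch of that math. The input/output token split and the per-million-token prices are made-up values for illustration, not ToolRate's or any provider's actual numbers.

```python
def price_per_call(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost in USD for one call, given prices quoted per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Hypothetical: 500 expected tokens split 300 in / 200 out,
# priced at $1.00 (input) and $5.00 (output) per million tokens.
cost = price_per_call(300, 200, 1.00, 5.00)
max_price_per_call = 0.01
within_budget = cost <= max_price_per_call

print(f"{cost:.5f}")   # 0.00130
print(within_budget)   # True
```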

What you get per call

  • Exact per-million-token cost at your expected volume
  • Specific model inside each provider (e.g. Haiku for low-complexity tasks, Opus for reasoning-heavy ones)
  • Human-readable reasoning string — drop it in your logs
  • Over-budget tools flagged, never silently filtered
  • Drop-in LLMRouter class with automatic fallback cascade
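
To make "flagged, never silently filtered" concrete, here is a sketch of a ranking that blends reliability with cost and marks over-budget models instead of dropping them. It assumes the two-number strategy rows listed earlier (for example reliability_first 80 / 20) are reliability and cost weights, which the page does not state. The three-number speed_first row is left out for the same reason. `rank_models`, the cost normalisation, and the sample models are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed meaning: (reliability weight, cost weight).
STRATEGY_WEIGHTS: Dict[str, Tuple[float, float]] = {
    "reliability_first": (0.80, 0.20),
    "balanced": (0.55, 0.45),
    "cost_first": (0.25, 0.75),
}


def rank_models(candidates: List[dict], strategy: str, max_price_per_call: float) -> List[dict]:
    """Rank every candidate by blended score; over-budget ones are flagged, not removed."""
    w_reliability, w_cost = STRATEGY_WEIGHTS[strategy]
    most_expensive = max(c["price_per_call"] for c in candidates)
    ranked = []
    for c in candidates:
        # Cost score: 100 for a free model, 0 for the most expensive candidate.
        if most_expensive > 0:
            cost_score = 100.0 * (1.0 - c["price_per_call"] / most_expensive)
        else:
            cost_score = 100.0
        ranked.append({
            **c,
            "blended_score": w_reliability * c["reliability"] + w_cost * cost_score,
            "within_budget": c["price_per_call"] <= max_price_per_call,
        })
    return sorted(ranked, key=lambda c: c["blended_score"], reverse=True)


models = [
    {"name": "model-a", "reliability": 95.0, "price_per_call": 0.02},
    {"name": "model-b", "reliability": 90.0, "price_per_call": 0.002},
    {"name": "model-c", "reliability": 60.0, "price_per_call": 0.001},
]

by_reliability = rank_models(models, "reliability_first", max_price_per_call=0.01)
by_cost = rank_models(models, "cost_first", max_price_per_call=0.01)
print([m["name"] for m in by_reliability])
print([m["name"] for m in by_cost])
print([m["name"] for m in by_cost if not m["within_budget"]])
```

Keeping the over-budget entry in the list lets the caller see what was traded away, rather than guessing why a model never appeared.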

anthropic, openai, groq, together, mistral, deepseek, ollama (local, free)

Pricing that scales with your agents

Start free. Scale with pay-as-you-go. Flat-rate when you need it. See all plans →

Free
$0 / forever

For testing and side projects

  • 100 assessments / day
  • Public data pool
  • Python & TypeScript SDKs
  • Standard support
Create Free Key

Pro
$29 / month

Flat rate for heavy usage

  • 10,000 assessments / month
  • Priority support
  • Higher rate limits
  • Webhook alerts
Upgrade to Pro

Building an AI platform? Talk to sales about Enterprise →