
How we review AI tools at ToolDirectory.AI

By Jake Snider

Lead AI Reviewer · 2026-05-17 · 4 min read


Most AI directories list every tool at five stars. We don't. This post explains how we actually rate the 2,000+ products we cover, what makes us de-rate or unpublish a tool, and why the Graveyard page exists.

If you're a buyer using us to make a decision, this is what's behind the rating you see on every page.

The principle: every rating has evidence behind it

Every star rating in our directory should be defensible in one sentence. Concretely, we want to be able to point to at least one of the following:

G2 review count and average (the standard for B2B SaaS credibility)
Funding round and stage (Crunchbase-verifiable)
Public-company status (NASDAQ / NYSE / LSE ticker)
Named enterprise customers (logos on the website, case studies, press)
Trustpilot or App Store score (for consumer tools)
Gartner Magic Quadrant position (for enterprise categories)
News-confirmed acquisition or shutdown (for lifecycle calls)

We've run this evidence pass explicitly on roughly half the catalog, and it is the standing bar for every new addition. If we can't cite at least one of these signals, the rating doesn't go above the floor.
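The floor rule above amounts to a simple gate: no citation, no rating above the floor. The sketch below shows the shape of that check only. The numeric floor of 2.0, the `Evidence` and `Listing` types, and the `publishable_rating` helper are all hypothetical (the post doesn't say what number "the floor" is), and this is not our internal tooling.

```python
from dataclasses import dataclass, field

# Hypothetical value: the actual floor is not specified in this post.
RATING_FLOOR = 2.0


@dataclass
class Evidence:
    kind: str    # hypothetical labels, e.g. "g2", "funding_round", "public_ticker"
    source: str  # the citation the rating can point to (URL, filing, press piece)


@dataclass
class Listing:
    name: str
    proposed_rating: float
    evidence: list[Evidence] = field(default_factory=list)


def publishable_rating(listing: Listing, floor: float = RATING_FLOOR) -> float:
    """Return the rating we would display: capped at the floor unless cited."""
    has_citation = any(e.source.strip() for e in listing.evidence)
    if has_citation:
        return listing.proposed_rating
    # No citable evidence: the rating may sit at or below the floor, never above.
    return min(listing.proposed_rating, floor)


uncited = Listing("Example Tool", 4.5)
cited = Listing("Example Tool", 4.5, [Evidence("g2", "https://example.com/g2-profile")])
print(publishable_rating(uncited))  # 2.0
print(publishable_rating(cited))    # 4.5
```

A blank citation string counts as no evidence, so a placeholder entry can't lift a rating off the floor.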

The five things we weigh

When we pick the Top 100, or when we write a Best AI Collection, we score tools against these criteria, weighted roughly in this order:

Real production deployments at named teams. Marketing copy is cheap. Customer logos and case studies aren't. We weight this heaviest.
Independent benchmarks. Where they exist (LMSYS for chat models, SWE-bench for coding agents, etc.), we cite them.
Funding and customer signals. A unicorn round closed last quarter is meaningful. So is being a public company. So is "30,000 paying customers" if it's verifiable.
Pricing transparency. Tools that publish actual prices rank above tools that hide behind "Contact sales," because buyers are the people this directory is meant to help.
2026 currency. A product that hasn't shipped an update in 18 months gets demoted, even if the brand is famous.
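The ordering above can be illustrated as a weighted score. This is a minimal sketch, not our actual scoring code: the weights, the 0-to-1 criterion scores, the `editorial_score` function, and the halving penalty for stale products are hypothetical. Only the ordering (production deployments heaviest) and the 18-month threshold come from this post.

```python
# Hypothetical weights, heaviest first, following the order of the criteria list.
# The real weights are not published; these are illustrative only.
WEIGHTS: dict[str, float] = {
    "production_deployments": 0.40,
    "independent_benchmarks": 0.25,
    "funding_and_customers": 0.20,
    "pricing_transparency": 0.15,
}
STALE_AFTER_MONTHS = 18  # from the post: no shipping in 18 months triggers a demotion
STALE_PENALTY = 0.5      # hypothetical multiplier for stale products


def editorial_score(scores: dict[str, float], months_since_last_ship: int) -> float:
    """Combine per-criterion scores (each 0..1) into one editorial score.

    The "2026 currency" criterion is modelled as a multiplier rather than a
    weight, so a famous but stale product is demoted regardless of how well
    it scores elsewhere.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    total = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    if months_since_last_ship >= STALE_AFTER_MONTHS:
        total *= STALE_PENALTY
    return round(total, 4)


strong = {name: 1.0 for name in WEIGHTS}
print(editorial_score(strong, months_since_last_ship=3))   # 1.0
print(editorial_score(strong, months_since_last_ship=20))  # 0.5
```

Treating currency as a multiplier rather than a fifth weight is a design choice for the sketch: a small weight would let a stale but well-known product coast on its other scores, which is exactly what the currency rule is meant to prevent.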

The tier system

Inside the Top 100, we group tools into four editorial tiers:

Flagship — the household names. Every category has one or two. These are the "if you only try one, try this" picks. ChatGPT, Claude, GitHub Copilot, Cursor, Midjourney, Perplexity, Gemini sit here.
Leader — established, well-funded, broad customer base. NotebookLM, ElevenLabs, Suno, Runway, Glean, Gamma, Synthesia, Sierra, Otter, Notion AI.
Rising — strong trajectory, real revenue, not yet at leader scale. Spara, Fireflies, Clay, Harvey, AlphaSense, Sourcegraph, Writer, Abridge.
Gem — defensible niche pick, often under-covered.

Across the broader catalog (the 2,000-plus tools), we use an internal tier system anchored on the evidence above — public companies and unicorns at the top, real-but-niche products in the middle, "looks like a real product but no signal" in the de-rate zone, and confirmed scams or fraud at the bottom.

What gets a tool de-rated or unpublished

We move ratings down when reality moves down. Recent examples:

Builder.ai — unpublished after its 2025 bankruptcy and the fraud reporting around it
11x AI — de-rated after TechCrunch reporting on fake customer claims
Stealthwriter — de-rated over a 2.x Trustpilot score and billing complaints
Civitai — de-rated after payment processors dropped them
Rabbit R1 — de-rated after the 95% device-abandonment data became public
Generic SEO names — "Quantum AI Crypto Trading" and similar pump-style listings sit at the floor or get unpublished

We don't enjoy doing this. We do it because a directory that won't say "this got worse" can't credibly say "this got better."

Lifecycle: tracking what happened

When a tool shuts down or gets acquired, we don't delete the entry. We move it to the Graveyard with the news date, the cause, and the acquirer (if any). Recent moves:

Streamlit → Snowflake $800M (2022); now operating inside Snowflake
Reclaim → Dropbox (2024)
Adept → Amazon acqui-hire (2024)
MarketMuse → Siteimprove (2024)
Wonder Dynamics → Autodesk (2024)
Limitless → Meta (2025)
Moveworks → ServiceNow $2.85B (2025)
Weights & Biases → CoreWeave (2025)
Crowdfire → shutdown (2025)
Original Sora → replaced by Sora 2 (2025)

For acquired tools that are still operating under their new parent, we keep the rating and add an honest "still operating" epitaph. For shutdowns, we mark them deceased with the date and cause.
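A Graveyard entry needs only the fields this section names: the news date, the cause, the acquirer if there is one, and whether the product still operates. A minimal sketch of such a record follows. The `GraveyardEntry` class, the epitaph wording, and the example tools are hypothetical, and the "acquired but discontinued" branch is an assumption the post doesn't spell out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GraveyardEntry:
    tool: str
    news_date: date                 # date of the news that triggered the move
    cause: str                      # e.g. "acquisition", "shutdown", "replaced"
    acquirer: Optional[str] = None  # None for shutdowns
    still_operating: bool = False   # True if the product lives on under a parent

    def __post_init__(self) -> None:
        # Assumption: only acquired tools can be marked "still operating".
        if self.still_operating and self.acquirer is None:
            raise ValueError("a still-operating entry needs an acquirer")

    def epitaph(self) -> str:
        when = self.news_date.isoformat()
        if self.still_operating:
            return f"{self.tool}: acquired by {self.acquirer} ({when}); still operating"
        if self.acquirer is not None:
            # Assumption: acquired and then wound down (not covered by the post).
            return f"{self.tool}: acquired by {self.acquirer} ({when}); discontinued"
        return f"{self.tool}: deceased {when} ({self.cause})"


shut = GraveyardEntry("ExampleBot", date(2025, 3, 1), "shutdown")
kept = GraveyardEntry("ExampleApp", date(2024, 6, 1), "acquisition",
                      acquirer="ParentCo", still_operating=True)
print(shut.epitaph())  # ExampleBot: deceased 2025-03-01 (shutdown)
print(kept.epitaph())  # ExampleApp: acquired by ParentCo (2024-06-01); still operating
```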

This matters because the Graveyard is the part of the directory that earns us the most credibility. Buyers can see that we'll tell them when something dies.

The Bayesian floor

A small number of products in our catalog have only a handful of reviews. We use a Bayesian-shrunk average to display ratings on those — a tool with 3 five-star reviews gets pulled toward the global mean rather than displayed as a misleading 5.0. This is the kind of fix that buyers don't notice, but it's the difference between a directory that looks honest and one that actually is.
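The standard form of a Bayesian-shrunk average is (C × m + sum of ratings) / (C + n), where m is the catalog-wide mean, n is the tool's review count, and C is a prior weight that behaves like C "virtual" reviews scored at the mean. The post doesn't give its values for m or C; the 4.0 and 10 below are hypothetical, and this is the textbook formula rather than a confirmed copy of our pipeline.

```python
def bayesian_average(ratings: list[float], global_mean: float, prior_weight: float) -> float:
    """Shrink a tool's average rating toward the catalog-wide mean.

    prior_weight behaves like that many extra "virtual" reviews scored at
    global_mean: with few real reviews the result stays near global_mean,
    and as len(ratings) grows it converges to the tool's own average.
    """
    if prior_weight < 0:
        raise ValueError("prior_weight must be non-negative")
    n = len(ratings)
    if n == 0 and prior_weight == 0:
        raise ValueError("need at least one rating or a positive prior_weight")
    return (prior_weight * global_mean + sum(ratings)) / (prior_weight + n)


GLOBAL_MEAN = 4.0  # hypothetical catalog-wide mean
PRIOR_WEIGHT = 10  # hypothetical prior strength ("10 virtual reviews")

print(round(bayesian_average([5, 5, 5], GLOBAL_MEAN, PRIOR_WEIGHT), 2))   # 4.23
print(round(bayesian_average([5] * 1000, GLOBAL_MEAN, PRIOR_WEIGHT), 2))  # 4.99
```

With these assumed values, the tool with three five-star reviews displays 4.23 instead of 5.0, while a tool with a thousand five-star reviews barely moves (4.99). A larger C pulls thinly reviewed tools harder toward the mean.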

How to push back

If you think we got a rating wrong — too high, too low, or based on stale information — we want to know. Reply to the newsletter or email the editorial team. We update when reality does, and we cite our sources when we change something.

— The ToolDirectory.AI editorial team
