Editorial matchup · June 2026

Cerebras vs Etched: Which AI Tool Is Better in 2026?

Side-by-side comparison of Cerebras and Etched — pricing, features, and use cases. Reviewed by our editorial team in Jun 2026.

Use-case score 20 · Updated Jun 2026

Etched

AI Infrastructure
4.5 / 5 · Paid · 80 saves
The verdict

Cerebras's WSE-3 is the largest AI chip ever built: it measures 46,225 mm² and contains 4 trillion transistors, with 900,000 AI-optimized cores delivering 125 petaflops of compute. Etched builds Sohu, a transformer-specific ASIC, betting that fixed-function silicon beats general-purpose GPUs for transformer inference.

These represent fundamentally opposed architectural philosophies in the race to dethrone GPU dominance. Cerebras became a public company on May 14, 2026, via the largest U.S. technology initial public offering since 2019, signaling investor confidence in wafer-scale deployment at scale.

Etched has raised close to $1 billion in total funding at a $5 billion valuation but remains private and in early customer ramp, with the chip not publicly available for purchase or rental as of April 2026.

Cerebras targets broad inference workloads (language models, reasoning tasks, scientific computing) across its cloud platform and sold systems. It routinely ranks among the fastest inference providers in the world, exceeding 2,200 tokens per second when running GPT-OSS 120B High.

Etched's Sohu is narrower by design. One 8-chip Sohu server is claimed to deliver 500,000 tokens/sec on Llama 70B, or about 62,500 tokens/sec per chip, a jaw-dropping figure if true. But Sohu cannot run CNNs, RNNs, LSTMs, deep learning recommendation models, or AlphaFold 2, which limits it to transformer inference.

Etched claims 90%+ FLOPS utilization for Sohu, where H100s running transformers typically reach 30-40% of peak FLOPS. By eliminating instruction overhead, Sohu could theoretically deliver 2-3x more useful compute from the same transistors.
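The 2-3x figure follows directly from the utilization numbers quoted above. A minimal sketch of that arithmetic, assuming both chips have the same peak FLOPS (a simplification for illustration; the function name is ours, not anything from Etched or NVIDIA):

```python
def useful_compute_ratio(util_a: float, util_b: float) -> float:
    """Ratio of useful FLOPS between two chips with equal peak FLOPS.

    Utilizations are fractions in (0, 1]. Equal peak FLOPS is an
    assumption made here for illustration only.
    """
    for u in (util_a, util_b):
        if not 0 < u <= 1:
            raise ValueError("utilization must be in (0, 1]")
    return util_a / util_b


SOHU_UTIL = 0.90                  # claimed by Etched, per the text
H100_UTIL_RANGE = (0.30, 0.40)    # typical range quoted in the text

for h100_util in H100_UTIL_RANGE:
    ratio = useful_compute_ratio(SOHU_UTIL, h100_util)
    print(f"H100 at {h100_util:.0%}: Sohu delivers {ratio:.2f}x the useful compute")
```

At 30% H100 utilization the ratio is 3x; at 40% it is 2.25x, which is where the "2-3x" range comes from. The result depends entirely on the claimed 90% figure holding up in production.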

This is the core tension: Cerebras trades specialization for flexibility and proven production scale across multiple model families; Etched bets everything on transformers remaining the dominant architecture long enough for that bet to pay off.

For organizations running diverse workloads and needing immediate deployment, Cerebras is operationally ready today.

For enterprises serving pure transformer-based services at hyperscale with tolerance for single-architecture lock-in and longer timelines, Sohu's theoretical cost-per-token advantage could prove significant—if it ships, works at claimed performance, and transformers do not cede ground to emerging architectures.

ToolDirectory.AI Editorial Team

Large-scale LLM inference now (2026)

Cerebras

Cerebras WSE-3 is publicly deployed and serving tokens at scale through its cloud platform, partnerships with Meta and Mistral, and 40+ million tokens/sec aggregate capacity. Sohu is not yet in production.

Transformer-only shops maximizing cost-per-token

Etched

Etched claims Sohu's hard-coded transformer architecture delivers 20x H100 throughput on Llama 70B, but only for autoregressive inference. It is not a universal accelerator.

Multi-model production workloads (training or diverse inference)

Cerebras

Cerebras supports PyTorch, multiple model families (Llama, Qwen, DeepSeek), vision transformers, and mixture-of-experts. Etched cannot run diffusion, recommendation models, or non-transformer architectures.

Section 01

Best for what

4 use cases scored. Cerebras wins 2, Etched wins 0, and 2 are even.

  • Pricing value

    Neither tool publishes a starting price.

    Even
  • Free tier

    Neither tool offers a free tier or trial.

    Even
  • User ratings

    Cerebras averages 4.9 / 5 vs 4.5 / 5 for Etched.

    Cerebras
  • Review volume

    Cerebras has 211 ratings vs 90 for Etched.

    Cerebras
Section 02

Pros & cons

Where each tool earns its rating — and where it falls short.

Cerebras

AI Infrastructure
Pros
  • Wafer-scale architecture eliminating GPU cluster interconnect bottlenecks. Single unified compute surface reduces latency and memory movement penalties compared to hundreds of stitched GPUs.
  • Public cloud platform with 40+ million tokens/sec aggregate capacity and established deployments: Mistral, Perplexity, Mayo Clinic, US Department of Defense, and Amazon Web Services.
  • Broad model ecosystem support: PyTorch 2.0 native, native sparsity acceleration up to 8x, vision transformers, mixture-of-experts, diffusion, and quantization—not limited to transformer inference.
  • Proven token throughput at scale: Llama 4 Maverick 400B runs at 2,500 tokens/second per user, a demonstrated result nearly 2x faster than NVIDIA DGX B200 Blackwell on the same model.
  • Production maturity with IPO completed May 2026 and institutional backing; hardware shipping to major cloud providers with years of field validation.
Cons
  • High capital intensity and manufacturing costs reflected in public company valuation; expensive to deploy at scale versus commodity GPU alternatives.
  • Wafer-scale yield challenges historically limited adoption. While Cerebras solved defect tolerance, scaling manufacturing remains non-trivial compared to standard GPU lines.
  • High on-chip SRAM (44GB) constrains multi-user isolation and time-sharing; not optimized for low-batch, low-latency consumer-facing inference where GPUs dominate.
  • Specialized software stack required; fewer third-party libraries and tools compared to CUDA ecosystem, raising integration and hiring friction.
  • Limited training workload optimization compared to inference; users seeking training-heavy pipelines may find GPU clusters more pragmatic.
Section 03

At a glance

Every spec on one page. Live-pulled from each tool's detail page.

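  • Pricing
    Cerebras: Inquire
    Etched: Paid
  • Pricing model
    Cerebras: Paid
    Etched: Paid
  • Free tier
    Cerebras: No
    Etched: No
  • Free trial
    Cerebras: No
    Etched: No
  • Rating
    Cerebras: 4.9 / 5 (211 ratings)
    Etched: 4.5 / 5 (90 ratings)
  • Saves
    Cerebras: 470
    Etched: 80
  • Categories
    Cerebras: AI Infrastructure
    Etched: AI Infrastructure, Engineering & Simulation
  • Verified
    Cerebras: Yes
    Etched: No
  • Top 100 tier
  • Last updated
    Cerebras: Jun 2026
    Etched: May 2026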
Frequently asked

Cerebras vs Etched FAQs

Quick answers to the questions readers ask before picking between these two.

Can Etched Sohu run non-transformer models like diffusion, CNNs, or mixture-of-experts?

No. Sohu is hard-coded for transformer inference only and cannot run diffusion models, convolutional networks, RNNs, LSTMs, deep learning recommendation models, or any non-transformer architecture. Etched made this trade-off deliberately to maximize transformer throughput.

Is Cerebras WSE-3 available to rent or buy today?

Yes, via two channels: Cerebras Cloud (serverless consumption model with shared or dedicated tenant options) and CS-3 system purchase for on-premises or colocation deployment. Public cloud access is live as of May 2026.

How much faster is Etched Sohu than Cerebras on transformer inference?

Etched claims 500K tokens/sec on Llama 70B for an 8-chip server; Cerebras claims 2,500 tokens/sec per user on Llama 4 Maverick 400B. Direct comparison is difficult because models, batch sizes, and latency SLAs differ, and Cerebras numbers are field-proven while Etched's are theoretical.
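One reason the two figures resist comparison is that one is an aggregate server number and the other is a per-user number. The sketch below is a rough illustration only: it assumes throughput divides evenly across concurrent streams, which real serving systems do not guarantee, and the stream counts are hypothetical. It also ignores that the two claims involve different models.

```python
def per_stream_rate(aggregate_tokens_per_sec: float, concurrent_streams: int) -> float:
    """Tokens/sec each stream sees if aggregate throughput is split evenly.

    Even splitting is an idealization assumed for illustration.
    """
    if concurrent_streams <= 0:
        raise ValueError("concurrent_streams must be positive")
    return aggregate_tokens_per_sec / concurrent_streams


def streams_to_match(aggregate_tokens_per_sec: float, per_user_tokens_per_sec: float) -> float:
    """Concurrency at which an even split would equal a given per-user rate."""
    if per_user_tokens_per_sec <= 0:
        raise ValueError("per_user_tokens_per_sec must be positive")
    return aggregate_tokens_per_sec / per_user_tokens_per_sec


SOHU_SERVER_AGGREGATE = 500_000  # claimed, 8-chip server, Llama 70B
CEREBRAS_PER_USER = 2_500        # claimed, Llama 4 Maverick 400B

# Hypothetical concurrency levels; neither vendor's batch size is given in the text.
for streams in (1, 50, 200, 1_000):
    rate = per_stream_rate(SOHU_SERVER_AGGREGATE, streams)
    print(f"{streams:>5} streams -> {rate:>9,.0f} tokens/sec per stream")

breakeven = streams_to_match(SOHU_SERVER_AGGREGATE, CEREBRAS_PER_USER)
print(f"Even split matches {CEREBRAS_PER_USER:,} tokens/sec at {breakeven:,.0f} streams")
```

Under these assumptions the aggregate claim and the per-user claim describe the same per-stream speed at 200 concurrent streams. Without a published batch size, the headline numbers cannot say which chip is faster for an individual user.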

Which chip is cheaper per token at scale?

Cerebras pricing is custom inquiry-based; Etched pricing is not public. Etched's theoretical per-token advantage stems from 20x throughput gain per chip, but real-world production volumes, yield, and customer ramp are unknown.
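To see how a throughput claim translates into cost per token, here is a minimal sketch. The hourly cost is a placeholder, since neither vendor publishes pricing, and the H100 baseline is only the value implied by dividing the claimed Sohu server figure by 20, assuming the 20x claim holds chip-for-chip across an 8-chip server.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float) -> float:
    """USD per one million tokens for a server running at full throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


SOHU_SERVER_TPS = 500_000                 # claimed by Etched, per the text
CLAIMED_SPEEDUP = 20                      # claimed vs H100, per the text
IMPLIED_H100_SERVER_TPS = SOHU_SERVER_TPS / CLAIMED_SPEEDUP  # 25,000 (implied, not measured)

HYPOTHETICAL_HOURLY_COST = 100.0          # placeholder USD/hour; no real price is public

sohu_cost = cost_per_million_tokens(HYPOTHETICAL_HOURLY_COST, SOHU_SERVER_TPS)
h100_cost = cost_per_million_tokens(HYPOTHETICAL_HOURLY_COST, IMPLIED_H100_SERVER_TPS)

print(f"Sohu server: ${sohu_cost:.4f} per million tokens")
print(f"H100 server: ${h100_cost:.4f} per million tokens")
print(f"Cost ratio at equal hourly price: {h100_cost / sohu_cost:.1f}x")
```

At equal hourly cost the per-token advantage equals the throughput ratio. Put another way, a Sohu server could cost up to 20 times as much per hour as the baseline before the claimed advantage disappears, which is why the unknowns around yield and production volume matter.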

What happens if transformers are displaced by a new AI architecture?

Cerebras can adapt its software and serve any workload on its wafer-scale engine. Etched has said outright that it would become obsolete; CEO Gavin Uberti said the company dies if transformers go away. This is Etched's existential bet.

Does Cerebras do training, or only inference?

Both. CS-3 systems support both AI training (PyTorch, Llama 70B fine-tuning in a day on 4 systems, full training from scratch on 2048 systems) and inference via its cloud platform. Etched Sohu is inference-only.

When will Sohu ship to customers?

Etched has announced customer engagement and early partnerships but as of June 2026 has no public production timeline. Sohu is not available for purchase or rental. Cerebras is shipping now via cloud and sold systems.

Bottom line

Choose Cerebras for proven, deployed large-model inference at scale.

If you are serving Llama, Qwen, or proprietary LLMs at 40B+ parameters to production users today, need multi-model flexibility, and can justify the infrastructure commitment, Cerebras is operationally ready with public cloud access and institutional customer proof points spanning Meta, Mistral, Mayo Clinic, and AWS.

The WSE-3 is the only wafer-scale inference engine in production, and its IPO validates technical and commercial viability.

Choose Etched for pure transformer inference depth if you are willing to wait until at least 2027 for customer shipments (Etched has no public production timeline), can tolerate single-architecture risk, and operate at hyperscale where a 20x throughput gain on a narrow workload justifies silicon redesign.

Etched is for teams running billions of Llama or closed-model transformer inferences where cost-per-token dominates all other concerns and transformers are genuinely expected to remain the AI architecture for another 5+ years.

For enterprises balancing near-term inference demands, workload diversity, and vendor flexibility, Cerebras is the only rational choice in June 2026. Etched remains a research bet—sophisticated, well-funded, and architecturally bold, but not yet production-ready.
