
Side-by-side comparison of Cerebras and Etched — pricing, features, and use cases. Reviewed by our editorial team in Jun 2026.


Cerebras's WSE-3 is the largest AI chip ever built: it measures 46,225 mm² and contains 4 trillion transistors and 900,000 AI-optimized cores delivering 125 petaflops of compute. Etched, by contrast, builds Sohu, a transformer-specific ASIC, betting that fixed-function silicon beats general-purpose GPUs for transformer inference.
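The headline WSE-3 totals are easier to interpret as per-unit averages. The sketch below only divides the quoted figures; the helper names are hypothetical, and the results are simple averages that ignore numeric precision, sparsity, and achievable (rather than peak) throughput, none of which the quoted specs pin down.

```python
# Per-unit averages derived from the quoted WSE-3 totals.
# Hypothetical helper names; plain division only. No claim is made about
# precision, sparsity, or sustained (as opposed to peak) throughput.

WSE3_AREA_MM2 = 46_225
WSE3_TRANSISTORS = 4_000_000_000_000  # 4 trillion
WSE3_CORES = 900_000
WSE3_PEAK_PFLOPS = 125


def gflops_per_core(peak_pflops: float, cores: int) -> float:
    """Average peak compute per core in GFLOPS (1 PFLOPS = 1e6 GFLOPS)."""
    return peak_pflops * 1e6 / cores


def transistors_per_mm2(transistors: int, area_mm2: float) -> float:
    """Average transistor density over the whole die."""
    return transistors / area_mm2


if __name__ == "__main__":
    per_core = gflops_per_core(WSE3_PEAK_PFLOPS, WSE3_CORES)
    density = transistors_per_mm2(WSE3_TRANSISTORS, WSE3_AREA_MM2)
    print(f"~{per_core:.0f} GFLOPS per core")
    print(f"~{density / 1e6:.1f} million transistors per mm²")
```

On these numbers each core averages roughly 139 GFLOPS of peak compute, and the die holds about 86.5 million transistors per mm². The point of the exercise is that Cerebras's advantage comes from the sheer number of cores on one wafer, not from unusually powerful individual cores.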
The two represent fundamentally opposed architectural philosophies in the race to dethrone GPUs. Cerebras became a public company on May 14, 2026, via the largest U.S. technology initial public offering since 2019, a sign of investor confidence in deploying wafer-scale systems at scale.
Etched has raised close to $1 billion in total funding at a $5 billion valuation but remains private and in early customer ramp; as of April 2026, the chip was not publicly available for purchase or rental.
Cerebras targets broad inference workloads (language models, reasoning tasks, scientific computing) across its cloud platform and the systems it sells, and it routinely ranks among the fastest inference providers in the world, with over 2,200 tokens per second on GPT-OSS 120B High.
Etched's Sohu is narrower by design. Etched claims one 8-chip Sohu server delivers 500,000 tokens/sec on Llama 70B, a jaw-dropping figure even when divided across eight chips, if true. But Sohu cannot run CNNs, RNNs, LSTMs, deep learning recommendation models, or AlphaFold 2, which limits it to transformer inference only.
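Because the 500,000 tokens/sec figure describes a whole server, it helps to normalize it per chip and to back out the baseline implied by Etched's separate claim of 20x H100 throughput on Llama 70B. This is a hypothetical sketch: it assumes the 20x figure compares one 8-chip Sohu server against one 8-GPU H100 server on the same workload, which this comparison does not spell out.

```python
# Hypothetical normalization of Etched's server-level claim.
# ASSUMPTION: the 20x H100 figure compares one 8-chip Sohu server with one
# 8-GPU H100 server on the same Llama 70B workload; the exact baseline
# configuration is not stated in the source.

SOHU_SERVER_TOKENS_PER_SEC = 500_000  # Etched's claim, Llama 70B
SOHU_CHIPS_PER_SERVER = 8
CLAIMED_SPEEDUP_VS_H100 = 20


def per_chip_throughput(server_tps: float, chips: int) -> float:
    """Split a server-level tokens/sec figure evenly across its chips."""
    return server_tps / chips


def implied_baseline_throughput(server_tps: float, speedup: float) -> float:
    """Tokens/sec the comparison baseline must reach for the speedup to hold."""
    return server_tps / speedup


if __name__ == "__main__":
    chip_tps = per_chip_throughput(SOHU_SERVER_TOKENS_PER_SEC, SOHU_CHIPS_PER_SERVER)
    baseline_tps = implied_baseline_throughput(SOHU_SERVER_TOKENS_PER_SEC, CLAIMED_SPEEDUP_VS_H100)
    print(f"{chip_tps:,.0f} tokens/sec per Sohu chip")
    print(f"{baseline_tps:,.0f} tokens/sec implied for the H100 baseline server")
```

Under these assumptions each Sohu chip would account for 62,500 tokens/sec, and the implied 8xH100 baseline is 25,000 tokens/sec. Checking a vendor claim against its own implied baseline is a quick way to see whether the comparison point is reasonable.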
Etched claims Sohu achieves over 90% FLOPS utilization, whereas transformer workloads on the H100 typically reach only 30-40% of peak FLOPS. By eliminating instruction overhead, Sohu could in theory deliver 2-3x more useful compute from the same transistor budget.
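The 2-3x figure follows directly from the utilization numbers. A minimal sketch, assuming both chips have the same peak FLOPS (the "same transistors" framing above), so the peak value cancels and the ratio of useful compute reduces to the ratio of utilizations; the 1,000 TFLOPS default is an arbitrary placeholder.

```python
# Useful compute = peak FLOPS x utilization.
# ASSUMPTION: both chips share the same peak rating, so only the
# utilization fractions matter. The default peak is a placeholder.

def useful_tflops(peak_tflops: float, utilization: float) -> float:
    """Sustained, useful throughput given a peak rating and a utilization fraction."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return peak_tflops * utilization


def useful_compute_ratio(util_specialized: float, util_general: float,
                         peak_tflops: float = 1000.0) -> float:
    """How many times more useful work the specialized chip does at equal peak."""
    return useful_tflops(peak_tflops, util_specialized) / useful_tflops(peak_tflops, util_general)


if __name__ == "__main__":
    for gpu_util in (0.30, 0.40):
        ratio = useful_compute_ratio(0.90, gpu_util)
        print(f"90% vs {gpu_util:.0%}: {ratio:.2f}x")
```

Against 30% GPU utilization the ratio is 3.00x, and against 40% it is 2.25x, which is where the 2-3x range comes from. If the two chips' peak ratings differ, the ratio must be scaled by that difference as well.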
This is the core tension: Cerebras trades specialization for flexibility and proven production scale across multiple model families; Etched bets everything on transformers remaining the dominant architecture long enough to recoup its investment.
For organizations running diverse workloads and needing immediate deployment, Cerebras is operationally ready today.
For enterprises serving pure transformer-based services at hyperscale with tolerance for single-architecture lock-in and longer timelines, Sohu's theoretical cost-per-token advantage could prove significant—if it ships, works at claimed performance, and transformers do not cede ground to emerging architectures.
Large-scale LLM inference now (2026)
Cerebras WSE-3 is publicly deployed and serving tokens at scale through its cloud platform, partnerships with Meta and Mistral, and 40+ million tokens/sec aggregate capacity. Sohu is not yet in production.
Transformer-only shops maximizing cost-per-token
Etched claims Sohu's hard-coded transformer architecture delivers 20x H100 throughput on Llama 70B, but only for autoregressive inference. It is not a universal accelerator.
Multi-model production workloads (training or diverse inference)
Cerebras supports PyTorch, multiple model families (Llama, Qwen, DeepSeek), vision transformers, and mixture-of-experts. Etched cannot run diffusion, recommendation models, or non-transformer architectures.
4 use cases scored. Cerebras wins 2, Etched wins 0.
Neither tool publishes a starting price.
Neither tool offers a free tier or trial.
Cerebras averages 4.9 / 5 vs 4.5 / 5 for Etched.
Cerebras has 211 ratings vs 90 for Etched.
Quick answers to the questions readers ask before picking between these two.
No. Sohu is hard-coded for transformer inference only and cannot run diffusion models, convolutional networks, RNNs, LSTMs, deep learning recommendation models, or any non-transformer architecture. Etched made this trade-off deliberately to maximize transformer throughput.
Yes, via two channels: Cerebras Cloud (serverless consumption model with shared or dedicated tenant options) and CS-3 system purchase for on-premises or colocation deployment. Public cloud access is live as of May 2026.
Etched claims 500K tokens/sec on Llama 70B for an 8-chip server; Cerebras claims 2,500 tokens/sec per user on Llama 4 Maverick 400B. Direct comparison is difficult because models, batch sizes, and latency SLAs differ, and Cerebras numbers are field-proven while Etched's are theoretical.
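One reason the two headline numbers are not comparable is that Etched quotes aggregate server throughput while Cerebras quotes per-user speed. A hypothetical sketch, assuming aggregate throughput is shared evenly across concurrent streams (it ignores prompt processing, variable sequence lengths, and scheduler behavior); the stream counts are illustrative, not vendor data.

```python
# ASSUMPTION: aggregate throughput divides evenly across concurrent
# streams. Real servers do not behave this cleanly. Stream counts below
# are hypothetical.

def per_stream_tps(aggregate_tps: float, concurrent_streams: int) -> float:
    """Tokens/sec each stream sees if aggregate throughput is shared evenly."""
    if concurrent_streams <= 0:
        raise ValueError("concurrent_streams must be positive")
    return aggregate_tps / concurrent_streams


def break_even_streams(aggregate_tps: float, target_per_stream_tps: float) -> float:
    """Concurrency at which each stream drops to the target per-user rate."""
    return aggregate_tps / target_per_stream_tps


if __name__ == "__main__":
    sohu_server_tps = 500_000      # Etched claim: Llama 70B, aggregate per server
    cerebras_per_user_tps = 2_500  # Cerebras claim: Llama 4 Maverick 400B, per user
    for streams in (50, 200, 1_000):  # hypothetical concurrency levels
        print(f"{streams:>5} streams -> {per_stream_tps(sohu_server_tps, streams):,.0f} tokens/sec each")
    print("break-even streams:", break_even_streams(sohu_server_tps, cerebras_per_user_tps))
```

Under this assumption, Sohu's per-stream rate falls to Cerebras's quoted 2,500 tokens/sec at 200 concurrent streams. Even that break-even point pits a 70B model against a 400B one, so it shows why the figures cannot be compared directly rather than which chip is faster.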
Cerebras pricing is custom and inquiry-based; Etched pricing is not public. Etched's theoretical per-token advantage stems from its claimed 20x throughput gain per chip, but real-world production volumes, yield, and customer ramp are unknown.
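Cost per token ultimately reduces to system cost divided by tokens served, which is why a throughput claim alone does not settle the question. A minimal sketch, assuming a system whose cost can be expressed per hour (rental price or amortized purchase); the function names are hypothetical and every dollar figure is a placeholder, since neither company publishes pricing.

```python
# Hypothetical cost-per-token model. Neither vendor publishes pricing,
# so all dollar figures here are placeholders, not real quotes.

SECONDS_PER_HOUR = 3_600


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float,
                            utilization: float = 1.0) -> float:
    """Dollars per million generated tokens for a system costed per hour."""
    if tokens_per_sec <= 0 or not 0.0 < utilization <= 1.0:
        raise ValueError("need positive throughput and utilization in (0, 1]")
    tokens_per_hour = tokens_per_sec * SECONDS_PER_HOUR * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def relative_cost_advantage(throughput_ratio: float, hourly_cost_ratio: float) -> float:
    """How many times cheaper per token A is than B, given A/B throughput and A/B hourly cost."""
    return throughput_ratio / hourly_cost_ratio


if __name__ == "__main__":
    # Placeholder system: $36/hour serving 10,000 tokens/sec at full load.
    print(f"${cost_per_million_tokens(36.0, 10_000):.2f} per million tokens")
    # A claimed 20x throughput gain is a 20x cost advantage only if the
    # faster system costs the same per hour.
    for cost_ratio in (1, 4, 10):
        print(f"{cost_ratio}x hourly cost -> {relative_cost_advantage(20, cost_ratio):.1f}x cheaper per token")
```

The placeholder system works out to $1.00 per million tokens. More importantly, if a system with 20x the throughput costs 4x as much per hour, its per-token advantage shrinks to 5x, which is why unknown production volumes and yield matter as much as the throughput claim.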
Cerebras can adapt its software to serve a broad range of workloads on its wafer-scale engine. Etched openly acknowledges that Sohu would become obsolete if transformers were displaced; CEO Gavin Uberti has said the company dies if transformers go away. This is Etched's existential bet.
Both. CS-3 systems support AI training through PyTorch (Cerebras cites fine-tuning Llama 70B in a day on 4 systems and full training from scratch on 2,048 systems) as well as inference via its cloud platform. Etched's Sohu is inference-only.
Etched has announced customer engagement and early partnerships but as of June 2026 has no public production timeline. Sohu is not available for purchase or rental. Cerebras is shipping now via cloud and sold systems.
Choose Cerebras for proven, deployed large-model inference at scale.
If you are serving Llama, Qwen, or proprietary LLMs at 40B+ parameters to production users today, need multi-model flexibility, and can justify the infrastructure commitment, Cerebras is operationally ready with public cloud access and institutional customer proof points spanning Meta, Mistral, Mayo Clinic, and AWS.
The WSE-3 is the only wafer-scale inference engine in production, and Cerebras's IPO validates its technical and commercial viability.
Choose Etched for pure transformer inference depth if you are willing to wait for 2027 customer shipments, can tolerate single-architecture risk, and operate at hyperscale, where a 20x throughput gain on a narrow workload justifies committing to single-purpose silicon.
Etched is for teams running billions of Llama or closed-model transformer inferences where cost-per-token dominates all other concerns and transformers are genuinely expected to remain the AI architecture for another 5+ years.
For enterprises balancing near-term inference demands, workload diversity, and vendor flexibility, Cerebras is the only rational choice in June 2026. Etched remains a research bet—sophisticated, well-funded, and architecturally bold, but not yet production-ready.