
Side-by-side comparison of Etched and Groq — pricing, features, and use cases. Reviewed by our editorial team in Jun 2026.


Etched and Groq represent two distinct bets on specialized AI inference acceleration.
Groq's Language Processing Unit (LPU) is shipping today, with proven throughput of 300-800 tokens per second on Llama models, backing from NVIDIA's $20 billion licensing deal in December 2025, and deep enterprise deployments at companies including Dropbox and Volkswagen.
Etched claims its Sohu ASIC reaches 500,000 tokens per second on Llama 70B across an 8-chip server, but as of April 2026 Sohu has not shipped to production customers, and no independent benchmarks exist at shipping batch sizes.
Groq optimizes for deterministic decode latency through SRAM-only weight storage and static scheduling; Sohu bets on pure transformer specialization, using fixed-function silicon that Etched says reaches 90% FLOPS utilization versus 30-40% on GPUs. The practical choice depends on timeline and architectural constraints.
Groq suits teams needing production inference with sub-300ms latency available now via GroqCloud; Etched targets hyperscalers running pure transformer workloads at massive scale, but only after delivery credibility is established.
Groq also carries NVIDIA backing and integration with NVIDIA Vera Rubin infrastructure, dramatically reducing execution risk compared to Etched's startup supply chain.
Architecturally, both chips target the memory-bandwidth bottleneck that GPUs face in decode-dominant inference, but Etched is transformer-only by design: if state-space models or hybrid architectures gain share, Sohu loses its value proposition. Groq runs any transformer model without hardware modifications, and because its compiler is model-agnostic, it is better positioned to absorb shifts in model architecture.
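Why decode is memory-bound can be shown with a back-of-envelope bound: at batch size 1, each generated token requires streaming every model weight from memory once, so single-stream tokens per second cannot exceed memory bandwidth divided by the model's size in bytes. The inputs below are assumptions for illustration: a 70B-parameter model stored at 2 bytes per parameter (16-bit weights) and about 3.35 TB/s of HBM bandwidth, roughly what one NVIDIA H100 SXM advertises. KV-cache traffic, multi-GPU sharding, and quantization are all ignored.

```python
def max_decode_tokens_per_sec(
    params: float, bytes_per_param: float, bandwidth_bytes_per_sec: float
) -> float:
    """Upper bound on batch-1 decode speed when every weight is read once per token."""
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_per_sec / weight_bytes


# Assumed inputs, for illustration only (not measured figures).
LLAMA_70B_PARAMS = 70e9   # 70 billion parameters
FP16_BYTES = 2.0          # 16-bit weights
HBM_BANDWIDTH = 3.35e12   # ~3.35 TB/s, roughly one H100 SXM

bound = max_decode_tokens_per_sec(LLAMA_70B_PARAMS, FP16_BYTES, HBM_BANDWIDTH)
print(f"batch-1 ceiling: ~{bound:.0f} tokens/s")  # about 24 tokens/s
```

The designs attack this ceiling differently: Groq keeps weights in on-chip SRAM, which offers far higher bandwidth than HBM, while GPU serving relies on batching to amortize each weight read across many concurrent requests.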
Production-ready, low-latency inference today
Groq LPU ships in GroqCloud with independent benchmarks confirming 241-800 tokens per second and sub-100ms time-to-first-token. Etched Sohu is not yet available for purchase or rental; claims are unverified at production batch sizes.
Architectural future-proofing
Groq LPU runs any transformer model via compiler. Etched Sohu cannot run CNNs, RNNs, state-space models, or MoE architectures like DeepSeek V4—a permanent hardware constraint that becomes critical if transformer dominance ends.
Extreme transformer throughput at batch-1
Etched claims 500,000 tokens per second on Llama 70B with 8 Sohu chips versus 45,000 on 8xB200 GPUs. However, this benchmark is at batch size 1; GPU throughput scales with batching, and Etched has not published batch-32 or batch-256 figures.
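Normalizing the quoted server figures per chip makes the claimed gap concrete, and a second function shows why the batch-size caveat matters. Only the 500,000 and 45,000 tokens-per-second figures and the 8-chip counts come from Etched's claim; the per-request speed and batch sizes passed to `aggregate_throughput` are hypothetical, and the linear scaling it assumes is an idealization (real GPUs level off once they become compute-bound).

```python
def per_chip(server_tokens_per_sec: float, chips: int) -> float:
    """Split a server-level throughput figure evenly across its chips."""
    return server_tokens_per_sec / chips


# Figures as claimed by Etched (Llama 70B, 8 chips per server).
sohu = per_chip(500_000, 8)  # 62,500 tokens/s per Sohu chip (claimed)
b200 = per_chip(45_000, 8)   # 5,625 tokens/s per B200 (as quoted by Etched)
print(f"claimed per-chip ratio: {sohu / b200:.1f}x")  # 11.1x


def aggregate_throughput(per_request_tokens_per_sec: float, batch_size: int) -> float:
    """Idealized aggregate throughput: concurrent requests times per-request speed.

    Assumes per-request speed stays constant as the batch grows, which roughly
    holds only while the hardware remains memory-bound.
    """
    return per_request_tokens_per_sec * batch_size


# Hypothetical per-request speed; shows why batch-1 numbers understate GPUs.
for batch in (1, 32, 256):
    print(batch, aggregate_throughput(40.0, batch))
```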
4 use cases scored. Etched wins 0, Groq wins 2.
Neither tool publishes a starting price.
Neither tool offers a free tier or trial.
Groq averages 4.9 / 5 vs 4.5 / 5 for Etched.
Groq has 196 ratings vs 90 for Etched.
Quick answers to the questions readers ask before picking between these two.
Etched Sohu is not yet available. As of April 2026, it has not shipped to production customers. Etched has demonstrated the chip to investors and run internal benchmarks, but no independent verification exists and pricing has not been disclosed. Groq LPU is available now via GroqCloud with metered, per-token pricing.
On raw speed, Etched claims 500,000 tokens per second on Llama 70B with 8 Sohu chips at batch 1, versus Groq's verified 300-800 tokens per second per request stream. The comparison is misleading: Etched's figure requires 8 chips and is measured at batch 1, and Groq likewise scales across many chips to serve large models. Groq's clearer advantage is lower time-to-first-token (sub-100ms, versus unknown for Sohu), which matters more for interactive applications.
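For interactive use, the wait a user perceives is roughly time-to-first-token plus the time to stream the rest of the reply. A minimal sketch of that estimate, assuming a 200-token reply: the 100 ms TTFT and 300 tokens/s are the Groq figures cited above, while the reply length and the slower comparison profile are hypothetical.

```python
def response_latency_s(ttft_ms: float, output_tokens: int, tokens_per_sec: float) -> float:
    """Approximate end-to-end latency: first-token delay plus streaming time."""
    return ttft_ms / 1000.0 + output_tokens / tokens_per_sec


REPLY_TOKENS = 200  # hypothetical chat reply length

# Groq figures cited above: sub-100 ms TTFT, 300 tokens/s (low end of the range).
fast = response_latency_s(100.0, REPLY_TOKENS, 300.0)
# Hypothetical slower profile for contrast: 500 ms TTFT, 60 tokens/s.
slow = response_latency_s(500.0, REPLY_TOKENS, 60.0)

print(f"fast: {fast:.2f} s, slow: {slow:.2f} s")
```

In streaming interfaces, the first-token delay is also the pause before anything appears, which is why TTFT weighs heavily for voice and chat.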
Groq can run MoE models. Its LPU runs any transformer architecture, including MoE models; DeepSeek V4 and Qwen3 run on Groq LPU without limitation. Etched Sohu cannot run MoE, SSM, or non-transformer architectures, a permanent hardware constraint.
If a new architecture displaces transformers, Groq LPU remains relevant because its compiler is model-agnostic and its software stack can adapt, while Etched Sohu would become obsolete because the entire chip is hard-wired for transformer operations. Etched has explicitly acknowledged this risk; it is a binary bet that transformers will dominate for 5+ years.
Unknown for Etched, because Sohu is not yet available and its pricing is undisclosed. Groq offers metered per-token pricing that is generally competitive with, or cheaper than, GPU cloud at typical throughput. For comparable quality, Groq Llama 70B is typically 20-60% cheaper than GPU cloud on a throughput-adjusted basis. For on-premises deployments, however, Groq requires hundreds of interconnected chips for large models, which means high CapEx and a large rack footprint.
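"Throughput-adjusted" cost means converting a GPU's hourly price into a cost per token at the throughput it actually sustains, so it can be set against a per-token price. A sketch under assumed inputs: the hourly rate, sustained throughput, and per-million-token price below are hypothetical placeholders, not quoted GroqCloud or GPU cloud prices.

```python
SECONDS_PER_HOUR = 3600


def gpu_cost_per_million_tokens(hourly_usd: float, sustained_tokens_per_sec: float) -> float:
    """Convert an hourly instance price into USD per million generated tokens."""
    tokens_per_hour = sustained_tokens_per_sec * SECONDS_PER_HOUR
    return hourly_usd / tokens_per_hour * 1_000_000


def percent_cheaper(candidate_per_million: float, baseline_per_million: float) -> float:
    """How much cheaper the candidate is than the baseline, as a percentage."""
    return (baseline_per_million - candidate_per_million) / baseline_per_million * 100


# Hypothetical inputs for illustration only.
gpu = gpu_cost_per_million_tokens(hourly_usd=4.0, sustained_tokens_per_sec=1000.0)
metered_price = 0.60  # hypothetical per-million-token price on a metered service

print(f"GPU: ${gpu:.2f}/M tokens, metered: ${metered_price:.2f}/M")
print(f"metered is {percent_cheaper(metered_price, gpu):.0f}% cheaper")
```

The sustained-throughput term is where batching matters: the same hourly price spread over more tokens lowers the GPU's per-token cost and narrows the gap.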
NVIDIA licensed Groq's core technology and hired its core engineering team in a $20 billion deal in December 2025. Jonathan Ross and 80% of Groq's engineers joined NVIDIA's Real-Time Inference division, and the Groq 3 LPU is now part of NVIDIA Vera Rubin. The remaining GroqCloud entity continues serving customers under new leadership, but core product development is now NVIDIA-controlled. This reduces Groq's independence but largely removes execution risk relative to Etched.
For most teams, Groq via GroqCloud: production-ready, metered pricing, proven performance on open-source models, NVIDIA backing, and no hardware supply risk. Use Groq for latency-sensitive workloads (chat, voice, coding). Use GPU cloud (H100/B200) for diverse workloads, training, and proprietary models. Monitor Etched, but do not bet infrastructure on unshipped hardware.
Teams with proven inference workloads should use Groq today. GroqCloud is production-ready with transparent pricing, enterprise support, and independent performance validation.
The LPU's sub-100ms latency unlocks use cases—voice AI, real-time code completion, conversational agents—that GPU infrastructure cannot economically reach. The NVIDIA partnership removes execution risk and ensures long-term platform development.
Etched's Sohu is a 2027-2028 play at the earliest: the technology is credible, but capital-intensive hyperscalers and cloud providers must wait for production silicon, independent batch-size benchmarks, and a proven supply chain before committing.
If Sohu ships on schedule and meets claimed throughput, it becomes mandatory for teams with pure transformer workloads at petaflop scale. Until then, Groq is the lower-risk, faster inference path.
For enterprises evaluating both: Groq via GroqCloud for immediate latency wins on open-source models; GPU cloud (NVIDIA H100/B200) as the default for diverse workloads, training, and proprietary model access.
Etched should be monitored closely but not adopted until independent production benchmarks exist and pricing is disclosed. The key risk for Etched is architectural: if state-space models, mixture-of-experts variants, or novel architectures displace pure transformers, Sohu becomes obsolete. Groq hedged this risk by supporting any transformer and remaining part of NVIDIA's broader inference ecosystem.