
Side-by-side comparison of Etched and FriendliAI — pricing, features, and use cases. Reviewed by our editorial team in Jun 2026.


Etched and FriendliAI address the same pain point—expensive transformer inference—but from opposite ends of the stack.
Etched is a pre-production hardware bet: a transformer-specific ASIC with a claimed 20x throughput advantage on transformer workloads, achieved by hardwiring attention circuits into fixed silicon. FriendliAI is a production-ready software inference platform that claims 50-90% cost reductions through algorithmic optimizations and better GPU utilization.
The choice between them hinges on timeline, risk tolerance, and workload profile.
Etched's Sohu has not shipped to customers as of April 2026, and no independent benchmarks exist to validate its claim of 500,000 tokens/sec on Llama 70B. Etched has also published no pricing, and Sohu requires the company's proprietary compiler, which creates high execution and vendor risk for early adopters.
FriendliAI, by contrast, is shipping today. It completed its Series A in August 2025, powers inference for enterprises including LG Electronics and AI startups such as Upstage, supports 570k+ Hugging Face models, and reports cost reductions measured in live production.
However, FriendliAI's architecture remains bounded by GPU throughput: its 3x speedup over vLLM comes from algorithmic techniques such as continuous batching and speculative decoding, not from architectural specialization.
If Etched's Sohu reaches production and validates its performance claims, it could become a generation-defining shift for dense transformer deployments at hyperscale. For teams making decisions today and running production inference workloads at scale, FriendliAI offers immediate cost relief and reliability.
For organizations betting on a 2-3 year horizon where Sohu becomes available, and where transformer-only workloads dominate, Etched represents a potential long-term CapEx advantage—but only if the startup clears manufacturing, software, and supply-chain hurdles that have historically derailed specialty silicon companies.
Production inference today
FriendliAI is shipping with Friendli Engine delivering 3x faster throughput than vLLM and claimed 50-90% cost reductions, backed by customer deployments at LG and Upstage. Etched Sohu is not publicly available for purchase or rental as of April 2026.
Transformer-only workloads at hyperscale
Etched claims its hardwired attention circuits reach 500,000 tokens/sec on Llama 70B with an 8-chip server, versus 23,000 tokens/sec on 8x H100. That performance is unverified, and adopting Sohu requires a full infrastructure rebuild around a proprietary compiler.
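As a quick sanity check on how the headline "20x" figure relates to the two throughput numbers quoted here, the ratio can be worked out directly. This is only arithmetic on the figures above; the Sohu number is Etched's own unverified claim, and the helper name `speedup` is ours.

```python
# Figures quoted in this comparison. The Sohu number is Etched's own,
# unverified claim; the H100 number is the baseline quoted alongside it.
SOHU_TOKENS_PER_SEC = 500_000  # claimed: 8-chip Sohu server, Llama 70B
H100_TOKENS_PER_SEC = 23_000   # quoted: 8x H100, Llama 70B


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps


ratio = speedup(SOHU_TOKENS_PER_SEC, H100_TOKENS_PER_SEC)
print(f"Implied speedup: {ratio:.1f}x")  # prints "Implied speedup: 21.7x"
```

The implied ratio of roughly 21.7x is consistent with the rounded 20x claim, but it says nothing about whether either number holds up under independent testing.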
Model and architecture flexibility
FriendliAI supports 570,000+ Hugging Face models and works with vision, multimodal, MoE, and non-transformer architectures. Etched can only run dense transformers and cannot support MoE with dynamic expert routing, diffusion, or architecture changes.
4 use cases scored. Etched wins 1, FriendliAI wins 1.
Neither tool publishes a starting price.
Neither tool offers a free tier or trial.
Both tools average 4.5 / 5.
FriendliAI has 125 ratings; Etched has 90.
Quick answers to the questions readers ask before picking between these two.
Sohu is not available yet. As of April 2026, it has not shipped to customers and cannot be purchased or rented in the cloud. Etched has demonstrated the chip to investors and shown controlled benchmarks, but there is no public production deployment, no independent verification of performance claims, and no announced availability date.
FriendliAI has the independent benchmark evidence. Third-party comparisons from September 2024 and later studies confirm its 3x throughput improvement over vLLM on Llama benchmarks and report the lowest time to first token (TTFT), 0.24 seconds, among tested inference providers. Etched publishes only its own benchmarks; no third-party testing has validated its 500,000 tokens/sec claim on Llama 70B.
FriendliAI supports MoE models with dynamic expert routing, quantized models, and LoRA adapters natively on standard GPU infrastructure. Etched's Sohu cannot support MoE with dynamic routing; Etched has mentioned a separate variant for fixed-MoE architectures, but that chip is not yet available.
Etched builds transformer-specific ASIC silicon that hardwires attention, projection, and normalization operations for maximum compute density. FriendliAI runs on standard NVIDIA GPUs and optimizes inference through software techniques such as iteration batching (also known as continuous batching) and speculative decoding. Etched aims for a 20x throughput advantage; FriendliAI aims for a 3x improvement over vLLM on the same GPUs.
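To make the batching idea concrete, here is a toy simulation comparing static batching with iteration-level (continuous) batching. It is a minimal sketch under simplifying assumptions, not FriendliAI's engine: every request needs a known number of decode steps, each step costs the same regardless of batch occupancy, and the function names and request lengths are made up for illustration.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each fixed batch runs until its longest request ends."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        total += max(lengths[start:start + batch_size])
    return total


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request frees its slot for the next one."""
    waiting = deque(lengths)  # requests not yet started
    active = []               # remaining decode steps per in-flight request
    steps = 0
    while waiting or active:
        # Refill free slots before every iteration, not once per batch.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One decode iteration: each active request emits one token,
        # and requests that reach zero remaining tokens leave the batch.
        active = [remaining - 1 for remaining in active if remaining - 1 > 0]
        steps += 1
    return steps


# Hypothetical workload: two long requests mixed with short ones.
request_lengths = [10, 2, 2, 2, 10, 2, 2, 2]
static_steps = static_batching_steps(request_lengths, batch_size=4)
continuous_steps = continuous_batching_steps(request_lengths, batch_size=4)
print(f"static: {static_steps} steps, continuous: {continuous_steps} steps")
```

With these made-up lengths, static batching takes 20 decode steps and continuous batching takes 12, because short requests no longer wait for the longest request in their batch. Real gains depend on the length distribution and on per-step cost, which this toy model ignores.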
FriendliAI prices by usage, billed per GPU-hour or per token, with claimed reductions of 50-90% versus Together AI and Fireworks for equivalent workloads. Etched has not published pricing and is not available for purchase, so cost-per-token comparisons cannot be made. Once Sohu ships, competitive pricing will depend on manufacturing yield and scale.
FriendliAI supports 570,000+ models from Hugging Face, plus custom fine-tuned, proprietary, and multimodal models. Etched supports only dense transformer models and cannot run diffusion, vision, MoE-routed, or SSM-based models like Mamba.
FriendliAI remains viable because it runs on general-purpose GPU silicon; you can immediately switch to new model families without hardware changes. Etched's entire value proposition depends on transformers remaining dominant. If architectures shift, Etched would require a full hardware redesign, a process the company estimates at 3+ years.
Choose FriendliAI for immediate, production-grade cost and latency reduction on transformer workloads running at scale today.
Teams serving Llama 70B across thousands of requests should compare FriendliAI's cost per token against their existing GPU infrastructure, especially if they run vLLM or TensorRT-LLM at batch sizes where continuous batching and speculative decoding yield measurable savings.
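The cost-per-token comparison can be sketched in a few lines. The hourly price and baseline throughput below are hypothetical placeholders, not figures from either vendor, and the sketch assumes the same GPU hourly price and full utilization in both cases; substitute your own measured numbers.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_sec: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    if tokens_per_sec <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs for illustration only.
GPU_HOURLY_USD = 4.00   # assumed price per GPU-hour
BASELINE_TPS = 1_000    # assumed throughput of the current serving stack
THROUGHPUT_GAIN = 3     # the 3x-over-vLLM figure cited in this comparison

baseline_cost = cost_per_million_tokens(GPU_HOURLY_USD, BASELINE_TPS)
optimized_cost = cost_per_million_tokens(GPU_HOURLY_USD, BASELINE_TPS * THROUGHPUT_GAIN)
reduction = 1 - optimized_cost / baseline_cost

print(f"baseline:  ${baseline_cost:.2f} per 1M tokens")
print(f"optimized: ${optimized_cost:.2f} per 1M tokens")
print(f"reduction: {reduction:.0%}")
```

Under these assumptions, a 3x throughput gain at a constant hourly price works out to about a 67% lower cost per token, which falls inside the claimed 50-90% range. Actual savings depend on utilization, pricing differences between providers, and how much of the throughput gain your workload realizes.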
Choose Etched only if you operate at datacenter scale (5,000+ GPU equivalents), can absorb 18-24 month wait-and-see risk on production availability and performance validation, are willing to rebuild serving infrastructure with proprietary tooling, and your workloads are 100% transformer-only with zero need for multimodal, vision, or model-architecture flexibility.
Etched targets the same pain point—transformer inference CapEx—but from a hardware bet rather than software optimization, making it a long-cycle strategy for organizations with patient capital and high conviction on transformer stability. For enterprises making decisions in 2026, FriendliAI is the operational choice.
For hyperscalers or frontier labs betting 2-3 years ahead, Etched represents a potential category shift, but one that remains unproven and unshipped.