Editorial matchup · June 2026

Etched vs FriendliAI: Which AI Tool Is Better in 2026?

Side-by-side comparison of Etched and FriendliAI — pricing, features, and use cases. Reviewed by our editorial team in Jun 2026.

Use-case score: 11 · Updated Jun 2026

Etched

AI Infrastructure
4.5 / 5 · Paid · 80 saves

FriendliAI

AI Infrastructure
4.5 / 5 · Paid · 110 saves
The verdict

Etched and FriendliAI address the same pain point—expensive transformer inference—but from opposite ends of the stack.

Etched is a pre-production hardware bet: a transformer-specific ASIC that hardwires attention circuits into fixed silicon and claims roughly 20x the throughput of H100 GPUs on transformer workloads. FriendliAI is a production-ready software inference platform that claims 50-90% cost reductions through algorithmic optimizations and better GPU utilization.

The choice between them hinges on timeline, risk tolerance, and workload profile.

As of April 2026, Etched's Sohu chip has not shipped to customers, and no independent benchmarks validate its claim of 500,000 tokens/sec on Llama 70B. The platform also has no published pricing and requires Etched's proprietary compiler, which creates high execution and vendor risk for early adopters.

FriendliAI, by contrast, is shipping today. It completed its Series A in August 2025, powers inference for enterprises such as LG Electronics and AI startups such as Upstage, supports 570k+ Hugging Face models, and reports cost reductions measured in live production.

However, FriendliAI's architecture remains bounded by GPU throughput: its reported 3x speedup over vLLM comes from algorithmic techniques such as continuous batching and speculative decoding, not from architectural specialization.
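To make the batching point concrete, here is a toy scheduler comparing static batching with continuous (iteration-level) batching. It is a minimal sketch under simplifying assumptions (each active request emits one token per decode step, no prefill cost, no memory limits) and illustrates the general technique only; the comparison does not describe Friendli Engine's actual scheduler, and the function names are hypothetical.

```python
from collections import deque
from typing import List


def static_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Static batching: each batch occupies the GPU until its longest request finishes."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Continuous (iteration-level) batching: after every decode step, finished
    requests leave the batch and queued requests immediately take their slots."""
    queue = deque(lengths)   # output lengths (in tokens) of waiting requests
    active: List[int] = []   # tokens still to generate for each running request
    steps = 0
    while queue or active:
        # Refill any free slots before the next forward pass.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1  # one forward pass emits one token for every active request
        # Requests with one token left finish this step; the rest count down.
        active = [remaining - 1 for remaining in active if remaining > 1]
    return steps


if __name__ == "__main__":
    # Hypothetical workload: long and short generations mixed, 4 batch slots.
    lengths = [100, 10, 10, 10, 100, 10, 10, 10]
    print("static:", static_batching_steps(lengths, 4))          # static: 200
    print("continuous:", continuous_batching_steps(lengths, 4))  # continuous: 110
```

Under these assumptions, continuous batching finishes the same eight requests in 110 decode steps instead of 200, because short requests no longer wait for the longest request in their batch. Real engines must also manage KV-cache memory and prefill, which this sketch ignores.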

If Etched's Sohu reaches production and validates its performance claims, it could become a generation-defining shift for dense transformer deployments at hyperscale. For teams making decisions today and running production inference workloads at scale, FriendliAI offers immediate cost relief and reliability.

For organizations planning on a 2-3 year horizon in which Sohu becomes available and transformer-only workloads dominate, Etched represents a potential long-term CapEx advantage, but only if the startup clears the manufacturing, software, and supply-chain hurdles that have historically derailed specialty silicon companies.

ToolDirectory.AI Editorial Team

Production inference today

FriendliAI

FriendliAI is shipping today: Friendli Engine delivers a reported 3x throughput gain over vLLM and claimed 50-90% cost reductions, backed by customer deployments at LG and Upstage. Etched Sohu is not publicly available for purchase or rental as of April 2026.

Transformer-only workloads at hyperscale

Etched

Etched claims its hardwired attention circuits reach 500,000 tokens/sec on Llama 70B with an 8-chip server, versus 23,000 tokens/sec on 8x H100. That performance is unverified, and adopting Sohu requires a full infrastructure rebuild around a proprietary compiler.
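The two server-level figures quoted above are enough to check the "20x" headline. A quick calculation, using only those numbers and assuming both were measured on comparable Llama 70B workloads (the comparison does not state batch sizes or sequence lengths):

```python
# Figures as quoted in this comparison; the Sohu number is Etched's own, unverified claim.
SOHU_SERVER_TPS = 500_000  # 8-chip Sohu server, Llama 70B (vendor claim)
H100_SERVER_TPS = 23_000   # 8x H100 server, Llama 70B
CHIPS_PER_SERVER = 8

speedup = SOHU_SERVER_TPS / H100_SERVER_TPS          # server-to-server ratio
sohu_per_chip = SOHU_SERVER_TPS / CHIPS_PER_SERVER   # tokens/sec per Sohu chip
h100_per_gpu = H100_SERVER_TPS / CHIPS_PER_SERVER    # tokens/sec per H100

print(f"server-level speedup: {speedup:.1f}x")            # 21.7x
print(f"per Sohu chip: {sohu_per_chip:,.0f} tokens/sec")  # 62,500
print(f"per H100: {h100_per_gpu:,.0f} tokens/sec")        # 2,875
```

A ratio of about 21.7x is consistent with the rounded 20x figure; it says nothing about whether the Sohu number holds up under independent testing.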

Model and architecture flexibility

FriendliAI

FriendliAI supports 570,000+ Hugging Face models and works with vision, multimodal, MoE, and non-transformer architectures. Etched can run only dense transformers: it cannot support MoE with dynamic expert routing or diffusion models, and it cannot adapt to architecture changes.

Section 01

Best for what

4 use cases scored. FriendliAI wins 1; the other 3 are even.

  • Pricing value

    Neither tool publishes a starting price.

    Even
  • Free tier

    Neither tool offers a free tier or trial.

    Even
  • User ratings

    Both tools average 4.5 / 5.

    Even
  • Review volume

    FriendliAI has 125 ratings vs 90 for Etched.

    FriendliAI
Section 02

Pros & cons

Where each tool earns its rating — and where it falls short.


Etched

AI Infrastructure
Pros
  • Transformer-specific ASIC architecture with hardwired attention circuits claims a 20x throughput advantage over the H100 by eliminating the instruction-fetch and scheduling overhead inherent to general-purpose GPUs.
  • 144GB of HBM3E memory per chip on a TSMC 4nm reticle-limit die provides ample context-window and KV-cache capacity for frontier models such as Llama 70B.
  • If the production timeline is met and performance is validated, Sohu could deliver an order-of-magnitude CapEx reduction for inference-only operations at hyperscale.
  • As a foundational bet on transformer stability, it could unlock real-time inference for edge deployments and humanoid robots, where latency and power efficiency are critical.
Cons
  • Not available for purchase or rental as of April 2026, more than 20 months after its announcement. No independent benchmarks exist; all performance claims come from Etched's own marketing materials and controlled demos.
  • Requires a complete serving-stack rebuild with Etched's proprietary compiler. There is no vLLM, TensorRT-LLM, or CUDA support, so migrating from established GPU frameworks carries high implementation and technical risk.
  • Cannot run MoE with dynamic expert routing, diffusion models, vision transformers, or any non-transformer architecture. A single architecture pivot by the AI community would leave the hardware obsolete.
  • Startup execution risk: founded in 2022 with no previously shipped silicon products. Supply-chain constraints on HBM3E memory and TSMC yield on a reticle-limit die could delay the production ramp significantly.
  • No published pricing, no disclosed TDP, server architecture unclear, no enterprise support roadmap disclosed.
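Whether 144GB per chip is "ample" for Llama 70B depends on how much memory is left for the KV cache once the weights are loaded. The back-of-envelope check below is a sketch under assumptions the comparison does not state: a Llama-style 70B configuration (80 layers, 8 key/value heads via grouped-query attention, head dimension 128), FP16 weights and cache (2 bytes per value), decimal gigabytes, and a single copy of the weights spread across the 8-chip server. It ignores activations and runtime overhead.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache one token occupies: one key and one value vector
    per layer and per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed Llama-style 70B configuration (see lead-in), FP16 precision.
PER_TOKEN = kv_cache_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)
WEIGHT_BYTES = 70e9 * 2   # ~140 GB of FP16 weights
HBM_PER_CHIP = 144e9      # 144GB HBM3E per Sohu chip (decimal GB assumed)


def cached_tokens(n_chips: int) -> float:
    """Tokens of KV cache that fit after loading one copy of the weights."""
    free_bytes = n_chips * HBM_PER_CHIP - WEIGHT_BYTES
    return max(free_bytes, 0) / PER_TOKEN


if __name__ == "__main__":
    print(f"KV cache per token: {PER_TOKEN:,} bytes")
    print(f"1 chip:  ~{cached_tokens(1):,.0f} tokens")
    print(f"8 chips: ~{cached_tokens(8):,.0f} tokens")
```

Under these assumptions each token needs 320 KiB of cache. A single chip holding the full FP16 weights would have only about 4 GB left (roughly 12,000 cached tokens), while the 8-chip server leaves room for roughly 3 million. The "ample" capacity therefore rests on spreading the model across chips or quantizing it.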
Section 03

At a glance

Every spec on one page. Live-pulled from each tool's detail page.

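  • Pricing
    Etched: Paid
    FriendliAI: Paid
  • Pricing model
    Etched: Paid
    FriendliAI: Paid
  • Free tier
    Etched: No
    FriendliAI: No
  • Free trial
    Etched: No
    FriendliAI: No
  • Rating
    Etched: 4.5 / 5 (90 ratings)
    FriendliAI: 4.5 / 5 (125 ratings)
  • Saves
    Etched: 80
    FriendliAI: 110
  • Categories
    Etched: AI Infrastructure, Engineering & Simulation
    FriendliAI: AI Infrastructure, AI/ML Models
  • Verified
    Etched: No
    FriendliAI: No
  • Last updated
    Etched: May 2026
    FriendliAI: May 2026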
Frequently asked

Etched vs FriendliAI FAQs

Quick answers to the questions readers ask before picking between these two.

Is Etched Sohu available to rent or purchase today?

No. As of April 2026, Sohu has not shipped to customers and is not available for purchase or cloud rental. Etched has demonstrated the chip to investors and shown controlled benchmarks, but there is no public production deployment, no independent verification of performance claims, and no announced availability date.

Which platform has verified, independent benchmarks?

FriendliAI. Third-party comparisons from September 2024 and later studies confirm its 3x throughput improvement over vLLM on Llama benchmarks and the lowest time to first token (TTFT), 0.24 seconds, among the inference providers tested. Etched publishes only its own benchmarks; no third-party testing has validated its claim of 500,000 tokens/sec on Llama 70B.

Can either platform run mixture-of-experts models?

FriendliAI supports MoE models with dynamic expert routing, quantized models, and LoRA adapters natively on standard GPU infrastructure. Etched's Sohu cannot support MoE with dynamic routing; Etched has mentioned a separate variant for fixed-MoE architectures, but that chip is not yet available.
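Dynamic expert routing means a small gating network chooses, at inference time, which experts process each token, so the computation path changes from token to token. The sketch below shows common top-k softmax gating as an illustration; the comparison does not say which routing scheme it has in mind, and the router logits are made-up values.

```python
import math
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts for one token and return
    (expert_index, weight) pairs, softmax-normalized over the chosen k."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]


if __name__ == "__main__":
    # Hypothetical router outputs for two tokens over 4 experts.
    token_a = [2.0, 0.1, 1.5, -1.0]
    token_b = [0.0, 3.0, -2.0, 2.5]
    print([i for i, _ in top_k_route(token_a)])  # [0, 2]
    print([i for i, _ in top_k_route(token_b)])  # [1, 3]
```

The two tokens go to different experts, so which expert weights are used is only known at runtime. This data-dependent choice is the kind of runtime variability the comparison says Sohu's fixed design cannot handle.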

What is the difference between Etched hardware and FriendliAI software?

Etched builds transformer-specific ASIC silicon that hardwires attention, projection, and normalization operations for maximum compute density. FriendliAI runs on standard NVIDIA GPUs and optimizes inference through software techniques such as iteration batching and speculative decoding. Etched aims for a 20x throughput advantage over H100 GPUs; FriendliAI aims for a 3x improvement over vLLM on the same GPUs.
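Speculative decoding lets a cheap draft model propose several tokens that the large target model then checks together. The sketch below is the simplified greedy variant, with toy integer "models" as hypothetical stand-ins; production systems typically verify sampled tokens with a rejection-sampling rule, and the comparison does not describe how FriendliAI implements the technique.

```python
from typing import Callable, List, Tuple

# A toy "model": maps the token context to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one sequential target-model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns the generated tokens and the number
    of verification rounds (each round stands in for one batched target pass)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        # 1. The draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            ctx.append(draft(ctx))
            proposal.append(ctx[-1])
        # 2. The target checks each proposed position in order.
        rounds += 1
        ctx = list(out)
        for token in proposal:
            expected = target(ctx)
            ctx.append(expected)  # keep the target's token (equals the draft's on a match)
            if expected != token:
                break  # first mismatch: discard the rest of the proposal
        else:
            ctx.append(target(ctx))  # all k drafts accepted: take one bonus token
        out = ctx
    return out[len(prompt):len(prompt) + n_new], rounds


# Hypothetical toy models: the draft agrees with the target except when the
# context length is a multiple of 3.
def toy_target(ctx: List[int]) -> int:
    return (3 * ctx[-1] + 1) % 17


def toy_draft(ctx: List[int]) -> int:
    return toy_target(ctx) if len(ctx) % 3 else (ctx[-1] + 1) % 17


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [1], n_new=20)
    assert tokens == greedy_decode(toy_target, [1], n_new=20)  # identical output
    print(f"20 tokens in {rounds} verification rounds instead of 20 target calls")
```

Every kept token is one the target model would have chosen itself, so the output matches plain greedy decoding exactly; the saving comes from needing fewer sequential target passes when the draft guesses well.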

How much does each platform cost for inference?

FriendliAI's pricing is transparent: usage-based, billed per GPU-hour or per token, with claimed reductions of 50-90% versus Together AI and Fireworks for equivalent workloads. Etched has not published pricing and Sohu is not available for purchase, so cost-per-token comparisons cannot be made. Once Sohu ships, its price competitiveness will depend on manufacturing yield and scale.

Which platform supports more AI models?

FriendliAI supports 570,000+ models from Hugging Face, plus custom fine-tuned, proprietary, and multimodal models. Etched supports only dense transformer models and cannot run diffusion, vision, MoE-routed, or SSM-based models like Mamba.

What happens if AI architectures shift away from transformers?

FriendliAI remains viable because it runs on general-purpose GPU silicon; you can immediately switch to new model families without hardware changes. Etched's entire value proposition depends on transformers remaining dominant. If architectures shift, Etched would require a full hardware redesign, a process the company estimates at 3+ years.

Bottom line

Choose FriendliAI for immediate, production-grade cost and latency reduction on transformer workloads running at scale today.

Teams serving Llama 70B inference across thousands of requests should evaluate FriendliAI's cost-per-token math against their existing GPU infrastructure, especially if they run vLLM or TensorRT-LLM at batch sizes where continuous batching and speculative decoding yield measurable savings.
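The cost-per-token math comes down to GPU spend divided by tokens served. The sketch below uses hypothetical inputs (a $4.50 GPU-hour price, 8 GPUs, and 10,000 tokens/sec of sustained throughput); none of these numbers come from Etched or FriendliAI. It then applies FriendliAI's claimed 50-90% reduction as a range, not as a measured result.

```python
from typing import Tuple


def cost_per_million_tokens(gpu_hour_price: float, n_gpus: int,
                            tokens_per_sec: float) -> float:
    """Serving cost in dollars per million generated tokens at steady throughput."""
    dollars_per_hour = gpu_hour_price * n_gpus
    tokens_per_hour = tokens_per_sec * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000


def claimed_savings_range(baseline: float, low: float = 0.50,
                          high: float = 0.90) -> Tuple[float, float]:
    """Apply a claimed fractional cost reduction range to a baseline cost.
    Returns (cost at the largest reduction, cost at the smallest reduction)."""
    return baseline * (1 - high), baseline * (1 - low)


if __name__ == "__main__":
    # Hypothetical in-house GPU deployment (illustrative numbers only).
    baseline = cost_per_million_tokens(gpu_hour_price=4.50, n_gpus=8,
                                       tokens_per_sec=10_000)
    best, worst = claimed_savings_range(baseline)
    print(f"baseline: ${baseline:.2f} per 1M tokens")
    print(f"with a 50-90% reduction: ${best:.2f} to ${worst:.2f} per 1M tokens")
```

With these inputs the baseline is $1.00 per million tokens, and the claimed range would put it between $0.10 and $0.50. Substituting your own GPU pricing and measured throughput gives the baseline against which any vendor's per-token price can be compared.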

Choose Etched only if you operate at datacenter scale (5,000+ GPU equivalents), can absorb an 18-24 month wait-and-see risk on production availability and performance validation, are willing to rebuild serving infrastructure with proprietary tooling, and run workloads that are 100% transformer-only with zero need for multimodal, vision, or model-architecture flexibility.

Etched targets the same pain point, transformer inference CapEx, but through a hardware bet rather than software optimization, making it a long-cycle strategy for organizations with patient capital and high conviction in transformer stability. For enterprises making decisions in 2026, FriendliAI is the operational choice.

For hyperscalers or frontier labs betting 2-3 years ahead, Etched represents a potential category shift, though one that is still unproven and unshipped.
