LLM Gateways & Serving

LLM API routers, gateways, serving infrastructure, and model hosting. Tools that sit between your app and one or more language models.

Tools indexed
14
Reviewed by our editors
Edition
Vol. 4 · Iss. 19
Last reviewed 2026-05-30
Status
Live
Reviewed each edition
Featured · this edition
1 featured
Editor's Picks

Where to start

Best for · Route to any LLM through one API

OpenRouter

AI Infrastructure
Freemium
4.84
360
Best for · Ultra-low-latency model inference

Groq

AI Infrastructure
Paid - Inquire
4.87
430
Best for · Host and fine-tune open models

Together AI

Developer Tools
Paid - Inquire
4.93
441
Best for · Fast inference for open models

Fireworks AI

AI/ML Models
Paid - Inquire
4.92
420
Best for · Run and deploy models with an API

Replicate

Developer Tools
Paid - Inquire
4.92
420
Best for · Run open models locally

Ollama

Developer Tools
Free
4.93
470
Every listing

Nebius

Nebius is an AI-native GPU cloud platform that rents NVIDIA H100 through GB200 clusters with managed Slurm, Kubernetes and an inference API.

Paid
4.93
470

Ollama

Ollama is a local LLM runtime that downloads, runs, and serves open models on your own hardware via a CLI and an OpenAI-compatible API.

Free
4.93
470

Together AI

Cloud service for developers to build with open-source AI, offering APIs, distributed training systems, and leading open-source models.

Paid - Inquire
4.93
441

Groq

Enterprise-scale AI solutions for ultra-fast language processing and inference.

Paid - Inquire
4.87
430

Fireworks AI

High-speed, cost-efficient generative AI for product innovation with advanced fine-tuning capabilities.

Paid - Inquire
4.92
420

Replicate

Cloud platform for running, deploying, and scaling machine learning models with ease.

Paid - Inquire
4.92
420

RunPod

Globally distributed GPU cloud for AI tasks.

Paid - Inquire
4.91
410

Modal

Modal offers an easy way for developers to run code in the cloud with serverless compute and containerized environments.

Paid - $30 /mo
4.86
400

OpenRouter

Unified API and marketplace for the best LLMs at the best prices for any prompt.

Freemium
4.84
360

Anyscale

Unified compute platform, built on Ray, for scaling AI and Python applications.

Paid - Inquire
4.83
360

LiteLLM

Universal LLM proxy — call 100+ LLMs (OpenAI, Anthropic, Bedrock, Vertex) with one API.

Freemium
4.75
325

Voltage Park

Voltage Park is a GPU cloud platform that rents NVIDIA H100 and Blackwell clusters on-demand or on dedicated reserve for AI training and inference.

Paid
4.75
240

DeepInfra

DeepInfra is an inference cloud that serves open-weight AI models — Llama, DeepSeek, Qwen, Mistral — behind a pay-per-token, OpenAI-compatible API.

Paid
4.46
170

BentoML

Platform for software engineers to build AI applications.

Paid - Inquire
4.63
113
Questions

LLM Gateways & Serving, answered

What is an LLM gateway?

An LLM gateway is a single API that routes requests to many language models behind one interface, handling keys, fallbacks, and cost tracking. OpenRouter is a common example, letting you switch models without rewriting code. A gateway makes it easier to compare providers and reduces lock-in to one vendor.
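To make the routing idea concrete, here is a minimal sketch of a gateway that tries providers in order, falls back on failure, and tracks estimated spend. It is not how OpenRouter or any listed tool is implemented; the `Provider` and `Gateway` classes, the stub functions, the prices, and the whitespace-based token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Provider:
    """One upstream model behind the gateway (all fields are illustrative)."""
    name: str
    api_key: str
    usd_per_1k_tokens: float
    call: Callable[[str, str], str]  # (api_key, prompt) -> completion text


@dataclass
class Gateway:
    providers: List[Provider]
    spend_usd: Dict[str, float] = field(default_factory=dict)

    def complete(self, prompt: str) -> str:
        errors = []
        for p in self.providers:  # try providers in priority order
            try:
                text = p.call(p.api_key, prompt)
            except Exception as exc:  # any failure: fall back to the next one
                errors.append(f"{p.name}: {exc}")
                continue
            # Crude token estimate (whitespace split), used only for cost tracking.
            tokens = len(prompt.split()) + len(text.split())
            cost = tokens / 1000 * p.usd_per_1k_tokens
            self.spend_usd[p.name] = self.spend_usd.get(p.name, 0.0) + cost
            return text
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(api_key: str, prompt: str) -> str:
    """Stub provider that always times out."""
    raise TimeoutError("upstream timed out")


def echo(api_key: str, prompt: str) -> str:
    """Stub provider that always succeeds."""
    return "echo: " + prompt


gateway = Gateway([
    Provider("primary", "key-a", 0.50, flaky),
    Provider("backup", "key-b", 0.10, echo),
])
result = gateway.complete("hello world")
```

In a real setup the stub callables would be replaced by HTTP clients for each provider, and token counts would come from the provider's usage data rather than a word count.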

What is the fastest LLM inference provider?

Groq is known for very low latency on its custom inference hardware, and Fireworks AI and Together AI also optimize open-model serving for speed. The fastest choice depends on the model and request pattern, so benchmark on your own prompts. Latency, throughput, and cost trade off differently across providers.
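A benchmark on your own prompts can be as simple as timing each provider on the same inputs and comparing medians. The sketch below assumes each provider is wrapped in a plain Python callable; `slow_stub` and `fast_stub` are placeholders standing in for real API clients, and the prompts are invented examples.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(providers: Dict[str, Callable[[str], str]],
              prompts: List[str], repeats: int = 3) -> Dict[str, float]:
    """Return the median wall-clock latency in seconds for each provider."""
    medians = {}
    for name, call in providers.items():
        samples = []
        for prompt in prompts:
            for _ in range(repeats):
                start = time.perf_counter()
                call(prompt)
                samples.append(time.perf_counter() - start)
        medians[name] = statistics.median(samples)
    return medians


def slow_stub(prompt: str) -> str:
    """Placeholder for a slower provider (sleeps to simulate latency)."""
    time.sleep(0.02)
    return prompt.upper()


def fast_stub(prompt: str) -> str:
    """Placeholder for a faster provider."""
    return prompt.upper()


latencies = benchmark(
    {"slow": slow_stub, "fast": fast_stub},
    ["Summarize this ticket.", "Translate to French: hello"],
)
fastest = min(latencies, key=latencies.get)
```

Median latency is only one axis. For streaming use cases, time to first token and tokens per second are usually worth measuring separately.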

Where can I host open-source models?

Together AI, Fireworks, and Replicate host open models behind an API so you avoid managing GPUs, while RunPod and Modal give you raw compute to run them yourself. For local use, Ollama runs models on your own machine. Choose based on scale, control, and whether you want managed or self-operated serving.

What is the difference between a model gateway and a serving platform?

A gateway like OpenRouter routes requests across providers through one API but does not host the models itself. A serving platform like Fireworks or Together runs the models and returns results. Many teams use a gateway in front of one or more serving platforms to balance cost and reliability.

How do I run an LLM locally?

Tools like Ollama download and run open models on your own hardware with a single command, exposing a local API your app can call. Local serving keeps data private and removes per-call cost, but model size and speed are limited by your own GPU, CPU, and memory. It suits development, privacy-sensitive use, and smaller models.
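As a rough sketch of calling that local API from an app, the snippet below builds a request for Ollama's `/api/generate` endpoint on its default port, 11434. The model name `llama3.2` is an assumption (use whichever model you have pulled), and `generate` only works when an Ollama server is actually running, so it is defined here but not called.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already downloaded.
    """
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# "llama3.2" is an assumed model name, used here only to show the payload shape.
request = build_generate_request("llama3.2", "Why is the sky blue?")
```

With a server running, `generate("llama3.2", "Why is the sky blue?")` would return the completion as a string.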

How do I cut LLM API costs?

Route simpler requests to smaller or open models, cache repeated responses, and trim prompt length. A gateway like OpenRouter makes it easy to switch models by price and performance, and open-model hosts like Together AI often cost less than frontier APIs. Match each task to the smallest model that meets your quality bar.
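Two of these tactics, picking the cheapest model that clears a quality bar and caching repeated responses, can be sketched in a few lines. The catalog below is entirely hypothetical: the model names, prices, and quality scores are invented for illustration and would in practice come from your own evaluations and your provider's price list.

```python
from typing import Callable, Dict, Tuple

# Hypothetical catalog: model name -> (USD per 1M tokens, quality score 0-1).
CATALOG: Dict[str, Tuple[float, float]] = {
    "small-open-model": (0.20, 0.70),
    "mid-open-model": (0.90, 0.82),
    "frontier-model": (10.00, 0.95),
}


def cheapest_model(min_quality: float) -> str:
    """Pick the lowest-priced model whose quality score meets the bar."""
    eligible = [(price, name) for name, (price, quality) in CATALOG.items()
                if quality >= min_quality]
    if not eligible:
        raise ValueError("no model meets the quality bar")
    return min(eligible)[1]


class CachedClient:
    """Serve repeated (model, prompt) pairs from memory instead of paying again."""

    def __init__(self, call: Callable[[str, str], str]):
        self._call = call
        self._cache: Dict[Tuple[str, str], str] = {}
        self.upstream_calls = 0

    def complete(self, prompt: str, min_quality: float) -> str:
        # Collapse redundant whitespace: a very small form of prompt trimming
        # that also makes near-identical prompts share one cache entry.
        prompt = " ".join(prompt.split())
        key = (cheapest_model(min_quality), prompt)
        if key not in self._cache:
            self.upstream_calls += 1
            self._cache[key] = self._call(*key)
        return self._cache[key]


# The lambda stands in for a real API call and just labels the chosen model.
client = CachedClient(lambda model, prompt: f"[{model}] {prompt}")
first = client.complete("Classify:   refund request", min_quality=0.6)
second = client.complete("Classify: refund request", min_quality=0.6)
```

An exact-match cache like this only helps when prompts repeat verbatim; it is a starting point, not a substitute for measuring which requests actually dominate your bill.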
