LLM API routers, gateways, serving infrastructure, and model hosting. Tools that sit between your app and one or more language models.

Nebius is an AI-native GPU cloud platform that rents NVIDIA H100 through GB200 clusters with managed Slurm, Kubernetes and an inference API.

Ollama is a local LLM runtime that downloads, runs, and serves open models on your own hardware via a CLI and an OpenAI-compatible API.

Cloud service for developers to build with open-source AI, offering APIs, distributed training systems, and leading open-source models.

Enterprise-scale AI solutions for ultra-fast language processing and inference.

High-speed, cost-efficient generative AI for product innovation with advanced fine-tuning capabilities.

Cloud platform for running, deploying, and scaling machine learning models with ease.

Globally distributed GPU cloud for AI tasks.

Modal offers an easy way for developers to run code in the cloud with serverless compute and containerized environments.

Unified API and marketplace for the best LLMs at the best prices for any prompt.

Unified compute platform for scalable AI and Python applications using Ray

Universal LLM proxy — call 100+ LLMs (OpenAI, Anthropic, Bedrock, Vertex) with one API.

Voltage Park is a GPU cloud platform that rents NVIDIA H100 and Blackwell clusters on-demand or on dedicated reserve for AI training and inference.

Platform for software engineers to build AI applications.
Receive weekly updates so you can stay up-to-date with the world of AI
Receive weekly updates so you can stay up-to-date with the world of AI