#32 · LLM Infrastructure & Middleware

Top LLM Gateways and Proxies

Ranked List10 tools ranked

What is an LLM gateway?

An LLM gateway (sometimes called LLM proxy or LLM router) is middleware that sits between applications and the underlying LLM providers (OpenAI, Anthropic, Google, etc.), providing a unified interface, intelligent routing, observability, and governance across multiple model providers. The category effectively didn't exist in production before 2023 — applications called LLM provider APIs directly — but emerged rapidly as production deployments hit a consistent set of problems: provider lock-in (different APIs for different vendors), no redundancy when services have outages, opaque cost tracking across multiple providers, no centralized governance for PII filtering or prompt safety, and the operational complexity of managing API keys across teams. The 2026 landscape splits into three architectural patterns: *managed aggregators* (OpenRouter, Together AI, Fireworks AI) that act as marketplaces offering one API key for many models with provider markup; *self-hostable gateways* (LiteLLM, Helicone, Portkey open-source) that bring-your-own-API-keys and run as proxies on your infrastructure; and *enterprise gateways* (Portkey managed, Kong AI Gateway) that add governance, compliance, and production controls on top of routing.

Why LLM gateways matter in enterprise AI.

The strategic case has matured beyond simple "one API for many providers" into something more important: gateways have become the natural enforcement point for cross-provider governance — PII redaction, prompt injection detection, content moderation, cost controls, audit logging, and SSO/RBAC all applied uniformly regardless of which underlying model serves the request. The economic case is also significant: gateways enable cost optimization through semantic caching (Portkey claims up to 40% cost reduction on similar prompts), fallback chains during provider outages, and smart routing to cheaper models for less demanding workloads. The 2026 reality is that production LLM deployments at scale increasingly use gateways as table-stakes middleware — the question is no longer whether to use one, but which one fits the team's specific constraints (self-hosting requirements, compliance posture, multi-cloud strategy). The category has matured through Portkey making its gateway open-source (Apache 2.0) in March 2026 and various enterprise gateways (WSO2, Kong AI Gateway) extending established API management platforms into LLM-specific routing.

What to evaluate.

LLM gateway selection should consider: (1) deployment model — managed SaaS vs. self-hostable vs. on-premises/air-gapped; (2) model catalog breadth — OpenRouter at 300+ models vs. focused alternatives; (3) provider markup or platform fees — OpenRouter at 5% markup vs. LiteLLM at $0 (you pay providers directly); (4) caching capabilities — semantic vs. exact-match, with semantic caching providing meaningful cost reduction; (5) governance features — PII filtering, prompt injection detection, content moderation at the gateway layer; (6) observability integration — does the gateway include observability or require pairing with separate platforms; (7) production reliability — fallback chains, retry logic, p99 latency overhead; (8) enterprise features (SSO, RBAC, audit logs, compliance certifications). The list below ranks ten LLM gateway products most defensible for enterprise consideration.

Production-grade LLM gateway with governance and guardrails

Portkey combines gateway routing, observability, guardrails, and cost tracking in a unified platform. The platform's distinctive positioning is production safety — guardrails, PII redaction, jailbreak detection, and audit trails built into the gateway layer. In March 2026, Portkey made its entire gateway open-source (Apache 2.0), letting teams self-host the core routing and guardrails without committing to the managed platform. Best for teams building customer-facing LLM features needing guardrails, regulated industries requiring PII and content filtering, organizations wanting enterprise governance without enterprise-only commitment, and applications where prompt injection or PII leakage is a real risk. Strengths include open-source gateway core (Apache 2.0 as of March 2026), built-in guardrails (PII redaction, jailbreak detection, content moderation), comprehensive observability and cost tracking, semantic caching reducing costs up to 40%, ~8K GitHub stars (fastest-growing enterprise gateway), and clear positioning as production-safety-first. Trade-offs are complexity overhead for teams that don't need governance features, managed platform pricing starting at $49/month (open-source self-host is free), and more layers to manage than minimal gateways.

Open-source LLM proxy with self-hosting and BYOM

LiteLLM is the leading open-source LLM gateway — a Python SDK and proxy server that translates requests across 100+ LLM providers into a unified OpenAI-compatible API format. Teams self-host the proxy and manage routing, spend tracking, and access control in their own environment. LiteLLM integrates with standard observability tools (Langfuse, Helicone, Braintrust) for production monitoring. Best for engineering teams with DevOps capabilities wanting a self-hosted gateway, organizations needing full infrastructure control over their LLM routing, cost-transparent deployment paying providers directly without platform markup, and teams comfortable with code-first configuration. Strengths include MIT-licensed open-source, 100+ provider support, full self-hosting with no external dependencies, BYOM with provider API keys (you pay providers directly), comprehensive observability integration options, and active community with broad enterprise deployment. Trade-offs are requires DevOps capability for production deployment, adds ~10-20ms latency, needs separate observability stack (no built-in observability platform), and lacks built-in guardrails (must layer on separately).

Managed LLM marketplace with 300+ models

OpenRouter is the dominant managed LLM marketplace — one API key, one billing relationship, and access to 300+ models from 60+ providers including many open-weight models hosted by third parties. The platform handles routing, load balancing, and fallbacks automatically with a 5% markup on all requests. OpenRouter is the natural starting point for rapid prototyping across many models. Best for rapid prototyping and exploration across many models, organizations wanting one API key and one bill across all LLM usage, applications needing access to obscure or fine-tuned open-weight models not available via direct APIs, and teams that prefer managed convenience over self-hosting. Strengths include largest model catalog in the category (300+ models), single API key and single bill simplicity, credit-based pricing accessible for experimentation, automatic routing and fallbacks, and broad open-weight model coverage that direct vendor APIs don't offer. Trade-offs are 5% markup on all requests ($55/month on $1,000 API spend, $660/year on $13K spend), no self-hosting option, less governance and enterprise features than dedicated gateways, and the broader managed-vs-self-hosted trade-off.

Observability-first LLM gateway with smart routing

Helicone started as an LLM observability platform and expanded into an OpenAI-compatible gateway with smart routing across 100+ AI models, automatic failover, unified billing, and built-in observability. The platform is Apache 2.0 open-source with self-hosted deployment available, and is built in Rust for production-grade performance. Best for teams wanting observability-plus-gateway in one platform, organizations valuing Rust-based performance, applications where observability is as important as routing, and teams that prefer integrated platforms over composed stacks. Strengths include category-leading observability integration, Rust-based performance, Apache 2.0 license, self-hosted deployment available, caching with up to 95% cost savings on common requests, and unified observability + gateway architecture. Trade-offs are smaller GitHub presence (~3K stars) than LiteLLM, fewer governance features than enterprise-focused alternatives, and the integrated platform commitment that creates implicit dependency.

Enterprise AI gateway built on Kong API management

Kong AI Gateway extends Kong's category-leading API management platform with LLM-specific routing, governance, and observability — natural fit for enterprises already standardized on Kong for general API management who want to extend Kong patterns to AI workloads. Best for large enterprises already using Kong for API management, organizations wanting infrastructure consolidation (single platform for traditional APIs and LLM gateways), regulated industries valuing established Kong enterprise patterns, and teams that prefer extending existing infrastructure over adopting new platforms. Strengths include category-leading enterprise API management heritage, extensive plugin ecosystem, broad enterprise compliance posture, integration with existing Kong infrastructure, and clear positioning for Kong-standardized organizations. Trade-offs are Kong ecosystem alignment that creates lock-in for non-Kong shops, less LLM-specific optimization than purpose-built gateways (semantic caching, model-aware load balancing must be built rather than out-of-box), and complexity overhead for teams just wanting LLM routing.

AI gateway integrated with Vercel deployment platform

Vercel AI Gateway extends the Vercel deployment platform with AI gateway capabilities — natural fit for organizations standardized on Vercel and Next.js for application deployment who want AI routing tightly integrated with their broader stack. The platform combines routing, observability, and integration with the Vercel AI SDK. Best for Vercel and Next.js–standardized organizations, full-stack JavaScript teams using Vercel AI SDK, applications already deployed on Vercel extending into AI workloads, and teams valuing integrated deployment + AI gateway. Strengths include native Vercel and Next.js integration, tight integration with Vercel AI SDK, accessible developer experience, and clear positioning for the Vercel ecosystem. Trade-offs are Vercel ecosystem alignment, less suited for non-Vercel deployments, narrower than dedicated enterprise gateways for the most complex governance needs.

Enterprise AI gateway with unbundled approach

WSO2 AI Gateway extends WSO2's API management platform with AI gateway capabilities — distinguished by an unbundled approach where teams get a LiteLLM-weight starting point with a clear path to enterprise features without re-platforming. WSO2 brings verified compliance credentials (SOC 2, ISO 27001, HIPAA) and the ability to self-host. Best for enterprises wanting self-hostable AI infrastructure with compliance credentials, organizations that need a lightweight start with a clear growth path to enterprise features, regulated industries valuing WSO2's established compliance posture, and teams already using WSO2 for API management. Strengths include verified enterprise compliance credentials, self-hosting available, unbundled approach (start light, grow into governance), and integration with broader WSO2 enterprise patterns. Trade-offs are WSO2 ecosystem alignment, narrower mindshare than category leaders, and complexity for teams that don't need enterprise-grade governance.

Edge-deployed AI gateway with Cloudflare integration

Cloudflare AI Gateway provides AI gateway capabilities deployed at the edge across Cloudflare's global network — distinctive for teams wanting AI routing co-located with edge computing and Cloudflare Workers. The platform offers caching, rate limiting, observability, and analytics integrated with Cloudflare's broader edge platform. Best for Cloudflare Workers and edge deployment, organizations valuing edge-deployed AI gateway co-located with applications, applications using Cloudflare R2/D1/Workers AI alongside external LLM providers, and teams already standardized on Cloudflare for edge infrastructure. Strengths include edge deployment across Cloudflare's global network, integration with broader Cloudflare platform (Workers, R2, D1, Workers AI), accessible pricing, and clear positioning for edge-first deployments. Trade-offs are Cloudflare ecosystem alignment, less specialized than dedicated AI gateways for the deepest LLM-specific features, and narrower than enterprise-focused gateways for the most complex governance needs.

Optimized inference platform with gateway features

Together AI provides high-performance inference for open-weight models combined with gateway features for routing across hosted models — distinctive for teams wanting optimized inference performance alongside multi-model routing. The platform focuses heavily on open-weight model performance (Llama, Mistral, Qwen, DeepSeek). Best for organizations heavily using open-weight models, applications where inference performance on open models matters significantly, teams wanting optimized hosting plus gateway routing, and use cases benefiting from Together's inference optimization. Strengths include category-leading inference performance on open-weight models, broad open-weight model coverage, integrated hosting and gateway capabilities, and clear positioning in the open-weight inference space. Trade-offs are more specialized than general-purpose gateways (open-weight focus), less suited for teams using only frontier closed models, and narrower governance features than enterprise-focused alternatives.

High-performance open-source LLM gateway

Bifrost from Maxim AI is positioned as a high-performance open-source LLM gateway emphasizing zero-config simplicity combined with enterprise features — low latency overhead, broad provider support, governance features, and MCP support. The platform competes directly with LiteLLM and Portkey on the self-hosted open-source dimension. Best for engineering teams wanting high-performance open-source gateway, organizations valuing low-latency overhead, applications needing MCP support at the gateway layer, and teams comfortable with newer entrants offering performance differentiation. Strengths include high-performance design with minimal latency overhead, open-source license, MCP support, growing community, and clear performance-first positioning. Trade-offs are smaller installed base than LiteLLM or Portkey, newer platform with less production track record, and overlapping coverage with established alternatives.

Top LLM Gateways and Proxies | Xither | Xither