Introduction

Envoy AI Gateway

Alauda Build of Envoy AI Gateway is based on the Envoy AI Gateway project. Envoy AI Gateway is a Kubernetes-native, AI-specific gateway layer built on top of Envoy Gateway, providing intelligent traffic management, routing, and policy enforcement for AI inference workloads.

Main components and capabilities include:

AI-Aware Routing: Routes inference requests to the appropriate backend model service based on request content, model name, and backend availability — enabling transparent multi-model serving behind a single endpoint.
OpenAI-Compatible API: Exposes a unified, OpenAI-compatible API surface (/v1/chat/completions, /v1/completions, /v1/models) for all downstream inference services, regardless of the underlying runtime.
Per-Model Rate Limiting & Policies: Enforces fine-grained rate limiting, token quotas, and traffic policies at the individual model level, preventing resource starvation and ensuring fair usage across tenants.
Backend Load Balancing: Distributes inference requests across multiple replicas of the same model using configurable load-balancing strategies, with health checking and automatic failover.
Envoy Gateway Integration: Runs as an extension of Envoy Gateway, inheriting its Kubernetes Gateway API-native control plane, TLS termination, and observability features (metrics, access logs, distributed tracing).
Gateway API Inference Extension (GIE): Integrates with the Kubernetes SIG Gateway API Inference Extension for advanced, inference-aware scheduling and load balancing decisions based on real-time backend state.

Envoy AI Gateway is a required dependency of Alauda Build of KServe for exposing inference services.

For installation on the platform, see Install Envoy AI Gateway.

Guides

The following guides configure Envoy AI Gateway as a multi-tenant model serving control plane:

Authenticating Consumers — verify SSO tokens or API keys and propagate caller identity.
Configuring Token Quotas — enforce per-user, per-department, and per-tier token budgets.
Metering Token Usage — report consumption per tenant and feed chargeback.
Routing to LLM Providers — front external providers with credential injection and failover.

Create the gateway in a dedicated namespace

Create the Gateway and its AIGatewayRoute in a dedicated namespace, such as maas-system, rather than in the Envoy Gateway control-plane namespace envoy-gateway-system. This keeps tenant gateways separate from the control plane, and avoids an issue on some versions where a gateway placed in the control-plane namespace does not get the AI Gateway request-processing (ext_proc) filter or SecurityPolicy rules applied to its listener — which silently breaks model routing, token quotas, and authentication. The data-plane proxy pods are created in envoy-gateway-system either way; only the Gateway resource's namespace matters here.

Documentation

Envoy AI Gateway upstream documentation and related resources:

Envoy AI Gateway Documentation: https://aigateway.envoyproxy.io/ — Official documentation covering architecture, configuration, and API references.
Envoy AI Gateway GitHub: https://github.com/envoyproxy/ai-gateway — Source code, release notes, and issues.
Envoy Gateway: https://gateway.envoyproxy.io/ — The underlying gateway infrastructure that Envoy AI Gateway extends.
Gateway API Inference Extension (GIE): https://gateway-api-inference-extension.sigs.k8s.io/ — Kubernetes SIG project for AI-aware routing integrated with Envoy AI Gateway.
KServe (Alauda Build): ../kserve/intro — KServe uses Envoy AI Gateway as a required dependency for exposing and routing inference services.

#Introduction

#TOC

#Envoy AI Gateway

#Guides

#Documentation

Introduction

TOC

Envoy AI Gateway

Guides

Documentation