Metering Token Usage
Introduction
Envoy AI Gateway exposes Prometheus metrics that follow the OpenTelemetry GenAI semantic conventions, including token usage per request. By adding the caller's identity as a metric label and collecting the metric through the platform monitoring stack, you get a unified view of token consumption per department, namespace, and model. The same data feeds chargeback through Alauda Cost Management.
The pipeline is: the gateway emits token metrics, identity is attached as a label, a PodMonitor collects the metric into the platform, and a MonitorDashboard presents it. No raw PromQL is required for day-to-day viewing.
Use Cases
- Show each department its own token consumption by model, isolated per project.
- Track which models drive the most token usage across the platform.
- Provide the usage data that Alauda Cost Management prices into a chargeback report.
Prerequisites
- An AIGatewayRoute with llmRequestCosts configured. See Configuring Token Quotas. Without llmRequestCosts the gateway still emits gen_ai_client_token_usage_token, but the per-request token counts will all be zero.
- Caller identity propagated as request headers. See Authenticating Consumers.
- Platform monitoring is enabled on the cluster. Confirm by checking that the Prometheus operator CRDs, such as podmonitors.monitoring.coreos.com, are installed.
- Sanity-check that the metric is being emitted before wiring monitoring. Send one request through the gateway, then read the ExtProc sidecar's admin port on a data-plane proxy pod and look for gen_ai_* samples. If none appears, no scraping below will work; first fix the route / ExtProc wiring.
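The sanity check in the last prerequisite can be scripted. The following is a minimal sketch, assuming the admin endpoint's output has already been saved as text (for instance from http://localhost:1064/metrics after a port-forward); the function name and the sample payload are hypothetical.

```python
def gen_ai_samples(exposition_text: str) -> list[str]:
    """Return the sample lines whose metric name starts with 'gen_ai'."""
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A dotted (UTF-8) metric name is quoted inside the braces, so drop
        # any leading brace or quote before testing the prefix.
        if line.lstrip('{"').startswith("gen_ai"):
            samples.append(line)
    return samples


if __name__ == "__main__":
    # Hypothetical payload; real label names depend on the gateway version.
    payload = (
        "# TYPE gen_ai_client_token_usage_token histogram\n"
        'gen_ai_client_token_usage_token_sum{gen_ai_token_type="input"} 42\n'
    )
    found = gen_ai_samples(payload)
    print("metric is emitted" if found else "no gen_ai_* samples: check route / ExtProc wiring")
```

An empty result means the route or ExtProc wiring should be fixed before any of the monitoring steps are attempted.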
Create the Gateway and AIGatewayRoute in a dedicated namespace (for example maas-system), not in the Envoy Gateway control-plane namespace envoy-gateway-system. A gateway placed in the control-plane namespace may not have the AI Gateway request-processing filter and SecurityPolicy applied to its listener, which silently breaks routing and policy enforcement. See Envoy AI Gateway.
Steps
Add an identity label to token metrics
By default the token metric gen_ai_client_token_usage_token carries the OpenTelemetry GenAI standard labels only (model, provider, operation, token type). Enrich it with the department dimension by mapping an identity header to a metric label in the Envoy AI Gateway controller.
The controller reads the mapping from the CLI flag -metricsRequestHeaderAttributes=<header>:<label>[,<header>:<label>...] on the ai-gateway-controller Deployment. If the controller was installed via Helm, the chart renders this flag from a values key (for example controller.metricsRequestHeaderAttributes); supply your release name and chart reference and run helm upgrade --reuse-values. If you manage the Deployment directly, patch its container args to add the flag. In the mapping x-user-group:department:
- x-user-group: the header set by the SecurityPolicy from the IdP groups claim.
- department: the resulting metric label used for aggregation.
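To make the flag syntax concrete, here is a small sketch that renders the flag value from a header-to-label mapping. The helper name is hypothetical and the validation is a conservative assumption (it rejects the two separator characters); it is not part of the gateway itself.

```python
def header_attributes_flag(mapping: dict[str, str]) -> str:
    """Render <header>:<label> pairs in the form described above."""
    pairs = []
    for header, label in mapping.items():
        # ':' and ',' are the separators of the flag value, so a name that
        # contains either one could not be parsed back unambiguously.
        if not header or not label or any(c in header + label for c in ":,"):
            raise ValueError(f"invalid mapping entry: {header!r} -> {label!r}")
        pairs.append(f"{header}:{label}")
    return "-metricsRequestHeaderAttributes=" + ",".join(pairs)


if __name__ == "__main__":
    # The string to append to the controller container's args.
    print(header_attributes_flag({"x-user-group": "department"}))
```

Keeping the mapping in one place like this makes it easier to review which headers become metric labels before the Deployment is patched.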
Prefer a low-cardinality dimension such as department (x-user-group) for the default label. A per-user label (x-user-id) is possible but produces high-cardinality series (one per user × model × token type), so add it only when per-user reporting is required and a retention window keeps the series count bounded.
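The cardinality trade-off is plain multiplication, which the sketch below spells out. All counts are hypothetical, and series_per_combination is an assumed knob for the extra bucket, sum, and count series that a histogram metric exposes per label combination.

```python
def estimated_series(label_values: int, models: int, token_types: int,
                     series_per_combination: int = 1) -> int:
    """Upper bound on series: identity values x models x token types."""
    return label_values * models * token_types * series_per_combination


if __name__ == "__main__":
    # Hypothetical platform: 12 departments or 5000 users, 5 models,
    # 2 token types (input and output).
    print("per department:", estimated_series(12, 5, 2))
    print("per user:", estimated_series(5000, 5, 2))
```

With these assumed numbers the per-user label multiplies the series count by several hundred, which is why the department label is the safer default.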
After the rollout, send a fresh request and confirm the new label is on the sample:
Collect the metric into the platform
The metric is emitted by the AI Gateway external processor (ExtProc), which runs as a sidecar on each data-plane proxy pod (declared as a Kubernetes native sidecar / initContainer) and exposes its admin endpoint on container port 1064 (named aigw-metrics). Scrape it directly from the proxy pods with a PodMonitor so the platform Prometheus or VictoriaMetrics collects it. For the platform workflow, see metrics management.
Discover what label your cluster's Prometheus operator uses to select PodMonitor objects, so the resource below is actually picked up:
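One way to do the discovery, sketched under the assumption that the platform runs Prometheus through the Prometheus operator: export the Prometheus custom resources as JSON (for example with kubectl get prometheus -A -o json) and read spec.podMonitorSelector from each. The function name is hypothetical, and a VictoriaMetrics-based stack would need a different lookup.

```python
def pod_monitor_selectors(prometheus_list: dict) -> dict:
    """Map '<namespace>/<name>' of each Prometheus CR to its podMonitorSelector."""
    selectors = {}
    for item in prometheus_list.get("items", []):
        meta = item["metadata"]
        # podMonitorSelector is a standard label selector; its matchLabels
        # are the labels a PodMonitor must carry to be picked up.
        selectors[f'{meta["namespace"]}/{meta["name"]}'] = (
            item.get("spec", {}).get("podMonitorSelector")
        )
    return selectors


if __name__ == "__main__":
    # Hypothetical export; the label key and value differ per cluster.
    exported = {"items": [{
        "metadata": {"namespace": "monitoring", "name": "platform"},
        "spec": {"podMonitorSelector": {"matchLabels": {"prometheus": "platform"}}},
    }]}
    print(pod_monitor_selectors(exported))
```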
Then apply the PodMonitor with that label in its own metadata.labels, not in spec.selector (the two serve different purposes):
- metadata.labels: how Prometheus discovers the PodMonitor. Without the right label, the resource exists but is invisible to the scrape pipeline.
- spec.selector: matches every Envoy Gateway data-plane proxy pod in the cluster. To restrict to one Gateway, replace it with gateway.envoyproxy.io/owning-gateway-name: <gateway-name> and gateway.envoyproxy.io/owning-gateway-namespace: <gateway-namespace>.
- port: aigw-metrics: the named port on the ExtProc sidecar that serves /metrics on container port 1064.
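The three fields described above can be assembled as follows. This is a sketch, not the manifest shipped with the product: the resource name, the discovery label, the pod namespace list, and the two app.kubernetes.io labels used for the cluster-wide selector are assumptions to check against your own proxy pods. The output is JSON, which kubectl apply -f - also accepts.

```python
import json


def pod_monitor(name, namespace, discovery_labels, pod_namespaces, gateway=None):
    """Build a PodMonitor that scrapes the ExtProc sidecar's aigw-metrics port."""
    if gateway is None:
        # Assumed labels on Envoy Gateway proxy pods; verify on your cluster.
        match_labels = {
            "app.kubernetes.io/managed-by": "envoy-gateway",
            "app.kubernetes.io/component": "proxy",
        }
    else:
        # Restrict scraping to the proxies of one Gateway.
        gw_namespace, gw_name = gateway
        match_labels = {
            "gateway.envoyproxy.io/owning-gateway-name": gw_name,
            "gateway.envoyproxy.io/owning-gateway-namespace": gw_namespace,
        }
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Discovery labels: how Prometheus finds this PodMonitor.
            "labels": dict(discovery_labels),
        },
        "spec": {
            # Namespaces in which the proxy pods run (assumption).
            "namespaceSelector": {"matchNames": list(pod_namespaces)},
            "selector": {"matchLabels": match_labels},
            "podMetricsEndpoints": [{"port": "aigw-metrics", "path": "/metrics"}],
        },
    }


if __name__ == "__main__":
    manifest = pod_monitor(
        name="aigw-token-usage",                # hypothetical name
        namespace="maas-system",
        discovery_labels={"prometheus": "platform"},  # hypothetical label
        pod_namespaces=["envoy-gateway-system"],      # assumption
    )
    print(json.dumps(manifest, indent=2))
```

Passing gateway=("maas-system", "my-gateway") swaps the cluster-wide selector for the two owning-gateway labels, mirroring the restriction described in the spec.selector bullet.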
Confirm that Prometheus is actually scraping the target after the PodMonitor is applied. Port-forward Prometheus and check that a scrape pool exists for the PodMonitor and is healthy.
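The health check can be automated against the JSON returned by the Prometheus HTTP API endpoint /api/v1/targets. The sketch below assumes the operator's usual scrape pool naming, podMonitor/<namespace>/<name>/<index>; the function names and the sample response are hypothetical, and a VictoriaMetrics backend exposes targets differently.

```python
def scrape_pool_health(targets_response: dict, namespace: str, name: str) -> dict:
    """Map scrape URL to health for targets that belong to one PodMonitor."""
    prefix = f"podMonitor/{namespace}/{name}/"
    active = targets_response.get("data", {}).get("activeTargets", [])
    return {
        target["scrapeUrl"]: target["health"]
        for target in active
        if target.get("scrapePool", "").startswith(prefix)
    }


def is_healthy(health_by_url: dict) -> bool:
    """True only if at least one target exists and every target is 'up'."""
    return bool(health_by_url) and all(h == "up" for h in health_by_url.values())


if __name__ == "__main__":
    # Hypothetical response trimmed to the fields used here.
    response = {"status": "success", "data": {"activeTargets": [
        {"scrapePool": "podMonitor/maas-system/aigw-token-usage/0",
         "scrapeUrl": "http://10.0.0.5:1064/metrics", "health": "up"},
    ]}}
    health = scrape_pool_health(response, "maas-system", "aigw-token-usage")
    print(health, "healthy" if is_healthy(health) else "not healthy")
```

An empty mapping usually points back to the discovery label on the PodMonitor, while a "down" target points to the port or path.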
Build a unified usage dashboard
Create a MonitorDashboard to present token usage. Use variables for namespace, model, and department so consumers can filter, and rely on the Business View so each project sees only its own data. For the platform workflow, see monitoring dashboards.
The metric name in Prometheus follows OpenTelemetry's GenAI semantic conventions and is stored with dots, not underscores (gen_ai.client.token.usage_token_sum, and so on). Reference it through the {__name__="..."} selector form, which works for any UTF-8 metric name.
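As an illustration of the selector form, the sketch below builds one possible panel query: total tokens per group over a time window. The helper name, the default window, and the choice of increase over the _sum series are assumptions. Grouping labels are limited to plain identifiers such as department, because quoting rules for dotted label names are outside this sketch.

```python
import re

# Label names made only of letters, digits, and underscores need no quoting.
_PLAIN_LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def token_usage_query(metric_name: str, group_by: list[str], window: str = "1h") -> str:
    """Build: sum by (<labels>) (increase({__name__="<metric>"}[<window>]))."""
    for label in group_by:
        if not _PLAIN_LABEL.fullmatch(label):
            raise ValueError(f"unsupported label name in this sketch: {label!r}")
    # Escape backslashes and double quotes inside the quoted metric name.
    escaped = metric_name.replace("\\", "\\\\").replace('"', '\\"')
    selector = '{__name__="' + escaped + '"}'
    return f"sum by ({', '.join(group_by)}) (increase({selector}[{window}]))"


if __name__ == "__main__":
    print(token_usage_query("gen_ai.client.token.usage_token_sum", ["department"]))
```

The same string can be pasted into a dashboard panel, with the window replaced by the dashboard's own range variable.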
Confirm the metric is queryable by listing all gen_ai* series names on the Prometheus UI's Status → TSDB page, or with:
Chargeback with Cost Management
Token usage can be priced and attributed for chargeback by Alauda Cost Management, which is a separate product. It ingests a custom usage metric defined by a PromQL query and attributes cost by label, so the department-labelled token metric above becomes a per-department bill. For the cost-model configuration, see custom cost model.
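The arithmetic behind a per-department bill can be sketched as tokens divided by 1000, times a price per 1000 tokens. This only illustrates the idea; it is not how Alauda Cost Management computes cost, and the price and token totals are hypothetical.

```python
from decimal import Decimal


def department_bill(tokens_by_department: dict, price_per_1k_tokens: str) -> dict:
    """Price each department's token total, rounded to two decimal places."""
    rate = Decimal(price_per_1k_tokens)
    return {
        department: (Decimal(tokens) / 1000 * rate).quantize(Decimal("0.01"))
        for department, tokens in tokens_by_department.items()
    }


if __name__ == "__main__":
    # Hypothetical monthly totals and a hypothetical flat price.
    usage = {"finance": 125000, "research": 4000}
    for department, cost in department_bill(usage, "0.02").items():
        print(department, cost)
```

Decimal is used instead of float so that rounding to currency units stays exact.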
Verification
Send several requests with valid identity tokens that resolve to different departments, then confirm the metric carries the department label by port-forwarding the ExtProc sidecar's admin port on any proxy pod:
Expect at least one sample per department, for example:
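A scripted version of this check is sketched below: it sums the token-usage _sum samples per department from the admin endpoint's text output. The function name and the sample payload are hypothetical, and the parser handles only the simple cases needed here, not the full exposition format.

```python
import re
from collections import defaultdict

# Matches plain label pairs such as department="finance".
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def tokens_by_department(exposition_text: str, label: str = "department") -> dict:
    """Sum *usage_token_sum samples per value of the given label."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" not in line:
            continue
        head, _, rest = line.partition("{")
        labels_part, _, value_part = rest.rpartition("}")
        # The name sits before the brace, or quoted as the first item inside
        # the braces when it contains dots.
        name = head.strip() or labels_part.split(",", 1)[0].strip().strip('"')
        if not name.endswith("usage_token_sum"):
            continue
        labels = dict(_LABEL.findall(labels_part))
        if label in labels and value_part.split():
            totals[labels[label]] += float(value_part.split()[0])
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical payload with two departments.
    payload = (
        'gen_ai_client_token_usage_token_sum{department="finance"} 150\n'
        'gen_ai_client_token_usage_token_sum{department="research"} 7\n'
    )
    print(tokens_by_department(payload))
```

A department that sent requests but is missing from the result suggests that its identity header did not reach the gateway or was not mapped to the label.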
Open the dashboard and confirm token usage appears, filterable by namespace, model, and department.