I've been building an AI agent that routes requests across multiple LLM providers (OpenAI, Anthropic, and so on) based on the task. But pretty quickly, I hit a real problem: how do you charge for this fairly?
Flat subscriptions didn't make sense. Token costs vary by model, input vs output, and actual usage. A user generating a two-line summary isn't the same as someone churning out 3,000-word articles, yet flat pricing treats them the same.
I looked at a few options for usage-based billing. Stripe Billing has metered subscriptions, but you have to build your own token-tracking pipeline on top. Orb and Metronome are good, but they're separate vendors: you'd still need something to capture token data from your LLM calls and pipe it in. What I wanted was something at the gateway level, where the traffic already flows.
I ended up using Kong AI Gateway with Konnect Metering & Billing (built on OpenMeter). The gateway proxies every LLM request, so it already knows the token counts. The metering layer plugs directly into that. No separate vendor, no custom pipeline.
So instead of debating pricing models, I set up the billing layer: a working system where every API request flows through a gateway, gets tracked, and is priced based on real usage:
- Route requests through AI Gateway
- Tokens get metered per consumer
- Pricing gets applied
- Invoice generated
Here's the whole setup, step by step.
- Set up the gateway
- Step 1: Create a consumer
- Step 2: Configure the AI Proxy
- Step 3: Enable token metering
- Step 4: Create a feature
- Step 5: Create a plan with a rate card
- Step 6: Create a subscription
- Step 7: Validate the invoice
- Step 8: Connect Stripe
The Setup
The billing pipeline has three layers:
Kong AI Gateway proxies the LLM requests. It sits between the app and the provider, handles auth, and (this is the part that matters for billing) logs token statistics for every request.
Konnect Metering & Billing (built on OpenMeter) takes those token events and aggregates them per consumer, per billing cycle. It lets you define features, pricing models, and plans on top of the raw usage data.
Stripe collects payment. The metering layer generates invoices that sync to Stripe.
Let me walk through each piece.
Prerequisites
You can do this entirely through the UI or via CLI. I'll cover both as we go.
- A Kong Konnect account
- An OpenAI API key (or any LLM provider key of your choice)
For CLI, you'll also need decK (v1.43+) installed and a PAT from Kong Konnect.
Set Up the Gateway
Once you log in, click on API Gateway and create one.
I'm using Serverless here. You can choose Self-managed too. Enter the gateway name as ai-service and click Create and configure. Once that's done, click Add a service and route and fill in:
- Service Name: `ai-service`
- Service URL: `http://httpbin.konghq.com/anything`
- Route Name: `ai-chat`
- Route Path: `/chat`
CLI
If you prefer the command line, generate your PAT and run:
```shell
export KONNECT_TOKEN='your_konnect_pat'
curl -Ls https://get.konghq.com/quickstart | bash -s -- \
  -k $KONNECT_TOKEN --deck-output
```
This gives you a running Kong Gateway connected to Konnect. It'll output some environment variables; export them as instructed. You'll also need:
```shell
export DECK_OPENAI_API_KEY='your_openai_api_key'
```
Then set up the service and route:
```yaml
_format_version: "3.0"
services:
  - name: ai-service
    url: http://httpbin.konghq.com/anything
routes:
  - name: ai-chat
    paths:
      - "/chat"
    service:
      name: ai-service
```
Apply it with `deck gateway apply`. Now you have a route at `/chat` that we'll wire up to an LLM.
Step 1: Create a Consumer
You can't bill anyone if the gateway doesn't know who is making the request. Consumers are how Kong identifies API callers. Later, we'll map each consumer to a billing customer.
Add a consumer with a key-auth credential:
You can enter the Key value as `acme-secret-key`.
Now, you need to add the key-auth plugin to the service so the gateway actually requires authentication:
- Click on Plugins in the left sidebar
- Click on New Plugin
- Select Key Authentication from the plugin list
- Select Service as the scope or keep it as Global
- Click Save
CLI
```yaml
_format_version: "3.0"
consumers:
  - username: acme-corp
    keyauth_credentials:
      - key: acme-secret-key
```
Then enable the key-auth plugin on the service so the gateway actually requires authentication:
```yaml
_format_version: "3.0"
plugins:
  - name: key-auth
    service: ai-service
    config:
      key_names:
        - apikey
```
Apply both with `deck gateway apply`.
Now every request to `/chat` must include an `apikey` header. The gateway identifies the caller as `acme-corp`, and that identity flows through to metering. Without this step, usage events have no subject. They're anonymous, and you can't attribute them to anyone.
Step 2: Configure the AI Proxy
Next, wire the route to an actual LLM. The AI Proxy plugin accepts requests in OpenAI's chat format and forwards them to the configured provider.
- Navigate to Plugins
- Click on New Plugin
- Select AI Proxy from the plugin list
For the CLI, use the YAML below; in the UI, configure the plugin fields accordingly:
```yaml
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
        name: gpt-4o
      logging:
        log_payloads: true
        log_statistics: true
```
Two things to note here:
`log_statistics: true` is what makes billing possible. Without it, the gateway proxies requests but doesn't record token counts. When enabled, it captures prompt tokens, completion tokens, and total tokens on every response. This is the data that metering consumes downstream.
`log_payloads: true` logs the actual request/response content. This is optional and useful for debugging, but you'd probably turn it off in production for privacy reasons.
Apply with `deck gateway apply` and test:
```shell
curl -X POST "$KONNECT_PROXY_URL/chat" \
  -H "Content-Type: application/json" \
  -H "apikey: acme-secret-key" \
  --json '{
    "messages": [
      {"role": "system", "content": "You are a mathematician."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'
```
You should get a response from GPT-4o. The gateway handled auth, forwarded the request, and logged the token statistics.
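The gateway returns the provider's response in OpenAI's chat-completion format, so the same token counts the gateway logs are also visible client-side in the `usage` field. A quick sketch of reading them (the sample response dict below is illustrative, not a real capture):

```python
# Illustrative OpenAI-style chat completion response. The field names follow
# OpenAI's chat API ("usage" with prompt/completion/total tokens); the values
# are made up for demonstration.
sample_response = {
    "model": "gpt-4o",
    "choices": [{"message": {"role": "assistant", "content": "1+1 equals 2."}}],
    "usage": {"prompt_tokens": 27, "completion_tokens": 9, "total_tokens": 36},
}

def extract_usage(response: dict) -> dict:
    """Pull out the token counts that metering cares about."""
    usage = response.get("usage", {})
    return {
        "model": response.get("model"),
        "input_tokens": usage.get("prompt_tokens", 0),
        "output_tokens": usage.get("completion_tokens", 0),
    }

print(extract_usage(sample_response))
# {'model': 'gpt-4o', 'input_tokens': 27, 'output_tokens': 9}
```

With the gateway in place you don't need to collect these yourself for billing, but it's a useful cross-check that the numbers on the invoice match what the provider reported.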
If you want to proxy multiple providers (say, OpenAI and Anthropic with automatic failover), you'd use [ai-proxy-advanced](https://developer.konghq.com/plugins/ai-proxy-advanced/) instead with a load balancing config. I stuck with a single provider here to keep the billing walkthrough focused.
Step 3: Enable Token Metering
Now we connect the gateway's token logs to the metering system.
In Konnect, go to Metering & Billing in the sidebar. You'll see an AI Gateway Tokens section. Click Enable Related API Gateways, select your control plane (the quickstart one), and confirm.
This activates a built-in meter called `kong_konnect_llm_tokens`. It uses SUM aggregation on the token count, grouped by:

- `$.model`: which LLM handled the request
- `$.type`: whether the tokens are input (request) or output (response)
The grouping matters because LLM providers charge differently for input vs. output tokens. Output tokens are typically 3-5x more expensive: input can be processed in parallel across GPUs, while output generation is sequential, each token depending on all previous ones. If your metering doesn't split these, your pricing will be wrong.
At this point, every authenticated request through the AI Gateway generates a usage event that gets aggregated by the meter. But usage alone doesn't generate invoices. You need to define what's billable and how it's priced.
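Conceptually, the meter is just a SUM over token counts grouped by those two dimensions. A rough Python sketch of that aggregation, using a simplified event shape (not the actual Konnect event schema):

```python
from collections import defaultdict

# Simplified usage events. The real Konnect/OpenMeter event schema differs,
# but the aggregation idea is the same: SUM(tokens) GROUP BY subject, model, type.
events = [
    {"subject": "acme-corp", "model": "gpt-4o", "type": "request",  "tokens": 27},
    {"subject": "acme-corp", "model": "gpt-4o", "type": "response", "tokens": 9},
    {"subject": "acme-corp", "model": "gpt-4o", "type": "request",  "tokens": 31},
    {"subject": "acme-corp", "model": "gpt-4o", "type": "response", "tokens": 120},
]

def aggregate(events):
    """SUM token counts per (subject, model, type) group."""
    totals = defaultdict(int)
    for e in events:
        totals[(e["subject"], e["model"], e["type"])] += e["tokens"]
    return dict(totals)

print(aggregate(events))
# {('acme-corp', 'gpt-4o', 'request'): 58, ('acme-corp', 'gpt-4o', 'response'): 129}
```

Each group becomes a separately priceable slice of usage, which is exactly what the feature filters in the next step select on.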
Step 4: Create a Feature
A feature is the link between raw metered data and something that appears on an invoice. Without it, usage is tracked but never billed.
Go to Metering & Billing → Product Catalog → Features and create one:

- Name: `ai-token`
- Meter: AI Gateway Tokens
- Group by filters:
  - Provider = `openai`
  - Type = `request` (this tracks input tokens; you'd create a separate feature for output tokens if you want to price them differently)
The filters narrow the meter to a specific slice of usage. In a real setup, you'd likely create multiple features, one per model, one per token direction, to apply different rates. For this walkthrough, I'm keeping it to one feature to show the flow.
Step 5: Create a Plan with a Rate Card
Plans bundle features with pricing. Go to Product Catalog → Plans and create one:

- Name: `Starter`
- Billing cadence: 1 month

Add a rate card:

- Feature: `ai-token`
- Pricing model: Usage Based
- Price per unit: `1`
- Entitlement type: Boolean (grants access to the feature)
A note on what "price per unit" means here: 1 unit = 1 token, because the meter SUMs individual tokens. So entering 1 means $1.00 per token, which is way too expensive for real use. I'm using it here because the official tutorial does the same thing: a round number that makes invoice changes easy to spot during testing.
For production, you'd enter something like 0.0000025 for GPT-4o input tokens ($2.50 per 1M tokens) or 0.00001 for GPT-4o output tokens ($10.00 per 1M tokens). There's no "per 1,000" toggle in the UI. You do the math yourself and enter the per-token price as a decimal.
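The conversion is easy to script so you don't fat-finger a zero. A small sketch using GPT-4o's list prices ($2.50/1M input, $10.00/1M output at the time of writing):

```python
def per_token_rate(price_per_million: float) -> float:
    """Convert a $-per-1M-tokens price into the per-token decimal the UI expects."""
    return price_per_million / 1_000_000

gpt4o_input = per_token_rate(2.50)    # GPT-4o input: $2.50 per 1M tokens
gpt4o_output = per_token_rate(10.00)  # GPT-4o output: $10.00 per 1M tokens

print(gpt4o_input)   # 2.5e-06
print(gpt4o_output)  # 1e-05
```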
Publish the plan. It's now available for subscriptions.
Step 6: Create a Customer and Start a Subscription
This is where the consumer from Step 1 connects to the billing system.
Go to Metering & Billing → Billing → Customers and create one:

- Name: `Acme Corp`
- Include usage from: select the `acme-corp` consumer
This mapping is what ties gateway traffic to a billable entity. The consumer handles identity at the gateway level; the customer handles identity at the billing level. They're separate concepts joined here.
Now create a subscription:
- Go to the Acme Corp customer, then Subscriptions → Create a Subscription
- Plan: `Starter`
- Start the subscription
One important detail: metering only invoices events that occur after the subscription starts. If you sent test requests before creating the subscription, those tokens won't appear on any invoice. I spent some time confused by this before finding it in the docs.
Step 7: Validate the Invoice
Send a few requests through the gateway:
```shell
for i in {1..6}; do
  curl -s -X POST "$KONNECT_PROXY_URL/chat" \
    -H "Content-Type: application/json" \
    -H "apikey: acme-secret-key" \
    --json '{
      "messages": [
        {"role": "user", "content": "Explain what a Fourier transform does in two sentences."}
      ]
    }'
  echo ""
done
```
Wait a minute or two for the events to propagate, then go to Metering & Billing → Billing → Invoices. Click on Acme Corp, go to the Invoicing tab, and hit Preview Invoice.
You should see the ai-token feature listed with the aggregated token count and the calculated charge based on your rate card. That's the billing pipeline working end to end, from an API request to a line item on an invoice.
Connecting Stripe
Konnect syncs invoices to Stripe, which handles payment collection, receipts, and retry logic for failed payments. You connect your Stripe account in the Metering & Billing settings, and invoices flow through automatically at the end of each billing cycle.
The result for end users is a transparent invoice showing exactly what they consumed: token count, model, rate applied. Not a flat fee with no breakdown.
## Things I Ran Into
The consumer-customer mapping confused me at first. Kong Gateway has "consumers" (API identity). Metering & Billing has "customers" (billing identity). They're separate. You create both, then link them. If you skip the consumer or forget to link it, usage events come in but they're not attributed to anyone billable. Set this up before you start sending traffic.
Input vs. output pricing is a bigger deal than I expected. Output tokens from OpenAI's GPT-4o cost $10.00/1M vs. $2.50/1M for input. If you use a single flat rate for "tokens," you'll underprice output-heavy workloads significantly. Splitting features by token type (request vs. response) and pricing them separately is worth the extra configuration.
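To put numbers on that, here's a rough sketch comparing a naive flat rate (the input price applied to all tokens) against properly split rates for an output-heavy job. The token counts are made up for illustration:

```python
INPUT_RATE = 2.50 / 1_000_000    # $/token, GPT-4o input list price
OUTPUT_RATE = 10.00 / 1_000_000  # $/token, GPT-4o output list price

def split_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost with separate input/output rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Output-heavy job: short prompt, long generation (e.g. article writing).
input_tokens, output_tokens = 200, 3_000

true_cost = split_cost(input_tokens, output_tokens)
# Flat rate naively set to the input price for all tokens:
flat_cost = (input_tokens + output_tokens) * INPUT_RATE

print(f"true cost: ${true_cost:.4f}")  # true cost: $0.0305
print(f"flat cost: ${flat_cost:.4f}")  # flat cost: $0.0080
```

Here the flat rate recovers barely a quarter of the actual provider cost, which is exactly the kind of silent underpricing the split features prevent.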
The order of operations matters. Specifically: create the consumer and link it to a customer before you start sending traffic you care about billing for. Events that arrive before a subscription exists don't retroactively appear on invoices.
Where I'd Take This Next
This walkthrough uses a single provider and a single feature. A production setup would look more like:
- Multiple features: one per model per token direction (GPT-4o input, GPT-4o output, Claude input, Claude output)
- Tiered pricing: lower per-token rates at higher usage thresholds to incentivize growth
- Entitlements with metered limits: cap total tokens per month per plan tier, so you can offer Starter (500K tokens), Pro (5M tokens), Enterprise (unlimited)
- AI Proxy Advanced: route across multiple providers with load balancing (lowest-latency, round-robin, or cost-based routing)
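Tiered pricing, for instance, is just a piecewise calculation over the billing-cycle total. A sketch with made-up tier boundaries (not Konnect's actual tier configuration):

```python
# Hypothetical tiers: (upper bound in tokens, $ per token).
# The numbers are invented purely to illustrate the piecewise calculation.
TIERS = [
    (1_000_000, 3.00 / 1_000_000),    # first 1M tokens at $3/1M
    (10_000_000, 2.00 / 1_000_000),   # next 9M at $2/1M
    (float("inf"), 1.00 / 1_000_000), # everything beyond at $1/1M
]

def tiered_cost(tokens: int) -> float:
    """Charge each slice of usage at its tier's rate."""
    cost, used = 0.0, 0
    for upper, rate in TIERS:
        if tokens <= used:
            break
        in_tier = min(tokens, upper) - used
        cost += in_tier * rate
        used = upper
    return cost

print(round(tiered_cost(500_000), 2))     # 1.5  -- all in the first tier
print(round(tiered_cost(12_000_000), 2))  # 23.0 -- spans all three tiers
```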
The docs for all of these are at developer.konghq.com/metering-and-billing and developer.konghq.com/ai-gateway.
If you're building an AI agent and thinking about how to charge for it, I'd be curious to hear your approach. Per-token, credits, flat rate? What's working, what's not? Drop your thoughts in the comments.
Top comments (17)
The per-token billing approach is smart; most AI agent systems I have seen just track total API cost at the provider level, which makes it really hard to attribute spend to specific features or user actions. Breaking it down to the token level gives you the granularity to actually optimize.
One thing I have found useful in similar setups is adding a "token budget" per task type. Instead of just tracking what was spent, you set a ceiling before execution starts. If the agent is about to blow past the budget on a single task, it forces a checkpoint instead of running up the bill silently. Pairs well with the billing system you built here.
Yeah, totally agree, budgeting is the missing control loop.
Per-token billing (what I built here with Kong AI Gateway + Konnect Metering & Billing) gives you accurate attribution: who and what actually consumed tokens. But by itself, it's reactive.
A token budget adds a runtime guardrail. For agent flows, that means checking expected token usage before each step and stopping or degrading (smaller model, less context, fewer tool calls) instead of silently overspending.
In practice, you need both:
metering for visibility, budgets for control.
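For illustration, a minimal sketch of that guardrail; the ceiling and the per-step estimates are hypothetical:

```python
class TokenBudget:
    """Pre-execution token ceiling per task: a hypothetical guardrail sketch."""

    def __init__(self, ceiling: int):
        self.ceiling = ceiling
        self.spent = 0

    def check(self, estimated_tokens: int) -> bool:
        """True if the next step fits in the budget; False means checkpoint."""
        return self.spent + estimated_tokens <= self.ceiling

    def record(self, actual_tokens: int) -> None:
        """Record actual usage after the step completes."""
        self.spent += actual_tokens

budget = TokenBudget(ceiling=10_000)
assert budget.check(4_000)       # fits, proceed
budget.record(4_000)
assert budget.check(5_000)       # still fits
budget.record(5_000)
assert not budget.check(2_000)   # would blow past the ceiling: checkpoint instead
```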
"Hey, this is one of the cleanest and most practical token billing setups Iโve seen. Really well written!
I love that you went with Kong AI Gateway + Konnect Metering instead of building yet another custom pipeline. The fact that the gateway already knows the token counts and can meter them directly is such a smart move.
The part about splitting input vs output tokens (and why it matters for pricing) is gold; a lot of people miss that and end up undercharging on output-heavy usage.
Quick questions for you:
How's the added latency from the gateway in production? Noticeable or basically zero?
Would you recommend this stack for a smaller indie AI product, or is it more suitable once you have decent volume?
Thanks for the detailed walkthrough; saved it for future reference. Super helpful!
Thanks, really appreciate that.
On latency:
In practice, the gateway hop is usually small relative to model/provider latency, so it hasn't been the bottleneck in my experience. Kong's docs also call out that Gateway and AI Gateway are designed for minimal and predictable latency, but I'd still benchmark with your own setup (plugins, traffic, provider mix) since that's what really determines impact.
developer.konghq.com/ai-gateway/re...
For indie products:
Yeah, I think it can make sense earlier than most people expect, if you already know you need a gateway boundary, provider abstraction, per-consumer usage tracking, and usage-based billing.
AI Gateway gives you a consistent layer across providers, and Konnect Metering & Billing handles usage tracking, pricing models, subscriptions/invoicing, and limits on top.
dev.to/tejakummarikuntla/i-built-a...
If it's a very small app with a single provider and you just need basic cost visibility, this might be more than you need initially. But once you care about attribution, enforcing limits, or monetizing usage cleanly, doing it at the gateway layer is a lot simpler than pushing all of that logic into app code.
The decision to meter at the gateway level instead of the application layer is smart; I've seen teams build token tracking into their app code, and it becomes a maintenance nightmare when you add new models or providers. The gateway already sees everything, so why duplicate that logic? One challenge I've run into with per-token billing is that users often can't predict their costs because token counts are invisible to them. A "2,000 token request" means nothing to a non-technical user. Have you considered adding a cost-estimate preview before the request actually executes, or some kind of budget cap that blocks requests once a threshold is hit? That seems like the missing UX piece for making usage-based AI billing actually work for end users.
Totally agree on both points.
Gateway-level metering was mainly about avoiding duplication and keeping model/provider changes out of the app layer.
On the UX side, you're right: token counts aren't intuitive at all. Right now this setup solves accurate billing, but not predictable costs. Adding a cost-estimate preview and a budget cap is what would make it more solid.
Estimation is a bit tricky (especially output tokens), but even a rough preview would go a long way. Feels like thatโs the next layer needed to make this usable for non-technical users.
Love the article man. Thanks for posting it!
the token budget approach only works if the price you're budgeting against is accurate. most systems hardcode a rate at build time and never update it. vendors reprice quietly, caching discounts appear or disappear, and suddenly your budget math is off by 30% or more without any visible signal. the control loop needs live pricing inputs to stay meaningful.
Solid walkthrough. I've been running a similar setup but hit an interesting edge case: streaming responses. When you're using SSE for chat completions, token counts aren't always available until the stream ends. Had to implement a small buffer that waits for the final chunk before emitting the usage event to the gateway.
The input vs output pricing split is crucial. We started with a flat "token" rate and quickly realized we were losing money on long-form generation tasks. GPT-4o's 4x output premium adds up fast.
One question: how are you handling failed requests? If a request times out or hits a rate limit mid-stream, do you still bill for the partial tokens consumed? We ended up adding a "billable" flag that only gets set when the response completes successfully.
Hi Kai, we have `provider` (e.g., Anthropic), `model` (e.g., opus-4), `type` (e.g., output), and `status_code` dimensions on metered AI requests, so you can price differently for input and output tokens and filter out non-successful requests.

Really solid approach to per-token billing. The split between input and output token pricing is something a lot of teams overlook; they just track total cost per call and lose visibility into where the money actually goes.
One thing I've been thinking about with multi-provider agent setups: do you handle rate limiting or fallback routing at the gateway level too? Because if you're already tracking tokens per provider through Kong, it seems like a natural extension to add cost-aware routing, e.g., route lower-priority tasks to the cheaper model automatically based on the billing data you're already collecting.
The Konnect Metering + Stripe integration is clean. Way better than building a custom metering pipeline from scratch.
Hi, yes, Kong AI Gateway has both usage and cost rate limiters.
This is technically possible, but it should be an app decision, no? What counts as a low- or high-priority task is specific to what you're building.