DEV Community

Teja Kummarikuntla
💰 I Built a Token Billing System for My AI Agent - Here's How It Works

Gateway tracking to avoid custom pipelines

I've been building an AI agent that routes requests across multiple LLM providers (OpenAI, Anthropic, and others) based on the task. But pretty quickly, I hit a real problem: how do you charge for this fairly?

Flat subscriptions didn't make sense. Token costs vary by model, input vs output, and actual usage. A user generating a two-line summary isn't the same as someone churning out 3,000-word articles, yet flat pricing treats them the same.

I looked at a few options for usage-based billing. Stripe Billing has metered subscriptions, but you have to build your own token-tracking pipeline on top. Orb and Metronome are good, but they're separate vendors; you'd still need something to capture token data from your LLM calls and pipe it in. What I wanted was something at the gateway level, where the traffic already flows.

I ended up using Kong AI Gateway with Konnect Metering & Billing (built on OpenMeter). The gateway proxies every LLM request, so it already knows the token counts. The metering layer plugs directly into that. No separate vendor, no custom pipeline.

So instead of debating pricing models, I set up the billing layer: a working system where every API request flows through a gateway, gets tracked, and is priced based on real usage:

  1. 🚧 Route requests through AI Gateway
  2. 🪙 Tokens get metered per consumer
  3. 💵 Pricing gets applied
  4. 🧾 Invoice generated

Here's the whole setup, step by step.

The Setup

The billing pipeline has three layers:

Kong AI Gateway proxies the LLM requests. It sits between the app and the provider, handles auth, and, crucially for billing, logs token statistics for every request.

Konnect Metering & Billing (built on OpenMeter) takes those token events and aggregates them per consumer, per billing cycle. It lets you define features, pricing models, and plans on top of the raw usage data.

Stripe collects payment. The metering layer generates invoices that sync to Stripe.

Let me walk through each piece.

Prerequisites

You can do this entirely through the UI or via CLI. I'll cover both as we go.

  1. A Kong Konnect account
  2. An OpenAI API key (or any LLM provider key of your choice)

For CLI, you'll also need decK (v1.43+) installed and a PAT from Kong Konnect.

Set Up the Gateway

Once you log in, click on API Gateway and create one.

I'm using Serverless here. You can choose Self-managed too. Enter the gateway name as ai-service and click Create and configure. Once that's done, click Add a service and route and fill in:

  • Service Name: ai-service
  • Service URL: http://httpbin.konghq.com/anything
  • Route Name: ai-chat
  • Route Path: /chat

CLI

If you prefer the command line, generate your PAT and run:

```shell
export KONNECT_TOKEN='your_konnect_pat'
curl -Ls https://get.konghq.com/quickstart | bash -s -- \
  -k $KONNECT_TOKEN --deck-output
```

This gives you a running Kong Gateway connected to Konnect. It'll output some environment variables; export them as instructed. You'll also need:

```shell
export DECK_OPENAI_API_KEY='your_openai_api_key'
```

Then set up the service and route:

```yaml
_format_version: "3.0"
services:
  - name: ai-service
    url: http://httpbin.konghq.com/anything
routes:
  - name: ai-chat
    paths:
      - "/chat"
    service:
      name: ai-service
```

Apply it with `deck gateway apply`. Now you have a route at `/chat` that we'll wire up to an LLM.

Step 1: Create a Consumer

You can't bill anyone if the gateway doesn't know who is making the request. Consumers are how Kong identifies API callers. Later, we'll map each consumer to a billing customer.

Add a consumer named acme-corp with a key-auth credential, and enter the Key value as acme-secret-key.

Now, you need to add the key-auth plugin to the service so the gateway actually requires authentication:

  1. Click on Plugins in the left sidebar
  2. Click on New Plugin
  3. Select Key Authentication from the plugin list
  4. Select Service as the scope or keep it as Global
  5. Click Save

CLI

```yaml
_format_version: "3.0"
consumers:
  - username: acme-corp
    keyauth_credentials:
      - key: acme-secret-key
```

Then enable the key-auth plugin on the service so the gateway actually requires authentication:

```yaml
_format_version: "3.0"
plugins:
  - name: key-auth
    service: ai-service
    config:
      key_names:
        - apikey
```

Apply both with `deck gateway apply`.

Now every request to /chat must include an apikey header. The gateway identifies the caller as acme-corp, and that identity flows through to metering. Without this step, usage events have no subject. They're anonymous, and you can't attribute them to anyone.

Step 2: Configure the AI Proxy

Next, wire the route to an actual LLM. The AI Proxy plugin accepts requests in OpenAI's chat format and forwards them to the configured provider.

  1. Navigate to Plugins
  2. Click on New Plugin
  3. Select AI Proxy from the plugin list

For the CLI, use the YAML below; if you're configuring via the UI, set the plugin fields to match:

```yaml
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
        name: gpt-4o
      logging:
        log_payloads: true
        log_statistics: true
```

Two things to note here:

`log_statistics: true` is what makes billing possible. Without it, the gateway proxies requests but doesn't record token counts. When enabled, it captures prompt tokens, completion tokens, and total tokens on every response. This is the data that metering consumes downstream.

`log_payloads: true` logs the actual request/response content. This is optional and useful for debugging, but you'd probably turn it off in production for privacy reasons.

Apply with `deck gateway apply` and test:

```shell
curl -X POST "$KONNECT_PROXY_URL/chat" \
  -H "Content-Type: application/json" \
  -H "apikey: acme-secret-key" \
  --json '{
    "messages": [
      {"role": "system", "content": "You are a mathematician."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'
```

You should get a response from GPT-4o. The gateway handled auth, forwarded the request, and logged the token statistics.

If you want to proxy multiple providers (say, OpenAI and Anthropic with automatic failover), you'd use [ai-proxy-advanced](https://developer.konghq.com/plugins/ai-proxy-advanced/) instead with a load balancing config. I stuck with a single provider here to keep the billing walkthrough focused.
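
If you do go that route, a load-balanced config might look roughly like this. Treat it as a sketch rather than a verified config: `DECK_ANTHROPIC_API_KEY` is a placeholder env var, the Anthropic model name is just an example, and exact field names and placement vary by plugin version, so check the ai-proxy-advanced docs before using it:

```yaml
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: round-robin   # or a latency-based algorithm for failover setups
      targets:
        - route_type: llm/v1/chat
          weight: 50
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          model:
            provider: openai
            name: gpt-4o
        - route_type: llm/v1/chat
          weight: 50
          auth:
            header_name: x-api-key
            header_value: ${{ env "DECK_ANTHROPIC_API_KEY" }}   # placeholder
          model:
            provider: anthropic
            name: claude-3-5-sonnet-20240620   # example model name
```

Per-target logging and provider-specific options are omitted here for brevity.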

Step 3: Enable Token Metering

Now we connect the gateway's token logs to the metering system.

In Konnect, go to Metering & Billing in the sidebar. You'll see an AI Gateway Tokens section. Click Enable Related API Gateways, select your control plane (the quickstart one), and confirm.

This activates a built-in meter called kong_konnect_llm_tokens. It uses SUM aggregation on the token count, grouped by:

  • `$.model`: which LLM handled the request
  • `$.type`: whether the tokens are input (request) or output (response)

The grouping matters because LLM providers charge differently for input vs. output tokens. Output tokens are typically 3-5x more expensive: input can be processed in parallel across GPUs, while output generation is sequential, since each token depends on all previous tokens. If your metering doesn't split these, your pricing will be wrong.
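
To make that concrete, here's a quick comparison using GPT-4o's list prices ($2.50 per 1M input tokens, $10.00 per 1M output tokens) against a single blended rate; the blended rate is made up for illustration:

```python
# Per-token rates derived from per-1M list prices.
INPUT_RATE = 2.50 / 1_000_000    # $ per input token
OUTPUT_RATE = 10.00 / 1_000_000  # $ per output token
FLAT_RATE = 4.00 / 1_000_000     # naive single "per token" rate (made up)

def split_cost(input_tokens: int, output_tokens: int) -> float:
    """Price input and output tokens at their own rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

def flat_cost(input_tokens: int, output_tokens: int) -> float:
    """Price every token at one blended rate."""
    return (input_tokens + output_tokens) * FLAT_RATE

# Output-heavy workload: short prompt in, a long article out.
print(f"split: ${split_cost(200, 4_000):.4f}")  # split: $0.0405
print(f"flat:  ${flat_cost(200, 4_000):.4f}")   # flat:  $0.0168
```

With the flat rate, this output-heavy request is billed at well under half its true cost.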

At this point, every authenticated request through the AI Gateway generates a usage event that gets aggregated by the meter. But usage alone doesn't generate invoices. You need to define what's billable and how it's priced.
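
Under the hood, the meter is just a SUM over usage events, grouped by those two dimensions. A toy simulation of the idea (not OpenMeter's actual implementation, and the event shape here is invented):

```python
from collections import defaultdict

# Illustrative usage events, one per gateway response (field names invented).
events = [
    {"model": "gpt-4o", "type": "request",  "tokens": 25},
    {"model": "gpt-4o", "type": "response", "tokens": 180},
    {"model": "gpt-4o", "type": "request",  "tokens": 30},
    {"model": "gpt-4o", "type": "response", "tokens": 210},
]

def aggregate(events):
    """SUM token counts grouped by (model, type), like the built-in meter."""
    totals = defaultdict(int)
    for e in events:
        totals[(e["model"], e["type"])] += e["tokens"]
    return dict(totals)

print(aggregate(events))
# {('gpt-4o', 'request'): 55, ('gpt-4o', 'response'): 390}
```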

Step 4: Create a Feature

A feature is the link between raw metered data and something that appears on an invoice. Without it, usage is tracked but never billed.

Go to Metering & Billing → Product Catalog → Features and create one:

  • Name: ai-token
  • Meter: AI Gateway Tokens
  • Group by filters:
    • Provider = openai
    • Type = request (this tracks input tokens; you'd create a separate feature for output tokens if you want to price them differently)

The filters narrow the meter to a specific slice of usage. In a real setup, you'd likely create multiple features, one per model, one per token direction, to apply different rates. For this walkthrough, I'm keeping it to one feature to show the flow.

Step 5: Create a Plan with a Rate Card

Plans bundle features with pricing. Go to Product Catalog → Plans and create one:

  • Name: Starter
  • Billing cadence: 1 month

Add a rate card:

  • Feature: ai-token
  • Pricing model: Usage Based
  • Price per unit: 1
  • Entitlement type: Boolean (grants access to the feature)

A note on what "price per unit" means here: 1 unit = 1 token, because the meter SUMs individual tokens. So entering 1 means $1.00 per token, which is way too expensive for real use. I'm using it here because the official tutorial does the same thing: a round number that makes invoice changes easy to spot during testing.

For production, you'd enter something like 0.0000025 for GPT-4o input tokens ($2.50 per 1M tokens) or 0.00001 for GPT-4o output tokens ($10.00 per 1M tokens). There's no "per 1,000" toggle in the UI. You do the math yourself and enter the per-token price as a decimal.
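
The conversion is trivial, but it's easy to be off by a zero, so it's worth scripting. A small helper of my own (not part of any Kong tooling):

```python
def per_token_rate(price_per_million: float) -> str:
    """Convert a $-per-1M-tokens list price to the per-token decimal the UI expects."""
    return f"{price_per_million / 1_000_000:.8f}".rstrip("0")

print(per_token_rate(2.50))   # 0.0000025  (GPT-4o input)
print(per_token_rate(10.00))  # 0.00001    (GPT-4o output)
```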

Publish the plan. It's now available for subscriptions.

Step 6: Create a Customer and Start a Subscription

This is where the consumer from Step 1 connects to the billing system.

Go to Metering & Billing → Billing → Customers and create one:

  • Name: Acme Corp
  • Include usage from: select the acme-corp consumer

This mapping is what ties gateway traffic to a billable entity. The consumer handles identity at the gateway level; the customer handles identity at the billing level. They're separate concepts joined here.

Now create a subscription:

  • Go to the Acme Corp customer, then Subscriptions → Create a Subscription
  • Plan: Starter
  • Start the subscription

One important detail: metering only invoices events that occur after the subscription starts. If you sent test requests before creating the subscription, those tokens won't appear on any invoice. I spent some time confused by this before finding it in the docs.

Step 7: Validate the Invoice

Send a few requests through the gateway:

```shell
for i in {1..6}; do
  curl -s -X POST "$KONNECT_PROXY_URL/chat" \
    -H "Content-Type: application/json" \
    -H "apikey: acme-secret-key" \
    --json '{
      "messages": [
        {"role": "user", "content": "Explain what a Fourier transform does in two sentences."}
      ]
    }'
  echo ""
done
```

Wait a minute or two for the events to propagate, then go to Metering & Billing → Billing → Invoices. Click on Acme Corp, go to the Invoicing tab, and hit Preview Invoice.

You should see the ai-token feature listed with the aggregated token count and the calculated charge based on your rate card. That's the billing pipeline working end to end, from an API request to a line item on an invoice.

Connecting Stripe

Konnect syncs invoices to Stripe, which handles payment collection, receipts, and retry logic for failed payments. You connect your Stripe account in the Metering & Billing settings, and invoices flow through automatically at the end of each billing cycle.

The result for end users is a transparent invoice showing exactly what they consumed: token count, model, rate applied. Not a flat fee with no breakdown.

Things I Ran Into

The consumer-customer mapping confused me at first. Kong Gateway has "consumers" (API identity). Metering & Billing has "customers" (billing identity). They're separate. You create both, then link them. If you skip the consumer or forget to link it, usage events come in but they're not attributed to anyone billable. Set this up before you start sending traffic.

Input vs. output pricing is a bigger deal than I expected. Output tokens from OpenAI's GPT-4o cost $10.00/1M vs. $2.50/1M for input. If you use a single flat rate for "tokens," you'll underprice output-heavy workloads significantly. Splitting features by token type (request vs. response) and pricing them separately is worth the extra configuration.

The order of operations matters. Specifically: create the consumer and link it to a customer before you start sending traffic you care about billing for. Events that arrive before a subscription exists don't retroactively appear on invoices.

Where I'd Take This Next

This walkthrough uses a single provider and a single feature. A production setup would look more like:

  • Multiple features: one per model per token direction (GPT-4o input, GPT-4o output, Claude input, Claude output)
  • Tiered pricing: lower per-token rates at higher usage thresholds to incentivize growth
  • Entitlements with metered limits: cap total tokens per month per plan tier, so you can offer Starter (500K tokens), Pro (5M tokens), Enterprise (unlimited)
  • AI Proxy Advanced: route across multiple providers with load balancing (lowest-latency, round-robin, or cost-based routing)
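
To make the tiered-pricing idea concrete, here's a sketch of how graduated per-token tiers would compute a monthly charge. The thresholds and rates are invented for illustration; in Konnect this logic would live in the plan's rate card, not in your code:

```python
# Graduated tiers: (upper bound in total tokens, $ per token). None = no cap.
# Thresholds and rates are made up for illustration.
TIERS = [
    (1_000_000, 5e-6),    # first 1M tokens at $5.00 per 1M
    (10_000_000, 4e-6),   # next 9M tokens at $4.00 per 1M
    (None, 3e-6),         # everything past 10M at $3.00 per 1M
]

def tiered_cost(tokens: int) -> float:
    """Price usage graduated through the tiers, like tax brackets."""
    cost, used = 0.0, 0
    for bound, rate in TIERS:
        span = (tokens if bound is None else min(tokens, bound)) - used
        if span <= 0:
            break
        cost += span * rate
        used += span
    return cost

print(f"${tiered_cost(12_000_000):,.2f}")  # $47.00 for a 12M-token month
```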

The docs for all of these are at developer.konghq.com/metering-and-billing and developer.konghq.com/ai-gateway.

If you're building an AI agent and thinking about how to charge for it, I'd be curious to hear your approach. Per-token, credits, flat rate? What's working, what's not? Drop your thoughts in the comments.

Top comments (17)

Nova Elvaris

The per-token billing approach is smart; most AI agent systems I have seen just track total API cost at the provider level, which makes it really hard to attribute spend to specific features or user actions. Breaking it down to the token level gives you the granularity to actually optimize.

One thing I have found useful in similar setups is adding a "token budget" per task type. Instead of just tracking what was spent, you set a ceiling before execution starts. If the agent is about to blow past the budget on a single task, it forces a checkpoint instead of running up the bill silently. Pairs well with the billing system you built here.

Teja Kummarikuntla

Yeah, totally agree, budgeting is the missing control loop.

Per-token billing (what I built here with Kong AI Gateway + Konnect Metering & Billing) gives you accurate attribution: who/what actually consumed tokens. But by itself, it's reactive.

A token budget adds a runtime guardrail. For agent flows, that means checking expected token usage before each step and stopping or degrading (smaller model, less context, fewer tool calls) instead of silently overspending.

In practice, you need both:
metering for visibility, budgets for control.

vuleolabs

"Hey, this is one of the cleanest and most practical token billing setups Iโ€™ve seen. Really well written!
I love that you went with Kong AI Gateway + Konnect Metering instead of building yet another custom pipeline. The fact that the gateway already knows the token counts and can meter them directly is such a smart move.
The part about splitting input vs output tokens (and why it matters for pricing) is gold โ€” a lot of people miss that and end up undercharging on output-heavy usage.
Quick questions for you:

Howโ€™s the added latency from the gateway in production? Noticeable or basically zero?
Would you recommend this stack for a smaller indie AI product, or is it more suitable once you have decent volume?

Thanks for the detailed walkthrough โ€” saved it for future reference. Super helpful!"

Teja Kummarikuntla

Thanks, really appreciate that.

On latency:

In practice, the gateway hop is usually small relative to model/provider latency, so it hasn't been the bottleneck in my experience. Kong's docs also call out that Gateway and AI Gateway are designed for minimal and predictable latency, but I'd still benchmark with your own setup (plugins, traffic, provider mix) since that's what really determines impact.

developer.konghq.com/ai-gateway/re...

For indie products:

Yeah, I think it can make sense earlier than most people expect, if you already know you need a gateway boundary, provider abstraction, per-consumer usage tracking, and usage-based billing.

AI Gateway gives you a consistent layer across providers, and Konnect Metering & Billing handles usage tracking, pricing models, subscriptions/invoicing, and limits on top.

dev.to/tejakummarikuntla/i-built-a...

If it's a very small app with a single provider and you just need basic cost visibility, this might be more than you need initially. But once you care about attribution, enforcing limits, or monetizing usage cleanly, doing it at the gateway layer is a lot simpler than pushing all of that logic into app code.

Sumsuzzaman Chowdhury

โค๏ธ

Teja Kummarikuntla

โค๏ธ๐Ÿš€

Nova Elvaris

The decision to meter at the gateway level instead of the application layer is smart; I've seen teams build token tracking into their app code and it becomes a maintenance nightmare when you add new models or providers. The gateway already sees everything, so why duplicate that logic? One challenge I've run into with per-token billing is that users often can't predict their costs because token counts are invisible to them. A "2,000 token request" means nothing to a non-technical user. Have you considered adding a cost-estimate preview before the request actually executes, or some kind of budget cap that blocks requests once a threshold is hit? That seems like the missing UX piece for making usage-based AI billing actually work for end users.

Teja Kummarikuntla

Totally agree on both points.

Gateway-level metering was mainly about avoiding duplication and keeping model/provider changes out of the app layer.

On the UX side - you're right, token counts aren't intuitive at all. Right now this setup solves accurate billing, but not predictable costs. Adding:

  • cost previews
  • usage alerts
  • hard budget caps

would make it much more solid.

Estimation is a bit tricky (especially output tokens), but even a rough preview would go a long way. Feels like that's the next layer needed to make this usable for non-technical users.

Bryan Rhee

Love the article man. Thanks for posting it!

Teja Kummarikuntla

🚀

Steriani Karamanlis

the token budget approach only works if the price you're budgeting against is accurate. most systems hardcode a rate at build time and never update it. vendors reprice quietly, caching discounts appear or disappear, and suddenly your budget math is off by 30% or more without any visible signal. the control loop needs live pricing inputs to stay meaningful.

Kai Alder

Solid walkthrough. I've been running a similar setup but hit an interesting edge case: streaming responses. When you're using SSE for chat completions, token counts aren't always available until the stream ends. Had to implement a small buffer that waits for the final chunk before emitting the usage event to the gateway.

The input vs output pricing split is crucial. We started with a flat "token" rate and quickly realized we were losing money on long-form generation tasks. GPT-4o's 4x output premium adds up fast.

One question: how are you handling failed requests? If a request times out or hits a rate limit mid-stream, do you still bill for the partial tokens consumed? We ended up adding a "billable" flag that only gets set when the response completes successfully.

Peter Marton

Hi Kai, we have provider (e.g., Anthropic), model (e.g., opus-4), type (e.g. output), and status_code dimensions on metered AI requests, so you can price differently for input and output tokens and filter out non-successful requests.

Apex Stack

Really solid approach to per-token billing. The split between input and output token pricing is something a lot of teams overlook; they just track total cost per call and lose visibility into where the money actually goes.

One thing I've been thinking about with multi-provider agent setups: do you handle rate limiting or fallback routing at the gateway level too? Because if you're already tracking tokens per provider through Kong, it seems like a natural extension to add cost-aware routing, e.g., route lower-priority tasks to the cheaper model automatically based on the billing data you're already collecting.

The Konnect Metering + Stripe integration is clean. Way better than building a custom metering pipeline from scratch.

Peter Marton

Hi, yes, Kong AI Gateway has both usage and cost rate limiters.

route lower-priority tasks to the cheaper model automatically based on the billing data you're already collecting.

This is technically possible, but it should be an app decision, no? What counts as a low- or high-priority task is specific to what you're building.
