How to Bill for LLM Inference: The Complete Guide

The hard part of charging for AI isn’t the model call; it’s everything downstream. A request can burn 200 tokens or 200,000. Each provider reports usage with a different schema. Cached tokens are cheaper than fresh ones. And beneath the price you quote a customer sits the cost you pay a model provider, which changes whenever that provider reprices. Get the gap between those two numbers wrong and you either bleed money on power users or scare off everyone else. This guide compares the two realistic paths (build your own metering stack, or run it on Macropay) and then shows how to set margin, included credits, and overage so your AI product actually makes money.

Macropay is a Merchant of Record. When a customer pays for your AI product, we are the legal seller, so we calculate and remit sales tax and VAT worldwide, keep you off the hook for that liability, and stay PCI DSS Level 1 compliant. You ship the product; we handle the money, the tax, and the compliance.

What “billing for inference” actually requires

Before picking an approach, it helps to see the full surface area. Charging for LLM usage means owning all of this:

Layer	Job
Capture	Record tokens, model, and cost on every request
Normalize	Reconcile `prompt_tokens` / `input_tokens` / `promptTokenCount` across providers
Aggregate	Roll raw events up to per-customer, per-period quantities
Price	Apply markup, tiers, included credits, and overage
Collect	Invoice, charge the card, retry failures, handle dunning
Reconcile	Track your cost separately from their price to know margin
Comply	Remit tax/VAT, absorb PCI scope, manage disputes

Whoever owns that last row is the merchant of record. With a DIY stack, that’s you. With Macropay, that’s us.

Approach 1 — Build it yourself

The do-it-yourself route is a real option, and for some teams the right one. Here’s the shape of what you’re signing up for.

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Your App    │───>│  LLM API     │───>│  Response    │
│  (prompt)    │    │  (OpenAI,    │    │  (tokens,    │
│              │    │   Anthropic) │    │   usage)     │
└──────────────┘    └──────────────┘    └──────┬───────┘
                                               │
                                    ┌──────────▼───────────┐
                                    │  Usage Database      │
                                    │  (customer_id,       │
                                    │   tokens, model,     │
                                    │   cost, timestamp)   │
                                    └──────────┬───────────┘
                                               │
                                    ┌──────────▼───────────┐
                                    │  Billing Engine      │
                                    │  (aggregate, price,  │
                                    │   invoice, collect)  │
                                    └──────────────────────┘

To stand this up you’ll write and then maintain:

A usage table keyed by customer, with tokens, model, and cost on every event
An aggregation layer that computes usage per customer per billing period
A pricing engine that applies your markup and produces a charge
A payment integration to actually collect — plus retries and dunning
A customer-facing view of usage and estimated charges
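A minimal in-memory sketch of the first two items, the usage table and the per-customer, per-period aggregation. The `UsageEvent` fields and the UTC calendar-month billing period are assumptions for illustration; a real stack would keep these rows in a database and aggregate them with a query.

```typescript
// One row per model call (the "usage table"). Field names are hypothetical.
interface UsageEvent {
  customerId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costCents: number; // what you paid the provider for this call
  timestamp: Date;
}

interface PeriodUsage {
  customerId: string;
  period: string; // e.g. "2025-01"
  totalTokens: number;
  costCents: number;
  requests: number;
}

// Assumed billing period: the UTC calendar month.
function periodKey(d: Date): string {
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}-${month}`;
}

// Roll raw events up to one row per (customer, period).
function aggregateUsage(events: UsageEvent[]): PeriodUsage[] {
  const rows = new Map<string, PeriodUsage>();
  for (const e of events) {
    const period = periodKey(e.timestamp);
    const key = `${e.customerId}|${period}`;
    let row = rows.get(key);
    if (!row) {
      row = { customerId: e.customerId, period, totalTokens: 0, costCents: 0, requests: 0 };
      rows.set(key, row);
    }
    row.totalTokens += e.inputTokens + e.outputTokens;
    row.costCents += e.costCents;
    row.requests += 1;
  }
  return Array.from(rows.values());
}
```

The aggregate is what the pricing engine consumes; the raw rows are what you keep for disputes.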

Budget roughly 4–8 weeks to a first version, then ongoing work every time a provider changes pricing or response shape.

Three traps teams hit

Provider schemas don't agree

OpenAI’s Chat Completions API returns usage.prompt_tokens and usage.completion_tokens. Anthropic returns usage.input_tokens and usage.output_tokens. Google’s Gemini API returns usageMetadata.promptTokenCount and usageMetadata.candidatesTokenCount. Support more than one provider and you’ve built a normalization layer whether you wanted to or not.
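A minimal sketch of that normalization layer, mapping the three shapes into one record. The `provider` tag is something your own code would attach (it is not part of any provider’s response), and the sketch deliberately ignores cached-token fields, which differ too: OpenAI nests them under `prompt_tokens_details.cached_tokens`, while Anthropic reports `cache_read_input_tokens` and `cache_creation_input_tokens` separately.

```typescript
// Provider-neutral usage record.
interface NormalizedUsage {
  inputTokens: number;
  outputTokens: number;
}

// Only the fields this sketch reads from each provider's usage object.
interface OpenAIChatUsage { prompt_tokens: number; completion_tokens: number }
interface AnthropicUsage { input_tokens: number; output_tokens: number }
interface GeminiUsageMetadata { promptTokenCount: number; candidatesTokenCount?: number }

// The `provider` tag is added by your own code, not returned by the APIs.
type ProviderUsage =
  | { provider: "openai"; usage: OpenAIChatUsage }
  | { provider: "anthropic"; usage: AnthropicUsage }
  | { provider: "google"; usageMetadata: GeminiUsageMetadata };

function normalizeUsage(r: ProviderUsage): NormalizedUsage {
  switch (r.provider) {
    case "openai":
      return { inputTokens: r.usage.prompt_tokens, outputTokens: r.usage.completion_tokens };
    case "anthropic":
      return { inputTokens: r.usage.input_tokens, outputTokens: r.usage.output_tokens };
    case "google":
      return {
        inputTokens: r.usageMetadata.promptTokenCount,
        // Treated as 0 if the field is absent.
        outputTokens: r.usageMetadata.candidatesTokenCount ?? 0,
      };
  }
}
```

The discriminated union makes adding a fourth provider a compile-time checklist: TypeScript flags the switch until the new case is handled.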

Disputes demand an audit trail

When a customer challenges a charge, “trust me, you used 4.2M tokens” doesn’t hold up. You need per-request logs you can point to — not just monthly aggregates — which means storing and querying detail at scale.
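A sketch of the dispute-side query: given per-request records, return the rows behind a customer’s total for a period, oldest first, plus their sum so it can be checked against the invoiced figure. `RequestRecord` and `explainCharge` are hypothetical names, and the half-open `[start, end)` window is an assumed convention that keeps a request on a period boundary from being counted twice.

```typescript
// Hypothetical per-request record kept for audit purposes.
interface RequestRecord {
  requestId: string; // ID you can quote back to the customer
  customerId: string;
  timestamp: Date;
  totalTokens: number;
}

// Rows behind a customer's usage in [start, end) (assumed convention),
// oldest first, plus their sum for checking against the invoice.
function explainCharge(
  records: RequestRecord[],
  customerId: string,
  start: Date,
  end: Date,
): { rows: RequestRecord[]; totalTokens: number } {
  const from = start.getTime();
  const to = end.getTime();
  const rows = records
    .filter((r) => {
      const t = r.timestamp.getTime();
      return r.customerId === customerId && t >= from && t < to;
    })
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
  const totalTokens = rows.reduce((sum, r) => sum + r.totalTokens, 0);
  return { rows, totalTokens };
}
```

If `totalTokens` does not match the billed quantity, the discrepancy is in your aggregation, which is exactly what you want to find before the customer does.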

Cost and price are two different ledgers

What you paid the model provider and what you charge the customer are separate numbers. Margin lives in the gap, and you can’t see it without tracking both — a second accounting layer most teams discover late.
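Once both ledgers exist, keeping them apart and joining them only for reporting is straightforward. The sketch below uses hypothetical `CostEntry` and `PriceEntry` shapes in cents, totals each side per customer, and derives margin. Gross margin is expressed as a share of price and is `null` when nothing was billed, since a percentage of zero revenue is undefined.

```typescript
// Hypothetical ledger shapes, amounts in cents.
interface CostEntry { customerId: string; costCents: number }   // paid to the provider
interface PriceEntry { customerId: string; priceCents: number } // billed to the customer

interface CustomerMargin {
  customerId: string;
  costCents: number;
  priceCents: number;
  marginCents: number;
  marginPct: number | null; // share of price; null when nothing was billed
}

function marginByCustomer(costs: CostEntry[], prices: PriceEntry[]): CustomerMargin[] {
  const totals = new Map<string, { cost: number; price: number }>();
  const totalsFor = (id: string) => {
    let t = totals.get(id);
    if (!t) {
      t = { cost: 0, price: 0 };
      totals.set(id, t);
    }
    return t;
  };
  for (const c of costs) totalsFor(c.customerId).cost += c.costCents;
  for (const p of prices) totalsFor(p.customerId).price += p.priceCents;

  return Array.from(totals, ([customerId, t]) => ({
    customerId,
    costCents: t.cost,
    priceCents: t.price,
    marginCents: t.price - t.cost,
    marginPct: t.price > 0 ? ((t.price - t.cost) / t.price) * 100 : null,
  }));
}
```

A customer who appears only in the cost ledger shows up with a negative margin instead of silently disappearing, which is usually the case worth catching.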

Approach 2 — Run it on Macropay

Macropay is usage-based billing infrastructure built for AI products. You instrument your LLM calls with the @macropayments/ingestion SDK; we handle metering, aggregation, invoicing, collection, tax, and the customer portal.

Step 1 — Install the SDK

TypeScript (npm):

npm install @macropayments/ingestion ai @ai-sdk/openai

TypeScript (pnpm):

pnpm add @macropayments/ingestion ai @ai-sdk/openai

Python:

pip install macropay pydantic-ai

Step 2 — Wrap your model with the LLM ingestion strategy

The strategy intercepts each call and automatically records the token counts, the model, and your cost, attributed to the customer who made the request.

TypeScript (Vercel AI SDK)

import { Ingestion } from "@macropayments/ingestion";
import { LLMStrategy } from "@macropayments/ingestion/strategies/LLM";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// One ingestion pipeline, reused across requests
const search = Ingestion({
  accessToken: process.env.MACROPAY_ACCESS_TOKEN,
})
  .strategy(new LLMStrategy(openai("gpt-4o")))
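  .cost((ctx) => ({
    // Your upstream cost in cents, at GPT-4o list prices:
    // $2.50 / 1M input tokens and $10.00 / 1M output tokens
    amount:
      ctx.metadata.inputTokens * 0.00025 +
      ctx.metadata.outputTokens * 0.001,
    currency: "usd",
  }))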
  .ingest("research-query");

export async function POST(req: Request) {
  const { question, customerId } = await req.json();

  // A wrapped model that meters every call to this customer
  const model = search.client({ customerId });

  const { text } = await generateText({
    model,
    system: "You answer questions with cited sources.",
    prompt: question,
  });

  return Response.json({ text });
}

Python (PydanticAI)

import os
from macropay.ingestion import Ingestion
from macropay.ingestion.strategies import PydanticAIStrategy
from pydantic_ai import Agent

ingestion = Ingestion(os.getenv("MACROPAY_ACCESS_TOKEN"))
strategy = ingestion.strategy(PydanticAIStrategy, "research-query")

agent = Agent("openai:gpt-4.1-nano")

result = agent.run_sync("Summarize the latest filing for ACME Corp...")
strategy.ingest("customer_123", result)

Every call now lands in Macropay with:

Input, output, and cached token counts
Model name and provider
Your cost for the request
The customer it belongs to

Prefer not to add the SDK? Point your OpenAI client at the Macropay AI proxy, an OpenAI-compatible endpoint at POST /ai/v1/chat/completions. It forwards each request to the model provider and captures the token cost on the way through, with no SDK wiring required.

Step 3 — Create a meter

A meter turns raw events into a billable quantity. For token billing, sum totalTokens from your ingestion events.

Open Usage Based Billing in the dashboard

Go to the Meters section and choose Create Meter.

Configure the meter

Name: LLM Tokens
Filter: events named research-query
Aggregation: sum of metadata.totalTokens

Attach a metered price to your product

Add a metered price to the product — for example, $0.01 per 1,000 tokens.
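To make the meter’s semantics concrete, here is a local sketch of what the configuration above computes for one customer: keep events named research-query, sum `metadata.totalTokens`, then apply the example price of $0.01 per 1,000 tokens. The event shape is an assumption modeled on the names in this step, not Macropay’s actual schema, and whether partial thousands are prorated or rounded is not specified here; the sketch prorates.

```typescript
// Assumed event shape, mirroring the meter's filter (event name) and
// aggregation (metadata.totalTokens). Not Macropay's actual schema.
interface IngestedEvent {
  name: string;
  customerId: string;
  metadata: { totalTokens: number };
}

const METER_EVENT_NAME = "research-query";
const PRICE_CENTS_PER_1K_TOKENS = 1; // $0.01 per 1,000 tokens

// Meter: filter by event name, then sum metadata.totalTokens.
function meterTokens(events: IngestedEvent[], customerId: string): number {
  return events
    .filter((e) => e.name === METER_EVENT_NAME && e.customerId === customerId)
    .reduce((sum, e) => sum + e.metadata.totalTokens, 0);
}

// Metered price; partial thousands are prorated (an assumption).
function meteredChargeCents(tokens: number): number {
  return (tokens * PRICE_CENTS_PER_1K_TOKENS) / 1000;
}
```

At these settings, a customer with 1,250,000 metered tokens in a period owes 1,250 cents, or $12.50.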

Step 4 — Customers watch it update live

Usage and estimated charges appear in the self-serve customer portal and refresh as events arrive — so the invoice is never a surprise.

Setting your margin

Capturing usage is the easy half. Choosing what to charge is where products win or lose. Three patterns cover most AI businesses.

Flat markup

Add a fixed percentage on top of your model cost. Easy to reason about, easy to explain — but your margin moves whenever the provider’s pricing does.

Your cost (per 1M tokens)	Markup	Customer price	Your margin
$2.50 (GPT-4o input)	100%	$5.00	$2.50
$10.00 (GPT-4o output)	100%	$20.00	$10.00
$0.15 (GPT-4o-mini input)	200%	$0.45	$0.30
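The arithmetic behind the table, as a sketch: price is cost times (1 + markup), and margin is the difference. Markup and gross margin are different percentages: a 100% markup is a 50% gross margin, because margin is measured against the price. Amounts here are floating-point dollars for readability; production billing code usually works in integer minor units.

```typescript
interface MarkupResult {
  price: number;          // customer price, same unit as the cost
  margin: number;         // price minus cost
  grossMarginPct: number; // margin as a share of price
}

// Flat markup: a 100% markup doubles the cost; a 200% markup triples it.
function flatMarkup(cost: number, markupPct: number): MarkupResult {
  const price = cost * (1 + markupPct / 100);
  const margin = price - cost;
  const grossMarginPct = price > 0 ? (margin / price) * 100 : 0;
  return { price, margin, grossMarginPct };
}
```

For the GPT-4o-mini row, `flatMarkup(0.15, 200)` gives a price of about $0.45 and a margin of about $0.30, roughly a 67% gross margin.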

Volume tiers

Lower the per-token rate as usage climbs, which rewards your biggest customers for consolidating spend with you. Configure the tiers directly on the metered price — no custom code.

Tier	Tokens / month	Price per 1K
Starter	0 – 100K	$0.020
Growth	100K – 1M	$0.015
Scale	1M – 10M	$0.010
Enterprise	10M+	Custom
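The table does not say whether the tiers are graduated (each band billed at its own rate) or volume-based (all usage billed at the rate of the band the monthly total reaches), and the two produce different invoices. The sketch below implements both over the Starter, Growth, and Scale bands; which one a given metered price uses is an assumption to confirm in its configuration.

```typescript
interface Tier {
  upTo: number;       // upper bound of the band, in tokens per month
  pricePer1K: number; // USD per 1,000 tokens
}

// Starter, Growth, and Scale from the table; Enterprise is custom and omitted.
const TIERS: Tier[] = [
  { upTo: 100_000, pricePer1K: 0.02 },
  { upTo: 1_000_000, pricePer1K: 0.015 },
  { upTo: 10_000_000, pricePer1K: 0.01 },
];

// Graduated: each band of usage is billed at that band's rate.
function graduatedCharge(tokens: number, tiers: Tier[] = TIERS): number {
  let charge = 0;
  let floor = 0;
  for (const tier of tiers) {
    if (tokens <= floor) break;
    const inBand = Math.min(tokens, tier.upTo) - floor;
    charge += (inBand / 1000) * tier.pricePer1K;
    floor = tier.upTo;
  }
  if (tokens > floor) throw new Error("Usage beyond the priced tiers (Enterprise is custom)");
  return charge;
}

// Volume: every token is billed at the rate of the band the total lands in.
function volumeCharge(tokens: number, tiers: Tier[] = TIERS): number {
  const tier = tiers.find((t) => tokens <= t.upTo);
  if (!tier) throw new Error("Usage beyond the priced tiers (Enterprise is custom)");
  return (tokens / 1000) * tier.pricePer1K;
}
```

For 2,000,000 tokens in a month, graduated pricing gives $2.00 + $13.50 + $10.00 = $25.50, while volume pricing bills all 2,000 thousand-token units at $0.010, or $20.00.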

Prepaid credits

Sell credits up front and burn them down per request. You get predictable revenue; the customer gets a price that’s easy to budget against. Grant a credit allotment with each tier using the meter credits benefit.

1 credit = 1,000 tokens
Starter:    10,000 credits / month  ($29/mo)
Pro:       100,000 credits / month  ($199/mo)
Enterprise: unlimited               ($999/mo)
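A sketch of the burn-down at 1 credit = 1,000 tokens. It assumes fractional credits are allowed and that usage beyond the balance is returned as overage rather than blocked; both behaviours are assumptions, the `CreditWallet` class is hypothetical, and the unlimited Enterprise plan is not modeled.

```typescript
const TOKENS_PER_CREDIT = 1_000;

// Hypothetical prepaid balance for one billing period. Stored in tokens so
// that partial credits burn exactly (assumption: fractional credits allowed).
class CreditWallet {
  private remainingTokens: number;

  constructor(credits: number) {
    this.remainingTokens = credits * TOKENS_PER_CREDIT;
  }

  get remainingCredits(): number {
    return this.remainingTokens / TOKENS_PER_CREDIT;
  }

  // Deduct one request's tokens; returns the tokens the balance did not
  // cover (assumption: billed as overage rather than rejected).
  consume(tokens: number): number {
    const covered = Math.min(tokens, this.remainingTokens);
    this.remainingTokens -= covered;
    return tokens - covered;
  }
}
```

A Starter wallet would be `new CreditWallet(10_000)`, which covers 10,000,000 tokens for the month.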

Beyond tokens: billing agents on outcomes

Tokens are the obvious meter, but they measure your cost, not the value a customer receives. If you ship autonomous agents, Macropay lets you bill on what the agent did or achieved — not just what it consumed.

Activity and outcome signals. Send POST /v1/signals to record agent activity (a research run, a ticket resolved) or an outcome (a meeting booked, a refund prevented), attributed to the agent that produced it.
Value receipts. Certify ROI — time saved, revenue generated, cost avoided — so a customer sees the value behind the invoice, not just the line item.
Agentic margin. Compare billed revenue against the AI cost (COGS) behind each agent, so you know which agents actually earn their keep.

This is the difference between charging for compute and charging for results. Because an outcome price tracks the value a customer receives rather than your token bill, it is where agent products have the most room for margin.

Worked example: an AI writing tool

Here’s the whole thing end to end for a tool that drafts blog posts, emails, and marketing copy. The pricing model is a monthly subscription with included tokens and metered overage.

Plans

Plan	Price	Included	Overage	Seats
Writer	$29/mo	50K tokens	$0.020 / 1K	1
Team	$99/mo	250K tokens	$0.015 / 1K	5
Agency	$299/mo	1M tokens	$0.010 / 1K	Unlimited
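Each month’s invoice under these plans is the base price plus any usage beyond the included tokens, billed at the plan’s overage rate. A sketch, assuming included tokens do not roll over, overage is prorated for partial thousands, and seats do not affect the charge:

```typescript
interface Plan {
  basePriceUsd: number;
  includedTokens: number;
  overagePer1K: number; // USD per 1,000 tokens beyond the allowance
}

const PLANS: Record<"writer" | "team" | "agency", Plan> = {
  writer: { basePriceUsd: 29, includedTokens: 50_000, overagePer1K: 0.02 },
  team: { basePriceUsd: 99, includedTokens: 250_000, overagePer1K: 0.015 },
  agency: { basePriceUsd: 299, includedTokens: 1_000_000, overagePer1K: 0.01 },
};

// Base subscription plus overage on tokens beyond the included allowance.
// Assumes no rollover and prorated overage (both assumptions).
function monthlyBill(plan: Plan, tokensUsed: number) {
  const overageTokens = Math.max(0, tokensUsed - plan.includedTokens);
  const overageUsd = (overageTokens / 1000) * plan.overagePer1K;
  return { overageTokens, overageUsd, totalUsd: plan.basePriceUsd + overageUsd };
}
```

For example, a Team customer who uses 400K tokens pays $99 + 150 × $0.015 = $101.25.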

Implementation

import { Ingestion } from "@macropayments/ingestion";
import { LLMStrategy } from "@macropayments/ingestion/strategies/LLM";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const writer = Ingestion({
  accessToken: process.env.MACROPAY_ACCESS_TOKEN,
})
  .strategy(new LLMStrategy(openai("gpt-4o")))
  .cost((ctx) => ({
    // Track real cost so margin stays visible
    amount: Math.ceil(
      ctx.metadata.inputTokens * 0.00025 +
      ctx.metadata.outputTokens * 0.001
    ),
    currency: "usd",
  }))
  .ingest("writing-assistant");

export async function generateDraft(
  customerId: string,
  type: "blog" | "email" | "copy",
  brief: string,
) {
  const model = writer.client({ customerId });

  const systemPrompts = {
    blog: "You are an expert blog writer...",
    email: "You are a professional email copywriter...",
    copy: "You are a conversion-focused marketing writer...",
  };

  const { text } = await generateText({
    model,
    system: systemPrompts[type],
    prompt: brief,
  });

  return text;
}
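The cost callback in this example converts GPT-4o’s list prices ($2.50 per 1M input tokens, $10.00 per 1M output tokens) into cents per token: 250¢ / 1,000,000 = 0.00025¢ and 1,000¢ / 1,000,000 = 0.001¢. Because it then applies `Math.ceil`, each request is rounded up to a whole cent, which overstates cost for short requests. The standalone sketch below, with hypothetical helper names, compares that per-request rounding with summing the exact costs.

```typescript
// GPT-4o list rates expressed in cents per token.
const INPUT_CENTS_PER_TOKEN = 0.00025; // $2.50 per 1M input tokens
const OUTPUT_CENTS_PER_TOKEN = 0.001;  // $10.00 per 1M output tokens

// Hypothetical helper: unrounded upstream cost of one request, in cents.
function exactCostCents(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_CENTS_PER_TOKEN + outputTokens * OUTPUT_CENTS_PER_TOKEN;
}

// Compare rounding every request up to a whole cent (what Math.ceil does in
// the cost callback) with summing the unrounded costs.
function compareRounding(requests: Array<{ input: number; output: number }>) {
  let perRequestCents = 0;
  let exactCents = 0;
  for (const r of requests) {
    const cost = exactCostCents(r.input, r.output);
    perRequestCents += Math.ceil(cost);
    exactCents += cost;
  }
  return { perRequestCents, exactCents };
}
```

For 100 requests of 800 input and 400 output tokens, the exact cost is 60 cents, but per-request rounding records 100 cents, inflating the cost side of your margin by two thirds. If the ingestion API accepts fractional cent amounts, leaving out `Math.ceil` keeps the cost ledger accurate; whether it does is not stated here.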

That’s the entire billing integration. From there, Macropay:

Meters tokens per customer
Draws down the included credits on their tier
Bills overage at the right tier rate
Issues the invoice each period and collects the payment
Remits sales tax and VAT in every market you sell to
Surfaces live usage in the customer portal

Know your margins in real time

Pricing isn’t set-and-forget — upstream model costs drift, and a power user who was profitable in January may not be in June. Cost Insights puts your upstream LLM spend next to customer revenue so you can answer:

Gross margin per customer — are your heaviest users still in the black?
Cost trend — is a provider price change quietly eating your margin?
Model efficiency — which model gives the best margin for this workload?

Start here

Create your account

Usage-based billing

Events, meters, and metered pricing, explained.

LLM ingestion strategy

SDK reference for capturing token usage and cost.

Cost Insights

Track upstream costs and watch margins live.
