The AI Billing Problem

You have built an AI product. Users send prompts, your backend calls an LLM, and you return results. Now you need to charge for it. This sounds simple until you confront the reality:
  • Token counts vary wildly between requests (a simple question might use 200 tokens; a document analysis might use 50,000)
  • Costs differ by model (GPT-4o, Claude, and Llama 3 all have very different per-token pricing)
  • Cached tokens cost less than fresh tokens, and you need to track both
  • Customers expect transparency about what they are being charged for
  • You need a margin, but the window is narrow: price too high and you lose customers; price too low and you lose money
This guide walks through two approaches: rolling your own metering, and using a managed solution like Macropay.
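To make the variance concrete, here is a minimal sketch of per-request cost arithmetic. The input and output rates match the GPT-4o figures quoted later in this guide; the cached-input rate, the type names, and the token counts are assumptions for illustration only.

```typescript
// Prices in USD per 1M tokens.
interface ModelPrice {
  inputPerMillion: number;
  cachedInputPerMillion: number;
  outputPerMillion: number;
}

interface RequestUsage {
  inputTokens: number; // total prompt tokens, including cached ones
  cachedInputTokens: number; // subset of inputTokens served from cache
  outputTokens: number;
}

const gpt4o: ModelPrice = {
  inputPerMillion: 2.5,
  cachedInputPerMillion: 1.25, // assumption: cached tokens billed at half price
  outputPerMillion: 10,
};

// Returns the upstream cost of one request in USD.
function requestCostUsd(usage: RequestUsage, price: ModelPrice): number {
  const freshInput = usage.inputTokens - usage.cachedInputTokens;
  return (
    (freshInput * price.inputPerMillion +
      usage.cachedInputTokens * price.cachedInputPerMillion +
      usage.outputTokens * price.outputPerMillion) /
    1_000_000
  );
}

// A 200-token question versus a 50,000-token document analysis.
const simpleQuestion = requestCostUsd(
  { inputTokens: 150, cachedInputTokens: 0, outputTokens: 50 },
  gpt4o,
);
const documentAnalysis = requestCostUsd(
  { inputTokens: 48_000, cachedInputTokens: 20_000, outputTokens: 2_000 },
  gpt4o,
);
```

Under these assumed rates the simple question costs well under a tenth of a cent, while the document analysis costs about 11.5 cents, which is why a single flat per-request price rarely works.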

Approach 1: DIY Token Metering

The brute-force approach is to build your own usage tracking system.

What You Need to Build

┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│  Your App    │───>│  LLM API     │───>│  Response    │
│  (prompt)    │    │  (OpenAI,    │    │  (tokens,    │
│              │    │   Anthropic) │    │   usage)     │
└─────────────┘    └──────────────┘    └──────┬──────┘

                                    ┌─────────▼─────────┐
                                    │  Usage Database     │
                                    │  (customer_id,      │
                                    │   tokens, model,    │
                                    │   cost, timestamp)  │
                                    └─────────┬──────────┘

                                    ┌─────────▼──────────┐
                                    │  Billing Engine      │
                                    │  (aggregate, price,  │
                                    │   invoice, collect)   │
                                    └──────────────────────┘
You will need:
  1. A database table to store every usage event (customer, tokens, model, cost)
  2. An aggregation layer to compute per-customer usage per billing period
  3. A pricing engine to apply your margin and compute the charge
  4. Integration with a payment processor to actually collect payment
  5. A customer-facing dashboard showing usage and estimated charges
This is 4-8 weeks of engineering work, and it requires ongoing maintenance as LLM providers change their pricing and response formats.
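The first three pieces of that list can be sketched in a few lines. This is an in-memory illustration, not a production design: the type and function names are hypothetical, and a real system would back this with a database and handle rounding, currencies, and time zones explicitly.

```typescript
// One row per LLM call, mirroring the fields in the "Usage Database" box.
interface UsageEvent {
  customerId: string;
  model: string;
  tokens: number;
  costCents: number; // what the provider charged you for this call
  timestamp: Date;
}

interface PeriodTotals {
  tokens: number;
  costCents: number;
}

// Aggregation layer: total tokens and upstream cost per customer for events
// whose timestamp falls in [periodStart, periodEnd).
function aggregateUsage(
  events: UsageEvent[],
  periodStart: Date,
  periodEnd: Date,
): Map<string, PeriodTotals> {
  const totals = new Map<string, PeriodTotals>();
  for (const e of events) {
    const t = e.timestamp.getTime();
    if (t < periodStart.getTime() || t >= periodEnd.getTime()) continue;
    const current = totals.get(e.customerId) ?? { tokens: 0, costCents: 0 };
    totals.set(e.customerId, {
      tokens: current.tokens + e.tokens,
      costCents: current.costCents + e.costCents,
    });
  }
  return totals;
}

// Pricing engine: apply a percentage markup to the upstream cost.
function chargeCents(costCents: number, markupPercent: number): number {
  return Math.round(costCents * (1 + markupPercent / 100));
}

const usageLog: UsageEvent[] = [
  { customerId: "cust_a", model: "gpt-4o", tokens: 1_200, costCents: 3, timestamp: new Date("2025-01-05T10:00:00Z") },
  { customerId: "cust_a", model: "gpt-4o", tokens: 50_000, costCents: 120, timestamp: new Date("2025-01-20T15:30:00Z") },
  { customerId: "cust_b", model: "gpt-4o-mini", tokens: 800, costCents: 2, timestamp: new Date("2025-01-09T08:00:00Z") },
  { customerId: "cust_a", model: "gpt-4o", tokens: 900, costCents: 2, timestamp: new Date("2025-02-01T00:00:00Z") },
];

const january = aggregateUsage(
  usageLog,
  new Date("2025-01-01T00:00:00Z"),
  new Date("2025-02-01T00:00:00Z"),
);
```

Keeping the raw events and deriving totals from them, rather than storing only running totals, is what makes a charge reconstructible later.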

The DIY Trap

Most teams that go this route discover three painful realities:
  1. Token counts from different providers use different schemas. OpenAI returns usage.prompt_tokens and usage.completion_tokens. Anthropic returns usage.input_tokens and usage.output_tokens. Google returns usageMetadata.promptTokenCount. You end up building a normalization layer.
  2. Billing disputes require audit trails. When a customer questions a charge, you need to show them exactly which requests generated which costs. This means storing detailed logs, not just aggregates.
  3. Cost tracking is separate from billing. You need to know your own costs (what you paid the LLM provider) separately from what you charge customers. This requires a second layer of accounting.
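The normalization layer from the first point might look like the sketch below. The OpenAI, Anthropic, and Google input field names are the ones quoted above; `candidatesTokenCount` is the output-side field in Gemini responses and is not mentioned in the list. The wrapper types and the `normalizeUsage` name are hypothetical.

```typescript
// Minimal views of each provider's response, limited to the usage fields.
interface OpenAIResponse {
  usage: { prompt_tokens: number; completion_tokens: number };
}
interface AnthropicResponse {
  usage: { input_tokens: number; output_tokens: number };
}
interface GoogleResponse {
  usageMetadata: { promptTokenCount: number; candidatesTokenCount: number };
}

type ProviderResponse =
  | { provider: "openai"; body: OpenAIResponse }
  | { provider: "anthropic"; body: AnthropicResponse }
  | { provider: "google"; body: GoogleResponse };

// The single shape the rest of the billing code works with.
interface NormalizedUsage {
  inputTokens: number;
  outputTokens: number;
}

function normalizeUsage(r: ProviderResponse): NormalizedUsage {
  switch (r.provider) {
    case "openai":
      return {
        inputTokens: r.body.usage.prompt_tokens,
        outputTokens: r.body.usage.completion_tokens,
      };
    case "anthropic":
      return {
        inputTokens: r.body.usage.input_tokens,
        outputTokens: r.body.usage.output_tokens,
      };
    case "google":
      return {
        inputTokens: r.body.usageMetadata.promptTokenCount,
        outputTokens: r.body.usageMetadata.candidatesTokenCount,
      };
  }
}
```

Every new provider adds a case here, and any change to a provider's response format has to be reflected in it, which is part of the ongoing maintenance cost.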

Approach 2: Managed Usage-Based Billing with Macropay

Macropay provides a complete usage-based billing infrastructure purpose-built for AI applications. Instead of building the entire stack yourself, you integrate the @macropay/ingestion SDK and let Macropay handle metering, aggregation, and billing.

Step 1: Install the SDK

npm install @macropay/ingestion ai @ai-sdk/openai

Step 2: Set Up the LLM Ingestion Strategy

The SDK wraps your LLM calls and automatically captures token usage, model information, and cost data.
TypeScript (Vercel AI SDK)
import { Ingestion } from "@macropay/ingestion";
import { LLMStrategy } from "@macropay/ingestion/strategies/LLM";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Configure ingestion with your Macropay access token
const llmIngestion = Ingestion({
  accessToken: process.env.MACROPAY_ACCESS_TOKEN,
})
  .strategy(new LLMStrategy(openai("gpt-4o")))
  .cost((ctx) => ({
    amount: ctx.metadata.totalTokens * 0.00025, // Your cost in cents, here $2.50 per 1M tokens
    currency: "usd",
  }))
  .ingest("llm-usage");

export async function POST(req: Request) {
  const { prompt, customerId } = await req.json();

  // Get a wrapped model that automatically tracks usage
  const model = llmIngestion.client({
    customerId,
  });

  const { text } = await generateText({
    model,
    system: "You are a helpful assistant.",
    prompt,
  });

  return Response.json({ text });
}
Python (PydanticAI)
import os
from macropay.ingestion import Ingestion
from macropay.ingestion.strategies import PydanticAIStrategy
from pydantic_ai import Agent

ingestion = Ingestion(os.getenv("MACROPAY_ACCESS_TOKEN"))
strategy = ingestion.strategy(PydanticAIStrategy, "llm-usage")

agent = Agent("gpt-4.1-nano")

result = agent.run_sync("Summarize this document...")
strategy.ingest("customer_123", result)
Every LLM call is now automatically tracked with:
  • Input and output token counts
  • Cached token counts
  • Model name and provider
  • Your cost per request
  • Customer attribution

Step 3: Create a Meter in the Macropay Dashboard

Meters aggregate your raw usage events into billable quantities. For AI billing, you typically create a meter that sums totalTokens from your ingestion events.
  1. Go to Usage Based Billing in your dashboard. Navigate to the Meters section and click “Create Meter.”
  2. Configure the meter:
     • Name: “LLM Tokens”
     • Filter: Events named llm-usage
     • Aggregation: Sum of metadata.totalTokens
  3. Attach a metered price to your product. Add a metered price to your subscription product. For example: $0.01 per 1,000 tokens.
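Conceptually, the meter and metered price configured above reduce to a filter, a sum, and a multiplication. The sketch below models that locally so the numbers are easy to check; it is not Macropay's implementation, and the `IngestedEvent` shape and function names are hypothetical.

```typescript
// Hypothetical shape of an ingested event; only the fields the meter reads.
interface IngestedEvent {
  name: string;
  customerId: string;
  metadata: { totalTokens: number };
}

// Filter by event name and customer, then sum metadata.totalTokens.
function meterQuantity(
  events: IngestedEvent[],
  eventName: string,
  customerId: string,
): number {
  return events
    .filter((e) => e.name === eventName && e.customerId === customerId)
    .reduce((sum, e) => sum + e.metadata.totalTokens, 0);
}

// Metered price from the example: $0.01 (1 cent) per 1,000 tokens.
function meteredChargeCents(tokens: number, centsPerThousand: number): number {
  return (tokens / 1000) * centsPerThousand;
}

const ingested: IngestedEvent[] = [
  { name: "llm-usage", customerId: "customer_123", metadata: { totalTokens: 12_000 } },
  { name: "llm-usage", customerId: "customer_123", metadata: { totalTokens: 30_500 } },
  { name: "llm-usage", customerId: "customer_456", metadata: { totalTokens: 9_000 } },
  { name: "other-event", customerId: "customer_123", metadata: { totalTokens: 700 } },
];

const billableTokens = meterQuantity(ingested, "llm-usage", "customer_123");
const chargeForPeriod = meteredChargeCents(billableTokens, 1);
```

Here customer_123 accumulates 42,500 billable tokens, which comes to 42.5 cents at the example price; events with a different name or customer are ignored.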

Step 4: Customers See Usage in Real Time

Customers can view their current usage and estimated charges in the Customer Portal, updated in real time as events are ingested.

Margin Management Strategies

Knowing what to charge is as important as knowing how to charge. Here are the most common margin strategies for AI products.

Flat Percentage Markup

The simplest approach: add a fixed percentage to your LLM costs.
Your Cost (per 1M tokens)    | Markup | Customer Price | Your Margin
$2.50 (GPT-4o input)         | 100%   | $5.00          | $2.50
$10.00 (GPT-4o output)       | 100%   | $20.00         | $10.00
$0.15 (GPT-4o-mini input)    | 200%   | $0.45          | $0.30
Pros: Simple to implement, easy for customers to understand.
Cons: Your margin is tied directly to upstream pricing changes.
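The arithmetic behind the markup table can be sketched as follows. Amounts are in cents per 1M tokens to keep the math exact; the function and field names are hypothetical.

```typescript
interface MarkupResult {
  priceCents: number; // what the customer pays per 1M tokens
  marginCents: number; // price minus your upstream cost
  grossMarginPercent: number; // margin as a share of the customer price
}

// Apply a flat percentage markup to an upstream cost.
function applyMarkup(costCents: number, markupPercent: number): MarkupResult {
  const priceCents = costCents * (1 + markupPercent / 100);
  const marginCents = priceCents - costCents;
  return {
    priceCents,
    marginCents,
    grossMarginPercent: (marginCents / priceCents) * 100,
  };
}

const gpt4oInput = applyMarkup(250, 100); // $2.50 cost, 100% markup
const gpt4oOutput = applyMarkup(1000, 100); // $10.00 cost, 100% markup
const gpt4oMiniInput = applyMarkup(15, 200); // $0.15 cost, 200% markup
```

Note that markup and gross margin are different numbers: a 100% markup yields a 50% gross margin, and a 200% markup yields roughly 66.7%.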

Tiered Pricing

Charge less per token at higher volumes to incentivize usage.
Tier       | Tokens/Month | Price per 1K Tokens
Starter    | 0 - 100K     | $0.02
Growth     | 100K - 1M    | $0.015
Scale      | 1M - 10M     | $0.01
Enterprise | 10M+         | Custom
With Macropay, you configure tiered pricing directly on your metered product price. No custom billing logic required.
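For reference, here is how the tier table translates into a charge, as a sketch under one assumption: tiers are graduated, meaning each token is billed at the rate of the tier it falls into. The text does not say whether tiers are graduated or volume-based (where the whole quantity is billed at the rate of the highest tier reached), so check which one your configuration uses. Names are hypothetical and amounts are in cents.

```typescript
interface Tier {
  upTo: number; // upper bound of the tier, in tokens per month
  centsPerThousand: number;
}

// Starter, Growth, and Scale from the table; Enterprise (10M+) is custom.
const tiers: Tier[] = [
  { upTo: 100_000, centsPerThousand: 2 },
  { upTo: 1_000_000, centsPerThousand: 1.5 },
  { upTo: 10_000_000, centsPerThousand: 1 },
];

// Graduated pricing: bill the slice of usage inside each tier at that tier's rate.
function graduatedChargeCents(tokens: number): number {
  if (tokens > 10_000_000) {
    throw new Error("Above 10M tokens: Enterprise pricing is custom");
  }
  let lowerBound = 0;
  let total = 0;
  for (const tier of tiers) {
    if (tokens <= lowerBound) break;
    const tokensInTier = Math.min(tokens, tier.upTo) - lowerBound;
    total += (tokensInTier / 1000) * tier.centsPerThousand;
    lowerBound = tier.upTo;
  }
  return total;
}

const starterUsage = graduatedChargeCents(50_000);
const growthUsage = graduatedChargeCents(250_000);
const scaleUsage = graduatedChargeCents(2_000_000);
```

Under this assumption, 250,000 tokens cost $2.00 for the first 100K plus $2.25 for the next 150K, or $4.25 in total.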

Credit-Based Model

Sell credits upfront and deduct as customers use tokens. This provides predictable revenue and simpler customer-facing pricing.
1 credit = 1,000 tokens
Starter: 10,000 credits/month ($29/mo)
Pro: 100,000 credits/month ($199/mo)
Enterprise: Unlimited ($999/mo)
Macropay supports this through the Meter Credits benefit, which lets you grant a fixed number of meter credits with each subscription tier.
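The deduction logic behind a credit balance is simple enough to sketch. This assumes fractional credits are allowed and that usage beyond the balance is reported as overage rather than blocked; both are illustrative choices, not a description of how the Meter Credits benefit behaves. Names are hypothetical.

```typescript
const TOKENS_PER_CREDIT = 1000; // from the text: 1 credit = 1,000 tokens

interface CreditResult {
  remainingCredits: number; // balance left after this usage
  overageCredits: number; // credits consumed beyond the balance
}

// Deduct the credits needed for `tokensUsed` from the current balance.
function deductCredits(balanceCredits: number, tokensUsed: number): CreditResult {
  const needed = tokensUsed / TOKENS_PER_CREDIT;
  const covered = Math.min(balanceCredits, needed);
  return {
    remainingCredits: balanceCredits - covered,
    overageCredits: needed - covered,
  };
}

// Starter plan: 10,000 credits, then 42,500 tokens of usage.
const starterAfterUse = deductCredits(10_000, 42_500);
// A nearly exhausted balance: 10 credits left, 15,000 tokens of usage.
const nearlyEmpty = deductCredits(10, 15_000);
```

In the first case 42.5 credits are consumed and 9,957.5 remain; in the second the balance drops to zero with 5 credits of overage, which you can either bill or use as the trigger for an upgrade prompt.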

Real-World Example: Building an AI Writing Tool

Let us put it all together with a concrete example.
Product: An AI writing assistant that helps users draft blog posts, emails, and marketing copy.
Pricing model: Monthly subscription with included credits plus overage billing.

Product Setup in Macropay

Plan 1: Writer ($29/mo)
  • Includes 50,000 tokens/month (Meter Credits benefit)
  • Overage: $0.02 per 1,000 tokens
Plan 2: Team ($99/mo)
  • Includes 250,000 tokens/month
  • Overage: $0.015 per 1,000 tokens
  • 5 team seats included
Plan 3: Agency ($299/mo)
  • Includes 1,000,000 tokens/month
  • Overage: $0.01 per 1,000 tokens
  • Unlimited team seats
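To sanity-check these plans before configuring them, the sketch below computes what a month would cost a customer under each one. It only models the arithmetic (base fee plus overage on tokens beyond the included amount); rounding and proration are not covered, and the names are hypothetical. Amounts are in cents.

```typescript
interface Plan {
  label: string;
  baseCents: number; // monthly subscription fee
  includedTokens: number; // tokens covered by the subscription
  overageCentsPerThousand: number;
}

const plans: Plan[] = [
  { label: "Writer", baseCents: 2_900, includedTokens: 50_000, overageCentsPerThousand: 2 },
  { label: "Team", baseCents: 9_900, includedTokens: 250_000, overageCentsPerThousand: 1.5 },
  { label: "Agency", baseCents: 29_900, includedTokens: 1_000_000, overageCentsPerThousand: 1 },
];

interface MonthlyInvoice {
  overageTokens: number;
  overageCents: number;
  totalCents: number;
}

// Base fee plus overage for any tokens beyond the included amount.
function monthlyInvoice(plan: Plan, tokensUsed: number): MonthlyInvoice {
  const overageTokens = Math.max(0, tokensUsed - plan.includedTokens);
  const overageCents = (overageTokens / 1000) * plan.overageCentsPerThousand;
  return { overageTokens, overageCents, totalCents: plan.baseCents + overageCents };
}

const writerInvoice = monthlyInvoice(plans[0], 80_000);
const teamInvoice = monthlyInvoice(plans[1], 200_000);
const agencyInvoice = monthlyInvoice(plans[2], 1_500_000);
```

A Writer customer who uses 80,000 tokens pays $29.00 plus $0.60 of overage; a Team customer at 200,000 tokens stays within the included amount and pays only the $99.00 base fee.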

Implementation

import { Ingestion } from "@macropay/ingestion";
import { LLMStrategy } from "@macropay/ingestion/strategies/LLM";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const ingestion = Ingestion({
  accessToken: process.env.MACROPAY_ACCESS_TOKEN,
})
  .strategy(new LLMStrategy(openai("gpt-4o")))
  .cost((ctx) => ({
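    // Track your actual cost (in cents) for margin analysis.
    // Rates assume GPT-4o at $2.50 per 1M input tokens (0.00025 cents each)
    // and $10.00 per 1M output tokens (0.001 cents each).
    // Math.ceil rounds each request up to a whole cent, which overstates
    // the cost of small requests.
    amount: Math.ceil(
      ctx.metadata.inputTokens * 0.00025 +
      ctx.metadata.outputTokens * 0.001
    ),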
    currency: "usd",
  }))
  .ingest("writing-assistant");

export async function generateDraft(
  customerId: string,
  type: "blog" | "email" | "copy",
  brief: string,
) {
  const model = ingestion.client({ customerId });

  const systemPrompts = {
    blog: "You are an expert blog writer...",
    email: "You are a professional email copywriter...",
    copy: "You are a conversion-focused marketing writer...",
  };

  const { text } = await generateText({
    model,
    system: systemPrompts[type],
    prompt: brief,
  });

  return text;
}
That is the entire billing integration. Macropay handles:
  • Tracking token usage per customer
  • Applying included credits from their subscription tier
  • Calculating overage charges at the correct tier rate
  • Generating invoices at the end of each billing period
  • Collecting payment automatically
  • Showing customers their usage in the Customer Portal

Cost Insights: Know Your Margins

Macropay’s Cost Insights feature lets you track your upstream LLM costs alongside customer revenue. This gives you real-time visibility into:
  • Gross margin per customer: Are your heaviest users still profitable?
  • Cost trends over time: Are upstream price changes eroding your margins?
  • Model efficiency: Which models deliver the best margin for your use case?
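The first of those questions comes down to a simple ratio per customer. The sketch below shows the calculation on hypothetical data; it does not use or describe the Cost Insights API, and the type and function names are illustrative.

```typescript
interface CustomerTotals {
  customerId: string;
  revenueCents: number; // what the customer paid in the period
  costCents: number; // what their usage cost you upstream
}

// Gross margin as a percentage of revenue; null when there is no revenue.
function grossMarginPercent(c: CustomerTotals): number | null {
  if (c.revenueCents === 0) return null;
  return ((c.revenueCents - c.costCents) / c.revenueCents) * 100;
}

// IDs of customers whose margin falls below the target.
function belowTargetMargin(customers: CustomerTotals[], targetPercent: number): string[] {
  return customers
    .filter((c) => {
      const margin = grossMarginPercent(c);
      return margin !== null && margin < targetPercent;
    })
    .map((c) => c.customerId);
}

const periodTotals: CustomerTotals[] = [
  { customerId: "cust_light", revenueCents: 2_900, costCents: 290 },
  { customerId: "cust_heavy", revenueCents: 2_900, costCents: 3_480 },
  { customerId: "cust_mid", revenueCents: 9_900, costCents: 4_950 },
];

const needsAttention = belowTargetMargin(periodTotals, 60);
```

With a 60% target, the heavy user (who costs more than they pay) and the mid-tier customer at 50% are both flagged, while the light user at 90% is not.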

Getting Started

  • Create Your Account: Sign up and configure your first metered product in under 10 minutes.
  • Usage Based Billing Docs: Detailed documentation on events, meters, and metered pricing.
  • LLM Ingestion Strategy: SDK reference for the LLM ingestion strategy.
  • Cost Insights: Track your upstream costs and monitor margins in real time.