FinOps · 8 min read · April 22, 2026

How to Track LLM Costs by Feature, Team, and Customer

Your AI bill is not one number. It is hundreds of hidden decisions across features, teams, users, and models. Here is how to tag requests so you can see what is actually driving spend.

The problem: your AI bill tells you almost nothing

Most teams only see one number at the end of the month. OpenAI, Anthropic, or Google sends a bill, someone drops it in a finance spreadsheet, and engineering has to guess what happened.

That is not enough once AI becomes part of the product. You need to know which feature is expensive, which customer is driving usage, which team shipped the spike, and which model is getting used more than expected.

Without that context, every cost conversation turns into vibes. Support says it is product. Product says it is search. Search says it is the new onboarding flow. Nobody actually knows.

Tag every request at the edge

The fix is simple: attach metadata to each LLM request before it leaves your app. At minimum, tag the feature, team, environment, and customer or workspace ID.

For example, a chat support request might include feature=support_chat, team=support, environment=production, and customer_id=acme. A code assistant request might include feature=autocomplete, team=devtools, and plan=enterprise.
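As a sketch of what tagging at the edge can look like, here is a small hypothetical helper (not a real Prismo SDK) that attaches attribution tags to a request payload and fails fast when a required tag is missing, so untagged spend never ships:

```python
# Hypothetical helper: the function name and tag keys are illustrative,
# not part of any official SDK.
REQUIRED_TAGS = {"feature", "team", "environment"}

def tag_request(payload: dict, **tags: str) -> dict:
    """Attach attribution tags to an LLM request, rejecting untagged traffic."""
    missing = REQUIRED_TAGS - tags.keys()
    if missing:
        raise ValueError(f"missing required tags: {sorted(missing)}")
    return {**payload, "tags": tags}

# A chat support request, tagged like the example above.
payload = tag_request(
    {"model": "gpt-4o-mini", "messages": []},
    feature="support_chat",
    team="support",
    environment="production",
    customer_id="acme",
)
```

Enforcing required tags in one shared helper is what keeps attribution complete: a request that would show up as "unknown" in the dashboard fails in code review instead.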

Prismo reads those tags at the proxy layer, logs token usage and cost, then rolls everything up in the dashboard. You get spend by feature, by team, by customer, by model, and by environment without building your own analytics pipeline.

The tags that actually matter

Feature is the most important tag. It tells you whether spend is coming from support chat, document search, code generation, onboarding, summarization, or some internal tool nobody remembers shipping.

Team is the second most useful. It lets you do chargeback or at least show each team what their AI usage costs. This changes behavior fast because people stop treating the model API like a free utility.

Customer or workspace ID matters if you sell B2B. One large customer can quietly consume more LLM spend than their contract supports. Without customer-level cost tracking, you may not notice until margins get ugly.

Set budgets before you need them

A budget is not just a finance number. It is a runtime control. If a feature hits 80% of its monthly budget, you should know immediately. If it hits 100%, you should decide what happens next.

Some teams block non-critical requests. Others downgrade to a cheaper model. Most start with alerts, then add enforcement once they understand the traffic pattern.

The important part is having the budget tied to the same tags you use for attribution. A global monthly cap is blunt. A budget per feature, team, or customer is actually useful.
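The check itself is simple once budgets share keys with your attribution tags. This is a minimal sketch, assuming an 80% alert threshold and per-feature caps; the thresholds, actions, and numbers are assumptions, not Prismo's actual enforcement rules:

```python
# Sketch of a budget check tied to attribution tags; thresholds and
# actions are assumptions for illustration.
def budget_action(spend: float, budget: float, alert_at: float = 0.80) -> str:
    """Return what to do for one tagged scope (feature, team, or customer)."""
    if spend >= budget:
        return "enforce"  # e.g. block non-critical requests or downgrade the model
    if spend >= budget * alert_at:
        return "alert"    # notify the owning team before the cap hits
    return "ok"

# Per-feature monthly budgets keyed by the same tag used for attribution.
budgets = {"support_chat": 500.00, "autocomplete": 2000.00}
month_to_date = {"support_chat": 430.00, "autocomplete": 2100.00}

for feature, cap in budgets.items():
    print(feature, budget_action(month_to_date[feature], cap))
```

Because the budget key is the same string as the attribution tag, the alert already names the owner; nobody has to reverse-engineer which team a spike belongs to.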

What you learn after a week

Once requests are tagged, the patterns show up quickly. Usually one or two features account for most of the spend. Often staging or internal testing burns more than expected. Sometimes a single customer is responsible for a huge chunk of usage.

You also start seeing which prompts are bloated. Long system prompts, repeated context, unnecessary examples, and oversized retrieval payloads all become obvious when you sort by cost per request.
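Sorting by cost per request is a one-liner over tagged logs. Here is a toy aggregation (the log fields and numbers are made up for illustration) showing how the expensive feature floats to the top:

```python
# Toy request log; in practice these rows come from your proxy's export.
from collections import defaultdict

logs = [
    {"feature": "support_chat", "cost_usd": 0.0210},
    {"feature": "support_chat", "cost_usd": 0.0190},
    {"feature": "doc_search",   "cost_usd": 0.0020},
    {"feature": "doc_search",   "cost_usd": 0.0030},
]

totals, counts = defaultdict(float), defaultdict(int)
for entry in logs:
    totals[entry["feature"]] += entry["cost_usd"]
    counts[entry["feature"]] += 1

# Sort features by average cost per request, most expensive first.
by_cost_per_request = sorted(
    ((feature, totals[feature] / counts[feature]) for feature in totals),
    key=lambda pair: pair[1],
    reverse=True,
)

for feature, cpr in by_cost_per_request:
    print(f"{feature}: ${cpr:.4f}/request")
```

A feature with a high average cost per request usually means a bloated prompt or oversized context, while high total cost with low per-request cost usually means volume; the two call for different fixes.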

That is when optimization stops being theoretical. You can fix the expensive paths first instead of trying to shave tokens everywhere.

How Prismo handles it

Prismo accepts attribution headers on every request. You keep your existing SDK, point it at api.getprismo.dev/v1, and add metadata headers like X-Prismo-Feature, X-Prismo-Team, and X-Prismo-Customer.

From there, Prismo tracks spend, latency, token usage, model choice, cache behavior, and routing decisions for each tagged request. The dashboard turns that into cost breakdowns you can actually act on.

That is the difference between knowing your AI bill is high and knowing exactly what to do about it.

Start optimizing your LLM costs today

Change one line of code. See your costs drop in the first billing cycle.