
Chargeback Without Spreadsheets: The 4-Field Schema That Replaced Our 200-Tag Mess

Riya Mittal, Engineer · Zop.Dev · 10 min read

The tag taxonomy starts at 30 keys and climbs from there. Year one, every team agrees on env, team, cost-center, service. Year two, finance asks for customer, feature, data-classification, pci-scope. Year three, the security team adds compliance-tier, the platform team adds iac-tool, and an exec asks for revenue-stream so they can correlate cost with bookings. By year four the spreadsheet of “official tags” runs past 200 keys, three of which are required and the rest of which are aspirational.

The numbers from this taxonomy are wrong, and everyone knows it. Per-team chargeback accuracy hovers around 70 to 80 percent. The misattribution is consistent enough that finance has stopped pushing back: the platform team always overpays by 8 percent, the recommendation team always underpays by 6 percent, the unattributed bucket holds 12 percent of total spend. Every month the same conversation happens. Every quarter someone proposes “tag governance” again. The taxonomy keeps growing.

The structural problem isn’t engineer discipline. It’s that tags require a human to set them at resource creation time, and the taxonomy has no lifecycle. Stale tags don’t generate errors; they generate misattribution. Adding a new tag means backfilling thousands of resources. Renaming a tag is functionally impossible. The system is set up to grow but never to shrink, and the people whose costs depend on the data have no leverage to change the resource-creation behavior of the engineers writing the IaC.

The fix is to stop trying to tag perfectly and start absorbing tag variance at the cost-record stage. Four fields per cost record, populated by a lookup table that the FinOps team owns. Engineers tag what they remember; the lookup absorbs what they forget. Per-team chargeback accuracy goes from 70-80 percent to 92-96 percent over six weeks. The 200-tag spreadsheet stops being the source of truth and becomes one input among several.

This pattern builds on the chargeback vs showback work, which covers the foundational shape of team-level accountability.

Why the 200-tag taxonomy is a structural problem, not a hygiene problem

Year one of cloud chargeback, every org’s tag taxonomy looks the same: about 30 keys, with three to six marked required. Engineers comply because the list is short. Finance gets reasonable numbers. Everyone agrees the system works.

| Year | Tag count | % of resources with all required tags | Engineer comment |
|---|---|---|---|
| 1 | 30 | 85 | “Yeah, tagging is fine” |
| 2 | 80 | 65 | “Why do I need to set ‘compliance-tier’ on a Lambda?” |
| 3 | 150 | 45 | “What does ‘revenue-stream’ even mean for a CI job?” |
| 4 | 220 | 30 | “I just copied the tags from the last service” |

The compliance ratio drops because the marginal tag added in year two has no obvious meaning at the resource level. An engineer creating an EC2 instance for a CI runner doesn’t know what revenue-stream should be. They guess, copy from somewhere, or leave it blank. The cost system attributes the resource to whatever the guess produced, or to the unattributed bucket if blank. Either way the chargeback number for that team is wrong by some amount the team can’t measure and finance can’t audit.

The standard response is “tag governance”: automated checks that block resource creation if required tags are missing. This works for new resources but doesn’t fix the historical drift, doesn’t help with tags that are present-but-wrong, and creates friction that engineering pushes back on. The CI runner case is the typical pushback example: forcing the engineer to figure out revenue-stream for a build worker is friction with no operational value.

The deeper problem is that tags are the wrong abstraction for chargeback. Tags are properties of a resource. Chargeback is a mapping from resources to chargeable units (teams, cost centers, services). A taxonomy of 200 properties of resources is too detailed for the mapping (most tags don’t matter for chargeback) and not flexible enough for the mapping (changing the mapping requires changing tags on resources). The mapping needs to live somewhere else.

The 4 fields that actually matter for chargeback

Strip the chargeback schema down to what every consumer actually needs.

| Field | Cardinality | Consumer | What it answers |
|---|---|---|---|
| cost_center | 30-100 per org | Finance | Which department’s budget pays for this |
| service | 50-300 per org | Engineering leadership | Which product surface is this resource for |
| env | 3-5 per org | On-call, security, audit | Production / staging / dev / experiment / sandbox |
| owner_email | One per service | Anyone debugging cost | Who do I ask about this spend |

The cost_center field is the only one finance cares about for the actual chargeback; everything else is operational. The service field drives the per-team breakdown engineering uses to decide where to focus optimization. The env field separates prod from non-prod so production overspend isn’t hidden in dev experimentation. The owner_email field closes the loop on overspend questions: if a number looks weird, there’s a human to ask.

Why four and not five: every additional field doubles the maintenance work on the lookup table while adding marginal value to the chargeback report. Customer attribution sounds important but is volatile (customers churn, services get re-targeted) and high-cardinality (cost records explode). Compliance tier is real for security audits but not for chargeback; it lives in a separate system that joins on resource_id when needed. Feature flag attribution is interesting for product analytics but not for cost allocation.

Why four and not three: dropping owner_email seems tempting (cost_center owner is in HR, look it up). In practice, looking up the cost_center owner during a 2 AM cost spike is friction nobody pays. Having owner_email on the cost record means anyone reading the dashboard can email the right person without a directory lookup.

The four-field shape also gives finance the structure they actually use. The monthly chargeback rollup is SUM(cost) GROUP BY cost_center — one query, no joins. The per-team breakdown is SUM(cost) GROUP BY service filtered to the cost_center. The “what’s running in prod” question is WHERE env = 'prod'. Three queries, four fields, no spreadsheet.
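To make the three queries concrete, here is a minimal sketch using Python’s built-in sqlite3 module against an in-memory table. The table name cost_records, the cost column, and the sample rows are hypothetical; a real pipeline would run the same GROUP BY statements in whatever warehouse holds the enriched cost records.

```python
import sqlite3

# Hypothetical enriched cost records: the four canonical fields plus a cost amount.
ROWS = [
    ("ENG-PLATFORM-001", "platform-api", "prod", "platform-leads@example.com", 1200.0),
    ("ENG-PLATFORM-001", "platform-api", "staging", "platform-leads@example.com", 300.0),
    ("ENG-RECOMMENDATIONS-002", "rec-pipeline", "prod", "rec-leads@example.com", 800.0),
    ("FIN-BILLING-PROD-005", "billing-api", "prod", "billing-eng@example.com", 500.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cost_records ("
    "cost_center TEXT, service TEXT, env TEXT, owner_email TEXT, cost REAL)"
)
conn.executemany("INSERT INTO cost_records VALUES (?, ?, ?, ?, ?)", ROWS)

# Monthly chargeback rollup: one GROUP BY, no joins.
rollup = conn.execute(
    "SELECT cost_center, SUM(cost) FROM cost_records "
    "GROUP BY cost_center ORDER BY cost_center"
).fetchall()

# Per-team breakdown: services inside a single cost center.
per_service = conn.execute(
    "SELECT service, SUM(cost) FROM cost_records "
    "WHERE cost_center = ? GROUP BY service ORDER BY service",
    ("ENG-PLATFORM-001",),
).fetchall()

# "What's running in prod", with the owner to ask about each line.
prod = conn.execute(
    "SELECT service, owner_email, SUM(cost) FROM cost_records "
    "WHERE env = 'prod' GROUP BY service, owner_email ORDER BY service"
).fetchall()

print(rollup)
print(per_service)
print(prod)
```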

The lookup table absorbs tag variance

Engineers don’t stop tagging. They keep doing what they do. The cost-record pipeline has a lookup-table step that converts whatever tag soup the resource has into the four canonical fields.


The lookup table is owned by the FinOps team. It’s a YAML file checked into a git repo, reviewed via pull requests, and deployed alongside the cost-record pipeline. Typical entries:

| Match condition | cost_center | service | owner_email |
|---|---|---|---|
| team:platform AND env:prod | ENG-PLATFORM-001 | platform-api | platform-leads@example.com |
| team:platform AND env:staging | ENG-PLATFORM-001 | platform-api | platform-leads@example.com |
| team:rec-eng | ENG-RECOMMENDATIONS-002 | rec-pipeline | rec-leads@example.com |
| account:acct-12345 (no team tag) | ENG-PLATFORM-001 | platform-api | platform-leads@example.com |
| account:acct-67890 AND service:billing | FIN-BILLING-PROD-005 | billing-api | billing-eng@example.com |
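The match-condition table can be read as an ordered rule list. The sketch below assumes first-match-wins evaluation (the post does not say how overlapping rules are resolved), treats the account ID as if it were one more tag, and hard-codes the rules in Python rather than loading them from the YAML file. The Rule class and lookup function are hypothetical names, and env is left to the tags or the fallback rules since the table does not set it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    match: dict          # every key/value pair must be present on the resource
    cost_center: str
    service: str
    owner_email: str
    absent: tuple = ()   # keys that must NOT be present, e.g. "(no team tag)"

    def matches(self, attrs: dict) -> bool:
        has_all = all(attrs.get(k) == v for k, v in self.match.items())
        return has_all and not any(k in attrs for k in self.absent)


# Hypothetical encoding of the example entries, in table order.
RULES = [
    Rule({"team": "platform", "env": "prod"},
         "ENG-PLATFORM-001", "platform-api", "platform-leads@example.com"),
    Rule({"team": "platform", "env": "staging"},
         "ENG-PLATFORM-001", "platform-api", "platform-leads@example.com"),
    Rule({"team": "rec-eng"},
         "ENG-RECOMMENDATIONS-002", "rec-pipeline", "rec-leads@example.com"),
    Rule({"account": "acct-12345"},
         "ENG-PLATFORM-001", "platform-api", "platform-leads@example.com",
         absent=("team",)),
    Rule({"account": "acct-67890", "service": "billing"},
         "FIN-BILLING-PROD-005", "billing-api", "billing-eng@example.com"),
]


def lookup(attrs: dict, rules=RULES) -> Optional[dict]:
    """Return the canonical fields from the first matching rule, or None."""
    for rule in rules:
        if rule.matches(attrs):
            return {
                "cost_center": rule.cost_center,
                "service": rule.service,
                "owner_email": rule.owner_email,
            }
    return None  # no rule matched: hand off to the fallback rules


print(lookup({"account": "acct-12345", "env": "prod"}))
print(lookup({"account": "acct-67890", "service": "search"}))  # None
```

Because rules are evaluated in order, a resource in acct-12345 that does carry team:rec-eng is attributed by the team rule, not the account rule.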

When a tag taxonomy changes, only the lookup table changes. Engineers don’t get tag-rename PRs. The historical cost records keep their attribution because they were written through the lookup snapshot in effect at that time. The cost of changing chargeback policy collapses from “rewrite tags on N thousand resources” to “edit one YAML file.”

The lookup table is the only thing finance owns end-to-end. The tag taxonomy is owned by engineering (because tagging happens at resource creation). The cost-record pipeline is owned by data engineering. The reports are consumed by finance. Putting the policy in the middle, in a place finance owns, is what makes the chargeback numbers actually trustworthy.

Inference rules for tag absence

Tags will be missing. Half the resources in any given month don’t have all four required tags set. The lookup table needs explicit fallback rules for absence.

| Field missing | Fallback derivation | Example |
|---|---|---|
| team tag | Use the account-to-cost-center map | account 12345 → cost_center=ENG-PLATFORM-001 |
| service tag | Use the most-tagged service in the account | 70% of resources tagged service=platform-api → default to platform-api |
| env tag | Use account tier (prod accounts → prod, sandbox accounts → dev) | account named prod-us-east → env=prod |
| owner_email | Look up cost_center owner from the cost_center map | cost_center=ENG-PLATFORM-001 → owner_email=platform-leads@ |
| Everything missing | Route to “unattributed” bucket; finance reviews monthly | Any resource that survives all fallbacks |

Each fallback is explicit and auditable. The cost record carries a derivation field showing which rules fired (“team-tag-missing → account-fallback → ENG-PLATFORM-001”). When finance asks “why is this charged to platform,” the answer is in the record, not in someone’s head.
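A minimal sketch of the cascade, assuming the hypothetical reference maps below (team → cost_center, account → cost_center, account → name, cost_center → owner) are maintained alongside the lookup table, that an account’s tier can be read from its name prefix, and that a resource with no resolvable cost_center goes straight to the unattributed bucket (a simplification of “survives all fallbacks”). Each fallback that fires appends a step to the derivation list in the same arrow format as the example above; all function and map names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical reference maps maintained by the FinOps team next to the lookup table.
TEAM_COST_CENTER = {"platform": "ENG-PLATFORM-001", "rec-eng": "ENG-RECOMMENDATIONS-002"}
ACCOUNT_COST_CENTER = {"acct-12345": "ENG-PLATFORM-001"}
ACCOUNT_NAMES = {"acct-12345": "prod-us-east", "acct-24680": "sandbox-playground"}
COST_CENTER_OWNER = {
    "ENG-PLATFORM-001": "platform-leads@example.com",
    "ENG-RECOMMENDATIONS-002": "rec-leads@example.com",
}


def top_service_per_account(resources):
    """Most frequently tagged service in each account (the service-tag fallback)."""
    counts = defaultdict(Counter)
    for r in resources:
        if "service" in r["tags"]:
            counts[r["account"]][r["tags"]["service"]] += 1
    return {acct: c.most_common(1)[0][0] for acct, c in counts.items()}


def env_from_account(account):
    """Assumes the account tier is the prefix of the account name."""
    name = ACCOUNT_NAMES.get(account, "")
    if name.startswith("prod"):
        return "prod"
    if name.startswith("sandbox"):
        return "dev"
    return None


def resolve(resource, top_services):
    """Fill the four canonical fields, recording every fallback that fired."""
    tags, account = resource["tags"], resource["account"]
    derivation = []

    team = tags.get("team")
    cost_center = TEAM_COST_CENTER.get(team)
    if cost_center is None:
        reason = "team-tag-missing" if team is None else f"team-tag-unmapped:{team}"
        cost_center = ACCOUNT_COST_CENTER.get(account)
        if cost_center is None:
            # Nothing identifies a payer: lands in the bucket finance reviews monthly.
            return {"cost_center": "UNATTRIBUTED", "service": tags.get("service"),
                    "env": tags.get("env"), "owner_email": None,
                    "derivation": [f"{reason} → account-unknown → unattributed"]}
        derivation.append(f"{reason} → account-fallback → {cost_center}")

    service = tags.get("service")
    if service is None:
        service = top_services.get(account)  # may stay None for an untagged account
        derivation.append(f"service-tag-missing → account-top-service → {service}")

    env = tags.get("env")
    if env is None:
        env = env_from_account(account)  # None if the name carries no tier
        derivation.append(f"env-tag-missing → account-tier → {env}")

    owner_email = tags.get("owner_email")
    if owner_email is None:
        owner_email = COST_CENTER_OWNER.get(cost_center)
        derivation.append(f"owner-missing → cost-center-owner → {owner_email}")

    return {"cost_center": cost_center, "service": service, "env": env,
            "owner_email": owner_email, "derivation": derivation}


resources = [
    {"account": "acct-12345", "tags": {"service": "platform-api"}},
    {"account": "acct-12345", "tags": {"service": "platform-api"}},
    {"account": "acct-12345", "tags": {"service": "ci-runner"}},
    {"account": "acct-12345", "tags": {}},
    {"account": "acct-99999", "tags": {}},
]
top = top_service_per_account(resources)
for r in resources[3:]:
    print(resolve(r, top))
```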

The unattributed bucket is small (typically under 5 percent of monthly spend after the lookup is set up). It’s the safety valve: anything the rules can’t attribute lands here, finance reviews monthly, and either a new rule gets added (if it’s a recurring case) or the bucket is allocated proportionally (if it’s truly one-off).
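For the one-off remainder, here is a minimal sketch of proportional allocation. The post does not say what the proportion is based on; this assumes each cost center absorbs a share of the unattributed bucket equal to its share of attributed spend. The function name, the UNATTRIBUTED key, and the sample totals are hypothetical.

```python
def allocate_unattributed(totals, bucket="UNATTRIBUTED"):
    """Spread the unattributed bucket across cost centers by share of attributed spend."""
    remainder = totals.get(bucket, 0.0)
    attributed = {cc: cost for cc, cost in totals.items() if cc != bucket}
    base = sum(attributed.values())
    if base <= 0:
        raise ValueError("no attributed spend to allocate against")
    return {cc: cost + remainder * cost / base for cc, cost in attributed.items()}


# Hypothetical monthly totals (USD) after the lookup and fallbacks have run.
monthly = {
    "ENG-PLATFORM-001": 60_000.0,
    "ENG-RECOMMENDATIONS-002": 30_000.0,
    "FIN-BILLING-PROD-005": 10_000.0,
    "UNATTRIBUTED": 3_000.0,
}
allocated = allocate_unattributed(monthly)
print(allocated)
```

Total spend is preserved: the allocated figures sum to the same total as the input, so the report still reconciles against the bill.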

The trick to keeping the unattributed bucket small is that the fallback rules cascade. A resource with no tags but in a known account gets attributed by the account rule. A resource in an unknown account but with a service tag gets attributed by the service rule. Each rule pulls some percentage of resources out of “unattributed” without requiring the engineer to add a tag. After three months of tuning, the cascading rules cover 95 percent of resources.

Migration: 6 weeks for a 200-engineer org

The migration is shorter than people expect because engineers don’t have to do anything. The work is concentrated in the FinOps team writing lookup tables and validating the new attribution.

| Week | Work | Deliverable |
|---|---|---|
| 1 | Inventory existing tags + their actual usage | Spreadsheet of tag → resource count → uniqueness |
| 2 | Write the initial lookup table from the inventory | YAML lookup table covering 80% of resources |
| 3 | Run the new pipeline alongside the old; collect both attributions | Side-by-side report of old vs new chargeback per cost_center |
| 4 | Tune the lookup table where the two diverged; add cascading rules | Lookup table v2 + cascading inference rules |
| 5 | Cut over reports to use the new attribution; old pipeline still runs for audit | New chargeback reports go live |
| 6 | Retire stale tags; communicate to engineering what the source of truth is now | Stale tag list deleted from the canonical taxonomy doc |
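Week 1’s inventory deliverable (tag → resource count → uniqueness) is a simple aggregation. A sketch, assuming each resource is represented by its tag dictionary and reading “uniqueness” as the number of distinct values a key takes; the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def inventory(resources):
    """Per tag key: (key, resources carrying it, distinct values seen), most-used first."""
    carried_by = Counter()
    values = defaultdict(set)
    for tags in resources:
        for key, value in tags.items():
            carried_by[key] += 1
            values[key].add(value)
    rows = [(key, n, len(values[key])) for key, n in carried_by.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Hypothetical tag dictionaries pulled from a resource export.
resources = [
    {"team": "platform", "env": "prod", "revenue-stream": "core"},
    {"team": "platform", "env": "staging"},
    {"team": "rec-eng", "env": "prod"},
    {"env": "dev", "compliance-tier": "t2"},
]
for key, count, distinct in inventory(resources):
    print(f"{key}: {count} resources, {distinct} distinct values")
```

Keys near the top of this list are the natural inputs for the week 2 lookup table; keys carried by a handful of resources are likely retirement candidates.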

The shadow-run weeks (3 and 4) are where the work converges. The two pipelines produce different numbers; the FinOps team investigates each delta over $5,000/month and adjusts the lookup table. Most deltas are explained by inference-rule edge cases (account boundaries, service rename, env-tag misuse). A few are real bugs in the old pipeline that the new attribution exposed.
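The triage step can be sketched as a diff of the two pipelines’ monthly totals per cost_center, keeping only deltas above the $5,000/month threshold. The function name and the sample numbers are hypothetical.

```python
def divergent_cost_centers(old, new, threshold=5_000.0):
    """Cost centers whose monthly totals differ by more than `threshold` between pipelines.

    Returns {cost_center: new - old}, largest absolute delta first.
    """
    deltas = {}
    for cc in sorted(set(old) | set(new)):
        delta = new.get(cc, 0.0) - old.get(cc, 0.0)
        if abs(delta) > threshold:
            deltas[cc] = delta
    return dict(sorted(deltas.items(), key=lambda kv: -abs(kv[1])))


# Hypothetical monthly totals (USD) from the old and new pipelines.
old = {"ENG-PLATFORM-001": 108_000.0, "ENG-RECOMMENDATIONS-002": 47_000.0,
       "FIN-BILLING-PROD-005": 20_000.0, "UNATTRIBUTED": 25_000.0}
new = {"ENG-PLATFORM-001": 100_000.0, "ENG-RECOMMENDATIONS-002": 62_000.0,
       "FIN-BILLING-PROD-005": 21_000.0, "UNATTRIBUTED": 17_000.0}
for cc, delta in divergent_cost_centers(old, new).items():
    print(f"{cc}: {delta:+,.0f}")
```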

Engineering involvement is minimal. One sync at week 1 (we’re collecting tag data, no action needed). One sync at week 5 (the chargeback report you see is now generated this way; here’s the doc). Engineers don’t change their tagging behavior because the lookup absorbs the variance. The friction cost on engineering is roughly two hours of meetings.

The retirement of stale tags at week 6 is optional but worth doing. The taxonomy doc gets pruned to the small set of tags that the lookup actually reads. Engineers stop seeing the 200-tag wishlist; they see the 20 tags that matter. New tag proposals go through the FinOps team because adding a tag now means adding lookup-table logic, not just adding a row to the wishlist.
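Finding the tags the lookup actually reads can itself be automated. A hypothetical sketch that scans the match conditions of the rules (represented here as plain dicts) and splits the taxonomy into keep and retire lists; a fuller version would also count tags read by the fallback rules.

```python
def prune_taxonomy(taxonomy, rules):
    """Split taxonomy keys into (keep, retire) by whether any lookup rule reads them."""
    # Only match conditions are scanned here; fallback rules would need the same treatment.
    read = {key for rule in rules for key in rule["match"]}
    keep = sorted(k for k in taxonomy if k in read)
    retire = sorted(k for k in taxonomy if k not in read)
    return keep, retire


rules = [
    {"match": {"team": "platform", "env": "prod"}},
    {"match": {"account": "acct-67890", "service": "billing"}},
]
taxonomy = ["env", "team", "service", "cost-center", "revenue-stream",
            "compliance-tier", "iac-tool"]
keep, retire = prune_taxonomy(taxonomy, rules)
print(keep)    # ['env', 'service', 'team']
print(retire)  # ['compliance-tier', 'cost-center', 'iac-tool', 'revenue-stream']
```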

Lookup-table version history for time-travel attribution

The lookup table will change. Cost centers split when teams reorganize. Services rename when products rebrand. The chargeback system needs to handle change without rewriting historical records.


The cost-record pipeline writes a lookup_version field on each record. The lookup table is versioned (a git tag or a row in a snapshot table). Reports query through the version that was current when the cost record was written.
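A minimal sketch of time-travel attribution, assuming each snapshot is a plain service → cost_center mapping keyed by a version label (a quarter here; the post suggests a git tag or a snapshot-table row). Records carry the raw service plus the lookup_version stamped at write time, and the report resolves each record through its own version. The snapshot contents mirror the ENG-PLATFORM-001 split discussed next; the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup snapshots keyed by version label.
LOOKUP_SNAPSHOTS = {
    "2025-Q4": {"platform-api": "ENG-PLATFORM-001", "platform-data": "ENG-PLATFORM-001"},
    "2026-Q1": {"platform-api": "PLATFORM-API-009", "platform-data": "PLATFORM-DATA-010"},
}


def chargeback(records):
    """Sum cost per cost_center, resolving each record through its own lookup_version."""
    totals = defaultdict(float)
    for r in records:
        cost_center = LOOKUP_SNAPSHOTS[r["lookup_version"]][r["service"]]
        totals[cost_center] += r["cost"]
    return dict(totals)


def splits(old_version, new_version):
    """Old cost centers whose services now map to more than one cost center."""
    targets = defaultdict(set)
    for service, old_cc in LOOKUP_SNAPSHOTS[old_version].items():
        new_cc = LOOKUP_SNAPSHOTS[new_version].get(service)
        if new_cc is not None:
            targets[old_cc].add(new_cc)
    return {cc: sorted(new) for cc, new in targets.items() if len(new) > 1}


q4_2025 = [
    {"service": "platform-api", "cost": 700.0, "lookup_version": "2025-Q4"},
    {"service": "platform-data", "cost": 300.0, "lookup_version": "2025-Q4"},
]
# Q2 2026 records were written while the 2026-Q1 snapshot was still current.
q2_2026 = [
    {"service": "platform-api", "cost": 800.0, "lookup_version": "2026-Q1"},
    {"service": "platform-data", "cost": 400.0, "lookup_version": "2026-Q1"},
]
print(chargeback(q4_2025))  # {'ENG-PLATFORM-001': 1000.0}
print(chargeback(q2_2026))  # {'PLATFORM-API-009': 800.0, 'PLATFORM-DATA-010': 400.0}
print(splits("2025-Q4", "2026-Q1"))
```

The splits() output is the raw material for a footnote on cross-period comparisons; the historical records themselves are never rewritten.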

The historical Q4 2025 chargeback report keeps showing ENG-PLATFORM-001 even after the cost center splits. Comparing Q2 2026 to Q4 2025 produces a footnote (“ENG-PLATFORM-001 split into PLATFORM-API-009 and PLATFORM-DATA-010 in 2026-Q1”) rather than a misattribution. Finance gets honest historical comparisons; engineering gets the current cost-center structure.

The version history also lets the FinOps team experiment safely. A proposed lookup change can be shadow-run on historical data to see how it would have changed the attribution. If the change would have produced large unexpected swings, the team investigates before going live.
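The shadow-run of a proposed change reduces to attributing the same historical records twice and comparing totals. A sketch, assuming a lookup shaped as a plain service → cost_center mapping and a hypothetical 10 percent swing threshold (the post does not give a number); a cost center with no prior spend is always flagged. Function names and sample data are illustrative.

```python
from collections import defaultdict


def totals_under(lookup, records):
    """Per-cost-center totals if `records` had been attributed through `lookup`."""
    out = defaultdict(float)
    for r in records:
        out[lookup.get(r["service"], "UNATTRIBUTED")] += r["cost"]
    return out


def attribution_swings(records, current, proposed, max_swing=0.10):
    """Cost centers whose historical total would move by more than `max_swing`."""
    before, after = totals_under(current, records), totals_under(proposed, records)
    flagged = {}
    for cc in sorted(set(before) | set(after)):
        b, a = before.get(cc, 0.0), after.get(cc, 0.0)
        # A cost center with no prior spend has an unbounded relative swing.
        swing = abs(a - b) / b if b else float("inf")
        if swing > max_swing:
            flagged[cc] = (b, a)
    return flagged


history = [
    {"service": "platform-api", "cost": 700.0},
    {"service": "rec-pipeline", "cost": 500.0},
    {"service": "ci-runner", "cost": 100.0},
    {"service": "billing-api", "cost": 400.0},
]
current = {"platform-api": "ENG-PLATFORM-001", "rec-pipeline": "ENG-RECOMMENDATIONS-002",
           "ci-runner": "ENG-PLATFORM-001", "billing-api": "FIN-BILLING-PROD-005"}
# Proposed edit: move the CI runners to the recommendations team's cost center.
proposed = {**current, "ci-runner": "ENG-RECOMMENDATIONS-002"}
print(attribution_swings(history, current, proposed))
```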

Accuracy goes from 70-80 percent to 92-96 percent

The improvement isn’t from cleaner tag data. The data is the same. The improvement is from absorbing tag variance at the attribution layer instead of demanding tag perfection at the resource layer.

| Dimension | Tag-only chargeback accuracy | 4-field schema accuracy |
|---|---|---|
| Per-team rollup | 70-80% | 92-96% |
| Per-service breakdown | 60-75% | 88-94% |
| Per-env split (prod vs non-prod) | 80-88% | 95-98% |
| Unattributed bucket | 8-15% of spend | 2-5% of spend |

Per-team accuracy improves because the lookup table gives every resource a deterministic team mapping, even when the tag is missing. Under tag-only chargeback, a missing team tag meant the resource sat in unattributed; under the lookup-table model, the account-fallback rule covers it. The team that owns the account gets charged, even without the tag.

Per-service breakdown shows the largest gain in the table because services are the most volatile dimension. New services get created, old services get renamed, services get split or merged. The tag taxonomy can’t keep up; the lookup table absorbs the changes in one YAML edit.

Per-env split is the easiest to fix because env is high-signal at the account level. Most orgs have separate AWS accounts per env, so even when the env tag is missing on a resource, the account tells the lookup what env it’s in.

The unattributed bucket dropping from 8-15 percent to 2-5 percent is what makes finance trust the numbers again. A monthly chargeback report where 12 percent of the spend is “we don’t know” is hard to act on. A report where 3 percent is unattributed (with a list of specific resources finance can investigate) is operational.

The 200-tag taxonomy isn’t the problem. Demanding that 200 tags be set correctly on every resource is the problem. Move the policy to a lookup table the FinOps team owns, accept that tags are noisy inputs, and the chargeback numbers stop being a quarterly argument.


Riya works on the autonomous remediation engine at Zop.Dev. Before that she was a security engineer at a SaaS company that learned the hard way what 14 days of exposure looks like. She writes about cloud security, automation, and the trade-off between speed and safety.
