CDCR Whitepaper · Continuous Detection & Remediation

01 · executive summary

CI/CD did this for code. CDCR does it for the cloud.

CI/CD took software deployment from "click buttons in Jenkins and hope" to a continuous, automated loop that runs on every commit. Drift between intent and reality used to be the norm. Now it is the exception.

Cloud cost management is still in the pre-CI/CD era. Teams detect waste, then click through cloud consoles to fix it. The fixes hold for a week. Then the drift comes back. The FinOps Foundation's 2025 State of FinOps survey puts the waste rate at 27 to 32 percent. Flexera's 2024 State of the Cloud report agrees. The figure has not moved in three years.

This brief introduces CDCR: Continuous Detection, Continuous Remediation. It is the category of platform that runs the same kind of automated loop CI/CD pipelines run, but for cloud cost state. CDCR detects cost drift continuously, classifies findings by dollar impact and severity, remediates the safe classes automatically, and verifies every action with a full audit trail.

CDCR is the action layer inside FinOps. It does not replace the Inform layer (cost dashboards) or the Optimize layer (recommendations). It executes them.

The straightforward read

Cloud teams that still run quarterly cost reviews against a cloud that drifts daily are operating on the wrong cadence. CDCR is the cadence correction.

02 · the state of cloud cost drift

$830B in spend. 27–32% of it is drift.

Worldwide public cloud spend will cross $830 billion in 2026 (Gartner, IDC). Wasted spend is 27 to 32 percent of that (Flexera, FinOps Foundation). The waste rate has been stable for three years despite measurable growth in FinOps tools, certified practitioners, and dedicated teams.

Cost drift is the more useful frame than cost waste. Cost waste is a snapshot. Drift is the continuous motion that produces the waste. Resources get oversized. Tags fall off. Schedules expire. Idle resources accumulate. Anomalies happen. Each one is small. The cumulative effect is the 27 to 32 percent.

A representative cost drift inventory from a mid-sized cloud estate:

Drift class	Typical monthly volume	Dollar impact
Newly untagged resources	800–1,200	Low individually, blocks chargeback
Expired schedule overrides	40–80	Medium
Idle non-production resources	200–400	High
Cost anomalies (WoW > 25%)	15–30	High
Oversized instances	60–150	High
Orphaned resources (vols, snaps, EIPs)	100–300	Medium
Storage class drift (gp2, old gp3, unused IOPS)	50–200	Medium

Every class in this table is detectable today, and the drift persists for the same reason in each: detection without continuous remediation produces reports, not lower bills.

03 · why FinOps stalls at Operate

Inform is solved. Optimize is solved. Operate isn’t.

The FinOps Foundation framework has three phases: Inform, Optimize, Operate.

Inform

Visibility, allocation, chargeback. The major platforms (CloudHealth, Apptio Cloudability, CloudZero, Vantage, Anodot) ship reliable dashboards. Phase 1 is solved.

Optimize

Recommendations. The same platforms produce accurate right-sizing, commitment, and idle-resource recommendations. Native cloud advisors (AWS Trusted Advisor, GCP Active Assist, Azure Advisor) supplement them. Phase 2 is largely solved.

Operate

Execute change. The 2025 FinOps Foundation State of FinOps survey ranks "getting engineers to take action on recommendations" as the top reported challenge for the fourth consecutive year. The reasons are structural, not cultural.

A mid-sized cloud estate produces 800 to 2,000 recommendations per month. Manual action at that scale requires either a dedicated remediation team or a heavy ticketing process. Most organizations action 5 to 15 percent of recommendations, roughly 40 to 300 a month at those volumes. The rest age out.

Cost dashboards generate recommendations but do not execute them. Executing a change carries a blast radius the dashboard does not own. The Operate phase has historically depended on engineering teams to do the executing. That dependency is the bottleneck.

The fix

CDCR removes the dependency by running the loop continuously, with policy-bound automation for the safe classes and guided execution for the rest.

04 · cdcr defined

A continuous loop for the Operate phase.

Continuous Detection, Continuous Remediation (CDCR) is the category of platform that runs an automated detect-classify-remediate-verify loop across cloud cost state. The loop runs continuously, not on a scan schedule. Every action it takes is logged. Every action it takes can be rolled back.

The clearest analogy is CI/CD. Before continuous integration, teams ran tests manually and deployed weekly. Most software production was drift management: code in production diverged from main, fixes regressed, releases broke. CI/CD did not change the discipline of testing or deploying. It changed the execution model — from manual-on-schedule to automated-on-event.

CDCR does the same thing for the cloud cost Operate phase. It does not change the FinOps framework. It changes the execution model. The loop runs on events, not on schedule.

The working definition

A CDCR platform must run all four functions of the loop:

(1) Detect cost drift continuously across the cloud estate
(2) Classify findings by severity and projected dollar impact
(3) Remediate safe classes automatically and guide humans through the rest
(4) Verify every action in an audit trail that meets compliance evidence requirements

Platforms that do (1) only are cost dashboards. Platforms that do (1), (2), and partial (3) without (4) are recommendation engines. CDCR requires all four.
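The four functions can be read as a single pass of a loop. The sketch below is a minimal illustration, not any vendor's implementation: the `Finding` and `AuditEntry` types, the `run_loop_once` function, and the stand-in stage lambdas are all hypothetical. A CDCR platform would trigger the pass on drift events rather than calling it by hand.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    resource_id: str
    drift_class: str
    monthly_usd: float            # projected monthly dollar impact
    severity: str = "unclassified"


@dataclass
class AuditEntry:
    resource_id: str
    severity: str
    outcome: str                  # e.g. "auto", "guided", "refused"


def run_loop_once(
    detect: Callable[[], list[Finding]],
    classify: Callable[[Finding], Finding],
    remediate: Callable[[Finding], str],
    audit_log: list[AuditEntry],
) -> list[AuditEntry]:
    """One pass of detect -> classify -> remediate -> verify (hypothetical)."""
    new_entries = []
    for finding in detect():                      # (1) Detect
        finding = classify(finding)               # (2) Classify
        outcome = remediate(finding)              # (3) Remediate, or route to a human
        entry = AuditEntry(finding.resource_id, finding.severity, outcome)
        audit_log.append(entry)                   # (4) Verify: every outcome is recorded
        new_entries.append(entry)
    return new_entries


# Stand-in stages, for illustration only.
log: list[AuditEntry] = []
entries = run_loop_once(
    detect=lambda: [Finding("vol-0a1", "orphaned_volume", 40.0)],
    classify=lambda f: Finding(f.resource_id, f.drift_class, f.monthly_usd, "medium"),
    remediate=lambda f: "auto" if f.drift_class == "orphaned_volume" else "guided",
    audit_log=log,
)
```

Dropping any one stage turns the sketch back into one of the weaker categories above: without `remediate` it is a recommendation engine, and without the audit append it cannot verify anything.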

05 · the four functions

Detect · Classify · Remediate · Verify.

┌────────────────────────────┐    ┌────────────────────────────┐
│  ◼  DETECT                 │    │  ◼  CLASSIFY               │
│  Cost drift                │────│  Severity + dollar         │
│  K8s · schedules · tags    │    │  450+ audit rules          │
└────────────────────────────┘    └────────────────────────────┘
              │                                │
              │            ┌──────────────┐    │
              └────────────┤     LOOP     ├────┘
                           │    ACTIVE    │
              ┌────────────┤              ├────┐
              │            └──────────────┘    │
              │                                │
┌────────────────────────────┐    ┌────────────────────────────┐
│  ◼  REMEDIATE              │    │  ◼  VERIFY                 │
│  Certified or guided       │────│  Audit log                 │
│  scoped · logged           │    │  actor · timestamp · delta │
└────────────────────────────┘    └────────────────────────────┘

Detect, continuously

Cost drift detection runs at minute-level granularity. The signals collected go wider than the cloud bill itself.

Anomaly detection runs across five dimensions — org, cloud account, resource group, resource, and team. A spike that hides at the org level often shows up at the resource group level. The five-dimension scan catches the spikes that single-dimension anomaly tools miss.
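To show why the per-dimension scan matters, the sketch below groups cost rows by each of the five dimensions. It flags any group whose week-over-week growth exceeds 25 percent, the threshold used in the drift inventory table. The row layout, the key names, and the `wow_anomalies` helper are hypothetical. Real anomaly detectors often use richer baselines than a fixed threshold.

```python
from collections import defaultdict

# The five dimensions named above, as row keys (hypothetical names).
DIMENSIONS = ("org", "cloud_account", "resource_group", "resource", "team")


def wow_anomalies(rows, threshold=0.25):
    """Return (dimension, value, growth) for every group whose cost grew
    more than `threshold` week over week.

    Each row is a dict with the five dimension keys plus 'last_week' and
    'this_week' costs in USD.
    """
    flagged = []
    for dim in DIMENSIONS:
        totals = defaultdict(lambda: [0.0, 0.0])   # value -> [last_week, this_week]
        for row in rows:
            totals[row[dim]][0] += row["last_week"]
            totals[row[dim]][1] += row["this_week"]
        for value, (prev, curr) in totals.items():
            if prev > 0 and (curr - prev) / prev > threshold:
                flagged.append((dim, value, round((curr - prev) / prev, 3)))
    return flagged


rows = [
    {"org": "acme", "cloud_account": "prod-1", "resource_group": "rg-web",
     "resource": "web-asg", "team": "web", "last_week": 9600.0, "this_week": 9600.0},
    {"org": "acme", "cloud_account": "prod-1", "resource_group": "rg-batch",
     "resource": "batch-asg", "team": "data", "last_week": 400.0, "this_week": 1000.0},
]
flagged = wow_anomalies(rows)
# The org total grows only 6% ($10,000 -> $10,600) and is not flagged;
# the batch resource group, resource, and team each grow 150% and are.
```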

Other detection signals: drift on Kubernetes clusters, expired schedule overrides, newly untagged resources, idle resource accumulation, storage class regression, and compliance gaps that quietly reopened after a prior fix.

Sample rate matters. A platform that polls every 15 minutes can miss or delay short-lived anomalies that a platform polling every 60 seconds catches.

Classify, by impact

Classification sets the work order. Drift on a production resource ranks above an idle dev box. A $2,000/month anomaly ranks above a $30/month one. The queue reflects the actual stakes, not just the rule count. Without classification, a CDCR platform reduces to a noisy alert system.
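Here is a minimal sketch of that ordering. It assumes a hypothetical per-environment severity weight and scores each finding as projected monthly dollars times that weight. The weights, the `Finding` type, and the `work_queue` helper are illustrative and do not represent a published rule base.

```python
from dataclasses import dataclass

# Hypothetical weights: production drift outranks non-production drift.
ENV_WEIGHT = {"prod": 3.0, "staging": 1.5, "dev": 1.0}


@dataclass
class Finding:
    resource_id: str
    env: str
    monthly_usd: float   # projected monthly dollar impact


def priority(f: Finding) -> float:
    """Higher score = earlier in the work queue."""
    return f.monthly_usd * ENV_WEIGHT.get(f.env, 1.0)


def work_queue(findings: list[Finding]) -> list[Finding]:
    return sorted(findings, key=priority, reverse=True)


queue = work_queue([
    Finding("idle-dev-box", "dev", 30.0),
    Finding("prod-anomaly", "prod", 2000.0),
    Finding("dev-cluster", "dev", 1840.0),
])
# Queue order: prod-anomaly (6000.0), dev-cluster (1840.0), idle-dev-box (30.0)
```

With a multiplicative weight, a large non-production finding can still outrank a small production one. If production must always come first, sort on an (environment tier, dollars) tuple instead.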

Remediate, continuously

Remediation is two-tier.

Auto-remediate

For the safe classes: tag application from accepted Tagger predictions, schedule enforcement, idle resources stopped, scale-to-zero on certified workloads, pause on certified service-tier targets. These run without human approval because the actions are reversible, the blast radius is bounded, and the policy is explicit.

Guided remediation

Guided remediation covers everything else. Each guided action carries a confidence score (how sure the platform is that this is the right fix) and a complexity score (how risky the action is to execute). The human approves; the platform executes.

Production writes are admin-gated, scoped to the resources covered by the policy, and fully logged. Customer databases are explicitly excluded from mutation. This is a design choice, not a limitation. The platform refuses to touch state that should never be auto-touched.
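The two-tier rule and the exclusion can be sketched as a small routing function. The action kinds, the `data_class` field, and the `max_blast_radius` limit are hypothetical stand-ins for whatever a real policy defines. Treating every production write as guided is a conservative assumption of this sketch. What matters is the order of checks: hard exclusions first, then the auto-remediation allowlist and blast-radius bound, then guided remediation for everything else.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    REFUSE = "refuse"    # never mutate; surface as an advisory only
    AUTO = "auto"        # execute under policy without approval
    GUIDED = "guided"    # human approves, platform executes


# Hypothetical policy values, for illustration only.
AUTO_CLASSES = {"apply_tag", "enforce_schedule", "stop_idle", "scale_to_zero"}
EXCLUDED_DATA_CLASSES = {"customer_database"}


@dataclass
class Action:
    kind: str                 # e.g. "enforce_schedule", "resize_db"
    data_class: str           # what the target holds, e.g. "customer_database"
    env: str                  # "prod", "dev", ...
    resources_affected: int
    reversible: bool


def route(a: Action, max_blast_radius: int = 25) -> Route:
    # 1. Hard exclusions win over any projected savings.
    if a.data_class in EXCLUDED_DATA_CLASSES:
        return Route.REFUSE
    # 2. Auto-remediate only allowlisted, reversible, non-production actions
    #    whose scope stays under the blast-radius bound.
    if (a.kind in AUTO_CLASSES and a.reversible and a.env != "prod"
            and a.resources_affected <= max_blast_radius):
        return Route.AUTO
    # 3. Everything else, including production writes, goes to a human.
    return Route.GUIDED


# The three actions from section 6, expressed as inputs.
idle_dev_cluster = Action("enforce_schedule", "compute", "dev", 1, True)
prod_rds_resize = Action("resize_db", "internal_database", "prod", 1, True)
customer_db_resize = Action("resize_db", "customer_database", "prod", 1, True)
```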

The architectural choice

Two-tier remediation is what separates a serious CDCR platform from a recommendation engine.

Verify, every action

Every action the platform takes lands in the audit trail. The record includes actor (platform or human), timestamp, the policy that triggered the action, the resources affected, and the dollar delta where applicable.

Quarterly reviews read measurable outcomes, not forecasts. "We saved $312K this quarter on these 1,400 actions" is a different conversation from "we estimate we could save $400K if we acted on these recommendations."

Verification is also the layer that supplies the change evidence SOC 2 Type II and ISO 27001 audits require. Without it, CDCR is unusable in regulated industries.
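This sketch shows one way to shape such a record, and how a review figure can be summed from realized deltas rather than forecasts. The field names follow the record fields listed above. The `AuditRecord` type, the JSON-lines format, the example IDs, and the sign convention (a negative delta is a saving) are assumptions. Because each delta is a monthly change, the sum is the monthly run-rate saving from actions taken in the window, not total dollars saved over it.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    actor: str                       # "platform" or a human identity
    timestamp: str                   # ISO 8601 with UTC offset
    policy_id: str                   # policy that triggered the action
    resources: tuple[str, ...]       # resources affected
    dollar_delta: Optional[float]    # realized monthly change in USD; negative = savings


def to_json_line(rec: AuditRecord) -> str:
    """Serialize one record as a line for an append-only JSON-lines log."""
    return json.dumps(asdict(rec), sort_keys=True)


def run_rate_savings(records, start: datetime, end: datetime) -> float:
    """Monthly run-rate savings from actions timestamped in [start, end)."""
    total = 0.0
    for r in records:
        ts = datetime.fromisoformat(r.timestamp)
        if start <= ts < end and r.dollar_delta is not None:
            total -= r.dollar_delta          # a negative delta is a saving
    return total


records = [
    AuditRecord("platform", "2026-04-02T10:00:00+00:00", "sched-dev-01", ("eks-dev-a",), -1120.0),
    AuditRecord("dba@example.com", "2026-05-10T02:00:00+00:00", "rightsize-rds-07", ("db-orders",), -1820.0),
    AuditRecord("platform", "2026-05-11T09:00:00+00:00", "exclude-customer-db", ("db-cust-1",), None),
]
q2 = run_rate_savings(records,
                      datetime(2026, 4, 1, tzinfo=timezone.utc),
                      datetime(2026, 7, 1, tzinfo=timezone.utc))
# q2 == 2940.0: two executed actions; the refused advisory carries no delta.
```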

06 · anatomy of three CDCR actions

Watch one finding move through the loop.

The clearest way to understand the loop is to watch one finding move through it. Three composite examples below.

Action 1 · Idle development cluster (auto-remediated)

Stage	What happens
Detect	A development EKS cluster shows below 5% average CPU utilization for 14 consecutive days. Detected on day 14.
Classify	Projected monthly impact: $1,840. Severity: low (non-production resource). Tagged `env:dev`. No `keep-on` override.
Remediate	Policy match: scheduled shutdown for dev clusters with sustained low utilization. Schedule applied: weekdays 8 PM to 8 AM local. No approval required.
Verify	Action logged with policy ID, timestamp, resources affected. After 30 days: $1,120 actual savings vs $1,840 projected (variance from teams toggling the override).
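The Action 1 schedule can be expressed as a window check. This sketch makes several assumptions. It reads "weekdays 8 PM to 8 AM" as a shutdown that opens at 20:00 Monday through Friday and closes at 08:00 the next morning. It treats a hypothetical `keep-on: true` tag as the override. It works on naive local times only, so a real scheduler would also need to handle time zones and holidays.

```python
from datetime import datetime, timedelta


def in_shutdown_window(local: datetime) -> bool:
    """True if `local` falls in a window opening 20:00 on a weekday (Mon-Fri)
    and closing 08:00 the following morning."""
    if local.hour >= 20:
        return local.weekday() < 5                  # evening of Mon..Fri
    if local.hour < 8:
        previous_day = local - timedelta(days=1)
        return previous_day.weekday() < 5           # morning after a Mon..Fri evening
    return False


def should_be_stopped(local: datetime, tags: dict[str, str]) -> bool:
    if tags.get("keep-on") == "true":               # hypothetical override tag
        return False
    return tags.get("env") == "dev" and in_shutdown_window(local)


# Hours per week inside the window: 5 nights x 12 hours.
week_start = datetime(2026, 5, 4)                   # a Monday
off_hours = sum(in_shutdown_window(week_start + timedelta(hours=h)) for h in range(168))
# off_hours == 60, about 36% of the week
```

Every hour a team holds the override open is an hour the schedule does not save, which is one way projected and actual savings diverge.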

Action 2 · Oversized production RDS (guided remediation)

Stage	What happens
Detect	An RDS `db.r6g.4xlarge` instance shows sustained average CPU below 18% over 30 days. Memory utilization 32%.
Classify	Projected monthly impact: $1,820. Severity: medium (production, customer-facing service). Confidence: 94%. Complexity: medium (requires maintenance window).
Remediate	Guided action surfaced to the on-call DBA. Recommendation: downsize to `db.r6g.2xlarge`. Maintenance window suggested. DBA approves; platform executes during the next pre-approved window.
Verify	Action logged. Performance metrics monitored for 14 days post-change. P95 query latency held within baseline. $1,820/month savings confirmed.
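The post-change check in Action 2 compares a latency percentile before and after the resize. The minimal sketch below uses Python's `statistics.quantiles`. The 10 percent tolerance and the sample latencies are hypothetical. A real check would read the database's own monitoring data for the 14-day window.

```python
import statistics


def p95(samples_ms: list[float]) -> float:
    """95th percentile: quantiles(n=100) returns 99 cut points; index 94 is P95."""
    return statistics.quantiles(samples_ms, n=100)[94]


def within_baseline(before_ms: list[float], after_ms: list[float],
                    tolerance: float = 0.10) -> bool:
    """True if post-change P95 is at most `tolerance` above the pre-change P95."""
    return p95(after_ms) <= p95(before_ms) * (1 + tolerance)


baseline = [float(ms) for ms in range(1, 101)]          # hypothetical latencies, ms
after_resize = [ms * 1.05 for ms in baseline]           # 5% slower: passes
after_regression = [ms * 1.50 for ms in baseline]       # 50% slower: fails
```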

Action 3 · Customer database, oversized (rejected)

Stage	What happens
Detect	A customer-managed RDS instance shows utilization patterns consistent with right-size opportunity.
Classify	Projected monthly impact: $4,200. Severity: high (customer database).
Remediate	Platform refuses. Customer databases are excluded from mutation by design. The finding is surfaced as an advisory to the customer’s DBA team via dashboard and weekly digest. No write is attempted.
Verify	Advisory logged. No execution. Audit log shows the finding was raised and excluded per policy.

The third example is the most important

A CDCR platform that does not refuse to act on certain resource classes is not safe to run in production. The refusal is part of the design.

07 · vendor landscape

The seven questions a buyer should ask.

CDCR is an early category. Most vendors making the claim do not yet pass the four-function test defined in section 4. A buyer evaluating the space should ask:

(1) Does the platform poll continuously, or scan on schedule?
(2) How are findings classified by impact, and on what rule base?
(3) Are remediation actions executed by the platform, or generated as recommendations for a human to apply?
(4) What is the maximum blast radius the platform accepts under policy without human approval?
(5) What resource classes does the platform refuse to touch, by design?
(6) What does the audit log show? Does it meet SOC 2 Type II evidence requirements?
(7) Does coverage extend across AWS, GCP, and Azure, or only one cloud?

A read of the current market (May 2026, expect movement)

Vendor	Type	CDCR loop coverage
CloudHealth (VMware)	Inform layer	Detect + partial Classify only
Apptio Cloudability	Inform layer	Detect + partial Classify only
CloudZero	Inform (unit economics)	Detect + Classify, no Remediate
Vantage	Inform (mid-market)	Detect only
Spot.io (NetApp)	Workload automation	Full loop, EC2 Spot and EKS only
Cast.ai	Workload automation	Full loop, Kubernetes only
Zesty	Workload automation	Full loop, EC2 commitments and EBS only
Densify	Right-sizing	Detect + Classify, limited Remediate
ZopNight	CDCR	Full loop across the cloud cost estate

08 · the maturity model

Crawl, Walk, Run, Fly.

The FinOps Foundation’s Crawl-Walk-Run model maps onto CDCR adoption. This brief adds a fourth stage, Fly, for estates that self-optimize under policy.

Crawl · 18% of organizations

Cost reviews are quarterly. Allocation is partial. No automation.

Walk · 54% of organizations

A cost dashboard is in place. Allocation and chargeback work for most spend. The team acts on top recommendations each month. No continuous remediation. This is the industry’s largest stuck point.

Run · 22% of organizations

Continuous remediation is in place for predictable drift classes (non-production scheduling, snapshot lifecycle, storage class migration). Right-sizing and orphan cleanup are automated under policy.

Fly · 6% of organizations

The cost estate self-optimizes against explicit policy. Humans approve only high-blast-radius decisions.

The single largest unlock in FinOps maturity

Moving from Walk to Run. That move is gated on Operate-layer tooling. CDCR is the tooling.

09 · customer patterns & 90-day adoption

Three estates. Three outcomes. One pattern.

Pattern A · Mid-stage SaaS, $180K/month AWS

A Series B SaaS company with eight engineers and a Notion-based FinOps practice. Existing recommendations were reviewed monthly and rarely executed. CDCR adoption began with non-production scheduling, unattached EBS cleanup, gp2 to gp3 migration, and RDS right-sizing.

After 90 days the bill was $124K/month, a 31% reduction. Engineering hours on cost work dropped from roughly 6/week to under 1.

Pattern B · Late-stage consumer, $2.4M/month across AWS and GCP

A Series D consumer company with a four-person FinOps team and custom Looker dashboards. Despite mature Inform-layer tooling, the team estimated $600K/month of recoverable waste. CDCR adoption added Event Readiness automation for product launches, autoscaling policies, snapshot lifecycle rules, and Auto Tagging for cross-team chargeback.

After six months the bill was $1.78M/month, a 26% reduction.

Pattern C · Regulated financial services, $5.1M/month

A US financial services firm with strict change management. Every cost optimization required a CAB ticket. The team executed roughly 3 cost actions per quarter. CDCR adoption inverted the CAB process: the CAB pre-approved the policy, and the platform executed within it with full audit logging.

After 12 months the bill was $3.9M/month, a 23% reduction. Action velocity moved from 3/quarter to ~400/month.

The 90-day adoption path

Window	What happens
Weeks 1–2	Connect to cloud accounts in read-only mode. Auto-discovery completes inventory. Reconcile against the existing cost dashboard. Output: a baseline of recoverable drift by category and account.
Weeks 3–4	Turn on non-production scheduling. Highest-savings, lowest-risk first move. First-month bill reduction is typically 12–18%.
Weeks 5–8	Move to right-sizing and orphan cleanup. Approve in batch. Implement Auto Tagging to fix the ownership gap. Cumulative reduction is typically 22–28%.
Weeks 9–12	Convert from approval-required to policy-driven for the action classes that have shown reliable safety. Define guardrails for autonomous classes versus approval-required ones. The loop now runs continuously.
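The weeks 9–12 conversion needs an explicit definition of "reliable safety." Here is one possible rule. An action class graduates from approval-required to policy-driven once it has a minimum number of approved executions and a rollback rate at or below a ceiling. Both thresholds and the `ClassTrackRecord` type are hypothetical. The real guardrails are whatever the change approval board signs off on.

```python
from dataclasses import dataclass


@dataclass
class ClassTrackRecord:
    action_class: str
    executions: int      # human-approved executions so far
    rollbacks: int       # executions later reverted


def ready_for_policy(rec: ClassTrackRecord,
                     min_executions: int = 50,
                     max_rollback_rate: float = 0.0) -> bool:
    """Promote a class to policy-driven only after enough clean history."""
    if rec.executions == 0 or rec.executions < min_executions:
        return False
    return rec.rollbacks / rec.executions <= max_rollback_rate


history = [
    ClassTrackRecord("nonprod_schedule", executions=120, rollbacks=0),
    ClassTrackRecord("orphan_volume_cleanup", executions=80, rollbacks=1),
    ClassTrackRecord("rds_rightsize", executions=12, rollbacks=0),
]
promoted = [r.action_class for r in history if ready_for_policy(r)]
# promoted == ["nonprod_schedule"]
```

A zero-rollback ceiling is deliberately strict. Loosening it is a policy decision the approval board would make, not something the platform decides on its own.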

Cross-customer pattern: 23 to 31% bill reduction within 6 to 12 months. Variance correlates with how aggressively the team trusts policy-driven automation.

10 · where the category goes next

Four predictions for the next 24 months.

CDCR is the working name for what the next decade of cloud cost operations will run on.

1 · The Inform-layer vendors will consolidate

Cost dashboards will buy or build CDCR capability, or they will lose ground to platforms that already run the full loop. At least two acquisitions in the category are probable by mid-2027.

2 · The category will deepen in cost before it widens

Expect richer policy languages, better confidence and complexity scoring on guided actions, and more resource classes safely covered by auto-remediation. The bar for "what the platform will run without human approval" will rise as track records accumulate.

3 · The FinOps Foundation framework will evolve

The Operate phase, historically the least defined of the three, will get explicit tooling capability requirements. The next FinOps Framework revision (expected late 2026) is likely to include automation maturity as a first-class evaluation dimension.

4 · The CI/CD analogy holds up here too

In 2010, "we deploy weekly via SSH scripts" was acceptable. By 2018, it was a competitive liability. The same arc applies to cloud cost. Teams that still run quarterly reviews against a cloud that drifts daily are operating on the wrong cadence.

CDCR is the cadence correction.

That is the entire premise. The rest of this brief is implementation detail.

about · zopnight

About ZopNight.

ZopNight is a cloud cost optimization platform that runs the CDCR loop across AWS, GCP, and Azure. It covers schedules, idle resources, right-sizing, cost anomaly detection across five dimensions (org, cloud account, resource group, resource, team), auto-tagging, snapshot lifecycle, and Kubernetes cost drift.

450+ audit rules. Two-tier remediation (auto-remediate the safe classes, guided remediation for the rest). Full audit trail with actor, timestamp, and dollar delta. Customer databases are excluded from mutation by design.

For evaluation: hello@zopnight.com

For a working demo: zopnight.com/demo

Chapter 01 · Executive summary

CI/CD did this for code. CDCR does it for the cloud.

CI/CD took software deployment from "click buttons in Jenkins and hope" to a continuous, automated loop that runs on every commit. Cloud cost management is still in the pre-CI/CD era.

Teams detect waste, then click through cloud consoles to fix it. The fixes hold for a week. Then the drift comes back. The FinOps Foundation’s 2025 State of FinOps survey puts the waste rate at 27 to 32 percent. Flexera’s 2024 State of the Cloud report agrees. The figure has not moved in three years.

This brief introduces CDCR: Continuous Detection, Continuous Remediation. It is the category of platform that runs the same kind of automated loop CI/CD pipelines run, but for cloud cost state. CDCR detects cost drift continuously, classifies findings by dollar impact and severity, remediates the safe classes automatically, and verifies every action with a full audit trail.

CDCR is the action layer inside FinOps. It does not replace the Inform layer (cost dashboards) or the Optimize layer (recommendations). It executes them.

Chapter 02 · The state of cloud cost drift

$830B in spend. 27–32% of it is drift.

Worldwide public cloud spend will cross $830 billion in 2026 (Gartner, IDC). Wasted spend is 27 to 32 percent of that. The waste rate has been stable for three years.

Despite measurable growth in FinOps tools, certified practitioners, and dedicated teams, the number has not budged. The frame that’s more useful here is cost drift, not cost waste. Waste is a snapshot. Drift is the continuous motion that produces the waste.

Resources get oversized. Tags fall off. Schedules expire. Idle resources accumulate. Anomalies happen. Each one is small. The cumulative effect is the 27 to 32 percent.

27–32% · median wasted spend, stable across three years

800–2,000 · recommendations a mid-sized estate produces monthly

5–15% · of those recommendations the team actually actions

Every class in the drift inventory is detectable today, and the drift persists for the same reason in each: detection without continuous remediation produces reports, not lower bills.

Chapter 03 · Why FinOps stalls at Operate

Inform is solved. Optimize is solved. Operate isn’t.

The FinOps Foundation framework has three phases — Inform, Optimize, Operate. The first two are largely solved. The third is the bottleneck.

Inform handles visibility, allocation, and chargeback. The major platforms (CloudHealth, Apptio Cloudability, CloudZero, Vantage, Anodot) ship reliable dashboards. Optimize generates accurate right-sizing, commitment, and idle-resource recommendations. Native cloud advisors supplement them.

Operate is where it stops

The 2025 State of FinOps survey ranks "getting engineers to take action on recommendations" as the top reported challenge for the fourth consecutive year. The reasons are structural, not cultural.

A mid-sized cloud estate produces 800 to 2,000 recommendations per month. Manual action at that scale requires either a dedicated remediation team or a heavy ticketing process. Most organisations action 5 to 15 percent of recommendations. The rest age out.

Chapter 04 · CDCR defined

A continuous loop for the Operate phase.

Continuous Detection, Continuous Remediation (CDCR) is the category of platform that runs an automated detect-classify-remediate-verify loop across cloud cost state.

The loop runs continuously, not on a scan schedule. Every action it takes is logged. Every action it takes can be rolled back.

The clearest analogy is CI/CD. Before continuous integration, teams ran tests manually and deployed weekly. Most software production was drift management: code in production diverged from main, fixes regressed, releases broke. CI/CD did not change the discipline of testing or deploying — it changed the execution model, from manual-on-schedule to automated-on-event.

CDCR does the same thing for the cloud cost Operate phase. It does not change the FinOps framework. It changes the execution model. The loop runs on events, not on schedule.

The working definition

A CDCR platform must run all four functions of the loop:

(1) Detect cost drift continuously across the cloud estate. (2) Classify findings by severity and projected dollar impact. (3) Remediate safe classes automatically and guide humans through the rest. (4) Verify every action in an audit trail that meets compliance evidence requirements.

Platforms that do (1) only are cost dashboards. Platforms that do (1), (2), and partial (3) without (4) are recommendation engines. CDCR requires all four.

Chapter 05 · The four functions

Detect · Classify · Remediate · Verify.

Each function has a specific shape and a specific failure mode. Together they form the loop.

Detect, continuously

Cost drift detection runs at minute-level granularity. The signals collected go wider than the cloud bill itself. Anomaly detection runs across five dimensions — org, cloud account, resource group, resource, and team. A spike that hides at the org level often shows up at the resource group level.

Other signals: drift on Kubernetes clusters, expired schedule overrides, newly untagged resources, idle accumulation, storage class regression, and compliance gaps that quietly reopened after a prior fix. Sample rate matters. A platform that polls every 15 minutes can miss or delay short-lived anomalies that one polling every 60 seconds catches.

Classify, by impact

A flat list of 2,000 findings is unworkable. A classified queue is. CDCR platforms apply rule libraries that score every finding by severity and projected dollar impact. ZopNight runs 450+ audit rules across AWS, GCP, and Azure. Classification sets the work order. Drift on a production resource ranks above an idle dev box. A $2,000/month anomaly ranks above a $30/month one.

Remediate, continuously

Remediation is two-tier. Auto-remediate for the safe classes: tag application, schedule enforcement, idle resources stopped, scale-to-zero on certified workloads. Guided remediation for the rest — each action carries a confidence score and a complexity score; the human approves; the platform executes.

Verify, every action

Every action lands in the audit trail. The record includes actor (platform or human), timestamp, the policy that triggered the action, the resources affected, and the dollar delta. Quarterly reviews read measurable outcomes, not forecasts. Verification is also the layer that supplies the change evidence SOC 2 Type II and ISO 27001 audits require. Without it, CDCR is unusable in regulated industries.

Chapter 06 · Anatomy of three actions

Watch one finding move through the loop.

The clearest way to understand the loop is to watch one finding move through it. Three composite examples.

Action 1 · Idle development cluster (auto-remediated)

A development EKS cluster shows below 5% average CPU utilisation for 14 consecutive days. Detected on day 14. Projected monthly impact: $1,840. Severity: low (non-production resource). Tagged env:dev. No keep-on override.

Policy match: scheduled shutdown for dev clusters with sustained low utilisation. Schedule applied: weekdays 8 PM to 8 AM local. No approval required. After 30 days: $1,120 actual savings versus $1,840 projected (variance from teams toggling the override).

Action 2 · Oversized production RDS (guided remediation)

An RDS db.r6g.4xlarge shows sustained average CPU below 18% over 30 days. Memory utilisation 32%. Projected impact: $1,820/month. Severity: medium. Confidence: 94%. Complexity: medium (requires maintenance window).

Guided action surfaced to the on-call DBA. Recommendation: downsize to db.r6g.2xlarge. Maintenance window suggested. DBA approves; platform executes during the next pre-approved window. P95 query latency held within baseline. $1,820/month savings confirmed.

Action 3 · Customer database, oversized (rejected)

A customer-managed RDS instance shows utilisation patterns consistent with a right-size opportunity. Projected impact: $4,200/month. Severity: high. Platform refuses. Customer databases are excluded from mutation by design. The finding is surfaced as an advisory via dashboard and weekly digest. No write is attempted.

Chapter 07 · Vendor landscape

The seven questions a buyer should ask.

CDCR is an early category. Most vendors making the claim do not yet pass the four-function test.

A buyer evaluating the space should ask: (1) Does the platform poll continuously, or scan on schedule? (2) How are findings classified by impact, and on what rule base? (3) Are remediation actions executed by the platform, or generated as recommendations for a human to apply? (4) What is the maximum blast radius the platform accepts under policy without human approval? (5) What resource classes does the platform refuse to touch, by design?

(6) What does the audit log show? Does it meet SOC 2 Type II evidence requirements? (7) Does coverage extend across AWS, GCP, and Azure, or only one cloud?

A read of the current market (May 2026, expect movement): CloudHealth and Apptio Cloudability cover Detect + partial Classify only. CloudZero adds Classify, no Remediate. Vantage is Detect only. Spot.io, Cast.ai, and Zesty run the full loop but only on narrow surfaces (EC2 Spot and EKS; Kubernetes; and EC2 commitments and EBS, respectively). Densify stops at limited Remediate.

ZopNight is the only platform on the list running the full loop across the entire cloud cost estate: not just one cloud, not just one resource class. The full vendor landscape table is on the standalone whitepaper page.

Chapter 08 · The maturity model

Crawl, Walk, Run, Fly.

The FinOps Foundation Crawl-Walk-Run model, extended here with a fourth stage (Fly), maps onto CDCR adoption. The single largest unlock in maturity is the Walk-to-Run move.

Crawl · 18% of organisations. Cost reviews are quarterly. Allocation is partial. No automation.

Walk · 54% of organisations. A cost dashboard is in place. Allocation and chargeback work for most spend. The team acts on top recommendations each month. No continuous remediation. This is the industry’s largest stuck point.

Run · 22% of organisations. Continuous remediation is in place for predictable drift classes (non-production scheduling, snapshot lifecycle, storage class migration). Right-sizing and orphan cleanup are automated under policy.

Fly · 6% of organisations. The cost estate self-optimises against explicit policy. Humans approve only high-blast-radius decisions.

Chapter 09 · Customer patterns & 90-day adoption

Three estates. Three outcomes. One pattern.

Three composite customers across the SaaS, consumer, and regulated finance segments. The pattern is consistent: 23–31% bill reduction in 6–12 months.

Pattern A · Mid-stage SaaS, $180K/month AWS

A Series B SaaS company with eight engineers and a Notion-based FinOps practice. CDCR adoption began with non-production scheduling, unattached EBS cleanup, gp2 to gp3 migration, and RDS right-sizing. After 90 days the bill was $124K/month, a 31% reduction. Engineering hours on cost work dropped from roughly 6/week to under 1.

Pattern B · Late-stage consumer, $2.4M/month across AWS + GCP

A Series D consumer company with a four-person FinOps team and custom Looker dashboards. Despite mature Inform-layer tooling, the team estimated $600K/month of recoverable waste. CDCR added Event Readiness automation for product launches, autoscaling policies, snapshot lifecycle rules, and Auto Tagging. After six months the bill was $1.78M/month, a 26% reduction.

Pattern C · Regulated financial services, $5.1M/month

A US firm with strict change management. Every cost optimisation required a CAB ticket. The team executed roughly 3 cost actions per quarter. CDCR inverted the CAB process: the CAB pre-approved the policy, the platform executed within it with full audit logging. After 12 months the bill was $3.9M/month, a 23% reduction. Action velocity moved from 3/quarter to ~400/month.

The 90-day adoption path

Weeks 1–2: connect cloud accounts in read-only mode. Auto-discovery completes inventory. Reconcile against the existing dashboard. Output: a baseline of recoverable drift by category and account.

Weeks 3–4: turn on non-production scheduling. Highest-savings, lowest-risk first move. First-month reduction typically 12–18%.

Weeks 5–8: move to right-sizing and orphan cleanup. Approve in batch. Implement Auto Tagging. Cumulative reduction typically 22–28%.

Weeks 9–12: convert from approval-required to policy-driven for the action classes with reliable safety track records. Define guardrails. The loop now runs continuously.

“The hardest part wasn’t the math. It was getting the CAB to pre-approve a policy instead of approving each action one ticket at a time. Once they did, the savings velocity was a step-change.” — Platform lead, Pattern C customer (financial services)

Chapter 10 · Where the category goes next