Blast Radius Before Execution: Why Autonomous Cloud Must Check Idle Resources First
Autonomous cloud remediation fails the same way every time. The recommendation is correct. The action is correct. The scope is wrong.
ZopDev writing tagged autonomous-cloud. Engineering and FinOps notes, post-mortems, and benchmarks.
The Autonomous Action Log: Auditing Every ZopNight Decision in Production
CloudTrail is excellent at recording what happened. A node group scaled up at 14:32:07. A pod restarted at 14:32:41. An alarm…
Every cost-optimisation product has shipped an "Apply Fix" button at some point. Most teams do not click it. Either the button is a stub that opens a ticket somewhere (and nothing happens until a…
A developer asks Claude Code at 2 AM: "this terraform plan is failing admission, fix the bucket so it deploys." Claude reads the error, generates a slightly different bucket config, runs the plan…
The 2 a.m. compute runaway is the canonical FinOps incident. A Spark job is misconfigured to provision new EMR nodes every minute it cannot find a leader. A test agent left running on a developer's…
The first ninety days of an MCP server in production are about correctness, not abuse. The team is busy proving the agents do the right thing: the policy lookups return what they should, the audit…
The closed-loop pipeline runs the easy part well. Detect fires, decide picks a remediation, act executes, verify confirms. The hard part is the line between "auto-execute" and "page a human." Most…
A platform team picks Cloud Custodian in week one. By week six they realize Custodian fires after the resource is created, the wrong shape for blocking misconfigured Terraform plans. They add OPA. By…
A team writes the cron job that shuts non-prod down at 8 PM. The cron runs three commands in parallel: scale the EKS Deployments to zero, pause the Aurora cluster, stop the ElastiCache Redis nodes.…
An alert fires at 2:47 AM. A pod in the namespace is in CrashLoopBackOff. The on-call engineer reads the alert, opens Slack to find the team that owns it, opens the wiki to find what policies apply to…
The read-only MCP server work shipped clean. AI agents could read tags, search logs, query costs, walk topology. Operators saved hours per incident. The next question was obvious: which writes can…
The 3am page is rarely about something that needs a human. The on-call gets paged at 03:14 because a pod has crashlooped four times in five minutes. They open Slack, look at the logs, see "OOMKilled"…
The 47th agent is when finance shows up. Below 30 agents in production, the Anthropic invoice is one tolerable line item somewhere south of $25,000 a month, and nobody asks who is spending what. Past…
A single Claude API call is predictable. An agent with tool access is not.
A runaway Lambda burns $200 an hour at 100 concurrent invocations. By the time your cost anomaly alert fires, three days have passed and $14,400 of unnecessary spend is already in the bill.
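The arithmetic behind that figure is worth making explicit, since detection delay is the only lever in it. A minimal sketch, assuming a constant burn rate; the function name `runaway_cost` is hypothetical and only the $200 per hour and three-day figures come from the excerpt:

```python
def runaway_cost(hourly_burn_usd: float, detection_delay_hours: float) -> float:
    """Spend accumulated between the start of a runaway and the alert firing.

    Assumes the burn rate stays constant for the whole delay.
    """
    return hourly_burn_usd * detection_delay_hours


# Figures from the excerpt: $200 per hour, alert fires after three days.
three_days_in_hours = 3 * 24
print(runaway_cost(200, three_days_in_hours))  # 14400
```

Under the same assumption, every hour shaved off the detection delay saves one hour of burn, which is why the delay matters more than the size of the fix.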
The trust ceiling on AI in cloud automation is not capability. It is write access.
A plain AI cloud assistant tells you the S3 bucket is public. ZopNight + Claude via MCP tells you the bucket is public AND violates policy 47, which requires EU-only buckets for any object tagged…
A FinOps team produces a recommendation report on Monday morning. It identifies $185,000 of monthly waste across 240 cloud resources. By Friday, 12 of those 240 are remediated. By the end of week 4,…
The average time to remediate an IAM misconfiguration in ticket-driven teams is 14 days. The fix takes 4 minutes. The DERA loop — Detect, Evaluate, Remediate, Audit — closes the gap automatically. Here's the full AWS architecture.
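The four DERA stages can be sketched as a plain loop. This is an illustration of the Detect, Evaluate, Remediate, Audit sequence only, not the architecture the post describes: every name here (`Finding`, `AuditEntry`, `run_dera`, the role names, the `attach-scoped-policy` action) is hypothetical, and the stages are stubs rather than real AWS calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Finding:
    resource_id: str  # hypothetical identifier, e.g. an IAM role name
    issue: str


@dataclass
class AuditEntry:
    resource_id: str
    issue: str
    action: str
    outcome: str


def run_dera(
    detect: Callable[[], Iterable[Finding]],
    evaluate: Callable[[Finding], Optional[str]],
    remediate: Callable[[Finding, str], bool],
) -> List[AuditEntry]:
    """Run one pass of Detect -> Evaluate -> Remediate -> Audit."""
    audit: List[AuditEntry] = []
    for finding in detect():
        action = evaluate(finding)
        if action is None:
            # No policy covers this finding: record it and leave it alone.
            audit.append(AuditEntry(finding.resource_id, finding.issue, "none", "skipped"))
            continue
        ok = remediate(finding, action)
        outcome = "applied" if ok else "failed"
        audit.append(AuditEntry(finding.resource_id, finding.issue, action, outcome))
    return audit


# Stub stages standing in for real detectors, policies, and cloud API calls.
def detect() -> List[Finding]:
    return [
        Finding("role/ci-deployer", "wildcard-action"),
        Finding("role/legacy-batch", "unused-role"),
    ]


def evaluate(finding: Finding) -> Optional[str]:
    policy = {"wildcard-action": "attach-scoped-policy"}
    return policy.get(finding.issue)


def remediate(finding: Finding, action: str) -> bool:
    return True  # a real implementation would call the cloud API here


log = run_dera(detect, evaluate, remediate)
for entry in log:
    print(entry)
```

In this sketch the audit stage writes an entry for skipped findings as well as applied ones, so the log covers every decision rather than only the actions taken.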
The average remediation event takes 47 minutes in runbook-driven ops. The fix takes 4. Closed-loop remediation eliminates the overhead — here's the full technical architecture and how to start with your first policy.
Most cloud platforms tell you what happened. They do not fix it. This release moves ZopNight from a visibility layer into an execution layer — VM autoscaling across 3 clouds, 43 read-only AI tools, tag-level cost attribution, and more.