The same picture the on-call SRE has.
Most cost dashboards stop at the cloud bill. Most observability stacks stop at logs and traces. Neither tells you what your cluster is actually doing right now. Kubernetes View sits in the middle — it reads the cluster state directly via the Kubernetes API, joins it to billing data, and renders it as a single live topology.
Use it to answer questions like:
- Which namespace is burning the most CPU this hour, and which deployment inside it is responsible?
- Which pods have been pending for more than 15 minutes, and why?
- Where are HPAs flapping? Where are VPA and HPA fighting over the same workload?
- Which clusters have drifted from their target node pool size since the last deploy?
Connect a cluster with read scopes and you get the full view. Mutating actions (cordon, drain, scale, schedule) require an explicit policy grant and are admin-gated.
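Before you connect a cluster, you can confirm that the role you grant stays within read scopes. One way is to scan its rules for any verb other than `get`, `list`, or `watch`. The sketch below is a minimal, hypothetical helper. `mutating_grants` is not part of Kubernetes View or of any Kubernetes client library. It assumes the rules are plain dicts that mirror the RBAC `PolicyRule` fields `apiGroups`, `resources`, and `verbs`.

```python
# Verbs that only read state under Kubernetes RBAC.
READ_ONLY_VERBS = {"get", "list", "watch"}


def mutating_grants(rules):
    """Return (resource, verb) pairs that exceed read-only access.

    `rules` is a list of dicts shaped like RBAC PolicyRule objects, e.g.
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get"]}.
    The wildcard verb "*" is not in READ_ONLY_VERBS, so it is reported too.
    Rules without "resources" (e.g. nonResourceURLs) are skipped in this sketch.
    """
    offending = []
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb in READ_ONLY_VERBS:
                continue
            for resource in rule.get("resources", []):
                offending.append((resource, verb))
    return offending


read_role = [
    {"apiGroups": [""], "resources": ["pods", "nodes", "events"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["get", "list", "watch"]},
]
scale_rule = {"apiGroups": ["apps"], "resources": ["deployments/scale"],
              "verbs": ["update", "patch"]}

print(mutating_grants(read_role))                 # []
print(mutating_grants(read_role + [scale_rule]))  # [('deployments/scale', 'update'), ('deployments/scale', 'patch')]
```

An empty result means the role grants read-only access to the listed resources. Any pair the helper returns belongs to the admin-gated, policy-granted tiers.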
One canvas, every cluster.
The topology panel is the entry point. It groups your estate by provider → account → region → cluster → namespace → workload. Every node carries live status: node count, pod count, CPU/memory pressure, cost per hour, drift count.
| Level | What it shows |
|---|---|
| Provider | AWS / GCP / Azure / self-managed totals: clusters, monthly cost, health. |
| Account / Project | Per-account cluster count, region spread, top spenders. |
| Cluster | Node groups, pool autoscaler state, control-plane version, last reconcile timestamp. |
| Namespace | Cost, pod count, restart rate, OOMKilled count, pending pods. |
| Workload | Deployment / StatefulSet / DaemonSet detail: replicas, HPA target, VPA recommendation, last-deploy SHA. |
Click any node and the side panel opens with the live state, the 24-hour trend, and the open audit findings. No tab switching.
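Every node in the tree carries its own total because per-workload figures roll up the provider → account → region → cluster → namespace → workload path. The sketch below is minimal and assumes each workload record already carries its hourly cost and its position in the hierarchy. The record shape and sample values are hypothetical.

```python
from collections import defaultdict

# Topology levels, outermost first.
LEVELS = ("provider", "account", "region", "cluster", "namespace", "workload")


def rollup(workloads):
    """Sum cost_per_hour into every ancestor node of the topology tree.

    Each key is a path prefix such as ("aws",) or ("aws", "acct-1", "us-east-1").
    """
    totals = defaultdict(float)
    for w in workloads:
        path = tuple(w[level] for level in LEVELS)
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += w["cost_per_hour"]
    return dict(totals)


workloads = [
    {"provider": "aws", "account": "acct-1", "region": "us-east-1", "cluster": "prod-us-east",
     "namespace": "billing", "workload": "api", "cost_per_hour": 1.5},
    {"provider": "aws", "account": "acct-1", "region": "us-east-1", "cluster": "prod-us-east",
     "namespace": "billing", "workload": "worker", "cost_per_hour": 2.5},
    {"provider": "gcp", "account": "proj-7", "region": "europe-west1", "cluster": "eu-main",
     "namespace": "search", "workload": "indexer", "cost_per_hour": 0.75},
]
totals = rollup(workloads)
print(totals[("aws",)])                                                  # 4.0
print(totals[("aws", "acct-1", "us-east-1", "prod-us-east", "billing")])  # 4.0
```

Pod counts, node counts, and drift counts roll up the same way: swap the summed field.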
Cost down to the pod.
The cluster bill is rarely the question. The question is usually: "which team's workloads moved the number last week?"
Kubernetes View allocates spend three ways:
By owner
Joins live pod metadata to your Auto Tagging dictionary (team, env, service). Surfaces unowned pods so they don’t silently roll up into "shared infrastructure".
By workload type
Splits batch jobs, long-running services, sidecars, and DaemonSets. A noisy sidecar costing $14K/month shouldn’t hide inside the parent service.
By scheduler decision
Tracks how much of your hourly bill comes from Spot vs On-Demand vs Reserved capacity, and how many pods are evicted per hour. If your Spot mix is silently degrading, this is where you see it first.
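You can spot a degrading Spot mix by comparing each namespace's Spot share of hourly spend across two periods. The sketch below is minimal. The namespace names, spend figures, and the 20-point `min_drop` threshold are hypothetical, and the sample data mimics a 60% to 12% shift.

```python
def spot_share(spend):
    """Fraction of hourly spend on Spot capacity.

    spend: capacity type ("spot", "on-demand", "reserved") -> dollars per hour.
    """
    total = sum(spend.values())
    return spend.get("spot", 0.0) / total if total else 0.0


def spot_regressions(before, after, min_drop=0.2):
    """Namespaces whose Spot share fell by at least min_drop (absolute)."""
    flagged = {}
    for ns in before.keys() & after.keys():
        b, a = spot_share(before[ns]), spot_share(after[ns])
        if b - a >= min_drop:
            flagged[ns] = (round(b, 2), round(a, 2))
    return flagged


before = {"checkout": {"spot": 60.0, "on-demand": 40.0},
          "search": {"spot": 50.0, "on-demand": 50.0}}
after = {"checkout": {"spot": 12.0, "on-demand": 88.0},
         "search": {"spot": 48.0, "on-demand": 52.0}}
print(spot_regressions(before, after))  # {'checkout': (0.6, 0.12)}
```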
"Our Kubernetes bill went up 18% this month" is unactionable. "Three namespaces in the prod-us-east cluster moved from 60% Spot to 12% Spot after the Karpenter consolidation rule changed" is fixable in 20 minutes.
What changed since the last green deploy.
Kubernetes clusters drift constantly — HPAs adjust, autoscalers add nodes, operators rotate. Most of it is fine. Some of it costs you money or quietly breaks SLOs.
Kubernetes View runs continuous drift detection on:
- Node pool size — current vs target, with explanation (autoscaler vs manual vs Karpenter consolidation).
- HPA flapping — scale events crossing the same boundary more than 6 times in an hour.
- VPA contention — pods where VPA wants to resize but HPA is also active.
- Restart loops — pods with >3 restarts in the last hour, grouped by the cause of the CrashLoopBackOff.
- Stuck rollouts — Deployments where the new ReplicaSet has been Progressing for more than 30 minutes.
- Untagged workloads — pods landing in a namespace without the required owner label.
- Image bloat — new image >25% larger than the prior one, with a link to the diff.
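Several of the threshold rules above translate into small predicates. This is a hedged sketch, not the platform's detector. The function names, inputs, and the `team` owner label key are hypothetical. The sketch also reads "crossing the same boundary" as the replica count moving across a fixed value in either direction.

```python
def boundary_crossings(replicas, boundary):
    """Count consecutive scale events that move the replica count across `boundary`."""
    return sum(
        (prev < boundary) != (cur < boundary)
        for prev, cur in zip(replicas, replicas[1:])
    )


def hpa_flapping(events, now, window_s=3600, max_crossings=6):
    """events: time-ordered (unix_ts, replicas) scale events for one HPA."""
    recent = [replicas for ts, replicas in events if now - ts <= window_s]
    return any(boundary_crossings(recent, b) > max_crossings for b in set(recent))


def restart_loop(restarts_now, restarts_an_hour_ago, max_restarts=3):
    """True when a container restarted more than `max_restarts` times in the last hour."""
    return restarts_now - restarts_an_hour_ago > max_restarts


def stuck_rollout(progressing_since, now, limit_s=30 * 60):
    """True when the new ReplicaSet has been Progressing for more than 30 minutes."""
    return now - progressing_since > limit_s


def untagged(pod_labels, required=("team",)):
    """True when a pod lacks any required owner label (label key assumed)."""
    return any(key not in pod_labels for key in required)


def image_bloat(new_bytes, prior_bytes, max_growth=0.25):
    """True when the new image is more than 25% larger than the previous one."""
    return new_bytes > prior_bytes * (1 + max_growth)


# Replica count alternating 2 -> 3 -> 2 ... every 5 minutes.
events = [(i * 300, 3 if i % 2 else 2) for i in range(8)]
print(hpa_flapping(events, now=2100))      # True: 7 crossings of boundary 3
print(hpa_flapping(events[:7], now=2100))  # False: 6 crossings is not more than 6
```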
Every drift finding carries severity, projected dollar impact, and a one-click jump to the workload, the audit rule, and the suggested remediation.
Karpenter, Cluster Autoscaler, KEDA. One panel.
The autoscaler is usually the single biggest cost lever in a cluster, and the single most opaque component. Kubernetes View surfaces the scheduler decision stream as a first-class panel:
| Signal | What you see |
|---|---|
| Karpenter provisioning events | Why a node was added, which pods triggered it, instance type, hourly cost. |
| Cluster Autoscaler scale-down | Eligible nodes, reasons nodes are blocked from scale-down (PDBs, system pods, kubelet config). |
| KEDA scaler events | External-metric scalers (SQS, Kafka, Cron) and their current trigger thresholds. |
| Spot interruptions | Last 24 hours of interruptions, per instance type, with workload impact. |
| Pending pod reasons | FailedScheduling events grouped by reason (insufficient memory, taints, affinity). |
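To group pending-pod reasons, you bucket `FailedScheduling` events by the cause named in their message. The sketch below is minimal and assumes events arrive as dicts carrying the core/v1 Event `reason` and `message` fields. The keyword table and the category names are hypothetical heuristics. Scheduler message wording varies across Kubernetes versions, so the code matches substrings rather than parsing an exact format.

```python
from collections import Counter

# Substring -> category. Heuristic only; exact scheduler wording varies.
REASON_KEYWORDS = {
    "insufficient memory": "Insufficient memory",
    "insufficient cpu": "Insufficient CPU",
    "taint": "Taints",
    "affinity": "Affinity / selector",
}


def group_failed_scheduling(events):
    """Count FailedScheduling events per reason category.

    A message naming several causes is counted once under each of them;
    a message matching no keyword is counted as "Other".
    """
    counts = Counter()
    for event in events:
        if event.get("reason") != "FailedScheduling":
            continue
        message = event.get("message", "").lower()
        matched = [label for key, label in REASON_KEYWORDS.items() if key in message]
        counts.update(matched or ["Other"])
    return counts


events = [
    {"reason": "FailedScheduling", "message": "0/3 nodes are available: 3 Insufficient memory."},
    {"reason": "FailedScheduling",
     "message": "0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had untolerated taint {dedicated: gpu}."},
    {"reason": "FailedScheduling",
     "message": "0/2 nodes are available: 2 node(s) didn't match Pod's node affinity/selector."},
    {"reason": "Scheduled", "message": "Successfully assigned default/web to node-1"},
]
print(group_failed_scheduling(events))
```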
Read-only is the default. Always.
Kubernetes View ships with three discrete permission tiers. Most customers never need more than Tier 1.
Tier 1 · Read-only
Kubernetes API permissions: get, list, watch. Billing read scopes on the cloud account. No mutations. Sufficient for the entire topology, cost, and drift surface.
Tier 2 · Guided
Adds the ability to propose a remediation (scale, schedule, drain) into the policy console. Execution still requires explicit human approval. Useful when the on-call engineer wants the platform to draft the kubectl command.
Tier 3 · Policy-driven
The platform executes within an explicitly scoped policy, for example: "after 21:00 IST on non-prod namespaces, scale Deployments matching label env=dev to zero, unless the namespace carries the label keep-on=true." Every action is admin-gated, scoped, and logged with actor + timestamp + diff.
The platform refuses to act on customer-managed CRDs, operator-managed StatefulSets, and any workload labelled zopdev.io/protected=true. This is by design; Section 6 of the CDCR whitepaper covers the architectural rationale.
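You can read the Tier 3 example policy, together with the protected-label refusal, as an ordered list of guards. This minimal sketch makes three assumptions. "Non-prod" means a namespace whose `env` label is not `prod`. The window runs from 21:00 IST to midnight, because the example gives no end time. `may_scale_to_zero` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# IST is UTC+05:30 with no daylight saving time, so a fixed offset is exact.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
PROTECTED_LABEL = "zopdev.io/protected"


def may_scale_to_zero(workload, namespace, now_utc):
    """Evaluate the example policy for one workload; returns (allowed, reason)."""
    # The protected-label refusal runs first, so no later condition can override it.
    if workload["labels"].get(PROTECTED_LABEL) == "true":
        return False, "workload is protected"
    if namespace["labels"].get("env") == "prod":
        return False, "production namespace"
    if namespace["labels"].get("keep-on") == "true":
        return False, "namespace carries keep-on=true"
    if workload["labels"].get("env") != "dev":
        return False, "workload is not labelled env=dev"
    if now_utc.astimezone(IST).hour < 21:
        return False, "before 21:00 IST"
    return True, "eligible for scale to zero"


dev_app = {"labels": {"env": "dev"}}
staging_ns = {"labels": {"env": "staging"}}
# 16:00 UTC is 21:30 IST.
print(may_scale_to_zero(dev_app, staging_ns, datetime(2024, 3, 1, 16, 0, tzinfo=timezone.utc)))
# (True, 'eligible for scale to zero')
```

Returning a reason alongside the decision keeps every refusal explainable, which fits the requirement that each action be logged.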
Already in your stack.
| Source | What we read |
|---|---|
| Kubernetes API (any conformant cluster) | Workloads, pods, nodes, events, metrics-server, HPA/VPA state. |
| EKS / GKE / AKS control planes | Cluster version, addon state, node pool config, control-plane logs. |
| Karpenter / Cluster Autoscaler / KEDA | Provisioner CRDs, scaler events, decisions. |
| Prometheus / OpenTelemetry | Workload-level CPU/memory/network. Optional — falls back to metrics-server. |
| AWS CUR / GCP BigQuery billing / Azure Cost Management | Hourly cost allocation joined to live pod state. |
| GitHub / GitLab | Last-deploy SHA, image source, blame link on drift findings. |
| Slack / PagerDuty | Drift alerts, budget burn-down, weekly digest. |
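Joining an hourly bill to live pod state needs an allocation rule. This sketch assumes one common simplification, which is not necessarily the rule Kubernetes View applies. It splits each node's hourly price across its pods in proportion to their CPU requests and reports unrequested capacity as idle. The pod names and prices are hypothetical. A production allocator would typically weigh memory as well.

```python
def allocate_node_cost(node_cost_per_hour, allocatable_millicores, pod_cpu_requests):
    """Split one node's hourly price across its pods by CPU request.

    pod_cpu_requests: pod name -> requested CPU in millicores.
    Unrequested capacity is reported under "(idle)" rather than spread over pods.
    """
    requested = sum(pod_cpu_requests.values())
    # Guard against requests exceeding allocatable (e.g. stale data).
    capacity = max(allocatable_millicores, requested)
    shares = {
        pod: node_cost_per_hour * millicores / capacity
        for pod, millicores in pod_cpu_requests.items()
    }
    shares["(idle)"] = node_cost_per_hour * (capacity - requested) / capacity
    return shares


print(allocate_node_cost(0.4, 4000, {"api-7d9f": 1000, "worker-5c2b": 2000}))
```

Reporting idle capacity separately keeps over-provisioned nodes visible instead of inflating every pod's share.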
Common questions.
Do you require an agent in the cluster?
No. The default mode uses the Kubernetes API directly via an IAM role / workload identity. An optional in-cluster agent is available for sub-30s reconcile intervals and richer container metadata.
What’s the latency?
API-mode reconciles every 60 seconds by default (configurable to 30s). Agent-mode streams events in real time. Once an event reaches the platform, the UI updates within ~500ms.
Multi-tenancy?
Yes. Org → team → namespace scoping is enforced server-side. A team that owns billing-prod sees only their namespaces; the FinOps lead sees the whole estate.
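Server-side scoping means filtering every query by the caller's grants before results are returned. This is a hypothetical sketch. The principal shape, either an org-wide flag or a set of (cluster, namespace) pairs, is an assumption, not the product's data model.

```python
def visible_rows(rows, principal):
    """Return only the rows a principal is allowed to see.

    principal: {"org_wide": bool, "namespaces": {(cluster, namespace), ...}}
    A principal with no grants sees nothing (default deny).
    """
    if principal.get("org_wide"):
        return list(rows)
    allowed = principal.get("namespaces", set())
    return [row for row in rows if (row["cluster"], row["namespace"]) in allowed]


rows = [
    {"cluster": "prod-us-east", "namespace": "billing-prod", "cost_per_hour": 3.0},
    {"cluster": "prod-us-east", "namespace": "search-prod", "cost_per_hour": 5.0},
]
billing_team = {"org_wide": False, "namespaces": {("prod-us-east", "billing-prod")}}
finops_lead = {"org_wide": True}

print([r["namespace"] for r in visible_rows(rows, billing_team)])  # ['billing-prod']
print(len(visible_rows(rows, finops_lead)))                        # 2
```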
Self-managed clusters (kops, kubeadm, RKE2)?
Supported. Anything that exposes a conformant Kubernetes API works. The cloud-specific integrations (CUR, billing) are optional.
How does this relate to the rest of ZopDev?
Kubernetes View is the live operational surface inside ZopDay. The cost lens is shared with ZopNight. The topology graph is shared with ZopCloud. One inventory, three lenses.