CPU throttling has a visibility problem that the Kubernetes community partially fixed. container_cpu_cfs_throttled_seconds_total exposes the throttle. Grafana dashboards flag it. Engineers know to look for it.
Memory limits have no equivalent. There is no container_memory_approaching_limit_seconds_total. The container either runs fine or it dies. The Linux kernel OOM killer fires, the process is terminated, and Kubernetes starts a replacement. The event shows up in kubectl get events only briefly (the API server keeps events for one hour by default), then disappears.
Meanwhile, your p99 latency dashboard looked normal until the pod count dropped, then spiked 300% for 90 seconds, then recovered. No CPU throttle metric fired. No memory warning fired. Your on-call engineer spends 20 minutes trying to correlate the latency spike to a deployment, a config change, or a dependency failure before finding the OOMKill event in the pod history.
This is a pattern we see consistently. CPU throttling is a metric problem that degrades gradually. OOMKill is an availability problem that hits hard and disappears fast.
OOMKill vs CPU Throttling: Two Different Failure Modes
CPU throttling happens when a container tries to use more CPU than its limit allows. The kernel throttles the process: it still runs, just slower. The degradation is gradual and proportional. A container at 2x its CPU limit runs at roughly half speed. Traffic increases linearly, latency increases gradually, the throttle metric climbs. There is time to detect and respond.
OOMKill is different. When a container’s RSS (resident set size) hits the cgroup memory limit, the kernel OOM killer terminates the process with SIGKILL. Not throttled. Terminated. No grace period, no drain, no SIGTERM sequence. The container’s termination reason is recorded as OOMKilled. Kubernetes starts a replacement: typically the kubelet restarts the container inside the same pod under its restartPolicy. The replacement pulls its image if it is not already cached on the node, initializes its runtime, warms its connection pool, and starts serving traffic. That process takes 30-90 seconds depending on the application. During those 30-90 seconds, the pod is not serving requests.
| Dimension | CPU Throttling | OOMKill |
|---|---|---|
| Failure type | Gradual degradation | Hard stop |
| Application impact | Increased latency | Requests dropped during restart |
| Detection metric | container_cpu_cfs_throttled_seconds_total | container_oom_events_total |
| Visibility | High: shows in dashboards | Low: appears in events only |
| Recovery time | Immediate when load drops | 30-90 seconds for pod restart |
| Warning before failure | Yes: throttle metric climbs | No: kill fires at limit hit |
The practical consequence: CPU throttling gives you time to respond. OOMKill gives you a post-mortem.
The OOMKill Latency Signature
OOMKill produces a specific pattern in latency and availability graphs that is easy to identify once you know what to look for.
Phase 1: normal operation. Memory increases gradually as the application processes requests, the JVM heap fills, or a cache warms up. This phase can last hours. Nothing looks wrong.
Phase 2: limit hit. RSS hits the cgroup limit. The OOM killer fires. The pod terminates. Pod count drops by 1 (or more, if multiple replicas are hitting the same limit simultaneously).
Phase 3: latency spike. The remaining pods absorb 100% of traffic. If the service was running at 60% capacity on 5 pods, now 4 pods are serving at 75% capacity. If the pod was handling a disproportionate fraction of requests (sticky sessions, sharding), the remaining pods may spike significantly higher. p99 latency increases 200-400%.
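The capacity arithmetic in Phase 3 can be sketched in a few lines. This is a minimal illustration that assumes traffic is balanced evenly across the surviving pods; the function name utilization_after_kills is hypothetical, not part of any tool.

```python
def utilization_after_kills(pods: int, utilization: float, killed: int = 1) -> float:
    """Per-pod utilization once `killed` pods stop serving.

    Assumes total traffic is unchanged and is spread evenly
    across the surviving pods (no sticky sessions or sharding).
    """
    survivors = pods - killed
    if survivors <= 0:
        raise ValueError("no pods left to serve traffic")
    return pods * utilization / survivors


# The example from the text: 5 pods at 60%, one OOMKilled.
print(f"{utilization_after_kills(5, 0.60):.0%}")     # 75%
# Two replicas hitting the same limit at once leaves no spare capacity.
print(f"{utilization_after_kills(5, 0.60, 2):.0%}")  # 100%
```

With sticky sessions or sharding the even-spread assumption breaks, and the real per-pod figure can be higher than this estimate.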
Phase 4: recovery. The replacement pod comes up. Image pull: 10-30 seconds. Runtime initialization: 5-20 seconds. Connection pool warmup: 10-30 seconds. Health check passing: 5-10 seconds. Total: 30-90 seconds from kill to serving traffic.
The cycle repeats if the root cause is not fixed. An application with a gradual memory leak will OOMKill on a predictable schedule: every N hours as the RSS climbs from baseline to limit. Each kill looks like a random latency event until you plot OOMKill events against the latency timeline and see the correlation.
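The "every N hours" schedule follows from simple arithmetic if the leak is roughly linear. The sketch below assumes a constant leak rate; the function name and the sample numbers (600 MiB baseline, 1024 MiB limit, 20 MiB per hour) are illustrative, not measurements.

```python
def hours_between_oomkills(baseline_mib: float, limit_mib: float,
                           leak_mib_per_hour: float) -> float:
    """Hours for RSS to climb from its post-restart baseline to the limit.

    Assumes a constant (linear) leak rate, which real leaks only approximate.
    """
    if leak_mib_per_hour <= 0:
        raise ValueError("leak rate must be positive")
    return (limit_mib - baseline_mib) / leak_mib_per_hour


# Hypothetical pod: restarts at 600 MiB, limit 1024 MiB, leaks 20 MiB/hour.
print(f"{hours_between_oomkills(600, 1024, 20):.1f} hours")  # 21.2 hours
```

A kill interval that stays this regular across several restarts is itself a strong hint that the cause is a leak rather than a traffic spike.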
Language-Specific Memory Behavior Under cgroup Limits
Different runtimes fail in different ways when memory limits are set incorrectly.
JVM (Java, Kotlin, Scala): Older JVMs do not read cgroup memory limits. Container support arrived in Java 10 and was backported to JDK 8u191; before that, the JVM sizes its maximum heap from the host machine’s total RAM, 25% of host memory by default. On a 64 GB node, that means a 16 GB heap ceiling for a container with a 2 GB memory limit. The heap grows. The container hits 2 GB RSS. The OOM killer fires. The JVM had no idea the limit existed. From Java 10 onward (and from 8u191 on the JDK 8 line), -XX:+UseContainerSupport is enabled by default and the JVM reads the cgroup limit. But legacy images built on older JDK 8 updates, where the flag does not exist, still run in production.
Go: By default the Go runtime is not aware of the cgroup limit. The garbage collector is paced by GOGC (default 100), which triggers a cycle once the heap has grown by roughly 100% over the live heap left by the previous cycle, so the heap can overshoot a tight limit between collections. Under memory pressure, GC work also shows up as CPU bursts, so the container can look like it is CPU throttling (GC CPU spike) when the actual driver is memory. The fix is GOMEMLIMIT, introduced in Go 1.19, which gives the GC a soft memory ceiling to target below the cgroup limit.
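Node.js: V8 caps the old-generation heap with its own default, historically around 1.5 GB on 64-bit systems regardless of available RAM (newer releases size it differently). In a container whose memory limit is below the V8 cap, the heap can grow past the container limit before V8 feels any pressure to collect. The fix: set the --max-old-space-size flag (the value is in megabytes) to 75% of the container’s memory limit.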
| Runtime | Failure mode | Detection signal | Fix |
|---|---|---|---|
| JVM (pre-Java 10, before 8u191) | Heap ignores cgroup limit | OOMKill + large heap config | Upgrade to 8u191+ or set -Xmx explicitly |
| JVM (Java 10+) | GC thrashing near limit | High GC pause time + CPU spikes | Set -Xmx to 75% of limit |
| Go | GC pressure masquerading as CPU throttle | CPU burst without high RSS | Set GOMEMLIMIT to 90% of limit |
| Node.js | V8 heap exceeds container limit | OOMKill on memory-light containers | Set --max-old-space-size to 75% of limit |
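The percentages in the table can be turned into concrete settings for a given container limit. The helper below is a sketch: the function name is hypothetical, and the 75% / 90% / 75% ratios are simply the ones from the table, not universal constants. The flag and variable names themselves (-Xmx, GOMEMLIMIT, --max-old-space-size) are the real ones.

```python
def runtime_memory_settings(limit_mib: int) -> dict[str, str]:
    """Derive per-runtime memory settings from a container limit in MiB.

    Uses integer arithmetic so the results round down, never above the ratio.
    """
    return {
        # JVM: cap the heap at 75% of the limit, leaving room for metaspace,
        # thread stacks, and other off-heap memory.
        "jvm": f"-Xmx{limit_mib * 75 // 100}m",
        # Go: soft ceiling at 90% of the limit (environment variable).
        "go": f"GOMEMLIMIT={limit_mib * 90 // 100}MiB",
        # Node.js: old-space cap at 75% of the limit; the flag takes megabytes.
        "node": f"--max-old-space-size={limit_mib * 75 // 100}",
    }


for runtime, setting in runtime_memory_settings(2048).items():
    print(f"{runtime}: {setting}")
# jvm: -Xmx1536m
# go: GOMEMLIMIT=1843MiB
# node: --max-old-space-size=1536
```

Deriving these values from the limit at deploy time, rather than hard-coding them in the image, keeps the runtime ceiling in step when the limit changes.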
The Memory Limit Formula: Request, Limit, and Headroom
The most common misconfiguration is setting memory limit equal to memory request. The habit comes from the Kubernetes QoS documentation: the Guaranteed QoS class (the highest priority, least likely to be evicted) requires that CPU and memory limits equal requests for every container.
The QoS benefit is real. But the cost is that any transient memory spike above the request value causes an OOMKill. A JVM running GC, a Go runtime sweeping a large heap, or a Node.js V8 compiling a hot function will spike RSS above steady-state by 20-40% for seconds at a time. If limit equals request at steady-state RSS, those GC spikes kill the pod.
The correct formula: requests set at P90 steady-state RSS measured over a 7-day window. Limit set at 1.5x to 2x the request, providing headroom for GC burst, traffic spikes, and initialization overhead.
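As a rough sketch of that formula, the snippet below takes a window of RSS samples, uses a nearest-rank P90 as the request, and multiplies by the headroom factor for the limit. The function names and the sample values are made up for illustration; in practice the samples would come from 7 days of metrics.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def size_memory(rss_samples_mib: list[float], headroom: float = 1.5) -> tuple[float, int]:
    """Return (request, limit) in MiB: request = P90 RSS, limit = headroom x request."""
    request = percentile(rss_samples_mib, 90)
    limit = math.ceil(request * headroom)
    return request, limit


# Hypothetical steady-state RSS samples in MiB, with one outlier spike.
samples = [400, 410, 420, 430, 440, 450, 460, 470, 480, 900]
print(size_memory(samples))  # (480, 720)
```

Note that the single 900 MiB spike does not move the P90 request, but it does exceed the resulting 720 MiB limit. Outliers like that are a reason to check the maximum as well as the percentile before settling on a headroom factor.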
In production, teams that set limit equal to request see 3-8x higher OOMKill rates than teams with 1.5x headroom. Setting limit to 1.5x request reduces OOMKills by 70-80% in the applications we have measured.
The trade-off: 1.5x headroom means each pod can use 50% more memory than its request before being killed. The scheduler sees requests, not limits, when placing pods. A node can technically over-commit memory by the aggregate headroom delta across all its pods. Monitor node_memory_MemAvailable_bytes to ensure actual node memory stays above 20% free.
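To make the over-commit concrete, the sketch below sums requests and limits for the pods on one node and compares the limits with the node's allocatable memory. The function name, the pod sizes, and the node size are all hypothetical.

```python
def node_overcommit(pods: list[tuple[int, int]], allocatable_mib: int) -> dict[str, float]:
    """Summarize memory over-commit for one node.

    `pods` is a list of (request_mib, limit_mib) pairs. The scheduler only
    guarantees that the requests fit; the limits may add up to more.
    """
    total_requests = sum(request for request, _ in pods)
    total_limits = sum(limit for _, limit in pods)
    return {
        "requests_mib": total_requests,
        "limits_mib": total_limits,
        "headroom_delta_mib": total_limits - total_requests,
        "limit_to_allocatable": total_limits / allocatable_mib,
    }


# Hypothetical node: 12 GiB allocatable, ten pods at 1 GiB request / 1.5 GiB limit.
summary = node_overcommit([(1024, 1536)] * 10, allocatable_mib=12288)
print(summary["headroom_delta_mib"])    # 5120
print(summary["limit_to_allocatable"])  # 1.25
```

A ratio above 1.0 means the pods cannot all reach their limits at the same time, which is why the node-level free-memory check matters.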
Detection: The 4 Metrics to Catch OOMKills Before Users Do
OOMKills are detectable before they cause user-visible incidents if you have the right metrics in place.
| Metric | Source | What it measures | Alert threshold |
|---|---|---|---|
| container_oom_events_total | cAdvisor | Cumulative OOMKill count per container | Any increase in a 5-min window |
| container_memory_working_set_bytes / limit | cAdvisor | Memory utilization ratio | Above 85% for 5 minutes |
| kube_pod_container_status_last_terminated_reason | kube-state-metrics | Last termination reason (OOMKilled) | reason="OOMKilled" equals 1 in last 15 min |
| kube_node_status_condition{condition="MemoryPressure"} | kube-state-metrics | Node-level memory pressure | status="true" equals 1 on any node |
container_memory_working_set_bytes above 85% of the limit is the early warning. At 85%, the container has only 15% of its limit left. A GC cycle that adds 20% to RSS will hit the limit (0.85 × 1.2 = 1.02). Alert at 85% and investigate before the kill fires.
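The "above 85% for 5 minutes" condition is a sustained-threshold check, which an alerting system would normally evaluate for you. The logic is sketched below for clarity, assuming one working-set sample per minute; the function name and the sample values are illustrative.

```python
def sustained_above(samples: list[float], limit: float,
                    ratio: float = 0.85, min_samples: int = 5) -> bool:
    """True if each of the last `min_samples` samples exceeds ratio * limit.

    With one sample per minute, min_samples=5 approximates "for 5 minutes".
    A single dip below the threshold resets the condition.
    """
    recent = samples[-min_samples:]
    return len(recent) == min_samples and all(s > ratio * limit for s in recent)


limit_mib = 1000
print(sustained_above([700, 860, 870, 880, 890, 900], limit_mib))  # True
print(sustained_above([700, 860, 840, 880, 890, 900], limit_mib))  # False
```

Requiring the condition to hold across the whole window filters out the short GC spikes described earlier, which would otherwise page on every collection.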
container_oom_events_total is the kill confirmation. Alert on any increase, not on a threshold. One OOMKill on a production pod is one too many. Investigate immediately: check if it is a memory leak (RSS growing monotonically) or a sizing issue (RSS plateauing near the limit).
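The leak-versus-sizing distinction can be approximated from the RSS series leading up to the kill. The heuristic below is deliberately crude and is an assumption of this sketch, not an established method: it looks only at the last third of the window and asks whether RSS was still climbing by more than a small fraction of the limit.

```python
def classify_rss(samples: list[float], limit: float, tolerance: float = 0.02) -> str:
    """Label a pre-kill RSS series as 'leak' or 'sizing'.

    'leak'   : RSS is still growing in the last third of the window.
    'sizing' : RSS has flattened out (plateau near the limit).
    `tolerance` is the growth, as a fraction of the limit, treated as flat.
    """
    if len(samples) < 3:
        raise ValueError("need at least 3 samples")
    tail = samples[-(len(samples) // 3):]
    growth = (tail[-1] - tail[0]) / limit
    return "leak" if growth > tolerance else "sizing"


limit_mib = 1000
print(classify_rss([500, 550, 600, 650, 700, 750, 800, 850, 900], limit_mib))  # leak
print(classify_rss([500, 700, 850, 900, 920, 925, 926, 925, 927], limit_mib))  # sizing
```

A leak calls for a code fix, while a plateau near the limit calls for a larger limit or a lower runtime ceiling, so the label decides who picks up the follow-up.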
Live pod-state tooling that surfaces CrashLoopBackOff and OOMKill events covers this detection surface directly. The pattern is the same: make the event visible in real time instead of discovering it in a post-mortem. Memory limits lie by being invisible. Make them visible and OOMKill becomes a configuration problem, not an incident.