Preventing EKS Pod Evictions Under Memory Pressure

Managing resource allocation in Amazon Elastic Kubernetes Service (EKS) is difficult at scale. One of the worst moments for an on-call Site Reliability Engineer is watching critical production pods terminate with an OOMKilled or Evicted status during a traffic spike. This guide explains how senior DevOps teams prevent Kubernetes pod evictions in high-throughput EKS clusters using resource-aware scheduling, QoS class tuning, and precise node configuration.

The Difference Between OOMKilled and Evicted

The first step toward cluster stability is telling apart a pod that exhausted its own limit from a pod that the node evicted to protect itself. When a pod status shows OOMKilled, a container tried to use more memory than the limit set in its pod spec, and the kernel killed it as soon as its cgroup hit that limit. This is usually an application-level or sizing problem. When a pod shows Evicted, however, the pod may have been running well within its limits. The problem is that the underlying EC2 instance ran low on memory: available memory on the node dropped below the kubelet's eviction threshold. To keep the node from becoming unstable, the kubelet steps in and terminates pods until enough memory is reclaimed.
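
The two failure modes show up in different parts of the pod status, which a short script can check. The following is a minimal sketch, assuming the input is the parsed JSON of a single Pod object (for example from kubectl get pod -o json); the function name termination_causes is hypothetical and used only for illustration.

```python
from typing import Any, Dict, List


def termination_causes(pod: Dict[str, Any]) -> List[str]:
    """Return memory-related termination causes found in one pod's status.

    `pod` is assumed to be the parsed JSON of a single Pod object.
    """
    causes: List[str] = []
    status = pod.get("status", {})

    # Node-pressure eviction: the kubelet marks the whole pod as Failed
    # with reason "Evicted".
    if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
        causes.append("Evicted: " + status.get("message", "no message"))

    # Container-level OOM kill: recorded per container, either in the
    # current state or in the previous one (lastState) after a restart.
    for cs in status.get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                causes.append("OOMKilled: container " + cs.get("name", "?"))
                break
    return causes
```

An Evicted result points at node-level memory pressure, while an OOMKilled result on a specific container usually points at that container's own memory limit.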

The Real Problem With Burstable QoS Pods

Kubernetes categorizes every pod into one of three Quality of Service (QoS) classes: Guaranteed, Burstable, or BestEffort. Node evictions in production frequently involve Burstable workloads. A typical case is a developer setting a memory request of 512 MiB but a limit of 2 GiB. The scheduler places the pod on a node based only on the 512 MiB request. If several pods on the same node spike toward their 2 GiB limits at the same time, the underlying EC2 instance runs out of physical RAM. When the node is starved, the kubelet ranks pods for eviction first by whether their memory usage exceeds their requests, then by pod priority, then by how far usage exceeds requests. BestEffort pods and Burstable pods running above their requests are therefore evicted first, while Guaranteed pods and Burstable pods that stay under their requests are evicted last.
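
The QoS rules and the overcommit arithmetic can be sketched in a few lines. This is a simplified illustration, not the kubelet's implementation: the container shape (cpu_request, cpu_limit, mem_request, mem_limit as already-normalized integers) and both function names are assumptions made for the example, and the packing estimate ignores CPU, per-node pod limits, and other pods.

```python
from typing import Dict, List, Optional

MIB = 1024 * 1024
GIB = 1024 * MIB

# Hypothetical, simplified container shape: values are already normalized
# (memory in bytes, CPU in millicores); a missing key or None means "not set".
Container = Dict[str, Optional[int]]


def qos_class(containers: List[Container]) -> str:
    """Approximate the Kubernetes QoS class for a pod's containers."""
    keys = ("cpu_request", "cpu_limit", "mem_request", "mem_limit")
    # BestEffort: no container sets any request or limit.
    if all(c.get(k) is None for c in containers for k in keys):
        return "BestEffort"

    # Guaranteed: every container has CPU and memory limits, and each
    # request equals its limit. An unset request defaults to the limit.
    for c in containers:
        for res in ("cpu", "mem"):
            limit = c.get(res + "_limit")
            request = c.get(res + "_request")
            if request is None:
                request = limit
            if limit is None or request != limit:
                return "Burstable"
    return "Guaranteed"


def worst_case_overcommit(node_allocatable: int, mem_request: int,
                          mem_limit: int) -> float:
    """Ratio of worst-case memory demand to allocatable memory when a node
    is packed with identical pods placed purely on their memory requests."""
    pods_that_fit = node_allocatable // mem_request
    return (pods_that_fit * mem_limit) / node_allocatable
```

For a hypothetical node with 8 GiB allocatable, a 512 MiB request and a 2 GiB limit let 16 pods fit, and those pods could collectively demand 32 GiB, four times what the node has. Setting requests equal to limits moves the pod into the Guaranteed class and brings that ratio down to 1.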

Preventing the Linux OOM Killer

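Sometimes the node runs out of memory so quickly that the kubelet has no time to evict pods gracefully. In these scenarios the Linux kernel's Out-Of-Memory (OOM) killer takes over and kills processes immediately, with no graceful termination. Its choice is not random: it picks victims by OOM score, and the kubelet adjusts that score by QoS class so that BestEffort containers are the most likely to be killed and Guaranteed containers the least. To avoid reaching this point, EKS node configurations should set the kubelet's --eviction-hard, --system-reserved, and --kube-reserved flags, typically through the user data in the node launch template. Explicitly carve out 1 GiB or 2 GiB of RAM for OS processes and the kubelet so that pods can never consume it.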
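
The effect of these reservations on schedulable memory follows the kubelet's Node Allocatable formula: allocatable equals capacity minus the system reservation, the Kubernetes daemon reservation (--kube-reserved, the kubelet's companion to --system-reserved), and the hard eviction threshold. The sketch below computes that value and renders the matching flags. The function names and all sizes are hypothetical examples, not recommendations, and how the flags reach the kubelet depends on the AMI and bootstrap method, so that step is left out.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def allocatable_memory(capacity: int, system_reserved: int,
                       kube_reserved: int, eviction_hard: int) -> int:
    """Memory (in bytes) the scheduler may hand out to pods, following the
    kubelet's Node Allocatable formula."""
    return capacity - system_reserved - kube_reserved - eviction_hard


def kubelet_memory_flags(system_reserved_mib: int, kube_reserved_mib: int,
                         eviction_hard_mib: int) -> str:
    """Render the memory reservations as kubelet command-line flags."""
    return " ".join([
        f"--system-reserved=memory={system_reserved_mib}Mi",
        f"--kube-reserved=memory={kube_reserved_mib}Mi",
        # The kubelet starts hard evictions once available memory on the
        # node falls below this threshold.
        f"--eviction-hard=memory.available<{eviction_hard_mib}Mi",
    ])
```

For a hypothetical 16 GiB instance with 1 GiB reserved for the OS, 1 GiB for Kubernetes daemons, and a 500 MiB hard eviction threshold, pods can be scheduled against 13836 MiB. Because the eviction flag contains a < character, the rendered string needs quoting when it is passed through a shell.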