My Pod Got OOMKilled And I Took It Personally
It was 3am. The pager went off. My pod had been killed—murdered, really—by the OOM killer. No warning. No goodbye. Just gone.
The Autopsy
resources:
  requests:
    memory: '128Mi'
  limits:
    memory: '128Mi' # hubris
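I had set requests equal to limits. “Best practice,” they said. “Predictable behavior,” they said. They didn’t mention that my app would casually allocate 200MB during a garbage collection spike. That is well past a 128Mi limit, and a container that exceeds its memory limit gets killed on the spot.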
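For the record, here is roughly what the coroner's report looks like. This is an illustrative sketch of the relevant slice of `kubectl get pod <pod-name> -o yaml`, not my actual output: the container name `app` and the restart count are made up. The `reason: OOMKilled` field and exit code 137 (128 + 9, meaning SIGKILL) are what Kubernetes normally reports for a container killed for exceeding its memory limit.

```yaml
# Illustrative excerpt of a pod's status after an OOM kill.
# "app" and restartCount are placeholder values, not real output.
status:
  containerStatuses:
    - name: app
      restartCount: 1
      lastState:
        terminated:
          reason: OOMKilled   # cause of death
          exitCode: 137       # 128 + 9 (SIGKILL)
```

If `lastState.terminated.reason` says anything other than `OOMKilled`, the memory limit is probably not your murderer.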
The Stages of Grief
- Denial: “The metrics must be wrong”
- Anger: “Who wrote this garbage collector”
- Bargaining: “What if I just set limits to 10Gi”
- Depression: stares at Grafana dashboard
- Acceptance: “Memory is a social construct”
The Fix
resources:
  requests:
    memory: '128Mi'
  limits:
    memory: '512Mi' # wisdom
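Sometimes you just need room to breathe. The scheduler still reserves 128Mi for the pod, but the container can now burst up to 512Mi before the OOM killer comes knocking.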
“The cloud is just someone else’s computer, waiting to kill your processes.” — Ancient DevOps Proverb