I Cached Null For 24 Hours And Called It A Feature
The bug report was simple: “User profiles are blank.” The cause was simpler: I had cached the absence of data with the same enthusiasm as actual data.
The Code
def get_user(user_id):
cached = redis.get(f"user:{user_id}")
if cached is not None:
return json.loads(cached)
user = db.query("SELECT * FROM users WHERE id = %s", user_id)
redis.setex(f"user:{user_id}", 86400, json.dumps(user)) # 24 hour TTL
return user
Spot the bug? When user is None, we cache "null" for 24 hours. Every subsequent request returns None instantly. Efficiently wrong.
The Timeline
- Hour 0: Deploy to production
- Hour 1: “Wow, cache hit rate is amazing!”
- Hour 3: First support ticket
- Hour 6: “Cache is working fine, must be a frontend issue”
- Hour 12: Existential realization
- Hour 24: Cache expires, users return, nobody learns anything
The Fix
if user is not None:
redis.setex(f"user:{user_id}", 86400, json.dumps(user))
One if statement. That’s it. That’s the whole fix.
The Lesson
Cache invalidation isn’t one of the two hard problems in computer science. Remembering to not cache garbage is.