I Cached Null For 24 Hours And Called It A Feature

The bug report was simple: “User profiles are blank.” The cause was simpler: I had cached the absence of data with the same enthusiasm as actual data.

The Code

def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)

    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    redis.setex(f"user:{user_id}", 86400, json.dumps(user))  # 24 hour TTL
    return user

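Spot the bug? When the query finds no row, user is None, and json.dumps(None) produces the string "null". We cache that string for 24 hours. On every subsequent request, redis.get returns it, the is not None check passes because a string is not None, and json.loads turns it back into None instantly. Efficiently wrong.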
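If the round trip is hard to picture, here is a minimal sketch of it using only the json module, with no Redis involved. The variable names are made up for illustration. It walks through what gets written, what the cache-hit check sees, and what the caller ends up with.

```python
import json

# What gets written to the cache when the database finds no row.
payload = json.dumps(None)

# What the cache-hit check sees on the next request: a real string, not None.
is_cache_hit = payload is not None

# What the caller finally receives after decoding the "hit".
decoded = json.loads(payload)

print(repr(payload), is_cache_hit, decoded)  # 'null' True None
```

A real Redis client would typically hand back bytes rather than a str, but the check behaves the same way: the value is present, so it counts as a hit.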

The Timeline

  • Hour 0: Deploy to production
  • Hour 1: “Wow, cache hit rate is amazing!”
  • Hour 3: First support ticket
  • Hour 6: “Cache is working fine, must be a frontend issue”
  • Hour 12: Existential realization
  • Hour 24: Cache expires, users return, nobody learns anything

The Fix

if user is not None:
    redis.setex(f"user:{user_id}", 86400, json.dumps(user))

One if statement. That’s it. That’s the whole fix.
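For context, here is roughly what the whole function looks like with the fix in place. This is a sketch, not the production code: FakeCache and USERS are hypothetical stand-ins for the Redis client and the users table so the example runs on its own, and the fake cache ignores the TTL entirely.

```python
import json


class FakeCache:
    """Hypothetical in-memory stand-in for the Redis client (TTL is ignored)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def setex(self, key, ttl_seconds, value):
        self.store[key] = value


USERS = {1: {"id": 1, "name": "Ada"}}  # hypothetical stand-in for the users table
cache = FakeCache()


def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    user = USERS.get(user_id)  # stands in for the SQL lookup
    if user is not None:
        # Only real users get cached; a miss stays a miss.
        cache.setex(key, 86400, json.dumps(user))
    return user
```

With this version, get_user(2) returns None and leaves no user:2 key behind, so if that user shows up in the table a minute later, the next call finds them instead of serving a day-old null.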

The Lesson

Cache invalidation isn’t one of the two hard problems in computer science. Remembering to not cache garbage is.