Posted Mar 8, 2026

By Joey

1. Core Principles (Read This First)

Before touching any cleanup script, internalize these rules:

Redis memory issues are almost always caused by retention mistakes, not leaks
KEYS * is forbidden in production
DEL is dangerous for large keys — UNLINK is preferred
Backups must come before cleanup
TTL is the only sustainable memory strategy

If Redis data can grow forever, it eventually will.

2. Redis in Kubernetes: What Makes It Tricky

Kubernetes adds unique failure modes:

Pods can restart unexpectedly → memory spikes repeat
RSS vs used_memory confusion in container limits
OOMKill by the kernel if Redis exceeds its container memory limit, and eviction by the kubelet under node memory pressure
PersistentVolumes hide real memory growth

Recommendation

Always set Redis pod memory limits
Always configure Redis maxmemory
Never rely on K8s eviction alone

3. Baseline Health Checks (Run These First)

Memory overview

redis-cli INFO memory

Key fields:

used_memory_human
used_memory_rss_human
mem_fragmentation_ratio
maxmemory
maxmemory_policy
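
To keep those fields in view without scrolling through the full INFO output, a small filter can help. The sketch below is one possible approach; the function name memory_summary is hypothetical, and it reads INFO text on stdin so it can be tried without a live server.

```shell
# Print only the INFO memory fields listed above, one "name value" pair per line.
# INFO output uses CRLF line endings, so carriage returns are stripped first.
memory_summary() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory_human" ||
    $1 == "used_memory_rss_human" ||
    $1 == "mem_fragmentation_ratio" ||
    $1 == "maxmemory" ||
    $1 == "maxmemory_policy" { printf "%-26s %s\n", $1, $2 }
  '
}

# Usage against a live instance:
#   redis-cli INFO memory | memory_summary
```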

Keyspace overview

redis-cli INFO keyspace

This tells you how many keys each database holds and how many of them have a TTL, not how large they are.
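
Since retention is the main theme here, it can be useful to turn the keyspace line into a count of keys that have no TTL at all. The following is a minimal sketch, assuming the usual "dbN:keys=...,expires=...,avg_ttl=..." line format; the function name keys_without_ttl is hypothetical.

```shell
# For each database line of INFO keyspace, print the database name, the total
# number of keys, and how many of those keys have no TTL (keys - expires).
keys_without_ttl() {
  tr -d '\r' | awk -F'[:=,]' '
    /^db[0-9]+:/ { printf "%s keys=%d no_ttl=%d\n", $1, $3, $3 - $5 }
  '
}

# Usage against a live instance:
#   redis-cli INFO keyspace | keys_without_ttl
```

A large and growing no_ttl number is usually the first place to look.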

4. The Silent Killers: Large Keys

Redis is fast — until you store huge values.

Common offenders:

stat:* (Sidekiq statistics)
Large JSON strings
Unbounded hashes or lists
Job payloads stored as strings

Find the biggest keys safely

  
DB=0
TOP=20

redis-cli -n "$DB" --scan \
| while read -r key; do
    bytes=$(redis-cli -n "$DB" MEMORY USAGE "$key" 2>/dev/null)
    [ -z "$bytes" ] && bytes=0
    printf "%12s  %s\n" "$bytes" "$key"
  done \
| sort -nr \
| head -n "$TOP"

Never use KEYS *.

5. Backups Before Cleanup (Non‑Negotiable)

Recommended: RDB snapshot

redis-cli BGSAVE

Locate and copy:

  
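RDB_DIR=$(redis-cli CONFIG GET dir | tail -n 1)
RDB_FILE=$(redis-cli CONFIG GET dbfilename | tail -n 1)
# BGSAVE is asynchronous: wait until INFO persistence shows rdb_bgsave_in_progress:0
cp "$RDB_DIR/$RDB_FILE" "/backup/redis/pre-cleanup-$(date +%F).rdb"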

Why this works:

Handles very large keys
Fast
Easy restore
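
BGSAVE returns immediately and writes the snapshot in the background, so copying the file too early can pick up the previous dump. One way to wait is to poll INFO persistence, sketched below. The function name bgsave_finished is hypothetical; it reads INFO text on stdin.

```shell
# Succeeds (exit 0) only when the INFO persistence text on stdin reports that no
# background save is running and the last one finished with status "ok".
bgsave_finished() {
  tr -d '\r' | awk -F: '
    $1 == "rdb_bgsave_in_progress" { running = $2 }
    $1 == "rdb_last_bgsave_status" { status = $2 }
    END { exit !(running == "0" && status == "ok") }
  '
}

# Usage against a live instance (polls once per second):
#   redis-cli BGSAVE
#   until redis-cli INFO persistence | bgsave_finished; do sleep 1; done
```

Note that this loop never ends if the save fails (status "err"), so in a real runbook it would need a timeout.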

6. Safe Cleanup Patterns

Rule: UNLINK > DEL

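DEL reclaims memory in the main thread, so deleting a large list, hash, or set blocks every other client until it finishes. UNLINK (Redis 4.0+) removes the key from the keyspace immediately and frees the memory in a background thread.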

Pattern 1: Delete keys by pattern

  
DB=0

redis-cli -n "$DB" --scan --pattern 'Course#linked_course_uuids_and_self*' \
| while read -r key; do
    redis-cli -n "$DB" UNLINK "$key"
  done

Pattern 2: Rate-limited cleanup (extra safe)

  
DB=0

redis-cli -n "$DB" --scan --pattern 'stat:*' \
| while read -r key; do
    redis-cli -n "$DB" UNLINK "$key"
    sleep 0.01
  done
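
Starting one redis-cli process per key gets slow with millions of keys. A possible alternative is to group keys into batched UNLINK commands and pipe them into a single redis-cli session, which reads commands from stdin. This is a sketch only: the function name emit_unlink_batches is hypothetical, and it assumes key names contain no whitespace or quote characters.

```shell
# Read key names on stdin (one per line) and print "UNLINK k1 k2 ..." commands,
# at most batch_size keys per command. Nothing is deleted by this function;
# it only prints the commands, which makes it usable as a dry run.
emit_unlink_batches() {
  batch_size=${1:-100}
  awk -v n="$batch_size" '
    NF == 0 { next }                      # skip blank lines
    {
      line = (line == "") ? "UNLINK " $0 : line " " $0
      count++
      if (count % n == 0) { print line; line = "" }
    }
    END { if (line != "") print line }
  '
}

# Dry run:
#   redis-cli -n "$DB" --scan --pattern 'stat:*' | emit_unlink_batches 100
# Apply:
#   redis-cli -n "$DB" --scan --pattern 'stat:*' | emit_unlink_batches 100 | redis-cli -n "$DB"
```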

7. Sidekiq: The Biggest Redis Memory Trap

Why Sidekiq causes Redis memory explosions

By default:

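Daily stat:* counters are kept for years, and the running totals never expire
Failed jobs are retried up to 25 times over roughly three weeks, so the retry set accumulates
Dead jobs are kept for up to six months (or 10,000 jobs)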

This is expected behavior — and dangerous without tuning.

Fix 1: Apply TTL to Sidekiq stats

config/initializers/sidekiq.rb

  
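Sidekiq.configure_server do |config|
  config.on(:startup) do
    Sidekiq.redis do |conn|
      retention_days = 30
      ttl = retention_days * 24 * 60 * 60

      # scan_each is the redis-rb API (Sidekiq 6 and earlier).
      # 'stat:*:*' matches the daily counters and skips the running totals.
      conn.scan_each(match: 'stat:*:*') do |key|
        # Only shorten a TTL; re-setting it on every boot would postpone expiry forever.
        current = conn.ttl(key)
        conn.expire(key, ttl) if current == -1 || current > ttl
      end
    end
  end
end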

Fix 2: Reduce retry pressure

  
class MyWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5
end

Disable retries for non-critical jobs:

  
sidekiq_options retry: false

Fix 3: Tune dead job retention

  
Sidekiq.configure_server do |config|
  # Sidekiq 6.5+ syntax; on older versions use config.options[:key] instead
  config[:dead_timeout_in_seconds] = 30 * 24 * 60 * 60 # keep dead jobs for 30 days
  config[:dead_max_jobs] = 2000
end

8. Redis maxmemory (K8s Safety Net)

Unbounded Redis is dangerous in containers.

Recommended baseline

redis-cli CONFIG SET maxmemory 512mb
redis-cli CONFIG SET maxmemory-policy allkeys-lru

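Choose a value below your pod memory limit, so Redis reaches maxmemory before the kernel OOM-kills the container. allkeys-lru suits a pure cache; on an instance that holds Sidekiq queues it can evict jobs, so use noeviction there and rely on TTLs and retention settings instead. CONFIG SET does not survive a pod restart, so also put both settings in redis.conf or the container arguments.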
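
How far below the pod limit is a judgment call. As a rough illustration, the sketch below derives maxmemory as a percentage of the limit. The function name suggest_maxmemory_mb and the 75% default are assumptions, not a Redis or Kubernetes rule.

```shell
# Suggest a maxmemory value (in MB) as a percentage of the pod memory limit.
# The 75% default is an assumption: it leaves headroom for fragmentation,
# client buffers, and the copy-on-write cost of BGSAVE.
suggest_maxmemory_mb() {
  limit_mb=$1
  percent=${2:-75}
  echo $(( limit_mb * percent / 100 ))
}

# Usage for a pod with resources.limits.memory of 1Gi (1024 MB):
#   redis-cli CONFIG SET maxmemory "$(suggest_maxmemory_mb 1024)mb"
```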

9. Fragmentation & RSS Troubleshooting

When RSS is much higher than used_memory

Run:

redis-cli MEMORY DOCTOR

If caused by historical peak:

Harmless
RSS will be reused

Try (only has an effect when Redis uses the jemalloc allocator):

redis-cli MEMORY PURGE

Guaranteed fix:

Rolling restart
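
Because the container limit applies to RSS rather than used_memory, the absolute gap between the two can be more telling than the ratio. A minimal sketch for computing it from INFO memory follows; the function name rss_overhead_mb is hypothetical.

```shell
# Print how many megabytes of RSS exceed used_memory, from INFO memory on stdin.
# Uses the raw byte counters, not the *_human variants.
rss_overhead_mb() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory"     { used = $2 }
    $1 == "used_memory_rss" { rss = $2 }
    END { printf "%d\n", (rss - used) / 1048576 }
  '
}

# Usage against a live instance:
#   redis-cli INFO memory | rss_overhead_mb
```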

10. Production Troubleshooting Checklist

Check eviction & hit rate

redis-cli INFO stats | grep -E 'evicted_keys|expired_keys|keyspace_hits|keyspace_misses'
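
The hit rate itself is not reported directly; it is keyspace_hits divided by the sum of hits and misses. A small sketch that computes it from the INFO stats text (the function name cache_hit_rate is hypothetical):

```shell
# Print the cache hit rate as a percentage, from INFO stats text on stdin.
# Prints "n/a" when there have been no lookups yet.
cache_hit_rate() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total > 0) { printf "%.1f%%\n", 100 * hits / total } else { print "n/a" }
    }
  '
}

# Usage against a live instance:
#   redis-cli INFO stats | cache_hit_rate
```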

Check retry & dead size

redis-cli ZCARD retry
redis-cli ZCARD dead

Check biggest keys again after cleanup

  
redis-cli --bigkeys

11. Kubernetes-Specific Recommendations

Use StatefulSet for Redis
Set resources.limits.memory
Avoid OOMKills by setting Redis maxmemory below the pod memory limit
Prefer managed Redis for critical workloads

12. Final Takeaways

Redis problems are predictable
TTL beats cleanup scripts
UNLINK beats DEL
Backups beat regret
Sidekiq defaults are not production-safe

If you fix retention, Redis becomes boring again — and boring is good.

This post is licensed under CC BY 4.0 by the author.