Most real-world systems are read-heavy. Twitter serves 600K reads per second but only 6K writes. Netflix handles millions of concurrent streams but catalog updates are rare. The read-to-write ratio is often 100:1 or higher.
The good news: reads are far easier to scale than writes. You can cache them, replicate them, and push them to the edge.
Caching Strategies in Depth
Cache-Aside (Lazy Loading)
The most common pattern. The application manages the cache directly.
```python
import redis, json

cache = redis.Redis(host='cache.internal', port=6379)

def get_user_profile(user_id: int) -> dict:
    cache_key = f"user:{user_id}:profile"
    # Step 1: Check cache
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)  # Cache HIT
    # Step 2: Cache MISS -- read from database
    profile = db.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    # Step 3: Populate cache for next time
    cache.setex(cache_key, 300, json.dumps(profile))  # TTL = 5 min
    return profile
```

Best for: read-heavy workloads where stale data is acceptable during the TTL window. Weakness: the first request is always slow (cold cache), and cache and DB can drift.
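A common companion to cache-aside bounds that drift: on every write, update the database and then delete (not update) the cached key, so the next read runs the miss path and repopulates it with fresh data. A minimal sketch, with the `db` and `cache` handles passed in explicitly so the write path is easy to test:

```python
def invalidate_on_write(db, cache, user_id: int, name: str):
    # 1. Write to the source of truth first
    db.execute("UPDATE users SET name = %s WHERE id = %s", (name, user_id))
    # 2. Delete the cached copy; the next read repopulates it.
    #    Deleting (rather than setting) avoids caching a value that
    #    might itself be stale if two updates race.
    cache.delete(f"user:{user_id}:profile")
```

With this in place, staleness is bounded by the tiny window between the two statements instead of the full TTL.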
Write-Through
Every write goes to both cache and database. Reads always hit the cache.
```python
def update_user_profile(user_id: int, updates: dict) -> dict:
    # Write to database
    db.execute("UPDATE users SET name=%s WHERE id=%s", (updates['name'], user_id))
    # Write to cache in the same operation
    updated = db.execute("SELECT * FROM users WHERE id=%s", (user_id,)).fetchone()
    cache.setex(f"user:{user_id}:profile", 300, json.dumps(updated))
    return updated
```

Best for: strong consistency between cache and DB. Weakness: higher write latency, and the cache fills with data that may never be read.
Write-Behind (Write-Back)
Write to cache immediately, asynchronously persist to database.
```python
def update_user_async(user_id: int, updates: dict):
    # Write to cache (fast -- user gets an immediate response)
    cache.setex(f"user:{user_id}:profile", 300, json.dumps(updates))
    # Queue the database write
    write_queue.put({"user_id": user_id, "data": updates})
    # A background worker drains the queue and writes to the DB in batches
```

Best for: low write latency and batched writes. Weakness: data loss if the cache crashes before the background flush, and complex error handling.
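The background worker mentioned in that last comment is where most of the complexity lives. One possible drain loop is sketched below; the `write_queue` (a plain `queue.Queue` here) and the injected batch-writer function are this sketch's assumptions, not a fixed API:

```python
import queue
import threading

write_queue = queue.Queue()

def drain_worker(write_batch_to_db, batch_size=100, poll_timeout=1.0):
    """Drain queued updates and persist them in batches.
    write_batch_to_db is injected: whatever function performs the
    bulk UPDATE against the real database."""
    while True:
        batch = []
        try:
            # Block until at least one write arrives
            batch.append(write_queue.get(timeout=poll_timeout))
        except queue.Empty:
            continue
        # Greedily take whatever else is already queued, up to batch_size
        while len(batch) < batch_size:
            try:
                batch.append(write_queue.get_nowait())
            except queue.Empty:
                break
        write_batch_to_db(batch)  # one round trip for many writes
        for _ in batch:
            write_queue.task_done()
```

If the process dies between the cache write and this flush, those updates are gone — exactly the data-loss weakness above. Durable variants queue into Kafka or a Redis stream instead of process memory.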
Read-Through
The cache itself loads data from the database on a miss. The application only talks to the cache — the loading logic is encapsulated.
```python
class ReadThroughCache:
    def __init__(self, cache_client, ttl=300):
        self.cache = cache_client
        self.ttl = ttl

    def get(self, key: str, loader_fn):
        """Get from cache. On miss, loader_fn fetches from source."""
        value = self.cache.get(key)
        if value:
            return json.loads(value)
        # Cache loads from DB on miss -- transparent to caller
        value = loader_fn()
        if value:
            self.cache.setex(key, self.ttl, json.dumps(value))
        return value

# Usage -- caching logic is hidden from application code
user_cache = ReadThroughCache(cache, ttl=300)
profile = user_cache.get(
    f"user:{user_id}",
    lambda: db.execute("SELECT * FROM users WHERE id=%s", (user_id,)).fetchone()
)
```

Best for: keeping caching logic out of application code. Weakness: less control over what gets cached and when.
The Five Questions of Caching
Before adding a cache to any system, answer these:
1. WHAT to cache? Hot data, expensive computations, static content
2. WHERE to cache? Browser, CDN, app server, distributed cache, DB cache
3. HOW to populate? Cache-aside, read-through, write-through, write-behind
4. WHEN to invalidate? TTL, event-based, version-based
5. WHAT on miss? Fetch from source, queue and wait, return stale

Cache Invalidation
TTL-Based
Every cached value expires after a fixed duration. Simple and self-cleaning.
Content Type       Suggested TTL    Reasoning
──────────────────────────────────────────────────────
Static assets      24 hours+        Rarely change
Product catalog    1 hour           Changes daily
User profile       5 minutes        Changes occasionally
Account balance    0 (no cache)     Must be real-time
Trending topics    30 seconds       Changes constantly

Event-Based
Invalidate on write events. Near-instant consistency, more complexity.
```python
def update_product(product_id: int, updates: dict):
    db.execute("UPDATE products SET ... WHERE id = %s", (product_id,))
    # Invalidate the product cache and all related caches
    cache.delete(f"product:{product_id}")
    cache.delete(f"category:{updates['category_id']}:products")
    # Notify other services via the event bus
    event_bus.publish("product.updated", {"product_id": product_id})
```

Version-Based
Change the cache key when data changes. Old entries expire naturally via TTL.
```python
def get_product_versioned(product_id: int) -> dict:
    version = cache.get(f"product:{product_id}:version") or "0"
    cache_key = f"product:{product_id}:v{version}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    product = db.get_product(product_id)
    cache.setex(cache_key, 3600, json.dumps(product))
    return product

def update_product(product_id: int, updates: dict):
    db.update_product(product_id, updates)
    cache.incr(f"product:{product_id}:version")  # Old key becomes unreachable
```

The Thundering Herd Problem
When a popular cache key expires, hundreds of requests simultaneously miss the cache and all hit the database.
```python
import time

def get_with_lock(key: str, loader_fn, ttl: int = 300):
    """Only one request fetches from the DB on a cache miss."""
    value = cache.get(key)
    if value:
        return json.loads(value)
    lock_key = f"lock:{key}"
    if cache.set(lock_key, "1", nx=True, ex=10):  # Acquire lock
        try:
            value = loader_fn()
            cache.setex(key, ttl, json.dumps(value))
            return value
        finally:
            cache.delete(lock_key)
    else:
        time.sleep(0.05)  # Losers wait, then retry from cache
        value = cache.get(key)
        return json.loads(value) if value else loader_fn()
```

Alternative: stale-while-revalidate. Serve the expired cached value immediately while refreshing it in the background. The user gets a fast response with slightly stale data.
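One way to sketch stale-while-revalidate on top of a plain get/setex cache: store a soft freshness deadline alongside the value, let the hard TTL (what the cache actually enforces) run much longer, and refresh in the background once the soft deadline passes. The soft/hard split and helper names here are this sketch's conventions, not a built-in Redis feature:

```python
import json, time, threading

def get_swr(cache, key, loader_fn, soft_ttl=300, hard_ttl=3600):
    """Serve stale data instantly while refreshing in the background."""
    raw = cache.get(key)
    if raw:
        entry = json.loads(raw)
        if time.time() < entry["fresh_until"]:
            return entry["value"]          # fresh hit
        # Stale hit: serve it now, refresh in the background
        threading.Thread(
            target=_refresh, args=(cache, key, loader_fn, soft_ttl, hard_ttl),
            daemon=True,
        ).start()
        return entry["value"]
    # True miss (nothing cached at all): load synchronously
    return _refresh(cache, key, loader_fn, soft_ttl, hard_ttl)

def _refresh(cache, key, loader_fn, soft_ttl, hard_ttl):
    value = loader_fn()
    entry = {"value": value, "fresh_until": time.time() + soft_ttl}
    cache.setex(key, hard_ttl, json.dumps(entry))
    return value
```

Because at most one caller per expiry serves the slow path, this also blunts the thundering herd: everyone else gets the stale copy while one background refresh runs.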
Read Replicas
Caching works for hot data, but not every query can be cached. Read replicas distribute database reads across multiple copies.
Read/Write Splitting
```python
import random

class DatabaseRouter:
    def __init__(self, primary_dsn, replica_dsns):
        self.primary = connect(primary_dsn)
        self.replicas = [connect(dsn) for dsn in replica_dsns]

    def get_read_connection(self):
        return random.choice(self.replicas)

    def get_write_connection(self):
        return self.primary

    def get_read_after_write_connection(self):
        """For reads that MUST see the latest write."""
        return self.primary

router = DatabaseRouter(
    primary_dsn="postgresql://primary:5432/app",
    replica_dsns=["postgresql://replica1:5432/app", "postgresql://replica2:5432/app"]
)
```

Replication Lag
The gap between a write on the primary and when it appears on a replica.
SYNCHRONOUS: 0ms lag, slower writes, replica failure blocks all writes
ASYNCHRONOUS: 10-1000ms lag, fast writes, stale reads possible
SEMI-SYNC: at least one replica confirms; a balance of both

Handling Read-Your-Writes Consistency
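The `get_replication_lag` helper used below isn't defined in this chapter. On PostgreSQL, one way to approximate it is to ask the standby how long ago it replayed its last transaction; this sketch assumes a DB-API-style connection whose `execute` returns a cursor:

```python
def get_replication_lag(replica_conn) -> float:
    """Approximate lag in seconds on a PostgreSQL standby.
    pg_last_xact_replay_timestamp() is the commit timestamp of the
    last transaction replayed on this replica; now() minus that is
    roughly how far behind the primary it is."""
    row = replica_conn.execute(
        "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))"
    ).fetchone()
    # NULL means this server has never replayed WAL (it is not a standby)
    return float(row[0]) if row[0] is not None else 0.0
```

Caveat: on an idle primary this number grows even when there is no real lag, because no new transactions are arriving to replay; production checks usually combine it with `pg_stat_replication` on the primary.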
```python
def handle_read_your_writes(user_id: int, last_write_ts: float):
    """Route to the primary if the user's last write was recent."""
    if time.time() - last_write_ts < 2.0:
        return router.get_write_connection()  # Read from the primary
    replica = router.get_read_connection()
    lag = get_replication_lag(replica)
    if lag > 1.0:
        return router.get_write_connection()  # Fall back to the primary
    return replica
```

CDN (Content Delivery Network)
A CDN caches content at edge locations close to users. Instead of every request crossing an ocean to your origin, the nearest CDN node serves the content.
What to Put on a CDN
STATIC (always CDN): images, CSS, JS, fonts. Cache forever; bust with a filename hash.
SEMI-STATIC (short TTL): product pages, blog posts. Cache 1-60 min; invalidate on update.
DYNAMIC (usually not CDN): user-specific data, authenticated responses.

Cache Headers
```python
import hashlib

@app.route("/api/products/<int:product_id>")
def get_product(product_id):
    product = db.get_product(product_id)
    response = make_response(json.dumps(product))
    # CDN caches for 5 min, browser caches for 1 min
    response.headers['Cache-Control'] = 'public, max-age=60, s-maxage=300'
    # Use a stable digest for the ETag -- Python's built-in hash()
    # is randomized per process, so it differs across servers
    body_hash = hashlib.md5(json.dumps(product, sort_keys=True).encode()).hexdigest()
    response.headers['ETag'] = f'"{body_hash}"'
    return response

@app.route("/api/users/<int:user_id>/profile")
def get_profile(user_id):
    response = make_response(json.dumps(db.get_user(user_id)))
    response.headers['Cache-Control'] = 'private, max-age=60'  # No CDN
    return response
```

Pull vs Push CDN
PULL: the CDN fetches from the origin on the first request. Simple, but cold on the first hit.
PUSH: you upload to the CDN proactively. No cold start, but more operational complexity.

Redis vs Memcached: When to Use Which
CHOOSE REDIS WHEN:
- You need data structures (sorted sets, lists, hashes, HyperLogLog)
- You need persistence (RDB snapshots or AOF)
- You need pub/sub for real-time features
- You need Lua scripting for atomic operations
- You need built-in replication and clustering
- Use cases: sessions, rate limiting, leaderboards, queues, pub/sub
CHOOSE MEMCACHED WHEN:
- You need simple key-value caching with the largest possible cache
- You need multi-threaded performance on multi-core machines
- You do not need persistence or data structures
- You want a simpler operational model
- Use cases: HTML fragment caching, DB query result caching

Redis Cluster for Horizontal Scaling
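The hash-slot arithmetic the cluster client relies on is simple enough to sketch in pure Python. Redis Cluster hashes keys with CRC16-CCITT (the XMODEM variant) modulo 16384 slots, and honors hash tags: if a key contains a non-empty `{...}` segment, only that segment is hashed, so related keys can be forced onto the same node. A sketch:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0 --
    # the variant Redis Cluster specifies for key hashing
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash tags: hash only the part inside the first non-empty {...},
    # so user:{1001}:profile and user:{1001}:session share a slot
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

Co-locating keys matters because multi-key operations (MGET, transactions) only work when all keys map to a single slot.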
```python
from redis.cluster import RedisCluster, ClusterNode

# redis-py's cluster client takes ClusterNode objects as seed nodes
rc = RedisCluster(
    startup_nodes=[
        ClusterNode("redis-1", 6379),
        ClusterNode("redis-2", 6379),
        ClusterNode("redis-3", 6379),
    ],
    decode_responses=True
)

# The cluster shards data using CRC16(key) % 16384 hash slots
rc.set("user:1001:session", json.dumps(session_data), ex=1800)

# Pipeline for batch operations (reduces round trips)
pipe = rc.pipeline()
for user_id in user_ids:
    pipe.get(f"user:{user_id}:profile")
results = pipe.execute()
```

Putting It All Together
How to scale reads for 100K requests per second:
LAYER 1: BROWSER CACHE → ~40% of requests never leave browser
LAYER 2: CDN → ~30% served from edge (5ms latency)
LAYER 3: REDIS → ~20% served from cache (1ms latency)
LAYER 4: READ REPLICAS → ~8% served from replicas (10ms latency)
LAYER 5: PRIMARY DATABASE → ~2% reach the primary (20ms latency)
Of 100,000 req/sec:
40,000 → browser | 30,000 → CDN | 20,000 → Redis
8,000 → replicas | 2,000 → primary DB

Key Takeaways
- Most systems are read-heavy (90%+ reads). Scaling reads is your first and most common challenge. The tools are caching, read replicas, and CDNs.
- Cache-aside is the most common pattern. Write-through gives stronger consistency at higher write latency. Write-behind gives low write latency but risks data loss.
- Cache invalidation is the hardest part. TTL-based is simple but allows stale data. Event-based gives near-instant consistency but adds complexity. Version-based avoids race conditions but wastes memory.
- The thundering herd problem is real. Use distributed locks or stale-while-revalidate to prevent cache misses from crushing your database.
- Read replicas scale database reads but introduce replication lag. Handle read-your-writes by routing recent writes to the primary.
- CDNs are the most effective read scaling tool for static and semi-static content. Use proper Cache-Control headers.
- Layer your caches: browser, CDN, application cache, read replicas, primary database. Each layer absorbs traffic so only a fraction reaches the DB.
