Database selection is one of the highest-impact decisions in system design. Get it right and your system scales naturally. Get it wrong and you spend years fighting your data layer — migrating mid-production, working around limitations, and building application-level workarounds for things the database should handle.
The common mistake is treating this as an ideological debate: SQL vs NoSQL. That framing is useless. The real question is: what are your access patterns, consistency requirements, and scale needs? Different databases optimize for different tradeoffs. The right choice depends on your problem, not on trends.
The Default: Start with PostgreSQL
If you have no specific reason to choose something else, use PostgreSQL. This is not a controversial opinion among engineers who have built production systems at scale.
Why PostgreSQL is the default:
- ACID transactions: Guarantees data integrity. No partial writes, and no anomalies like phantom reads when you run at SERIALIZABLE isolation.
- Rich query language: JOINs, subqueries, window functions, CTEs, full-text search.
- JSON support: The jsonb type gives you document-store flexibility within a relational database.
- Extensions: PostGIS for geospatial, pg_trgm for fuzzy search, TimescaleDB for time-series, pgvector for embeddings.
- Battle-tested: Powers Instagram, Discord, Notion, Supabase, and thousands of others.
- Vertical scaling: A single PostgreSQL instance on modern hardware (96 cores, 768GB RAM, NVMe SSDs) handles more load than most applications will ever need.
-- PostgreSQL can do a lot more than most people realize
-- JSONB: document-store-like flexibility within SQL
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Index specific JSON fields
CREATE INDEX idx_events_user ON events ((payload->>'user_id'));
CREATE INDEX idx_events_type_time ON events (event_type, created_at);
-- Query JSON fields like any other column
SELECT payload->>'user_id' AS user_id,
payload->>'action' AS action,
created_at
FROM events
WHERE event_type = 'page_view'
AND payload->>'country' = 'US'
AND created_at > NOW() - INTERVAL '1 hour'
ORDER BY created_at DESC
LIMIT 100;
-- Window functions — ranking, running totals, moving averages
SELECT
user_id,
order_total,
SUM(order_total) OVER (
PARTITION BY user_id
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
RANK() OVER (
PARTITION BY user_id
ORDER BY order_total DESC
) AS order_rank
FROM orders;
-- Full-text search (no Elasticsearch needed for many use cases)
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B');
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'distributed & systems') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
When PostgreSQL Is Not Enough
PostgreSQL’s limitations are real, and they show up at specific scale thresholds:
- Write throughput beyond a single machine: PostgreSQL does not natively shard. If you need 100K+ writes/second sustained, you need a distributed database (Cassandra, DynamoDB, CockroachDB).
- Sub-millisecond latency: PostgreSQL’s fastest reads are ~1ms. If you need sub-millisecond access (session lookups, rate limiting), use Redis.
- Massive horizontal read scale: Read replicas help, but there is a practical limit. At millions of reads/second, a purpose-built distributed store may be simpler.
- Complex relationship traversals: Finding “friends of friends of friends who liked X” is possible in SQL but hideously slow. Graph databases (Neo4j) do this in milliseconds.
Database Types Deep Dive
Document Stores (MongoDB, Firestore, CouchDB)
Document databases store data as JSON-like documents (BSON in MongoDB’s case). Each document can have a different structure — no fixed schema.
// MongoDB: A document in the "products" collection
{
"_id": ObjectId("64f1a2b3c4d5e6f7a8b9c0d1"),
"name": "Wireless Headphones",
"price": 79.99,
"category": "electronics",
"specs": {
"bluetooth": "5.3",
"battery_hours": 40,
"driver_size_mm": 40,
"noise_cancellation": true
},
"reviews": [
{ "user": "alice", "rating": 5, "text": "Great sound quality" },
{ "user": "bob", "rating": 4, "text": "Comfortable, good battery" }
],
"tags": ["wireless", "bluetooth", "anc"],
"created_at": ISODate("2024-01-15T10:30:00Z")
}
Strengths:
- Schema flexibility — evolve your data model without migrations
- Embedded documents eliminate many JOINs (denormalized by design)
- Horizontal scaling via built-in sharding
- Developer-friendly (your data looks like your code objects)
Weaknesses:
- No JOINs (or limited, slow ones). You must denormalize.
- Data duplication — updating “category name” across 10 million products requires updating each document
- Weaker transaction guarantees (MongoDB supports multi-document transactions, but they are slower than PostgreSQL's)
When to use: Content management, product catalogs, user profiles, event logging, any case where your schema is highly variable or you embed related data within a document.
# MongoDB: Common operations
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017")
db = client.myapp
# Insert
db.products.insert_one({
"name": "Keyboard",
"price": 149.99,
"specs": {"switches": "Cherry MX Brown", "layout": "TKL"}
})
# Query with nested field access
results = db.products.find({
"price": {"$lt": 200},
"specs.switches": {"$regex": "Cherry"}
}).sort("price", 1).limit(10)
# Aggregation pipeline — like SQL GROUP BY but more flexible
pipeline = [
{"$match": {"category": "electronics"}},
{"$group": {
"_id": "$specs.brand",
"avg_price": {"$avg": "$price"},
"count": {"$sum": 1}
}},
{"$sort": {"avg_price": -1}},
{"$limit": 10}
]
results = db.products.aggregate(pipeline)
Wide-Column Stores (Cassandra, DynamoDB, HBase)
Wide-column stores organize data by partition key and sort key. They are optimized for write-heavy workloads and can scale horizontally by adding nodes.
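To make the partition-key idea concrete, here is a toy Python sketch of how a key could be routed to a node: hash the key, then map the hash onto the cluster. The `partition_for` helper is hypothetical, not any driver's real API, and real clusters use a consistent-hash ring with virtual nodes so that membership changes move less data; modulo is just the simplest illustration.

```python
import hashlib

def partition_for(key: str, nodes: list) -> str:
    """Map a partition key to a node by hashing (simplified: modulo,
    not the consistent-hash ring real clusters use)."""
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[token % len(nodes)]

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Every row with the same partition key lands on the same node, which is
# why single-partition reads (e.g. one conversation's messages) are fast.
assert partition_for("conversation-42", nodes) == partition_for("conversation-42", nodes)
```

Because the hash scatters different keys across nodes, writes spread over the whole cluster while reads for one partition stay on one node.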
-- Cassandra data model for a messaging app
-- Partition key: conversation_id (all messages in a conversation co-located)
-- Clustering key: sent_at (messages sorted by time within a partition)
CREATE TABLE messages (
conversation_id UUID,
sent_at TIMESTAMP,
sender_id UUID,
message_text TEXT,
attachments LIST<TEXT>,
PRIMARY KEY (conversation_id, sent_at)
) WITH CLUSTERING ORDER BY (sent_at DESC);
The critical concept: you must model your data around your queries, not your entities. In a relational database, you normalize your data and write any query you want. In Cassandra, you denormalize and design your tables to answer specific queries efficiently.
# Cassandra: Model for access pattern
from datetime import datetime
from cassandra.cluster import Cluster
cluster = Cluster(['10.0.0.1', '10.0.0.2', '10.0.0.3'])
session = cluster.connect('messaging')
# Write a message — extremely fast, distributed across nodes
session.execute("""
INSERT INTO messages (conversation_id, sent_at, sender_id, message_text)
VALUES (%s, %s, %s, %s)
""", [conversation_id, datetime.utcnow(), sender_id, text])
# Read latest 50 messages in a conversation — partition key lookup + range scan
rows = session.execute("""
SELECT sender_id, message_text, sent_at
FROM messages
WHERE conversation_id = %s
ORDER BY sent_at DESC
LIMIT 50
""", [conversation_id])
# This is fast because all messages for a conversation are on the same partition
# and sorted by sent_at. It's a single-partition range scan.
# This is SLOW and should be avoided:
# SELECT * FROM messages WHERE sender_id = some_id
# (sender_id is not the partition key, so this needs ALLOW FILTERING and scans every node)
DynamoDB follows similar principles with a partition key and sort key, but is fully managed by AWS:
from datetime import datetime, timedelta
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Messages')
one_hour_ago = datetime.utcnow() - timedelta(hours=1)
# Write
table.put_item(Item={
'conversation_id': str(conversation_id),
'sent_at': datetime.utcnow().isoformat(),
'sender_id': str(sender_id),
'message_text': text
})
# Query — partition key + sort key range
response = table.query(
KeyConditionExpression='conversation_id = :conv AND sent_at > :since',
ExpressionAttributeValues={
':conv': str(conversation_id),
':since': one_hour_ago.isoformat()
},
ScanIndexForward=False, # Descending order
Limit=50
)
When to use: IoT sensor data, messaging (billions of messages), time-series logs, activity feeds, any workload that is write-heavy (100K+ writes/sec) with partition-key-based reads.
Key-Value Stores (Redis, Memcached)
The simplest data model: a key maps to a value. Redis extends this with rich data structures (strings, lists, sets, sorted sets, hashes, streams).
import json
import time
import redis
r = redis.Redis(host='localhost', port=6379)
# Session storage
r.setex("session:abc123", 3600, json.dumps({
"user_id": 42,
"email": "[email protected]",
"role": "admin"
}))
# Rate limiting with sliding window
def is_rate_limited(user_id, limit=100, window=60):
key = f"rate:{user_id}"
now = time.time()
pipe = r.pipeline()
pipe.zremrangebyscore(key, 0, now - window) # Remove old entries
pipe.zadd(key, {str(now): now}) # Add current request
pipe.zcard(key) # Count requests in window
pipe.expire(key, window) # Set TTL
results = pipe.execute()
count = results[2]
return count > limit
# Leaderboard with sorted sets
r.zadd("leaderboard:daily", {"player_a": 1500, "player_b": 2300, "player_c": 1800})
top_10 = r.zrevrange("leaderboard:daily", 0, 9, withscores=True)
# Pub/Sub for real-time notifications
r.publish("notifications:user:42", json.dumps({
"type": "new_message",
"from": "bob",
"preview": "Hey, are you coming to..."
}))
# Distributed lock
lock = r.lock("resource:order-123", timeout=10)
if lock.acquire(blocking=True, blocking_timeout=5):
    try:
        process_order("order-123")
    finally:
        lock.release()
When to use: Caching (database query results, API responses, session data), rate limiting, leaderboards, real-time counters, distributed locks, pub/sub messaging. Any use case where sub-millisecond latency matters and data fits in memory.
Do not use for: Primary data storage (Redis can persist to disk, but it is not designed as a primary database). Data that does not fit in memory. Complex queries.
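The sorted-set rate limiter shown earlier depends on a running Redis server. To show just the algorithm, here is an in-process sketch of the same sliding window; it is illustrative only, since a real deployment uses Redis precisely so the window is shared across app servers. The `now` parameter is an addition for testability, standing in for the scores Redis stores.

```python
import time
from collections import defaultdict

# Per-user request timestamps. In production this state lives in Redis so
# every app server shares the same window; a plain dict only works in-process.
_requests = defaultdict(list)

def is_rate_limited(user_id, limit=100, window=60.0, now=None):
    """Sliding-window check: keep timestamps inside the window, then count."""
    now = time.time() if now is None else now
    # Drop entries older than the window (zremrangebyscore in the Redis version)
    kept = [t for t in _requests[user_id] if t > now - window]
    kept.append(now)               # record this request (zadd)
    _requests[user_id] = kept
    return len(kept) > limit       # count requests in window (zcard)
```

The three steps map one-to-one onto the Redis pipeline: trim the window, add the request, count.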
Graph Databases (Neo4j, Amazon Neptune)
Graph databases store data as nodes (entities) and edges (relationships). Traversing relationships is their core operation, and they do it orders of magnitude faster than SQL JOINs for deep traversals.
// Neo4j Cypher query language
// Create social graph
CREATE (alice:User {name: "Alice", age: 30})
CREATE (bob:User {name: "Bob", age: 28})
CREATE (charlie:User {name: "Charlie", age: 32})
CREATE (alice)-[:FOLLOWS]->(bob)
CREATE (bob)-[:FOLLOWS]->(charlie)
CREATE (alice)-[:FOLLOWS]->(charlie)
// Find friends of friends (2-hop traversal) — fast in a graph DB
MATCH (me:User {name: "Alice"})-[:FOLLOWS]->()-[:FOLLOWS]->(fof:User)
WHERE NOT (me)-[:FOLLOWS]->(fof) AND me <> fof
RETURN fof.name
// Shortest path between two users
MATCH path = shortestPath(
(a:User {name: "Alice"})-[:FOLLOWS*..10]-(b:User {name: "Zara"})
)
RETURN path
// Recommendation: "Users who follow the same people as you also follow..."
MATCH (me:User {name: "Alice"})-[:FOLLOWS]->(common)<-[:FOLLOWS]-(similar)
WHERE me <> similar
MATCH (similar)-[:FOLLOWS]->(recommended)
WHERE NOT (me)-[:FOLLOWS]->(recommended) AND me <> recommended
RETURN recommended.name, COUNT(*) AS score
ORDER BY score DESC
LIMIT 10
The same “friends of friends” query in SQL requires self-joins and grows dramatically more expensive with each hop. In a graph database, each additional hop is just another set of near-constant-cost edge traversals.
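To see why traversal cost, not table size, dominates, here is a toy in-memory version of a FOLLOWS graph in Python (illustrative only, not how Neo4j stores data, and with a couple of users added beyond the Cypher example so the result is non-empty). Each hop touches only the edges of the current frontier, which is the same property, index-free adjacency, that a graph database provides on disk.

```python
# Adjacency-set representation of a small FOLLOWS graph.
follows = {
    "Alice":   {"Bob", "Charlie"},
    "Bob":     {"Charlie", "Dana"},
    "Charlie": {"Eve"},
    "Dana":    set(),
    "Eve":     set(),
}

def friends_of_friends(graph, me):
    """2-hop traversal: people my follows follow, minus myself and my follows."""
    fof = set()
    for friend in graph.get(me, set()):   # hop 1: my follows
        fof |= graph.get(friend, set())   # hop 2: their follows
    return fof - graph.get(me, set()) - {me}

print(sorted(friends_of_friends(follows, "Alice")))  # ['Dana', 'Eve']
```

In SQL the same answer needs a self-join per hop over the whole edge table; here (and in a graph store) the work is proportional only to the edges actually traversed.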
When to use: Social networks, fraud detection (suspicious transaction patterns), recommendation engines, knowledge graphs, network topology, any domain where relationships between entities are the primary query pattern.
Time-Series Databases (InfluxDB, TimescaleDB, QuestDB)
Optimized for append-heavy workloads with timestamps. They compress time-series data efficiently and support time-based aggregations natively.
-- TimescaleDB (PostgreSQL extension) — best of both worlds
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
host TEXT NOT NULL,
cpu_usage DOUBLE PRECISION,
memory_mb INTEGER,
disk_iops INTEGER
);
-- Convert to hypertable (time-series optimized)
SELECT create_hypertable('metrics', 'time');
-- Insert (same as regular PostgreSQL)
INSERT INTO metrics (time, host, cpu_usage, memory_mb, disk_iops)
VALUES (NOW(), 'web-01', 72.5, 4096, 1200);
-- Time-bucket aggregation — built-in function
SELECT
time_bucket('5 minutes', time) AS bucket,
host,
AVG(cpu_usage) AS avg_cpu,
MAX(cpu_usage) AS max_cpu,
AVG(memory_mb) AS avg_memory
FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket, host
ORDER BY bucket DESC;
-- Continuous aggregate (materialized view that auto-updates)
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS hour,
host,
AVG(cpu_usage) AS avg_cpu,
MAX(cpu_usage) AS max_cpu,
COUNT(*) AS sample_count
FROM metrics
GROUP BY hour, host;
-- Automatic data retention — drop data older than 90 days
SELECT add_retention_policy('metrics', INTERVAL '90 days');
When to use: Infrastructure monitoring, IoT sensor data, financial market data, application performance metrics, any workload that is timestamp-ordered, write-heavy, and queried by time range.
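The time_bucket aggregation used above is easy to picture in plain Python: truncate each timestamp down to its bucket boundary, then aggregate per bucket. This is a sketch of the concept, not of TimescaleDB's implementation.

```python
from collections import defaultdict
from statistics import mean

def time_bucket(width_s, epoch_s):
    """Truncate a Unix timestamp down to its bucket start (like time_bucket())."""
    return epoch_s - (epoch_s % width_s)

# (timestamp_seconds, cpu_usage) samples
samples = [(0, 10.0), (60, 20.0), (150, 40.0), (290, 30.0), (310, 50.0)]

buckets = defaultdict(list)
for ts, cpu in samples:
    buckets[time_bucket(300, ts)].append(cpu)  # 5-minute buckets

avg_cpu = {b: mean(vals) for b, vals in sorted(buckets.items())}
print(avg_cpu)  # {0: 25.0, 300: 50.0}
```

A time-series database does this truncation inside the storage engine, over compressed columnar chunks, which is why the SQL version stays fast at billions of rows.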
NewSQL: The Middle Ground
NewSQL databases aim to combine the best of both worlds: the SQL interface and ACID guarantees of relational databases with the horizontal scalability of NoSQL.
CockroachDB
Distributed SQL database that automatically shards and replicates data across nodes. It uses a Raft consensus protocol for strong consistency.
-- CockroachDB: Standard SQL with automatic distribution
CREATE TABLE orders (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
user_id UUID NOT NULL,
total DECIMAL(10, 2) NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ DEFAULT NOW(),
INDEX idx_user_orders (user_id, created_at DESC)
);
-- Multi-region configuration
ALTER DATABASE myapp SET PRIMARY REGION = 'us-east1';
ALTER DATABASE myapp ADD REGION 'eu-west1';
ALTER DATABASE myapp ADD REGION 'ap-southeast1';
-- Pin latency-sensitive tables to the user's region
ALTER TABLE user_profiles SET LOCALITY REGIONAL BY ROW;
Google Spanner
Google’s globally distributed database uses TrueTime (atomic clocks and GPS receivers in every datacenter) to bound clock uncertainty, which enables externally consistent transactions and lock-free snapshot reads across regions.
When to use NewSQL: You need SQL semantics + ACID transactions + horizontal write scaling. Financial systems that outgrow single-node PostgreSQL. Global applications that need consistent reads across regions.
Tradeoff: Higher operational complexity (CockroachDB) or vendor lock-in (Spanner). Query latency is higher than a single PostgreSQL node for simple queries because of the distributed consensus overhead.
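The consensus overhead is easy to quantify back-of-envelope: a Raft leader can commit once a majority of the cluster (itself included) has acknowledged, so commit latency is roughly the round-trip time to the slowest follower whose ack you still need. A sketch, assuming the leader commits as soon as a bare majority has acknowledged:

```python
def quorum_commit_latency_ms(follower_rtts_ms):
    """Raft-style commit: leader + floor(n/2) follower acks form a majority.
    Commit latency ~= RTT of the slowest follower ack we must wait for."""
    n = len(follower_rtts_ms) + 1       # cluster size including the leader
    remote_acks_needed = n // 2         # the leader's own "ack" is free
    return sorted(follower_rtts_ms)[remote_acks_needed - 1]

# 3-node cluster: one nearby follower (2 ms), one cross-region (80 ms).
# Majority = 2, so a single follower ack suffices and the fast one wins.
print(quorum_commit_latency_ms([2, 80]))           # 2

# 5-node cluster spread across regions: two follower acks are required.
print(quorum_commit_latency_ms([2, 40, 80, 120]))  # 40
```

This is why multi-region placement matters so much in CockroachDB and Spanner: if a majority of replicas sits far from the leader, every write pays that cross-region round trip.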
The Decision Framework
When choosing a database for a system design, work through these questions:
1. What are your access patterns?
Simple key-value lookups -> Redis / DynamoDB
CRUD with complex queries and JOINs -> PostgreSQL
Write-heavy time-ordered data -> Cassandra / TimescaleDB
Full-text search -> Elasticsearch
Graph traversals -> Neo4j
2. What consistency do you need?
Strong consistency (financial, inventory) -> PostgreSQL / CockroachDB
Tunable (mostly eventual, sometimes strong) -> Cassandra / DynamoDB
Eventual is fine (feeds, analytics) -> Cassandra / DynamoDB / MongoDB
3. What scale do you need?
< 10K requests/sec -> Single PostgreSQL handles this easily
10K-100K req/sec -> PostgreSQL with read replicas + Redis cache
100K+ writes/sec -> Cassandra / DynamoDB / CockroachDB
Millions of reads/sec -> Redis cluster + CDN
4. What is your team’s expertise?
This matters more than most engineers admit. A team that knows PostgreSQL deeply will build a more reliable system on PostgreSQL than on Cassandra, even if Cassandra is theoretically a better fit. The operational overhead of running a database you do not understand is enormous.
Polyglot Persistence — Using Multiple Databases
Most production systems use 2-4 databases, each for its strengths:
class ECommerceDataLayer:
"""
Typical e-commerce data architecture:
- PostgreSQL: Orders, users, inventory (ACID, source of truth)
- Redis: Sessions, cart, rate limiting, caching (speed)
- Elasticsearch: Product search, autocomplete (full-text)
- S3: Product images, invoices (blob storage)
"""
def __init__(self):
self.postgres = PostgresPool(dsn="...")
self.redis = RedisCluster(hosts=["..."])
self.elasticsearch = Elasticsearch(hosts=["..."])
self.s3 = boto3.client('s3')
def search_products(self, query, filters):
"""Elasticsearch for search, PostgreSQL for full details."""
# Fast search in Elasticsearch
es_results = self.elasticsearch.search(
index="products",
body={
"query": {
"bool": {
"must": {"multi_match": {
"query": query,
"fields": ["name^3", "description", "tags"]
}},
"filter": [
{"range": {"price": {"lte": filters.get("max_price", 99999)}}},
{"term": {"in_stock": True}}
]
}
}
}
)
# Get full product details from PostgreSQL (source of truth)
product_ids = [hit["_id"] for hit in es_results["hits"]["hits"]]
products = self.postgres.fetch_all(
"SELECT * FROM products WHERE id = ANY(%s)",
[product_ids]
)
return products
def place_order(self, user_id, cart_items):
"""PostgreSQL transaction for order placement."""
with self.postgres.transaction() as tx:
# Check and decrement inventory (atomic)
for item in cart_items:
rows_updated = tx.execute("""
UPDATE inventory
SET quantity = quantity - %s
WHERE product_id = %s AND quantity >= %s
""", [item.qty, item.product_id, item.qty])
if rows_updated == 0:
raise OutOfStockError(item.product_id)
# Create order
order_id = tx.execute("""
INSERT INTO orders (user_id, total, status)
VALUES (%s, %s, 'confirmed')
RETURNING id
""", [user_id, sum(i.price * i.qty for i in cart_items)])
# Clear cart from Redis
self.redis.delete(f"cart:{user_id}")
        return order_id
Common Mistakes in Database Selection
- Choosing based on hype. “Everyone uses MongoDB” is not a technical argument. Choose based on access patterns and requirements.
- Premature NoSQL. Many teams pick Cassandra or DynamoDB for a workload that PostgreSQL handles easily at their scale. You are introducing operational complexity for theoretical future scale that may never arrive.
- Using one database for everything. PostgreSQL is great, but it is not a cache, a search engine, or a message queue. Use the right tool for each job.
- Ignoring operational cost. Running a 9-node Cassandra cluster requires expertise. If your team has two backend engineers, a managed service (DynamoDB, Cloud SQL) may be worth the premium.
- Not modeling for access patterns. In Cassandra and DynamoDB, your table design must match your query patterns. Designing tables first and queries second (the relational mindset) leads to full table scans and hot partitions.
- Forgetting about data migration. Switching databases after 2 years of production data is painful. Consider future access patterns, not just today’s MVP.
Key Takeaways
- Default to PostgreSQL. It covers 80% of use cases with ACID transactions, JSONB flexibility, full-text search, and extensions. Only move to specialized databases when you hit a specific limitation.
- The question is not SQL vs NoSQL. It is: what data model and access pattern does your problem require? Relational for complex queries and joins. Document for flexible schemas. Wide-column for massive write throughput. Key-value for sub-millisecond caching. Graph for relationship traversals.
- Model your data around your queries. In relational databases, you normalize and query flexibly. In wide-column stores, you denormalize and design tables to answer specific queries. In document stores, you embed related data. The data model must match the access pattern.
- Use polyglot persistence. Most production systems use 2-3 databases: a relational database as the source of truth, Redis for caching and sessions, and optionally a search engine or specialized store. This is normal and expected.
- Consistency is a spectrum. PostgreSQL gives you strong consistency. Cassandra gives you eventual consistency (tunable to strong per-query). DynamoDB offers both. Choose based on what your application actually requires, not a blanket policy.
- Scale limits are higher than you think. A single PostgreSQL instance on good hardware handles millions of rows and thousands of queries per second. Add read replicas and a Redis cache, and you cover most applications. Do not over-engineer for scale you do not have.
- Operational cost is part of the decision. A database your team knows how to run, monitor, and debug is more valuable than a theoretically optimal choice that nobody understands. Factor in team expertise, managed service availability, and the cost of hiring specialists.
- NewSQL is real but niche. CockroachDB and Spanner solve the “I need SQL semantics at NoSQL scale” problem. If that is your problem, evaluate them. If you are not yet at that scale, you probably do not need them.
