software-design|March 21, 2026|14 min read

Deep Dive on Elasticsearch: A System Design Interview Perspective

TL;DR

Elasticsearch is a distributed search engine built on Apache Lucene. It stores data in inverted indexes across shards distributed over a cluster. Writes go through a near real-time pipeline: buffer → refresh (1s) → searchable segment → flush to disk. Reads use scatter-gather: the coordinating node fans out to all shards, each returns top-N doc IDs with scores, and the coordinator merges results globally. For system design interviews, know when to reach for ES (full-text search, log analytics, faceted navigation) vs when not to (primary data store, strong consistency, frequent updates). Key scaling levers: shard count, replica count, index lifecycle management, and denormalized document design.

“If you’re searching, filtering, or aggregating over large volumes of semi-structured data, Elasticsearch is probably the right tool. If you need ACID transactions, it’s definitely the wrong one.”

Elasticsearch appears in almost every system design interview that involves search — product search, log analytics, autocomplete, geospatial queries, or real-time dashboards. This article covers what you need to know: how it works under the hood, how to design around its strengths and weaknesses, and the tradeoffs that come up in interviews.

What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. It stores JSON documents, indexes them into inverted indexes, and provides fast full-text search, structured queries, and aggregations.

Core properties:

  • Schema-free (but schema-aware via mappings)
  • Near real-time search (~1 second delay after indexing)
  • Horizontally scalable via sharding
  • High availability via replication
  • REST API — every operation is an HTTP request

The inverted index is the data structure that makes Elasticsearch fast. Instead of scanning every document for a term, you look up the term and instantly get the list of documents containing it.
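The lookup-vs-scan idea can be sketched in a few lines of Python (a toy model; a real Lucene index also stores positions, term frequencies, and norms):

```python
from collections import defaultdict

# Toy inverted index: term -> set of doc IDs containing it.
def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick red fox",
}
index = build_inverted_index(docs)

# Term lookup is a dictionary access, not a scan over every document.
print(sorted(index["brown"]))                 # [1, 2]
# An AND query is just a set intersection of two postings lists.
print(sorted(index["quick"] & index["fox"]))  # [1, 3]
```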

Inverted Index

How Text Analysis Works

When a document is indexed, each text field goes through an analyzer pipeline:

"The Quick Brown Fox Jumps!"
        ↓ Character Filter (strip HTML, etc.)
"The Quick Brown Fox Jumps"
        ↓ Tokenizer (split on whitespace/punctuation)
["The", "Quick", "Brown", "Fox", "Jumps"]
        ↓ Token Filters (lowercase, stemming, stop words)
["quick", "brown", "fox", "jump"]

The same analyzer runs on the search query, so “Quick FOX” becomes ["quick", "fox"] — matching the indexed terms.
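A toy version of this pipeline, in Python. The stop-word list and the suffix-stripping stemmer are deliberately crude stand-ins for the real English analyzer, but the shape (tokenize, lowercase, drop stop words, stem) is the same:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and"}

def crude_stem(token):
    # Toy stemmer: strips a trailing "s" (real stemmers like Porter are far smarter).
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def analyze(text):
    # Tokenize on letter runs, lowercase, drop stop words, then stem.
    tokens = re.findall(r"[a-zA-Z]+", text)
    return [crude_stem(t.lower()) for t in tokens if t.lower() not in STOP_WORDS]

print(analyze("The Quick Brown Fox Jumps!"))  # ['quick', 'brown', 'fox', 'jump']
print(analyze("Quick FOX"))                   # ['quick', 'fox'], so the query matches
```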

// Custom analyzer example
PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer", "english_stop"]
        }
      },
      "filter": {
        "english_stemmer": { "type": "stemmer", "language": "english" },
        "english_stop": { "type": "stop", "stopwords": "_english_" }
      }
    }
  }
}

BM25: The Relevance Algorithm

Elasticsearch uses BM25 (Best Matching 25) to score documents. Two key factors:

  • Term Frequency (TF) — how often the term appears in the document (more = higher score, with diminishing returns)
  • Inverse Document Frequency (IDF) — how rare the term is across all documents (rarer = higher score)

score(q, d) = Σ_{t ∈ q} IDF(t) × [tf(t,d) × (k1 + 1)] / [tf(t,d) + k1 × (1 - b + b × |d| / avgdl)]

In interviews, you don’t need the formula — just know: rare terms matching frequently in short documents score highest.
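For intuition, here is a minimal single-term BM25 contribution in Python, using Lucene's non-negative IDF variant; the k1 and b defaults match Elasticsearch's (1.2 and 0.75):

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, n_docs, df, k1=1.2, b=0.75):
    """Single-term BM25 contribution, matching the formula above."""
    # Lucene's non-negative IDF: rarer terms (small df) score higher.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average docs are penalized via b.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

# A rare term (df=2 of 1000 docs) in a short doc beats a common term (df=800) in a long one.
rare_short = bm25_score(tf=3, doc_len=50, avg_doc_len=100, n_docs=1000, df=2)
common_long = bm25_score(tf=3, doc_len=300, avg_doc_len=100, n_docs=1000, df=800)
print(rare_short > common_long)  # True
```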

Cluster Architecture

An Elasticsearch cluster is a collection of nodes, each serving one or more roles.

Elasticsearch Architecture

Node Roles

Role | Purpose | Resource Profile
Master-eligible | Cluster state management, index creation/deletion, shard allocation | Low CPU/memory, high stability
Data | Stores shards, executes queries and aggregations | High memory, high disk I/O
Coordinating | Routes requests, merges results from shards | Moderate CPU/memory
Ingest | Pre-processing pipeline (transforms before indexing) | Moderate CPU
ML | Machine learning jobs | High CPU/memory

Interview tip: In a production cluster, always have dedicated master nodes (typically 3) to prevent the cluster state from being affected by heavy data operations.

# elasticsearch.yml — dedicated master node
# (node.roles replaces the legacy node.master/node.data booleans in ES 7.9+)
node.roles: [ master ]

# elasticsearch.yml — dedicated data node
node.roles: [ data ]

Shards: The Unit of Scale

An index is split into shards, each being a self-contained Lucene index.

// Create index with 5 primary shards and 1 replica
PUT /products
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

Shard routing formula:

shard_number = hash(_routing) % number_of_primary_shards

By default, _routing = _id. This means the number of primary shards is fixed at index creation — you can’t change it later without reindexing.
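The routing formula can be simulated directly. The hash function here is md5 rather than the murmur3 hash ES actually uses (an illustrative substitution), but it is enough to show why changing the shard count remaps most documents:

```python
import hashlib

def shard_for(doc_id, num_primary_shards):
    # Stand-in for ES's murmur3 routing hash (md5 here, purely illustrative).
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % num_primary_shards

doc_ids = [f"doc-{i}" for i in range(10)]
before = [shard_for(d, 5) for d in doc_ids]  # 5 primary shards
after = [shard_for(d, 6) for d in doc_ids]   # hypothetical resize to 6
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved}/10 documents would land on a different shard after resizing")
```

Because hash(id) % 5 and hash(id) % 6 disagree for most IDs, resizing in place would strand existing documents on the wrong shard, which is why a resize requires a full reindex.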

Index: products (5 primary shards, 1 replica each): P0/R0, P1/R1, P2/R2, P3/R3, P4/R4; each primary (P) has a replica (R) allocated on a different node.

Shard sizing guidelines:

  • Target 10-50 GB per shard (sweet spot for most workloads)
  • Each shard carries fixed overhead (heap, file handles, cluster-state entries), so don’t over-shard
  • Too few shards = can’t scale writes; too many = overhead and slow queries
  • Rule of thumb: number of shards = ceil(expected_data_size / 30GB)
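The rule of thumb above as a quick calculation (target shard size of 30 GB is the midpoint of the sweet spot):

```python
import math

def shards_for(expected_size_gb, target_shard_gb=30):
    # Size the index so each shard lands in the 10-50 GB sweet spot.
    return max(1, math.ceil(expected_size_gb / target_shard_gb))

print(shards_for(200))   # 7 shards for ~200 GB
print(shards_for(2000))  # 67 shards for ~2 TB
```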

Write Path: How Documents Get Indexed

Understanding the write path is critical for interview discussions about consistency and latency.

Write and Read Paths

Step-by-Step Write Flow

Client: POST /index/_doc
        ↓
Coordinating node routes the request to the primary shard
        ↓
Write to in-memory buffer + append to translog
        ↓
Replicate to replica shards (in parallel, before acknowledgment)
        ↓  refresh interval (1s)
New Lucene segment created: document NOW searchable
        ↓  flush trigger
Segments fsynced to disk, translog cleared

Key concepts:

  1. Translog (Transaction Log): Write-ahead log for durability. Every write is appended to the translog before acknowledgment. If the node crashes before flush, the translog replays on restart.

  2. Refresh: Every 1 second (configurable), the in-memory buffer is written to a new Lucene segment. This is when the document becomes searchable. This is why ES is “near real-time,” not real-time.

  3. Flush: Periodically, all in-memory segments are fsynced to disk and the translog is cleared. This is the Lucene “commit” operation.

  4. Merge: Background process that combines small segments into larger ones. Reduces the number of segments to search and reclaims space from deleted documents.
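The buffer/translog/segment interplay can be modeled with a toy shard class (the names and structure here are invented for the sketch, not the real internals):

```python
# Toy model of the NRT pipeline: writes land in a buffer (not yet searchable),
# refresh turns the buffer into a searchable segment, flush persists segments.
class ToyShard:
    def __init__(self):
        self.buffer = []    # in-memory indexing buffer
        self.translog = []  # write-ahead log for crash recovery
        self.segments = []  # searchable, immutable segments
        self.on_disk = []   # fsynced (committed) segments

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)  # durability before acknowledgment

    def refresh(self):
        if self.buffer:
            self.segments.append(list(self.buffer))  # new searchable segment
            self.buffer.clear()

    def flush(self):
        self.refresh()
        self.on_disk.extend(self.segments)  # fsync, i.e. the Lucene commit
        self.translog.clear()

    def search(self, pred):
        return [d for seg in self.segments for d in seg if pred(d)]

shard = ToyShard()
shard.index({"title": "laptop"})
print(shard.search(lambda d: d["title"] == "laptop"))  # [] (not searchable until refresh)
shard.refresh()
print(shard.search(lambda d: d["title"] == "laptop"))  # [{'title': 'laptop'}]
```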

// Force refresh (make recent writes searchable immediately)
POST /products/_refresh

// Adjust refresh interval (e.g., for bulk indexing)
PUT /products/_settings
{
  "index.refresh_interval": "30s"
}

// Disable refresh during bulk load, re-enable after
PUT /products/_settings { "index.refresh_interval": "-1" }
// ... bulk index ...
PUT /products/_settings { "index.refresh_interval": "1s" }

Write Consistency

ES uses a primary-backup model:

  1. Write hits the primary shard
  2. Primary replicates to all in-sync replica shards in parallel
  3. Write is acknowledged only after primary + all in-sync replicas confirm

You can control this with the wait_for_active_shards parameter:

// Require all shard copies (primary + replicas) to be active before the write proceeds
PUT /products/_doc/1?wait_for_active_shards=all
{ "name": "laptop" }

// Require only the primary to be active (the default; faster, less safe)
PUT /products/_doc/1?wait_for_active_shards=1
{ "name": "laptop" }

Read Path: How Search Works

Scatter-Gather Pattern

Every search query follows the scatter-gather (two-phase) pattern:

Query Phase:

  1. Coordinating node receives the search request
  2. Forwards to one copy (primary or replica) of every relevant shard
  3. Each shard runs the query locally, returns only doc IDs + scores (top N)
  4. Coordinating node merges results into a global top-N list

Fetch Phase:

  5. Coordinating node requests full documents for only the final top-N results
  6. Returns complete JSON response to the client
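The query-phase merge is essentially a k-way top-N merge over (doc ID, score) pairs, which a short sketch makes concrete:

```python
import heapq

def global_top_n(per_shard_hits, n):
    # Query phase: each shard returns its local top-N (doc_id, score) pairs.
    # The coordinator merges them into a single global top-N by score.
    merged = heapq.nlargest(
        n,
        (hit for hits in per_shard_hits for hit in hits),
        key=lambda hit: hit[1],
    )
    return [doc_id for doc_id, score in merged]

shard0 = [("a", 9.1), ("b", 3.2)]
shard1 = [("c", 7.5), ("d", 6.8)]
shard2 = [("e", 8.0), ("f", 1.1)]
print(global_top_n([shard0, shard1, shard2], 3))  # ['a', 'e', 'c']
```

Only these three winning doc IDs are fetched in full during the fetch phase; the losing candidates cost the shards only ID-and-score work.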

// Search with explanation of scoring
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "wireless headphones" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "lte": 100 } } }
      ]
    }
  },
  "sort": [
    { "_score": "desc" },
    { "created_at": "desc" }
  ],
  "from": 0,
  "size": 20
}

Interview insight: The filter context is critical for performance. Filters don’t calculate relevance scores and their results are cached in a bitset. Always put exact-match conditions (category, status, price range) in filter, not must.

Deep Pagination Problem

// Page 1: fast
GET /products/_search { "from": 0, "size": 20 }

// Page 500: SLOW — each shard must return 10,000 results
GET /products/_search { "from": 9980, "size": 20 }

Each shard must produce from + size results, the coordinator merges num_shards × (from + size) results. At deep offsets, this is extremely expensive.
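The cost asymmetry is easy to quantify, assuming the 5-shard index from earlier:

```python
def candidates_merged(num_shards, from_, size):
    # Each shard returns from_ + size candidates; the coordinator merges all of them.
    return num_shards * (from_ + size)

print(candidates_merged(5, 0, 20))     # page 1: 100 candidates
print(candidates_merged(5, 9980, 20))  # page 500: 50000 candidates
```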

Solutions:

// Solution 1: search_after (recommended for deep pagination)
// First page
GET /products/_search
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "_id": "asc" }]
}
// Next page — use the last document's sort values
GET /products/_search
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "_id": "asc" }],
  "search_after": ["2026-03-20T10:00:00", "abc123"]
}

// Solution 2: scroll API (for processing all results, e.g., export)
POST /products/_search?scroll=5m
{ "size": 1000, "query": { "match_all": {} } }
// Then: POST /_search/scroll { "scroll": "5m", "scroll_id": "..." }

// Solution 3: Point-in-time + search_after (best for concurrent readers)
POST /products/_pit?keep_alive=5m
// Returns: { "id": "pit_id_here" }
GET /_search
{
  "pit": { "id": "pit_id_here", "keep_alive": "5m" },
  "size": 20,
  "sort": [{ "created_at": "desc" }],
  "search_after": [...]
}

Mappings: Schema Design

Mappings define how fields are indexed and stored. Getting mappings right is essential for performance.

PUT /products
{
  "mappings": {
    "properties": {
      "title":       { "type": "text", "analyzer": "english" },
      "title_exact": { "type": "keyword" },
      "description": { "type": "text" },
      "price":       { "type": "float" },
      "category":    { "type": "keyword" },
      "tags":        { "type": "keyword" },
      "created_at":  { "type": "date" },
      "location":    { "type": "geo_point" },
      "in_stock":    { "type": "boolean" },
      "metadata":    { "type": "object", "enabled": false }
    }
  }
}

text vs keyword

 | text | keyword
Analyzed | Yes (tokenized, lowercased, stemmed) | No (stored as-is)
Use for | Full-text search | Exact match, sorting, aggregations
Example | Product description | Category, status, email
Query | match, multi_match | term, terms, range

Common pattern: Map the same field as both:

{
  "title": {
    "type": "text",
    "fields": {
      "raw": { "type": "keyword" }
    }
  }
}
// Search on "title", sort/aggregate on "title.raw"

Dynamic Mapping Pitfalls

By default, ES auto-creates mappings for new fields. This is dangerous in production:

// Disable dynamic mapping to prevent mapping explosion
PUT /products
{
  "mappings": {
    "dynamic": "strict",
    "properties": { ... }
  }
}
// "strict" = reject docs with unmapped fields
// "false"  = accept docs but don't index unmapped fields
// "true"   = auto-map everything (default, risky)

Aggregations: Analytics at Scale

Aggregations run alongside search queries to compute summaries over matched documents.

GET /orders/_search
{
  "size": 0,
  "query": {
    "range": { "created_at": { "gte": "2026-01-01" } }
  },
  "aggs": {
    "revenue_by_category": {
      "terms": { "field": "category", "size": 20 },
      "aggs": {
        "total_revenue": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } }
      }
    },
    "monthly_trend": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } }
      }
    }
  }
}

Three aggregation types:

Type | Purpose | Examples
Bucket | Group documents | terms, date_histogram, range, filters
Metric | Compute values | sum, avg, min, max, cardinality
Pipeline | Aggregate over other aggs | moving_avg, derivative, cumulative_sum

Performance tip: Aggregations use doc_values (column-oriented on-disk data structure) by default for keyword and numeric fields. Never aggregate on text fields — use a keyword sub-field.

Common System Design Patterns

1. Product Search (E-commerce)

Application → writes → Primary DB (PostgreSQL)
Primary DB → CDC / change events → Message Queue (Kafka) → Indexer Service → Elasticsearch
Application → search queries → Elasticsearch
Application → fetch by ID → Primary DB

Key design decisions:

  • Source of truth is the primary database, not ES
  • Use CDC (Change Data Capture) or event-driven indexing to keep ES in sync
  • Fetch full product details from the primary DB by ID after getting search results
  • Use filter context for faceted navigation (category, brand, price range)
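The indexer service's core loop reduces to an idempotent upsert/delete keyed by primary key. A minimal sketch (the event shape and the dict-as-index are illustrative assumptions, not a real Kafka or ES client):

```python
def apply_change_event(search_index, event):
    # Assumed CDC event shape: {"op": ..., "id": ..., "row": ...}.
    if event["op"] in ("insert", "update"):
        # Denormalize the row into a search document and upsert by primary key.
        search_index[event["id"]] = {
            "title": event["row"]["title"],
            "category": event["row"]["category"],
        }
    elif event["op"] == "delete":
        search_index.pop(event["id"], None)

index = {}
events = [
    {"op": "insert", "id": 1, "row": {"title": "Laptop", "category": "electronics"}},
    {"op": "update", "id": 1, "row": {"title": "Laptop Pro", "category": "electronics"}},
    {"op": "delete", "id": 1, "row": None},
]
for e in events:
    apply_change_event(index, e)
print(index)  # {} (insert, then update, then delete leaves the index empty)
```

Because each event overwrites or removes the document for its key, replaying events after a failure converges to the same state, which is what makes CDC-driven indexing robust.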

2. Log Analytics (ELK Stack)

App Server 1 (Filebeat) ─┐
App Server 2 (Filebeat) ─┼→ Logstash (parse + enrich) → Elasticsearch → Kibana (dashboards)
App Server 3 (Filebeat) ─┘

Index strategy for logs:

  • Use time-based indices: logs-2026.03.21
  • Configure Index Lifecycle Management (ILM) to automatically:
    • Hot → Warm → Cold → Delete
    • Roll over when the index hits a size or age threshold (the policy below rolls over at 50 GB or 1 day)
// ILM policy
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "50gb", "max_age": "1d" } } },
      "warm":   { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 }, "forcemerge": { "max_num_segments": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}

3. Autocomplete / Search-as-You-Type

// Mapping with completion suggester
PUT /products
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion"
      },
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}

// Index with suggestions
PUT /products/_doc/1
{
  "title": "Apple MacBook Pro 16-inch",
  "suggest": {
    "input": ["Apple MacBook Pro", "MacBook Pro 16", "laptop"],
    "weight": 10
  }
}

// Autocomplete query (extremely fast — uses FST, not inverted index)
GET /products/_search
{
  "suggest": {
    "product_suggest": {
      "prefix": "mac",
      "completion": {
        "field": "suggest",
        "size": 5,
        "fuzzy": { "fuzziness": 1 }
      }
    }
  }
}
4. Geospatial Search

// Find restaurants within 5km
GET /restaurants/_search
{
  "query": {
    "bool": {
      "must": { "match": { "cuisine": "italian" } },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": { "lat": 40.7128, "lon": -74.0060 }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": { "lat": 40.7128, "lon": -74.0060 },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

Scaling Strategies

Horizontal Scaling Decision Matrix

Performance problem | Remedy | Why it helps
Read latency | Add replicas | More copies to query
Write throughput | Add primary shards | More shards = more parallel writes
Data volume | Add data nodes | More disk + memory
Query complexity | Add coordinating nodes | Offload merge + aggregation work
Index design | Time-based indices + ILM rollover | Keeps index sizes bounded over time
Index design | Index aliases | Zero-downtime reindexing

Reindexing Without Downtime

You can’t change the number of primary shards or modify certain mapping properties on a live index. The solution is index aliases:

// Initial setup
PUT /products_v1 { ... }
POST /_aliases { "actions": [{ "add": { "index": "products_v1", "alias": "products" } }] }

// Application always queries "products" alias

// When you need to reindex:
PUT /products_v2 { ... }  // new settings/mappings
POST /_reindex { "source": { "index": "products_v1" }, "dest": { "index": "products_v2" } }

// Atomic swap
POST /_aliases {
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add":    { "index": "products_v2", "alias": "products" } }
  ]
}

Performance Optimization Checklist

Indexing Performance

// Bulk API — always use for batch indexing
POST /_bulk
{"index":{"_index":"products","_id":"1"}}
{"title":"Laptop","price":999}
{"index":{"_index":"products","_id":"2"}}
{"title":"Phone","price":699}

// Optimal bulk size: 5-15 MB per request

  • Use bulk API — never index one document at a time
  • Increase refresh interval to 30s during bulk loads
  • Disable replicas during initial load, re-enable after
  • Use _routing to co-locate related documents on the same shard
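A sketch of building size-bounded bulk bodies: NDJSON action/source pairs, starting a new request near the sweet spot (the chunking policy and names here are illustrative assumptions, not a real client):

```python
import json

def bulk_chunks(docs, index_name, max_bytes=10 * 1024 * 1024):
    # Yield NDJSON _bulk bodies, each kept under max_bytes (default ~10 MB,
    # inside the 5-15 MB sweet spot noted above).
    chunk, chunk_size = [], 0
    for doc_id, doc in docs:
        lines = (json.dumps({"index": {"_index": index_name, "_id": doc_id}})
                 + "\n" + json.dumps(doc) + "\n")
        if chunk and chunk_size + len(lines) > max_bytes:
            yield "".join(chunk)
            chunk, chunk_size = [], 0
        chunk.append(lines)
        chunk_size += len(lines)
    if chunk:
        yield "".join(chunk)

docs = [(str(i), {"title": f"product {i}"}) for i in range(3)]
body = next(bulk_chunks(docs, "products", max_bytes=1024))
print(body.count("\n"))  # 6 lines: one action line + one source line per document
```

Each yielded body can be POSTed to /_bulk as-is; keeping requests size-bounded rather than count-bounded avoids oversized payloads when documents vary in size.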

Query Performance

  • Use filter over must for exact matches — filters are cached and skip scoring
  • Avoid wildcards at the start of terms (*phone is slow; phone* is fine)
  • Limit from + size to under 10,000 — use search_after for deeper pagination
  • Use _source filtering to return only needed fields:
GET /products/_search
{
  "_source": ["title", "price", "category"],
  "query": { "match": { "title": "laptop" } }
}
  • Pre-warm field data for frequently aggregated fields
  • Avoid scripts in queries when possible — use runtime fields or indexed fields instead

Cluster Health

// Check cluster health
GET /_cluster/health
// green = all shards allocated
// yellow = primary shards OK, some replicas unassigned
// red = some primary shards unassigned (data loss risk!)

// Check shard allocation
GET /_cat/shards?v&h=index,shard,prirep,state,node

// Check node stats
GET /_nodes/stats/jvm,os,fs

Elasticsearch vs Alternatives

Feature | Elasticsearch | Apache Solr | Typesense | Meilisearch
Built on | Lucene | Lucene | Custom C++ | Custom Rust
Query language | Query DSL (JSON) | Solr Query / JSON | Simple API | Simple API
Scaling | Automatic shard rebalancing | Manual shard management | Built-in Raft | Single node (HA planned)
Aggregations | Very powerful | Powerful | Basic | Basic
Typo tolerance | Manual (fuzzy) | Manual | Built-in | Built-in
Operational complexity | High | High | Low | Very low
Best for | Large-scale search + analytics | Enterprise search | Developer-friendly search | Small-medium search

Interview Cheat Sheet

When to Use Elasticsearch

  • Full-text search across large document collections
  • Log/event analytics with aggregations
  • Autocomplete and search-as-you-type
  • Faceted navigation (e-commerce filters)
  • Geospatial queries
  • Real-time dashboards over time-series data

When NOT to Use Elasticsearch

  • Primary data store — ES is not ACID, data loss is possible
  • Frequent updates to the same document — each update = delete + reindex
  • Strong consistency requirements — ES is eventually consistent (near real-time)
  • Transactions — no multi-document transactions
  • Small datasets — overhead isn’t worth it under ~100K documents
  • Simple key-value lookups — use Redis or a database

Key Numbers to Know

Metric | Typical Value
Indexing to searchable | ~1 second (refresh interval)
Search latency (simple) | 5-50 ms
Search latency (complex aggs) | 50-500 ms
Shard size sweet spot | 10-50 GB
Max from + size | 10,000 (default)
Heap recommendation | 50% of RAM, max 31 GB
Shards per node | < 600 recommended

Interview Answer Template

When an interviewer asks you to design a search feature:

  1. Source of truth — primary DB (PostgreSQL/MySQL), not ES
  2. Sync mechanism — CDC via Kafka/Debezium, or application-level dual writes
  3. Index design — define mappings, analyzers, shard count based on data size
  4. Query design — bool query with must for relevance, filter for exact matches
  5. Pagination — search_after for deep pagination, never from > 10,000
  6. Scaling — replicas for read throughput, time-based indices for logs, ILM for lifecycle
  7. Failure handling — what happens if ES is down? Degrade gracefully, queue writes for replay

Wrapping Up

Elasticsearch is a powerful tool, but it’s not a database replacement. It’s a secondary index — a read-optimized, denormalized view of your data that enables search and analytics capabilities that traditional databases can’t match.

The mental model for system design interviews:

  • Write path: App → Primary DB → Event/CDC → ES (async, eventually consistent)
  • Read path: App → ES for search/filter/aggregate → Primary DB for full record (if needed)
  • Scaling: Shards for write parallelism, replicas for read throughput, ILM for data lifecycle
  • Tradeoff: Near real-time (not real-time), eventually consistent (not strongly consistent), optimized for reads (not writes)

Master these concepts and you’ll be able to confidently design any search-heavy system in an interview.
