software-design|March 21, 2026|14 min read

Deep Dive on Elasticsearch: A System Design Interview Perspective

TL;DR

Elasticsearch is a distributed search engine built on Apache Lucene. It stores data in inverted indexes across shards distributed over a cluster. Writes go through a near real-time pipeline: buffer → refresh (1s) → searchable segment → flush to disk. Reads use scatter-gather: the coordinating node fans out to all shards, each returns top-N doc IDs with scores, and the coordinator merges results globally. For system design interviews, know when to reach for ES (full-text search, log analytics, faceted navigation) vs when not to (primary data store, strong consistency, frequent updates). Key scaling levers: shard count, replica count, index lifecycle management, and denormalized document design.

“If you’re searching, filtering, or aggregating over large volumes of semi-structured data, Elasticsearch is probably the right tool. If you need ACID transactions, it’s definitely the wrong one.”

Elasticsearch appears in almost every system design interview that involves search — product search, log analytics, autocomplete, geospatial queries, or real-time dashboards. This article covers what you need to know: how it works under the hood, how to design around its strengths and weaknesses, and the tradeoffs that come up in interviews.

What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. It stores JSON documents, indexes them into inverted indexes, and provides fast full-text search, structured queries, and aggregations.

Core properties:

  • Schema-free (but schema-aware via mappings)
  • Near real-time search (~1 second delay after indexing)
  • Horizontally scalable via sharding
  • High availability via replication
  • REST API — every operation is an HTTP request

The inverted index is the data structure that makes Elasticsearch fast. Instead of scanning every document for a term, you look up the term and instantly get the list of documents containing it.
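The lookup-vs-scan idea can be sketched in a few lines of Python (a toy model; a real Lucene index also stores positions, term frequencies, and norms):

```python
from collections import defaultdict

# Toy inverted index: term -> set of doc IDs containing it.
def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick red fox",
}
index = build_inverted_index(docs)

# Term lookup is a dictionary access, not a scan over every document.
print(sorted(index["brown"]))                 # [1, 2]
# An AND query is just a set intersection of two postings lists.
print(sorted(index["quick"] & index["fox"]))  # [1, 3]
```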

Inverted Index

How Text Analysis Works

When a document is indexed, each text field goes through an analyzer pipeline:

"The Quick Brown Fox Jumps!"
        ↓ Character Filter (strip HTML, etc.)
"The Quick Brown Fox Jumps"
        ↓ Tokenizer (split on whitespace/punctuation)
["The", "Quick", "Brown", "Fox", "Jumps"]
        ↓ Token Filters (lowercase, stemming, stop words)
["quick", "brown", "fox", "jump"]

The same analyzer runs on the search query, so “Quick FOX” becomes ["quick", "fox"] — matching the indexed terms.
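A toy version of this pipeline, in Python. The stop-word list and the suffix-stripping stemmer are deliberately crude stand-ins for the real English analyzer, but the shape (tokenize, lowercase, drop stop words, stem) is the same:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and"}

def crude_stem(token):
    # Toy stemmer: strips a trailing "s" (real stemmers like Porter are far smarter).
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def analyze(text):
    # Tokenize on letter runs, lowercase, drop stop words, then stem.
    tokens = re.findall(r"[a-zA-Z]+", text)
    return [crude_stem(t.lower()) for t in tokens if t.lower() not in STOP_WORDS]

print(analyze("The Quick Brown Fox Jumps!"))  # ['quick', 'brown', 'fox', 'jump']
print(analyze("Quick FOX"))                   # ['quick', 'fox'], so the query matches
```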

// Custom analyzer example
PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer", "english_stop"]
        }
      },
      "filter": {
        "english_stemmer": { "type": "stemmer", "language": "english" },
        "english_stop": { "type": "stop", "stopwords": "_english_" }
      }
    }
  }
}

BM25: The Relevance Algorithm

Elasticsearch uses BM25 (Best Matching 25) to score documents. Two key factors:

  • Term Frequency (TF) — how often the term appears in the document (more = higher score, with diminishing returns)
  • Inverse Document Frequency (IDF) — how rare the term is across all documents (rarer = higher score)

score(q, d) = Σ_{t ∈ q} IDF(t) × [tf(t,d) × (k1 + 1)] / [tf(t,d) + k1 × (1 - b + b × |d| / avgdl)]

In interviews, you don’t need the formula — just know: rare terms matching frequently in short documents score highest.
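For intuition, here is a minimal single-term BM25 contribution in Python, using Lucene's non-negative IDF variant; the k1 and b defaults match Elasticsearch's (1.2 and 0.75):

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, n_docs, df, k1=1.2, b=0.75):
    """Single-term BM25 contribution, matching the formula above."""
    # Lucene's non-negative IDF: rarer terms (small df) score higher.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average docs are penalized via b.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

# A rare term (df=2 of 1000 docs) in a short doc beats a common term (df=800) in a long one.
rare_short = bm25_score(tf=3, doc_len=50, avg_doc_len=100, n_docs=1000, df=2)
common_long = bm25_score(tf=3, doc_len=300, avg_doc_len=100, n_docs=1000, df=800)
print(rare_short > common_long)  # True
```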

Cluster Architecture

An Elasticsearch cluster is a collection of nodes, each serving one or more roles.

Elasticsearch Architecture

Node Roles

Role | Purpose | Resource Profile
Master-eligible | Cluster state management, index creation/deletion, shard allocation | Low CPU/memory, high stability
Data | Stores shards, executes queries and aggregations | High memory, high disk I/O
Coordinating | Routes requests, merges results from shards | Moderate CPU/memory
Ingest | Pre-processing pipeline (transforms before indexing) | Moderate CPU
ML | Machine learning jobs | High CPU/memory

Interview tip: In a production cluster, always have dedicated master nodes (typically 3) to prevent the cluster state from being affected by heavy data operations.

# elasticsearch.yml — dedicated master node
# (node.roles replaces the legacy node.master/node.data booleans in ES 7.9+)
node.roles: [ master ]

# elasticsearch.yml — dedicated data node
node.roles: [ data ]

Shards: The Unit of Scale

An index is split into shards, each being a self-contained Lucene index.

// Create index with 5 primary shards and 1 replica
PUT /products
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

Shard routing formula:

shard_number = hash(_routing) % number_of_primary_shards

By default, _routing = _id. This means the number of primary shards is fixed at index creation — you can’t change it later without reindexing.
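The routing formula can be simulated directly. The hash function here is md5 rather than the murmur3 hash ES actually uses (an illustrative substitution), but it is enough to show why changing the shard count remaps most documents:

```python
import hashlib

def shard_for(doc_id, num_primary_shards):
    # Stand-in for ES's murmur3 routing hash (md5 here, purely illustrative).
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % num_primary_shards

doc_ids = [f"doc-{i}" for i in range(10)]
before = [shard_for(d, 5) for d in doc_ids]  # 5 primary shards
after = [shard_for(d, 6) for d in doc_ids]   # hypothetical resize to 6
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved}/10 documents would land on a different shard after resizing")
```

Because hash(id) % 5 and hash(id) % 6 disagree for most IDs, resizing in place would strand existing documents on the wrong shard, which is why a resize requires a full reindex.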

Index: products (5 primary shards, 1 replica each): P0/R0, P1/R1, P2/R2, P3/R3, P4/R4; each primary (P) has a replica (R) allocated on a different node.

Shard sizing guidelines:

  • Target 10-50 GB per shard (sweet spot for most workloads)
  • Each shard carries fixed overhead (heap, file handles, cluster-state entries), so don’t over-shard
  • Too few shards = can’t scale writes; too many = overhead and slow queries
  • Rule of thumb: number of shards = ceil(expected_data_size / 30GB)
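The rule of thumb above as a quick calculation (target shard size of 30 GB is the midpoint of the sweet spot):

```python
import math

def shards_for(expected_size_gb, target_shard_gb=30):
    # Size the index so each shard lands in the 10-50 GB sweet spot.
    return max(1, math.ceil(expected_size_gb / target_shard_gb))

print(shards_for(200))   # 7 shards for ~200 GB
print(shards_for(2000))  # 67 shards for ~2 TB
```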

Write Path: How Documents Get Indexed

Understanding the write path is critical for interview discussions about consistency and latency.

Write and Read Paths

Step-by-Step Write Flow

Client: POST /index/_doc
        ↓
Coordinating node routes the request to the primary shard
        ↓
Write to in-memory buffer + append to translog
        ↓
Replicate to replica shards (in parallel, before acknowledgment)
        ↓  refresh interval (1s)
New Lucene segment created: document NOW searchable
        ↓  flush trigger
Segments fsynced to disk, translog cleared

Key concepts:

  1. Translog (Transaction Log): Write-ahead log for durability. Every write is appended to the translog before acknowledgment. If the node crashes before flush, the translog replays on restart.

  2. Refresh: Every 1 second (configurable), the in-memory buffer is written to a new Lucene segment. This is when the document becomes searchable. This is why ES is “near real-time,” not real-time.

  3. Flush: Periodically, all in-memory segments are fsynced to disk and the translog is cleared. This is the Lucene “commit” operation.

  4. Merge: Background process that combines small segments into larger ones. Reduces the number of segments to search and reclaims space from deleted documents.
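The buffer/translog/segment interplay can be modeled with a toy shard class (the names and structure here are invented for the sketch, not the real internals):

```python
# Toy model of the NRT pipeline: writes land in a buffer (not yet searchable),
# refresh turns the buffer into a searchable segment, flush persists segments.
class ToyShard:
    def __init__(self):
        self.buffer = []    # in-memory indexing buffer
        self.translog = []  # write-ahead log for crash recovery
        self.segments = []  # searchable, immutable segments
        self.on_disk = []   # fsynced (committed) segments

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)  # durability before acknowledgment

    def refresh(self):
        if self.buffer:
            self.segments.append(list(self.buffer))  # new searchable segment
            self.buffer.clear()

    def flush(self):
        self.refresh()
        self.on_disk.extend(self.segments)  # fsync, i.e. the Lucene commit
        self.translog.clear()

    def search(self, pred):
        return [d for seg in self.segments for d in seg if pred(d)]

shard = ToyShard()
shard.index({"title": "laptop"})
print(shard.search(lambda d: d["title"] == "laptop"))  # [] (not searchable until refresh)
shard.refresh()
print(shard.search(lambda d: d["title"] == "laptop"))  # [{'title': 'laptop'}]
```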

// Force refresh (make recent writes searchable immediately)
POST /products/_refresh

// Adjust refresh interval (e.g., for bulk indexing)
PUT /products/_settings
{
  "index.refresh_interval": "30s"
}

// Disable refresh during bulk load, re-enable after
PUT /products/_settings { "index.refresh_interval": "-1" }
// ... bulk index ...
PUT /products/_settings { "index.refresh_interval": "1s" }

Write Consistency

ES uses a primary-backup model:

  1. Write hits the primary shard
  2. Primary replicates to all in-sync replica shards in parallel
  3. Write is acknowledged only after primary + all in-sync replicas confirm

You can control this with the wait_for_active_shards parameter:

// Require all shard copies (primary + replicas) to be active before the write proceeds
PUT /products/_doc/1?wait_for_active_shards=all
{ "name": "laptop" }

// Require only the primary to be active (the default; faster, less safe)
PUT /products/_doc/1?wait_for_active_shards=1
{ "name": "laptop" }

Read Path: How Search Works

Scatter-Gather Pattern

Every search query follows the scatter-gather (two-phase) pattern:

Query Phase:

  1. Coordinating node receives the search request
  2. Forwards to one copy (primary or replica) of every relevant shard
  3. Each shard runs the query locally, returns only doc IDs + scores (top N)
  4. Coordinating node merges results into a global top-N list

Fetch Phase:

  5. Coordinating node requests full documents for only the final top-N results
  6. Returns complete JSON response to the client
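The query-phase merge is essentially a k-way top-N merge over (doc ID, score) pairs, which a short sketch makes concrete:

```python
import heapq

def global_top_n(per_shard_hits, n):
    # Query phase: each shard returns its local top-N (doc_id, score) pairs.
    # The coordinator merges them into a single global top-N by score.
    merged = heapq.nlargest(
        n,
        (hit for hits in per_shard_hits for hit in hits),
        key=lambda hit: hit[1],
    )
    return [doc_id for doc_id, score in merged]

shard0 = [("a", 9.1), ("b", 3.2)]
shard1 = [("c", 7.5), ("d", 6.8)]
shard2 = [("e", 8.0), ("f", 1.1)]
print(global_top_n([shard0, shard1, shard2], 3))  # ['a', 'e', 'c']
```

Only these three winning doc IDs are fetched in full during the fetch phase; the losing candidates cost the shards only ID-and-score work.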

// Search with explanation of scoring
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "wireless headphones" } }
      ],
      "filter": [
        { "term": { "category": "electronics" } },
        { "range": { "price": { "lte": 100 } } }
      ]
    }
  },
  "sort": [
    { "_score": "desc" },
    { "created_at": "desc" }
  ],
  "from": 0,
  "size": 20
}

Interview insight: The filter context is critical for performance. Filters don’t calculate relevance scores and their results are cached in a bitset. Always put exact-match conditions (category, status, price range) in filter, not must.

Deep Pagination Problem

// Page 1: fast
GET /products/_search { "from": 0, "size": 20 }

// Page 500: SLOW — each shard must return 10,000 results
GET /products/_search { "from": 9980, "size": 20 }

Each shard must produce from + size results, the coordinator merges num_shards × (from + size) results. At deep offsets, this is extremely expensive.
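The cost asymmetry is easy to quantify, assuming the 5-shard index from earlier:

```python
def candidates_merged(num_shards, from_, size):
    # Each shard returns from_ + size candidates; the coordinator merges all of them.
    return num_shards * (from_ + size)

print(candidates_merged(5, 0, 20))     # page 1: 100 candidates
print(candidates_merged(5, 9980, 20))  # page 500: 50000 candidates
```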

Solutions:

// Solution 1: search_after (recommended for deep pagination)
// First page
GET /products/_search
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "_id": "asc" }]
}
// Next page — use the last document's sort values
GET /products/_search
{
  "size": 20,
  "sort": [{ "created_at": "desc" }, { "_id": "asc" }],
  "search_after": ["2026-03-20T10:00:00", "abc123"]
}

// Solution 2: scroll API (for processing all results, e.g., export)
POST /products/_search?scroll=5m
{ "size": 1000, "query": { "match_all": {} } }
// Then: POST /_search/scroll { "scroll": "5m", "scroll_id": "..." }

// Solution 3: Point-in-time + search_after (best for concurrent readers)
POST /products/_pit?keep_alive=5m
// Returns: { "id": "pit_id_here" }
GET /_search
{
  "pit": { "id": "pit_id_here", "keep_alive": "5m" },
  "size": 20,
  "sort": [{ "created_at": "desc" }],
  "search_after": [...]
}

Mappings: Schema Design

Mappings define how fields are indexed and stored. Getting mappings right is essential for performance.

PUT /products
{
  "mappings": {
    "properties": {
      "title":       { "type": "text", "analyzer": "english" },
      "title_exact": { "type": "keyword" },
      "description": { "type": "text" },
      "price":       { "type": "float" },
      "category":    { "type": "keyword" },
      "tags":        { "type": "keyword" },
      "created_at":  { "type": "date" },
      "location":    { "type": "geo_point" },
      "in_stock":    { "type": "boolean" },
      "metadata":    { "type": "object", "enabled": false }
    }
  }
}

text vs keyword

 | text | keyword
Analyzed | Yes (tokenized, lowercased, stemmed) | No (stored as-is)
Use for | Full-text search | Exact match, sorting, aggregations
Example | Product description | Category, status, email
Query | match, multi_match | term, terms, range

Common pattern: Map the same field as both:

{
  "title": {
    "type": "text",
    "fields": {
      "raw": { "type": "keyword" }
    }
  }
}
// Search on "title", sort/aggregate on "title.raw"

Dynamic Mapping Pitfalls

By default, ES auto-creates mappings for new fields. This is dangerous in production:

// Disable dynamic mapping to prevent mapping explosion
PUT /products
{
  "mappings": {
    "dynamic": "strict",
    "properties": { ... }
  }
}
// "strict" = reject docs with unmapped fields
// "false"  = accept docs but don't index unmapped fields
// "true"   = auto-map everything (default, risky)

Aggregations: Analytics at Scale

Aggregations run alongside search queries to compute summaries over matched documents.

GET /orders/_search
{
  "size": 0,
  "query": {
    "range": { "created_at": { "gte": "2026-01-01" } }
  },
  "aggs": {
    "revenue_by_category": {
      "terms": { "field": "category", "size": 20 },
      "aggs": {
        "total_revenue": { "sum": { "field": "amount" } },
        "avg_order_value": { "avg": { "field": "amount" } }
      }
    },
    "monthly_trend": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } }
      }
    }
  }
}

Three aggregation types:

Type | Purpose | Examples
Bucket | Group documents | terms, date_histogram, range, filters
Metric | Compute values | sum, avg, min, max, cardinality
Pipeline | Aggregate over other aggs | moving_avg, derivative, cumulative_sum

Performance tip: Aggregations use doc_values (column-oriented on-disk data structure) by default for keyword and numeric fields. Never aggregate on text fields — use a keyword sub-field.

Common System Design Patterns

1. Product Search (E-commerce)

Application → writes → Primary DB (PostgreSQL)
Primary DB → CDC / change events → Message Queue (Kafka) → Indexer Service → Elasticsearch
Application → search queries → Elasticsearch
Application → fetch by ID → Primary DB

Key design decisions:

  • Source of truth is the primary database, not ES
  • Use CDC (Change Data Capture) or event-driven indexing to keep ES in sync
  • Fetch full product details from the primary DB by ID after getting search results
  • Use filter context for faceted navigation (category, brand, price range)
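The indexer service's core loop reduces to an idempotent upsert/delete keyed by primary key. A minimal sketch (the event shape and the dict-as-index are illustrative assumptions, not a real Kafka or ES client):

```python
def apply_change_event(search_index, event):
    # Assumed CDC event shape: {"op": ..., "id": ..., "row": ...}.
    if event["op"] in ("insert", "update"):
        # Denormalize the row into a search document and upsert by primary key.
        search_index[event["id"]] = {
            "title": event["row"]["title"],
            "category": event["row"]["category"],
        }
    elif event["op"] == "delete":
        search_index.pop(event["id"], None)

index = {}
events = [
    {"op": "insert", "id": 1, "row": {"title": "Laptop", "category": "electronics"}},
    {"op": "update", "id": 1, "row": {"title": "Laptop Pro", "category": "electronics"}},
    {"op": "delete", "id": 1, "row": None},
]
for e in events:
    apply_change_event(index, e)
print(index)  # {} (insert, then update, then delete leaves the index empty)
```

Because each event overwrites or removes the document for its key, replaying events after a failure converges to the same state, which is what makes CDC-driven indexing robust.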

2. Log Analytics (ELK Stack)

App Server 1 (Filebeat) ─┐
App Server 2 (Filebeat) ─┼→ Logstash (parse + enrich) → Elasticsearch → Kibana (dashboards)
App Server 3 (Filebeat) ─┘

Index strategy for logs:

  • Use time-based indices: logs-2026.03.21
  • Configure Index Lifecycle Management (ILM) to automatically:
    • Hot → Warm → Cold → Delete
    • Roll over when the index hits a size or age threshold (the policy below rolls over at 50 GB or 1 day)
// ILM policy
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_size": "50gb", "max_age": "1d" } } },
      "warm":   { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 }, "forcemerge": { "max_num_segments": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}

3. Autocomplete / Search-as-You-Type

// Mapping with completion suggester
PUT /products
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion"
      },
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}

// Index with suggestions
PUT /products/_doc/1
{
  "title": "Apple MacBook Pro 16-inch",
  "suggest": {
    "input": ["Apple MacBook Pro", "MacBook Pro 16", "laptop"],
    "weight": 10
  }
}

// Autocomplete query (extremely fast — uses FST, not inverted index)
GET /products/_search
{
  "suggest": {
    "product_suggest": {
      "prefix": "mac",
      "completion": {
        "field": "suggest",
        "size": 5,
        "fuzzy": { "fuzziness": 1 }
      }
    }
  }
}
4. Geospatial Search

// Find restaurants within 5km
GET /restaurants/_search
{
  "query": {
    "bool": {
      "must": { "match": { "cuisine": "italian" } },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": { "lat": 40.7128, "lon": -74.0060 }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": { "lat": 40.7128, "lon": -74.0060 },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

Scaling Strategies

Horizontal Scaling Decision Matrix

Performance problem | Remedy | Why it helps
Read latency | Add replicas | More copies to query
Write throughput | Add primary shards | More shards = more parallel writes
Data volume | Add data nodes | More disk + memory
Query complexity | Add coordinating nodes | Offload merge + aggregation work
Index design | Time-based indices + ILM rollover | Keeps index sizes bounded over time
Index design | Index aliases | Zero-downtime reindexing

Reindexing Without Downtime

You can’t change the number of primary shards or modify certain mapping properties on a live index. The solution is index aliases:

// Initial setup
PUT /products_v1 { ... }
POST /_aliases { "actions": [{ "add": { "index": "products_v1", "alias": "products" } }] }

// Application always queries "products" alias

// When you need to reindex:
PUT /products_v2 { ... }  // new settings/mappings
POST /_reindex { "source": { "index": "products_v1" }, "dest": { "index": "products_v2" } }

// Atomic swap
POST /_aliases {
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add":    { "index": "products_v2", "alias": "products" } }
  ]
}

Performance Optimization Checklist

Indexing Performance

// Bulk API — always use for batch indexing
POST /_bulk
{"index":{"_index":"products","_id":"1"}}
{"title":"Laptop","price":999}
{"index":{"_index":"products","_id":"2"}}
{"title":"Phone","price":699}

// Optimal bulk size: 5-15 MB per request

  • Use bulk API — never index one document at a time
  • Increase refresh interval to 30s during bulk loads
  • Disable replicas during initial load, re-enable after
  • Use _routing to co-locate related documents on the same shard
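A sketch of building size-bounded bulk bodies: NDJSON action/source pairs, starting a new request near the sweet spot (the chunking policy and names here are illustrative assumptions, not a real client):

```python
import json

def bulk_chunks(docs, index_name, max_bytes=10 * 1024 * 1024):
    # Yield NDJSON _bulk bodies, each kept under max_bytes (default ~10 MB,
    # inside the 5-15 MB sweet spot noted above).
    chunk, chunk_size = [], 0
    for doc_id, doc in docs:
        lines = (json.dumps({"index": {"_index": index_name, "_id": doc_id}})
                 + "\n" + json.dumps(doc) + "\n")
        if chunk and chunk_size + len(lines) > max_bytes:
            yield "".join(chunk)
            chunk, chunk_size = [], 0
        chunk.append(lines)
        chunk_size += len(lines)
    if chunk:
        yield "".join(chunk)

docs = [(str(i), {"title": f"product {i}"}) for i in range(3)]
body = next(bulk_chunks(docs, "products", max_bytes=1024))
print(body.count("\n"))  # 6 lines: one action line + one source line per document
```

Each yielded body can be POSTed to /_bulk as-is; keeping requests size-bounded rather than count-bounded avoids oversized payloads when documents vary in size.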

Query Performance

  • Use filter over must for exact matches — filters are cached and skip scoring
  • Avoid wildcards at the start of terms (*phone is slow; phone* is fine)
  • Limit from + size to under 10,000 — use search_after for deeper pagination
  • Use _source filtering to return only needed fields:
GET /products/_search
{
  "_source": ["title", "price", "category"],
  "query": { "match": { "title": "laptop" } }
}
  • Pre-warm field data for frequently aggregated fields
  • Avoid scripts in queries when possible — use runtime fields or indexed fields instead

Cluster Health

// Check cluster health
GET /_cluster/health
// green = all shards allocated
// yellow = primary shards OK, some replicas unassigned
// red = some primary shards unassigned (data loss risk!)

// Check shard allocation
GET /_cat/shards?v&h=index,shard,prirep,state,node

// Check node stats
GET /_nodes/stats/jvm,os,fs

Elasticsearch vs Alternatives

Feature | Elasticsearch | Apache Solr | Typesense | Meilisearch
Built on | Lucene | Lucene | Custom C++ | Custom Rust
Query language | Query DSL (JSON) | Solr Query / JSON | Simple API | Simple API
Scaling | Automatic shard rebalancing | Manual shard management | Built-in Raft | Single node (HA planned)
Aggregations | Very powerful | Powerful | Basic | Basic
Typo tolerance | Manual (fuzzy) | Manual | Built-in | Built-in
Operational complexity | High | High | Low | Very low
Best for | Large-scale search + analytics | Enterprise search | Developer-friendly search | Small-medium search

Interview Cheat Sheet

When to Use Elasticsearch

  • Full-text search across large document collections
  • Log/event analytics with aggregations
  • Autocomplete and search-as-you-type
  • Faceted navigation (e-commerce filters)
  • Geospatial queries
  • Real-time dashboards over time-series data

When NOT to Use Elasticsearch

  • Primary data store — ES is not ACID, data loss is possible
  • Frequent updates to the same document — each update = delete + reindex
  • Strong consistency requirements — ES is eventually consistent (near real-time)
  • Transactions — no multi-document transactions
  • Small datasets — overhead isn’t worth it under ~100K documents
  • Simple key-value lookups — use Redis or a database

Key Numbers to Know

Metric | Typical Value
Indexing to searchable | ~1 second (refresh interval)
Search latency (simple) | 5-50 ms
Search latency (complex aggs) | 50-500 ms
Shard size sweet spot | 10-50 GB
Max from + size | 10,000 (default)
Heap recommendation | 50% of RAM, max 31 GB
Shards per node | < 600 recommended

Interview Answer Template

When an interviewer asks you to design a search feature:

  1. Source of truth — primary DB (PostgreSQL/MySQL), not ES
  2. Sync mechanism — CDC via Kafka/Debezium, or application-level dual writes
  3. Index design — define mappings, analyzers, shard count based on data size
  4. Query design — bool query with must for relevance, filter for exact matches
  5. Pagination — search_after for deep pagination, never from > 10,000
  6. Scaling — replicas for read throughput, time-based indices for logs, ILM for lifecycle
  7. Failure handling — what happens if ES is down? Degrade gracefully, queue writes for replay

Wrapping Up

Elasticsearch is a powerful tool, but it’s not a database replacement. It’s a secondary index — a read-optimized, denormalized view of your data that enables search and analytics capabilities that traditional databases can’t match.

The mental model for system design interviews:

  • Write path: App → Primary DB → Event/CDC → ES (async, eventually consistent)
  • Read path: App → ES for search/filter/aggregate → Primary DB for full record (if needed)
  • Scaling: Shards for write parallelism, replicas for read throughput, ILM for data lifecycle
  • Tradeoff: Near real-time (not real-time), eventually consistent (not strongly consistent), optimized for reads (not writes)

Master these concepts and you’ll be able to confidently design any search-heavy system in an interview.
