tutorials|September 06, 2019|3 min read

How to sync MongoDB data to Elasticsearch using MongoConnector

TL;DR

Use MongoConnector to continuously replicate MongoDB collections to Elasticsearch indexes, enabling full-text search and Grafana visualizations on your MongoDB data.

Introduction

This post is about syncing your MongoDB data to Elasticsearch. You might want this when you need to search your data quickly, expose a search API, or run Grafana to visualize your data.

Mongo-Connector

MongoConnector is an open-source tool that syncs MongoDB data to Elasticsearch. You can run it periodically or continuously, and it syncs all changes made to your MongoDB data, so Elasticsearch holds a replica of that data.

You can configure which MongoDB collections to sync and what names their indexes get in Elasticsearch.

Requirements

  1. MongoDB Replica Set: You need a MongoDB replica set, because mongo-connector reads changes from the oplog. A standalone instance will not work.

  2. ElasticSearch Cluster

  3. mongo-connector utility
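
For the first requirement, a quick way to tell a replica set member from a standalone instance is to look at the server's isMaster reply: replica set members report a setName field, while a standalone mongod does not. The sketch below is only an illustration. The function names are made up for this post, the second helper assumes pymongo is installed, and the URI you pass in is your own.

```python
def is_replica_set_member(ismaster_response):
    """Return True if an isMaster reply comes from a replica set member."""
    # Replica set members include the set's name under "setName";
    # a standalone mongod leaves the field out.
    return bool(ismaster_response.get("setName"))


def fetch_ismaster(uri):
    """Fetch the isMaster reply from a server (needs `pip install pymongo`)."""
    # Imported lazily so the pure check above has no third-party dependency.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        return client.admin.command("isMaster")
    finally:
        client.close()


# Example usage (assumes a server reachable at this hypothetical address):
#   reply = fetch_ismaster("mongodb://localhost:27017")
#   print(is_replica_set_member(reply))
```

Newer MongoDB versions prefer the hello command over isMaster; the setName check itself stays the same.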

How to create MongoDB replica set with Docker

See: Run MongoDB replica set with Docker

How to create ElasticSearch cluster with Docker

See: Run Elastic Search Cluster with Docker

How to get mongo-connector

You need Python installed. Then install mongo-connector via pip:

pip install 'mongo-connector[elastic5]' 'elastic2-doc-manager[elastic5]'

Alternatively, you can build a Docker image for it. See the Dockerfile below:

FROM python:3-alpine
RUN apk add --no-cache curl sed && pip install 'mongo-connector[elastic5]' 'elastic2-doc-manager[elastic5]'
ENTRYPOINT ["mongo-connector"]

To build the Docker image:

docker build -t my_mongoconnector .

Run MongoConnector

MongoConnector Config

Prepare a config file (named mongoconnector.json):

{
    "oplogFile": "<your desired path>/oplog.timestamp",
    "noDump": false,
    "batchSize": 50,
    "verbosity": 2,
    "continueOnError": true,
    "logging": {
        "type": "stream"
    },
    "namespaces": {
        "mydb.coll1": {
            "rename": "mydb_coll1._doc"
        },
        "mydb.trainings": {
            "rename": "mydb_trainings._doc"
        }
    },
    "docManagers": [
        {
            "docManager": "elastic2_doc_manager",
            "targetURL": "<elastic search hostname>:9200",
            "bulkSize": 10,
            "uniqueKey": "_id",
            "args": {
                "clientOptions": {"timeout": 5000}
            }
        }
    ]
}

In the above config file:

  • oplogFile - The file where mongo-connector writes the timestamp of the last oplog entry it synced. If the connector stops, it can resume syncing from where it left off.
  • namespaces - The collections you want to sync, and the names they will get in Elasticsearch.
  • docManagers - Configuration for your Elasticsearch cluster.
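
If you sync many collections, writing the namespaces block by hand gets tedious. Here is a minimal sketch that generates the same config from Python. The helper names (build_namespaces, build_config, write_config) are made up for this post, and the "<db>_<collection>._doc" naming simply mirrors the sample config; it is a convention, not something mongo-connector requires.

```python
import json


def build_namespaces(db, collections, doc_type="_doc"):
    """Map "<db>.<collection>" to a "<db>_<collection>.<doc_type>" rename target."""
    return {
        f"{db}.{coll}": {"rename": f"{db}_{coll}.{doc_type}"}
        for coll in collections
    }


def build_config(db, collections, es_url, oplog_file):
    """Assemble a config dict with the same keys and values as the sample config."""
    return {
        "oplogFile": oplog_file,
        "noDump": False,
        "batchSize": 50,
        "verbosity": 2,
        "continueOnError": True,
        "logging": {"type": "stream"},
        "namespaces": build_namespaces(db, collections),
        "docManagers": [
            {
                "docManager": "elastic2_doc_manager",
                "targetURL": es_url,
                "bulkSize": 10,
                "uniqueKey": "_id",
                "args": {"clientOptions": {"timeout": 5000}},
            }
        ],
    }


def write_config(path, config):
    """Write the config dict to disk as JSON."""
    with open(path, "w") as fh:
        json.dump(config, fh, indent=4)


# Example usage (hostname and paths are placeholders):
#   cfg = build_config("mydb", ["coll1", "trainings"], "es-host:9200", "./oplog.timestamp")
#   write_config("mongoconnector.json", cfg)
```

Keep in mind that Elasticsearch index names must be lowercase, so a collection name containing capitals would need a different target name than this naming scheme produces.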

Run

mongo-connector -m "mongodb://<mongoset1>:27017,<mongoset2>:27018,<mongoset3>:27019/<your db>?replicaSet=your-replicaset-name" -c ./mongoconnector.json
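
If you launch the connector from a script, it can help to assemble the connection string from its parts instead of editing one long line. The sketch below only builds the URI and the argument list for the command shown above. The function names are hypothetical, and the hosts, database, and replica set name are whatever your setup uses.

```python
from urllib.parse import quote_plus


def build_mongo_uri(hosts, database, replica_set):
    """Join "host:port" entries into a replica set connection string."""
    host_list = ",".join(hosts)
    return f"mongodb://{host_list}/{database}?replicaSet={quote_plus(replica_set)}"


def build_command(uri, config_path):
    """Argument list for the mongo-connector call, using the -m and -c flags."""
    return ["mongo-connector", "-m", uri, "-c", config_path]


# Example usage (runs the connector in the foreground):
#   import subprocess
#   uri = build_mongo_uri(["mongoset1:27017", "mongoset2:27018"], "mydb", "rs0")
#   subprocess.run(build_command(uri, "./mongoconnector.json"), check=True)
```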

If everything is fine, it will start syncing your MongoDB data to the Elasticsearch cluster you specified.

Sample output

2019-09-06 08:17:05,189 [ALWAYS] mongo_connector.connector:50 - Starting mongo-connector version: 3.1.1
2019-09-06 08:17:05,189 [ALWAYS] mongo_connector.connector:50 - Python version: 3.6.8 (default, Apr 25 2019, 21:02:35) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
2019-09-06 08:17:05,190 [ALWAYS] mongo_connector.connector:50 - Platform: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core
2019-09-06 08:17:05,191 [ALWAYS] mongo_connector.connector:50 - pymongo version: 3.9.0
2019-09-06 08:17:05,204 [ALWAYS] mongo_connector.connector:50 - Source MongoDB version: 4.2.0
2019-09-06 08:17:05,204 [ALWAYS] mongo_connector.connector:50 - Target DocManager: mongo_connector.doc_managers.elastic2_doc_manager version: 1.0.0
2019-09-06 08:17:05,225 [INFO] mongo_connector.oplog_manager:137 - OplogThread: Initializing oplog thread
2019-09-06 08:17:05,227 [INFO] mongo_connector.connector:402 - MongoConnector: Starting connection thread MongoClient(host=['mongoset1:27018', 'mongoset1:27017', 'mongoset1:27019'], document_class=dict, tz_aware=False, connect=True, replicaset='your-replica-set')
2019-09-06 08:17:05,241 [INFO] elasticsearch:83 - GET http://<es-hostname>:9200/_mget?realtime=true [status:200 request:0.007s]
2019-09-06 08:17:05,356 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_bulk [status:200 request:0.110s]
2019-09-06 08:17:05,477 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_refresh [status:200 request:0.121s]
2019-09-06 08:17:05,484 [INFO] elasticsearch:83 - GET http://<es-hostname>:9200/_mget?realtime=true [status:200 request:0.006s]
2019-09-06 08:17:05,616 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_bulk [status:200 request:0.129s]
2019-09-06 08:17:05,744 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_refresh [status:200 request:0.128s]
.
.
.
.
.
2019-09-06 08:18:35,294 [INFO] mongo_connector.oplog_manager:78 - OplogThread for replica set 'your replica set' is up to date with the oplog.
2019-09-06 08:19:05,324 [INFO] mongo_connector.oplog_manager:78 - OplogThread for replica set 'your replica set' is up to date with the oplog.
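
The "is up to date with the oplog" lines are a handy signal that the initial sync has caught up. As a rough sketch, a small health check could scan the log stream for that message. The pattern below is based only on the sample output above, and the function name is made up; other mongo-connector versions may word the message differently.

```python
import re

# Matches the "up to date" message as it appears in the sample output.
UP_TO_DATE = re.compile(
    r"OplogThread for replica set '(?P<name>[^']*)' is up to date with the oplog"
)


def caught_up_replica_sets(log_lines):
    """Return the replica set names that the log reports as caught up."""
    names = set()
    for line in log_lines:
        match = UP_TO_DATE.search(line)
        if match:
            names.add(match.group("name"))
    return names


# Example usage: pass sys.stdin, or an open log file, as log_lines.
```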

It will also update the timestamp in the oplog file.
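
If you are curious how far the connector has progressed, you can inspect that file. The sketch below rests on an assumption about the file format that you should verify against your own file: a JSON array holding a replica set name and a 64-bit integer, where the upper 32 bits are seconds since the epoch and the lower 32 bits are an increment counter (the usual way a BSON timestamp is packed into one number). All function names here are made up for this post.

```python
import json
from datetime import datetime, timezone


def split_oplog_long(value):
    """Split a packed 64-bit timestamp into (seconds, increment)."""
    return value >> 32, value & 0xFFFFFFFF


def as_utc(seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def read_checkpoint(path):
    """Read the checkpoint file into {replica_set_name: (seconds, increment)}.

    Assumes either a flat [name, value] array or a list of such pairs.
    """
    with open(path) as fh:
        data = json.load(fh)
    if data and not isinstance(data[0], list):
        data = [data]  # normalise the flat form to a list of pairs
    return {name: split_oplog_long(value) for name, value in data}


# Example usage (path is whatever you set as oplogFile):
#   for name, (secs, inc) in read_checkpoint("./oplog.timestamp").items():
#       print(name, as_utc(secs).isoformat(), inc)
```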
