Apr 2015

13

Setting up MongoDB Cluster

I had to setup a MongoDB cluster at my job recently and i thought it would be useful to share the research i did for the same. As you might know MongoDB is a NoSQL document DB that scales well horizontally.

MongoDB achieves scaling through a technique called "sharding". Sharding is the process of writing data across different servers to distribute read & write loads and also manage data storage.

Sharding Topology

  • Config Server Main responsibility is to determine which shard the data gets stored in. Best practices recommend using a minimum of 3 config servers to ensure redundancy and availability.
  • Query Routers The clients connect to the query routers and they are responsible for communicating to the config servers to figure out where the requested data is stored. The router then accesses it and returns the data from the appropriate shard. Best practices recommend using a minimum of 1 Query Router.
  • Shard Servers Shards are responsible for the actual data storage. Best practices recommend using a minimum of 2 shard servers.
  • Here is an diagram showing the MongoDB cluster architecture.

    MongoDB Cluster Architecture

    Shard Key

    Once you have determined the topology the next task is to determine the Shard Key. In the case of my employer we were storing global suppression list and the data was similar to this

    
    {
    	"_id" : ObjectId("553e79233738d570a1d9000001"),
    	"email" : "test@example.com",
    	"event_type" : "aa",
    	"event_source" : "bb",
    	"event_reason" : "abuse",
    	"event_date" : ISODate("2015-04-27T00:00:00Z"),
    	"processed" : true,
    	"remarks" : "campaign_id=8450f5b87d;list_id=cbb7896aa1"
    }
                

    It is recommended to use a shard key that's guarenteed to be evenly distributed. We were anticipating an initial load of 3 million global suppression emails and projected to grow by about 100,000 every year. We felt that email should be by and large distributed evently and decided to use the email as the shard key.

    Cluster Setup

    I am going to assume that mongodb has already been setup on the target machines. The target machines will be referred to as config 1 thru 3, Router 1 and 2 & Shard 1 and 2 respectively. Here are the instructions to setup the Cluster.

    Config
    
    mkdir /apps/mongodb/logs
    cat > /etc/mongodb.conf <<-EOF
    port = 27017
    fork = true
    pidfilepath = /apps/mongodb/mongodb.pid
    logpath = /apps/mongodb/logs/mongodb.log
    dbpath =/apps/mongodb
    journal = true
    EOF
                
    Router

    Each Router runs the mongos command to specify the config databases. Once that is done connect to the server as admin as issue the following commands

    
    mongos --configdb config1, config2, config3
    mongo localhost/admin
                
    
    db.runCommand({"addShard" : "HOSTNAME_FOR_SHARD1:27017"});
    db.runCommand({"addShard" : "HOSTNAME_FOR_SHARD2:27017"});
    // Now enable sharding
    db.adminCommand({"enableSharding" : "DATABASE_NAME"});
    // Set the shard key
    db.adminCommand({"shardCollection" : "DATABASE_NAME.COLLECTION_NAME", key : {"email" : 1}})
                
    Shard

    Create the configuration file for the shard server just like the config servers and you are all set.

    
    mkdir /apps/mongodb/logs
    cat > /etc/mongodb.conf <<-EOF
    port = 27017
    fork = true
    pidfilepath = /apps/mongodb/mongodb.pid
    logpath = /apps/mongodb/logs/mongodb.log
    dbpath =/apps/mongodb
    journal = true
    EOF
                

    The command to start or stop the mongodb depends on the version of OS hosting the mongodb Cluster. On CentOS linux servers it would be as follows

    
    sudo /etc/init.d/mongod start|stop
                

    References

    How to create a Sharded Cluster

    Scaling MongoDB




Tags: MongoDB