NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



MongoDB Pre-Splitting for Faster Data Loading and Importing

Jeremy Zawodny (Craigslist) explains how to optimize the import of data that far exceeds the amount of RAM available in a sharded MongoDB cluster:

Many of the chunks the balancer decided to move from the busy shard to a less busy shard contained “older” data that had already been flushed to disk to make room in memory for newer data. That means that the process of migrating those chunks was especially painful, since loading in that older data meant pushing newer data from memory, flushing it to disk, and then reading back the older data only to hand it to another shard and ultimately delete it. All the while newer data is streaming in and adding to the pressure.

That extra I/O and flushing eventually manifest themselves as lower throughput. A lot lower. Needless to say, the situation was not sustainable. At all.

This is a perfect example of how to investigate an issue in MongoDB auto-sharding.

Original title and link: MongoDB Pre-Splitting for Faster Data Loading and Importing (NoSQL databases © myNoSQL)