Jeremy Zawodny (Craigslist) explains how to optimize the import of data that far exceeds the amount of RAM available in a sharded MongoDB cluster:
Many of the chunks the balancer decided to move from the busy shard to a less busy shard contained “older” data that had already been flushed to disk to make room in memory for newer data. That means that the process of migrating those chunks was especially painful, since loading in that older data meant pushing newer data from memory, flushing it to disk, and then reading back the older data only to hand it to another shard and ultimately delete it. All the while newer data is streaming in and adding to the pressure.
That extra I/O and flushing eventually manifest themselves as lower throughput. A lot lower. Needless to say, the situation was not sustainable. At all.
This is a perfect example of how to investigate an issue in MongoDB auto-sharding.
Original title and link: MongoDB Pre-Splitting for Faster Data Loading and Importing (NoSQL databases © myNoSQL)