ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

The Hadoop as ETL part in migrating from MongoDB to Cassandra at FullContact

While I’ve found the whole post very educative — and very balanced considering the topic — the part that I’m linking to is about integrating MongoDB with Hadoop. After reading the story of integrating MongoDB and Hadoop at Foursquare, there were quite a few questions bugging me. This post doesn’t answer any of them, but it brings in some more details about existing tools, a completely different solution, and what seems to be an overarching theme when using Hadoop and MongoDB in the same phrase:

We’re big users of Hadoop MapReduce and tend to lean on it whenever we need to make large scale migrations, especially ones with lots of transformation. That fact along with our existing conversion project from before, we used 10gen’s mongo-hadoop project which has input and output formats for Hadoop. We immediately realized that the InputFormat which connected to a MongoDB cluster was ill-suited to our usage. We had 3TB of partially-overlapping data across 2 clusters. After calculating input splits for a few hours, it began pulling documents at an uncomfortably slow pace. It was slow enough, in fact, that we developed an alternative plan.

You’ll have to read the post to learn how they’ve accomplished their goal, but as a spoiler, it was once again more of an ETL process rather than an integration.

✚ The corresponding HN thread; it’s focused mostly on the from MongoDB to Cassandra parts.

Original title and link: The Hadoop as ETL part in migrating from MongoDB to Cassandra at FullContact (NoSQL database©myNoSQL)

via: http://www.fullcontact.com/blog/mongo-to-cassandra-migration/