NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



MapReduce and Massively Parallel Processing (MPP): Two Sides of the Big Data

Andrew Brust for ZDNet:

But, for a variety of reasons, MPP and MapReduce are used in rather different scenarios. You will find MPP employed in high-end data warehousing appliances. […] MPP gets used on expensive, specialized hardware tuned for CPU, storage and network performance. MapReduce and Hadoop find themselves deployed to clusters of commodity servers that in turn use commodity disks. The commodity nature of typical Hadoop hardware (and the free nature of Hadoop software) means that clusters can grow as data volumes do, whereas MPP products are bound by the cost of, and finite hardware in, the appliance and the relative high cost of the software. […] MPP and MapReduce are separated by more than just hardware. MapReduce’s native control mechanism is Java code (to implement the Map and Reduce logic), whereas MPP products are queried with SQL (Structured Query Language). […] Nonetheless, Hadoop is natively controlled through imperative code while MPP appliances are queried though declarative query. In a great many cases, SQL is easier and more productive than is writing MapReduce jobs, and database professionals with the SQL skill set are more plentiful and less costly than Hadoop specialists.

I totally agree with Andrew Brust that none of these are good reasons for these platforms to remain separate. Actually when analyzing the importance of the Teradata (MPP) and Hortonworks (Hadoop) partnership, I wrote:

Depending on the level of integration the two team will pull together, this partnership might result in one of the most complete and powerful structured and unstructured data warehouse and analytics platform.

This very same thing could be said about any platform that would offer a viable, fully integrated, cost effective, distributed, structured and unstructured data warehouse or analytics platform. MPP and MapReduce do not represent different sides of the Big Data, but rather complementary approaches for Big Data.

Original title and link: MapReduce and Massively Paralle Processing (MPP): Two Sides of the Big Data (NoSQL database©myNoSQL)