


Big Data and the Need for New Approaches to Data Integration

I’d say Dave Linthicum got a few things wrong:

First is the ability to manage large data sets more efficiently than with traditional relational technology as done in the past. The methodology is to leverage an approach called MapReduce.

MapReduce is about processing data, but you’ve got to store that data first.

The “Map” portion of MapReduce is the master node that accepts the request and divides it among any number of worker nodes. The “Reduce” portion means that the master node considers the results from the worker nodes and combines them to determine the answer to the request. The power of this architecture is the simplistic nature of MapReduce, meaning it’s both easy to understand and to implement.
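Setting aside the quote’s master/worker framing, the model itself is simple enough to sketch in a few lines. This is a minimal, single-process word-count illustration, with the shuffle step that a real framework like Hadoop performs between the two phases done locally; the function names are mine, not any framework’s API:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) pairs for each word in a document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework would
    do when routing map output to the reduce workers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a final answer."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data needs new approaches", "big data needs storage"]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
result = reduce_phase(shuffle(intermediate))
print(result)  # word counts across all documents, e.g. {'big': 2, ...}
```

The appeal is exactly the simplicity on display here: map and reduce are pure functions, so a framework can run them on any number of worker nodes in parallel without changing the user’s code.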


It is clear to me that using the cloud’s ability to provide massive amounts of commodity computing power, on-demand, when combined with a database architecture that will exploit that power means data processing power on scales we have never seen at these low price points.

This is something I’m not yet convinced of. Processing in the cloud is indeed a good option, but the data must be available in the cloud first. And in the case of big data, neither storing it there nor moving it there seems to be the best alternative.

Big Data and the Need for New Approaches to Data Integration originally posted on the NoSQL blog: myNoSQL