I’d say Dave Linthicum got some things wrong:
First is the ability to manage large data sets more efficiently than with traditional relational technology as done in the past. The methodology is to leverage an approach called MapReduce.
MapReduce is about processing data, but you have to store that data first.
The “Map” portion of MapReduce is the master node that accepts the request and divides it among any number of worker nodes. The “Reduce” portion means that the master node considers the results from the worker nodes and combines them to determine the answer to the request. The power of this architecture is the simplistic nature of MapReduce, meaning it’s both easy to understand and to implement.
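The map/reduce split described above can be sketched in a few lines. This is a minimal single-process illustration, not a distributed implementation: the word-count task and the function names are my own assumptions, chosen because word count is the canonical MapReduce example.

```python
from collections import defaultdict

def map_phase(documents):
    # "Map": in a real cluster, each worker node would run this over
    # its own chunk of the input, emitting (key, value) pairs.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # "Reduce": the results from all workers are combined, grouping
    # by key to produce the final answer.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data", "big cloud"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 1, 'cloud': 1}
```

What makes this architecture easy to scale is that the map step has no shared state, so the input can be partitioned across any number of machines and only the (key, value) pairs need to be shuffled to the reducers.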
It is clear to me that using the cloud’s ability to provide massive amounts of commodity computing power, on-demand, when combined with a database architecture that will exploit that power means data processing power on scales we have never seen at these low price points.
This is still something I’m not convinced of. Processing in the cloud is indeed a good option. But the data must be available in the cloud first, and in the case of big data, neither storing it there nor moving it there seems to be the best alternative.
Big Data and the Need for New Approaches to Data Integration originally posted on the NoSQL blog: myNoSQL