Simple batch processing tools like MapReduce and Hadoop are just not powerful enough in any one of the dimensions of the big data space that really matters. Sure, Hadoop is great for simple batch processing tasks that are “embarrassingly parallel”, but most of the difficult big data tasks confronting companies today are much more complex than that. They can involve complex joins, ACID requirements, real-time requirements, supercomputing algorithms, graph computing, interactive analysis, or the need for continuous incremental updates. In each case, Hadoop is unable to provide anything close to the levels of performance required.
That is to say, there are problems where Hadoop, or more generically MapReduce, might not be the solution. MapReduce fits problems that exhibit data parallelism, not necessarily task parallelism. Most of the issues described above involve processing deeply interconnected data.
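To make the distinction concrete, here is a minimal sketch of the kind of data-parallel job MapReduce handles well — the classic word count, where each input chunk is mapped independently and the partial results are merged in a reduce step. This is an illustrative example, not code from the article; the chunking and function names are my own.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    # Map phase: count words in one chunk. Each chunk is processed
    # independently of the others -- this is the data parallelism
    # that MapReduce exploits.
    return Counter(chunk.split())

def reduce_counts(a, b):
    # Reduce phase: merge two partial counts.
    return a + b

chunks = ["big data big", "data hadoop", "big hadoop hadoop"]
partials = [map_chunk(c) for c in chunks]           # could run in parallel
total = reduce(reduce_counts, partials, Counter())  # merge the results
print(total["big"], total["hadoop"])  # 3 3
```

Contrast this with a graph traversal or an iterative algorithm, where each step depends on the previous one's output: there the work cannot be split into independent chunks, and a single map/reduce pass buys you little.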
McColl also writes in the article:
The only problem with this story is that the people who really do have cutting edge performance and scalability requirements today have already moved on from the Hadoop model.
But who are these companies that have “already moved on”? The only example I can think of is Google Caffeine, and that’s just one.
Original title and link: Beyond Hadoop - Next-Generation Big Data Architectures (NoSQL databases © myNoSQL)