mapreduce: All content tagged as mapreduce in NoSQL databases and polyglot persistence
If you feel like reading some bullet-point-style FUD about Hadoop, check out Dr. David F. Rico’s PDF.
Original title and link: Some Interesting Facts, Sorry FUD About Hadoop MapReduce ( ©myNoSQL)
Over the next few years, Hadoop reinvented data analysis not only at Facebook and Yahoo but at so many other web services. And then an army of commercial software vendors started selling the thing to the rest of the world. Soon, even the likes of Oracle and Greenplum were hawking Hadoop. These companies still treated Hadoop as an adjunct to the traditional database — as a tool suited only to certain types of data analysis. But now, that’s changing too.
I found the above fragment, which fully captures the impact Hadoop has had and continues to have on the data world, in Cade Metz’s article for Wired about Greenplum’s Pivotal HD announcement: “Why Hadoop Is the Future of the Database”.
Original title and link: The History of Hadoop Changed the World ( ©myNoSQL)
As shown above the performance gain is pretty significant, and the cost saving is even more impressive: $13.60/hour versus $57/hour. This is hard to compare due to the different pricing models, but check out pricing here for more info. In fact, our analysts like Redshift so much that they don’t want to go back to Hive and other tools even though a few key features are lacking in Redshift. Also, we have noticed that big joins of billions of rows tend to run for a very long time, so for that we’d go back to Hadoop for help.
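As a quick sanity check on the quoted figures, here is a minimal sketch of the relative saving implied by $13.60/hour versus $57/hour (the post itself notes the pricing models differ, so treat this as a ballpark, not a like-for-like comparison):

```python
# Hourly costs as quoted in the excerpt above.
redshift_per_hour = 13.60
hive_per_hour = 57.00

# How many times cheaper Redshift is per hour, and the fractional saving.
ratio = hive_per_hour / redshift_per_hour
savings = 1 - redshift_per_hour / hive_per_hour

print(f"Redshift is ~{ratio:.1f}x cheaper per hour (~{savings:.0%} saving)")
```

On these numbers that works out to roughly a 4x hourly cost difference, which is in line with the post calling the saving "even more impressive" than the performance gain.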
If I’m not mistaken, this is the second story in the past week about Redshift’s performance. But here’s something I don’t understand (or don’t see mentioned in this post):
- you use Hadoop to store your data, because 12 months ago, 6 months ago (and today) there was no more cost-effective and productive solution
- during this time you learn about the data; you develop models and queries
- your analysts prefer SQL because that’s what makes them more productive
- you take the data and the knowledge you’ve built over this time, and you shape it to fit into a columnar analytic database
- then you write that the columnar, analytics-oriented database performs better than Hive over Hadoop
To me this feels like saying that you are more efficient in your mother tongue than in a foreign language. Or am I missing something?
Original title and link: Redshift Performance & Cost at Airbnb ( ©myNoSQL)