As shown above the performance gain is pretty significant, and the cost saving is even more impressive: $13.60/hour versus $57/hour. This is hard to compare due to the different pricing models, but check out pricing here for more info. In fact, our analysts like Redshift so much that they don’t want to go back to Hive and other tools even though a few key features are lacking in Redshift. Also, we have noticed that big joins of billions of rows tend to run for a very long time, so for that we’d go back to hadoop for help.
If I’m not mistaking, this is the second story in the last week about the performance of Redshift. But here’s something I don’t understand (or I don’t see mentioned in this post):
- you use Hadoop to store your data. The reason is that 12 months ago, 6 months ago (and today) there is no other more cost effective and productive solution.
- in this time you learn about the data. You develop models and queries
- your analysts prefer SQL because that’s what makes them more productive
- you take the data, the knowledge you’ve built in this time, you craft it to fit into a columnar analytic database
- then you write that the columnar analytic-oriented database is more performant than using Hive over Hadoop
To me this feels like saying that you are more efficient in your mother tongue than in a foreign language. Or am I missing something?
Original title and link: Redshift Performance & Cost at Airbnb ( ©myNoSQL)