ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Redshift Performance & Cost at Airbnb

Henry Cai from AirBnb reports about their experiment and move from using Hive with Hadoop to Amazon Redshift:

As shown above the performance gain is pretty significant, and the cost saving is even more impressive: $13.60/hour versus $57/hour. This is hard to compare due to the different pricing models, but check out pricing here for more info. In fact, our analysts like Redshift so much that they don’t want to go back to Hive and other tools even though a few key features are lacking in Redshift. Also, we have noticed that big joins of billions of rows tend to run for a very long time, so for that we’d go back to hadoop for help.

If I’m not mistaking, this is the second story in the last week about the performance of Redshift. But here’s something I don’t understand (or I don’t see mentioned in this post):

  1. you use Hadoop to store your data. The reason is that 12 months ago, 6 months ago (and today) there is no other more cost effective and productive solution.
  2. in this time you learn about the data. You develop models and queries
  3. your analysts prefer SQL because that’s what makes them more productive
  4. you take the data, the knowledge you’ve built in this time, you craft it to fit into a columnar analytic database
  5. then you write that the columnar analytic-oriented database is more performant than using Hive over Hadoop

To me this feels like saying that you are more efficient in your mother tongue than in a foreign language. Or am I missing something?

Original title and link: Redshift Performance & Cost at Airbnb (NoSQL database©myNoSQL)