NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



What Makes Amazon Redshift Faster Than Hive?

I’m not implying that this question appeared on Quora after my link and comments about Redshift’s performance and costs at AirBnb, but Reynold Xin’s answer covers in a more formal way the reasons of Redshift being faster than Hive I’ve suggested in that post:

Some of the advantages you gain from massive scale and flexibility make it challenging to build a more performant query engine. The following outlines how various features (or lack of features) influences performance:

  1. data format
  2. task launch overhead (nb: this can be optimized in Hive/Hadoop)
  3. intermediate data materialization vs pipelining
  4. columnar data format
  5. columnar query engine
  6. faster S3 connection

Original title and link: What Makes Amazon Redshift Faster Than Hive? (NoSQL database©myNoSQL)