What Makes Amazon Redshift Faster Than Hive?
I’m not implying that this question appeared on Quora after my link and comments about Redshift’s performance and costs at Airbnb, but Reynold Xin’s answer covers, in a more formal way, the reasons for Redshift being faster than Hive that I suggested in that post:
Some of the advantages you gain from massive scale and flexibility make it challenging to build a more performant query engine. The following outlines how various features (or lack of features) influence performance:
- data format
- task launch overhead (nb: this can be optimized in Hive/Hadoop)
- intermediate data materialization vs pipelining
- columnar data format
- columnar query engine
- faster S3 connection
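The list names its points without showing them, so here is a rough illustration of three of them: columnar data format, columnar query engine, and intermediate data materialization vs pipelining. It is a toy Python model under stated assumptions, not how Redshift or Hive are actually implemented. The table, its column names, and every function are hypothetical. "Fields read" stands in for storage I/O. The plain lists in `materialized` stand in for the full intermediate outputs that classic Hive on MapReduce writes out between stages.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy table: (id, country, price).
Row = Tuple[int, str, float]
ROWS: List[Row] = [
    (1, "US", 10.0),
    (2, "DE", 25.0),
    (3, "US", 40.0),
    (4, "FR", 5.0),
]
COLUMNS = ("id", "country", "price")


def to_columnar(rows: List[Row]) -> Dict[str, list]:
    """Store each column as its own contiguous list (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_price_row_store(rows: List[Row], country: str) -> Tuple[float, int]:
    """Row layout: every field of every row is read, even the unused 'id'."""
    total, fields_read = 0.0, 0
    for row in rows:
        fields_read += len(row)  # the whole row comes off storage together
        if row[1] == country:
            total += row[2]
    return total, fields_read


def sum_price_column_store(cols: Dict[str, list], country: str) -> Tuple[float, int]:
    """Columnar layout: only the 'country' and 'price' columns are scanned."""
    countries, prices = cols["country"], cols["price"]
    fields_read = len(countries) + len(prices)
    total = sum(p for c, p in zip(countries, prices) if c == country)
    return total, fields_read


def materialized(rows: Iterable[Row], country: str) -> float:
    """Each operator builds its complete output before the next one starts."""
    filtered = [r for r in rows if r[1] == country]  # full intermediate result
    projected = [r[2] for r in filtered]             # another full intermediate
    return sum(projected)


def pipelined(rows: Iterable[Row], country: str) -> float:
    """Generators pass one row at a time through all operators; nothing is stored."""
    filtered = (r for r in rows if r[1] == country)
    projected = (r[2] for r in filtered)
    return sum(projected)


if __name__ == "__main__":
    cols = to_columnar(ROWS)
    print(sum_price_row_store(ROWS, "US"))     # (50.0, 12)
    print(sum_price_column_store(cols, "US"))  # (50.0, 8)
    print(materialized(ROWS, "US"), pipelined(ROWS, "US"))  # 50.0 50.0
```

Both layouts return the same answer, but the columnar scan touches 8 fields instead of 12 because it never reads `id`. On wide tables where a query uses only a few columns, that gap grows with the number of unused columns. Pipelining gives the same result as materializing, but it avoids holding (or, in a real engine, writing out) each stage's full output.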
Original title and link: What Makes Amazon Redshift Faster Than Hive? (©myNoSQL)
via: http://www.quora.com/Hive-computing/What-makes-Amazon-Redshift-faster-than-Hive