Great points by Glen Sheffield about the recent TPC-DS Impala benchmark results:
Although they are basing their tests on the industry standard TPC-DS
benchmark – they are only showing results for a carefully selected subset of
the TPC-DS queries, using a carefully selected subset of the TPC-DS data.
For the performance comparisons – they have chosen just 20 of the 99
official TPC-DS query set.
For the scalability tests, they have chosen just 6 of the 99 official TPC-DS queries.
For all tests they have chosen to use a single fact table, even though the
TPC-DS database schema contains 6 fact tables.
And the list doesn’t stop here.
With every new vendor-published benchmark I see, I'm more convinced that they've become pure marketing material. The art of turning numbers into non-stories.
Original title and link: Cloudera Impala – A Closer Look