Nong Li on Cloudera’s Impala implementation:
Cloudera Impala, the open-source real-time query engine for Apache
Hadoop, uses many tools and techniques to get the best query
performance. This blog post will discuss how we use runtime code
generation to significantly improve our CPU efficiency and overall
query execution time. We’ll explain the types of inefficiency that
code-generation eliminates and go over in more detail one of the
queries in the TPCH workload where code generation improves overall
query speeds by close to 3x.
This reminded me of the days when I was working on Java AOP frameworks, whose implementations relied on bytecode generation for the same reason: optimization. Everything worked perfectly well as long as the underlying assumptions remained the same.
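To make the idea in the quoted passage a bit more concrete, here is a minimal sketch of runtime code generation in Python. It is not Impala's implementation (Impala emits native code through LLVM); every name here (`Col`, `Const`, `BinOp`, `interpret`, `to_source`, `generate`) is hypothetical and chosen for illustration only. The interpreter walks the expression tree and branches on node types for every row, while the generated function bakes the tree into straight-line code once, so the per-row dispatch disappears.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


# Hypothetical expression nodes for a tiny query predicate.
@dataclass(frozen=True)
class Col:
    index: int  # position of the column in a row


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "*", ">"
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Const, BinOp]
ALLOWED_OPS = {"+", "*", ">"}


def interpret(expr: Expr, row: Sequence[int]):
    """Evaluate by walking the tree; type checks and branches run per row."""
    if isinstance(expr, Col):
        return row[expr.index]
    if isinstance(expr, Const):
        return expr.value
    lhs = interpret(expr.left, row)
    rhs = interpret(expr.right, row)
    if expr.op == "+":
        return lhs + rhs
    if expr.op == "*":
        return lhs * rhs
    if expr.op == ">":
        return lhs > rhs
    raise ValueError(f"unsupported operator: {expr.op}")


def to_source(expr: Expr) -> str:
    """Translate the tree into a Python expression string."""
    if isinstance(expr, Col):
        return f"row[{expr.index}]"
    if isinstance(expr, Const):
        return repr(expr.value)
    if expr.op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {expr.op}")
    return f"({to_source(expr.left)} {expr.op} {to_source(expr.right)})"


def generate(expr: Expr) -> Callable[[Sequence[int]], object]:
    """Compile a function specialized to this one expression at runtime."""
    source = f"def evaluate(row):\n    return {to_source(expr)}\n"
    namespace: dict = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["evaluate"]


# Predicate: col0 + col1 > 10
predicate = BinOp(">", BinOp("+", Col(0), Col(1)), Const(10))
rows = [(3, 4), (8, 5), (10, 1)]
evaluate = generate(predicate)

interpreted = [interpret(predicate, r) for r in rows]
generated = [evaluate(r) for r in rows]
```

The generated function is only valid for the exact expression and row layout it was built from. If either changes, it has to be regenerated, which is the "underlying assumptions" caveat in miniature.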
Original title and link: Inside Cloudera Impala: Runtime Code Generation (©myNoSQL)