Daniel Abadi, in an email to Curt Monash analyzing the Microsoft Polybase paper:
The basic difference between Polybase and Hadapt is the following. With
Polybase, the basic interface to the user is the MPP database software (and
DBMS storage) that Microsoft is selling. Hadoop is viewed as a secondary
source of data — if you have a dataset stored inside Hadoop instead of the
database system for whatever reason, then the database system can access
that Hadoop data on the fly and include that data in query processing
alongside data that is already stored inside the database system. However,
the user must be aware that she might want to query the data in Hadoop in
advance — she must register this Hadoop data to the MPP database through an
external table definition (and ideally statistics should be generated in
advance to help the optimizer). Furthermore, the Hadoop data must be
structured, since the external table definition requires this (so you can’t
really access arbitrary unstructured data in Hadoop). The same is true for
SQL-H and Hawq — they all can access data in Hadoop (in particular data
stored in HDFS), but there needs to be some sort of structured schema
defined in order for the database to understand how to access it via SQL.
So, bottom line, Polybase/SQL-H/Hawq let you dynamically get at data in
Hadoop/HDFS that could theoretically have been stored in the DBMS all along,
but for some reason is being stored in Hadoop instead of the DBMS.
It’s a long paragraph, but the distinction Daniel Abadi is emphasizing is critical: Polybase, SQL-H, and Hawq only reach “data in Hadoop/HDFS that could theoretically have been stored in the DBMS all along”.
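
To make the external-table requirement concrete, here is a toy sketch in Python. It is not Polybase, SQL-H, or Hawq syntax: SQLite stands in for the MPP database, a string stands in for a file in HDFS, and every name in it (`HDFS_FILE`, `EXTERNAL_SCHEMA`, `read_external`, `join_orders_with_external_users`) is made up for illustration. The only point it tries to show is that the schema has to be declared before the outside data can take part in a SQL query, and that data which does not fit the declared schema cannot be queried at all.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a delimited file sitting in HDFS.
HDFS_FILE = "1,alice,30\n2,bob,45\n"

# Hypothetical "external table definition": column names and types
# must be declared up front, before any query touches the file.
EXTERNAL_SCHEMA = [("user_id", int), ("name", str), ("age", int)]


def read_external(raw, schema):
    """Parse raw delimited text into typed rows using the declared schema."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        if len(record) != len(schema):
            # Data that does not fit the declared structure is rejected.
            raise ValueError("row does not match the declared schema")
        rows.append(tuple(cast(value) for (_, cast), value in zip(schema, record)))
    return rows


def join_orders_with_external_users(raw, schema):
    """Join a table stored in the database with externally stored rows."""
    conn = sqlite3.connect(":memory:")

    # Data that already lives inside the database.
    conn.execute("CREATE TABLE orders (user_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 9.5), (2, 20.0), (1, 0.5)],
    )

    # External data is pulled in at query time, shaped by the schema.
    columns = ", ".join(name for name, _ in schema)
    placeholders = ", ".join("?" for _ in schema)
    conn.execute(f"CREATE TEMP TABLE ext_users ({columns})")
    conn.executemany(
        f"INSERT INTO ext_users VALUES ({placeholders})",
        read_external(raw, schema),
    )

    result = conn.execute(
        "SELECT u.name, SUM(o.total) FROM orders o "
        "JOIN ext_users u ON u.user_id = o.user_id "
        "GROUP BY u.name ORDER BY u.name"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(join_orders_with_external_users(HDFS_FILE, EXTERNAL_SCHEMA))
```

Passing free-form text such as `"just some free text\n"` to `read_external` raises `ValueError`, which is the toy version of “you can’t really access arbitrary unstructured data in Hadoop” through this kind of interface.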
Original title and link: Main difference between Hadapt and Microsoft Polybase, HAWQ, SQL-H ( ©myNoSQL)