Vertica and Hadoop for Big Data
Here is what I jotted down during Vertica’s webinar “Hadoop vs. RDBMS for Big Data Analytics: Why Choose?”:
- the webinar focused on clarifying where and how Vertica and Hadoop fit in the Big Data space
- Vertica’s strengths:
  - support for SQL, extended SQL, and analytics, which makes interactive investigation of data possible
  - storage space efficiency (I don’t think it’s correct to interpret Hadoop data redundancy as storage space inefficiency)
  - analytics SDK (allows customizing in-database analytic functions)
  - ease of operation and maintenance (auto-tuning features)
- the following slide is pretty eloquent about Hadoop and Vertica being complementary solutions:

- when covering a scenario for using both Hadoop and Vertica, they chose the easy one: Hadoop as ETL. It’s not that it’s not a good one, but it’s the only one database vendors are using these days when speaking about integration with Hadoop.

- other possible Hadoop + Vertica use cases:
  - filtering, joins, and aggregation in Vertica, with intermediate results fed into MR jobs
  - parallel import and export to HDFS
  - Hadoop MapReduce for data transformation and Vertica for optimized storage and retrieval
- there will be a community edition of Vertica. It was announced in October for the end of 2011, but I don’t think it’s out yet
- there’s a GitHub repo for user-defined extensions for Vertica
- the categorization of Big Data tools shown in the webinar is interesting, but it feels biased in favor of Vertica, which would be placed somewhere close to the center of the triangle
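
To make the “Hadoop as ETL” scenario from the notes above a bit more concrete, here is a minimal sketch in plain Python (it is not actual Hadoop or Vertica code). It mimics a map phase that parses and filters raw log lines, a reduce phase that aggregates them, and a final step that formats pipe-delimited rows of the kind a database bulk loader could ingest. The log format, the function names, and the delimiter are all assumptions made up for illustration; none of them come from the webinar.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (day, page); a hypothetical grouping key


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Parse raw log lines and emit ((day, page), 1) pairs."""
    for line in lines:
        parts = line.strip().split()
        if len(parts) != 3:
            continue  # the cleaning part of ETL: drop lines that do not parse
        day, _user, page = parts
        yield (day, page), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Sum the counts for each (day, page) key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def to_load_rows(totals: Dict[Key, int]) -> List[str]:
    """Format aggregated results as pipe-delimited rows (assumed load format)."""
    return [f"{day}|{page}|{hits}" for (day, page), hits in sorted(totals.items())]


# Hypothetical raw input: "<day> <user> <page>", with one unparseable line.
raw_logs = [
    "2011-12-01 alice /home",
    "2011-12-01 bob /home",
    "2011-12-01 alice /pricing",
    "corrupted line",
    "2011-12-02 bob /home",
]

rows = to_load_rows(reduce_phase(map_phase(raw_logs)))
for row in rows:
    print(row)
```

In a real setup the map and reduce steps would presumably run as MapReduce jobs over files in HDFS, and the resulting rows would be bulk-loaded into the database for interactive SQL queries; the details of that hand-off were not part of my notes.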

Original title and link: Vertica and Hadoop for Big Data (©myNoSQL)