
Vertica and Hadoop for Big Data

Here is what I’ve jotted down during Vertica’s webinar "Hadoop vs. RDBMS for Big Data Analytics: Why Choose?":

  • the webinar focused on clarifying where and how Vertica and Hadoop fit in the Big Data space
  • Vertica’s strengths:
    • support for SQL, extended SQL, and analytics, which enables interactive investigation of data
    • storage space efficiency (though I don’t think it’s correct to interpret Hadoop’s data redundancy as storage space inefficiency)
    • analytics SDK (allows customizing in-database analytic functions)
    • ease of operation and maintenance (auto-tuning features)
  • the following slide makes a pretty eloquent case for Hadoop and Vertica being complementary solutions:

    Vertica vs Hadoop - Analytics Feature Comparison

  • when covering a scenario for using both Hadoop and Vertica, they chose the easy one: Hadoop as ETL. Not that it’s a bad one, but it’s the only one database vendors seem to use these days when talking about integration with Hadoop.

    Hadoop + Vertica Use Case Example

  • other possible Hadoop + Vertica use cases:

    • filter, join, and aggregate data in Vertica, then feed the intermediate results into MapReduce jobs
    • parallel import from and export to HDFS
    • Hadoop MapReduce for data transformation and Vertica for optimized storage and retrieval
  • there will be a community edition of Vertica. It was announced in October for release by the end of 2011, but I don’t think it’s out yet
  • there’s a GitHub repo for user-defined extensions for Vertica
  • the following categorization of Big Data tools is interesting, but it feels biased in favor of Vertica, which would be placed somewhere close to the center of the triangle

    Triangle of Big Data Tools
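To make the "Hadoop as ETL" scenario a bit more concrete, here is a minimal, single-process Python sketch of what the MapReduce side of such a pipeline might do: clean and aggregate raw log lines, then emit pipe-delimited rows ready for a bulk load into an analytic database like Vertica. The log format, the field names, and the "bytes per user per day" aggregation are all hypothetical; a real job would run on a Hadoop cluster rather than in a Python loop, and the exact load command depends on your schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (day, user_id)


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map step: parse hypothetical 'timestamp user_id url bytes' log lines."""
    for line in lines:
        parts = line.split()
        # Drop malformed records: a typical cleansing step in an ETL job.
        if len(parts) != 4 or not parts[3].isdigit():
            continue
        timestamp, user_id, _url, size = parts
        day = timestamp[:10]  # keep only the YYYY-MM-DD prefix
        yield (day, user_id), int(size)


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Group by key and sum the values.

    This stands in for Hadoop's shuffle (grouping) plus the reduce step.
    """
    totals: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def to_load_rows(totals: Dict[Key, int]) -> List[str]:
    """Format aggregated results as pipe-delimited rows for a bulk loader."""
    return [f"{day}|{user}|{total}" for (day, user), total in sorted(totals.items())]


if __name__ == "__main__":
    # Hypothetical sample input, including one malformed line.
    logs = [
        "2011-12-01T10:00:00 u1 /home 200",
        "2011-12-01T11:30:00 u1 /about 300",
        "2011-12-02T09:15:00 u2 /home 150",
        "bad line",
    ]
    for row in to_load_rows(reduce_phase(map_phase(logs))):
        print(row)
```

Run on the sample input, this prints `2011-12-01|u1|500` and `2011-12-02|u2|150`. The division of labor is the one the webinar described: Hadoop does the heavy, schema-less transformation, and the database only receives compact, already-structured rows that it can store and query efficiently.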

Original title and link: Vertica and Hadoop for Big Data (NoSQL database©myNoSQL)