RainStor Big Data Analytics on Hadoop Promises Impressive Data Compression Rates

RainStor has announced Big Data Analytics on Hadoop, promising:

  • The highest data compression in the industry with up to 40x reduction, compared to raw data typically stored in HDFS, with no re-inflation required for access
  • The ability to run faster query and analysis using both SQL query and MapReduce with 10-100x faster results
  • The ability to perform analytics directly in Hadoop, reducing the need to create copies and transfer data out
  • Reduced nodes in a Hadoop cluster with ~85 percent lower operating costs.

A couple of comments:

  • RainStor is not the only solution that can perform analytics directly in Hadoop
  • Hive provides a mechanism to project structure onto data stored in Hadoop and to query it using a SQL-like language called HiveQL
  • RainStor MapReduce support is via Pig
  • According to this, there’s an interesting aspect to RainStor’s support for SQL and MapReduce:

    Users can choose SQL for rapid response ad-hoc queries or run batch jobs using MapReduce against RainStor data.  Additionally you can interoperate SQL and MapReduce and join results from a query against RainStor and against native CSV files on HDFS.

    As a side note, Toad for Cloud from Quest is a tool that tries to provide a table-based view of data in relational and NoSQL databases.

Anyway, the most interesting part of the announcement is RainStor’s claimed data compression level (up to 40x) and the fact that accessing the data doesn’t require re-inflation. According to an infographic, currently available compression solutions top out at 8x:

  • Hadoop LZO: 3x
  • Compressed relational: 6x
  • Flatfile Gzip: 7x
  • Columnar: 8x
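
To put these ratios in perspective, here is a minimal sketch of the arithmetic. The ratios are the ones quoted above; the 100 TB raw data size, the function names, and the decision to ignore HDFS replication are assumptions made purely for illustration.

```python
# Compression ratios as quoted in the post. The RainStor figure is the
# vendor's "up to" claim, not a measured value.
RATIOS = {
    "Hadoop LZO": 3,
    "Compressed relational": 6,
    "Flatfile Gzip": 7,
    "Columnar": 8,
    "RainStor (claimed, up to)": 40,
}


def stored_size_tb(raw_tb: float, ratio: float) -> float:
    """Size on disk after compression, ignoring HDFS replication."""
    return raw_tb / ratio


def footprint(raw_tb: float) -> dict:
    """Map each solution name to its compressed size in TB."""
    return {name: stored_size_tb(raw_tb, r) for name, r in RATIOS.items()}


if __name__ == "__main__":
    # Hypothetical input: 100 TB of raw data.
    for name, tb in footprint(100.0).items():
        print(f"{name:28s} {tb:6.2f} TB")
```

Under these assumptions, the gap between 8x and 40x is the difference between storing 12.5 TB and 2.5 TB for every 100 TB of raw data.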

If such compression levels can be achieved frequently and the impact on other server resources (CPU, memory) is minimal, RainStor Big Data Analytics on Hadoop will definitely be an interesting part of the Hadoop market.
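
Both halves of that condition, the ratio and the CPU cost, can be measured on your own data. The sketch below uses Python's standard zlib module (DEFLATE, the algorithm behind Gzip) as a stand-in; it says nothing about how RainStor works, and the `measure` helper and the sample log line are hypothetical.

```python
import time
import zlib


def measure(data: bytes, level: int = 6):
    """Return (compression ratio, seconds spent compressing) for data."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(data) / len(compressed), elapsed


if __name__ == "__main__":
    # Hypothetical sample: one log line repeated many times. Data this
    # repetitive compresses far better than real-world data would.
    sample = b"2012-01-01,INFO,user=42,action=login\n" * 50_000
    ratio, seconds = measure(sample)
    print(f"ratio: {ratio:.1f}x, compression time: {seconds:.4f}s")
```

Running something like this against a representative slice of real data gives a more honest baseline than any single headline figure.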

Before leaving you with the infographic, here is a nice quote from RainStor CEO John Bantleman:

    We see Hadoop as a platform like Linux, which needs solutions on top to deliver value.

Hadoop Data Compression

Original title and link: RainStor Big Data Analytics on Hadoop Promises Impressive Data Compression Rates (NoSQL database©myNoSQL)