ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Performance advantages of the new Google Cloud Storage Connector for Hadoop

This guest post by Mike Wendt from Accenture Technology provides some very good answers to the questions I had about the recently announced Hadoop connector for Google Cloud Storage: how does it behave compared to local storage (data locality), what the performance of accessing Google Cloud Storage directly from Hadoop, and, last but essential for cloud setups, what are the cost implications:

From our study, we can see that remote storage powered by the Google Cloud Storage connector for Hadoop actually performs better than local storage. The increased performance can be seen in all three of our workloads to varying degrees based on their access patterns. […] Availability of the files, and their chunks, is no longer limited to three copies within the cluster, which eliminates the dependence on the three nodes that contain the data to process the file or to transfer the file to an available node for processing.

[…] This availability of remote storage on the scale and size provided by Google Cloud Storage unlocks a unique way of moving and storing large amounts of data that is not available with bare-metal deployments.

If you are looking just for the conclusions:

First, cloud-based Hadoop deployments offer better price-performance ratios than bare-metal clusters. Second, the benefit of performance tuning is so huge that cloud’s virtualization layer overhead is a worthy investment as it expands performance-tuning opportunities. Third, despite the sizable benefit, the performance-tuning process is complex and time-consuming and thus requires automated tuning tools.

✚ Keep in mind though that this study was posted on the Google Cloud Platform, so you could expect the results to beat the competition.

Original title and link: Performance advantages of the new Google Cloud Storage Connector for Hadoop (NoSQL database©myNoSQL)

via: http://googlecloudplatform.blogspot.com/2014/01/performance-advantages-of-the-new-google-cloud-storage-connector-for-hadoop.html