


Groundhog: Hadoop Automated Testing at Yahoo!

Yahoo! — and probably every other large Hadoop installation — has to deal with upgrading its Hadoop clusters. Scheduled rolling upgrades are the strategy applied everywhere, but depending on the size of the cluster they can take far too long. Yahoo! has internally developed an interesting tool that can help with Hadoop upgrades:

Groundhog is an automated testing tool to help ensure backwards compatibility (in terms of API, functionality, and performance) between releases of Hadoop before deploying a new release onto clusters with a high QoS. Groundhog does this by providing an automated mechanism to capture user jobs (currently limited to pig scripts) as they are run on a cluster and then replay them on a different cluster with a different version of Hadoop to verify that they still produce the same results.
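The core of the capture-and-replay idea above is comparing the output of a captured job against its replay on the new Hadoop version. A minimal sketch of that verification step (the function names and the order-insensitive checksum are my assumptions, not Groundhog's actual implementation) might look like:

```python
import hashlib


def checksum(lines):
    """Order-insensitive digest of a job's output records.

    MapReduce jobs often emit records in nondeterministic order,
    so we sort before hashing. (Assumed behavior for illustration.)
    """
    h = hashlib.sha256()
    for line in sorted(lines):
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def replay_matches(captured_output, replayed_output):
    """True if the replayed job reproduced the captured job's results."""
    return checksum(captured_output) == checksum(replayed_output)


# Example: the same records in a different order still count as a match.
old_run = ["alice\t3", "bob\t5"]
new_run = ["bob\t5", "alice\t3"]
print(replay_matches(old_run, new_run))  # True
```

Real comparison logic would also have to handle legitimately nondeterministic jobs (timestamps, sampling), which is one reason capture-and-replay tools are hard to build.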


This is the sort of tool I have always wanted for most of the applications I've developed: a system able to capture all, or a percentage, of the real traffic and then replay it — at every layer of the application.
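Capturing "a percentage of the real traffic" usually means sampling requests deterministically so a capture run is reproducible. A small sketch of that idea (my own illustration, not tied to any particular tool):

```python
import random


def sample_traffic(requests, fraction, seed=42):
    """Capture a reproducible fraction of live requests for later replay.

    A fixed seed makes the sample deterministic, so the same capture
    can be regenerated or audited. (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    return [r for r in requests if rng.random() < fraction]


# Example: capture roughly 10% of 1,000 requests.
requests = [f"GET /item/{i}" for i in range(1000)]
captured = sample_traffic(requests, fraction=0.10)
print(len(captured))  # roughly 100
```

In production, sampling is more often done by hashing a stable request attribute (user ID, session ID) so all requests from one session are captured together.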

Original title and link: Groundhog: Hadoop Automated Testing at Yahoo! (NoSQL database©myNoSQL)