ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Hadoop Pluggable Scheduler Framework

M. Tim Jones takes a look at Hadoop schedulers and when to use each of them:

Up until 2008, Hadoop supported a single scheduler that was intermixed with the JobTracker logic. Luckily, a bug report (HADOOP-3412) was submitted for an implementation of a scheduler that was independent of the JobTracker. More importantly, the new scheduler is pluggable, which allows use of new scheduling algorithms to help optimize jobs that have specific characteristics. A further advantage to this change is the increased readability of the scheduler, which has opened it up to greater experimentation and the potential for a growing list of schedulers to specialize in Hadoop’s ever-increasing list of applications. With this change, Hadoop is now a multi-user data warehouse that supports a variety of different types of processing jobs, with a pluggable scheduler framework providing greater control. This framework allows optimal use of a Hadoop cluster over a varied set of workloads (from small jobs to large jobs and everything in between). Moving away from FIFO scheduling (which treats a job’s importance relative to when it was submitted) allows a Hadoop cluster to support a variety of workloads with varying priority and performance constraints.

Original title and link: Hadoop Pluggable Scheduler Framework (NoSQL database©myNoSQL)

via: http://www.ibm.com/developerworks/opensource/library/os-hadoop-scheduling/index.html?ca=drs-