3 MapReduce and Hadoop Links: Secondary Sorting, Hadoop-Based Letterpress, and Hadoop Vaidya

  1. Bill Bejeck: MapReduce Algorithms - Secondary Sorting:

    This post covers the pattern of secondary sorting, found in chapter 3 of Data-Intensive Text Processing with MapReduce. While Hadoop automatically sorts data emitted by mappers before being sent to reducers, what can you do if you also want to sort by value? You use secondary sorting of course. With a slight manipulation to the format of the key object, secondary sorting gives us the ability to take the value into account during the sort phase. There are two possible approaches here.
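The "slight manipulation to the format of the key object" can be illustrated with a small single-process simulation. This is a sketch of the composite-key idea only, not Bill Bejeck's code and not the Hadoop API: the function names (`map_phase`, `partition`, `shuffle_and_reduce`) and the sample records are made up for illustration. The value is copied into the key, partitioning and grouping look only at the natural key, and sorting looks at the whole composite key.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Emit ((natural_key, value), value): the value is copied into the key."""
    for natural_key, value in records:
        yield (natural_key, value), value


def partition(composite_key, num_reducers):
    """Route on the natural key only, so all of its values reach one reducer."""
    natural_key, _ = composite_key
    return zlib.crc32(natural_key.encode("utf-8")) % num_reducers


def shuffle_and_reduce(records, num_reducers=2):
    """Simulate the shuffle: partition, sort by composite key, group by natural key."""
    partitions = [[] for _ in range(num_reducers)]
    for composite_key, value in map_phase(records):
        partitions[partition(composite_key, num_reducers)].append((composite_key, value))

    output = {}
    for part in partitions:
        # Sort step: compares the full (natural_key, value) tuple.
        part.sort(key=itemgetter(0))
        # Grouping step: compares the natural key only, so each reducer call
        # sees one natural key with its values already in sorted order.
        for natural_key, group in groupby(part, key=lambda kv: kv[0][0]):
            output[natural_key] = [value for _, value in group]
    return output


if __name__ == "__main__":
    # Hypothetical (year, temperature) readings.
    readings = [("2012", 30), ("2011", 12), ("2012", -5), ("2011", 40), ("2012", 18)]
    print(shuffle_and_reduce(readings))
```

In a real Hadoop job the same three roles are played by a custom partitioner, a sort comparator, and a grouping comparator; the sketch only mirrors how they divide the work.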

  2. Jesse Anderson: Understanding MapReduce via Boggle:

    MapReduce is a great platform for traversing graphs. Therefore, one can leverage the power of an Apache Hadoop cluster to efficiently run an algorithm on the graph.

    One such graph problem is playing Boggle. Boggle is played by rolling a group of 16 dice. Each player's job is to find the most words spelled out by the dice.

    Hadoop-based Letterpress?
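To make the graph-traversal framing of Boggle concrete, here is a minimal sketch, assuming a square board given as a flat string and a small in-memory word list. It is not Jesse Anderson's implementation: the names `neighbors`, `expand`, and `find_words` are hypothetical, and the loop runs in one process. Each pass over the frontier plays the role of one MapReduce iteration, where the map step extends every partial path by one adjacent die and drops paths that are no longer a prefix of any word.

```python
def neighbors(cell, size):
    """Yield the indexes of the cells adjacent to `cell` on a size x size board."""
    row, col = divmod(cell, size)
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            r, c = row + d_row, col + d_col
            if (d_row or d_col) and 0 <= r < size and 0 <= c < size:
                yield r * size + c


def expand(path, board, size, prefixes):
    """Map step: extend one partial path by every adjacent, unused cell."""
    for nxt in neighbors(path[-1], size):
        if nxt not in path:
            candidate = path + (nxt,)
            # Prune paths that cannot lead to any dictionary word.
            if "".join(board[i] for i in candidate) in prefixes:
                yield candidate


def find_words(board, size, dictionary):
    """Run one traversal round per path length until no path survives."""
    words = set(dictionary)
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    frontier = [(cell,) for cell in range(size * size) if board[cell] in prefixes]
    found = set()
    while frontier:
        # Reduce-like step: collect (and deduplicate) complete words.
        for path in frontier:
            word = "".join(board[i] for i in path)
            if word in words:
                found.add(word)
        # Next iteration's input is the set of extended paths.
        frontier = [p for path in frontier for p in expand(path, board, size, prefixes)]
    return found


if __name__ == "__main__":
    # Hypothetical 2x2 board: c a / t s
    print(sorted(find_words("cats", 2, {"cat", "act", "cats", "dog"})))
```

The prefix pruning matters more on a cluster than it does here, because every surviving path becomes a record that has to be shuffled into the next round.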

  3. Vitthal Gogate: Hadoop Vaidya: Performance advisor for Hadoop Map/Reduce Jobs:

    In its current state, Hadoop Vaidya is an extensible framework that allows users to write their own tests/rules for analyzing MapReduce applications. It leverages Job Configuration and Job History logs as input for this analysis, but moving forward we plan to integrate the tool with data from Greenplum Command Center, a management and monitoring platform for both GPDB and GPHD. This will enable Vaidya to incorporate more sources of information from the cluster, such as daemon/user logs, audit logs, job queue information, and system metrics, into its analysis. It will also enable real-time job analysis when running MapReduce jobs. This tool will also be hosted on our 1,000-node Analytics Workbench (AWB) so that partners and research institutions using the cluster can take advantage of the benefits of using Vaidya in their own analysis.

Original title and link: 3 MapReduce and Hadoop Links: Secondary Sorting, Hadoop-Based Letterpress, and Hadoop Vaidya (NoSQL database © myNoSQL)