
Parallel Processing Using the Map Reduce Programming Model

From scripting to MapReduce:

Next, let us assume we have a much larger dataset, which consists of movies in all languages and their details. After running the same program on the new file, we find that it takes a lot of time. We want it to run faster. There are two ways to go about this. One, we hack the hell out of Perl. Although this sounds promising, it might not be the best path. Or, we can try to run parts of the program in parallel, thereby speeding up the execution. But going this route, we might open up a Pandora’s box of issues that accompany parallel processing. How many processes are we going to run? How do we coordinate the processes? How are we going to combine the output of all the processes? What if processes fail? How are we going to control access to a shared resource? What if the dataset outgrows our single machine? These are some of the questions that need answering before we start writing parallel programs. Although parallelism can make the program run faster, it is often messy in real life.

What if there were a programming model that abstracted away these details and let us concentrate only on writing our business logic? What if the model automatically took care of the finer details of parallelism, freeing us from thinking about them? It turns out that such a model exists: MapReduce.
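To make the idea concrete, here is a minimal single-process sketch of the model in Python. The excerpt does not show the original program or its data format, so the records, the per-language counting task, and the names `map_movie`, `shuffle`, `reduce_counts`, and `run_job` are all assumptions made for illustration. The point is only that the business logic lives in a map function and a reduce function, while grouping by key is a generic step that a framework could perform across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical records of the form (title, language).
# The original dataset's layout is not shown in the text.
MOVIES = [
    ("Movie A", "English"),
    ("Movie B", "Tamil"),
    ("Movie C", "English"),
    ("Movie D", "French"),
]


def map_movie(record):
    """Map step: emit one (key, value) pair per input record."""
    _title, language = record
    return [(language, 1)]


def shuffle(pairs):
    """Group emitted values by key.

    A real framework performs this step between map and reduce,
    possibly moving data between machines. Here it is a local dict.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence in a single process."""
    mapped = chain.from_iterable(map_movie(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(MOVIES))
```

Because `map_movie` looks at one record at a time and `reduce_counts` looks at one key at a time, both can in principle run on separate workers without sharing state. That independence is what lets a framework decide how many processes to run and how to merge their output.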

Original title and link: Parallel Processing Using the Map Reduce Programming Model (NoSQL database©myNoSQL)

via: http://blog.diskodev.com/parallel-processing-using-the-map-reduce-prog