


Dryad: All content tagged as Dryad in NoSQL databases and polyglot persistence

Claim Chowder: Microsoft’s Dryad Technology to Take on Google’s MapReduce

In Dec. 2010, Joab Jackson wrote for IDG News Service: “Microsoft’s Dryad technology to take on Google’s MapReduce”. Just 11 months later, in Nov. 2011, Doug Henschen wrote for the same IDG News Service: “Microsoft Ditches Dryad, Focuses On Hadoop”.

There is nothing wrong with Microsoft’s decision. The same cannot be said, though, of the titles and articles published by the IDG News Service network.

Original title and link: Claim Chowder: Microsoft’s Dryad Technology to Take on Google’s MapReduce (NoSQL databases © myNoSQL)

Paper: HaLoop: Efficient Iterative Data Processing on Large Clusters

A paper from the University of Washington in Seattle (Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst):

The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph analysis, model fitting, and so on. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, it also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. We evaluated HaLoop on real queries and real datasets. Compared with Hadoop, on average, HaLoop reduces query runtimes by 1.85, and shuffles only 4% of the data between mappers and reducers.
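To make the caching idea from the abstract more concrete, here is a minimal in-memory sketch in Python. It is not HaLoop's API and not code from the paper: `map_reduce`, `LinkTable`, and `reachable` are hypothetical names invented for illustration. The sketch runs a small iterative reachability query as repeated map/reduce passes and counts how often the loop-invariant edge list is loaded, with and without a cache.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/shuffle/reduce pass over `records`."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


class LinkTable:
    """Hypothetical loop-invariant input that counts how often it is loaded."""

    def __init__(self, edges):
        self._edges = list(edges)
        self.loads = 0

    def load(self):
        self.loads += 1
        return list(self._edges)


def reachable(table, start, cache_invariant):
    """Iterate map/reduce passes until no new node is reached from `start`."""
    seen = {start}
    frontier = {start}
    # With caching, the invariant edge list is loaded once, before the loop.
    cached = table.load() if cache_invariant else None
    while frontier:
        # Without caching, every iteration reloads the same unchanged data.
        edges = cached if cache_invariant else table.load()

        def mapper(edge):
            src, dst = edge
            if src in frontier:
                yield dst, src

        reached = map_reduce(edges, mapper, lambda _node, sources: len(sources))
        frontier = set(reached) - seen
        seen |= frontier
    return seen


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
naive, cached = LinkTable(edges), LinkTable(edges)
print(reachable(naive, "a", cache_invariant=False), naive.loads)
print(reachable(cached, "a", cache_invariant=True), cached.loads)
```

Both runs compute the same set of nodes. The only difference is how many times the unchanged edge list is read, which is the kind of repeated work the abstract says a loop-aware scheduler with caching is meant to avoid.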


Trinity, Dryad, Probase and Bing

Klint Finley (RWW) connecting the dots between Microsoft Research projects Trinity, Dryad, Probase, Bing and competition (Google, Facebook):

It’s not hard to connect the dots between Bing, Dryad, Probase and Trinity. Microsoft is building a set of tools to rival those used internally at Google and the open source tools used by companies like Facebook and Twitter. The interesting thing will be what Microsoft does with its data.

Original title and link: Trinity, Dryad, Probase and Bing (NoSQL databases © myNoSQL)


Comparing Dryad and Hadoop

Madhu Reddy[1] compares the commercial, not yet released Dryad with the open source, widely used Hadoop:

  • While Hadoop has chosen to build these capabilities from scratch [management and administration of large clusters], Dryad has chosen to leverage the proven and tested cluster management capabilities already present in Windows HPC Server.
  • Hadoop […] has focused on performance and scale. Dryad, building on the performance and scale of Windows HPC Server, has in addition focused on making big data easier to use for mainstream application developers.
  • Dryad and DSC are based on the widely used and mature NTFS (New Technology File System), the file system that comes standard with Windows Server.
  • Hadoop uses the MapReduce computational model, which provides support for expressing the application logic in two simple steps — map and reduce. However, to develop more complex applications, developers will have to manually string together a sequence of MapReduce steps. DryadLINQ offers a higher-level computational model where complex sequence of MapReduce steps can be easily expressed in a query language similar to SQL.
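The last point, manually chaining MapReduce steps versus writing a single higher-level query, can be illustrated with a small Python sketch. This is only an analogy: it uses neither Hadoop nor DryadLINQ, and `map_reduce`, `top_words_chained`, and `top_words_declarative` are hypothetical names made up for this example. The task is "the n most frequent words", written first as two hand-wired map/reduce passes and then as one declarative expression.

```python
from collections import Counter, defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/shuffle/reduce pass over `records`."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def rank(items, n):
    """Sort (word, count) pairs by count descending, then word, keep n."""
    return sorted(items, key=lambda kv: (-kv[1], kv[0]))[:n]


def top_words_chained(lines, n):
    """Two map/reduce steps wired together by hand."""
    # Step 1: classic word count.
    counts = map_reduce(
        lines,
        lambda line: ((word, 1) for word in line.split()),
        lambda _word, ones: sum(ones),
    )
    # Step 2: send every (word, count) pair to one key and rank there.
    ranked = map_reduce(
        counts.items(),
        lambda item: [("top", item)],
        lambda _key, items: rank(items, n),
    )
    return ranked.get("top", [])


def top_words_declarative(lines, n):
    """The same result as a single higher-level expression."""
    return rank(Counter(w for line in lines for w in line.split()).items(), n)


lines = ["a b a", "b a c"]
print(top_words_chained(lines, 2))
print(top_words_declarative(lines, 2))
```

In the chained version the developer decides how the output of one step feeds the next. In the declarative version that plumbing is left to the library, which is roughly the convenience being claimed for a query-style model.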

A couple of aspects that were left out:

  1. Licensing costs for Windows HPC Server, Microsoft Visual Studio, and the future Dryad.
  2. Dryad’s commercial, closed source model versus Hadoop’s open source model (for example: how soon could you get a bug fix or an improvement?).
  3. The Hadoop tools ecosystem.
  4. Other Hadoop tools like Karmasphere Studio, a graphical environment to develop, debug, deploy, and monitor MapReduce jobs.

That’s not to say that Dryad and DryadLINQ are not interesting projects.

  [1] Madhu Reddy is senior product manager for Technical Computing marketing at Microsoft

Original title and link: Comparing Dryad and Hadoop (NoSQL databases © myNoSQL)