NoSQL video: All content tagged as NoSQL video in NoSQL databases and polyglot persistence

Top 5 Presentations from MongoNYC

After posting the best of the Cassandra Summit, I also found the top 5 slides and videos from MongoNYC on the 10gen blog.

✚ You might take this as one of my biases, but the fact that a presentation titled “How to keep your data safe in MongoDB” is still in the top 5 after so many years of MongoDB makes me think about how much some of the early decisions hurt its adoption, and maybe even the adoption of NoSQL in general.


NoSQL Screencast: Building a StackOverflow Clone With RavenDB

Ayende and Justin pair up to model a StackOverflow clone with RavenDB. They cover:

  • Map/Reduce indexes
  • Modelling tags
    • Root aggregates
    • Metadata
    • Active tags
  • Facets
  • Performance:
    • Built-in caching
    • Lazy loading
    • Aggressive caching
  • RavenDB profiler

Design Patterns for Efficient Graph Algorithms in MapReduce: Paper and Video

One of the most cited limitations of Hadoop is graph processing.

This problem has been approached in a few different ways so far. Google’s graph processing framework Pregel, which has some major differences compared to MapReduce, is one of them. There are also some MapReduce implementations for graph processing. Last but not least, different approaches are being tried for scaling graph databases.

In 2010, Jimmy Lin and Michael Schatz published a paper on the subject, Design patterns for efficient graph algorithms in MapReduce (pdf):

Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of protein-protein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.

After the break you can find a video of Jimmy Lin talking about current best practices in designing large-scale graph algorithms in MapReduce and how to avoid some of the shortcomings, especially those related to partitioning, serializing, and distributing the graph. He shows three enhanced design patterns, applicable to a large class of graph algorithms, that address many of these deficiencies.
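
To make the "message passing, exemplified by PageRank" idea more concrete, here is a minimal single-process sketch of the plain (unoptimized) way PageRank is usually expressed as map and reduce steps. It is not the code from the paper or the talk, and it does not implement the three design patterns. The function names, the 0.85 damping factor, the toy graph, and the assumption that every node has at least one outgoing link are all mine, chosen for illustration only.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not taken from the post


def map_phase(graph, ranks):
    """Each node sends an equal share of its rank to every neighbour."""
    for node, neighbours in graph.items():
        # The adjacency list is re-emitted so the graph survives the iteration.
        yield node, ("structure", neighbours)
        if neighbours:
            share = ranks[node] / len(neighbours)
            for target in neighbours:
                yield target, ("mass", share)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, num_nodes):
    """Sum incoming rank mass per node and rebuild the graph structure."""
    graph, ranks = {}, {}
    for node, values in grouped.items():
        incoming = 0.0
        neighbours = []
        for kind, payload in values:
            if kind == "structure":
                neighbours = payload
            else:
                incoming += payload
        graph[node] = neighbours
        ranks[node] = (1 - DAMPING) / num_nodes + DAMPING * incoming
    return graph, ranks


def pagerank(graph, iterations=20):
    """Run a fixed number of map/shuffle/reduce rounds."""
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        graph, ranks = reduce_phase(shuffle(map_phase(graph, ranks)), n)
    return ranks


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, rank in sorted(pagerank(toy_graph).items()):
        print(node, round(rank, 4))
```

Note that in this naive form the whole adjacency list travels through the shuffle on every iteration, next to the rank messages. That is one way to picture the cost of "serializing and distributing the graph" that the abstract mentions.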