


presentation: All content tagged as presentation in NoSQL databases and polyglot persistence

Presentation: An Introduction to FluidDB

We briefly covered FluidDB in the past, when I misnamed it the Wikipedia of databases. The presentation embedded below sheds more light on the following questions:

  • what is FluidDB: a platform for the web of things, each represented by an openly writable “social” object
  • why FluidDB: most information nowadays lives inside walled gardens, so it’s difficult to make real use of it. I especially enjoyed this slide explaining the problem with closed information:
    [Slide: Issue with walled garden information]
  • how to use FluidDB: all applications use the same FluidDB database through a RESTful API
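To make the "openly writable object" idea concrete, here is a minimal in-memory sketch. It assumes a simplified permission model (objects have no owner, while each tag belongs to a user’s namespace and only that user may write it), and every class and method name in it is hypothetical. It is not FluidDB’s RESTful API, which applications would call over HTTP instead.

```python
import uuid


class OpenObjectStore:
    """Hypothetical, simplified model of FluidDB-style "social" objects.

    Objects have no owner, so anyone can tag any object; each tag lives in
    a user's namespace (e.g. "alice/rating") and only that user may write it.
    """

    def __init__(self):
        self._objects = {}  # object id -> {"namespace/tag": value}
        self._about = {}    # "about" string -> object id

    def object_about(self, about):
        """Return the id of the shared object about `about`, creating it on first use."""
        if about not in self._about:
            object_id = str(uuid.uuid4())
            self._objects[object_id] = {}
            self._about[about] = object_id
        return self._about[about]

    def set_tag(self, user, object_id, tag, value):
        """Any user may tag any object, but only within their own namespace."""
        namespace = tag.split("/", 1)[0]
        if namespace != user:
            raise PermissionError(f"{user} cannot write the tag {tag}")
        self._objects[object_id][tag] = value

    def get_tags(self, object_id):
        """Everyone can read every tag on an object (a simplification)."""
        return dict(self._objects[object_id])


store = OpenObjectStore()
book = store.object_about("book:Moby Dick")
store.set_tag("alice", book, "alice/rating", 5)
store.set_tag("bob", book, "bob/has-read", True)
print(store.get_tags(book))  # {'alice/rating': 5, 'bob/has-read': True}
```

The point of the design is that the shared object is common ground: Alice and Bob enrich the same thing without either of them owning it.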

Presentations: Riak, Schema Design, and Ruby

Wynn Netherland:

Two great slide decks on schema design, #riak, and #ruby

Well, I’ve added one of my own, so make it three great Riak presentations. You can definitely use them as reference material:

Riak: A friendly key/value store for the web by Bruce Williams

Schema design for Riak by Sean Cribbs

There’s also a nice ☞ Q&A post covering several very interesting topics:

  • what’s the cost of listing keys in Riak and the impact on MapReduce
  • modeling relationships with large numbers of associations
  • caching of intermediate results for link-walking and map phases
  • notification mechanisms
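The first Q&A topic is easier to see with a toy model. In this sketch (a deliberate simplification, not Riak’s code) every node keeps one flat key space shared by all buckets, which is the commonly cited reason key listing was expensive in early Riak versions. Feeding a whole bucket to MapReduce therefore means scanning every key on every node first, while feeding an explicit list of bucket/key pairs touches only the requested objects. `ToyCluster` and its methods are hypothetical names.

```python
class ToyCluster:
    """Simplified model: each node keeps one flat key space for all buckets."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.keys_examined = 0  # crude cost counter

    def _node_for(self, bucket, key):
        return self.nodes[hash((bucket, key)) % len(self.nodes)]

    def put(self, bucket, key, value):
        self._node_for(bucket, key)[(bucket, key)] = value

    def list_keys(self, bucket):
        """Listing one bucket still walks every key on every node."""
        found = []
        for node in self.nodes:
            for b, k in node:
                self.keys_examined += 1
                if b == bucket:
                    found.append((b, k))
        return found

    def get(self, bucket, key):
        self.keys_examined += 1
        return self._node_for(bucket, key)[(bucket, key)]

    def map_values(self, inputs, fn):
        """Run a map function over explicit (bucket, key) inputs."""
        return [fn(self.get(b, k)) for b, k in inputs]


cluster = ToyCluster(num_nodes=4)
for i in range(1000):
    cluster.put("logs", f"log{i}", i)
for i in range(10):
    cluster.put("users", f"user{i}", i)

# Whole-bucket input: all 1010 keys are examined, then 10 objects fetched.
cluster.keys_examined = 0
cluster.map_values(cluster.list_keys("users"), lambda v: v * 2)
print(cluster.keys_examined)  # 1020

# Explicit inputs: only the 10 requested objects are fetched.
cluster.keys_examined = 0
cluster.map_values([("users", f"user{i}") for i in range(10)], lambda v: v * 2)
print(cluster.keys_examined)  # 10
```

The small `users` bucket pays for the large `logs` bucket, which is why explicit inputs are preferable to whole-bucket inputs when you know the keys.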

Riak and Ruby by Grant Schofield

Presentations: Oren Eini on NoSQL and RavenDB

Bookmark this for when you start looking into RavenDB or when you have around 6 hours to watch Oren Eini (Ayende Rahien) talk about NoSQL and RavenDB.

Embedded below are the slides from Introduction to RavenDB:

Presentation: Scalable Event Analytics with MongoDB & Ruby on Rails

We’ve already seen MongoDB used for analytics before, both when looking at how Eventbrite tracks page views with MongoDB and in Hummingbird, a MongoDB-based real-time web traffic visualization tool.

But Jared Rosoff’s presentation contains a series of slides identifying the possible issues with each scaling approach:

  • single database
  • master-slave database
  • sharded database
  • key-value stores
  • key-value store with Hadoop for reporting
  • MongoDB

The only part I don’t really understand is how using Hadoop is more complex than scaling MongoDB:

Maybe someone could explain?

Meanwhile, Jared Rosoff’s complete slide deck is embedded below.
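A common way to build this kind of event analytics on MongoDB is to pre-aggregate counters with upserts and `$inc`, so each page view updates one document instead of inserting a raw event. The sketch below only simulates that update with plain Python dictionaries so it runs without a server; the document layout (an `_id` built from page and day, plus an hourly breakdown) is an assumption for illustration, not something taken from Jared Rosoff’s slides. With PyMongo, the real call would be along the lines of `collection.update_one({"_id": doc_id}, {"$inc": increments}, upsert=True)`.

```python
from datetime import datetime


def inc_update(collection, doc_id, increments):
    """Mimic MongoDB's {"$inc": ...} with upsert=True on an in-memory dict.

    Dotted paths such as "hourly.14" are expanded into nested dictionaries,
    the way MongoDB interprets them.
    """
    doc = collection.setdefault(doc_id, {"_id": doc_id})
    for path, amount in increments.items():
        *parents, leaf = path.split(".")
        target = doc
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = target.get(leaf, 0) + amount


def record_page_view(collection, url, when):
    # One document per page per day; counters are incremented in place.
    doc_id = f"{url}:{when:%Y-%m-%d}"
    inc_update(collection, doc_id, {"total": 1, f"hourly.{when.hour}": 1})


page_views = {}
record_page_view(page_views, "/events/42", datetime(2010, 7, 1, 14, 5))
record_page_view(page_views, "/events/42", datetime(2010, 7, 1, 14, 50))
record_page_view(page_views, "/events/42", datetime(2010, 7, 1, 15, 10))
print(page_views["/events/42:2010-07-01"])
# {'_id': '/events/42:2010-07-01', 'total': 3, 'hourly': {'14': 2, '15': 1}}
```

Reports then become simple document reads, at the cost of deciding up front which counters you will need.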

Question about Riak MapReduce

There’s one aspect of Riak’s MapReduce that I’ve always wondered about: why is the reduce phase run only on a single node?

As you can see in the images below (extracted from Jon Meredith’s Riak in Ten Minutes, also embedded below), the map phase is distributed across all machines holding the target data, but the reduce phase runs only on the machine that triggered the processing.

There are at least two potential problems with this approach:

  • saturating the network
  • overwhelming the node with data and processing

Is this just a temporary solution? Or are there good reasons for this behavior?
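The behavior described above can be modeled in a few lines. This is a toy simulation, not Riak’s implementation: each node maps its local objects, every map result is shipped to the coordinating node, and the reduce function runs there alone. The `bytes_shipped` figure is a rough stand-in for network traffic, and the optional `combine_locally` flag shows a Hadoop-style combiner (a partial reduce on each node) as one possible mitigation; it is valid only when the reduce function is associative and `initial` is its identity value.

```python
import json
from functools import reduce


def run_job(nodes, map_fn, reduce_fn, initial, combine_locally=False):
    """Map on every node holding data, reduce only on the coordinator.

    nodes: list of lists, each inner list holding the objects stored on one node.
    Returns (result, bytes_shipped), where bytes_shipped approximates the size
    of all map output sent over the network to the coordinator.
    """
    shipped = []
    bytes_shipped = 0
    for local_objects in nodes:
        # Map phase: runs next to the data (in parallel in a real cluster).
        mapped = [map_fn(obj) for obj in local_objects]
        if combine_locally:
            # Combiner: pre-reduce on the node so only one value travels.
            mapped = [reduce(reduce_fn, mapped, initial)]
        # Whatever is left travels to the single coordinating node.
        bytes_shipped += len(json.dumps(mapped).encode())
        shipped.extend(mapped)
    # Reduce phase: one machine processes all shipped map output.
    return reduce(reduce_fn, shipped, initial), bytes_shipped


nodes = [[{"views": i} for i in range(n * 100, n * 100 + 100)] for n in range(3)]
total, traffic = run_job(nodes, lambda o: o["views"], lambda a, b: a + b, 0)
print(total)  # 44850
```

Every map result crosses the network and lands on one node, which is exactly where both the saturation and the overload concerns come from.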

While I usually don’t believe in learning X in Y lessons, Jon Meredith’s presentation is a good intro to Riak. Think of it as a summary of Kevin Smith’s 209 slides introducing Riak, Sean Cribbs’s 145 slides on Riak and Ripple, or even the excellent 2-hour Riak tutorial. If you haven’t checked these out yet, start with this one: it will give you the basics so you can dive deeper.

Video: 2-Hour Riak Tutorial

A must-see tutorial on Riak by Sean Cribbs.

Compare that with Riak in 10 minutes:

Update: Unfortunately, it looks like the original video was taken down by the Red Dirt RubyConf organizers. The only thing I could find is the slide deck:


Berlin Buzzwords Presentations

The organizers of the Berlin Buzzwords NoSQL event have set up a ☞ wiki page with links to all presentations.

My top 5 favorites:

What are yours?

Presentation: Project Voldemort at Gilt Groupe: When Failure Isn't an Option

InfoQ posted Geir Magnusson’s presentation on Project Voldemort, recorded at ☞ QCon London.

Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site experiences daily due to its particular way of selling inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.

I have recently heard about others having to deal with similar scenarios, so I’m really wondering what solutions they are employing.

As a side note, ☞ QCon San Francisco 2010, taking place November 1-5, will have a full-day NoSQL track hosted by yours truly. Expect more details about the presentations in this track very soon.


Hadoop and Complex Data Processing Workflows with Cascading

Understanding the basic concepts behind MapReduce is not very difficult, but those making extensive use of MapReduce jobs in Hadoop are already facing new challenges such as:

  • how can you run multiple map and/or reduce phases in your data processing?
  • how can you better coordinate the data processing execution flow for more complex scenarios?
  • how can you perform additional work between map/reduce phases?

Addressing these new challenges is the goal of the ☞ Cascading project:

Cascading is a feature rich API for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster.

Christopher Curtin’s slides embedded below offer a good overview of what can be achieved using Cascading (starting with slide 20).
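To make the first challenge concrete: many real jobs need two or more MapReduce passes, where the output of one becomes the input of the next. The plain-Python sketch below chains a word-count pass with a second pass that groups words by frequency. It only illustrates the chaining problem Cascading addresses; it does not use Cascading’s API, which is Java and built around taps, pipes, and flows, and all function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """One MapReduce pass: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Pass 1: word count.
def count_mapper(line):
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(word, ones):
    return (word, sum(ones))


# Pass 2: invert the pairs to group words by their frequency.
def invert_mapper(pair):
    word, count = pair
    return [(count, word)]


def invert_reducer(count, words):
    return (count, sorted(words))


def word_frequency_flow(lines):
    """Chain the two passes: the first pass's output feeds the second."""
    counts = map_reduce(lines, count_mapper, count_reducer)
    return map_reduce(counts, invert_mapper, invert_reducer)


lines = ["riak cassandra riak", "hbase riak cassandra"]
print(word_frequency_flow(lines))
# [(1, ['hbase']), (2, ['cassandra']), (3, ['riak'])]
```

Even this tiny pipeline needs glue code for passing intermediate results along; coordinating such steps at scale on a cluster is the part Cascading takes over.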

Presentation: Cassandra Basics - Indexing

A very informative presentation by Benjamin Black on Cassandra indexing:

There are so many interesting things to learn from these slides. Benjamin briefly introduces the main Cassandra terms (if you are not familiar with them, you can read more in this Cassandra tutorial) and then explains how column sorting and partitioning strategies should be used. The deck also contains some really quotable fragments:

Relational stores are schema oriented. Start from your schema & work forwards

Column stores are query oriented. Start from your queries & work backwards

Cassandra is an index construction kit
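The "index construction kit" idea boils down to this: because the columns in a row are kept sorted by name, you can design a row whose column names are the values you want to query by, and then answer the query with a slice. The sketch below models one such row with `bisect`. It is a toy, not a Cassandra client API, and the `user:42:events` row key and zero-padded timestamp column names are hypothetical choices made by starting from the query ("latest events for a user").

```python
import bisect


class SortedRow:
    """A toy model of one Cassandra row: columns kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish, count=100, reverse=False):
        """Return up to `count` (name, value) pairs with start <= name <= finish."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        names = self._names[lo:hi]
        if reverse:
            names = names[::-1]
        return [(n, self._values[n]) for n in names[:count]]


# Query-first design: one row per user, with zero-padded timestamps as
# column names so that string order matches chronological order.
rows = {}


def record_event(user_id, timestamp, event):
    row = rows.setdefault(f"user:{user_id}:events", SortedRow())
    row.insert(f"{timestamp:012d}", event)


record_event(42, 1000, "login")
record_event(42, 1005, "view")
record_event(42, 1010, "logout")
print(rows["user:42:events"].slice("000000001000", "000000001006"))
# [('000000001000', 'login'), ('000000001005', 'view')]
```

The schema here exists only to make one query cheap, which is the "start from your queries & work backwards" advice in practice.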

Presentation: An introduction to node.js and Riak

While most of Francisco Treacy’s (@frank06) “An Introduction to node.js and Riak” presentation focuses on the advantages of event-based architectures, it also shows how to integrate node.js and Riak using ☞ riak-js, a node.js library for Riak that takes advantage of Riak’s friendly HTTP-based protocol.

There are a couple of other interesting things to learn from this slide deck, for example the cost of I/O:

which is then described in simpler terms:

In other words, reaching RAM is like going from here to the Red Light District. Accessing the network is like going to the moon.
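The analogy works by scaling latencies to human terms. The sketch below uses approximate, commonly quoted figures from Jeff Dean’s "numbers everyone should know" talks (not the figures on Frank’s slide, which are not reproduced here) and rescales them so that an L1 cache reference takes one second. Treat the inputs as orders of magnitude, not measurements.

```python
# Approximate latencies in nanoseconds, as commonly quoted from Jeff Dean's
# talks; orders of magnitude only.
LATENCIES_NS = {
    "L1 cache reference": 0.5,
    "main memory reference": 100,
    "round trip within same datacenter": 500_000,
    "disk seek": 10_000_000,
    "packet California -> Netherlands -> California": 150_000_000,
}


def human_scale(latencies_ns, baseline="L1 cache reference"):
    """Rescale every latency so that the baseline takes exactly one second."""
    factor = 1.0 / latencies_ns[baseline]  # scaled seconds per real nanosecond
    return {name: ns * factor for name, ns in latencies_ns.items()}


def describe(seconds):
    """Pick the largest unit that the duration reaches."""
    for unit, size in (("years", 365 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for name, seconds in human_scale(LATENCIES_NS).items():
    print(f"{name}: {describe(seconds)}")
# L1 cache: 1.0 seconds, main memory: 3.3 minutes, datacenter round trip:
# 11.6 days, disk seek: 231.5 days, transatlantic packet: 9.5 years
```

On this scale, a trip to RAM is a short walk, while a round trip across the network is a journey of days or years.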

Update: thanks to a comment on this post, here is what Googler Jeff Dean presented on the cost of I/O:

But as Frank mentions, there are some risks when working with cutting-edge technologies:

  • Cutting-edge technologies are not bug-free
  • Riak still has some rough edges (some in terms of performance)
  • node.js is approaching its first stable version
  • asynchronous JS code can get “boomerang-shaped”

Presentation: Cassandra @ Outbrain

☞ Some interesting slides[1] (starting with slide 12) about why and how Outbrain is using Cassandra, plus a brief intro to:

  • Cassandra data model
  • Cassandra API
  • consistency model
  • the Hector Java client
  • sorting
  • Thrift
  • gossip, consistent hashing, and consistency levels
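Two of the topics above, consistent hashing and tunable consistency levels, fit in a short sketch. This is a simplified model, not Cassandra’s code: each node gets an MD5 token on a ring, a key is stored on the first node whose token is at or after the key’s token plus the next N-1 nodes (similar in spirit to Cassandra’s simple replica placement), and a read and a write are guaranteed to overlap on at least one replica when R + W > N. The node names and helper functions are hypothetical.

```python
import bisect
import hashlib


def token(value):
    """Map a string onto the ring with MD5 (RandomPartitioner-style)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key):
        """First node whose token is >= the key's token, then the next N-1 nodes."""
        positions = [t for t, _ in self._tokens]
        start = bisect.bisect_left(positions, token(key)) % len(self._tokens)
        count = min(self.replication_factor, len(self._tokens))
        return [self._tokens[(start + i) % len(self._tokens)][1] for i in range(count)]


def quorum(replication_factor):
    """Majority of replicas, as used by the QUORUM consistency level."""
    return replication_factor // 2 + 1


def overlaps(read_replicas, write_replicas, replication_factor):
    """Reads are guaranteed to see the latest write when R + W > N."""
    return read_replicas + write_replicas > replication_factor


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
print(ring.replicas("user:42"))           # three distinct nodes
print(overlaps(quorum(3), quorum(3), 3))  # True: 2 + 2 > 3
print(overlaps(1, 1, 3))                  # False: ONE + ONE may miss a write
```

Adding or removing a node only moves the keys adjacent to its token, which is the main reason for using a ring rather than `hash(key) % number_of_nodes`.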

You’ll find many more details about these in our getting started with Cassandra tutorial, but the bullet-point format is sometimes useful.


  • [1] Note: this is a Google doc.