presentation: All content tagged as presentation in NoSQL databases and polyglot persistence
Monday, 19 July 2010
Presentation: An Introduction to FluidDB
We briefly covered FluidDB in the past when I mis-named it the Wikipedia of databases. The presentation embedded below answers the following questions in a bit more detail:
- what is FluidDB: a platform for the web of things, each represented by an openly writable “social” object
- why FluidDB: most information nowadays lives inside walled gardens, so it's difficult to make real use of it. I especially enjoyed this slide explaining the problem with closed information:
- how to use FluidDB: all applications use the same FluidDB database through a RESTful API
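To make the "openly writable object in one shared database" idea a bit more concrete, here is a toy in-memory sketch. It is not the FluidDB API: the class, method names, and the `user/tag` naming scheme are assumptions made up for illustration, and permissions are left out entirely.

```python
from collections import defaultdict


class SharedObjectStore:
    """Hypothetical stand-in for a single shared database of openly writable objects."""

    def __init__(self):
        # object id -> {"user/tag": value}
        self._objects = defaultdict(dict)

    def tag(self, user, object_id, tag, value):
        """Any user may attach a tag to any object; the tag is stored under the user's name."""
        self._objects[object_id][f"{user}/{tag}"] = value

    def get(self, object_id):
        """Return every tag on an object, regardless of which application wrote it."""
        return dict(self._objects.get(object_id, {}))

    def query(self, tag_path):
        """Return the ids of all objects carrying the given namespaced tag."""
        return sorted(oid for oid, tags in self._objects.items() if tag_path in tags)


store = SharedObjectStore()
# Two unrelated applications annotate the same object without coordinating.
store.tag("bookclub", "book:dune", "rating", 5)
store.tag("library", "book:dune", "on_shelf", True)
store.tag("library", "book:emma", "on_shelf", False)

dune_tags = store.get("book:dune")
shelved = store.query("library/on_shelf")
```

The point of the sketch is that neither application owns the object: each one only owns the tags it adds, and both can read what the other wrote.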
Monday, 12 July 2010
Presentations: Riak, Schema Design, and Ruby
Wynn Netherland:
Two great slide decks on schema design, #riak, and #ruby
Well, I’ve added one myself so make it three great Riak presentations. You can definitely use them as reference material:
Riak: A friendly key/value store for the web by Bruce Williams
Schema design for Riak by Sean Cribbs
There’s also a nice ☞ Q&A post covering a couple of very interesting topics:
- what’s the cost of listing keys in Riak and the impact on MapReduce
- modeling relationships with large numbers of associations
- caching of intermediate results for link-walking and map phase
- notification mechanisms
Riak and Ruby by Grant Schofield
Wednesday, 30 June 2010
Presentations: Oren Eini on NoSQL and RavenDB
Bookmark this for when you start looking into RavenDB, or for when you have around 6 hours to watch Oren Eini (Ayende Rahien) talk about NoSQL and RavenDB.
Embedded below are the slides from Introduction to RavenDB:
Presentation: Scalable Event Analytics with MongoDB & Ruby on Rails
We’ve already seen the MongoDB analytics use case before, when looking at how Eventbrite tracks page views with MongoDB, and also in Hummingbird, a MongoDB-based real-time web traffic visualization tool.
But Jared Rosoff’s presentation contains a series of slides identifying possible issues with each scaling approach:
- single database
- master-slave database
- sharded database
- key-value stores
- key-value store with Hadoop for reporting
- MongoDB
The only part I don’t really understand is how using Hadoop is more complex than scaling MongoDB:
Maybe someone could explain?
Meanwhile, Jared Rosoff’s complete slide deck is below.
Friday, 18 June 2010
Question about Riak MapReduce
There’s one aspect of Riak’s MapReduce that I’ve always wondered about: why is the reduce phase run only on a single node?
As you can see in the images below (extracted from Jon Meredith’s Riak in Ten Minutes, embedded further down), the map phase is distributed across all machines holding the target data. But the reduce phase runs only on the machine that triggered the processing.
There can be quite a few problems with this approach:
- saturating the network
- overwhelming the node with data and processing
Is this just a temporary solution? Or are there good reasons for this behavior?
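The behavior described above can be simulated in a few lines. This is only a sketch of the data flow, not Riak code: the cluster layout, node names, and functions are hypothetical, and the job is a plain word count.

```python
from collections import Counter

# Hypothetical cluster: node name -> documents stored on that node.
CLUSTER = {
    "node_a": ["riak is a key value store", "riak scales out"],
    "node_b": ["map runs next to the data"],
    "node_c": ["reduce runs on one node", "one node collects everything"],
}


def map_phase(document):
    """Runs on the node holding the document: emit per-document word counts."""
    return Counter(document.split())


def reduce_phase(partials):
    """Runs on the coordinating node only: merge every partial result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_job(cluster, coordinator):
    """Map where the data lives, then ship all map output to the coordinator."""
    inbox = []           # map outputs gathered on the coordinating node
    sent_over_wire = 0   # outputs that had to come from another node
    for node, documents in cluster.items():
        local_results = [map_phase(doc) for doc in documents]
        if node != coordinator:
            sent_over_wire += len(local_results)
        inbox.extend(local_results)
    return reduce_phase(inbox), sent_over_wire


word_counts, sent_over_wire = run_job(CLUSTER, coordinator="node_a")
```

Even in this tiny example every map result from `node_b` and `node_c` ends up on `node_a`, which is where both concerns listed above (network traffic and load on one node) come from.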
While I usually don’t believe in learning X in Y lessons, Jon Meredith’s presentation is a good intro to Riak. Think of it as a summary of Kevin Smith’s 209 slides introducing Riak, Sean Cribbs’s 145 slides on Riak and Ripple, or even the excellent 2-hour Riak tutorial. If you haven’t checked these out yet, you should definitely start with this one, as it will give you the basics you need to dive deeper.
Thursday, 17 June 2010
Video: 2 Hours Riak Tutorial
A must-see tutorial on Riak by Sean Cribbs.
Compare that with Riak in 10 minutes:
Update: Unfortunately, it looks like the original video was taken down by the Red Dirt RubyConf people. The only thing I could find is the slide deck:
Berlin Buzzwords Presentations
The organizers of the Berlin Buzzwords NoSQL event have set up a ☞ wiki page with links to all presentations.
My 5 top favorites:
- Mathias Meyer: ☞ NoSQL - The Definitive Guide
- Rusty Klophaus: ☞ Riak from small to large Mon (pdf). New: video of the presentation is available here
- Mathias Stearn: ☞ Mongo DB - the new ‘M’ in your LAMP stack (pdf)
- Peter Neubauer: ☞ 5 cool problems you can solve with neo4j
- Doug Judd: ☞ Hypertable - The Ultimate Scaling Machine. New: video of the presentation is available here
What are yours?
Tuesday, 15 June 2010
Presentation: Project Voldemort at Gilt Groupe: When Failure Isn't an Option
InfoQ posted Geir Magnusson’s presentation on Project Voldemort recorded at ☞ QCon London.
Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site is experiencing daily due to its particular way of selling their inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.
I have recently heard about others having to deal with similar scenarios, so I’m really wondering what solutions they are employing.
As a side note, ☞ QCon San Francisco 2010, taking place November 1-5, will have a full-day NoSQL track hosted by yours truly. Expect more details about the presentations in this track very soon.
via: http://www.infoq.com/presentations/Project-Voldemort-at-Gilt-Groupe
Friday, 11 June 2010
Hadoop and Complex Data Processing Workflows with Cascading
Understanding the basic concepts behind MapReduce is not very difficult, but those making extensive use of MapReduce jobs in Hadoop are already facing new challenges like:
- how can you run multiple map and/or reduce phases in your data processing?
- how can you better coordinate the data processing execution flow for more complex scenarios?
- how can you perform additional work between map/reduce phases?
Addressing these new challenges is the goal of the ☞ Cascading project:
Cascading is a feature rich API for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster.
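The three questions above (several map/reduce phases, coordinating the flow, extra work between phases) can be illustrated with a small sketch. It is plain Python and does not use or imitate the Cascading API; every function name and the sample page-view records are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One map/reduce phase: emit (key, value) pairs, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [(key, reducer(values)) for key, values in sorted(groups.items())]


def count_views(records):
    """Phase 1: count page views per (user, page) pair."""
    return map_reduce(records, lambda r: [((r["user"], r["page"]), 1)], sum)


def keep_repeat_views(pairs):
    """Work between phases: drop (user, page) pairs seen only once."""
    return [(key, count) for key, count in pairs if count > 1]


def repeat_users_per_page(pairs):
    """Phase 2: count, per page, how many users viewed it more than once."""
    return map_reduce(pairs, lambda kv: [(kv[0][1], 1)], sum)


def run_workflow(records, steps):
    """Feed the output of each step into the next one, in order."""
    data = records
    for step in steps:
        data = step(data)
    return data


views = [
    {"user": "ann", "page": "/home"}, {"user": "ann", "page": "/home"},
    {"user": "bob", "page": "/home"}, {"user": "bob", "page": "/home"},
    {"user": "bob", "page": "/docs"},
    {"user": "cat", "page": "/docs"}, {"user": "cat", "page": "/docs"},
]
result = run_workflow(views, [count_views, keep_repeat_views, repeat_users_per_page])
```

Writing the workflow as an ordered list of steps is the part a workflow API takes over; on a real cluster each step would also need scheduling, intermediate storage, and failure handling, none of which this sketch attempts.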
Christopher Curtin’s slides embedded below offer a good overview of what can be achieved using Cascading (starting with slide 20).
Thursday, 3 June 2010
Presentation: Cassandra Basics - Indexing
A very informative presentation by Benjamin Black on Cassandra indexing:
There are so many interesting things to learn from these slides. Benjamin briefly introduces the main Cassandra terms (if you are not familiar with them, you can read more in this Cassandra tutorial) and then explains how column sorting and partitioning strategies should be used. The deck also has some really quotable fragments:
Relational stores are schema oriented. Start from your schema & work forwards
Column stores are query oriented. Start from your queries & work backwards
Cassandra is an index construction kit
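As a rough illustration of "start from your queries and work backwards", suppose the query is "the latest N posts by a given user". The sketch below is a toy in-memory model, not Cassandra client code: the class, the (timestamp, post id) column layout, and the sample data are assumptions chosen for the example.

```python
import bisect
from collections import defaultdict


class TimelineIndex:
    """Toy model of one wide row per user, with columns kept in sorted order."""

    def __init__(self):
        # row key (user) -> sorted list of (timestamp, post_id) columns
        self._rows = defaultdict(list)

    def add_post(self, user, timestamp, post_id):
        """Write path: keep the row sorted at insert time so reads need no sorting."""
        bisect.insort(self._rows[user], (timestamp, post_id))

    def latest(self, user, count):
        """Read path: a slice from the end of a single row, newest first."""
        if count <= 0:
            return []
        columns = self._rows.get(user, [])
        return [post_id for _, post_id in reversed(columns[-count:])]


index = TimelineIndex()
index.add_post("ben", 3, "p3")
index.add_post("ben", 1, "p1")
index.add_post("ben", 2, "p2")
index.add_post("amy", 5, "p5")

latest_two = index.latest("ben", 2)
```

The row key and the column ordering are both dictated by the query: answering it touches one row and reads a contiguous range, which is the kind of index you end up building by hand.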
Tuesday, 1 June 2010
Presentation: An introduction to node.js and Riak
While most of Francisco Treacy’s (@frank06) “An Introduction to node.js and Riak” presentation focuses on the advantages of event-based architectures, it also shows how to integrate node.js and Riak using ☞ riak-js, a node.js library for Riak that takes advantage of Riak’s friendly HTTP-based protocol.
There are a couple of other interesting things that can be learned from this slide deck. For example, the cost of I/O:

simply described afterwards:
In other words, reaching RAM is like going from here to the Red Light District. Accessing the network is like going to the moon.
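To get a feel for the gap that analogy describes, here is a small sketch that rescales latencies so that one L1 cache reference takes one second. The figures are commonly quoted ballpark numbers and are an assumption here, not values read off the slide; real hardware varies widely.

```python
# Approximate, commonly quoted latencies in nanoseconds (assumed for illustration).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "main memory reference": 100,
    "round trip within a datacenter": 500_000,
    "disk seek": 10_000_000,
    "packet California -> Netherlands -> California": 150_000_000,
}


def scale_to_seconds(latencies, baseline="L1 cache reference"):
    """Rescale every latency so that the baseline operation takes exactly 1 second."""
    unit = latencies[baseline]
    return {name: value / unit for name, value in latencies.items()}


def humanize(seconds):
    """Render a duration in the largest unit that keeps the number readable."""
    for unit, size in (("years", 365 * 24 * 3600), ("days", 24 * 3600),
                       ("minutes", 60), ("seconds", 1)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


scaled = scale_to_seconds(LATENCY_NS)
for name, seconds in scaled.items():
    print(f"{name:50s} {humanize(seconds)}")
```

On this scale a memory reference is a matter of minutes, while a disk seek or an intercontinental round trip stretches into months or years.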
Update: thanks to a comment on this post, here is what Googler Jeff Dean presented on the cost of I/O:
But as Frank mentions, there are some risks when working with cutting-edge technologies:
- Cutting-edge technologies are not bug-free
- Riak still has some rough edges (some in terms of performance)
- node.js is approaching its first stable version
- asynchronous JS code can get “boomerang-shaped”
Friday, 28 May 2010
Presentation: Cassandra @ Outbrain
☞ Some interesting slides[1] (starting with slide 12) about why and how Outbrain is using Cassandra, plus a brief intro to:
- Cassandra data model
- Cassandra API
- consistency model
- the Hector Java client
- sorting
- Thrift
- gossip, consistent hashing and consistency levels
You’ll find many more details about these in our getting started with Cassandra tutorial, but the bullet point format is useful sometimes.
References
- [1] Note: this is a Google doc. (↩)