


tutorial: All content tagged as tutorial in NoSQL databases and polyglot persistence

Redis and PHP

If we already had Redis and Python and Redis and Ruby, the scene would not be complete without Redis and PHP.

Kevin van Zonneveld’s (@kvz) article closely follows the pattern of the other two great Redis tutorials and walks the reader through all the steps involved in getting started with Redis and the PHP Rediska library.

I guess the one we are missing right now would be Redis and Perl. Anyone?


Tutorial: Riak Schema Design

Just a few days after I posted about the “art” of, and need for, data modeling in the NoSQL world, the Basho guys have started a series of articles on Riak schema design.

While the ☞ first post was a bit more philosophical (or call it high-level), the ☞ second one is more hands-on and presents various approaches to modeling relationships with a key-value store or document store. Personally, I’ve liked the Riak links[1] approach ever since I ☞ read about it.

PS: Guys, I hope you’ve already prepared the beers ;-).


Cassandra Reads Performance Explained

After explaining Cassandra writes performance, Mike Perham ☞ continues his series now explaining: “reads and […] why they are slow”.

So what happens with a Cassandra read?

  • a client makes a read request to a random node
  • the node acts as a proxy determining the nodes having copies of data
  • the node requests the corresponding data from each of those nodes
  • the client can select the strength of the read consistency:

    • single read => the request returns once it gets the first response, but data can be stale
    • quorum read => the request returns only after the majority responded with the same value

      Mike mentions a couple of corner cases in which this behavior is more complicated.

  • the node also performs read repair of any inconsistent response
  • each node reading data uses either Memtable (in-memory) or SSTables (disk)

    Mike and Jonathan provide a very detailed explanation of the read performance:

    Mike: To scan the SSTable, Cassandra uses a row-level column index and bloom filter to find the necessary blocks on disk, deserializes them and determines the actual data to return. There’s a lot of disk IO here which ultimately makes the read latency higher than a similar DBMS.

    Jonathan: The reason uncached reads are slower in Cassandra is not because the SSTable is inherently io-intensive (it’s actually better than b-tree based storage on a 1:1 basis) but because in the average case you’ll have to merge row fragments from 2-4 SSTables to complete the request, since SSTables are not update-in-place.

    It is also important to note that Cassandra employs row caching, which reduces read latency.
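To make the difference between the two consistency levels more concrete, here is a minimal Python sketch of the read path as described in the list above. It is a toy simulation and not Cassandra code: the replicas, the keys, and the function names are all made up, and each replica simply stores a (timestamp, value) pair per key.

```python
from collections import Counter

# Hypothetical replicas: each maps a key to a (timestamp, value) pair.
replicas = [
    {"user:1": (2, "alice@new.example")},
    {"user:1": (2, "alice@new.example")},
    {"user:1": (1, "alice@old.example")},  # stale copy
]

def read_one(replicas, key, first=0):
    """Single read: return whatever the first responding replica holds."""
    return replicas[first][key][1]

def read_quorum(replicas, key):
    """Quorum read: return a value only if a majority of replicas agree on it."""
    responses = [replica[key] for replica in replicas]
    winner, votes = Counter(responses).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority agreed on a value")
    return winner[1]

def read_repair(replicas, key):
    """Overwrite stale copies with the newest (highest timestamp) version."""
    newest = max(replica[key] for replica in replicas)
    for replica in replicas:
        replica[key] = newest
    return newest[1]

print(read_one(replicas, "user:1", first=2))  # alice@old.example (stale)
print(read_quorum(replicas, "user:1"))        # alice@new.example
read_repair(replicas, "user:1")
print(read_one(replicas, "user:1", first=2))  # alice@new.example
```

The single read is fast but returns whatever the answering replica happens to hold; the quorum read waits for agreement, and the repair step brings the stale copy back in line.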

Mike’s post also covers Cassandra range scans and explains the role of Cassandra partitioning strategies. ☞ Great read!
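Going back to Jonathan’s remark about merging row fragments, the idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the data is invented, each fragment is a plain dict of column to (timestamp, value), and nothing here reflects the actual on-disk format.

```python
# Hypothetical data: each SSTable holds a fragment of the same row as
# column -> (timestamp, value). SSTables are not updated in place, so a
# newer value for a column lands in a newer SSTable (or in the Memtable).
sstables = [
    {"name": (1, "Ann"), "city": (1, "Oslo")},
    {"city": (5, "Rome")},
    {"email": (3, "ann@example.org")},
]
memtable = {"name": (7, "Ann B.")}

def read_row(memtable, sstables):
    """Merge all fragments, keeping the newest version of every column."""
    merged = {}
    for fragment in [memtable, *sstables]:
        for column, (ts, value) in fragment.items():
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    return {column: value for column, (ts, value) in merged.items()}

row = read_row(memtable, sstables)
print(row)
```

Every extra fragment is one more place to look, which is why an uncached read touching 2-4 SSTables costs more than a single lookup would.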

Quick Guide to MapReduce

Not everyone feels comfortable with MapReduce even if at first glance it looks pretty simple. Considering how useful and used MapReduce is in the NoSQL world, I thought it would be useful to put together a quick guide to MapReduce.
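Before going through the links, here is the classic word count as a minimal Python sketch. It runs in a single process and has nothing to do with any particular framework; the function names are mine and only illustrate the three phases (map, group by key, reduce).

```python
from collections import defaultdict

def map_words(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: collapse all values emitted for one key into a single value."""
    return sum(counts)

def map_reduce(records, mapper, reducer):
    """Run map over every record, group the pairs by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

lines = ["to be or not to be", "to do is to be"]
counts = map_reduce(lines, map_words, reduce_counts)
print(counts["to"], counts["be"])  # 4 3
```

In a real system the map calls and the per-key reduce calls are spread over many machines, but the contract between the two functions stays the same.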

If the whole notion is completely unfamiliar to you, Kristina Chodorow’s (@kchodorow) ☞ post should get you started using a Star Trek-based story.

The next very detailed and visual explanation of MapReduce is Oren Eini’s (@ayende) ☞ post. While the post relies on Linq to explain MapReduce, all steps are accompanied by nice images.

There is also this ☞ article by Joel Spolsky and the ☞ Stack Overflow answers.

If by now you’ve started asking yourself how you would translate some SQL functionality, you’ll probably find this ☞ PDF showing a SQL to (MongoDB) MapReduce translation useful.
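As a rough idea of what such a translation looks like, here is a sketch in plain Python (not MongoDB syntax) for a query like SELECT customer, SUM(amount) FROM orders WHERE status = 'paid' GROUP BY customer. The orders collection and its field names are invented for the example.

```python
from collections import defaultdict

# Hypothetical "orders" collection.
orders = [
    {"customer": "ann", "amount": 30, "status": "paid"},
    {"customer": "bob", "amount": 20, "status": "paid"},
    {"customer": "ann", "amount": 15, "status": "open"},
    {"customer": "ann", "amount": 5, "status": "paid"},
]

def mapper(order):
    # WHERE status = 'paid'  -> a filter inside map
    if order["status"] == "paid":
        # GROUP BY customer  -> the emitted key
        # SUM(amount)        -> the emitted value, added up in reduce
        yield order["customer"], order["amount"]

def reducer(customer, amounts):
    return sum(amounts)

# Group the emitted pairs by key, then reduce every group.
groups = defaultdict(list)
for order in orders:
    for key, value in mapper(order):
        groups[key].append(value)
totals = {key: reducer(key, values) for key, values in groups.items()}
print(totals)  # {'ann': 35, 'bob': 20}
```

The pattern is always the same: the WHERE clause moves into map, the GROUP BY columns become the key, and the aggregate function becomes the reduce.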

While probably not as clear as the other examples above, there is also Pete Warden’s (@peterwarden) “MapReduce for Idiots”.

Once you are done with this, I think you have enough details to read the ☞ original paper from Google and, why not, even continue with some MapReduce and Hadoop academic papers.

Getting Ready for CouchDB 0.11

CouchDB seems to get closer and closer to the 0.11 release which will bring quite a few new interesting features. The blog has started a series of posts covering what’s new in CouchDB 0.11:

  • ☞ Part 1
    • Nice URLs
    • URL rewriting
    • Virtual hosts
  • ☞ Part 2
    • CouchDB JOINs redux
    • raw collation
  • ☞ Part 3 (new)
    • implicitly create target database when replicating
    • replicate documents by id
    • replication filters

    This last post also contains a series of use cases for the new features:

    • Replicating inboxes or “Idle users are free”
    • Splitting cluster nodes
    • DesktopCouch
    • Need-to-know-based Data sharing
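The three replication features from Part 3 all end up as fields in the JSON body posted to CouchDB’s /_replicate endpoint. Below is a small Python sketch that only builds such a body (it does not talk to a server). The helper and the database, document, and filter names are made up; the field names create_target, doc_ids, and filter are the ones I know from the CouchDB replication API.

```python
import json

def replication_request(source, target, create_target=False,
                        doc_ids=None, filter_name=None):
    """Build the JSON body for a POST to /_replicate (hypothetical helper)."""
    body = {"source": source, "target": target}
    if create_target:
        body["create_target"] = True      # implicitly create the target db
    if doc_ids:
        body["doc_ids"] = list(doc_ids)   # replicate only these documents
    if filter_name:
        body["filter"] = filter_name      # "designdoc/filtername"
    return json.dumps(body, sort_keys=True)

# "Replicating inboxes": copy only what a filter lets through into a
# per-user database that is created on the fly.
print(replication_request("mail", "inbox-ann",
                          create_target=True,
                          filter_name="mail/for_user"))
print(replication_request("mail", "archive", doc_ids=["msg-1", "msg-7"]))
```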

A new feature that is not mentioned in these posts but may be quite interesting is support for CommonJS 1.0 modules in show, list, update, and validation functions (but not map and reduce). You can read more about it ☞ here.

As we’ve mentioned the new replication features in CouchDB 0.11, I should also point you to two posts from Chris Strom on CouchDB replication: ☞ part 1 and ☞ part 2.

We will continue updating this post with the upcoming features, so once CouchDB 0.11 is available you’ll be ready to start using it.

Update: CouchDB 0.11 was released!

Quick Guide to CouchDB and PHP

CouchDB is one of the friendliest NoSQL systems in terms of protocols: JSON over HTTP. But that doesn’t mean that small libraries aware of the URI space and other aspects of CouchDB are not useful. (NB: the only problem would be if everyone started creating their own. Anyway, discussing CouchDB libraries is not the main intent of this post; this is just a personal note I made while going through a couple of PHP guides to CouchDB.) On top of its ease of use, CouchDB can completely change the architecture of your next web application.
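To show how little “JSON over HTTP” requires from a client, here is a sketch that builds the basic requests with nothing but Python’s standard library (Python only to keep the example short; the same calls map one-to-one to PHP’s HTTP functions). The base URL assumes a local CouchDB on its default port, and the database and document names are invented. The requests are only constructed, not sent.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984"  # assumption: local CouchDB, default port

def create_database(name):
    """PUT /{db} creates a database."""
    return Request(f"{BASE}/{name}", method="PUT")

def save_document(db, doc_id, doc):
    """PUT /{db}/{id} with a JSON body stores a document."""
    return Request(
        f"{BASE}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def fetch_document(db, doc_id):
    """GET /{db}/{id} returns the document as JSON."""
    return Request(f"{BASE}/{db}/{doc_id}", method="GET")

req = save_document("blog", "hello-world", {"title": "Hello", "tags": ["php"]})
print(req.get_method(), req.full_url)
# To actually send it against a running server:
#   urllib.request.urlopen(req)
```

A library mostly saves you from spelling out these URLs by hand, which is exactly the “aware of the URI space” part.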

So, if you’re planning to get started with CouchDB and PHP, you’d probably like to see an overview of CouchDB first. You could follow the “from beginner to CouchDB expert in 2 hours” videos, watch Will Leinweber’s Relaxing with CouchDB, or just go through David Coallier’s introduction to CouchDB slides.

Next, you’ll have to install CouchDB on your OS and read one of the PHP getting started posts listed below:

  • Gonzalo Ayuso’s PHP and CouchDB ☞ part 1 and ☞ part 2 will walk you through basic CRUD operations with CouchDB and PHP.
  • Matt Apperson’s ☞ post will help you get started with CouchDB and PHPillow, a simple PHP helper class.
  • Thomas Myer’s article will introduce you to CouchDB and PHP-on-Couch.

For those familiar with the Zend framework there’s also a Zend and CouchDB integration proposal that you might want to try. And there are probably more CouchDB quick guides around that you might find useful, in which case you’d probably also like to share with others. Meanwhile, have fun with PHP and CouchDB.

Getting Up to Speed with CouchDB and Java

Nothing fancy, but if you haven’t done it already, this article will probably get you started in no time.

This article provides a step-by-step guide for using Apache CouchDB using Java. Here we will use Java code to:

  • Create a database in CouchDB,
  • Store employee data (Employee number, Name, Designation, etc).
  • Perform some CRUD operations on the data that is stored.
  • Show the use of “Views” and how it can retrieve data based on field values.

The software we use includes:

  • Apache CouchDB 0.9.0 installation
  • Couchdb4j-0.1.2 jar file
  • Json-lib-2.2.3-jdk15 jar file
  • ezmorph-1.0 jar file
  • Httpclient-4.0-beta2 jar file
  • Httpcore-4.0.1 jar file

Please note the current CouchDB version is 0.10.1, so you might want to try with this version instead of the one mentioned in the article.
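The “Views” step from the list above is probably the least familiar one, so here is a rough sketch of the idea. It is written in Python rather than Java to keep it short, and it is not taken from the article: the design document, the view name, and the employee data are all invented. A CouchDB view is a JavaScript map function stored in a design document; the Python function below only mimics what that map function would produce.

```python
import json

# A design document as it would be stored in CouchDB. The map function is
# JavaScript source kept in a string (names are hypothetical).
design_doc = {
    "_id": "_design/employees",
    "views": {
        "by_designation": {
            "map": "function(doc) {"
                   " if (doc.designation) emit(doc.designation, doc.name);"
                   " }"
        }
    },
}

employees = [
    {"_id": "1001", "name": "Asha", "designation": "Manager"},
    {"_id": "1002", "name": "Ben", "designation": "Engineer"},
    {"_id": "1003", "name": "Chen", "designation": "Engineer"},
]

def by_designation(docs):
    """Python stand-in for the JavaScript map function above."""
    rows = [(d["designation"], d["name"]) for d in docs if d.get("designation")]
    return sorted(rows)  # view rows come back ordered by key

def query(rows, key):
    """Retrieve the values emitted under a given key."""
    return [value for k, value in rows if k == key]

print(json.dumps(design_doc, indent=2))
print(query(by_designation(employees), "Engineer"))  # ['Ben', 'Chen']
```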


MongoDB Poster

One of those examples of “an image is worth a thousand words”


Installing CouchDB on Your Favorite Linux?

I am seeing lots of tutorials on how to install CouchDB on your favorite flavor of Linux, so I was wondering if this is a complex thing to get going? Or is it more about trying out the latest versions?

As far as I know getting CouchDB on Mac OS is just a matter of downloading it from ☞ CouchDBX. And there’s also the ☞ Homebrew way. Pretty similar experience for ☞ Windows.

So I thought I should ask whether you’d find it useful if I put together a list of tutorials on how to install up-to-date versions of CouchDB on each of these OSes. If the answer is yes, then please submit your preferred tutorial through a comment. Thanks!

MongoDB Tutorial: MapReduce

I don’t consider myself the right person to write detailed tutorials as I usually tend to omit a lot of details. But I’d like to try out a different approach: I’ll share with you the best materials I have found and used myself to learn about a specific feature. Please do let me know if you find this approach useful.

Today we will take a look at MongoDB MapReduce. As is normal (at least for making sure that we are getting rid of all future RTFM advice), we will start with the ☞ official documents. In the case of MongoDB MapReduce, the official documentation will provide us with details about:

  • the complete command syntax
  • specs for map and reduce functions
  • as a bonus a couple of basic examples

There are also a couple of important aspects that you’ll have to keep in mind while implementing your own MongoDB MapReduce functions:

  1. The MapReduce engine may invoke reduce functions iteratively; thus, these functions must be idempotent. That is, the following must hold for your reduce function:

    for all k,vals : reduce( k, [reduce(k,vals)] ) == reduce(k,vals)

  2. Currently, the return value from a reduce function cannot be an array (it’s typically an object or a number).
  3. If you need to perform an operation only once, use a finalize function.
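The first rule is the one that bites most often, so here is a quick way to check it. The sketch below is plain Python standing in for the JavaScript functions you would pass to MongoDB, and all function names are mine. It tests the property from rule 1 and shows a reduce that breaks it, plus the usual way out for averages: reduce to an object and do the division once, in finalize.

```python
def holds_rereduce(reduce_fn, key, vals):
    """Check reduce(k, [reduce(k, vals)]) == reduce(k, vals)."""
    once = reduce_fn(key, vals)
    return reduce_fn(key, [once]) == once

def total(key, vals):
    """Fine: summing partial sums gives the same sum."""
    return sum(vals)

def count_naive(key, vals):
    """Broken: on a second pass it counts partial results, not documents."""
    return len(vals)

def sum_count(key, vals):
    """Fine: returns an object (not an array) shaped like the emitted values."""
    return {
        "sum": sum(v["sum"] for v in vals),
        "count": sum(v["count"] for v in vals),
    }

def finalize(key, value):
    """Runs once per key after the last reduce, so the division is safe here."""
    return value["sum"] / value["count"]

emitted = [{"sum": 4, "count": 1}, {"sum": 6, "count": 1}]
print(holds_rereduce(total, "k", [1, 2, 3]))        # True
print(holds_rereduce(count_naive, "k", [1, 1, 1]))  # False
print(holds_rereduce(sum_count, "k", emitted))      # True
print(finalize("k", sum_count("k", emitted)))       # 5.0
```

The fix for the broken count is the same as in the word count case: have map emit 1 per document and let reduce sum.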

Knowing the basics, what I’ve found to work well for me was to take a look at a simple but close to real life example. In this case I have chosen the ☞ following piece of code which implements a basic text search.

I have also found it very useful to take a look at how SQL translates to MapReduce in MongoDB.

Just to make sure that I got things straight by now, I used the 3rd part of Kyle Banker’s MongoDB aggregation tutorial: MapReduce basics.

The last step in learning about MapReduce in MongoDB was to take a look at some real use cases. Depending on your programming language preference, I’d recommend one of these two MongoDB MapReduce use cases:

  • Ruby: Visualizing log files with MongoDB, MapReduce, Ruby & Google Charts: ☞ part 1 and ☞ part 2
  • Perl: Using MongoDB and MapReduce on Apache Access Logs

Summarizing our short tutorial on MongoDB MapReduce:

In case you have other materials on MongoDB MapReduce that you consider essential please share them with us!

Tokyo Cabinet Tutorial: Database Types and Configuration Options

A great piece of documentation for the 3 different storage types supported by Tokyo Cabinet: hash, B+ tree and fixed-length. The article also features a long list of tuning parameters.

Here are a couple of things that I’ve learned myself:

  • Tokyo Cabinet supports multi-operation transactions
  • the extension of the file determines the type of storage:
    • tch: Tokyo Cabinet Hash database
    • tcb: Tokyo Cabinet B+Tree database
    • tcf: Tokyo Cabinet Fixed-Length database
  • while Tokyo Cabinet B+Tree storage might be a bit slower than Tokyo Cabinet Hash storage, it brings new features:
    • keys are ordered (default lexical, but can be configured by passing a comparison function)
    • as a consequence it supports key ranges
    • allows duplicate values to be stored under the same key
  • Tokyo Cabinet Fixed-Length has some restrictions:
    • all keys are positive integers
    • as with Tokyo Cabinet B+Tree, keys are ordered (here by their integer value), but the ordering is not configurable
    • all values stored have fixed-length
  • on the bright side, Tokyo Cabinet Fixed-Length supports some special keys: :min, :max, :prev and :next.
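To get a feel for what the B+Tree features in the list above buy you (ordered keys, key ranges, duplicates under one key), here is a toy in-memory stand-in written in Python. To be clear, this is not the Tokyo Cabinet API and not a B+ tree either: the class and method names are made up, and a sorted list with bisect is used only to mimic the behavior.

```python
import bisect

class OrderedStore:
    """Toy ordered key-value store that allows duplicate keys."""

    def __init__(self):
        self._keys = []    # kept sorted (default string ordering)
        self._values = []  # values, parallel to self._keys

    def put_dup(self, key, value):
        """Store value under key, after any values already stored there."""
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def get_all(self, key):
        """Return every value stored under key, in insertion order."""
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._values[lo:hi]

    def key_range(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))

store = OrderedStore()
for key, value in [("b", "2"), ("a", "1"), ("b", "3"), ("c", "4")]:
    store.put_dup(key, value)

print(store.get_all("b"))         # ['2', '3']
print(store.key_range("a", "c"))  # [('a', '1'), ('b', '2'), ('b', '3')]
```

A hash store can answer get_all-style lookups for a single key, but it has no cheap equivalent of key_range, which is the price you pay for its speed.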

I think this is a great contribution by James Edward Gray II to the Tokyo Cabinet community which is facing some problems including the lack of documentation.