tutorial: All content tagged as tutorial in NoSQL databases and polyglot persistence
Just a few days after posting about the “art” and need for data modeling in the NoSQL world, the Basho guys have started a series of articles on Riak schema design.
While the ☞ first post was a bit more philosophical (or call it high-level), the ☞ second one is more hands-on and presents various approaches to modeling relationships with a key-value store or document store. Personally, I’ve kind of liked the Riak links approach as soon as I ☞ read about it.
PS: Guys, I hope you’ve already prepared the beers ;-).
So what happens with a Cassandra read?
- a client makes a read request to a random node
- the node acts as a proxy determining the nodes having copies of data
- the node requests the corresponding data from each of those nodes
- the client can select the strength of the read consistency:
- single read => the request returns once it gets the first response, but data can be stale
- quorum read => the request returns only after the majority responded with the same value
Mark mentions a couple of corner cases in which this behavior is more complicated.
- the node also performs read repair of any inconsistent response
each node reading data uses either
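The two consistency levels listed above can be illustrated with a small Python sketch. It is a toy model, not Cassandra code: the function names and the idea of passing replica responses as a plain list in arrival order are assumptions made for illustration, and the corner cases mentioned above are ignored.

```python
from collections import Counter


def coordinator_read(responses, replica_count, consistency="single"):
    """Toy coordinator: `responses` holds replica values in arrival order."""
    if consistency == "single":
        # Return as soon as the first replica answers; the value may be stale.
        return responses[0]
    if consistency == "quorum":
        needed = replica_count // 2 + 1  # size of a majority
        seen = Counter()
        for value in responses:
            seen[value] += 1
            if seen[value] >= needed:
                # A majority of the replicas agree on this value.
                return value
        raise RuntimeError("no majority agreed on a value")
    raise ValueError(f"unknown consistency level: {consistency}")


def replicas_to_repair(replica_values, agreed_value):
    """Read repair, simplified: list replicas whose value differs."""
    return sorted(name for name, value in replica_values.items()
                  if value != agreed_value)


# Hypothetical example: replica A is stale and happens to answer first.
answers = ["old", "new", "new"]
single = coordinator_read(answers, 3, "single")
quorum = coordinator_read(answers, 3, "quorum")
stale = replicas_to_repair({"A": "old", "B": "new", "C": "new"}, quorum)
```

In this example the single read hands back the stale value, while the quorum read waits for two matching answers and flags replica A for repair.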
Mike and Jonathan provide a very detailed explanation of the read performance:
Mike: To scan the SSTable, Cassandra uses a row-level column index and bloom filter to find the necessary blocks on disk, deserializes them and determines the actual data to return. There’s a lot of disk IO here which ultimately makes the read latency higher than a similar DBMS.
Jonathan: The reason uncached reads are slower in Cassandra is not because the SSTable is inherently io-intensive (it’s actually better than b-tree based storage on a 1:1 basis) but because in the average case you’ll have to merge row fragments from 2-4 SSTables to complete the request, since SSTables are not update-in-place.
It is also important to note that Cassandra employs row caching to reduce read latency.
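Jonathan's point about merging row fragments can be pictured with a short Python sketch. It is a simplification under stated assumptions: each SSTable is modeled as a plain dict, every column carries a `(value, timestamp)` pair, and the newest timestamp wins. The names `read_row` and `sstables` are hypothetical, and the column index and bloom filter are left out.

```python
def read_row(row_key, sstables):
    """Merge the fragments of one row that are spread over several SSTables."""
    merged = {}
    for table in sstables:
        # A table may hold only some columns of the row, or none at all.
        fragment = table.get(row_key, {})
        for column, (value, timestamp) in fragment.items():
            # Keep the most recently written version of each column.
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return {column: value for column, (value, _) in merged.items()}


# Hypothetical data: the row was written once, then one column was updated.
older = {"user1": {"name": ("Ann", 1), "city": ("Oslo", 1)}}
newer = {"user1": {"city": ("Rome", 2)}}
row = read_row("user1", [older, newer])
```

Because an update lands in a new SSTable instead of rewriting the old one, a read has to visit every table that might contain part of the row, which is where the extra disk IO comes from.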
Not everyone feels comfortable with MapReduce even if at first glance it looks pretty simple. Considering how useful and used MapReduce is in the NoSQL world, I thought it would be useful to put together a quick guide to MapReduce.
In case you’ve already started asking yourself how you would translate some SQL functionality, you’ll probably find useful this ☞ PDF showing a SQL to (MongoDB) MapReduce translation:
While probably not as clear as the other examples above, there is also Pete Warden’s (@peterwarden) “MapReduce for Idiots”:
CouchDB seems to get closer and closer to the 0.11 release which will bring quite a few new interesting features. The Couch.io blog has started a series of posts covering what’s new in CouchDB 0.11:
- ☞ Part 1
- Nice URLs
- URL rewriting
- Virtual hosts
- ☞ Part 2
- CouchDB JOINs redux
- raw collation
- ☞ Part 3 (new)
- implicitly create target database when replicating
- replicate documents by id
- replication filters
This last post also contains a series of use cases for the new features:
- Replicating inboxes or “Idle users are free”
- Splitting cluster nodes
- Need-to-know-based Data sharing
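The three replication features from Part 3 map onto fields of the JSON body posted to CouchDB's `/_replicate` endpoint. The Python sketch below only builds that body and sends nothing. The field names `create_target`, `doc_ids` and `filter` are the ones I know from the CouchDB replication API, so check them against the documentation for your version; the database names, document ids and the helper `replication_request` are made up for illustration.

```python
import json


def replication_request(source, target, create_target=False,
                        doc_ids=None, filter_name=None):
    """Build the JSON body for a POST to /_replicate (nothing is sent)."""
    body = {"source": source, "target": target}
    if create_target:
        # Ask CouchDB to create the target database if it does not exist.
        body["create_target"] = True
    if doc_ids:
        # Replicate only the listed documents.
        body["doc_ids"] = list(doc_ids)
    if filter_name:
        # "designdoc/filtername": a filter function decides per document.
        body["filter"] = filter_name
    return json.dumps(body, sort_keys=True)


# Hypothetical use: copy two documents into a database created on the fly.
payload = replication_request("inbox", "inbox-backup",
                              create_target=True,
                              doc_ids=["msg-1", "msg-2"])
```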
A new feature that is not mentioned in these posts but may be quite interesting is support for CommonJS 1.0 modules in validation functions (but not reduce). You can read more about it ☞ here.
We will continue updating this post with the upcoming features, so once CouchDB 0.11 is available you’ll be ready to start using it.
Update: CouchDB 0.11 was released!
CouchDB is one of the most friendly NoSQL systems in terms of protocols: JSON over HTTP. But that doesn’t mean that small libraries aware of the URI space and other aspects of CouchDB are not useful. (nb: the only problem would be if everyone started creating their own. Anyway, discussing CouchDB libs is not the main intent of this post, but rather a personal note I’ve made while going through a couple of PHP guides to CouchDB). In addition to its ease of use, CouchDB can completely change the architecture of your next web application.
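To show what “JSON over HTTP” means in practice, here is a minimal Python sketch that prepares the request for storing a document, using only the standard library. The guides below use PHP; Python is used here purely for illustration. The helper `put_document`, the database name and the document are hypothetical, and the sketch assumes a simple document id that needs no URL escaping. Port 5984 is CouchDB's default.

```python
import json
from urllib.request import Request


def put_document(base_url, db, doc_id, doc):
    """Prepare (but do not send) the HTTP request that stores a document."""
    url = f"{base_url}/{db}/{doc_id}"
    data = json.dumps(doc).encode("utf-8")
    return Request(url, data=data, method="PUT",
                   headers={"Content-Type": "application/json"})


# Hypothetical database and document.
request = put_document("http://localhost:5984", "blog", "first-post",
                       {"title": "Hello CouchDB"})
```

Passing `request` to `urllib.request.urlopen` would send it to a running CouchDB. A small library mostly wraps this kind of URL and JSON handling, which is why such helpers stay useful even with a protocol this simple.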
So, if you’re planning to get started with CouchDB and PHP, you’d probably like to see an overview of CouchDB. You could either follow the from beginner to CouchDB expert in 2 hours videos or watch Will Leinweber’s Relaxing with CouchDB or just go through David Coallier’s introduction to CouchDB slides embedded below:
Next, you’ll have to install CouchDB on your OS and read one of the PHP getting started posts listed below:
- Gonzalo Ayuso’s PHP and CouchDB ☞ part 1 and ☞ part 2 will walk you through basic CRUD operations with CouchDB and PHP
- Matt Apperson’s ☞ post will help you get started with CouchDB and PHPillow, a simple PHP helper class.
- Thomas Myer’s article will introduce you to CouchDB and PHP-on-Couch.
For those familiar with the Zend framework there’s also a Zend and CouchDB integration proposal that you might want to try. And there are probably more CouchDB quick guides around that you might find useful, in which case you’d probably also like to share with others. Meanwhile, have fun with PHP and CouchDB.
I am seeing lots of tutorials on how to install CouchDB on your favorite flavor of Linux, so I was wondering if this is a complex thing to get going? Or is it more about trying out the latest versions?
So I thought I should ask if you’d find it useful to put together a list of tutorials on how to install up-to-date versions of CouchDB on each of these OSes? In case the answer is yes, then please submit your preferred tutorial through a comment. Thanks!
I don’t consider myself the right person to write detailed tutorials as I usually tend to omit a lot of details. But I’d like to try out a different approach: I’ll share with you the best materials I have found and used myself to learn about a specific feature. Please do let me know if you find this approach useful.
Today we will take a look at MongoDB MapReduce. As is normal (at least for making sure that we are getting rid of all future RTFM advice) we will start with the ☞ official documents. In the case of MongoDB MapReduce, the official documentation will provide us with details about:
- the complete command syntax
- specs for
- as a bonus a couple of basic examples
There are also a couple of important aspects that you’ll have to keep in mind while implementing your own MongoDB MapReduce functions:
- The MapReduce engine may invoke reduce functions iteratively; thus, these functions must be idempotent. That is, the following must hold for your reduce function:
for all k,vals : reduce( k, [reduce(k,vals)] ) == reduce(k,vals)
- Currently, the return value from a reduce function cannot be an array (it’s typically an object or a number).
- If you need to perform an operation only once, use a finalize function.
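The idempotence requirement above can be checked with a small sketch. MongoDB's map and reduce functions are written in JavaScript; the Python analogue below only illustrates the property, and the names `reduce_counts` and `bad_reduce` as well as the `{"count": n}` value shape are made up for the example.

```python
def reduce_counts(key, values):
    """Sum partial counts; the output has the same shape as each input."""
    return {"count": sum(v["count"] for v in values)}


def bad_reduce(key, values):
    """Counts the inputs instead of summing them: breaks on re-reduce."""
    return {"count": len(values)}


def holds_for(reduce_fn, key, values):
    """Check reduce(k, [reduce(k, vals)]) == reduce(k, vals)."""
    once = reduce_fn(key, values)
    return reduce_fn(key, [once]) == once


partials = [{"count": 1}, {"count": 2}, {"count": 4}]
good = holds_for(reduce_counts, "mongodb", partials)  # True
bad = holds_for(bad_reduce, "mongodb", partials)      # False
```

`bad_reduce` looks fine on a first pass, but once its own output is passed back in it reports a count of 1, which is exactly the failure the rule is meant to prevent.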
Knowing the basics, what I’ve found to work well for me was to take a look at a simple but close to real life example. In this case I have chosen the ☞ following piece of code which implements a basic text search.
I have also found it very useful to take a look at how SQL translates to MapReduce in MongoDB.
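As a rough picture of such a translation, take the hypothetical query `SELECT author, SUM(votes) FROM posts GROUP BY author`. The Python sketch below simulates it in memory: the GROUP BY column becomes the emitted key, the aggregated column becomes the emitted value, and the aggregate function becomes the reduce step. The tiny `map_reduce` driver stands in for the database engine, and all names and fields are invented for illustration.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Minimal in-memory driver: group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_votes(doc):
    # GROUP BY author -> emit the author as key; SUM(votes) -> votes as value.
    yield doc["author"], doc["votes"]


def reduce_sum(key, values):
    # SUM(...) -> add up the values emitted for one key.
    return sum(values)


posts = [
    {"author": "ann", "votes": 3},
    {"author": "bob", "votes": 2},
    {"author": "ann", "votes": 2},
]
totals = map_reduce(posts, map_votes, reduce_sum)
```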
Just to make sure that I got things straight by now, I used the 3rd part of Kyle Banker’s MongoDB aggregation tutorial: MapReduce basics.
The last step in learning about MapReduce in MongoDB was to take a look at some real use cases. Depending on your programming language preference, I’d recommend one of these two MongoDB MapReduce use cases:
- Ruby: Visualizing log files with MongoDB, MapReduce, Ruby & Google Charts: ☞ part 1 and ☞ part 2
- Perl: Using MongoDB and MapReduce on Apache Access Logs ☞
Summarizing our short tutorial on MongoDB MapReduce:
- ☞ official documents
- ☞ Basic text search example
- Translate SQL to MongoDB MapReduce
- MongoDB aggregation tutorial: MapReduce basics
- MongoDB MapReduce Usecases:
In case you have other materials on MongoDB MapReduce that you consider essential please share them with us!