tutorial: All content tagged as tutorial in NoSQL databases and polyglot persistence
Friday, 2 April 2010
Redis and PHP
If we already had Redis and Python and Redis and Ruby, the scene would not be complete without Redis and PHP.
Kevin van Zonneveld’s (@kvz) article closely follows the pattern of the other two great Redis tutorials and walks the reader through all the steps involved in getting started with Redis and the PHP Rediska library.
I guess the one we are missing right now would be Redis and Perl. Anyone?
via: http://kevin.vanzonneveld.net/techblog/article/redis_in_php/
Wednesday, 31 March 2010
Tutorial: Riak Schema Design
Just a few days after I posted about the “art” of, and need for, data modeling in the NoSQL world, the Basho guys started a series of articles on Riak schema design.
While the ☞ first post was a bit more philosophical (or call it high-level), the ☞ second one is more hands-on and presents various approaches of modeling relationships with a key-value store or document store. Personally, I’ve kind of liked the Riak links[1] approach as soon as I ☞read about it.
PS: Guys, I hope you’ve already prepared the beers ;-).
References
- [1] There is also a ☞ Web Linking RFC draft proposed by Mark Nottingham. (↩)
Friday, 26 March 2010
Cassandra Reads Performance Explained
After explaining Cassandra writes performance, Mike Perham ☞ continues his series now explaining: “reads and […] why they are slow”.
So what happens with a Cassandra read?
- a client makes a read request to a random node
- the node acts as a proxy determining the nodes having copies of data
- the node requests the corresponding data from each node
- the client can select the strength of the read consistency:
  - single read => the request returns once it gets the first response, but data can be stale
  - quorum read => the request returns only after the majority responded with the same value
  Mike mentions a couple of corner cases related to this behavior that make it more complicated.
- the node also performs read repair of any inconsistent response
- each node reading data uses either Memtable (in-memory) or SSTables (disk)
Mike and Jonathan provide a very detailed explanation of the read performance:
Mike: To scan the SSTable, Cassandra uses a row-level column index and bloom filter to find the necessary blocks on disk, deserializes them and determines the actual data to return. There’s a lot of disk IO here which ultimately makes the read latency higher than a similar DBMS.
Jonathan: The reason uncached reads are slower in Cassandra is not because the SSTable is inherently io-intensive (it’s actually better than b-tree based storage on a 1:1 basis) but because in the average case you’ll have to merge row fragments from 2-4 SSTables to complete the request, since SSTables are not update-in-place.
It is also important to note that Cassandra employs row caching that addresses reads latency.
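The read path described above can be illustrated with a toy model. This is only a minimal sketch and not Cassandra code: the Replica class, the quorum_read and merge_fragments functions and the (timestamp, value) tuples are all hypothetical names. It also assumes that conflicts are resolved by picking the newest timestamp, which simplifies the quorum behavior described above.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica mapping key -> (timestamp, value). Not a Cassandra API."""
    data: dict = field(default_factory=dict)

    def read(self, key):
        return self.data.get(key)

    def write(self, key, versioned):
        self.data[key] = versioned


def quorum_read(replicas, key):
    """Read from a majority, return the newest value, repair stale replicas."""
    quorum = len(replicas) // 2 + 1
    # Simplification: contact the first `quorum` replicas; a real coordinator
    # would use whichever replicas respond first.
    responses = [(replica, replica.read(key)) for replica in replicas[:quorum]]
    answered = [seen for _, seen in responses if seen is not None]
    if not answered:
        return None
    newest = max(answered)  # tuples compare by timestamp first
    for replica, seen in responses:
        if seen != newest:
            replica.write(key, newest)  # read repair of an inconsistent response
    return newest[1]


def merge_fragments(fragments):
    """Merge row fragments (column -> (timestamp, value)); newest column wins."""
    row = {}
    for fragment in fragments:
        for column, versioned in fragment.items():
            if column not in row or versioned[0] > row[column][0]:
                row[column] = versioned
    return {column: value for column, (_, value) in row.items()}


replicas = [Replica({"user:1": (1, "old")}), Replica({"user:1": (2, "new")}), Replica()]
value = quorum_read(replicas, "user:1")
row = merge_fragments([{"name": (1, "Ann")}, {"name": (3, "Anna"), "city": (2, "Oslo")}])
```

The merge_fragments function hints at why uncached reads cost more: every fragment of the row has to be visited before the final row is known, because nothing is updated in place.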
Mike’s post also covers Cassandra range scans and explains the role of Cassandra partitioning strategies. ☞ Great read!
Tuesday, 23 March 2010
Quick Guide to MapReduce
Not everyone feels comfortable with MapReduce even if at first glance it looks pretty simple. Considering how useful and used MapReduce is in the NoSQL world, I thought it would be useful to put together a quick guide to MapReduce.
If the whole notion is completely unfamiliar to you, Kristina Chodorow’s (@kchodorow) ☞ post should get you started using a Star Trek-based story.
The next very detailed and visual explanation of MapReduce is Oren Eini’s (@ayende) ☞ post. While the post relies on Linq to explain MapReduce, all steps are accompanied by nice images.
There is also this ☞ article by Joel Spolsky and the ☞ Stack Overflow answers.
In case by now you’ve already started to ask yourself how you would translate some SQL functionality, you’ll probably find this ☞ PDF useful: it shows a SQL to (MongoDB) MapReduce translation.
While probably not as clear as the other examples above, there is also Pete Warden’s (@peterwarden) “MapReduce for Idiots”.
Once you are done with this, I think you have enough details to read the ☞ original paper from Google and, why not, even continue with some MapReduce and Hadoop academic papers.
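For readers who prefer code to prose, here is a minimal single-process sketch of the three MapReduce steps (map, group by key, reduce) applied to word counting. The function names are my own and everything runs in memory. A real framework distributes these phases across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values emitted for one key into a single result."""
    return sum(values)


def map_reduce(documents):
    """Run map over every document, group the pairs, then reduce each group."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return {key: reduce_phase(key, values) for key, values in shuffle(pairs).items()}


counts = map_reduce(["to be or not to be", "to do"])
```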
Tuesday, 16 March 2010
Getting Ready for CouchDB 0.11
CouchDB seems to get closer and closer to the 0.11 release which will bring quite a few new interesting features. The Couch.io blog has started a series of posts covering what’s new in CouchDB 0.11:
- ☞ Part 1
  - Nice URLs
  - URL rewriting
  - Virtual hosts
- ☞ Part 2
  - CouchDB JOINs redux
  - raw collation
- ☞ Part 3 (new)
  - implicitly create target database when replicating
  - replicate documents by id
  - replication filters
This last post also contains a series of use cases for the new features:
- Replicating inboxes or “Idle users are free”
- Splitting cluster nodes
- DesktopCouch
- Need-to-know-based Data sharing
A new feature that is not mentioned in these posts but may be quite interesting is support for CommonJS 1.0 modules in show, list, update, and validation functions (but not map and reduce). You can read more about it ☞ here.
As we’ve mentioned the new replication features in CouchDB 0.11, I should also point you to two posts from Chris Strom on CouchDB replication: ☞ part 1 and ☞ part 2.
We will continue updating this post with the upcoming features, so once CouchDB 0.11 is available you’ll be ready to start using it.
Update: CouchDB 0.11 was released!
Quick Guide to CouchDB and PHP
CouchDB is one of the friendliest NoSQL systems in terms of protocols: JSON over HTTP. But that doesn’t mean that small libraries aware of the URI space and other aspects of CouchDB are not useful. (nb: the only problem would be if everyone started creating their own. Anyway, discussing CouchDB libs is not the main intent of this post, but rather a personal note I’ve made while going through a couple of PHP guides to CouchDB). As a plus to its ease of use, CouchDB can completely change the architecture of your next web application.
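To make the “JSON over HTTP” point concrete, here is a minimal sketch that only builds the HTTP requests for a few basic operations, without sending them. It is written in Python rather than PHP, the helper names are my own, and BASE_URL assumes a local CouchDB on its default port 5984.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB, default port


def create_database(name):
    """PUT /{db} creates a database."""
    return Request(f"{BASE_URL}/{name}", method="PUT")


def save_document(db, doc_id, doc):
    """PUT /{db}/{id} stores a JSON document under the given id."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def fetch_document(db, doc_id):
    """GET /{db}/{id} returns the stored JSON document."""
    return Request(f"{BASE_URL}/{db}/{doc_id}", method="GET")


request = save_document("blog", "post-1", {"title": "Hello"})
```

Passing one of these objects to urllib.request.urlopen would send it to a running server. A helper library mostly wraps this kind of URL and JSON handling.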
So, if you’re planning to get started with CouchDB and PHP, you’d probably like to see an overview of CouchDB. You could follow the from beginner to CouchDB expert in 2 hours videos, watch Will Leinweber’s Relaxing with CouchDB, or just go through David Coallier’s introduction to CouchDB slides.
Next, you’ll have to install CouchDB on your OS and read one of the PHP getting started posts listed below:
- Gonzalo Ayuso’s PHP and CouchDB ☞ part 1 and ☞ part 2 will walk you through basic CRUD operations with CouchDB and PHP.
- Matt Apperson’s ☞ post will help you get started with CouchDB and PHPillow, a simple PHP helper class.
- Thomas Myer’s article will introduce you to CouchDB and PHP-on-Couch.
For those familiar with the Zend framework there’s also a Zend and CouchDB integration proposal that you might want to try. And there are probably more CouchDB quick guides around that you might find useful, in which case you’d probably also like to share with others. Meanwhile, have fun with PHP and CouchDB.
Monday, 1 March 2010
Getting Up to Speed with CouchDB and Java
Nothing fancy, but if you haven’t done it already, this article will probably get you started in no time.
This article provides a step-by-step guide for using Apache CouchDB from Java. Here we will use Java code to:
- Create a database in CouchDB.
- Store employee data (Employee number, Name, Designation, etc).
- Perform some CRUD operations on the data that is stored.
- Show the use of “Views” and how they can retrieve data based on field values.
The software we use includes:
- Apache CouchDB 0.9.0 installation
- Couchdb4j-0.1.2 jar file
- Json-lib-2.2.3-jdk15 jar file
- ezmorph-1.0 jar file
- Httpclient-4.0-beta2 jar file
- Httpcore-4.0.1 jar file
Please note the current CouchDB version is 0.10.1, so you might want to try with this version instead of the one mentioned in the article.
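The “Views” step from the list above can be sketched without any client library. This is not taken from the article: the design document name, the view name and the designation and name fields are assumptions made for illustration, and the URL layout is the one used by CouchDB 0.9 and later.

```python
import json
from urllib.parse import urlencode

# Hypothetical design document: "employees", "by_designation" and the
# "designation"/"name" fields are assumptions made for this sketch.
design_doc = {
    "_id": "_design/employees",
    "views": {
        "by_designation": {
            # CouchDB map functions are JavaScript, stored as strings.
            "map": "function(doc) { if (doc.designation) { emit(doc.designation, doc.name); } }"
        }
    },
}


def view_url(base_url, db, design, view, key):
    """Build the URL that queries a view for a single JSON-encoded key."""
    query = urlencode({"key": json.dumps(key)})
    return f"{base_url}/{db}/_design/{design}/_view/{view}?{query}"


url = view_url("http://localhost:5984", "employees", "employees", "by_designation", "Manager")
```

Storing design_doc in the database and issuing a GET on the resulting URL would return the rows whose designation field equals the given key.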
Monday, 22 February 2010
Installing CouchDB on Your Favorite Linux?
I am seeing lots of tutorials on how to install CouchDB on your favorite flavor of Linux, so I was wondering if this is a complex thing to get going? Or is it more about trying out the latest versions?
As far as I know getting CouchDB on Mac OS is just a matter of downloading it from ☞ CouchDBX. And there’s also the ☞ Homebrew way. Pretty similar experience for ☞ Windows.
So I thought I should ask if you’d find it useful to put together a list of tutorials on how to install up-to-date versions of CouchDB on each of these OSes. If the answer is yes, then please submit your preferred tutorial through a comment. Thanks!
Wednesday, 17 February 2010
MongoDB Tutorial: MapReduce
I don’t consider myself the right person to write detailed tutorials as I usually tend to omit a lot of details. But I’d like to try out a different approach: I’ll share with you the best materials I have found and used myself to learn about a specific feature. Please do let me know if you find this approach useful.
Today we will take a look at MongoDB MapReduce. As is normal (at least for making sure that we are getting rid of all future RTFM advice) we will start with the ☞ official documents. In the case of MongoDB MapReduce, the official documentation will provide us with details about:
- the complete command syntax
- specs for map and reduce functions
- as a bonus, a couple of basic examples
There are also a couple of important aspects that you’ll have to keep in mind while implementing your own MongoDB MapReduce functions:
- The MapReduce engine may invoke reduce functions iteratively; thus, these functions must be idempotent. That is, the following must hold for your reduce function:
  for all k,vals : reduce( k, [reduce(k,vals)] ) == reduce(k,vals)
- Currently, the return value from a reduce function cannot be an array (it’s typically an object or a number).
- If you need to perform an operation only once, use a finalize function.
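The idempotency rule above is easier to see with a concrete pair of functions. MongoDB reduce functions are written in JavaScript, so the Python below is only a sketch of the property itself, and all function names are my own. Summing partial results survives being re-applied to its own output, while counting the number of values does not.

```python
def reduce_sum(key, values):
    """Safe: the sum of partial sums is still the total."""
    return sum(values)


def reduce_len(key, values):
    """Unsafe: re-reducing a single partial result yields 1, not the count."""
    return len(values)


def holds_for(reduce, key, values):
    """Check reduce(k, [reduce(k, vals)]) == reduce(k, vals) for one sample."""
    once = reduce(key, values)
    return reduce(key, [once]) == once


sum_ok = holds_for(reduce_sum, "word", [1, 1, 1])
len_ok = holds_for(reduce_len, "word", [1, 1, 1])
```

Here sum_ok is True and len_ok is False. Passing a check on one sample does not prove the property in general, but a failing check like the second one is enough to show that a reduce function is broken.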
Knowing the basics, what I’ve found to work well for me was to take a look at a simple but close to real life example. In this case I have chosen the ☞ following piece of code which implements a basic text search.
I have also found it very useful to take a look at how SQL translates to MapReduce in MongoDB.
Just to make sure that I got things straight by now, I used the 3rd part of Kyle Banker’s MongoDB aggregation tutorial: MapReduce basics.
The last step in learning about MapReduce in MongoDB was to take a look at some real use cases. Depending on your programming language preference, I’d recommend one of these two MongoDB MapReduce use cases:
- Ruby: Visualizing log files with MongoDB, MapReduce, Ruby & Google Charts: ☞ part 1 and ☞ part 2
- Perl: Using MongoDB and MapReduce on Apache Access Logs ☞
Summarizing our short tutorial on MongoDB MapReduce:
- ☞ official documents
- ☞ Basic text search example
- Translate SQL to MongoDB MapReduce
- MongoDB aggregation tutorial: MapReduce basics
- MongoDB MapReduce use cases
In case you have other materials on MongoDB MapReduce that you consider essential please share them with us!
Tuesday, 16 February 2010
Tokyo Cabinet Tutorial: Database Types and Configuration Options
A great piece of documentation for the 3 different storage types supported by Tokyo Cabinet: hash, B+ tree and fixed-length. The article also features a long list of tuning parameters.
Here are a couple of things that I’ve learned myself:
- Tokyo Cabinet supports multi-operation transactions
- the extension of the file determines the type of storage:
  - tch: Tokyo Cabinet Hash database
  - tcb: Tokyo Cabinet B+Tree database
  - tcf: Tokyo Cabinet Fixed-Length database
- while Tokyo Cabinet B+Tree storage might be a bit slower than Tokyo Cabinet Hash storage, it brings new features:
  - keys are ordered (default lexical, but can be configured by passing a comparison function)
  - as a consequence it supports key ranges
  - allows duplicate values to be stored under the same key
- Tokyo Cabinet Fixed-Length has some restrictions:
  - all keys are positive integers
  - as with Tokyo Cabinet B+Tree, keys are ordered (by their integer value), but the ordering is not configurable
  - all values stored have fixed-length
  - on the bright side, Tokyo Cabinet Fixed-Length supports some special keys: :min, :max, :prev and :next.
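Two of the points above can be sketched in a few lines. This is not Tokyo Cabinet’s API: storage_type and key_range are hypothetical helpers. They only illustrate how a file extension maps to a storage type, and why ordered keys make range lookups cheap.

```python
import bisect
import os

# Extension-to-storage mapping as listed above.
STORAGE_TYPES = {".tch": "hash", ".tcb": "B+ tree", ".tcf": "fixed-length"}


def storage_type(path):
    """Return the storage type implied by the database file's extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in STORAGE_TYPES:
        raise ValueError(f"unknown Tokyo Cabinet extension: {extension!r}")
    return STORAGE_TYPES[extension]


def key_range(sorted_keys, low, high):
    """Return the keys in [low, high]; relies on the keys being ordered."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


kind = storage_type("users.tcb")
keys = key_range(["apple", "banana", "cherry", "date"], "b", "d")
```

With a hash database there is no key order to exploit, so a range query like key_range would have to look at every key.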
I think this is a great contribution by James Edward Gray II to the Tokyo Cabinet community, which is facing some problems, including the lack of documentation.
via: http://blog.grayproductions.net/articles/tokyo_cabinets_keyvalue_database_types