tutorial: All content tagged as tutorial in NoSQL databases and polyglot persistence
Friday, 23 July 2010
Tutorial: MapReduce with Riak
While we’ve talked in the past about Riak and MapReduce support, and Sean Cribbs’s Riak tutorial covers it too, the following video covers MapReduce with Riak exclusively.
Enjoy the video and slides:
Thursday, 22 July 2010
Presentation: Introduction to Cassandra
Nice addition to the getting started with Cassandra tutorial:
Friday, 18 June 2010
A Cassandra Glossary
If you are just starting to look into Cassandra, or want to explain some Cassandra (and more general) terms to your friends or colleagues, this glossary defining over 50 terms might be a good resource.
- Snitch
- A snitch is Cassandra’s way of mapping a node to a physical location in the network. It helps determine the location of a node relative to another node in order to assist with discovery and ensure efficient request routing. There are different kinds of snitches.
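As a rough illustration of the idea, here is a minimal Python sketch of a snitch that infers a node's location from its IP address, in the spirit of Cassandra's RackInferringSnitch (second octet as data center, third octet as rack). The function names and the 0/1/2 proximity score are assumptions made for this example, not Cassandra's API.

```python
from typing import List, NamedTuple


class Location(NamedTuple):
    datacenter: str
    rack: str


def infer_location(ip: str) -> Location:
    """Guess a node's location from its IPv4 address.

    Assumption: the 2nd octet identifies the data center and the
    3rd octet identifies the rack.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return Location(datacenter=octets[1], rack=octets[2])


def proximity(a: str, b: str) -> int:
    """Hypothetical distance score: 0 = same rack, 1 = same data center, 2 = remote."""
    loc_a, loc_b = infer_location(a), infer_location(b)
    if loc_a.datacenter != loc_b.datacenter:
        return 2
    return 0 if loc_a.rack == loc_b.rack else 1


def sort_by_proximity(local: str, replicas: List[str]) -> List[str]:
    """Order replicas so that the closest ones are tried first."""
    return sorted(replicas, key=lambda ip: proximity(local, ip))
```

Calling `sort_by_proximity("10.1.1.5", [...])` would put replicas on the same rack ahead of those in the same data center, and those ahead of remote ones, which is the kind of routing hint a snitch provides.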
Also, you can use our guide to getting started with Cassandra.
Thursday, 17 June 2010
Video: 2 Hours Riak Tutorial
A must see tutorial on Riak by Sean Cribbs.
Compare that with Riak in 10 minutes:
Update: Unfortunately it looks like the original video was taken down by the Red Dirt RubyConf people. The only thing I could find is the slide deck:
Thursday, 3 June 2010
Presentation: Cassandra Basics - Indexing
A very informative presentation by Benjamin Black on Cassandra indexing:
There are so many interesting things to learn from these slides. Benjamin briefly introduces the main Cassandra terms (if you are not familiar with them you can read more in this Cassandra tutorial) and then explains how column sorting and partitioning strategies should be used. The deck also has some really quotable fragments:
Relational stores are schema oriented. Start from your schema & work forwards
Column stores are query oriented. Start from your queries & work backwards
Cassandra is an index construction kit
Thursday, 27 May 2010
Cassandra Installation Guide for Ubuntu and Debian
As a guy who spent years on the Java platform, I usually don’t pay much attention to installation guides for solutions running on the Java platform (configuration guides are a completely different matter). But sometimes I realize that not everyone likes to deal with the classpath hell[1]:
To install Cassandra on Debian or other Debian derivatives like Ubuntu, LinuxMint…, use the following…
Doesn’t sound complex at all!
References
- [1] ☞ Classpath hell is the Java counterpart of ☞ DLL hell. (↩)
Tuesday, 18 May 2010
HBase/Hadoop Mac OS Installation Guide
Now we have a very detailed HBase/Hadoop installation guide for Mac OS, thanks to Robert J. Berger:
You should now have a fully working Pseudo-Distributed Hadoop / HBase setup on your Mac. This is not suitable for any kind of large data or production project. In fact it will probably fail if you try to do anything with lots of data or high volumes of I/O. HBase seems to not like to work well until you get 4 – 5 regionservers.
But this Pseudo-Distributed version should be fine for doing experiments with tools and small data sets.
via: http://blog.ibd.com/scalable-deployment/hbase-hadoop-on-mac-ox-x/
Wednesday, 5 May 2010
Tutorial: Getting Started With Cassandra
Based on Ronald Mathies’ intro articles to Cassandra and a few other resources I’ve been gathering, I thought I should put together a detailed guide to getting started with Cassandra. As one would expect, the ☞ first post briefly introduces Cassandra and covers the distribution details and installation steps. It should be noted that Windows may not be the best environment to install Cassandra. Also, if after the brief intro you’d like to see more details about it, you should check Gary Dusbabek’s presentation on Cassandra or watch Eric Evans’s Cassandra presentation at FOSDEM.
The ☞ second article focuses on the Cassandra data model. If you are not familiar with it, this is the part you’ll want to focus on.
- Column:
A column is also referred to as a tuple (triplet) that contains a name, value and a timestamp. This is the smallest data container there is.
- SuperColumn:
A SuperColumn is a tuple with a name and a value; it doesn’t have a timestamp like the Column tuple. Notice that the value is in this case not a binary value but more of a Map style container. The map contains key / column combinations. What is important here is that the key has the same value as the name of the Column it refers to. So to put it simply, a SuperColumn is a container for one or more Columns. You will see that it will also make a big difference later on when we discuss the ColumnFamily and SuperColumnFamily.
- ColumnFamily:
ColumnFamily is a structure that can keep an infinite number of rows; for most people with an RDBMS background, this is the structure that resembles a Table the most. When you look at the diagram you can see that a ColumnFamily has a name (comparable to the name of a Table) and a map with a key (comparable to a row identifier) and a value (which is a Map containing Columns). The map with the columns has the same rules as the SuperColumn: the key has the same value as the name of the Column it refers to.
- SuperColumnFamily:
Finally we have the largest container, the SuperColumnFamily. If you understand the ColumnFamily then this construction isn’t much harder: instead of having Columns in the innermost Map we have SuperColumns. So it just adds an extra dimension. As displayed in the image, the Key of the Map which contains the SuperColumns must be the same as the name of the SuperColumn (just like with the ColumnFamily).
- Keyspace:
Keyspaces are quite simple again. From an RDBMS point of view you can compare this to your schema; normally you have one per application. A keyspace contains the ColumnFamilies. Note however there is no relationship between the ColumnFamilies, they are just separate containers.
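The containers described above can be pictured as nested maps. The following Python sketch only illustrates that nesting, using made-up names (Users, UserTimeline, jsmith); it is not how Cassandra stores data internally, nor a client API.

```python
import time
from typing import NamedTuple


class Column(NamedTuple):
    """Smallest container: a name, a value and a timestamp."""
    name: str
    value: bytes
    timestamp: int


def columns(**values: str) -> dict:
    """Build a {column name: Column} map; each key equals its Column's name."""
    now = int(time.time() * 1_000_000)
    return {name: Column(name, value.encode(), now) for name, value in values.items()}


# ColumnFamily: row key -> {column name -> Column}
users = {
    "jsmith": columns(first="John", last="Smith"),
}

# SuperColumnFamily: row key -> {super column name -> {column name -> Column}}
user_timeline = {
    "jsmith": {
        "2010-05-05": columns(title="Getting started", status="published"),
    },
}

# Keyspace: a named group of independent column families.
keyspace = {"Users": users, "UserTimeline": user_timeline}

first_name = keyspace["Users"]["jsmith"]["first"].value.decode()
title = keyspace["UserTimeline"]["jsmith"]["2010-05-05"]["title"].value.decode()
```

Reading a value from the SuperColumnFamily takes one more lookup than reading from the ColumnFamily, which is the "extra dimension" mentioned above.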
Probably the best explanation of the Cassandra data model can be found in Arin Sarkissian’s ☞ WTF is a SuperColumn?. There are other recommended resources about Cassandra and Jonathan Ellis, Cassandra project chair, has a suggested Cassandra reading list.
The ☞ third article in the series focuses on Cassandra’s sorting capabilities:
By default Cassandra sorts the data as soon as you store it in the database and it remains sorted. This gives you an enormous performance boost, however you need to think before you start storing data.
Sorting can be specified on the ColumnFamily CompareWith attribute, these are the options you can choose from (it is possible to create custom sorting behavior but we will cover that later):
- BytesType
- UTF8Type
- LexicalUUIDType
- TimeUUIDType
- AsciiType
- LongType
And there is also a way to define your own custom Cassandra sorting types, described in this ☞ post.
By now you should be ready to start using Cassandra, and this is exactly the subject of ☞ part 4 and ☞ part 5 of the series, which cover the Thrift Cassandra client. Understanding how writes and reads are performed might be useful, so you should check Cassandra write operation and Cassandra read operation, which also talk about the performance of these operations.
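To see why the comparator choice matters, here is a small Python sketch that mimics three of the comparators on raw column names. The sort-key functions are stand-ins written for this example (assuming LongType names are 8-byte big-endian integers); they are not Cassandra code.

```python
import struct


def bytes_type(name: bytes) -> bytes:
    """BytesType stand-in: compare the raw bytes as they are."""
    return name


def utf8_type(name: bytes) -> str:
    """UTF8Type stand-in: decode as UTF-8 and compare the resulting strings."""
    return name.decode("utf-8")


def long_type(name: bytes) -> int:
    """LongType stand-in: read the name as an 8-byte big-endian signed integer."""
    (value,) = struct.unpack(">q", name)
    return value


# Numbers stored as text sort lexically, which is rarely what you want.
as_text = [b"10", b"9", b"100"]
sorted_as_text = sorted(as_text, key=utf8_type)

# The same numbers stored as 8-byte longs sort numerically.
as_longs = [struct.pack(">q", n) for n in (10, 9, 100)]
sorted_as_longs = [long_type(name) for name in sorted(as_longs, key=long_type)]
```

Here the text names come back as 10, 100, 9 while the long names come back as 9, 10, 100, which is why it pays to pick the comparator based on the queries you plan to run.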
While initially you might not have enough data to need to decide how to partition a Cassandra cluster, once you get to that point I’m pretty sure you’ll appreciate some more details on Cassandra partitioning strategies.
Last, but not least, here is a list of known Cassandra use cases that might give you a good idea of where Cassandra will fit in your next app. After that you should be absolutely ready to experiment with Cassandra.
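As a rough picture of what a partitioning strategy decides, the Python sketch below places keys on a token ring in two ways: by hashing the key (in the spirit of a random partitioner) and by using the key itself as the token (in the spirit of an order-preserving one). Node names, tokens and helper functions are invented for this example.

```python
import bisect
import hashlib


def md5_token(key: str) -> int:
    """Hash-based token: spreads keys evenly but loses key order."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


def owner(ring, token):
    """Return the node owning `token`: the first node token >= it, wrapping around."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


# Hash ring: three hypothetical nodes splitting the 128-bit MD5 space evenly.
MAX_TOKEN = 2 ** 128
hash_ring = [(MAX_TOKEN * (i + 1) // 3, f"node{i + 1}") for i in range(3)]

# Order-preserving ring: node tokens are key strings, so key ranges stay contiguous.
ordered_ring = [("h", "node1"), ("p", "node2"), ("zzzz", "node3")]

keys = ["alice", "bob", "mallory", "zed"]
hashed_placement = {k: owner(hash_ring, md5_token(k)) for k in keys}
ordered_placement = {k: owner(ordered_ring, k) for k in keys}
```

With the ordered ring, neighbouring keys land on the same node, which helps range scans but can create hot spots; with the hash ring, placement ignores key order.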
Tuesday, 4 May 2010
Basic AJAX app with CouchDB and jQuery
I could not find too much on the web about using jQuery and CouchDB together, so I decided to put together a little “Hello World” couch app of my own. Having done that, I thought I’d put together a little step-by-step instruction in case it’s of use to others.
I should warn that I’m not too familiar with jQuery, so I may be doing things in odd ways - please let me know if you see something that could be done better.
Monday, 26 April 2010
A Detailed Redis Tutorial
Tuesday, 20 April 2010
Tutorial: MongoDB for PHP programmers
A not quite safe for work, but detailed tutorial to MongoDB with PHP.
via: http://pronewb.com/mongodb-as-in-humongous-not-retarded
Thursday, 8 April 2010
Presentation: Introducing Riak
This is the longest NoSQL presentation I’ve ever posted here: 209 slides! If you’re planning to beat Kevin Smith’s (@kevsmith) record please do let me know in advance so I can reserve enough time to go through it.
My notes below:
What is Riak?
- A flexible storage engine…
- … with a REST API …
- … and map/reduce capability …
- … designed to be fault-tolerant …
- … distributed …
- … and ops friendly
The Riak Way for CAP
- Pick Two
- For each operation
Riak Improvements on Amazon Dynamo N, R, W[1]
- N can vary per bucket
- R and W can vary per operation
- Choose your own fault tolerance/performance tradeoff
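To make the tradeoff concrete, here is a small Python sketch of the usual Dynamo-style reasoning, where N is the number of replicas, R the replies needed for a read and W the acknowledgements needed for a write. The helper names and the profile labels are invented for this illustration and are not part of any Riak client.

```python
def overlaps(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def tolerated_failures(n: int, quorum: int) -> int:
    """How many of the N replicas can be down while the operation can still succeed."""
    return n - quorum


N = 3  # replicas, which can vary per bucket

# R and W can vary per operation; these profiles are hypothetical.
profiles = {
    "fast reads": {"r": 1, "w": 3},
    "fast writes": {"r": 3, "w": 1},
    "balanced": {"r": 2, "w": 2},
    "fastest, weakest": {"r": 1, "w": 1},
}

summary = {
    name: {
        "quorums_overlap": overlaps(N, p["r"], p["w"]),
        "reads_survive_down": tolerated_failures(N, p["r"]),
        "writes_survive_down": tolerated_failures(N, p["w"]),
    }
    for name, p in profiles.items()
}
```

The "balanced" profile keeps read and write quorums overlapping while tolerating one node down for either operation; the R=1, W=1 profile tolerates two nodes down but gives up the overlap.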
Conflict resolution: Client Resolution[2]
- Can be set per-bucket or server-wide
- Conflicting data is “bubbled up” to the client
- Client picks the winner
Conflict resolution: Server Resolution
- “Last write wins”
- Enabled by default
- What most apps need 80% of the time
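The two modes can be contrasted with a short Python sketch. The Sibling type, the timestamps and the shopping-cart merge rule are assumptions made for this example; a real client exposes conflicting values through its own API.

```python
from typing import List, NamedTuple


class Sibling(NamedTuple):
    """One of several conflicting values stored under the same key."""
    value: frozenset
    timestamp: float


def last_write_wins(siblings: List[Sibling]) -> Sibling:
    """Server-style resolution: keep only the most recent write."""
    return max(siblings, key=lambda s: s.timestamp)


def client_resolve(siblings, merge):
    """Client-style resolution: conflicting values are handed to the application."""
    return siblings[0] if len(siblings) == 1 else merge(siblings)


def merge_carts(siblings):
    """Hypothetical application rule: a cart keeps the items from every sibling."""
    items = frozenset().union(*(s.value for s in siblings))
    return Sibling(items, max(s.timestamp for s in siblings))


conflict = [
    Sibling(frozenset({"book"}), timestamp=100.0),
    Sibling(frozenset({"pen"}), timestamp=101.0),
]

lww = last_write_wins(conflict)                  # the "book" write is discarded
merged = client_resolve(conflict, merge_carts)   # both items are kept
```

Last write wins is simpler and needs no application code, while client resolution lets the application decide what "the winner" means for its data.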
The presentation also covers:
- Linking objects (slide 78)
- Map/Reduce (slide 99)
References
- [1] N= number of replicas, R=number of replicas needed for a successful read, W=number of replicas needed for a successful write. (↩)
- [2] Jeff Darcy has an interesting article on ☞ conflict resolution (↩)