tutorial: All content on NoSQL databases and projects about tutorial, featuring the best daily NoSQL articles, news, and links on tutorial

Riak: Building a Wiki

by Alex Popescu

If you are not planning to build a new Wikipedia, use this as an educational example:

Original title and link for this post: Riak: Building a Wiki (published on the NoSQL blog: myNoSQL)


Getting Started with Hadoop

by Alex Popescu

Good intro material about Hadoop (and a bit of Hive):

One design pattern that both Google and Facebook share is the ability to distribute computations among large clusters of machines that all share a common data source. The pattern is called Map/Reduce, and Hadoop is an open source implementation of this. This article is an introduction to Hadoop. Even if you don’t currently have a massive scaling issue, it can be worthwhile to become familiar with Map/Reduce as a concept, and playing with Hadoop is a good way to do that.

If you are new to map/reduce and Hadoop, also keep in mind that many NoSQL databases (Riak, CouchDB, and MongoDB, to name a few) can run map/reduce jobs natively.
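
For readers who have never seen the pattern, here is a minimal single-process sketch of the map, shuffle, and reduce steps using a word count. It is only an illustration of the concept: the function names are made up for this example and are not part of Hadoop's API, and a real framework would run the map and reduce phases on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["riak couchdb mongodb", "hadoop hive", "Hadoop riak"])
```

Because each call to `map_phase` looks at one document and each call to `reduce_phase` looks at one key, both phases can be spread across machines without coordination, which is the point of the pattern.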


Riak and Rails: 6 Steps for Getting Started

by Alex Popescu

From Basho:

Web applications built with Ruby on Rails have lots of ways to take advantage of scalable, distributed storage systems like Riak. These resources can help you get started.

Video and slides below:


Quick Guide to MongoDB Replica Sets

by Alex Popescu

One of the most awaited features in MongoDB 1.6 is replica sets, MongoDB's replication solution, which provides automatic failover and recovery[1].

Chris Heald has a very detailed guide on setting up MongoDB replica sets:

We’re now up and running with a replica set. We can add new slaves to the replica set, force a new master, take nodes in the cluster down, and all that jazz without impacting your app. You can even set up replica slaves in other data centers for zero-effort offsite backup. If your DB server exploded, you could point your app at the external datacenter’s node and keep running while you replace your local database server. Once your new server is up, just bring it online and re-add its node back into your replica set. Data will be transparently synched back to your local node. Once the sync is complete, you can re-elect your local node as the master, and all is well again.
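
The failover behavior described in the quote can be pictured with a toy model. The sketch below is not MongoDB's election protocol and uses no MongoDB API; `Node`, `optime`, and `elect_primary` are hypothetical names, and the only rules it assumes are that a primary needs a majority of members to be reachable and that the most up-to-date reachable member is preferred.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    optime: int      # position of the last operation this node has applied
    up: bool = True


def elect_primary(nodes):
    """Return the new primary, or None if no majority of members is reachable."""
    reachable = [n for n in nodes if n.up]
    if len(reachable) * 2 <= len(nodes):
        return None  # without a strict majority, no primary is elected
    # Prefer the most up-to-date node; break ties alphabetically by name.
    return min(reachable, key=lambda n: (-n.optime, n.name))


nodes = [Node("local", 120), Node("standby", 120), Node("offsite", 118)]
primary = elect_primary(nodes)

nodes[0].up = False                      # the local server "explodes"
primary_after_failure = elect_primary(nodes)

nodes[1].up = False                      # only 1 of 3 members is left
primary_without_majority = elect_primary(nodes)
```

In this model the set keeps a primary after losing one of three members, but not after losing two, which is why an odd number of members is a common choice.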

Update: Kristina Chodorow (10gen) has a three-part series about MongoDB replica sets: ☞ part 1 (setup), ☞ part 2 (replica sets behind the scenes) and ☞ part 3 (migrating live setups to use replica sets).


  1. You can read more about replica sets ☞ here and ☞ here.


Video Guide of What’s New in CouchDB 0.11 and 1.0

by Alex Popescu

CouchDB 1.0 has been out for two weeks already, so if you haven’t upgraded yet or haven’t had the time to check out all the cool new features in CouchDB 1.0, here is a video of Jan Lehnardt (@janl), core CouchDB developer, covering what’s new in CouchDB 0.11 and CouchDB 1.0:


Tutorial: MapReduce with Riak

by Alex Popescu

While we’ve talked in the past about Riak and MapReduce support, and Sean Cribbs’s Riak tutorial covers it too, the following video focuses exclusively on MapReduce with Riak.

Enjoy the video and slides:


Presentation: Introduction to Cassandra

by Alex Popescu

A nice addition to the getting started with Cassandra tutorial:


A Cassandra Glossary

by Alex Popescu

If you are just starting to look into Cassandra, or want to explain some Cassandra (and other) terms to your friends or colleagues, this might be a good resource: it defines over 50 terms.

Snitch
A snitch is Cassandra’s way of mapping a node to a physical location in the network. It helps determine the location of a node relative to another node in order to assist with discovery and ensure efficient request routing. There are different kinds of snitches.
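
To make the definition concrete, here is a small sketch of one possible snitch strategy: inferring a node's data center and rack from its IP address and ordering replicas by closeness. The octet convention and the names `infer_location`, `proximity`, and `sort_by_proximity` are assumptions made for this example, not Cassandra's actual interfaces.

```python
import ipaddress


def infer_location(ip):
    """Map an IPv4 address to a (datacenter, rack) pair.

    Assumed convention: the second octet names the data center and the
    third octet names the rack.
    """
    octets = ipaddress.IPv4Address(ip).packed
    return f"dc{octets[1]}", f"rack{octets[2]}"


def proximity(a, b):
    """Lower is closer: 0 same rack, 1 same data center, 2 otherwise."""
    dc_a, rack_a = infer_location(a)
    dc_b, rack_b = infer_location(b)
    if dc_a != dc_b:
        return 2
    return 0 if rack_a == rack_b else 1


def sort_by_proximity(source, replicas):
    """Order replicas so that requests are routed to the closest node first."""
    return sorted(replicas, key=lambda node: proximity(source, node))


ordered = sort_by_proximity("10.1.1.5", ["10.2.1.7", "10.1.2.9", "10.1.1.8"])
```

Here the node on the same rack comes first, then the one in the same data center, then the remote one.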

Also, you can use our guide to getting started with Cassandra.


Video: 2 Hours Riak Tutorial

by Alex Popescu

A must-see tutorial on Riak by Sean Cribbs.

Compare that with Riak in 10 minutes:


Presentation: Cassandra Basics - Indexing

by Alex Popescu

A very informative presentation by Benjamin Black on Cassandra indexing:

There are so many interesting things to learn from these slides. Benjamin briefly introduces the main Cassandra terms (if you are not familiar with them, you can read more in this Cassandra tutorial) and then moves on to explain how column sorting and partitioning strategies should be used. Also worth mentioning are some really quotable fragments from the deck:

Relational stores are schema oriented. Start from your schema & work forwards

Column stores are query oriented. Start from your queries & work backwards

Cassandra is an index construction kit
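
One way to read "start from your queries and work backwards" is to pick the query first and then build the structure that answers it with a single sorted read. The sketch below does this in plain Python for a hypothetical query, "the latest N posts by an author"; the class and method names are invented for illustration and do not correspond to any Cassandra API.

```python
import bisect


class PostsByAuthor:
    """One 'row' per author; each row keeps (timestamp, title) columns sorted."""

    def __init__(self):
        self.rows = {}

    def insert(self, author, timestamp, title):
        # Keep columns ordered at write time so reads need no sorting.
        bisect.insort(self.rows.setdefault(author, []), (timestamp, title))

    def latest(self, author, limit):
        """Answer the query by reading the tail of one row, newest first."""
        if limit <= 0:
            return []
        columns = self.rows.get(author, [])
        return [title for _, title in reversed(columns[-limit:])]


index = PostsByAuthor()
index.insert("alex", 3, "A Cassandra Glossary")
index.insert("alex", 1, "Riak: Building a Wiki")
index.insert("alex", 2, "Getting Started with Hadoop")
recent = index.latest("alex", 2)
```

The write path does the ordering work so that the read path stays trivial, which is the trade-off the quoted slides point at.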


Cassandra Installation Guide for Ubuntu and Debian

by Alex Popescu

As a guy who has spent years on the Java platform, I usually don’t pay much attention to installation guides for solutions running on the Java platform (configuration guides are a completely different matter). But sometimes I realize that not everyone likes to deal with the classpath hell[1]:

To install Cassandra on Debian or other Debian derivatives like Ubuntu, LinuxMint…, use the following…

Doesn’t sound complex at all!

References


HBase/Hadoop Mac OS Installation Guide

by Alex Popescu

Now we have a very detailed HBase/Hadoop installation guide for Mac OS, thanks to Robert J. Berger:

You should now have a fully working Pseudo-Distributed Hadoop / HBase setup on your Mac. This is not suitable for any kind of large data or production project. In fact it will probably fail if you try to do anything with lots of data or high volumes of I/O. HBase seems to not like to work well until you get 4 – 5 regionservers.

But this Pseudo-Distributed version should be fine for doing experiments with tools and small data sets.