riak: All content on NoSQL databases and projects about riak, featuring the best daily NoSQL articles, news, and links on riak

Riak: Sort by with MapReduce

by Alex Popescu


Alexander Sicular:

The focus of this post is to show you how to do the equivalent of the sql “SORT BY date DESC” using Riak’s map/reduce interface. Due to Riak’s schemaless, document focused nature Riak lacks internal indexing and by extension, native sorting capabilities.
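To make the idea concrete, here is a minimal sketch in Python. It is not the code from the linked post and does not use any Riak client; the function names and sample documents are made up for illustration. It only shows the shape of the technique: the map step emits the fields needed for sorting, and the reduce step does the ordering that SQL would do for you.

```python
# Illustrative simulation only: hypothetical names, no Riak API involved.

def map_phase(doc):
    """Map step: emit a one-element list with the fields needed for sorting."""
    return [{"key": doc["key"], "date": doc["date"]}]


def reduce_phase(values):
    """Reduce step: order everything emitted so far by date, newest first.

    Sorting already-sorted partial results again gives the same final order,
    so this function can safely be re-applied to its own output.
    """
    return sorted(values, key=lambda v: v["date"], reverse=True)


def sort_by_date_desc(docs):
    """Run the map step over every document, then a single reduce step."""
    mapped = []
    for doc in docs:
        mapped.extend(map_phase(doc))
    return reduce_phase(mapped)


# ISO 8601 date strings sort correctly as plain strings.
docs = [
    {"key": "a", "date": "2010-05-01", "body": "first"},
    {"key": "b", "date": "2010-07-15", "body": "second"},
    {"key": "c", "date": "2010-06-30", "body": "third"},
]

for row in sort_by_date_desc(docs):
    print(row["date"], row["key"])
```

The sort happens entirely in the reduce step, so all mapped values have to reach one place before the final order is known.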

Complete code included (and embedded below):

A couple of links you’ll probably find useful before/after reading the article:

Original title and link for this post: Riak: Sort by with MapReduce (published on the NoSQL blog: myNoSQL)


Riak: Improvements for Fetching Bucket Keys

by Alex Popescu


Previously discouraged, listing keys in a Riak bucket got some major improvements:

Due to these three changes, there are two effective results:

  1. In nearly all cases, the list_keys operator is much faster than before. In some common cases it is 10 times faster.
  2. In cases of very large buckets, memory allocation will not spike during key listing. (though of course if you ask Riak to build the whole list for you instead of streaming it out, then at least that much must be used to accommodate)

☞ lists.basho.com
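The difference between streaming keys and asking for the whole list can be sketched in a few lines of Python. This is a toy illustration with hypothetical function names, not the Riak client API: the generator holds one small chunk at a time, while the list-building variant has to hold every key.

```python
# Toy illustration: hypothetical helpers, not Riak's list_keys implementation.

def stream_keys(keys, chunk_size=2):
    """Yield keys in small chunks so only one chunk is in memory at a time."""
    chunk = []
    for key in keys:
        chunk.append(key)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Flush the final, possibly shorter, chunk.
        yield chunk


def list_all_keys(keys):
    """Build the complete list; memory use grows with the number of keys."""
    return [key for chunk in stream_keys(keys) for key in chunk]


for chunk in stream_keys(["k1", "k2", "k3"]):
    print(chunk)
```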

Currently this works only with Riak’s Bitcask storage.

I was thinking that for other storage backends, if you want to trade some space (and consistency?) you could use Riak’s post-commit hooks to manage your own lists of keys.
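A rough sketch of that idea, simulated in plain Python: the store, the hook signature, and the index class below are all hypothetical and only mimic the general "call me after a write" pattern, not how hooks are actually registered in Riak.

```python
# Simulation of the pattern only; ToyStore and KeyIndex are made-up names.

class ToyStore:
    """A dict-backed store that calls registered hooks after each write."""

    def __init__(self):
        self.data = {}
        self.post_commit_hooks = []

    def put(self, key, value):
        self.data[key] = value
        for hook in self.post_commit_hooks:
            hook("put", key)

    def delete(self, key):
        self.data.pop(key, None)
        for hook in self.post_commit_hooks:
            hook("delete", key)


class KeyIndex:
    """Keeps a separate set of keys, costing extra space per key."""

    def __init__(self):
        self.keys = set()

    def on_commit(self, operation, key):
        if operation == "put":
            self.keys.add(key)
        else:
            self.keys.discard(key)


store = ToyStore()
index = KeyIndex()
store.post_commit_hooks.append(index.on_commit)

store.put("a", 1)
store.put("b", 2)
store.delete("a")
print(sorted(index.keys))
```

Because the hook runs after the write has been committed, a failed or delayed hook leaves the index out of step with the data, which is the consistency trade-off hinted at above.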

Original title and link for this post: Riak: Improvements on Bucket Key Lists (published on the NoSQL blog: myNoSQL)


Riak: Building a Wiki

by Alex Popescu


If you are not planning to build a new Wikipedia, use this as an educational example:

Original title and link for this post: Riak: Building a Wiki (published on the NoSQL blog: myNoSQL)


Riak Map/Reduce Queries in Clojure

by Alex Popescu


Over this week I’ve been working on a proof of concept to see if it’s possible to use Clojure as the map/reduce language for Riak, in the same way now we can use Javascript and Erlang for that purpose. To accomplish that I needed a way to call Clojure code from Erlang. So I set up a very simple server in Clojure that runs as an Erlang node using Closerl.

Theoretically nice… practically I’d say there is a fundamental problem with this idea (different from the ones listed in the article). Map and reduce functions are supposed to run on the nodes hosting the data[1]. If you need to send this data over the wire, it is like implementing MapReduce in your application, so the data locality property is lost. Not to mention that by adding another variable to the equation (the JVM), your distributed system will become more sensitive to failures.


  1. As mentioned in this question about Riak MapReduce, Riak currently runs only the map functions on all nodes, while the reduce function is run on the node receiving the request.
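The data locality argument can be illustrated with a small Python simulation. Nothing here is Riak code; the node names and functions are assumptions. The map function runs against each node's own data, and only the (usually much smaller) map output travels to the node that coordinates the request and runs the reduce function.

```python
# Simulation with made-up names: each dict entry stands for one node's local data.

def run_mapreduce(nodes, map_fn, reduce_fn):
    """Run map_fn next to the data on every node, then reduce_fn in one place."""
    shipped = []  # stands for what is sent over the network to the coordinator
    for local_values in nodes.values():
        for value in local_values:
            # Map phase: executed where the value is stored.
            shipped.extend(map_fn(value))
    # Reduce phase: executed once, on the node that received the request.
    return reduce_fn(shipped)


def count_words(text):
    """Map: emit the number of words in one stored document."""
    return [len(text.split())]


def total(counts):
    """Reduce: add up all emitted counts."""
    return [sum(counts)]


nodes = {
    "node1": ["a b", "c"],
    "node2": ["d e f"],
}

print(run_mapreduce(nodes, count_words, total))  # prints [6]
```

If the map function lived in a separate process, every stored document (not just the small integers above) would have to cross the wire first.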

Original title and link for this post: Riak Map/Reduce Queries in Clojure (published on the NoSQL blog: myNoSQL)


CouchDB: Horizontal Scalability from Cloudant

by Alex Popescu


Even though CouchDB benefits from probably one of the most sophisticated and cool replication mechanisms, that doesn’t make it horizontally scalable. I’ve already covered the different solutions for scaling CouchDB, but what Cloudant promises seems to be the missing part:

All of these features — distributed, horizontally scalable, durable, consistent — happen with little or no change required in applications that have been written for CouchDB. A cluster looks just like a stand-alone CouchDB, and API compliance has been our goal from the beginning. Granted, there are a few extra options like overriding quorum constant defaults and there are a few vagaries, like views always performing rereduce due to the views being distributed. But on the whole, the extras in Cloudant are transparent to the application.
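The remark about views always performing rereduce is easier to follow with an example. The sketch below is a Python rendering of a CouchDB-style reduce function with its `(keys, values, rereduce)` arguments; the shard layout and the `clustered_view_count` helper are assumptions made up for illustration, not Cloudant's implementation.

```python
# Python rendering of a count reduce; the sharding helper is hypothetical.

def count_reduce(keys, values, rereduce):
    """Count rows in a view.

    On the first pass, values are the emitted map values, so count them.
    On a rereduce pass, values are partial counts, so add them up.
    """
    if rereduce:
        return sum(values)
    return len(values)


def clustered_view_count(shards):
    """Reduce each shard's rows separately, then combine the partial results."""
    partials = []
    for rows in shards:
        keys = [key for key, _ in rows]
        values = [value for _, value in rows]
        partials.append(count_reduce(keys, values, False))
    # Results from different shards can only be merged by a rereduce call.
    return count_reduce(None, partials, True)


shards = [
    [("a", 1), ("b", 1)],  # rows held by the first shard
    [("c", 1)],            # rows held by the second shard
]

print(clustered_view_count(shards))  # prints 3
```

A reduce function that ignores the `rereduce` flag (for example, one that always returns `len(values)`) would report 2 here instead of 3, which is why this detail matters once a view is spread over several shards.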

Now I’m wondering how Cloudant’s CouchDB scaling compares with running CouchDB with a Riak backend, since Riak also offers a Dynamo-like distributed system.

CouchDB: Horizontal Scalability from Cloudant originally posted on the NoSQL blog: myNoSQL


Video: Riak from Small to Large

by Alex Popescu


Rusty Klophaus (Basho) talking about how you can go from using a single Riak server to a fully distributed Riak installation:

I saw this presentation live at Berlin Buzzwords and it is a must-see.

Video: Riak from Small to Large originally posted on the NoSQL blog: myNoSQL


Riak and Rails: 6 Steps for Getting Started

by Alex Popescu


From Basho:

Web applications built with Ruby on Rails have lots of ways to take advantage of scalable, distributed storage systems like Riak. These resources can help you get started.

Video and slides below:

Riak and Rails: 6 Steps for Getting Started originally posted on the NoSQL blog: myNoSQL


CouchDB with a Riak Backend

by Alex Popescu


Pure awesomeness:

To make CouchDB store documents remotely, we only have to replace the implementation of the two functions listed above. For our remote storage let’s use Riak as our Key-Value store (because it’s awesome). CouchDB persists Erlang terms to disk and Riak persists Erlang terms to disk. We get to remove redundant code from CouchDB since Riak is converting terms for us. Riak also automatically replicates everything we store, easily handles adding more machines to increase capacity, and deals with failures transparently.

Not only has Matt seen the connection between CouchDB and Riak, but he made it ☞ work.

CouchDB with a Riak Backend originally posted on the NoSQL blog: myNoSQL


Quick Guide for Riak with Clojure

by Alex Popescu


From installation to using ☞ clj-riak, the Clojure library for Riak, including MapReduce with Riak:

This brief introduction leaves many aspects of Riak unaddressed. For example, we have not looked at throughput, scalability, fault tolerance, conflict resolution, or production operations – all critical to a complete understanding of the datastore.

Quick Guide for Riak with Clojure originally posted on the NoSQL blog: myNoSQL


From Cassandra to Riak at inagist.com

by Alex Popescu


A couple of confusing things in this post:

The nice thing about Cassandra was the data model. Super columns allowed us to store metadata for a resource as needed. […] Concurrency issues were also not a bother. We could do simultaneous updates to columns and super columns and not worry about data consistency issues. […] When looking for alternatives Riak was our first choice primarily because of it being in Erlang and since it had a map-reduce option which looked seriously promising.

I don’t see any connection between these. Going from a granular data model supporting column-level operations to a key-value store with opaque values doesn’t really add up.

Of the back-ends available this has worked best for us giving a consistent performance along with being reasonable on the resource usage.

This seems to contradict what was said about Bitcask, the new default Riak storage, in the Innostore and Bitcask comparison.

Is anyone able to clarify these? (N.B. I’m not saying something is wrong, but I’d like to better understand the details.) For now, the Mozilla story, Cassandra, HBase, Riak: Choosing the Right Solution, seems to be much better documented.

Update: Thanks to Jebu Ittiachen, things are clearer now:

My issues with Cassandra and with Bitcask under Riak were with how they behaved in terms of their memory consumption. In the presence of ever increasing number of keys like the tweets which keep coming in both of them would eat up all the memory available on my servers. Cassandra I guess because of its per SSTable cache of keys and Bitcask because it maintains all keys in memory. This initially being the reason for me looking out for a different store than Cassandra. I should mention that in addition to tweets other data is also managed in Cassandra / Riak.

What I was trying to convey is how something that was easily modeled in Cassandra could still be mapped into Riak and possibly be to an advantage given the map-reduce infrastructure.

My preference of innostore over bitcask has purely been seeing how they behave in real use. Bitcask is definitely faster but high in memory usage on the servers. Innostore on the other hand is steady on the memory usage over time.
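Why a store that keeps all keys in memory behaves this way can be shown with a toy model. The Python sketch below is not Bitcask's actual file format or API; the class and field names are made up. Values go to an append-only log, while an in-memory dictionary maps every key to where its latest value sits in that log.

```python
import io


class ToyLogStore:
    """Append-only value log plus an in-memory key directory (illustration only)."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the data file on disk
        self.keydir = {}         # key -> (offset, size); one entry per live key

    def put(self, key, value):
        """Append the value and point the key at its new location."""
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        """One dictionary lookup, then one positioned read from the log."""
        offset, size = self.keydir[key]
        self.log.seek(offset)
        return self.log.read(size)


store = ToyLogStore()
store.put("tweet:1", b"hello")
store.put("tweet:2", b"world")
store.put("tweet:1", b"hi")  # an overwrite appends; the old bytes stay in the log

print(store.get("tweet:1"))  # prints b'hi'
print(len(store.keydir))     # prints 2
```

Reads are fast because the location of every value is already known, but the dictionary grows with the number of distinct keys, which matches the memory behaviour described above for an ever-increasing stream of tweets.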

From Cassandra to Riak at inagist.com originally posted on the NoSQL blog: myNoSQL


Building Blocks of Dynamo-like Distributed Systems

by Alex Popescu


The Basho guys have started to talk about their experience building Riak, the Dynamo-like distributed key-value store, and about the common building blocks of distributed systems.

Justin Sheehy interviewed by Sadek Drobi over ☞ InfoQ.com:

Even just the Dynamo specific parts are very dramatic in differences. There have been a number of Dynamo-like systems developed over the past few years, each of which has had to design and implement large portions of even just the Dynamo-like sections on their own. Because Dynamo tells you what some very good design decisions are but it doesn’t show you how to implement the system. Even just the Dynamo portion you have to do a lot of design work, just to implement that.

Justin on choosing Erlang for implementing Riak:

There was a really natural choice because especially when you look at the Dynamo model, where they talk about all these operations where to get a value you’ll send messages to multiple other parties, then you’ll wait through various phases for responses of different classes to come back and the basic building blocks to do that kind of messaging and to do that kind of more complex state machine are there for you out of the box for you in Erlang.
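The "send messages to multiple parties, then wait for responses" pattern can be sketched as a quorum read. The Python below is a sequential simulation under simplifying assumptions: replicas are plain dictionaries, an unreachable node is represented by `None`, and versions are plain integers rather than the vector clocks a Dynamo-style system would use. None of the names come from Riak.

```python
# Simplified, sequential simulation; all names here are hypothetical.

def quorum_get(replicas, key, r):
    """Ask replicas for a key and succeed once r of them have answered."""
    responses = []
    for replica in replicas:
        if replica is None:
            # Simulates a node that never replies.
            continue
        if key in replica:
            responses.append(replica[key])  # a (version, value) pair
        if len(responses) >= r:
            # Enough replies arrived; stop waiting for the rest.
            break
    if len(responses) < r:
        raise RuntimeError("quorum not reached")
    # Pick the reply carrying the highest version number.
    version, value = max(responses, key=lambda response: response[0])
    return value


replicas = [
    {"k": (1, "old")},
    None,
    {"k": (2, "new")},
]

print(quorum_get(replicas, "k", 2))  # prints new
```

In a real system the requests go out concurrently and the waiting is a small state machine with timeouts, which is the kind of messaging code the quote says Erlang provides the building blocks for.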

Kevin Smith promises a series of posts covering the details of ☞ riak_core, the refactored core of the Riak system that can be used for building Dynamo-like distributed systems:

Distributed systems are complex and some of that complexity shows in the amount of features available in riak_core. Rather than dive deeply into code, I’m going to separate the features into broad categories and give an overview of each.

The ☞ first part covers aspects like:

Definitely a series I’ll keep an eye on, as I’m pretty sure there are many things to be learned from their experience. (Shameless plug) If you happen to be in the Bay Area in November, come check out the NoSQL track at QCon where, even if not published yet, Andy Gross, VP of engineering at Basho, will, among others, be speaking about how to build Dynamo-style systems using Riak’s core.

Building Blocks of Dynamo-like Distributed Systems originally posted on the NoSQL blog: myNoSQL


Release: Riak 0.12.0, Improving Failure Recovery

by Alex Popescu


I’m not really sure how I missed the release of Riak 0.12.0 last week. The ☞ release notes list over 30 enhancements and bug fixes. Another important update coming with Riak 0.12.0 is the improved failure recovery:

Riak now uses a new and improved mechanism for determining whether a node is fully online and ready to participate in Riak operations. This is especially important in failure recovery situations, as it allows the storage backend to complete a full integrity check and repair process.

Riak is getting close to celebrating its first year, and there have been more than 12 releases during this year. That’s a nice pace for a project!

Meanwhile, Riak has got a library in Go named ☞ Goriak, whose API is described ☞ here, as well as a JavaScript library for node.js[1]: ☞ riak-js.


  1. Judging by how many mentions node.js gets in the NoSQL database world, I’d say people are looking for interesting ways to connect the two technologies.