


Riak: All content tagged as Riak in NoSQL databases and polyglot persistence

Basho Announces Riak-Based Multi-Tenant, Distributed, S3-Compatible Cloud Storage Platform

Coverage of the announcement of a new product from Basho: Riak CS: a multi-tenant, distributed, S3-compatible cloud storage platform:

My notes about Riak CS will follow shortly.

Original title and link: Basho Announces Riak-Based Multi-Tenant, Distributed, S3-Compatible Cloud Storage Platform (NoSQL database©myNoSQL)

NoSQL Databases Adoption in Numbers

The data source is Jaspersoft's NoSQL connector downloads. RedMonk published a graphic and an analysis, and Klint Finley followed up with job trends:

[Figure: NoSQL databases adoption]

A couple of things I don’t see mentioned in the RedMonk post:

  1. if and how the data has been normalized based on each connector’s availability

    According to the post, the data was collected between Jan. 2011 and Mar. 2012, and I think that not all connectors were available from the beginning of that period.

  2. if and how the marketing push behind each connector has been factored in

    Announcing the Hadoop connector at an event with 2000 attendees or the MongoDB connector at an event with 800 attendees could definitely influence the results (nb: keep in mind that the largest number is less than 7000, so 200-500 downloads triggered by such an event would have a significant impact)

  3. Redis and VoltDB are mostly OLTP-only databases


Which NoSQL Databases Are Robust to Net-Splits?

Answered on Quora:

  • Dynamo (key-value)
  • Voldemort (key-value)
  • Tokyo Cabinet (key-value)
  • KAI (key-value)
  • Cassandra (column-oriented/tabular)
  • CouchDB (document-oriented)
  • SimpleDB (document-oriented)
  • Riak (document-oriented)

A couple of clarifications to the list above:

  1. Dynamo has never been available to the public. On the other hand, DynamoDB is not exactly Dynamo.
  2. Tokyo Cabinet is not a distributed database so it shouldn’t be in this list
  3. CouchDB isn’t a distributed database either, but one could argue that with its peer-to-peer replication it sits right at the border. On the other hand there’s BigCouch.


Riak at Clipboard: Why Riak and How We Made Riak Search Faster

Gary William Flake:

For me, the two most important considerations are (1) how easy it is to write effective code and (2) how bulletproof the system is operationally. Others may argue that other attributes, like performance or the particulars of the data model, are more important, but I’ll pick simplicity and robustness every time [1]. A simple and robust store can usually be finessed to map to any data model and can be scaled outward to make up for performance.

The rest of the article focuses on the solution Clipboard employed to make Riak Search scale when performing multi-matching search queries across millions of documents. While the specific details apply only to Clipboard and Riak Search, the idea of precomputing results, or at least modeling data in ways that optimize the most frequent access scenarios, is generally applicable.

  1. My emphasis. I find these two principles to be the core of Riak. 



NoSQL Hosting Services

Michael Hausenblas put together a list of hosted NoSQL solutions including Amazon DynamoDB and SimpleDB, Google App Engine, Riak, Cassandra, CouchDB, MongoDB, Neo4j, and OrientDB. If you go through my posts on NoSQL hosting, you’ll find a couple more.



Major Riak Release Includes Tons of Improvements, Plus a Riak Admin UI and Riaknostic

One of the major releases that happened around the end of February (and which I missed due to some personal problems) is Riak 1.1. I assume that by now everyone using Riak already knows all the goodies packaged by the Basho team in this new release, but for those who are not yet on board, here is a summary:

From the Release notes:

  • Numerous changes to Riak Core which address issues with cluster scalability, and enable Riak to better handle large clusters and large rings
  • New Ownership Claim Algorithm: The new ring ownership claim algorithm introduced as an optional setting in the 1.0 release has been set as the default for 1.1. The new claim algorithm significantly reduces the amount of ownership shuffling for clusters with more than N+2 nodes in them.
  • Riak KV improvements:
    • Listkeys backpressure: Backpressure has been added to listkeys to prevent the node listing keys from being overwhelmed.
    • Don’t drop post-commit errors on floor
  • MapReduce Improvements
    • The MapReduce interface now supports requests with empty queries. This allows the 2i, list-keys, and search inputs to return matching keys to clients without needing to include a reduce_identity query phase.
    • MapReduce error messages have been improved. Most error cases should now return helpful information all the way to the client, while also producing less spam in Riak’s logs.
  • Bitcask and LevelDB improvements
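
To make the empty-query MapReduce item above more concrete, here is a minimal sketch of what such a request body could look like. The bucket name, index name, and lookup value are hypothetical, and the payload is only built and printed here, not sent to a cluster (Riak's HTTP interface accepts MapReduce jobs at /mapred). The helper `build_mapreduce_job` is an illustration, not part of any Riak client.

```python
import json


def build_mapreduce_job(inputs, phases=None):
    """Build a Riak MapReduce job body as a plain dict.

    `phases` may be empty: per the 1.1 release notes, a job with an
    empty query simply returns the matching keys to the client.
    """
    return {"inputs": inputs, "query": list(phases or [])}


# Hypothetical secondary-index (2i) input: bucket, index, and key are made up.
index_input = {"bucket": "users", "index": "email_bin", "key": "a@example.com"}

# Before 1.1, a pass-through reduce phase was needed to get the keys back.
identity_phase = {
    "reduce": {
        "language": "erlang",
        "module": "riak_kv_mapreduce",
        "function": "reduce_identity",
    }
}
old_style_job = build_mapreduce_job(index_input, [identity_phase])

# With 1.1, the query list can be left empty.
new_style_job = build_mapreduce_job(index_input)

print(json.dumps(old_style_job, indent=2))
print(json.dumps(new_style_job, indent=2))
```

The only difference between the two bodies is the `query` list, which is the point of the change: the client no longer has to append a do-nothing phase just to get keys back.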

Then there’s also Riaknostic and the new Riak admin tool: Riak Control.

What is Riaknostic?

From the initial Riaknostic announcement:

Riaknostic is an Erlang script (escript) that runs a series of “diagnostics” or “checks”, inspecting your operating system and Riak installation for known potential problems and then printing suggestions for how to fix those problems. Riaknostic will NOT fix those problems for you, it’s only a tool for diagnostics. Some of the things it checks are:

  • How much memory does the Riak process currently use?
  • Do Riak’s data directories have the correct permissions?
  • Did the Riak node crash in the past and leave a dump file?

The Riaknostic project page is here.

What is Riak Control?

From Riak Control GitHub page:

Riak Control is a set of webmachine resources, all accessible via the /admin/* paths, allow you to inspect your running cluster, and manipulate it in various ways.

Now that description doesn’t do Riak Control justice. Riak Control is a very fancy REST-driven admin interface for Riak. You don’t have to take my word for it, so check this screenshot:

[Screenshot: Riak Control]

Riak Control covers different details of a Riak cluster:

  • general cluster status
  • details about the cluster
  • details about the ring

This blog post gives more details about Riak Control and a couple more sexy screenshots. If you’d like to dive a bit deeper into Riak Control, you can also watch a 25-minute video of Mark Phillips talking about it after the break.

Since the Riak 1.1.0 release, there has been a bug fix release, 1.1.1, addressing some MapReduce bugs described on the mailing list and in the Riak 1.1.1 release notes.

Riak and WebMachine are the two systems that make me wish I knew Erlang so I could dive in and learn more about them. I’m already (slowly) working to change this.

Paginating With Riak

Alexander Sicular explaining why pure key-value stores require a different approach when an application needs to paginate through result sets:

Riak at its core is a distributed key/value persisted data store that also happens to do a lot of other things. Now break that down. Looking at those words individually we have “distributed”, meaning that your data lives on a number of different machines in your cluster. Good thing, right? Yes. However it also means that no single machine is the canonical reference for all your data. Which in turn means that you need to ask multiple machines for your data and those machines will return data to you when they see fit, ie. not in order. Moving on, we have “key/value”. In regards to the topic at hand, this means that Riak has no insight into any data held within your keys, ie. Riak does not care if your stored json object has an age value in it. Next, we have “persisted”. Riak has no native internal index, meaning Riak will not store on disk the data you send it in any useful way - useful to you at least. Lastly, we have “happens to do a lot of other things.” Thankfully for us, one of those other things is Map/Reduce.
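
The consequence of the passage above is that ordering and slicing have to happen after the partial results from all nodes have been gathered. The sketch below simulates that in plain Python: the per-node result lists, the `age` field, and the `paginate` helper are all hypothetical, and in Riak the sort-and-slice step would typically live in a reduce phase rather than in client code. It is not the code from Alexander Sicular's post.

```python
def paginate(partial_results, sort_key, page, page_size):
    """Combine unordered per-node results, sort them, and return one page.

    partial_results: list of lists, one list per node, in arrival order.
    page: 1-based page number.
    """
    # Flatten the per-node lists; arrival order carries no meaning.
    combined = [item for part in partial_results for item in part]
    # Only after everything is gathered can a global order be imposed.
    combined.sort(key=sort_key)
    start = (page - 1) * page_size
    return combined[start:start + page_size]


# Hypothetical results as two nodes might return them, in no useful order.
node_a = [{"key": "u3", "age": 41}, {"key": "u1", "age": 25}]
node_b = [{"key": "u4", "age": 19}, {"key": "u2", "age": 33}]


def by_age(item):
    return item["age"]


first_page = paginate([node_a, node_b], by_age, page=1, page_size=2)
second_page = paginate([node_a, node_b], by_age, page=2, page_size=2)

print([item["key"] for item in first_page])
print([item["key"] for item in second_page])
```

Note that every page request re-gathers and re-sorts the full result set, which is exactly why pagination over a pure key-value store is more expensive than it looks.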



Multiple Index Queries in Riak Using Python

Sreejith K describing his riak_multi_query Python library for multi-index queries:

One of the advantages of using LevelDB with Riak is that they support Secondary Indexes. […] I wrote a Python wrapper that allows multiple index queries using Secondary indexes and MapReduce. The basic idea is as follows:

  • Query Multiple Indexes and get the associated keys
  • Pass the keys to a MapReduce job where Multiple filters are again evaluated. The map phase applies all the conditions to individual keys.
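
The two steps above can be sketched with in-memory stand-ins for the indexes and the stored objects. Everything here is hypothetical: the data, the field names, and the `multi_index_query` helper are made up for illustration and are not the riak_multi_query API. Against a real cluster, step 1 would be a set of secondary index lookups and step 2 a MapReduce map phase.

```python
def multi_index_query(indexes, objects, conditions):
    """Return keys of objects that satisfy every condition.

    indexes: {field: {value: [keys]}}, standing in for secondary indexes.
    objects: {key: stored document}, standing in for the bucket.
    conditions: {field: required value}.
    """
    # Step 1: query each index and collect the associated keys.
    candidate_keys = set()
    for field, value in conditions.items():
        candidate_keys.update(indexes.get(field, {}).get(value, []))

    # Step 2: "map phase", re-check all conditions against each candidate.
    matches = []
    for key in sorted(candidate_keys):
        doc = objects[key]
        if all(doc.get(field) == value for field, value in conditions.items()):
            matches.append(key)
    return matches


objects = {
    "k1": {"city": "Paris", "lang": "py"},
    "k2": {"city": "Paris", "lang": "erl"},
    "k3": {"city": "Rome", "lang": "py"},
}
indexes = {
    "city": {"Paris": ["k1", "k2"], "Rome": ["k3"]},
    "lang": {"py": ["k1", "k3"], "erl": ["k2"]},
}

result = multi_index_query(indexes, objects, {"city": "Paris", "lang": "py"})
print(result)
```

The index lookups in step 1 are independent of each other, which is what makes them a natural candidate for running in parallel.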

Now imagine this library would run those queries in parallel.



Riak Precommit Hooks for Creating Secondary Indeces

I wanted to create a secondary index on a field so that when I want to look up the data, it doesn’t require a full M/R to do.  The great Riak Handbook showed a couple of examples of creating a secondary index, and it looked simple enough.  It actually is pretty simple, but the documentation and examples are few and far between, so I’m going to share my experience. […] Whenever an item is created in this bucket, I want a secondary index based on the app.  I never update items in this bucket, only create.  In fact, I never delete these items either, but figured my code should handle that case.

Erlang code included.
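
The hook in the original post is written in Erlang and is not reproduced here. As a rough illustration of the behavior described in the quote, the Python sketch below simulates a hook that attaches an index entry based on the `app` field when an item is created, and passes deletes through untouched. The object layout, the `deleted` flag, and the `add_app_index` function are all hypothetical; only the `_bin` suffix follows Riak's naming convention for binary secondary indexes.

```python
def add_app_index(obj):
    """Simulated precommit hook: return the object to be stored.

    obj is a dict with a "value" document and optional "indexes" and
    "deleted" entries (a made-up layout for this sketch).
    """
    # Deletes carry no document to index, so pass them through unchanged.
    if obj.get("deleted"):
        return obj

    app = obj.get("value", {}).get("app")
    if app is None:
        # Nothing to index on; store the object as-is.
        return obj

    # Copy instead of mutating, then add the (index name, value) pair.
    indexed = dict(obj)
    entries = set(obj.get("indexes", ()))
    entries.add(("app_bin", app))
    indexed["indexes"] = sorted(entries)
    return indexed


created = add_app_index({"key": "e1", "value": {"app": "billing"}})
deleted = add_app_index({"key": "e2", "deleted": True})

print(created["indexes"])
print("indexes" in deleted)
```

With the index entry in place, a lookup by `app` becomes an index query instead of a full MapReduce pass over the bucket, which is the motivation given in the quote.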



Riak Performance of Link Walking vs MapReduce

If you are asked to compare (or you just wonder about) the performance of link walking and MapReduce in Riak, keep in mind the following details of how the two mechanisms are implemented:

The biggest difference I see is that the link-walk uses an Erlang function where your MapReduce query uses a Javascript function (link-walking is implemented as a MapReduce query internally).

Serializing/deserializing to JSON as well as contention for Javascript VMs likely accounts for the lost time.

The emphasis is mine; the quote is from Bryan Fink’s email to Riak’s mailing list.


NoSQL Market from Couchbase Perspective

James Phillips (Couchbase) for Curt Monash:

  • MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.
  • DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that’s one of the benefits of swapping in CouchDB at the back end.)
  • Redis has “dropped off the radar”, presumably because there’s no particular persistence strategy for it.
  • Riak doesn’t show up much.

I assume this is sort of a pre-sales/sales department 100k-foot overview.



Riak Used by Auric Systems to Meet PCI Compliance Requirements

PR announcement:

Auric Systems International, a leader in merchant transaction processing solutions, relies on Basho’s Riak to power its PaymentVault(TM) solution for PCI compliance. Riak was chosen because of the simplicity by which it replicates data, including stored encrypted credit card tokenized data, its ability to automate the aging of data, and its availability as open source.

After spending half an hour on the pcisecuritystandards site, I still couldn’t figure out what Level 1 PCI compliance means, so I can’t tell what exactly Riak brought to the table.

If you thought all systems in the financial sector need transactions and are using relational databases, then I guess you were wrong. Read also Card payment systems and the CAP theorem to see the requirements of another financial service.
