Riak: All content tagged as Riak in NoSQL databases and polyglot persistence
A couple of things I don't see mentioned in the RedMonk post:
- if and how the data has been normalized based on each connector's availability. According to the post, the data has been collected between Jan. 2011 and Mar. 2012, and I think that not all connectors have been available since the beginning of that period.
- if and how marketing pushes for each connector have been weighed in. Announcing the Hadoop connector at an event with 2000 attendees, or the MongoDB connector at an event with 800 attendees, could definitely influence the results (nb: keep in mind that the largest number is less than 7000, thus 200-500 downloads triggered by such an event have a significant impact).
- Redis and VoltDB are mostly OLTP-only databases.
Original title and link: NoSQL Databases Adoption in Numbers ( ©myNoSQL)
- Dynamo (key-value)
- Voldemort (key-value)
- Tokyo Cabinet (key-value)
- KAI (key-value)
- Cassandra (column-oriented/tabular)
- CouchDB (document-oriented)
- SimpleDB (document-oriented)
- Riak (document-oriented)
A couple of clarifications to the list above:
- Dynamo has never been available to the public. On the other hand, DynamoDB is not exactly Dynamo
- Tokyo Cabinet is not a distributed database so it shouldn’t be in this list
- CouchDB isn’t a distributed database either, but one could argue that with its peer-to-peer replication it sits right at the border. On the other hand there’s BigCouch.
Original title and link: Which NoSQL Databases Are Robust to Net-Splits? ( ©myNoSQL)
One of the major releases that happened around the end of February (and which I missed due to some personal problems) is Riak 1.1. I assume that by now everyone using Riak already knows all the goodies packaged by the Basho team into this new release, but for those who are not yet on board, here is a summary:
From the Release notes:
- Numerous changes to Riak Core which address issues with cluster scalability, and enable Riak to better handle large clusters and large rings
- New Ownership Claim Algorithm: The new ring ownership claim algorithm introduced as an optional setting in the 1.0 release has been set as the default for 1.1. The new claim algorithm significantly reduces the amount of ownership shuffling for clusters with more than N+2 nodes in them.
Riak KV improvements:
- Listkeys backpressure: Backpressure has been added to listkeys to prevent the node listing keys from being overwhelmed.
- Don't drop post-commit errors on the floor
- The MapReduce interface now supports requests with empty queries. This allows the 2i, list-keys, and search inputs to return matching keys to clients without needing to include a reduce_identity query phase.
- MapReduce error messages have been improved. Most error cases should now return helpful information all the way to the client, while also producing less spam in Riak’s logs.
- Bitcask and LevelDB improvements
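The empty-query MapReduce change is easier to see with the request bodies side by side. Here is a minimal sketch in Python of the JSON sent to Riak's /mapred endpoint, assuming the built-in riak_kv_mapreduce:reduce_identity reduce function; the bucket and index names are made up for illustration:

```python
import json

# Before 1.1: to get the keys matching a 2i query back, a reduce_identity
# phase had to be appended just to echo the inputs to the client.
pre_1_1 = {
    "inputs": {"bucket": "users", "index": "age_int", "key": 30},
    "query": [
        {"reduce": {"language": "erlang",
                    "module": "riak_kv_mapreduce",
                    "function": "reduce_identity"}}
    ],
}

# With 1.1: an empty query is accepted, and the matching bucket/key
# pairs are returned directly.
post_1_1 = {
    "inputs": {"bucket": "users", "index": "age_int", "key": 30},
    "query": [],
}

print(json.dumps(post_1_1))
```

The same shortcut applies to list-keys and search inputs, per the release notes.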
Then there’s also Riaknostic and the new Riak admin tool: Riak Control.
What is Riaknostic?
From the initial Riaknostic announcement:
Riaknostic is an Erlang script (escript) that runs a series of "diagnostics" or "checks", inspecting your operating system and Riak installation for known potential problems and then printing suggestions for how to fix those problems. Riaknostic will NOT fix those problems for you; it's only a diagnostics tool. Some of the things it checks are:
- How much memory does the Riak process currently use?
- Do Riak’s data directories have the correct permissions?
- Did the Riak node crash in the past and leave a dump file?
Riaknostic project page is here.
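To make the diagnose-only behavior concrete, here is a minimal Python sketch of the same idea. This is not Riaknostic itself (which is an escript), and the paths and function names are made up; it just mirrors two of the checks listed above:

```python
import os

def check_data_dir_permissions(data_dir):
    """Flag a data directory the current user cannot write to."""
    if not os.access(data_dir, os.W_OK):
        return f"{data_dir} is not writable by the current user"
    return None

def check_crash_dump(dump_path):
    """Flag a leftover Erlang crash dump from a previous node crash."""
    if os.path.exists(dump_path):
        return f"found crash dump at {dump_path}; the node may have crashed"
    return None

def run_diagnostics(data_dir="/var/lib/riak", dump_path="erl_crash.dump"):
    # Run each check and collect suggestions; nothing is fixed,
    # mirroring Riaknostic's diagnose-only behavior.
    return [problem for problem in (check_data_dir_permissions(data_dir),
                                    check_crash_dump(dump_path))
            if problem is not None]
```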
What is Riak Control?
From Riak Control GitHub page:
Riak Control is a set of webmachine resources, all accessible via the /admin/* paths, that allow you to inspect your running cluster and manipulate it in various ways.
Now, that description doesn't do Riak Control any justice. What Riak Control really is, is a very fancy REST-driven admin interface for Riak. You don't have to take my word for it, so check out this screenshot:
Riak Control covers different details of a Riak cluster:
- general cluster status
- details about the cluster
- details about the ring
This blog post gives more details about Riak Control and a couple more sexy screenshots. If you'd like to dive a bit deeper into Riak Control, you can also watch, after the break, a 25-minute video of Mark Phillips talking about it.
Riak and Webmachine are the two systems for which I wish I knew Erlang, so I could dive in and learn more about them. I'm already (slowly) working to change this.
If you are asked to compare (or you just wonder about) the performance of link walking and MapReduce in Riak, keep in mind the following details of how the two mechanisms are implemented:
My emphasis on Bryan Fink’s email from Riak’s mailing list.
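The gist of the relationship is that, over HTTP, a link-walking request is executed internally as a MapReduce query with a link phase — so the two share the same machinery. A rough sketch of that equivalence, with made-up bucket, key, and tag names:

```python
import json

# Hypothetical example: walking links tagged "friend" from users/alice.
# Over HTTP this would be GET /riak/users/alice/_,friend,1 ;
# internally Riak runs roughly the equivalent of this MapReduce query:
link_walk_as_mapreduce = {
    "inputs": [["users", "alice"]],
    "query": [
        {"link": {"bucket": "_", "tag": "friend", "keep": True}}
    ],
}

print(json.dumps(link_walk_as_mapreduce))
```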
Original title and link: Riak Performance of Link Walking vs MapReduce ( ©myNoSQL)
Auric Systems International, a leader in merchant transaction processing solutions, relies on Basho's Riak to power its PaymentVault(TM) solution for PCI compliance. Riak was chosen for the simplicity with which it replicates data, including stored encrypted, tokenized credit card data, its ability to automate the aging of data, and its availability as open source.
After spending half an hour on the pcisecuritystandards site, I still couldn't figure out what Level 1 PCI compliance means, so I could understand what Riak brought to the table.
If you thought all systems in the financial sector need transactions and are using relational databases, then I guess you were wrong. Read also Card payment systems and the CAP theorem to see the requirements of another financial service.
Original title and link: Riak Used by Auric Systems to Meet PCI Compliance Requirements ( ©myNoSQL)
Old Quora question with very good answers.
- (pro) can (potentially) query live data
- (pro) can (conceptually) be highly efficient at joining data sets that are identically sharded on the join key (the joins can be pushed down into the key-value store itself)
- (con) full scans (the most common pattern for map-reduce) are most likely to be much faster with raw file system access
- (con) because of the better decoupling of computation and storage in the GFS+Map-Reduce model, tolerating hot spots (resulting from MR jobs) is much easier
- (con) key-value stores are rarely arranged to have schemas optimized for analytics
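The co-sharded-join pro above is worth illustrating. A minimal Python sketch, assuming both data sets hash the join key the same way (the shard count and data layout are made up); because matching records always land on the same shard, the join runs shard-by-shard with no cross-shard data movement — the "pushdown" the answer describes:

```python
from collections import defaultdict
from hashlib import md5

NUM_SHARDS = 4

def shard_of(key):
    """Both data sets hash the join key identically, so matching
    records always land on the same shard."""
    return int(md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

def shard(records):
    """Place (key, value) records into their shards."""
    shards = defaultdict(dict)
    for key, value in records:
        shards[shard_of(key)][key] = value
    return shards

def co_sharded_join(left, right):
    """Join two identically sharded data sets shard-by-shard;
    no record ever needs to leave its shard."""
    left_shards, right_shards = shard(left), shard(right)
    joined = []
    for s in range(NUM_SHARDS):
        for key, lval in left_shards[s].items():
            if key in right_shards[s]:
                joined.append((key, lval, right_shards[s][key]))
    return joined
```

A real key-value store would run the per-shard loop on each storage node in parallel; the sketch only shows why no shuffling of data between nodes is required.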
Original title and link: Pros and Cons of Using MapReduce With Distributed Key-Value Stores: HBase, Cassandra, Riak ( ©myNoSQL)