NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



BerkleyDB: All content tagged as BerkleyDB in NoSQL databases and polyglot persistence

Choosing Oracle NoSQL Database Over… Open Source

From Passoker chooses Oracle NoSQL Database over Open Source:

The main things that helped were: very clean and simple APIs, set of Java tools we could use for it, very, very easy to setup up, good GUI administration tools. All the things you perhaps wouldn’t get if you’d look to an open source solution (nb: my emphasis).

You’d perhaps be amazed of what you can find in open source solutions like say Riak or this newcomer. No need to bash open source tools in general.

Berkeley DB at Yammer: Application Specific NoSQL Data Stores for Everyone

Even if I’ve been using Berkley DB for over 6 years, I very rarely heard stories about it. This presentation from Yammer tells the story of taking Berkley DB a long way:

In early 2011 Yammer set out to replace an 11 billion row PostgreSQL message delivery database with something a bit more scale-ready. They reached for several databases with which they were familiar, but none proved to be a fit for various reasons. Following in the footsteps of so few before them, they took the wheel of the SS Berkeley DB Java Edition and piloted it into the uncharted waters of horizontal scalability.

In this talk, Ryan will cover Yammer’s journey through log cleaner infested waters, being hijacked on the high seas by the BDB B-tree cache, and their eventual flotilla of a 45 node, 256 partition BDB cluster.

NoSQL and Relational Databases Podcast With Mathias Meyer

EngineYard’s Ines Sombra recorded a conversation with Mathias Meyer about NoSQL databases and their evolution towards more friendlier functionality, relational databases and their steps towards non-relational models, and a bit more on what polyglot persistence means.

Mathias Meyer is one of the people I could talk for days about NoSQL and databases in general with different infrastructure toppings and he has some of the most well balanced thoughts when speaking about this exciting space—see this conversation I’ve had with him in the early days of NoSQL. I strongly encourage you to download the mp3 and listen to it.

Original title and link: NoSQL and Relational Databases Podcast With Mathias Meyer (NoSQL database©myNoSQL)

Griffon and NoSQL Databases

Andres Almiray:

The following list enumerates all NoSQL options currently supported by Griffon via plugins:

  • BerkeleyDB
  • CouchDB
  • Memcached
  • Riak
  • Redis
  • Terrastore
  • Voldemort
  • Neo4j
  • Db4o
  • Neodatis

The first 7 are Key/Value stores. Neo4j is a Graph based database. The last two are object stores. All of them support multiple datasources, data bootstrap and a Java friendly API similar to the one shown earlier.

Griffon is a Groovy-based framework for developing desktop applications. While the coolness factor of Java-based desktop apps is close to zero, having some multi-platform management utilities for these NoSQL databases might be interesting.

Original title and link: Griffon and NoSQL Databases (NoSQL database©myNoSQL)


The Oracle NoSQL Database 11G

A bit after posting my predictions about the Oracle NoSQL database, I’ve received a link to a PDF introducing the Oracle NoSQL database, embedded below for your reference.


  • based on BerkleyDB Java Edition. Thus it is a key-value store
  • it’s a commercial product available as a Community edition and an Enterprise edition
  • single-master with multireplicas.
  • PAXOS-based automated fail-over master election
  • supports configurable consistency policies
  • auto-sharding
  • update: there’s no download available yet, the term mentioned being mid-October

Update: There’s an official product page: Oracle NoSQL Database Technical Overview.

Oracle NoSQL database key features:

  • Simple Data Model
  • Key-value pair data structure, keys are composed of Major & Minor keys
  • Easy-to-use Java API with simple Put, Delete and Get operations
  • Scalability
  • Automatic, hash-function based data partitioning and distribution
  • Intelligent NoSQL Database driver is topology and latency aware, providing optimal data access
  • Predictable behavior
  • ACID transactions, configurable globally and per operation
  • Bounded latency via B-tree caching and efficient query dispatching
  • High Availability
  • No single point of failure
  • Built-in, configurable replication
  • Resilient to single and multi-storage node failure
  • Disaster recovery via data center replication
  • Easy Administration
  • Web console or command line interface
  • System and node management
  • Shows system topology, status, current load, trailing and average latency, events and alerts

The Oracle NoSQL Database and Big Data Appliance

There’s been a lot of speculation about the announcements coming from Oracle’s OpenWorld event. A first part was revealed during the keynote in the form of an in-memory analytics appliance called Exalytics [2]. But there’s talk about a Big Data Appliance and an Oracle NoSQL database.

Here’re my predictions[1]

  1. Oracle became very aggressive in selling products based on hardware, software, and services. So they’ll announce a Hadoop appliance integrated with an existing Oracle product. It could be either the Oracle Exadata or even the newly announced Exalytics.

    This appliance will place Oracle in competition with all other Hadoop appliance sellers: EMC, NetApp, IBM. Also these days most of the analytics databases try to integrate with Hadoop.

  2. Oracle already has a couple of non-relational solutions in their portfolio: BerkleyDB, TimesTen, Coherence. And they’ve already started to test the NoSQL market by announcing the MySQL and MySQL Cluster NoSQL hybrid systems.

    I don’t expect Oracle NoSQL database to be a new product. Just a rebranding or repackaging of one of the above mentioned ones. Probably the TimesTen.

  3. Oracle will invest more into integrating its line of products with Hadoop. Having both a Hadoop and an in-memory analytics appliance will make them very competitive in this space.

  4. Oracle will extend the support for NoSQLish interfaces (memcached) to its other database products.

What are your predictions?

  1. or speculations  

  2. I’m currently gathering more details about Exalytics.  

Original title and link: The Oracle NoSQL Database and Big Data Appliance (NoSQL database©myNoSQL)

Using Oracle Berkley DB as a NoSQL Data Store

Simple as it may be at its core, Berkeley DB can be configured to provide concurrent non-blocking access or support transactions, scaled out as a highly available cluster of master-slave replicas, or in a number of other ways.

Berkeley DB is a pure storage engine that makes no assumptions about an implied schema or structure to the key-value pairs. Therefore, Berkeley DB easily allows for higher level API, query, and modeling abstractions on top of the underlying key-value store.

Berkley DB is used as the storage engine for MemcacheDB and is one of the pluggable engines for Project Voldemort.

Original title and link: Using Oracle Berkley DB as a NoSQL Data Store (NoSQL databases © myNoSQL)


Is Berkeley DB a NoSQL solution?

Berkeley DB evolved too, we added high availability (HA) and a replication manager that makes it easy to setup replica groups. Berkeley DB’s replication doesn’t partitioned the data, every node keeps an entire copy of the database. For consistency, there is a single node where writes are committed first – a master – then those changes are delivered to the replica nodes as log records. Applications can choose to wait until all nodes are consistent, or fire and forget allowing Berkeley DB to eventually become consistent. Berkeley DB’s HA scales-out quite well for read-intensive applications and also effectively eliminates the central point of failure by allowing replica nodes to be elected (using a PAXOS algorithm) to mastership if the master should fail. This implementation covers a wide variety of use cases. MemcacheDB is a server that implements the Memcache network protocol but uses Berkeley DB for storage and HA to replicate the cache state across all the nodes in the cache group. Google Accounts, the user authentication layer for all Google properties, was until recently running Berkeley DB HA. That scaled to a globally distributed system.

It definitely is non-relational.

Original title and link: Is Berkeley DB a NoSQL solution? (NoSQL databases © myNoSQL)


Big Data Analysis at BackType

RWW has a nice post diving into the data flow and the tools used by BackType, a company with only 3 engineers, to deal and analyze large amounts of data.

They’ve invented their own language, Cascalog, to make analysis easy, and their own database, ElephantDB, to simplify delivering the results of their analysis to users. They’ve even written a system to update traditional batch processing of massive data sets with new information in near real-time.

Some highlights:

  • 25 terabytes of compressed binary data, over 100 billion individual records
  • all services and data storage are on Amazon S3 and EC2
  • 60 up to 150 EC2 instances servicing an average of 400 requests/s
  • Clojure and Python as platform languages
  • Hadoop, Cascading and Cascalog are central pieces of BackType’s platform
  • Cascalog, a Clojure-based query language for Hadoop, was created and open sourced by BackType’s engineer Nathan Marz
  • ElephantDB, the storage solution, is a read-only cluster built on top of BerkleyDB files
  • Crawlers place data in Gearman queues for processing and storing

BackType data flow is presented in the following diagram:

BackType data flow

Included below is an interview with Nathan about Cascalog:

@pharkmillups .

Original title and link: Big Data Analysis at BackType (NoSQL databases © myNoSQL)


Another NoSQL Comparison: Evaluation Guide

The requirements were clear:

  • Fast data insertion.
  • Extremely fast random reads on large datasets.
  • Consistent read/write speed across the whole data set.
  • Efficient data storage.
  • Scale well.
  • Easy to maintain.
  • Have a network interface.
  • Stable, of course.

The list of NoSQL databases to be compared: Tokyo Cabinet, BerkleyDB, MemcacheDB, Project Voldemort, Redis, and MongoDB, not so clear.

The methodology to evaluate and the results definitely not clear at all.

NoSQL Comparison Guide / A review of Tokyo Cabinet, Tokyo Tyrant, Berkeley DB, MemcacheDB, Voldemort, Redis, MongoDB

And the conclusion is quite wrong:

Although MongoDB is the solution for most NoSQL use cases, it’s not the only solution for all NoSQL needs.

There were a couple of people asking for more details about my comments on this NoSQL comparison, so here they are:

  1. the initial list of NoSQL databases to be evaluated looks at the first glance a bit random. It includes some not so used solutions (memcachedb), some that are not , while leaving aside others that at least at the high level would correspond to the characteristics of others in the list (Riak, Membase)
  2. another reason for considering the initial choice a bit random is that while scaling is listed as one of the requirements, the only truly scalable in the list would be Project Voldemort. The recently added auto-sharding and replica sets would make MongoDB a candidate too, but a search on the MongoDB group would show that the solution is still young
  3. even if the set of requirements is clear, there’s no indication of what kind of evaluation and how was it performed. Without knowing what and how and it is impossible to consider the results as being relevant.
  4. as Janl was writing about benchmarks, most of the time you are doing it wrong. Creating good, trustworthy, useful, relevant benchmarks is very difficult
  5. .
  6. the matrix lists characteristics that are difficult to measure. And there are no comments on how the thumbs up were given. Examples: what is manageability and how was that measured? Same questions for stability and feature set.
  7. because most of it sounds speculative here are a couple of speculations:
    1. judging by the thumbs up MongoDB received for insertion/random reads for large data set, I can assume that data hasn’t overpassed the available memory. But on the other hand, Redis was dismissed and received less votes due to its “more” in-memory character
    2. Tokyo Cabinet and Redis project activity and community were ranked the same. When was the last release of Tokyo Cabinet?
  8. I’m leaving up to you to decide why the conclusion — “Although MongoDB is the solution for most NoSQL use cases”” is wrong.

Original title and link: Another NoSQL Comparison: Evaluation Guide (NoSQL databases © myNoSQL)


Release: Riak 0.11.0, Defaults to Bitcask storage

Basho team has ☞ announced the release of Riak 0.11.0 which features a couple enhancements and bug fixes. But more importantly the new Riak 0.11.0 is using in-house developed Bitcask storage so replacing the embedded InnoDB store and other previously available options.

As a side note, Chef cookbooks for Riak have also been ☞ updated and Basho also released their internal ☞ benchmark code.

Bitcask has been ☞ announced a while ago as a solution developed to address the following goals:

  • low latency per item read or written
  • high throughput, especially when writing an incoming stream of random items
  • ability to handle datasets much larger than RAM w/o degradation
  • crash friendliness, both in terms of fast recovery and not losing data
  • ease of backup and restore
  • a relatively simple, understandable (and thus supportable) code structure and data format
  • predictable behavior under heavy access load or large volume
  • a license that allowed for easy default use in Riak

Jeff Darcy has some ☞ very good things to say about Bitcask, so I’ve spent some time reading the ☞ technical paper (pdf). While not an expert in either Bitcask or BerkleyDB/Java I have found the top level goals and some of the implementation details quite similar, but I’m pretty sure there are some subtle differences as BerkleyDB is referred to in the paper (nb maybe it was just about the license).

Release: Riak 0.11.0, Defaults to Bitcask storage originally posted on the NoSQL blog: myNoSQL