releases: All content tagged as releases in NoSQL databases and polyglot persistence

Couchbase Single Server Important Improvements

  1. Couchbase Single Server is the CouchDB packaging offered by Couchbase. But I think this is the first time this product came out under this name. At least the first Couchbase Server release didn’t mention it.

  2. Back in December I was speculating that CouchDB could benefit from an internal cache. But the Couchbase team has found other places to improve performance:

    • IO compression for faster effective IO, reduced view generation time and reduced disk usage.
    • Asynchronous write optimizations.
    • New, higher performance and more configurable replicator

    All these improvements are explained in a separate post. Note that to measure these improvements, the team used a derived version of Basho’s benchmark, one of the few good NoSQL benchmarks.

  3. Mathias Meyer mentioned automatic compaction in his post-1.0 CouchDB roadmap. It is now available in Couchbase Single Server 2.0.

  4. Couchbase Single Server 2.0 adds experimental CoffeeScript support.

    The addition of CoffeeScript allows you to write your view definitions, validation functions, update handles and changes filters in CoffeeScript instead of JavaScript. This allows one to develop views sans-braces and semicolons. It’s just as fast, and much easier on the eyes.

    For me CoffeeScript is just a JavaScript DSL. I’m not sure I like generic DSLs, but I’m no JavaScript expert.

  5. It is great to see that the Couchbase people continue to listen to the community and don’t drive their decisions by business objectives only, although it might be the case that business objectives and community suggestions overlapped.

  6. Last but not least, most of these changes have already been contributed back to Apache CouchDB. This is a good sign that the Couchbase team will continue to support the original CouchDB project.

Original title and link: Couchbase Single Server Important Improvements (NoSQL database©myNoSQL)


RethinkDB Launches 1.0 Version With Memcached Compatibility Only

Just as I speculated, RethinkDB has finally launched the 1.0 version with Memcached compatibility only. Jason Kincaid (TechCrunch) writes:

RethinkDB has just launched its 1.0 release to the public, and it’s offering a product geared toward NoSQL installations — and it will work on SSDs, traditional drives, and cloud-based services like AWS. The startup has also moved away from MySQL and now fully supports Memcached.

But RethinkDB is not the first product providing a Memcached-compatible, disk-persistent storage engine. One year ago Membase was launched, promising not only a persistent Memcached-compatible solution, but also elastic scalability.

RethinkDB has also published a performance report (PDF) demonstrating RethinkDB speed compared to Membase and MySQL. But if I’m reading those numbers correctly, while RethinkDB leads the majority of query-per-second (QPS) benchmarks, MySQL is consistently showing better latency numbers (which is kind of weird). For a strong durability scenario, the benchmark shows MySQL delivering 2x QPS compared to RethinkDB.

Another interesting aspect of the RethinkDB 1.0 release is the licensing model, which I don’t fully get:

RethinkDB Basic is currently identical in feature-set to RethinkDB Premium and Enterprise. However, the paid versions of RethinkDB include phone and email support, access to all future updates, and volume licensing options.

Or, as spelled out in the TechCrunch post:

Akhmechet says that the free version will get security updates, but that it won’t necessarily receive new features in the future, whereas the premium version will.

Original title and link: RethinkDB Launches 1.0 Version With Memcached Compatibility Only (NoSQL databases © myNoSQL)

Apache CouchDB 1.1.0 Released: Native SSL, HTTP Range Requests

Robert Newson just announced a new version of Apache CouchDB, 1.1.0, featuring native SSL, HTTP range requests, and the other features and improvements listed below:

  • Native SSL support.
  • Added support for HTTP range requests for attachments.
  • Added built-in filters for _changes: _doc_ids and _design.
  • Added configuration option for TCP_NODELAY aka “Nagle”.
  • Allow wildcards in vhosts definitions.
  • More granular ETag support for views.
  • More flexible URL rewriter.
  • Added OS Process module to manage daemons outside of CouchDB.
  • Added HTTP Proxy handler for more scalable externals.
  • Added _replicator database to manage replications.
  • Multiple micro-optimizations when reading data.
  • Added CommonJS support to map functions.
  • Added stale=update_after query option that triggers a view update after returning a stale=ok response.
  • More explicit error messages when it’s not possible to access a file due to lack of permissions.
  • Added a “change password”-feature to Futon.
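Three of the items above are easy to picture as plain HTTP requests. The following is a minimal sketch that only builds the requests and does not send them. The host, the database name `articles`, and the helper function names are made up for illustration, and passing `doc_ids` as a JSON-encoded query parameter reflects my understanding of the `_doc_ids` filter rather than anything stated in the release notes.

```python
import json
from urllib.parse import urlencode, quote
from urllib.request import Request

BASE = "http://localhost:5984"  # assumed local CouchDB 1.1 instance
DB = "articles"                 # hypothetical database name


def stale_view_request(ddoc, view):
    """Query a view with stale=update_after: answer from the current
    index, then let the server refresh the view afterwards."""
    query = urlencode({"stale": "update_after"})
    return Request(f"{BASE}/{DB}/_design/{ddoc}/_view/{view}?{query}")


def changes_for_docs_request(doc_ids):
    """Restrict the _changes feed to a set of documents using the
    built-in _doc_ids filter (doc_ids passed as a JSON array)."""
    query = urlencode({"filter": "_doc_ids",
                       "doc_ids": json.dumps(list(doc_ids))})
    return Request(f"{BASE}/{DB}/_changes?{query}")


def attachment_range_request(doc_id, name, first_byte, last_byte):
    """Ask for part of an attachment with a standard HTTP Range header."""
    url = f"{BASE}/{DB}/{quote(doc_id, safe='')}/{quote(name, safe='')}"
    return Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})
```

Passing any of these `Request` objects to `urllib.request.urlopen` would perform the call against a running server; a range request that the server honours should come back with a 206 status.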

While all these sound interesting, many of the items listed in this user-suggested post-1.0 CouchDB roadmap haven’t made it in yet.

Original title and link: Apache CouchDB 1.1.0 Released: Native SSL, HTTP Range Requests (NoSQL databases © myNoSQL)

Cassandra 0.8 Featuring a Query Language and Distributed Counters

Cassandra 0.8.0 was announced today and it brings quite a few exciting new features:

  • Cassandra Query Language

    Eric Evans:

    How would you feel about a query language? Good, because now Cassandra is not only NoSQL, it’s MoSQL. 0.8 is the debut of CQL (Cassandra Query Language), an SQL-alike query language. We already have language drivers for Java (JDBC), Twisted (txcql), Python (DBAPI2), and Node.js, and more are on the way!

    The best places to learn about Cassandra Query Language (CQL) are the official doc[1] and Courtney Robinson’s slides below:

    There’s also a (low quality) video on skillsmatter website.

  • distributed counters

    Eric Evans again:

    Cassandra also has distributed counters now. With counters, you can count stuff, and counting stuff rocks.

  • intranode traffic encryption

A rolling restart will get your cluster up and running with Cassandra 0.8.0.

  1. A more readable form of the document can be found here.  

Original title and link: Cassandra 0.8 Featuring a Query Language and Distributed Counters (NoSQL databases © myNoSQL)

Riak and Riak Search 0.14.2: Patch-Level Releases

Basho released a minor update for both Riak and Riak Search. Release notes for Riak and Riak Search are available at the following links: Riak 0.14.2 and Riak Search 0.14.2.

Original title and link: Riak and Riak Search 0.14.2: Patch-Level Releases (NoSQL databases © myNoSQL)

Apache Hive 0.7.0: Security and Performance

Apache Hive 0.7.0, released at the end of March, comes with a long, impressive list of new features (notably authorization and authentication support) and improvements.

Original title and link: Apache Hive 0.7.0: Security and Performance (NoSQL databases © myNoSQL)

Couchbase: First Version End of July?

This is pure speculation on my part, based on the program of the first Couchbase developer conference.

Keep in mind that Couchbase’s first release was just a build of CouchDB including GeoCouch. And the end of July would be roughly 6 months after the CouchOne and Membase merger.

Original title and link: Couchbase: First Version End of July? (NoSQL databases © myNoSQL)

Introducing AODBM

What’s the point in writing a new dbm style DBMS when so many exist already? Surely there is nothing new that anyone can add to this long line of projects?

Having done exactly that, I’m going to dedicate the rest of this post to convincing you that I haven’t lost my sanity.

I became interested in databases about six months ago when I started studying databases that spanned more than one machine. I found that it seems to be the accepted norm that if one ACID compliant database server can’t handle the number of writes that you need it to then you’ll have to loosen your reliability guarantees.

This can manifest in a number of ways:

  • Amongst the most frequent is manually sharding your database. Using two or more databases is a surefire way to lose your atomicity and isolation guarantees for cross-database operations.

  • Forsaking durability is common. It is the founding principle of database systems like Redis.

  • Settling for eventual consistency is the latest craze. CouchDB is heading the wave.

    Of course, this isn’t always a bad thing. CouchDB, for instance, was designed for environments where partitioning is common and in these circumstances eventual consistency is a logical choice. ACID guarantees aren’t always a deal-breaker and when you can afford to give them up in order to gain performance, you definitely should.

However, I couldn’t find a database implementation that would allow me to scale without losing multi-record atomic updates, durability or immediate consistency. I set about designing my perfect DBMS.

Following several false starts, I settled upon a design. AODBM (Append Only Database Manager) is the storage backend for this future DBMS. However, AODBM has merit on its own! It is a simple, hackable, open-source implementation of an append-only B+Tree written in C. It uses MVCC. It has an easy to use API (for C and Python) and it is designed to be fast for heavy, highly-concurrent loads.

In practice, this means that if you find a need to update a key-value store from many threads and you need ACID compliance, then I have a solution for you.
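To make the append-only and MVCC ideas concrete, here is a minimal sketch in Python. It is not AODBM's code or API: the class and method names are invented, and it uses an unbalanced binary search tree with path copying instead of a B+Tree to stay short. What it does show is the core idea: nodes are only ever appended, every write produces a new root, and a reader holding an older version keeps seeing a consistent snapshot.

```python
from typing import List, Optional, Tuple

# A node is immutable: (key, value, left offset, right offset).
Node = Tuple[str, str, int, int]
NIL = -1  # offset meaning "no child"


class AppendOnlyStore:
    """Toy append-only key-value store with versioned reads (hypothetical)."""

    def __init__(self) -> None:
        self._log: List[Node] = []      # nodes are appended, never modified
        self._roots: List[int] = [NIL]  # root offset for each version

    def _append(self, node: Node) -> int:
        self._log.append(node)
        return len(self._log) - 1

    def _insert(self, offset: int, key: str, value: str) -> int:
        # Path copying: rewrite only the nodes between the root and the key.
        if offset == NIL:
            return self._append((key, value, NIL, NIL))
        k, v, left, right = self._log[offset]
        if key == k:
            return self._append((key, value, left, right))
        if key < k:
            return self._append((k, v, self._insert(left, key, value), right))
        return self._append((k, v, left, self._insert(right, key, value)))

    def set(self, key: str, value: str) -> int:
        """Write a key and return the new version number."""
        self._roots.append(self._insert(self._roots[-1], key, value))
        return len(self._roots) - 1

    def get(self, key: str, version: Optional[int] = None) -> Optional[str]:
        """Read a key from the latest version, or from an older snapshot."""
        offset = self._roots[-1 if version is None else version]
        while offset != NIL:
            k, v, left, right = self._log[offset]
            if key == k:
                return v
            offset = left if key < k else right
        return None


store = AppendOnlyStore()
v1 = store.set("a", "1")
store.set("b", "2")
store.set("a", "3")
print(store.get("a"), store.get("a", version=v1))
```

Because old nodes are never overwritten, readers need no locks against writers; the price is that the log grows with every update, which is why append-only stores usually need some form of compaction.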

AODBM is currently in (early) beta and I am pushing for the first release. Early performance figures look promising; on my five-year-old desktop I can do 8500 insertions per second on a single thread from Python. I have implemented get, set and delete; iteration is on its way. I’m hoping to release the first version by the end of this month.

I hope I’ve succeeded in persuading you that my sanity is still intact. Have fun.

Daniel Waterworth

The AODBM project source code is on GitHub and it has a page on SourceForge.

Original title and link: Introducing AODBM (NoSQL databases © myNoSQL)

Emil Eifrem about Neo4j 1.3 and the Neo4j GPL Community Edition

Last week, Neo Technology released the 1.3 version of their graph database Neo4j. The technical aspects of the release have been covered in this blog post. Briefly:

  • support for large data sets and optimizations at the storage level
  • improved web admin tool
  • API cleanup

But the most exciting aspect of Neo4j 1.3 is the availability of a GPL version of the graph database. Emil Eifrem has covered it here:

Today marks a new major milestone for Neo4j: we’re making the core graph database - Neo4j Community - available under the same proven open source license as MySQL, the GNU General Public License (GPL).

That means that in every scenario where you can use MySQL for free, you can now also use Neo4j Community for free.

I had the chance to talk to Emil and he has been kind enough to answer my questions.

Alex: It took Neo Technology almost 10 years to release Neo4j 1.0. Since then things seem to have moved faster and faster. What changed, leading to these fast-paced release cycles?

Emil: The main reason is that our community has just reached a critical mass. This means that the feedback loop is faster, feature requests are more frequent, bug fixes and patches are better. It’s a faster and more virtuous cycle. On top of that, our customer traction the past year has allowed us to grow the full time in-house development team.

Alex: How would you summarize the release of Neo4j 1.3?

Emil: By far the most important aspect of this release is the license change to the GPL for Neo4j Community. Secondly, I’d put the support for really large stores (100+ billion of primitives). And finally, I’d love to give a shout out to the new interactive graph visualization in the web UI.

Neo4j web admin

Alex: 3 products and 3 licenses. Moreover, the Neo4j Community edition comes with a GPLv3 license. As you know, I’ve always said that the graph database market is missing a more open license. So what made you change your mind about the licensing model?

Emil: The GPL is the best license for getting Neo4j in the hands of developers worldwide. It’s a proven model to get databases in the hands of developers while protecting an OEM revenue stream, so we figured why reinvent the wheel? The world deserves a graph database under the GPL.

Alex: Could you please clarify a bit the differences between the 3 products and their licensing models?

Emil: Sure, Neo4j Community is what most people will use. It’s a fully functional, robust and mature graph database. It’s available under the GPL like MySQL, which means that it can be used for free in all “end user” scenarios (for example to back a webapp). For OEM scenarios (i.e. it’s embedded in a product that ships to end users) then the enclosing product must be open source.

Neo4j Advanced adds monitoring and management and couples that with commercial support. It’s available under the AGPL or a commercial license.

Neo4j Enterprise adds high availability, i.e. the ability to automatically and transparently replicate the graph across many instances, and enterprise-grade 24/7 commercial support. It’s available under the AGPL or a commercial license.

Alex: In your post you are saying that “the graph database opportunity is at least as big as the MySQL opportunity”. Could you please expand on this?

Emil: Absolutely. First off, information is exploding in both volume and complexity and in many cases relational databases can’t keep up. For example, a lot of big installations have massive problems with low-latency queries due to joins.

Secondly, business requirements are changing. For example, we have high requirements on the freshness of information (“realtime”) where a retail store may want to get a coupon recommendation while the customer is still in the store, not 24 hours later from the big corporate data warehouse.

Some of the largest web properties in the world were hit early by these two forces, and this catalyzed NoSQL. Now ask yourself this: of these two trends (information volume / complexity and realtime business requirements), in which direction is the world moving? I think the answer is clear and over time, most database deployments in the world will face requirements similar to the high-end web properties of today. In order to deliver business value, IT departments must then be equally committed to SQL and NoSQL.

I think that, of the current NoSQL landscape, graph databases have the opportunity to solve the most problems, for most developers, in most situations. A graph database is incredibly horizontally applicable and it’s useful across a wide range of problem spaces. In a world where most applications make use of both SQL and NoSQL, graph databases have the opportunity to be as frequently used as MySQL is today.

That’s why I said that the graph database opportunity is at least as big as the MySQL opportunity.

Alex: Could you enumerate some not so common use cases for Neo4j?

Emil: No! If they’re not so common I probably don’t know them. But here are three relatively unknown use cases for graph databases:

  • Cloud Management: Neo4j is used today to back management and operations on some of the largest private cloud deployments in the world.
  • Network Management: In the telecom and datacom world, management of resources in networks has long been a huge problem. It lends itself incredibly well to graph modeling.
  • Master Data Management (MDM): This is a very enterprise-y use case, but relevant for all big companies in the world. MDM stores the master data for a big company and that data is usually very complex and dynamic and gives huge join-problems if you put it in a relational database. That kind of dataset is a great fit for a graph database.

Alex: I confess that I was expecting to see a more open license available in the graph database market. So I’m happy to see this happening. Also, I’m convinced that it is a very smart move for both the future of graph databases and your company. Thanks a lot, Emil.

Original title and link: Emil Eifrem about Neo4j 1.3 and the Neo4j GPL Community Edition (NoSQL databases © myNoSQL)

NoSQL Databases Updates: MongoDB and Redis

Two minor updates for MongoDB (1.8.1) and Redis (2.2.4). From the official announcement for MongoDB 1.8.1:

  • sharding migrate fix when moving larger chunks
  • durability fix with background indexing
  • fixed mongos concurrency issue with many incoming connections

Redis 2.2.4 introduces a new command, OBJECT, used to inspect the internals of the Redis objects associated with keys. It is documented here. The release also contains a couple of bug fixes. Complete release notes can be found here.
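To give an idea of what calling OBJECT looks like on the wire, here is a minimal sketch that encodes its three subcommands (REFCOUNT, ENCODING and IDLETIME) in the Redis unified request protocol. The helper names and the key `mykey` are made up for illustration, and nothing is sent to a server.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis multi-bulk request:
    an argument count, then each argument prefixed by its byte length."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def object_commands(key: str) -> dict:
    """Build the three OBJECT inspections for a single key."""
    return {
        sub: encode_command("OBJECT", sub, key)
        for sub in ("REFCOUNT", "ENCODING", "IDLETIME")
    }


for name, payload in object_commands("mykey").items():
    print(name, payload)
```

Writing one of these payloads to a socket connected to a Redis server would return the reference count, the internal encoding, or the idle time of the value stored at the key.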

Original title and link: NoSQL Databases Updates: MongoDB and Redis (NoSQL databases © myNoSQL)

Couchbase Server: What is This First Release?

You’ve probably read everywhere about the Couchbase Server first release.

But what is this first release of Couchbase Server?

J. Chris Anderson[1] has been kind enough to answer this question:

  1. Today’s Couchbase server release is more or less just a clean build of the current Apache CouchDB including GeoCouch.

    In the future we may release code before it makes it through the full Apache release process, but before we do that we want to get our Q/A infrastructure up and running.

    The current version does not yet include any of the Membase elastic features. Nor does it yet support the memcached protocol.

  2. In the near future there will be a release offering CouchDB Map Reduce views with the Membase speed and scalability, and memcached API.

    This is a huge value for existing Membase and memcached users, as now they will be able to query what they have stored, not just retrieve it by key.

  3. In the long run we will have a combined product that supports the CouchDB HTTP API as well as Membase’s memcapable API.

    Currently Membase has a strong set of clustering tools, but it is a plain key value store. By adding CouchDB query-ability to it, Membase users see value. A subset of CouchDB users will see value in our initial release (worry free scalability), but it won’t be until the integration work is done later this year that we offer elastic support to the full HTTP CouchDB API.

About Couchbase Server

  1. J. Chris Anderson: CouchDB committer, CouchOne founder, Chief architect of mobile at Couchbase, @jchris  

  2. James Phillips: Co-founder and SVP Products for Couchbase  

Original title and link: Couchbase Server: What is This First Release? (NoSQL databases © myNoSQL)

New OrientDB Release: new memory model, new graph api, much more stable

On its way towards the 1.0 version, OrientDB announced a new release featuring:

  • Brand new memory model with level-1 and level-2 caches (Issue #242)
  • SQL prepared statement (Issue #49)
  • SQL Projections with the support of links (Issue #15)
  • Graphical editor for documents in OrientDB Studio app (Issue #217)
  • Graph representation in OrientDB Studio app
  • Support for JPA annotation by the Object Database interface (Issue #102)
  • Smart Console under bash: history, auto-completion, etc. (Issue #228)
  • Operations to work with GEO-spatial points (Issue #182)
  • @rid support in SQL UPDATE statement (Issue #72)
  • Range queries against Indexes (Issue #231)
  • 100% support of TinkerPop Blueprints 0.5

Regrettably the same comment thread shows that there are still some problems handling large amounts of data in OrientDB.

Original title and link: New OrientDB Release: new memory model, new graph api, much more stable (NoSQL databases © myNoSQL)