


releases: All content tagged as releases in NoSQL databases and polyglot persistence

Redis 2.2: An Optimization Release

Salvatore Sanfilippo summarizes the new Redis release in the Hacker News thread:

2.2 was exactly an “optimization” release, to bring what we had at a better level of maturity.

Basically we’ll try hard not to add things to the API in the next releases, but just to open to new use cases changing the “backend” part, with cluster support for large fault tolerant deployment, and with diskstore for “bigdata”.

However there are a few important new things in Redis 2.2 from the point of view of the features, I think the main ones are:

  • non blocking replication, so that now slaves are able to serve data even when trying to resync with the master.
  • Check and Set with WATCH.
  • Write operations against keys with an expire set.
  • LRU eviction of keys in ‘maxmemory’ mode.
  • Support for SETBIT/GETBIT/SETRANGE/GETRANGE, basically this turns the string data type into a random access array.
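To illustrate the check-and-set pattern that WATCH enables, here is a minimal in-memory sketch in Python. It only models the semantics: a transaction queued after WATCH is aborted if a watched key was written in the meantime, and the client retries. `ToyStore`, `execute`, and `incr_with_cas` are hypothetical names for illustration, not part of Redis or any client library.

```python
# A toy, single-process model of Redis-style optimistic locking
# (WATCH / MULTI / EXEC). All names here are hypothetical; this is
# not a Redis client API.


class ToyStore:
    def __init__(self):
        self.data = {}
        self.versions = {}  # key -> how many times it has been written

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def watch(self, *keys):
        # Snapshot the version of every watched key (like WATCH).
        return {k: self.versions.get(k, 0) for k in keys}

    def execute(self, watched, commands):
        # Like EXEC: abort (return None) if any watched key changed since
        # WATCH, otherwise run the queued commands and return their results.
        if any(self.versions.get(k, 0) != v for k, v in watched.items()):
            return None
        return [command(self) for command in commands]


def incr_with_cas(store, key, retries=10, interfere=None):
    """Check-and-set increment: WATCH, read, compute, EXEC, retry on conflict."""
    for attempt in range(retries):
        watched = store.watch(key)
        current = int(store.get(key) or 0)
        if interfere is not None:
            interfere(store, attempt)  # simulates another client writing meanwhile
        new_value = current + 1
        if store.execute(watched, [lambda s: s.set(key, new_value)]) is not None:
            return new_value
    raise RuntimeError("gave up after too many conflicting writes")


store = ToyStore()
store.set("counter", 5)


def other_client(s, attempt):
    if attempt == 0:
        s.set("counter", 10)  # a concurrent write during the first attempt only


print(incr_with_cas(store, "counter", interfere=other_client))  # 11
```

The first attempt is aborted because another client wrote `counter` between WATCH and EXEC; the retry reads the new value and succeeds.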

The more verbose release notes are available here.

Redis 2.2 is a drop-in replacement for the previous 2.0 version, though there are some changes in the return values for edge cases.
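The SETBIT/GETBIT item from Salvatore's list is easier to picture with the addressing spelled out. The sketch below models, in plain Python, how a bit offset maps onto a string value: offset 0 is the most significant bit of the first byte, the string is zero-padded when a write goes past its end, SETBIT replies with the previous bit, and reads past the end return 0. `BitString` is a hypothetical helper for illustration, not a Redis client API.

```python
class BitString:
    """Models SETBIT/GETBIT on a Redis string value (hypothetical helper)."""

    def __init__(self, value=b""):
        self.buf = bytearray(value)

    def setbit(self, offset, bit):
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.buf):
            # Grow the string with zero bytes so the offset becomes addressable.
            self.buf.extend(b"\x00" * (byte_index + 1 - len(self.buf)))
        mask = 0x80 >> bit_index  # bit 0 is the most significant bit of byte 0
        old = 1 if self.buf[byte_index] & mask else 0
        if bit:
            self.buf[byte_index] |= mask
        else:
            self.buf[byte_index] &= ~mask & 0xFF
        return old  # SETBIT replies with the previous value of the bit

    def getbit(self, offset):
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.buf):
            return 0  # bits past the end of the string read as 0
        return 1 if self.buf[byte_index] & (0x80 >> bit_index) else 0


seen = BitString()
seen.setbit(7, 1)  # e.g. user id 7 was active today
seen.setbit(9, 1)  # the string grows to two bytes
print(seen.buf.hex())  # 0140
print(seen.getbit(9), seen.getbit(1000))  # 1 0
```

Using user ids as offsets is what makes this handy for compact per-day activity bitmaps.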

Update: Minutes ago, Salvatore has announced that there will be a new release today to fix an urgent bug in SPOP.

Update: Redis 2.2.1 is out. You can get it from here.

Original title and link: Redis 2.2: An Optimization Release (NoSQL databases © myNoSQL)

Cassandra Releases: Two Minor Upgrades

Cassandra has pushed out two new minor releases: the first, 0.7.1, features a couple of performance improvements and new features, and the second, 0.7.2, fixes a critical bug in the 0.7.1 release.

Cassandra 0.7.1 Performance Improvements

  • Disk writes and sequential scans avoid polluting page cache (requires JNA to be enabled)
  • Cassandra performs writes efficiently across datacenters by sending a single copy of the mutation and having the recipient forward that to other replicas in its datacenter.
  • Improved network buffering
  • Reduced lock contention on memtable flush
  • Optimized supercolumn deserialization
  • Zero-copy reads from mmapped sstable files
  • Explicitly set higher JVM new generation size
  • Reduced i/o contention during saving of caches
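The datacenter-aware write path is easiest to see by counting the copies of a mutation that cross the WAN. The sketch below is a back-of-the-envelope model in Python, not Cassandra code: it counts one message per remote replica without forwarding and one per remote datacenter with forwarding, and it ignores acknowledgements, hints, and consistency levels. The function name is hypothetical.

```python
from collections import Counter


def cross_dc_messages(coordinator_dc, replica_dcs, forward=True):
    """Count mutation copies sent from the coordinator's datacenter to remote ones.

    Hypothetical helper. replica_dcs lists the datacenter of each replica,
    e.g. ["DC1", "DC2", "DC2"]. Without forwarding, every remote replica gets
    its own copy over the WAN. With forwarding, a single copy goes to one node
    per remote datacenter, which relays it to the other local replicas.
    """
    remote = Counter(dc for dc in replica_dcs if dc != coordinator_dc)
    if forward:
        return len(remote)        # one WAN message per remote datacenter
    return sum(remote.values())   # one WAN message per remote replica


replicas = ["DC1", "DC1", "DC1", "DC2", "DC2", "DC2"]
print(cross_dc_messages("DC1", replicas, forward=False))  # 3
print(cross_dc_messages("DC1", replicas, forward=True))   # 1
```

The saving grows with the replication factor per datacenter, since the relaying happens over the cheaper local network.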

Cassandra 0.7.1 New Features

  • added flush_largest_memtables_at and reduce_cache_sizes_at options to cassandra.yaml as an escape valve for memory pressure
  • added option to specify -Dcassandra.join_ring=false on startup to allow “warm spare” nodes or performing JMX maintenance before joining the ring
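The two new cassandra.yaml options act as emergency valves tied to heap usage. As I understand them, both are fractions of the maximum heap that are checked after a full garbage collection; the sketch below models only that decision in plain Python. The default values used here (0.75 and 0.85), the function name, and the simplification that both checks run on every collection are my assumptions, so check the comments in the cassandra.yaml shipped with 0.7.1 for the exact behavior.

```python
def pressure_valve_actions(heap_used_mb, heap_max_mb,
                           flush_largest_memtables_at=0.75,  # assumed value
                           reduce_cache_sizes_at=0.85):      # assumed value
    """Which emergency actions a node would take after a full GC (simplified).

    Hypothetical helper: thresholds are fractions of the maximum heap.
    """
    usage = heap_used_mb / heap_max_mb
    actions = []
    if usage > reduce_cache_sizes_at:
        actions.append("reduce cache sizes")
    if usage > flush_largest_memtables_at:
        actions.append("flush largest memtables")
    return actions


for used in (2048, 3300, 3700):
    print(used, pressure_valve_actions(used, heap_max_mb=4096))
```

With a 4 GB heap, 2 GB in use triggers nothing, about 80% triggers a memtable flush, and about 90% also shrinks the caches.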

The complete list of changes can be found here.

Original title and link: Cassandra Releases: Two Minor Upgrades (NoSQL databases © myNoSQL)

InfiniteGraph 1.1 Released with New Indexing Options

A new version of InfiniteGraph, the graph database from Objectivity, has been released with a new indexing solution offering improved performance for indexing, data imports, and lookups.

InfiniteGraph’s graph processing strengths are well suited to many applications, including those in intelligence, internet systems and services around social media, location based networking and personalization, discovering networks of people that have business, influence or other value, analysis of financial transactions to detect and prevent fraud, and in adding new capabilities to enterprise business intelligence (BI) systems.

For the next releases, it sounds like a lot of work is already scheduled, with InfiniteGraph’s team planning to focus on:

  • improving data import
  • parallel ingest capabilities leveraging the distributed processing strengths of InfiniteGraph
  • integrating with the open source Blueprints project
  • faster graph processing
  • range querying and geo-hashed indexes
  • options to relax InfiniteGraph’s fully ACID compliant consistency model
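Geo-hashed indexes usually rely on the standard geohash encoding: longitude and latitude bits are produced by repeated bisection, interleaved (starting with longitude), and packed five at a time into a base32 string, so nearby points tend to share a prefix and a range query becomes a prefix scan. The announcement doesn't describe InfiniteGraph's actual index design; the Python below is just a sketch of the generic geohash technique, with hypothetical names.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    even = True  # geohash interleaves bits starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the interval
            rng[0] = mid
        else:
            bits <<= 1              # lower half of the interval
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# A tiny prefix-based "range query" over an in-memory index (hypothetical data).
places = {"a": (42.6, -5.6), "b": (42.61, -5.59), "paris": (48.8566, 2.3522)}
index = {name: geohash_encode(lat, lon, 7) for name, (lat, lon) in places.items()}
prefix = geohash_encode(42.6, -5.6, 5)
print(prefix)  # ezs42
print(sorted(n for n, h in index.items() if h.startswith(prefix)))  # ['a', 'b']
```

One caveat of pure prefix matching: points close to a cell border can land in a neighboring cell, so real geo indexes typically also scan the adjacent cells.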

Things in the graph database space are getting more exciting by the day. Unfortunately, compared to the other NoSQL database categories, the top graph databases are all commercial products, and I think this can be noticed when looking at adoption rates.

Original title and link: InfiniteGraph 1.1 Released with New Indexing Options (NoSQL databases © myNoSQL)


CouchDB 1.0.2: 3rd is Lucky

You’d assume that the more mature a project gets, the less interesting a point release would be. But this doesn’t seem to apply to NoSQL databases, where each new release brings new exciting features. From this month alone: Neo4j 1.2, Cassandra 0.7, HBase 0.90.0, and the upcoming MongoDB 1.8.

After two attempts to announce CouchDB 1.0.2 back in December, both stopped at the very last moment by issues that the community considered mandatory to fix before the release, the CouchDB team is today finally announcing the availability of CouchDB 1.0.2.

You can find the list of changes in CouchDB 1.0.2 here:

  • Significantly higher read and write throughput against database and view index files.
  • Reduce lengthy stack traces.
  • Allow reduce=false parameter in map-only views.
  • Fix databases forgetting their validation function after compaction.
  • Fix occasional timeout errors after successfully compacting large databases.
  • Fix occasional error when writing to a database that has just been compacted.
  • Fix occasional timeout errors on systems with slow or heavily loaded IO.
  • Fix for OOME when compactions include documents with many conflicts.
  • Fix for missing attachment compression when MIME types included parameters.
  • Preserve purge metadata during compaction to avoid spurious view rebuilds.
  • Fix spurious conflicts introduced when uploading an attachment after a doc has been in a conflict. See COUCHDB-902 for details.
  • Fix for frequently edited documents in multi-master deployments being duplicated in _changes and _all_docs. See COUCHDB-968 for details on how to repair.
  • Fix authenticated replication (with HTTP basic auth) of design documents with attachments.
  • Various fixes to make replication more resilient for edge-cases.
  • Don’t trigger view updates when requesting _design/doc/_info.
  • Documents are now sealed before being passed to map functions.
  • Force view compaction failure when duplicated document data exists. When this error is seen in the logs users should rebuild their views from scratch to fix the issue. See COUCHDB-999 for details.

Third attempt is always lucky! Congrats!

Original title and link: CouchDB 1.0.2: 3rd is Lucky (NoSQL databases © myNoSQL)

HBase 0.90.0 Released: Over 1000 Fixes and Improvements

As far as I know, this is the first major HBase release since HBase became a top-level Apache project (and it uses a new versioning scheme too). Until now I thought that Hadoop 0.21.0 had the longest list of fixes, improvements, and new features, but I guess HBase 0.90.0 tops that with over 1000 tracked tickets.

I bet there are quite a few exciting things among these 1000+ tickets, but for now I’d suggest taking a look at the slides from HUG11.

From the slides, a quick overview of what’s new in HBase 0.90.0:

  • durability and stability
    • HDFS appends + WAL improvements
  • master rewrite
    • cleanup of master, move region transitions to ZK
  • inter-cluster/inter-DB replication
  • Bloom filters
  • bulk loading improvements
  • performance improvements
  • peripheral improvements: REST/Stargate, Shell, Avro
  • HBaseFSCK
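Bloom filters help HBase reads by letting a region server skip store files that cannot contain the requested row: a negative answer from the filter is definitive, while a positive one may be a false positive. The Python below is a generic Bloom filter sketch to show the technique; it is not HBase's implementation, and the bit-array size, number of hashes, and salted SHA-256 hashing are illustrative choices.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (not HBase code); sizes are illustrative."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent": the store file can be skipped.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bloom = BloomFilter()
for i in range(100):
    bloom.add(f"row-{i}".encode())  # row keys written to one store file
print(bloom.might_contain(b"row-42"))  # True
```

With 100 keys in 1024 bits and 3 hashes, the expected false-positive rate is roughly 1.6%, which is the price paid for never having a false negative.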

Note: HBase coprocessors are scheduled for 0.92

On the negative side, HBase 0.90.0 runs neither on Hadoop 0.21.0 nor on Hadoop TRUNK; the only compatible Hadoop version is 0.20.x. The release notes for the HBase 0.90 release candidates mention that HBase will lose data unless it runs on a Hadoop HDFS 0.20.x that has a durable sync. There is a Hadoop branch containing the necessary changes, but you’ll have to build it yourself. Update: see Nicolas’ comment below about Hadoop 0.21 being just a development version.

Congrats to the HBase team for their first release as a top-level Apache project!

I didn’t know about The Apache HBase Book. But I’m eagerly awaiting my copy of Lars George’s HBase: The Definitive Guide.

Update: the official announcement went out

@saintstack and @squarecog

Original title and link: HBase 0.90.0 Released: Over 1000 Fixes and Improvements (NoSQL databases © myNoSQL)

Cassandra 0.7: Large Row Support

Something I missed when covering what’s new in Cassandra 0.7:

The other big new feature is large row support for up to two billion columns per row. In previous Cassandra releases, there was a limit where a single column value could not be larger than 2 GB.

But the number of columns and the size of the column data are two quite different limits…
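A quick back-of-the-envelope calculation shows why the two limits shouldn't be confused. The sketch below assumes a hypothetical per-column layout (8-byte names, 8-byte values, and a made-up 15 bytes of per-column overhead); Cassandra's real on-disk overhead may differ, so only the order of magnitude matters here.

```python
GiB = 1024 ** 3
OLD_LIMIT_BYTES = 2 * GiB    # the 2 GB figure from the quote
MAX_COLUMNS = 2_000_000_000  # "up to two billion columns per row"


def row_size_bytes(num_columns, name_bytes, value_bytes, overhead_bytes=15):
    """Rough row size; overhead_bytes per column is a hypothetical figure."""
    return num_columns * (name_bytes + value_bytes + overhead_bytes)


per_column = row_size_bytes(1, name_bytes=8, value_bytes=8)  # 31 bytes
print(OLD_LIMIT_BYTES // per_column)  # 69273666 columns fit in 2 GB
print(round(row_size_bytes(MAX_COLUMNS, 8, 8) / GiB, 1))  # 57.7 GiB for a full row
```

Under these assumptions, a 2 GB budget holds only about 69 million small columns, while a row with two billion of them would weigh in at tens of gigabytes.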

Original title and link: Cassandra 0.7: Large Row Support (NoSQL databases © myNoSQL)

Neo4j 1.2: What’s New

Neo4j 1.2 was released on December 30th. Now that’s a very weird time to make a major release. But according to the Neo4j roadmap and milestone reports, Neo4j 1.2 brings quite a few major changes and improvements.

The first major shift in Neo4j’s direction is that it is now available as a RESTful server. Even though a Neo4j REST API existed before, this shift from promoting an embedded graph database to a full-blown RESTful graph database was first announced with the first 1.2 milestone. As someone who suggested this change, I cheer the decision.

The second major feature is the high availability Neo4j cluster. Most of the existing graph databases started their life as embedded storage solutions. Then a few of them evolved into server-based storage solutions. But with that also came questions related to availability and scalability.

Starting with this version, Neo4j offers the option of setting up a high availability cluster, and this is a major step forward for graph databases. This is still a first version though: writes are slower, the cluster is not elastic, and there are limitations at the distributed transaction layer.

Scaling graph databases remains a very complicated problem to be solved. Darren Wood’s[1] presentation covers some of the challenges of distributed graph databases.

Neo4j 1.2 features a couple more goodies, like a smaller footprint kernel and an automatic JMX-enabled monitoring and management component.

The original announcement covers more details about this major new Neo4j version. The only missing piece from this release and announcement is a document describing the Neo4j API changes. But that should not stop you from trying it out.

  1. Darren Wood: Architect at InfiniteGraph/Objectivity  

Original title and link: Neo4j 1.2: What’s New (NoSQL databases © myNoSQL)

Apache Pig 0.8: What is New

Dmitriy Ryaboy[1] has a guest post on the Cloudera blog covering the new features in Apache Pig 0.8:


  • Support for user defined functions (UDF) in scripting languages
  • Generic UDFs: allow invocation of static Java methods
  • PigUnit: as the name suggests, a testing tool for Pig scripts
  • PigStats: once again the name should give you a hint of what it does: better visibility into Pig jobs through a series of stats, XML-based metadata injected into Map-Reduce jobs, and listeners for the Pig process
  • Scalar values: simplifying access to single-row relations
  • possibility to start a monitoring thread for long running executions
  • HBaseStorage: works with HBase 0.20 releases only
  • custom Map-Reduce jobs can be included in the data flow
  • automatic merge of small files
  • custom partitioners
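The “automatic merge of small files” item addresses the classic Hadoop problem of one map task per tiny input file. The general idea is to pack several small input splits into one combined split, up to a size cap. Below is a minimal greedy sketch of that idea in Python; it is my illustration of the technique, not Pig’s implementation, and the function name and the 64 MB cap in the example are hypothetical.

```python
def combine_splits(split_sizes, max_combined_size):
    """Greedily pack consecutive small input splits into combined splits.

    Hypothetical helper. Returns lists of split indexes. Each combined split
    stays at or below max_combined_size, unless a single split is already
    larger than the cap, in which case it is kept on its own.
    """
    combined, current, current_size = [], [], 0
    for index, size in enumerate(split_sizes):
        if current and current_size + size > max_combined_size:
            combined.append(current)  # close the group before it overflows
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        combined.append(current)
    return combined


sizes_mb = [10, 20, 5, 100, 30, 30]
print(combine_splits(sizes_mb, max_combined_size=64))  # [[0, 1, 2], [3], [4, 5]]
```

Six input files end up as three map tasks here; the oversized 100 MB file is left alone rather than being split further.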

The Pig 0.8 release includes a large number of bug fixes and optimizations, but at the core it is a feature release. It’s been in the works for almost a full year and the amount of time spent on 0.8 really shows.

You can also check out Dmitriy’s presentations about the NoSQL ecosystem at Twitter: “Twitter, Pig, and HBase” and “HBase and Pig: The Hadoop ecosystem at Twitter”.

  1. Dmitriy Ryaboy: Twitter engineer, @squarecog  

Original title and link: Apache Pig 0.8: What is New (NoSQL databases © myNoSQL)


Cassandra 0.7 Released, Lots of Goodies in the Box

The much-awaited new version of Cassandra was quietly released a couple of days ago. As mentioned in Cassandra 2010 in review, this version brings a lot of interesting new features:

  • memory efficient compactions
  • online schema changes
  • secondary indexes
  • improved performance for reads
  • upgraded Thrift
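Secondary indexes are the feature that changes data modeling the most: they let you ask for rows by the value of a column instead of only by row key. Conceptually, the index is an inverted map from (column, value) to row keys that must be kept in sync on every write. The toy Python class below illustrates only that idea; it is not Cassandra's implementation or API, and all names are hypothetical.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Toy column family with a secondary index on selected column names."""

    def __init__(self, indexed_columns):
        self.rows = {}                      # row key -> {column name: value}
        self.indexed = set(indexed_columns)
        self.index = defaultdict(set)       # (column, value) -> set of row keys

    def insert(self, row_key, columns):
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            if name in self.indexed and name in row:
                # Drop the stale index entry before overwriting the value.
                self.index[(name, row[name])].discard(row_key)
            row[name] = value
            if name in self.indexed:
                self.index[(name, value)].add(row_key)

    def lookup(self, column, value):
        # Equality query answered from the index instead of a full scan.
        return sorted(self.index.get((column, value), ()))


users = IndexedColumnFamily(["state"])
users.insert("alice", {"state": "TX", "age": 30})
users.insert("bob", {"state": "CA"})
users.insert("carol", {"state": "TX"})
print(users.lookup("state", "TX"))  # ['alice', 'carol']
users.insert("alice", {"state": "CA"})
print(users.lookup("state", "TX"))  # ['carol']
```

The update path is the interesting part: changing an indexed value has to remove the old index entry, otherwise stale rows keep showing up in queries.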

The list of updates is too long, so for a start I recommend Gary Dusbabek’s nice post summarizing the most important new features.

Then on Riptano’s blog, there’s a series of articles getting into the details of these features.

I guess the only major feature that was talked about and didn’t make it into this release is distributed counters, but they’re already in the Cassandra trunk, so users will get them sooner rather than later.

Now you can head to the download page and start upgrading your Cassandra cluster.

Original title and link: Cassandra 0.7 Released, Lots of Goodies in the Box (NoSQL databases © myNoSQL)

Riak 0.14 Released with MapReduce Enhancements, Cluster and Node Debugging

I’ve been waiting for the first NoSQL release of the year before publishing my first post of 2011. So thanks to Basho’s announcement of Riak 0.14, myNoSQL is officially back.

Riak 0.14 features the Map/Reduce improvements I’ve already written about[1], along with quite a few other interesting features and improvements:

  • Cluster and node debugging: The ability to monitor and debug a running Riak cluster received some substantial enhancements in 0.14.
  • Windowed merges for Bitcask: Bitcask performs periodic merges over all non-active files to compact the space being occupied by old versions of stored data. In certain situations this can cause some memory and CPU spikes on the Riak node where the merge is taking place. To that end, we’ve added the ability to specify when Bitcask will perform merges.
  • Support for HTTPS and multiple HTTP IPs
  • REST API for listing buckets
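The windowed merges item boils down to a time check before Bitcask starts compacting. The Python sketch below is modeled loosely on a merge window that is either always, never, or a start/end hour pair; the exact semantics shown here (hours 0 to 23, inclusive bounds, windows that may wrap past midnight) and the function name are my assumptions, so check Basho's release notes and Bitcask configuration docs for the real setting.

```python
def merge_allowed(hour, merge_window):
    """Decide whether a Bitcask-style merge may run at the given hour (0-23).

    Hypothetical helper. merge_window is "always", "never", or a
    (start, end) tuple of hours, inclusive on both ends; a window such as
    (22, 4) wraps past midnight.
    """
    if merge_window == "always":
        return True
    if merge_window == "never":
        return False
    start, end = merge_window
    if start <= end:
        return start <= hour <= end
    return hour >= start or hour <= end  # window wraps around midnight


for h in (3, 12, 23):
    print(h, merge_allowed(h, (22, 4)))  # only the night hours are allowed
```

The wrap-around case is the one worth getting right, since the quietest hours for a node usually straddle midnight.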

Complete release notes available here.

  1. As a side note, Kevin Smith, the Basho engineer who first presented these enhancements, has moved to work for Heroku.

Original title and link: Riak 0.14 Released with MapReduce Enhancements, Cluster and Node Debugging (NoSQL databases © myNoSQL)


Now official: Spring Data Riak Support Reaches Milestone 1

Shortly after announcing Redis support in Spring Data, and just days after Grails got support for Riak, Spring Data is announcing the first milestone of its Riak support. Once again, Costin Leau:

The features in 1.0.0 M1 include:

  • Generified RiakTemplate for exception translation, serialization, and data access
  • Built-in HTTP REST client based on Spring 3.0 RestTemplate
  • and Spring IO resource abstractions for reading/writing streams
  • subclass that represents a Riak resource

Looks like the Spring Framework NoSQL train is now in full motion.

Original title and link: Now official: Spring Data Riak Support Reaches Milestone 1 (NoSQL databases © myNoSQL)


Cascading 1.2 Released

A bit late with the post, but here is Cascading 1.2:

This release features many performance and usability enhancements while remaining backwards compatible with 1.0 and 1.1. Specifically:

  • Performance optimizations during grouping (StreamComparator)
  • Composable map-side partial aggregations (AggregateBy)
  • Native Riffle support for non-Cascading (or nested iterative Cascading) processes (ProcessFlow and Riffle)
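AggregateBy’s map-side partial aggregation is essentially a combiner: each mapper keeps a bounded cache of running aggregates per key, emits partial results when the cache fills, and the reducer merges the partials. “Composable” means several aggregates (here a sum and a count) can ride along in the same pass. The Python sketch below illustrates the technique only; it is not Cascading’s API, and the `threshold` parameter, the flush-everything policy (a real implementation might evict least recently used entries instead), and the function names are assumptions.

```python
def map_side_partials(records, threshold=2):
    """Map side: keep (sum, count) per key in a bounded cache, flush partials.

    Hypothetical helper: when the cache holds more than `threshold` keys,
    all cached partials are emitted downstream and the cache is cleared.
    """
    cache, emitted = {}, []
    for key, value in records:
        total, count = cache.get(key, (0, 0))
        cache[key] = (total + value, count + 1)  # two aggregates, one pass
        if len(cache) > threshold:
            emitted.extend(cache.items())
            cache.clear()
    emitted.extend(cache.items())
    return emitted


def reduce_partials(partials):
    """Reduce side: merge partial (sum, count) pairs for the same key."""
    merged = {}
    for key, (total, count) in partials:
        t, c = merged.get(key, (0, 0))
        merged[key] = (t + total, c + count)
    return merged


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]
partials = map_side_partials(records, threshold=2)
print(len(partials))              # 4 partial records instead of 5 raw ones
print(reduce_partials(partials))  # {'a': (9, 3), 'b': (2, 1), 'c': (4, 1)}
```

Because key `a` shows up in two partials, the reducer still has to merge them; the map-side cache only cuts down how much data gets shuffled, it doesn’t replace the reduce step.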

Cascading is part of the extensive Hadoop tooling ecosystem.

Original title and link: Cascading 1.2 Released (NoSQL databases © myNoSQL)