


NoSQL releases: All content tagged as NoSQL releases in NoSQL databases and polyglot persistence

Latest NoSQL Releases: HBase 0.92, DataStax Community Server, Hortonworks Data Platform, SolrCloud

Just a quick roundup of the latest releases and announcements.

Hortonworks Data Platform (HDP) version 2

HDP v2 will include:

  • NextGen MapReduce architecture
  • HDFS NameNode HA
  • HDFS Federation
  • up-to-date HCatalog, HBase, Hive, Pig

According to the announcement:

In order to avoid confusion, let me explain the two versions of HDP:

  • HDP v1 is based upon Apache Hadoop 1.0 (which comes from the 0.20.205 branch). It is the most stable, production-ready version of Hadoop and is currently found in many large enterprise deployments. HDP v1 is currently available as a private technology preview. A public technology preview will be made available later this quarter.
  • HDP v2 is based upon Apache Hadoop 0.23, which includes the next generation advancements mentioned above. It’s an important step forward in terms of scalability, performance, high availability and data integrity. A technology preview will also be made publicly available later in Q1.

SolrCloud Completes Phase 2

Mark Miller on the completion of Phase 2:

The second phase of SolrCloud has been in full swing for a couple of months now and it looks like we are going to be able to commit this work to trunk very soon! In Phase1 we built on top of Solr’s distributed search capabilities and added cluster state, central config, and built-in read side fault tolerance. Phase 2 is even more ambitious and focuses on the write side. We are talking full-blown fault tolerance for reads and writes, near real-time support, real-time GET, true single node durability, optimistic locking, cluster elasticity, improvements to the Phase 1 features, and more.

Not there yet, but it’s coming.

DataStax Community Server 1.0.7

A new release of DataStax’s distribution of Cassandra incorporating Cassandra 1.0.7

HBase 0.92

Don’t let the version number trick you. This is an important release for HBase featuring:

  • coprocessors
  • security
  • new (self-migrating) file format
  • AWS improvements: EBS support, building a HA cluster

The list of new features, improvements, and bug fixes in HBase 0.92 is impressive. But the highlight of this release is, in my opinion, HBase coprocessors (Jira entry HBASE-2000).

I’m leaving you with Andrew Purtell’s slides about HBase Coprocessors:

Couchbase Server 1.8 Released, Rebranding and Some Improvements in Cluster Rebalancing

Couchbase Server 1.8 replaces Membase Server 1.7 as our “flagship” database offering. In addition to the obvious rebranding, we’ve made substantial improvements in the cluster rebalancing process and fixed a number of nagging issues in 1.7.

In case you feel lost with which Couchbase products are which, read my 5 bullet points explanation.

Original title and link: Couchbase Server 1.8 Released, Rebranding and Some Improvements in Cluster Rebalancing (NoSQL database©myNoSQL)


Bug Fix Release Riak 1.0.3 Available for Download

No mentions of any critical bugs in the announcement, but it is almost always a good idea to stay up to date.


DataFu: Open Source Apache Pig UDFs by LinkedIn

Here’s a taste of what you can do with DataFu:

  • Run PageRank on a large number of independent graphs.
  • Perform set operations such as intersect and union.
  • Compute the haversine distance between two points on the globe.
  • Create an assertion on input data which will cause the script to fail if the condition is not met.
  • Perform various operations on bags such as append a tuple, prepend a tuple, concatenate bags, generate unordered pairs, etc.
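
To give a feel for one of the items above, here is a minimal sketch of the haversine distance in plain Python. It is not DataFu's implementation or API: the function name `haversine_km` and the Earth radius constant are assumptions made for this illustration.

```python
import math

# Mean Earth radius in kilometres (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 in case rounding pushes the value slightly above it.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

A quarter of the way around the equator, `haversine_km(0, 0, 0, 90)`, should come out as a quarter of the circumference implied by the chosen radius.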

I’m starting to notice a pattern here. Twitter is open sourcing pretty much everything they are doing related to data storage. Yahoo (now Hortonworks) and Cloudera are the forces behind the open source Hadoop and the tools to bring data to Hadoop. And LinkedIn is starting to open source the tools they are using on top of Hadoop to analyze big data.

What is interesting about this is that you might not get the most polished tools, but they definitely are battle tested.



Apache Hadoop Versions

Cloudera’s Charles Zedlewski gives a great explanation of the Apache Hadoop versioning and the major features included in each production-ready version:

[Figure: Apache Hadoop versions chart. Credit: Cloudera.]

Patrick Durusau suggests an even shorter explanation:

There has been some confusion over the jump from 0.2* versions of Hadoop to a release of Hadoop 1.0 at Apache.

You have not missed various 0.3* and later releases!

People familiar with open source projects know that, most of the time, version numbers carry no meta-information about the maturity of a project. But the higher you go in an enterprise hierarchy, the more weight version numbers are given. So sometimes it is a good idea to bump the numbers up to something that everyone perceives as a mature version. Adoption is sometimes also about perception.


Last NoSQL Releases in 2011: MongoDB, Hive, ZooKeeper, Whirr, HBase, Redis, and Hadoop 1.0.0

Let’s start the year with a quick review of the latest releases that happened in December. Make sure that you scroll to the end as there are quite a few important ones.

MongoDB 2.0.2

Announced on Dec. 15th, MongoDB 2.0.2 is a bug fix release:

  • Hit config server only once per mongos on meta data change to not overwhelm
  • Removed unnecessary connection close and open between mongos and mongod after getLastError
  • Replica set primaries close all sockets on stepDown()
  • Do not require authentication for the buildInfo command
  • scons option for using system libraries

Apache Hive 0.8.0

Apache Hive 0.8.0 came out on Dec. 19th. The list of new features, improvements, and bug fixes is extremely long.

Just as a side note, who came up with the idea of having a Hive fans’ page on Facebook?

Apache ZooKeeper 3.4.2

ZooKeeper 3.4.0 was quickly followed by two new minor version updates fixing some critical bugs. The list of issues fixed in ZooKeeper 3.4.1 can be found here, and the 2 bugs fixed in ZooKeeper 3.4.2 are listed here.

As with ZooKeeper 3.4.0, these versions are not yet production ready.

Apache Whirr 0.7.0

Apache Whirr 0.7.0 was released on Dec. 21st, featuring 56 improvements and bug fixes, including support for Puppet & Chef, and Mahout and Ganglia as a service. The complete list can be found here.

Some more details about Whirr 0.7.0 can be found here.

Apache HBase 0.90.5

Released Dec. 23rd, HBase 0.90.5 packs 81 bug fixes. The complete list can be found here.

Redis 2.4.5

Redis 2.4.5 was released on Dec. 23rd and provides 4 bug fixes:

  • [BUGFIX] Fixed a ZUNIONSTORE/ZINTERSTORE bug that can cause a NaN to be inserted as a sorted set element score. This happens when one of the elements has +inf/-inf score and the weight used is 0.
  • [BUGFIX] Fixed memory leak in CLIENT INFO.
  • [BUGFIX] Fixed a non critical SORT bug (Issue 224).
  • [BUGFIX] Fixed a replication bug: now the timeout configuration is respected during the connection with the master.
  • --quiet option implemented in the Redis test.
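
The first fix in the list comes down to floating-point arithmetic: under IEEE 754, an infinite score multiplied by a weight of 0 gives NaN. The sketch below reproduces that arithmetic in Python. It is not Redis's code: both function names are hypothetical, and mapping the NaN product to 0 is only one possible guard, assumed here for illustration.

```python
import math


def weighted_score(score, weight):
    """Naive per-element step of a weighted union: score times weight."""
    return score * weight


def guarded_weighted_score(score, weight):
    """Same step, but a NaN product is replaced by 0.0 (assumed guard)."""
    value = score * weight
    return 0.0 if math.isnan(value) else value


inf = float("inf")
print(math.isnan(weighted_score(inf, 0)))  # True: inf * 0 is NaN
print(guarded_weighted_score(-inf, 0))     # 0.0
```

Ordinary finite scores are unaffected by the guard, so the extra check only matters for the +inf/-inf with weight 0 corner case the changelog describes.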

Last but definitely one of the most important announcements that came in December:

Hadoop 1.0.0

Based on the 0.20-security code line, Hadoop 1.0.0 was announced on Dec. 29th. This release includes support for:

  • HBase (append/hsynch/hflush) and Security
  • Webhdfs (with full support for security)
  • Performance enhanced access to local files for HBase
  • Other performance enhancements, bug fixes, and features
  • All version 0.20.205 and prior 0.20.2xx features

Complete release notes are available here.

Stéphane Fréchette, Ryan Slobojan, Duane Moore, Arun C. Murthy

And with this we are ready for 2012.


Apache ZooKeeper 3.4.0 Released to Be Followed Soon by Production-Ready Version

Apache ZooKeeper, the high-performance coordination service exposing services like naming, configuration management, synchronization, etc. for distributed applications, has reached version 3.4.0.

Although the official announcement was laconic, ZooKeeper 3.4.0 features over 150 fixes.

The most important ones are summarized by Patrick Hunt in this Cloudera blog post:

  • ZooKeeper 3.3.3 clients are compatible with 3.4.0 servers
  • Native Windows version of C client
  • Support Kerberos authentication of clients
  • Improved REST Interface
  • Existing monitoring support has been extended through the introduction of a new ‘mntr’ 4 letter word
  • Add tools and recipes for monitoring as a contrib
  • Web-based Administrative Interface
  • Automating log and snapshot cleaning
  • Add logging/stats to identify production deployment issues
  • Support for building RPM and DEB packages
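
As a rough idea of how the new ‘mntr’ four letter word can be used from a script, here is a minimal Python sketch. It assumes the server answers with one `key<TAB>value` pair per line and then closes the connection; the host, the port 2181 default, and the function names `parse_mntr` and `fetch_mntr` are assumptions made for this example.

```python
import socket


def parse_mntr(text):
    """Turn 'key<TAB>value' lines into a dict; values are kept as strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # ignore lines that carry no tab separator
            stats[key.strip()] = value.strip()
    return stats


def fetch_mntr(host="localhost", port=2181, timeout=5.0):
    """Send 'mntr' to a ZooKeeper server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"mntr")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode("utf-8", errors="replace"))
```

Only `fetch_mntr` needs a running server. `parse_mntr` can be tried on any tab-separated text, for example a made-up reply such as `"zk_version\t3.4.0\nzk_avg_latency\t0\n"`.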

Something to keep in mind though: ZooKeeper 3.4.0 is not production ready yet. After extensive testing, it will be followed soon by a minor release that will be production-ready.


Redis 2.4.4 Released Fixes Potential Issue in Jemalloc

A new version of Redis is available for download and it is a recommended upgrade for all users as it addresses a potentially serious issue in jemalloc.

The complete list of changes:

  • [BUGFIX] jemalloc upgraded to version 2.2.5, previous versions had a potentially serious issue when allocating big memory areas, something that Redis actually does. However we never received bug reports that appear to be caused by jemalloc.
  • [BUGFIX] DISCARD now clears DIRTY_CAS flag in the client. Now the next transaction will not fail if the previous transaction used WATCH and the key was touched.
  • CLIENT LIST output modified to include the last command executed by clients.
  • Better bug report on crash.
  • Protocol errors are now logged for loglevel >= verbose.
  • Two new INFO fields related to AOF, that can be useful when investigating Redis issues.
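
The DISCARD fix is easier to follow with a toy model of optimistic locking. The sketch below is an in-memory illustration only, not Redis's implementation and not a client library: `ToyStore`, `ToyClient`, and the `dirty_cas` attribute are hypothetical names that mirror the flag mentioned in the changelog.

```python
class ToyStore:
    """In-memory stand-in for a server: data plus per-key watcher sets."""

    def __init__(self):
        self.data = {}
        self.watchers = {}

    def write(self, key, value):
        self.data[key] = value
        # Touching a key flags every client that is watching it.
        for client in self.watchers.get(key, ()):
            client.dirty_cas = True


class ToyClient:
    def __init__(self, store):
        self.store = store
        self.watched = set()
        self.dirty_cas = False
        self.queue = None  # None means "not inside MULTI"

    def watch(self, key):
        self.watched.add(key)
        self.store.watchers.setdefault(key, set()).add(self)

    def _unwatch_all(self):
        for key in self.watched:
            self.store.watchers[key].discard(self)
        self.watched.clear()

    def multi(self):
        self.queue = []

    def set(self, key, value):
        if self.queue is None:
            self.store.write(key, value)
        else:
            self.queue.append((key, value))

    def discard(self):
        self.queue = None
        self._unwatch_all()
        self.dirty_cas = False  # the reset described in the changelog

    def execute(self):
        queued, self.queue = self.queue or [], None
        aborted = self.dirty_cas
        self._unwatch_all()
        self.dirty_cas = False
        if aborted:
            return None  # transaction rejected, nothing applied
        for key, value in queued:
            self.store.write(key, value)
        return len(queued)
```

In this model, if client A watches a key, client B writes it, and A then calls `discard()`, A's next `multi()` / `execute()` pair goes through. Without the reset in `discard()`, the stale flag would make that unrelated transaction fail, which is the behavior the fix removes.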


Riak 0.14 Released with MapReduce Enhancements, Cluster and Node Debugging

I’ve been waiting for the first NoSQL release of the year to publish my first post of 2011. So thanks to Basho’s announcement of Riak 0.14, myNoSQL is officially back.

Riak 0.14 features the Map/Reduce improvements I’ve already written about[1] and quite a few other interesting features and improvements:

  • Cluster and node debugging: The ability to monitor and debug a running Riak cluster received some substantial enhancements in 0.14.
  • Windowed merges for Bitcask: Bitcask performs periodic merges over all non-active files to compact the space being occupied by old versions of stored data. In certain situations this can cause some memory and CPU spikes on the Riak node where the merge is taking place. To that end, we’ve added the ability to specify when Bitcask will perform merges.
  • Support for HTTPS and multiple HTTP IPs
  • REST API for listing buckets
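
The idea behind windowed merges can be sketched as a simple hour check. This is not Bitcask's code or configuration syntax: the function name `in_merge_window`, the `"always"` / `"never"` / `(start_hour, end_hour)` forms, and the inclusive boundaries are all assumptions made for this illustration.

```python
def in_merge_window(hour, window):
    """Return True if merging is allowed at the given hour (0-23).

    window is "always", "never", or a (start_hour, end_hour) pair;
    both ends are treated as inclusive in this sketch.
    """
    if window == "always":
        return True
    if window == "never":
        return False
    start, end = window
    if start <= end:
        return start <= hour <= end
    # Window wraps past midnight, e.g. (22, 2).
    return hour >= start or hour <= end
```

With a window like `(1, 3)`, merges would be confined to the early morning hours, keeping the memory and CPU spikes away from peak traffic.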

Complete release notes available here.

  1. As a side note, Kevin Smith, the Basho engineer who first presented these enhancements, has moved to work for Heroku.
