


releases: All content tagged as releases in NoSQL databases and polyglot persistence

Cassandra 1.0.4 Maintenance Release

Even though this is only the 4th minor release since Cassandra reached version 1.0 in mid-October, these maintenance updates have brought over 100 bug fixes and improvements.

The list of changes in the Cassandra 1.0.4 release:

  • fix self-hinting of timed out read repair updates and make hinted handoff less prone to OOMing a coordinator (CASSANDRA-3440)
  • expose bloom filter sizes via JMX (CASSANDRA-3495)
  • enforce RP tokens 0..2**127 (CASSANDRA-3501)
  • canonicalize paths exposed through JMX (CASSANDRA-3504)
  • fix “liveSize” stat when sstables are removed (CASSANDRA-3496)
  • add bloom filter FP rates to nodetool cfstats (CASSANDRA-3347)
  • record partitioner in sstable metadata component (CASSANDRA-3407)
  • add new upgradesstables nodetool command (CASSANDRA-3406)
  • skip --debug requirement to see common exceptions in CLI (CASSANDRA-3508)
  • fix incorrect query results due to invalid max timestamp (CASSANDRA-3510)
  • fix ConcurrentModificationException in Table.all() (CASSANDRA-3529)
  • make sstableloader recognize compressed sstables (CASSANDRA-3521)
  • avoids race in OutboundTcpConnection in multi-DC setups (CASSANDRA-3530)
  • use SETLOCAL in cassandra.bat (CASSANDRA-3506)

Merged from 0.8:
  • fix concurrence issue in the FailureDetector (CASSANDRA-3519)
  • fix array out of bounds error in counter shard removal (CASSANDRA-3514)
  • avoid dropping tombstones when they might still be needed to shadow data in a different sstable (CASSANDRA-2786)
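The token-range enforcement (CASSANDRA-3501) is easy to picture: RandomPartitioner derives tokens from an MD5 hash of the row key, and only values in 0..2**127 are legal. A minimal Python sketch of the idea (hypothetical helper names, not Cassandra's actual Java code):

```python
# Sketch of RandomPartitioner-style token handling (hypothetical,
# not Cassandra's implementation). RP tokens come from the MD5 hash
# of a key and must fall in the range [0, 2**127].
import hashlib

MAX_RP_TOKEN = 2 ** 127

def validate_token(token: int) -> int:
    """Reject tokens outside the legal RandomPartitioner range."""
    if not (0 <= token <= MAX_RP_TOKEN):
        raise ValueError(f"token {token} is outside 0..2**127")
    return token

def token_for_key(key: bytes) -> int:
    """Map a row key onto the token ring via MD5, reduced mod 2**127."""
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return digest % MAX_RP_TOKEN
```

With a check like this in place, a manually assigned initial_token outside the ring's range is rejected at startup instead of silently unbalancing the cluster.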

Frequent releases are the sign of a very active and involved community working on a fast-evolving technology, not of a buggy product.

Original title and link: Cassandra 1.0.4 Maintenance Release (NoSQL database©myNoSQL)

Redis 2.4.4 Released Fixes Potential Issue in Jemalloc

A new version of Redis is available for download and it is a recommended upgrade for all users as it addresses a potentially serious issue in jemalloc.

The complete list of changes:

  • [BUGFIX] jemalloc upgraded to version 2.2.5, previous versions had a potentially serious issue when allocating big memory areas, something that Redis actually does. However we never received bug reports that appear to be caused by jemalloc.
  • [BUGFIX] DISCARD now clears DIRTY_CAS flag in the client. Now the next transaction will not fail if the previous transaction used WATCH and the key was touched.
  • CLIENT LIST output modified to include the last command executed by clients.
  • Better bug report on crash.
  • Protocol errors are now logged for loglevel >= verbose.
  • Two new INFO fields related to AOF, that can be useful when investigating Redis issues.
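The DISCARD fix is worth unpacking: before 2.4.4, DISCARD aborted the queued transaction but left the client's DIRTY_CAS flag set, so a subsequent MULTI/EXEC could fail even though none of its own WATCHed keys had been touched. A toy Python model of the state machine (illustrative only, not Redis's C internals):

```python
# Toy model of the WATCH/MULTI/EXEC/DISCARD client state, illustrating
# the 2.4.4 fix: DISCARD must also clear the DIRTY_CAS flag
# (hypothetical simulation, not Redis's actual implementation).
class Client:
    def __init__(self):
        self.dirty_cas = False   # set when a WATCHed key is modified
        self.watched = set()
        self.queued = []

    def watch(self, key):
        self.watched.add(key)

    def notify_touched(self, key):
        """Server-side: a write touched `key`; taint watching clients."""
        if key in self.watched:
            self.dirty_cas = True

    def discard(self):
        self.queued.clear()
        self.watched.clear()
        self.dirty_cas = False   # the fix: reset the flag on DISCARD

    def exec_(self):
        """Return True if the transaction commits, False if aborted."""
        ok = not self.dirty_cas
        self.queued.clear()
        self.watched.clear()
        self.dirty_cas = False
        return ok
```

With the fix, a transaction started after DISCARD commits normally even if a key watched by the discarded transaction had been touched; without the `dirty_cas = False` line in `discard()`, the next EXEC would wrongly abort.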

Original title and link: Redis 2.4.4 Released Fixes Potential Issue in Jemalloc (NoSQL database©myNoSQL)

Upcoming Sybase IQ Features Big Data, Support for Hadoop and MapReduce

Chris Kanaracus:

New features in the 15.4 release include a native MapReduce programming interface that uses standard SQL; a Hadoop integration that provides various ways to tie together data from Sybase and Hadoop; a Java interface and additional extensions for existing C++ interfaces for running in-database algorithms; support for PMML (Predictive Model Markup Language) via a partnership with Zementis; and a data mining and statistics library from Fuzzy Logix for use in conjunction with MapReduce.

Sybase is also now offering an Express edition of IQ, which can be used indefinitely, but for development purposes only and with a 5GB database size limit.

If you take a look at the latest releases I’ve covered (take, for example, MarkLogic 5), you’ll notice a clear trend these days:

  1. every data related tool integrates with Hadoop
  2. and/or it offers some sort of parallel processing support
  3. there’s a (usually limited) version for developers

Original title and link: Upcoming Sybase IQ Features Big Data, Support for Hadoop and MapReduce (NoSQL database©myNoSQL)


MarkLogic 5: Confidence at Scale, Enterprise Big Data, Hadoop Connector, Express Edition

I rarely write about MarkLogic[1], but the amount of information that hit me about the newly released MarkLogic 5 made me curious. Below are quotes and commentary about MarkLogic 5, the new MarkLogic Express, and MarkLogic and Hadoop integration.

MarkLogic is a next generation database for Big Data and unstructured information. MarkLogic empowers organizations to make high stakes decisions on Big Data in real time.

Until now, I thought of MarkLogic as an XML database with powerful search capabilities. This new message makes it sound like MarkLogic is a Big Data Analytics or BI tool, which I don’t think would be the most accurate description.

MarkLogic Confidence at Scale

There are a couple of new features falling into this category, as presented in the press release:

  • Database Replication – protect your mission-critical information from site-wide disasters and reduce the cost of downtime

    Chris Kanaracus:

    MarkLogic 5 features the ability to keep a “hot copy” of the database in another data center for quick failover in the event of a disaster, as well as a journal-archiving function that allows a database to be restored to a particular point in time.

  • Point-in-Time Recovery – recover from backups to a specific point-in-time then roll forward using the transaction log to a specific point-in-time, minimizing the window for lost data between the occurrence of a disaster and the time the last backup was taken.

I’m not sure how it works or what the requirements are for getting it to work, but point-in-time recovery sounds like a very interesting feature.
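As described, the mechanism pairs two steps: restore the most recent backup, then replay the transaction journal only up to the requested timestamp. A toy Python sketch of that recovery flow (hypothetical data shapes, not MarkLogic's journal format):

```python
# Toy illustration of point-in-time recovery: restore the last backup,
# then roll the journal forward only up to the requested timestamp
# (hypothetical model, not MarkLogic's actual journal mechanism).
def recover(backup: dict, journal: list, target_ts: float) -> dict:
    """backup: key -> value snapshot; journal: time-ordered
    (timestamp, key, value) entries recorded after the backup."""
    db = dict(backup)                  # 1. restore from the backup
    for ts, key, value in journal:     # 2. roll forward the journal
        if ts > target_ts:
            break                      # stop at the requested point in time
        db[key] = value
    return db
```

The window for lost data shrinks from "everything since the last backup" to "everything after the chosen recovery point", which is the property the press release highlights.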

Enterprise Big Data

New features falling into this category as per the press release and coverage:

  • Simplified Monitoring — new monitoring and management features enable organizations to see system status at a glance with real-time charts of metrics such as I/O rates and loads, request activity, and disk usage.
  • Monitoring Plug-Ins — integration with HP Operations Manager and Nagios
  • Tiered Storage – expand Big Data performance by implementing a solid state disk (SSD) tier between memory and disk

This last feature is one that prepares MarkLogic for the future by allowing it to work smartly with different storage media. Ron Avnur (CTO, MarkLogic), interviewed by Chris Kanaracus:

We realized people have rotational drives and network-attached storage, and are starting to play more seriously with solid-state. These have different performance profiles.

System administrators will tell MarkLogic where and what the options for storage are, and the system will “do all the optimization.” In this way, more frequently used data can be kept in flash and older or less frequently accessed information held elsewhere.

I’m not aware of other solutions being able to play smart with heterogeneous storage deployments.
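As a rough mental model, the optimizer's job amounts to ranking data by access frequency and pinning the hottest items to the fastest tier. A toy Python sketch of such a placement policy (hypothetical, not MarkLogic's actual algorithm):

```python
# Toy illustration of the tiering idea Avnur describes: put the most
# frequently accessed items on the fast (SSD) tier and the rest on
# slower storage (hypothetical policy, not MarkLogic's optimizer).
def assign_tiers(access_counts: dict, ssd_capacity: int) -> dict:
    """Return item -> 'ssd' or 'disk'; the hottest items go to SSD."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return {item: ("ssd" if rank < ssd_capacity else "disk")
            for rank, item in enumerate(ranked)}
```

A real system would of course re-evaluate placement continuously and account for item sizes and storage performance profiles, but the frequency-ranked placement above captures the basic trade-off.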

MarkLogic Connector for Hadoop

Press release:

The MarkLogic Connector for Hadoop powers large-scale batch processing for Big Data Analytics on the structured, semi-structured, and unstructured data residing inside MarkLogic. Using MarkLogic for real time analytics with Hadoop for batch processing brings the best of Big Data to companies that need real time, secure, enterprise applications that are cost effective with high performance. With simple drop-in installation, organizations can run MapReduce on data inside MarkLogic and take advantage of Hadoop’s development and management tools, all while being able to leverage MarkLogic’s indexes and distributed architecture for performance. This combination results in enhanced search, analytics, and delivery in MarkLogic, and enables organizations to progressively enhance data without having to remove it from the database.

Jason Hunter (deputy CTO, MarkLogic):

MarkLogic sees Hadoop as being able to support MarkLogic for various uses. For example, an intelligence-gathering organization could collect data that is into hundreds of petabytes, not understanding what exactly is there, but then decide to investigate a particular topic in-depth. In such a scenario, users would want to use MarkLogic for interaction with this content, asking questions and getting answers in sub-second time, and then asking other questions and exploring the data for insights. However, Hunter explains, because the data is so large it would probably not be cost-effective to load hundreds of petabytes of data into MarkLogic if they don’t have to, and so they can load the data into Hadoop and run a Hadoop job to select the portion of the content that it makes sense to do real-time analytics against and load that into MarkLogic for interactive queries. “So you go from hundreds of petabytes down to one petabyte, or half a petabyte, do bulk load and do interactive queries against it.”

MarkLogic Express

Press release:

MarkLogic Express, a new MarkLogic 5 license that allows students and developers to download and take MarkLogic into production immediately.

MarkLogic Express includes geospatial capabilities and alerting, and can be used in production environments. That means a developer can take live a MarkLogic implementation that leverages a 2-CPU node and up to 40GB of data.

Josette Rigsby points out some more limitations of the Express version:

  • Can’t combine with another licensed install of MarkLogic
  • Can’t be used for work on behalf of the U.S. Federal Government
  • No clustering
  • Can’t run multiple production copies of Express for the same application
  • Cannot be used by development teams — note: this point is very confusing.

It looks like MarkLogic is acknowledging the influence developers have in today’s organizations and has decided to offer them access to the product. While I don’t think the current restrictions would allow someone to go to production with the MarkLogic Express version, I still believe it is better than nothing. I’ve also read that students and researchers can get access to a less restrictive version, which is easy to appreciate.

MarkLogic 5 also includes some features that are probably appealing to its users (rich media support, document filters, query console, REST-based API, distributed transaction support, geo-support).

I’m leaving you with Curt Monash’s comments:

MarkLogic seems to have settled on a positioning that, although distressingly buzzword-heavy, is at least partly based upon reality. The real part includes:

  • MarkLogic is a serious, enterprise-class DBMS (see for example Slide 12 of the MarkLogic deck) …
  • … which has been optimized from the getgo for poly-structured data.
  • MarkLogic can and does scale out to handle large amounts of data.
  • MarkLogic is a general-purpose DBMS, suitable for both short-request and analytic tasks.
  • MarkLogic is particularly well suited for analyses with long chains of “progressive enhancement” (MarkLogic’s favorite term when talking about derived data).
  • MarkLogic often plays the role of a content assembler and/or search engine, and the people who use MarkLogic in those ways are commonly doing things that can be described as research and analysis.

and a short video of MarkLogic CTO, Ron Avnur summarizing the release:

  1. In case it wasn’t obvious I don’t like XML as a storage format, nor did I like XML databases.  

Original title and link: MarkLogic 5: Confidence at Scale, Enterprise Big Data, Hadoop Connector, Express Edition (NoSQL database©myNoSQL)

Pig 0.9: New Features Documented

Three great posts on the Hortonworks blog (part 1, part 2, and part 3) detail the most important new features included in the Apache Pig 0.9 release:

  • macros
  • embedding: “You can now write a python program and embed Pig scripts inside of it, leveraging all language features provided by Python, including control flow”
  • project-range expressions
  • improved error messages
  • typed maps
  • new UDFs

Original title and link: Pig 0.9: New Features Documented (NoSQL database©myNoSQL)

MongoDB 1.8.3 Bugfix Release

MongoDB 1.8.3 was pushed out minutes ago, and it includes just a handful of small bug fixes and improvements:

  • Increase JavaScript heap size from 8MB to 64MB
  • Lower default stack size on Linux
  • Improve mongos SLAVE_OK processing
  • Command timing no longer includes initial lock acquisition time
  • Reduce impact on the donor shard during shard migration

Original title and link: MongoDB 1.8.3 Bugfix Release (NoSQL database©myNoSQL)

What's New in Redis 2.4

In short:

  1. a bunch of optimizations for both size and speed
  2. improved RDB persistence
  3. deprecated VM
  4. no Redis cluster
  5. no scripting

For the much longer form, read Salvatore’s post.

Original title and link: What’s New in Redis 2.4 (NoSQL database©myNoSQL)


Voldemort V0.9 Released: NIO, Pipelined FSM, Hinted Handoff

I rarely have the chance to write about Project Voldemort, but this new release packs in plenty of goodies:

  • non-blocking IO client/server
  • pipelined routing based on finite state machine
  • hinted-handoff
  • zone aware routing
  • read-only stores pipeline
  • updated Java, Python, and Ruby clients

This post from Roshan Sumbaly (LinkedIn) provides all the details:

One of the most important upgrades we have done in production recently has been switching all our clients and servers from the legacy thread-per-socket blocking I/O approach to the new non-blocking implementation which multiplexes using just a fixed number of threads (usually set in proportion to the number of CPU cores on the machine). This is good from an operations perspective on the server because we no longer have to manually keep bumping up the maximum number of threads when new clients are added. From the client’s perspective we now won’t need to worry about thread pool exhaustion due to slow responses from slow servers.
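The switch Sumbaly describes, from thread-per-socket blocking I/O to a fixed number of threads multiplexing many connections, can be sketched with Python's selectors module (illustrative only; Voldemort's implementation is Java NIO):

```python
# Minimal sketch of the idea behind Voldemort's NIO switch: instead of
# one blocking thread per socket, a single thread multiplexes many
# connections through a readiness selector (illustrative Python, not
# Voldemort's actual Java NIO implementation).
import selectors
import socket

def serve_once(sel: selectors.DefaultSelector) -> None:
    """One pass of the event loop: echo on whichever sockets are ready."""
    for key, _ in sel.select(timeout=1):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data)       # echo the payload back
        else:
            sel.unregister(conn)     # peer closed the connection
            conn.close()

if __name__ == "__main__":
    # Demo with an in-process socket pair standing in for a real client.
    sel = selectors.DefaultSelector()
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)
    client_side.sendall(b"ping")
    serve_once(sel)                  # one thread handles all ready sockets
    print(client_side.recv(4096))
```

The operational win quoted above follows directly from this shape: the selector loop scales to more clients without raising a thread cap, since thread count stays fixed regardless of how many sockets are registered.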

Original title and link: Voldemort V0.9 Released: NIO, Pipelined FSM, Hinted Handoff (NoSQL database©myNoSQL)

Neo4j 1.4 “Kiruna Stol” Released With Many Notable Improvements

Releasing often has too many advantages to list them all, but I think the major ones are: capturing the interest of new users (generating buzz), showing a healthy project velocity, and, probably the most important one, delivering the features and improvements users were asking for in a timely manner. Neo4j has learned these lessons[1], and since Neo4j 1.2 the team at Neo Technology has been following a very frequent release plan which also includes milestone releases. The other day, Neo4j 1.4, a.k.a. Kiruna Stol, was released:

Over the last three months, we’ve released 6 milestones in our 1.4 series. Today we’re releasing the final Neo4j 1.4 General Availability (GA) package. We’ve seen a whole host of new features going into the product during this time, along with numerous performance and stability improvements. We think this is our best release yet, and we hope you like the direction in which the product is heading.

There are some notable new features and improvements in this release:

  1. a new query language called Cypher[2]
  2. automatic indexing
  3. a Lucene upgrade leading to faster indexing
  4. self relationships
  5. REST API improvements: exposing batch execution API, paging mechanism for traversers
  6. webadmin, performance, and new server management scripts

  1. In the NoSQL space, they are not alone. 10gen follows a similarly aggressive release plan for MongoDB. Redis, even if supported by a two-person team, has always enjoyed frequent releases. DataStax has also started to push out Cassandra updates more often.  

  2. At first glance the query language looks odd, but I haven’t looked beyond some basic examples to understand its syntax and strengths. Neo4j also supports Gremlin.  

Original title and link: Neo4j 1.4 “Kiruna Stol” Released With Many Notable Improvements (NoSQL database©myNoSQL)


GoldenOrb: Ravel Google Pregel Implementation Released

Announced back in March, Ravel has finally released GoldenOrb, an implementation of the Google Pregel paper. If you are not familiar with Google Pregel, check Pregel: Graph Processing at Large-Scale and Ricky Ho’s comparison of Pregel and MapReduce.

Until Ravel’s GoldenOrb, the only experimental implementation of Pregel was the Erlang-based Phoebus. GoldenOrb was released under the Apache License v2.0 and is available on GitHub.

GoldenOrb is a cloud-based open source project for massive-scale graph analysis, built upon best-of-breed software from the Apache Hadoop project modeled after Google’s Pregel architecture.
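For readers new to the model: Pregel computations proceed in supersteps in which every vertex consumes the messages sent to it in the previous superstep, updates its value, and sends messages to its neighbors, until all vertices go quiet. A toy single-process Python sketch of that loop (hypothetical, not GoldenOrb's actual API), here propagating the maximum value through a graph:

```python
# Toy single-process sketch of Pregel's bulk-synchronous vertex model
# (hypothetical, not GoldenOrb's actual API): each superstep, vertices
# consume their inbox, update state, and message their neighbors.
# Here every vertex converges on the maximum value in its component.
def pregel_max(values: dict, edges: dict) -> dict:
    """values: vertex -> initial value; edges: vertex -> neighbor list."""
    inbox = {v: [] for v in values}
    active = set(values)                      # vertices with work to do
    while active:
        outbox = {v: [] for v in values}
        for v in active:                      # one superstep
            new_val = max([values[v]] + inbox[v])
            if new_val > values[v] or not inbox[v]:
                values[v] = new_val
                for n in edges.get(v, []):    # message the neighbors
                    outbox[n].append(new_val)
            # vertex votes to halt; a message next superstep reactivates it
        inbox = outbox
        active = {v for v in values if inbox[v]}
    return values
```

A distributed implementation partitions the vertices across workers and exchanges the outboxes over the network between supersteps, but the per-vertex logic stays this simple, which is the appeal of the model.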

Original title and link: GoldenOrb: Ravel Google Pregel Implementation Released (NoSQL database©myNoSQL)

Redis 2.2.11: Recommended Upgrade Fixing Critical Bug With Dictionary Iterators

Salvatore Sanfilippo about why Redis 2.2.11 is a recommended upgrade:

[…] since Redis 2.2.7 we introduced a new kind of dictionary (hash table) iterator in order to reduce copy-on-write while a child is saving. I severely underestimated the complexity of porting all the code to the new iterator without bugs, so Redis experienced a number of bugs starting from 2.2.7 due to misuses of the new iterator API in different spots of the code.

The latest bug I found of this kind, hopefully the last, is particularly critical as it is in the persistence code.

Original title and link: Redis 2.2.11: Recommended Upgrade Fixing Critical Bug With Dictionary Iterators (NoSQL database©myNoSQL)