releases: All content tagged as releases in NoSQL databases and polyglot persistence
Even though this is only the 4th minor release since Cassandra reached 1.0 in mid-October, these maintenance updates have brought over 100 bug fixes and improvements.
- fix self-hinting of timed out read repair updates and make hinted handoff less prone to OOMing a coordinator (CASSANDRA-3440)
- expose bloom filter sizes via JMX (CASSANDRA-3495)
- enforce RP tokens 0..2**127 (CASSANDRA-3501)
- canonicalize paths exposed through JMX (CASSANDRA-3504)
- fix “liveSize” stat when sstables are removed (CASSANDRA-3496)
- add bloom filter FP rates to nodetool cfstats (CASSANDRA-3347)
- record partitioner in sstable metadata component (CASSANDRA-3407)
- add new upgradesstables nodetool command (CASSANDRA-3406)
- skip --debug requirement to see common exceptions in CLI (CASSANDRA-3508)
- fix incorrect query results due to invalid max timestamp (CASSANDRA-3510)
- fix ConcurrentModificationException in Table.all() (CASSANDRA-3529)
- make sstableloader recognize compressed sstables (CASSANDRA-3521)
- avoid race in OutboundTcpConnection in multi-DC setups (CASSANDRA-3530)
- use SETLOCAL in cassandra.bat (CASSANDRA-3506)

Merged from 0.8:
- fix concurrency issue in the FailureDetector (CASSANDRA-3519)
- fix array out of bounds error in counter shard removal (CASSANDRA-3514)
- avoid dropping tombstones when they might still be needed to shadow data in a different sstable (CASSANDRA-2786)
Frequent releases are the sign of a very active and involved community working on a fast-evolving technology, not of a buggy product.
Original title and link: Cassandra 1.0.4 Maintenance Release ( ©myNoSQL)
A new version of Redis is available for download and it is a recommended upgrade for all users as it addresses a potentially serious issue in jemalloc.
- [BUGFIX] jemalloc upgraded to version 2.2.5, previous versions had a potentially serious issue when allocating big memory areas, something that Redis actually does. However we never received bug reports that appear to be caused by jemalloc.
- [BUGFIX] DISCARD now clears DIRTY_CAS flag in the client. Now the next transaction will not fail if the previous transaction used WATCH and the key was touched.
- CLIENT LIST output modified to include the last command executed by clients.
- Better bug report on crash.
- Protocol errors are now logged for loglevel >= verbose.
- Two new INFO fields related to AOF, that can be useful when investigating Redis issues.
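To make the DISCARD fix concrete, here is a minimal Python sketch of the optimistic-locking semantics involved (the `Client` and `Store` classes are illustrative, not Redis source): touching a WATCHed key sets a per-client DIRTY_CAS flag, EXEC aborts while the flag is set, and DISCARD must clear the flag so the *next* transaction is not poisoned.

```python
# Sketch (not Redis source) of WATCH/MULTI/EXEC/DISCARD semantics.
# The 2.4.4 fix: DISCARD must clear the client's DIRTY_CAS flag,
# otherwise the next transaction fails even without a conflict.

class Store:
    def __init__(self):
        self.data = {}
        self.watchers = {}          # key -> clients watching it

    def set(self, key, val):
        self.data[key] = val
        for c in self.watchers.pop(key, []):
            c.dirty_cas = True      # a watched key was touched


class Client:
    def __init__(self, store):
        self.store = store
        self.dirty_cas = False
        self.queue = []

    def watch(self, key):
        self.store.watchers.setdefault(key, []).append(self)

    def queue_cmd(self, fn):        # MULTI-queued command
        self.queue.append(fn)

    def exec_(self):
        ok = not self.dirty_cas     # abort if a watched key changed
        if ok:
            for fn in self.queue:
                fn(self.store)
        self._reset()
        return ok

    def discard(self):
        self._reset()               # the pre-2.4.4 bug left dirty_cas set

    def _reset(self):
        self.queue.clear()
        self.dirty_cas = False


store = Store()
c = Client(store)
c.watch("k")
store.set("k", 1)                   # another client touches the watched key
c.queue_cmd(lambda s: s.set("k", 99))
c.discard()                         # abort; must clear DIRTY_CAS

c.queue_cmd(lambda s: s.set("k", 2))
print(c.exec_())                    # True: the next transaction succeeds
```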
Original title and link: Redis 2.4.4 Released Fixes Potential Issue in Jemalloc ( ©myNoSQL)
I rarely write about MarkLogic, but the amount of information that hit me about the newly released MarkLogic 5 made me curious. Below are quotes and commentary about MarkLogic 5, the new MarkLogic Express, and MarkLogic and Hadoop integration.
MarkLogic is a next generation database for Big Data and unstructured information. MarkLogic empowers organizations to make high stakes decisions on Big Data in real time.
So far I thought MarkLogic is an XML database with powerful search capabilities. This new message makes it sound like MarkLogic is a Big Data Analytics or BI tool, which I don’t think would be the most accurate description.
MarkLogic Confidence at Scale
There are a couple of new features falling into this category, as presented in the press release:
Database Replication – protect your mission-critical information from site-wide disasters and reduce the cost of downtime
MarkLogic 5 features the ability to keep a “hot copy” of the database in another data center for quick failover in the event of a disaster, as well as a journal-archiving function that allows a database to be restored to a particular point in time.
Point-in-Time Recovery – recover from backups to a specific point-in-time then roll forward using the transaction log to a specific point-in-time, minimizing the window for lost data between the occurrence of a disaster and the time the last backup was taken.
I'm not sure how it works or what the requirements for getting it to work are, but point-in-time recovery sounds like a very interesting feature.
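The general mechanism behind point-in-time recovery can be sketched without knowing MarkLogic's internals: restore the most recent backup taken at or before the target time, then replay journaled transactions up to that time. A hedged Python sketch with made-up data:

```python
# Generic point-in-time recovery sketch (not MarkLogic internals):
# restore the last backup before the target, then roll the journal
# forward up to the target timestamp.

def restore_to(backups, journal, target_ts):
    """backups: list of (ts, snapshot dict); journal: list of (ts, key, value)."""
    # 1. Pick the most recent backup at or before target_ts.
    base_ts, snapshot = max(
        ((ts, snap) for ts, snap in backups if ts <= target_ts),
        key=lambda pair: pair[0],
    )
    db = dict(snapshot)
    # 2. Roll forward: apply journal entries after the backup, up to target.
    for ts, key, value in sorted(journal):
        if base_ts < ts <= target_ts:
            db[key] = value
    return db


backups = [(0, {"a": 1}), (10, {"a": 2, "b": 3})]
journal = [(5, "a", 2), (5, "b", 3), (12, "a", 9), (20, "b", 7)]

print(restore_to(backups, journal, 15))  # {'a': 9, 'b': 3}
```

The window for lost data shrinks to the gap between the disaster and the last journaled transaction, rather than the last full backup.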
Enterprise Big Data
- Simplified Monitoring — new monitoring and management features enable organizations to see system status at a glance with real-time charts of metrics such as I/O rates and loads, request activity, and disk usage.
- Monitoring Plug-Ins — integration with HP Operations Manager and Nagios
- Tiered Storage – expand Big Data performance by implementing a solid state disk (SSD) tier between memory and disk
This last feature is one that prepares MarkLogic for the future by allowing it to work smartly with different storage types. Ron Avnur (CTO, MarkLogic) interviewed by Chris Kanaracus:
We realized people have rotational drives and network-attached storage, and are starting to play more seriously with solid-state. These have different performance profiles.
System administrators will tell MarkLogic where and what the options for storage are, and the system will “do all the optimization.” In this way, more frequently used data can be kept in flash and older or less frequently accessed information held elsewhere.
I’m not aware of other solutions being able to play smart with heterogeneous storage deployments.
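The kind of optimization Avnur describes can be sketched as frequency-based tier placement (the function and data below are illustrative, not MarkLogic's actual optimizer): hot data fills the fastest tier, colder data cascades to slower, larger tiers.

```python
# Hedged sketch of frequency-based tier placement (not MarkLogic's
# actual algorithm): hottest items go to the fastest tier with room.

def place(access_counts, tiers):
    """access_counts: {item: hits}; tiers: list of (name, capacity),
    ordered fastest-first. Returns {item: tier_name}."""
    placement = {}
    hot_first = iter(sorted(access_counts, key=access_counts.get, reverse=True))
    for name, capacity in tiers:
        for _ in range(capacity):
            item = next(hot_first, None)
            if item is None:
                return placement
            placement[item] = name
    return placement


counts = {"doc1": 90, "doc2": 40, "doc3": 5, "doc4": 2}
tiers = [("ssd", 1), ("disk", 2), ("nas", 10)]
print(place(counts, tiers))
# {'doc1': 'ssd', 'doc2': 'disk', 'doc3': 'disk', 'doc4': 'nas'}
```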
MarkLogic Connector for Hadoop
The MarkLogic Connector for Hadoop powers large-scale batch processing for Big Data Analytics on the structured, semi-structured, and unstructured data residing inside MarkLogic. Using MarkLogic for real time analytics with Hadoop for batch processing brings the best of Big Data to companies that need real time, secure, enterprise applications that are cost effective with high performance. With simple drop-in installation, organizations can run MapReduce on data inside MarkLogic and take advantage of Hadoop’s development and management tools, all while being able to leverage MarkLogic’s indexes and distributed architecture for performance. This combination results in enhanced search, analytics, and delivery in MarkLogic, and enables organizations to progressively enhance data without having to remove it from the database.
MarkLogic sees Hadoop as being able to support MarkLogic for various uses. For example, an intelligence-gathering organization could collect data that is into hundreds of petabytes, not understanding what exactly is there, but then decide to investigate a particular topic in-depth. In such a scenario, users would want to use MarkLogic for interaction with this content, asking questions and getting answers in sub-second time, and then asking other questions and exploring the data for insights. However, Hunter explains, because the data is so large it would probably not be cost-effective to load hundreds of petabytes of data into MarkLogic if they don’t have to, and so they can load the data into Hadoop and run a Hadoop job to select the portion of the content that it makes sense to do real-time analytics against and load that into MarkLogic for interactive queries. “So you go from hundreds of petabytes down to one petabyte, or half a petabyte, do bulk load and do interactive queries against it.”
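The filter-then-load pattern Hunter describes is easy to sketch in miniature (the data, predicate, and helper names below are hypothetical): a batch job scans the full corpus and selects the slice worth interactive analysis, and only that slice is bulk-loaded.

```python
# Toy sketch of the Hadoop filter -> MarkLogic bulk-load pattern:
# a batch pass reduces a huge corpus to the slice you actually want
# to query interactively.

def batch_filter(records, predicate):
    """Stand-in for the Hadoop job: stream the corpus, keep a slice."""
    for rec in records:
        if predicate(rec):
            yield rec

def bulk_load(target, records):
    """Stand-in for bulk-loading the filtered slice into the database."""
    target.extend(records)


corpus = [{"topic": "finance", "id": i} if i % 3 else {"topic": "intel", "id": i}
          for i in range(9)]
interactive_db = []
bulk_load(interactive_db, batch_filter(corpus, lambda r: r["topic"] == "intel"))
print(len(interactive_db))  # 3 of 9 records survive the filter
```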
MarkLogic Express, a new MarkLogic 5 license that allows students and developers to download and take MarkLogic into production immediately.
MarkLogic Express includes geospatial capabilities and alerting, and can be used in production environments. That means a developer can take live a MarkLogic implementation running on a 2-CPU node with up to 40 GB of data.
Josette Rigsby points out some more limitations of the Express version:
- Can’t combine with another licensed install of MarkLogic
- Can’t be used for work on behalf of the U.S. Federal Government
- No clustering
- Can’t run multiple production copies of Express for the same application
- Cannot be used by development teams — note: this point is very confusing.
It looks like MarkLogic is acknowledging the power developers represent in today's organizations and has decided to offer them access to the product. While I don't think the current restrictions would allow someone to go to production with the MarkLogic Express version, I still believe it's better than nothing. I've also read that students and researchers could get access to a less restrictive version, something that's easy to appreciate.
MarkLogic 5 also includes some features that are probably appealing to its users (rich media support, document filters, query console, REST-based API, distributed transaction support, geo-support).
I’m leaving you with Curt Monash’s comments:
MarkLogic seems to have settled on a positioning that, although distressingly buzzword-heavy, is at least partly based upon reality. The real part includes:
- MarkLogic is a serious, enterprise-class DBMS (see for example Slide 12 of the MarkLogic deck) …
- … which has been optimized from the get-go for poly-structured data.
- MarkLogic can and does scale out to handle large amounts of data.
- MarkLogic is a general-purpose DBMS, suitable for both short-request and analytic tasks.
- MarkLogic is particularly well suited for analyses with long chains of “progressive enhancement” (MarkLogic’s favorite term when talking about derived data).
- MarkLogic often plays the role of a content assembler and/or search engine, and the people who use MarkLogic in those ways are commonly doing things that can be described as research and analysis.
And a short video of MarkLogic CTO Ron Avnur summarizing the release:
In case it wasn’t obvious I don’t like XML as a storage format, nor did I like XML databases. ↩
Original title and link: MarkLogic 5: Confidence at Scale, Enterprise Big Data, Hadoop Connector, Express Edition ( ©myNoSQL)
- embedding: “You can now write a python program and embed Pig scripts inside of it, leveraging all language features provided by Python, including control flow”
- project-range expressions
- improved error messages
- typed maps
- new UDFs
Original title and link: Pig 0.9: New Features Documented ( ©myNoSQL)
MongoDB 1.8.3 was pushed out minutes ago and it includes just a couple of small bug fixes and improvements:
- Lower default stack size on linux
- Improve mongos SLAVE_OK processing
- Command timing no longer includes initial lock acquisition time
- Reduce impact on the donor shard during shard migration
Original title and link: MongoDB 1.8.3 Bugfix Release ( ©myNoSQL)
I rarely have the chance to write about Project Voldemort and this new release packs so many goodies:
- non-blocking IO client/server
- pipelined routing based on finite state machine
- zone aware routing
- read-only stores pipeline
- updated Java, Python, and Ruby clients
This post from Roshan Sumbaly (LinkedIn) provides all the details:
One of the most important upgrades we have done in production recently has been switching all our clients and servers from the legacy thread-per-socket blocking I/O approach to the new non-blocking implementation, which multiplexes using just a fixed number of threads (usually set in proportion to the number of CPU cores on the machine). This is good from an operations perspective on the server because we no longer have to manually keep bumping up the maximum number of threads when new clients are added. From the client's perspective, we no longer need to worry about thread pool exhaustion due to slow responses from slow servers.
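The difference between the two models is easy to illustrate. This is a hedged Python sketch using the standard `selectors` module, not Voldemort's Java NIO code: a single event-loop thread services many client sockets via a selector, instead of dedicating one blocking thread per socket.

```python
# Illustrative sketch of the multiplexed model (not Voldemort's code):
# one thread handles many connections through a selector, so the
# server's thread count stays fixed as clients are added.
import selectors
import socket

sel = selectors.DefaultSelector()
pairs = [socket.socketpair() for _ in range(4)]    # 4 "client" connections
for server_side, _ in pairs:
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

for _, client_side in pairs:
    client_side.sendall(b"get key")                # all clients send requests

served = 0
while served < len(pairs):                         # single event-loop thread
    for key, _ in sel.select(timeout=1):
        request = key.fileobj.recv(1024)
        key.fileobj.sendall(b"value")              # respond; others not blocked
        served += 1

print(served)  # 4 connections handled by one thread
```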
Original title and link: Voldemort V0.9 Released: NIO, Pipelined FSM, Hinted Handoff ( ©myNoSQL)
Announced back in March, Ravel has finally released GoldenOrb, an implementation of the Google Pregel paper. If you are not familiar with Google Pregel, check Pregel: Graph Processing at Large-Scale and Ricky Ho's comparison of Pregel and MapReduce.
GoldenOrb is a cloud-based open source project for massive-scale graph analysis, built upon best-of-breed software from the Apache Hadoop project modeled after Google’s Pregel architecture.
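For readers new to the Pregel model, here is a toy Python sketch of its vertex-centric, superstep-based computation (this is the general Pregel idea, not GoldenOrb's actual API): each superstep, every vertex combines incoming messages with its own value and messages its neighbors when its value changes; the run halts when no messages remain.

```python
# Toy sketch of Pregel's bulk-synchronous vertex-centric model:
# propagate the maximum value through a graph, superstep by superstep.

def pregel_max(graph, values):
    """graph: {vertex: [neighbors]}; values: {vertex: number}.
    Returns the values after the maximum has reached every vertex."""
    values = dict(values)
    inbox = {v: [] for v in graph}
    active = set(graph)                       # vertices with pending work
    superstep = 0
    while active:
        outbox = {v: [] for v in graph}
        next_active = set()
        for v in graph:
            new = max([values[v]] + inbox[v])
            changed = new > values[v]
            values[v] = new
            if superstep == 0 or changed:     # message neighbors on change
                for n in graph[v]:
                    outbox[n].append(new)
                    next_active.add(n)
        inbox, active, superstep = outbox, next_active, superstep + 1
    return values


g = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(pregel_max(g, {"a": 3, "b": 1, "c": 7}))  # {'a': 7, 'b': 7, 'c': 7}
```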
Original title and link: GoldenOrb: Ravel Google Pregel Implementation Released ( ©myNoSQL)