Hive: All content tagged as Hive in NoSQL databases and polyglot persistence
Let’s start the year with a quick review of the latest releases that happened in December. Make sure that you scroll to the end as there are quite a few important ones.
Announced on Dec.15th, MongoDB 2.0.2 is a bug fix release:
- Hit the config server only once per mongos on metadata change, so as not to overwhelm it
- Removed unnecessary connection close and open between mongos and mongod after getLastError
- Replica set primaries close all sockets on stepDown()
- Do not require authentication for the buildInfo command
- scons option for using system libraries
Apache Hive 0.8.0
Just as a side note, who came up with the idea of having a Hive fans’ page on Facebook?
Apache ZooKeeper 3.4.2
ZooKeeper 3.4.0 was shortly followed by two minor version updates fixing some critical bugs. The list of issues fixed in ZooKeeper 3.4.1 can be found here, and the 2 bugs fixed in ZooKeeper 3.4.2 are listed here.
As with ZooKeeper 3.4.0, these versions are not yet production ready.
Apache Whirr 0.7.0
Apache Whirr 0.7.0 was released on Dec. 21st, featuring 56 improvements and bug fixes, including support for Puppet & Chef, and Mahout and Ganglia as a service. The complete list can be found here.
Some more details about Whirr 0.7.0 can be found here.
Apache HBase 0.90.5
Redis 2.4.5
Redis 2.4.5 was released on Dec. 23rd and provides 4 bug fixes:
- [BUGFIX] Fixed a ZUNIONSTORE/ZINTERSTORE bug that can cause a NaN to be inserted as a sorted set element score. This happens when one of the elements has a -inf score and the weight used is 0.
- [BUGFIX] Fixed memory leak in
- [BUGFIX] Fixed a non critical SORT bug (Issue 224).
- [BUGFIX] Fixed a replication bug: now the timeout configuration is respected during the connection with the master.
- --quiet option implemented in the Redis test.
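For the curious, the ZUNIONSTORE/ZINTERSTORE issue comes down to plain IEEE 754 arithmetic: an infinite score multiplied by a weight of 0 is NaN. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the guard shown is only one plausible way to handle the case, not necessarily what the Redis patch does.

```python
import math


def weighted_score(score: float, weight: float) -> float:
    """Apply a weight to a sorted set score with plain float multiplication."""
    return score * weight


def safe_weighted_score(score: float, weight: float) -> float:
    """Hypothetical guard: map a NaN product to 0.0 (assumed convention)."""
    product = score * weight
    return 0.0 if math.isnan(product) else product


if __name__ == "__main__":
    neg_inf = float("-inf")
    # -inf * 0 has no defined value in IEEE 754, so the result is NaN.
    print(math.isnan(weighted_score(neg_inf, 0)))
    # The guarded version never hands a NaN back to the caller.
    print(safe_weighted_score(neg_inf, 0))
```

A NaN score is a problem for a sorted set because NaN compares as neither less than, greater than, nor equal to any other value, so the element has no well-defined position.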
Last, but definitely among the most important announcements that came in December:
Based on the 0.20-security code line, Hadoop 1.0.0 was announced on Dec.29. This release includes support for:
- HBase (append/hsync/hflush) and Security
- WebHDFS (with full support for security)
- Performance enhanced access to local files for HBase
- Other performance enhancements, bug fixes, and features
- All version 0.20.205 and prior 0.20.2xx features
Complete release notes are available here.
And with this we are ready for 2012.
Original title and link: Last NoSQL Releases in 2011: MongoDB, Hive, ZooKeeper, Whirr, HBase, Redis, and Hadoop 1.0.0 ( ©myNoSQL)
Sifting through the PRish announcements related to Informatica HParser, what I’ve figured out so far is:
- it is the T in ETL
- a visual tool for creating parsing definitions for formats like web logs, XML, JSON, FIX, SWIFT, HL7, CDR, WORD, PDF, XLS, etc.
- transformations can be accessed from Hadoop MapReduce, Hive, or Pig
- the benefits of using HParser come from being able to share the same parsing definitions/transformations in the context of the Hadoop distributed environment
- HParser tries to provide an optimal transformation solution when streaming, splitting, and processing large files
- HParser is available in two licensing formats: community and commercial
Original title and link: What Is Informatica HParser for Hadoop? ( ©myNoSQL)
According to the official documentation, Brisk’s key advantages are:
- No single point of failure
- streamlined setup and operations
- analytics without ETL
- full integration with DataStax OpsCenter
I just heard the announcement that DataStax, the company offering Cassandra services, made about Brisk, a Hadoop and Hive distribution built on top of Cassandra:
Brisk provides integrated Hadoop MapReduce, Hive and job and task tracking capabilities, while providing an HDFS-compatible storage layer powered by Cassandra.
Brisk was announced officially during the MapReduce panel at the Structure Big Data event. But it looks like others have already had a chance to hear about Brisk. Is there something that I should be doing to hear the “unofficial” announcements?
DataStax has also made available a whitepaper: “Evolving Hadoop into a Low-Latency Data Infrastructure: Unifying Hadoop, Hive and Apache Cassandra for Real-time and Analytics” that you can download from here