Let’s start the year with a quick review of the latest releases that happened in December. Make sure that you scroll to the end as there are quite a few important ones.
Announced on Dec. 15th, MongoDB 2.0.2 is a bug-fix release:
- Hit the config server only once per mongos on metadata change, so as not to overwhelm it
- Removed unnecessary connection close and open between mongos and mongod after getLastError
- Replica set primaries close all sockets on stepDown()
- Do not require authentication for the buildInfo command
- scons option for using system libraries
Apache Hive 0.8.0
Just as a side note, who came up with the idea of having a Hive fans’ page on Facebook?
Apache ZooKeeper 3.4.2
ZooKeeper 3.4.0 was followed shortly by two minor version updates fixing some critical bugs. The list of issues fixed in ZooKeeper 3.4.1 can be found here, and the 2 bugs fixed in ZooKeeper 3.4.2 are listed here.
As with ZooKeeper 3.4.0, these versions are not yet production ready.
Apache Whirr 0.7.0
Apache Whirr 0.7.0 was released on Dec. 21st, featuring 56 improvements and bug fixes, including support for Puppet and Chef, and Mahout and Ganglia as a service. The complete list can be found here.
Some more details about Whirr 0.7.0 can be found here.
Apache HBase 0.90.5
Redis 2.4.5 was released on Dec. 23rd and provides 4 bug fixes (plus a small test-suite addition):
- [BUGFIX] Fixed a ZUNIONSTORE/ZINTERSTORE bug that can cause a NaN to be inserted as a sorted set element score. This happens when one of the elements has a -inf score and the weight used is 0.
- [BUGFIX] Fixed memory leak in CLIENT INFO.
- [BUGFIX] Fixed a non-critical SORT bug (Issue 224).
- [BUGFIX] Fixed a replication bug: now the timeout configuration is respected during the connection with the master.
- --quiet option implemented in the Redis test.
Last but definitely one of the most important announcements that came in December:
Based on the 0.20-security code line, Hadoop 1.0.0 was announced on Dec. 29th. This release includes support for:
- HBase (append/hsynch/hflush) and Security
- Webhdfs (with full support for security)
- Performance enhanced access to local files for HBase
- Other performance enhancements, bug fixes, and features
- All version 0.20.205 and prior 0.20.2xx features
Complete release notes are available here.
And with this we are ready for 2012.
Original title and link: Last NoSQL Releases in 2011: MongoDB, Hive, ZooKeeper, Whirr, HBase, Redis, and Hadoop 1.0.0 ( ©myNoSQL)
In this O’Reilly webcast, long-time HBase developer and Cloudera HBase/Hadoop architect Lars George discusses the underlying concepts of the storage layer in HBase and how to model data in HBase for the best possible performance.
We have evaluated various options for backing up data inside HBase and built a solution. This post explains the options and also provides the solution for anyone to download and use with their own HBase installations.
After considering these options, we developed a simple tool that backs up data to Amazon S3 and restores it when needed. Another requirement was to take a full backup over the weekend and a daily incremental backup.
In a recovery scenario, it should first initialize a clean environment with all tables created and populated with the latest full backup data, then apply all incremental backups sequentially. However, with this method deletes are not captured, which may leave some unnecessary data in the tables. This is a known disadvantage of this method of backup and restore.
Internally, this backup program uses the HBase Import and Export tools, which run as MapReduce jobs.
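For reference, here is roughly how the stock tools are invoked directly; a minimal sketch with made-up table, bucket, and timestamp values, and assuming S3 credentials are already configured:

# Full export of one table (launches a MapReduce job):
hbase org.apache.hadoop.hbase.mapreduce.Export tab1 s3://S3BucketABC/tab1_full

# Incremental export: the optional trailing arguments are <versions>
# <starttime> <endtime> in epoch milliseconds, so only cells written
# within that window are exported:
hbase org.apache.hadoop.hbase.mapreduce.Export tab1 s3://S3BucketABC/tab1_incr 1 1322727818546 1322729618546

# Import an exported folder back into an existing table:
hbase org.apache.hadoop.hbase.mapreduce.Import tab1 s3://S3BucketABC/tab1_full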
Top 10 Features of the backup tool
- Exports complete data for a given set of tables to an S3 bucket.
- Exports incremental data for a given set of tables to an S3 bucket.
- Lists all complete as well as incremental backup repositories.
- Restores a table from a backup, based on the given backup repository.
- Runs as MapReduce jobs.
- In case of connection failure, retries with increasing delays.
- Handles special characters such as _ that otherwise break the export and import activities.
- Enhances the existing Export and Import tools with detailed logging that reports the cause of a failure, rather than just exiting with a program status of 1.
- Works with a human-readable time format (YYYY.MM.DD 24HH:MINUTE:SECOND:MILLISECOND TIMEZONE) for taking, listing, and restoring backups, rather than system tick time or Unix epoch time (time represented as a number).
- All parameters are taken from the command line, which allows a cron job to run the tool at regular intervals (see the sample crontab entries after this list).
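As an illustration of that last point, a cron setup matching the stated requirements (a weekly full backup plus frequent incrementals) could look like the sketch below; the paths are hypothetical, and backup_incremental.sh is assumed to be a wrapper analogous to the backup_full.sh script shown later:

# Hypothetical crontab entries: full backup every Saturday at 02:00,
# incremental backup every 30 minutes.
0 2 * * 6 /mnt/hbackup/bin/backup_full.sh >> /var/log/hbackup.log 2>&1
*/30 * * * * /mnt/hbackup/bin/backup_incremental.sh >> /var/log/hbackup.log 2>&1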
Setting up the tool
- Download the package from hbackup.install.tar
This package includes the necessary jar files and the source code.
- Set up a configuration file. Download the hbase-site.xml file and add your site-specific settings to it (a sample of the typical additions follows this list).
- Set up the classpath with all the jars, including the hbackup-1.0-core.jar file bundled inside the downloaded hbackup.install.tar. Make sure hbackup-1.0-core.jar is at the beginning of the classpath. In addition, add to the CLASSPATH the configuration directory that holds the hbase-site.xml file.
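The exact properties to add are not spelled out above; since the tool bundles the jets3t library and targets s3:// paths, one plausible addition is the standard Hadoop S3 credential properties, for example:

<!-- Assumed additions to hbase-site.xml: Hadoop S3 credentials (replace with your keys) -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>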
Running the tool
Usage: the tool runs in 4 modes: backup.full, backup.incremental, backup.history, and restore.
mode=backup.full tables="comma separated tables" backup.folder=S3-Path date="YYYY.MM.DD 24HH:MINUTE:SECOND:MILLSECOND TIMEZONE"
mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/ date="2011.12.01 17:03:38:546 IST"
mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/
mode=backup.incremental tables="comma separated tables" backup.folder=S3-Path duration.mins=Minutes
Example of backing up the changes that occurred in the last 30 minutes:
mode=backup.incremental backup.folder=s3://S3BucketABC/ duration.mins=30 tables=tab1,tab2,tab3
mode=backup.history backup.folder=S3-Path
Example of listing past archives (incremental archive names end with a distinguishing suffix):
mode=backup.history backup.folder=s3://S3BucketABC/
mode=restore backup.folder=S3-Path/ArchiveDate tables="comma separated tables"
Example of restoring the rows archived at a given date (first apply the full backup, then the incremental backups sequentially; see the recovery sketch below):
mode=restore backup.folder=s3://S3-Path/DAY_MON_HH_MI_SS_SSS_ZZZ_YYYY tables=tab1,tab2,tab3
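Putting restore together with the recovery procedure described earlier, a complete recovery applies the latest full archive first and then each later incremental archive in order. A sketch in the style of the sample scripts below, where the archive folder names are placeholders for the dated repositories reported by backup.history:

# Hypothetical recovery sequence; archive folder names are placeholders.
. /mnt/hbackup/bin/setenv.sh
# 1. Restore the latest full backup into freshly created tables:
/usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=restore backup.folder=s3://mybucket/FULL_ARCHIVE_DATE tables=tab1,tab2,tab3
# 2. Apply each subsequent incremental backup, oldest first (deletes are
#    not captured, as noted above):
/usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=restore backup.folder=s3://mybucket/INCR_ARCHIVE_DATE_1 tables=tab1,tab2,tab3
/usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=restore backup.folder=s3://mybucket/INCR_ARCHIVE_DATE_2 tables=tab1,tab2,tab3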
Sample scripts to run the backup tool
$ cat setenv.sh
for file in `ls /mnt/hbase/lib`
do
  export CLASSPATH=$CLASSPATH:/mnt/hbase/lib/$file
done
export CLASSPATH=/mnt/hbase/hbase-0.90.4.jar:$CLASSPATH
export CLASSPATH=/mnt/hbackup/hbackup-1.0-core.jar:/mnt/hbackup/java-xmlbuilder-0.4.jar:/mnt/hbackup/jets3t-0.8.1a.jar:/mnt/hbackup/conf:$CLASSPATH
$ cat backup_full.sh
. /mnt/hbackup/bin/setenv.sh
dd=`date "+%Y.%m.%d %H:%M:%S:000 %Z"`
echo Backing up for date $dd
for table in `echo table1 table2 table3`
do
  /usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.full backup.folder=s3://mybucket/ tables=$table "date=$dd"
  sleep 10
done
List of backups:
$ cat list.sh
. /mnt/hbackup/bin/setenv.sh
/usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.history backup.folder=s3://mybucket
Original title and link: Backing Up HBase to Amazon S3 ( ©myNoSQL)
A great new article from Todd Hoff dissecting the DataSift architecture:
In terms of data store, DataSift architecture includes:
- MySQL (Percona server) on SSD drives
- HBase cluster (currently ~30 Hadoop nodes, 400TB of storage)
- Memcached (cache)
- Redis (still used for some internal queues, but probably going to be retired soon)
Leave whatever you were doing and go read it now.
Original title and link: DataSift Using MySQL, HBase, Memcached to Deal With Twitter Firehose ( ©myNoSQL)
Michael Stack (StumbleUpon & Hadoop PMC) presents some of the more interesting HBase deployments, HBase usage scenarios, HBase and HDFS, and the near future of HBase: