Backing Up HBase to Amazon S3
This is a guest post by the Bizosys team, creators of HSearch, an open source, NoSQL, distributed, real-time search engine built on Hadoop and HBase.
We evaluated various options for backing up data inside HBase and built a solution. This post explains the options and provides the solution for anyone to download and use with their own HBase installation.
After considering these options we developed a simple tool that backs up data to Amazon S3 and restores it when needed. A further requirement was to take a full backup over the weekend and an incremental backup every day.
In a recovery scenario, the tool should first set up a clean environment with all tables created and populated with the latest full backup data. It should then apply all incremental backups sequentially. With this method, however, deletes are not captured, which may leave some unnecessary data in the tables. This is a known disadvantage of this method of backup and restore.
Internally, the backup program uses the HBase Import and Export tools, so the work runs as Map-Reduce jobs.
Top 10 Features of the backup tool
- Export complete data for the given set of tables to an S3 bucket.
- Export incremental data for the given set of tables to an S3 bucket.
- List all complete as well as incremental backup repositories.
- Restore a table from backup based on the given backup repository.
- Runs in Map-Reduce
- In case of connection failure, retries with increasing delays
- Handles special characters like _ which cause problems in the export and import activities.
- Enhances the existing Export and Import tools with detailed logging to report a failure, rather than just exiting with a program status of 1.
- Works in a human readable time format (YYYY.MM.DD 24HH:MINUTE:SECOND:MILLSECOND TIMEZONE) for taking, listing and restoring backups, rather than system tick time or Unix epoch time (time represented as a number).
- All parameters are taken from the command line, which allows a cron job to run the tool at regular intervals.
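The retry behaviour listed above lives inside the tool, and its exact delay schedule is not described here. Purely as an illustration of the idea, a wrapper like the following could retry any command with a growing pause. The doubling policy and the names retry_with_backoff, MAX_TRIES, FIRST_DELAY and SLEEP_CMD are assumptions made for this sketch, not options of the backup tool.

```shell
# Hypothetical helper: run a command, retrying with a growing delay on failure.
# MAX_TRIES, FIRST_DELAY and SLEEP_CMD are illustrative names, not tool options.
retry_with_backoff() {
  local max_tries="${MAX_TRIES:-5}"
  local delay="${FIRST_DELAY:-1}"
  local sleep_cmd="${SLEEP_CMD:-sleep}"
  local try=1
  while true; do
    "$@" && return 0              # success: stop retrying
    if [ "$try" -ge "$max_tries" ]; then
      echo "giving up after $try attempts: $*" >&2
      return 1
    fi
    echo "attempt $try failed, waiting ${delay}s" >&2
    "$sleep_cmd" "$delay"
    delay=$((delay * 2))          # assumed policy: double the delay each time
    try=$((try + 1))
  done
}
```

A cron script could then call, for example, `retry_with_backoff /mnt/hbackup/bin/backup_full.sh`, so that a transient failure of the whole run is retried a few times before giving up.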
Setting up the tool
- Download the package from hbackup.install.tar. This package includes the necessary jar files and the source code.
- Set up a configuration file. Download the hbase-site.xml file. Add to it the fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey, fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties.
- Set up the class path with all jars inside the hbase/lib directory, the hbase.jar file, java-xmlbuilder-0.4.jar, jets3t-0.8.1a.jar and the hbackup-1.0-core.jar file bundled inside the downloaded hbackup.install.tar. Make sure hbackup-1.0-core.jar is at the beginning of the classpath. In addition, add the configuration directory that holds the hbase-site.xml file to CLASSPATH.
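Since the ordering requirement is easy to break when a script prepends more jars later, a small guard can help. The following is a minimal sketch, not part of the package: the function name hbackup_jar_is_first is made up for illustration, and it only checks that the first CLASSPATH entry is the hbackup core jar.

```shell
# Hypothetical check: succeed only if the first classpath entry
# is hbackup-1.0-core.jar (with or without a directory prefix).
hbackup_jar_is_first() {
  first_entry=${1%%:*}            # text before the first ':'
  case "$first_entry" in
    */hbackup-1.0-core.jar|hbackup-1.0-core.jar) return 0 ;;
    *) return 1 ;;
  esac
}

if ! hbackup_jar_is_first "${CLASSPATH:-}"; then
  echo "warning: hbackup-1.0-core.jar is not first on CLASSPATH" >&2
fi
```

Placed at the end of an environment setup script, this prints a warning instead of letting a backup run with the jars in the wrong order.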
Running the tool
Usage: The tool runs in 4 modes: [backup.full], [backup.incremental], [backup.history] and [restore].
[backup.full]
mode=backup.full tables="comma separated tables" backup.folder=S3-Path date="YYYY.MM.DD 24HH:MINUTE:SECOND:MILLSECOND TIMEZONE"
Example:
mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/ date="2011.12.01 17:03:38:546 IST"
mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/
[backup.incremental]
mode=backup.incremental tables="comma separated tables" backup.folder=S3-Path duration.mins=Minutes
Example of backing up the changes that occurred in the last 30 minutes:
mode=backup.incremental backup.folder=s3://S3BucketABC/ duration.mins=30 tables=tab1,tab2,tab3
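When incremental backups run on a schedule, duration.mins needs to cover at least the time since the previous backup. One way to work that out is sketched below, assuming the script records the Unix timestamp (in seconds) of its last successful run somewhere. The helper name minutes_between is hypothetical, and it rounds up so the window errs on the side of overlap rather than leaving a gap.

```shell
# Hypothetical helper: minutes between two Unix timestamps (in seconds),
# rounded up so that a partial minute still counts as a full one.
minutes_between() {
  start=$1
  end=$2
  echo $(( (end - start + 59) / 60 ))
}

# Example: a backup that last ran 24 hours ago needs duration.mins=1440.
# minutes_between "$last_run_epoch" "$(date +%s)"
```

For instance, a gap of 1801 seconds gives 31 minutes, not 30, so the extra second of changes is still inside the window.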
[backup.history]
mode=backup.history backup.folder=S3-Path
Example of listing past archives. Incremental ones end with .incr
mode=backup.history backup.folder=s3://S3BucketABC/
[restore]
mode=restore backup.folder=S3-Path/ArchiveDate tables="comma separated tables"
Example of adding the rows archived during that date. First apply a full backup and then apply incremental backups.
mode=restore backup.folder=s3://S3-Path/DAY_MON_HH_MI_SS_SSS_ZZZ_YYYY tables=tab1,tab2,tab3
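Because a restore means one run for the full backup followed by one run per incremental backup, it can help to print the whole sequence before executing anything. The sketch below does only that. The function name restore_plan is hypothetical, the archive names are whatever backup.history reported, and the caller is assumed to pass them in the right order (full backup first, then incrementals from oldest to newest).

```shell
# Sketch: print the restore commands in the order they should run.
# Arguments: S3 base path, comma separated tables, then archive names
# (full backup first, followed by incremental archives, oldest first).
restore_plan() {
  base=${1%/}                     # drop a trailing slash, if any
  tables=$2
  shift 2
  for archive in "$@"; do
    echo "java com.bizosys.oneline.maintenance.HBaseBackup mode=restore backup.folder=$base/$archive tables=$tables"
  done
}

# Example (archive names are placeholders):
# restore_plan s3://mybucket/ tab1,tab2,tab3 FULL_ARCHIVE INCR_ARCHIVE_1.incr INCR_ARCHIVE_2.incr
```

Printing rather than running keeps the step reviewable; once the list looks right, the lines can be executed one by one in a shell where the classpath is already set up.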
Sample scripts to run the backup tool
Setup:
$ cat setenv.sh
# Add everything in the HBase lib directory to the classpath
for file in /mnt/hbase/lib/*
do
  export CLASSPATH=$CLASSPATH:$file
done
export CLASSPATH=/mnt/hbase/hbase-0.90.4.jar:$CLASSPATH
export CLASSPATH=/mnt/hbackup/hbackup-1.0-core.jar:/mnt/hbackup/java-xmlbuilder-0.4.jar:/mnt/hbackup/jets3t-0.8.1a.jar:/mnt/hbackup/conf:$CLASSPATH
Full backup:
$ cat backup_full.sh
. /mnt/hbackup/bin/setenv.sh
dd=`date "+%Y.%m.%d %H:%M:%S:000 %Z"`
echo "Backing up for date $dd"
for table in table1 table2 table3
do
/usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.full backup.folder=s3://mybucket/ tables=$table "date=$dd"
sleep 10
done
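An incremental counterpart to the full backup script could look roughly like the following. This is a sketch modelled on backup_full.sh, not a script shipped with the package: the function name backup_incremental and the variables JAVA_BIN and PAUSE_SECS are illustrative, while the paths, bucket and class name are the ones used in the full backup script.

```shell
# Sketch of an incremental backup script, modelled on backup_full.sh.
# JAVA_BIN and PAUSE_SECS are illustrative names, not tool options.
if [ -f /mnt/hbackup/bin/setenv.sh ]; then
  . /mnt/hbackup/bin/setenv.sh
fi
JAVA_BIN="${JAVA_BIN:-/usr/lib/jdk/bin/java}"

# Usage: backup_incremental <duration.mins> <table> [<table> ...]
backup_incremental() {
  duration_mins=$1
  shift
  for table in "$@"; do
    "$JAVA_BIN" com.bizosys.oneline.maintenance.HBaseBackup mode=backup.incremental backup.folder=s3://mybucket/ duration.mins="$duration_mins" tables="$table" || return 1
    sleep "${PAUSE_SECS:-10}"     # same pause between tables as the full backup
  done
}

# Example: back up the last 24 hours of changes for three tables.
# backup_incremental 1440 table1 table2 table3
```

Run daily from cron with a 1440 minute window, alongside a weekend run of the full backup script, this would match the schedule described at the start of the post.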
List of backups:
$ cat list.sh
. /mnt/hbackup/bin/setenv.sh
/usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.history backup.folder=s3://mybucket
Original title and link: Backing Up HBase to Amazon S3 (©myNoSQL)