


Amazon S3: All content tagged as Amazon S3 in NoSQL databases and polyglot persistence

The Story of Scaling Draw Something From an Amazon S3 Custom Key-Value Service to Using Couchbase

This is the story of scaling Draw Something told by its CTO Jason Pearlman.

The early days, a custom key-value service built on top of Amazon S3:

The original backend for Draw Something was designed as a simple key/value store with versioning. The service was built into our existing Ruby API (using the Merb framework and Thin web server). Our initial idea was: why not use our existing API for all the stuff we’ve done before (users, signup/login, virtual currency, inventory) and write some new key/value stuff for Draw Something? Since we design for scale, we initially chose Amazon S3 as our data store for all this key/value data. The idea was to sacrifice some latency in exchange for unlimited scalability and storage.

Then the early signs of growth and the same key-value service using a different Ruby stack:

Being always interested in the latest tech out there, we had been looking at Ruby 1.9, fibers, and in particular EventMachine + synchrony for a while. Combined with the need for a solution ASAP, this led us to Goliath, a non-blocking Ruby app server written by the guys at PostRank. Over the next 24 hours I ported over the key/value code and other supporting libraries, wrote a few tests, and we launched the service live. The result was great. We went from 115 app instances on over six servers to just 15 app instances.

The custom built key-value service didn’t last long though and the switch to a real key-value store was made:

We brought up a small cluster of Membase (a.k.a. Couchbase), rewrote the entire app, and deployed it live at 3 a.m. that same night. Instantly, our cloud datastore issues slowed down, although we still relied on it to do a lazy migration of data to our new Couchbase cluster.

Finally, learning to scale, tune and operate Couchbase at scale:

Even with the issues we were having with Couchbase, we decided it was too much of a risk to move off our current infrastructure and go with something completely different. At this point, Draw Something was being played by 3-4 million players each day. We contacted Couchbase and got some advice, which really was to expand our clusters, eventually to really beefy machines with SSD hard drives and tons of RAM. We did this, made multiple clusters, and sharded between them for even more scalability over the next few days. We were also continuing to improve and scale all of our backend services as traffic continued to skyrocket. We were now averaging hundreds of drawings per second.

Scaling “Draw Something” is a success story. But looking at the steps above and considering how fast things had to change and evolve, think how many teams could have stumbled at each of these phases. Think what it would have meant to not be able to tell which parts of the system had to change, or to have to take the system offline to upgrade parts of it.

Original title and link: The Story of Scaling Draw Something From an Amazon S3 Custom Key-Value Service to Using Couchbase (NoSQL database©myNoSQL)


Backing Up HBase to Amazon S3

This is a guest post by the Bizosys Team, creators of HSearch, an open-source, NoSQL, distributed, real-time search engine built on Hadoop and HBase.

We evaluated various options for backing up data inside HBase and built a solution. This post explains the options and also provides the solution for anyone to download and use with their own HBase installations.

Option 1: Backup the Hadoop DFS
Pros: Block data files are backed up quickly.
Cons: Even when there is no visible external load on HBase, internal processes such as region balancing and compaction keep updating the HDFS blocks, so a raw copy may end up in an inconsistent state. Secondly, both HBase and Hadoop HDFS keep data in memory and flush it at periodic intervals, so a raw copy may again result in an inconsistent state.

Option 2: HBase Import and Export tools
Pros: The Map-Reduce job downloads data to the given output path.
Cons: Given a path like s3://backupbucket/, the program fails with exceptions like: Jets3tFileSystemStore failed with AWSCredentials.

Option 3: HBase Table Copy tools
Pros: Provides another, parallel replicated setup to switch to.
Cons: Huge investment to keep another parallel environment running just to replicate production data.

After considering these options we developed a simple tool which backs up data to Amazon S3 and restores it when needed. A further requirement was to take a full backup over the weekend and a daily incremental backup.

In a recovery scenario, the tool should first initialize a clean environment with all tables created and populated with the latest full backup data, then apply all incremental backups sequentially. However, with this method deletes are not captured, which may leave some unnecessary data in the tables. This is a known disadvantage of this method of backup and restore.

Internally, this backup program uses the HBase Import and Export tools, executing them as Map-Reduce jobs.

Top 10 Features of the backup tool

  1. Exports complete data for the given set of tables to an S3 bucket.
  2. Exports data incrementally for the given set of tables to an S3 bucket.
  3. Lists all complete as well as incremental backup repositories.
  4. Restores a table from backup based on the given backup repository.
  5. Runs as a Map-Reduce job.
  6. In case of connection failure, retries with increasing delays.
  7. Handles special characters such as _ that otherwise break the export and import activities.
  8. Enhances the existing Export and Import tools with detailed logging, reporting failures rather than just exiting with a program status of 1.
  9. Works with a human-readable time format for taking, listing, and restoring backups (YYYY.MM.DD 24HH:MINUTE:SECOND:MILLISECOND TIMEZONE), rather than system tick time or Unix epoch time represented as a plain number.
  10. Takes all parameters from the command line, which allows a cron job to run it at regular intervals.

Setting up the tool

  1. Download the package from hbackup.install.tar
    This package includes the necessary jar files and the source code.
  2. Set up a configuration file. Download the hbase-site.xml file and add the fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey, fs.s3n.awsAccessKeyId, and fs.s3n.awsSecretAccessKey properties to it.
  3. Set up the classpath with all jars inside the hbase/lib directory, the hbase.jar file, and the java-xmlbuilder-0.4.jar, jets3t-0.8.1a.jar, and hbackup-1.0-core.jar files bundled inside the downloaded hbackup.install.tar. Make sure hbackup-1.0-core.jar is at the beginning of the classpath. In addition, add the configuration directory that contains the hbase-site.xml file to the CLASSPATH.
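For step 2, the four S3 credential properties might look like this in hbase-site.xml; the key values shown are placeholders, not real credentials:

```xml
<!-- Credentials for the s3:// (block) and s3n:// (native) S3 filesystems -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```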

Running the tool

Usage: the tool runs in 4 modes: backup.full, backup.incremental, backup.history, and restore.


mode=backup.full tables="comma separated tables" backup.folder=S3-Path  date="YYYY.MM.DD 24HH:MINUTE:SECOND:MILLSECOND TIMEZONE"


mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/ date="2011.12.01 17:03:38:546 IST"
mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/
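The date parameter above can be produced with the standard date command; a sketch (the milliseconds are zero-filled, since date has no millisecond field in this format):

```shell
# Build a timestamp like "2011.12.01 17:03:38:000 IST" for the date= parameter
dd=$(date "+%Y.%m.%d %H:%M:%S:000 %Z")
echo "$dd"
```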


mode=backup.incremental tables="comma separated tables" backup.folder=S3-Path duration.mins=Minutes

Example of backing up changes that occurred in the last 30 minutes:

mode=backup.incremental backup.folder=s3://S3BucketABC/ duration.mins=30 tables=tab1,tab2,tab3


mode=backup.history backup.folder=S3-Path

Example of listing past archives. Incremental ones end with .incr

mode=backup.history backup.folder=s3://S3BucketABC/


mode=restore  backup.folder=S3-Path/ArchiveDate tables="comma separated tables"

Example of restoring the rows archived on the given date. Apply the full backup first, then the incremental backups:

mode=restore backup.folder=s3://S3-Path/DAY_MON_HH_MI_SS_SSS_ZZZ_YYYY tables=tab1,tab2,tab3

Sample scripts to run the backup tool


 $ cat
 for file in `ls /mnt/hbase/lib`
 do
   export CLASSPATH=$CLASSPATH:/mnt/hbase/lib/$file
 done

 export CLASSPATH=/mnt/hbase/hbase-0.90.4.jar:$CLASSPATH

 export CLASSPATH=/mnt/hbackup/hbackup-1.0-core.jar:/mnt/hbackup/java-xmlbuilder-0.4.jar:/mnt/hbackup/jets3t-0.8.1a.jar:/mnt/hbackup/conf:$CLASSPATH

Full backup:

 $ cat
 . /mnt/hbackup/bin/

 dd=`date "+%Y.%m.%d %H:%M:%S:000 %Z"`
 echo Backing up for date $dd
 for table in `echo table1 table2 table3`
 do
   /usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.full backup.folder=s3://mybucket/ tables=$table "date=$dd"
   sleep 10
 done

List of backups:

 $ cat
 . /mnt/hbackup/bin/
 /usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.history backup.folder=s3://mybucket
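Since all parameters come from the command line (feature 10), the scripts above can be scheduled with cron. A hypothetical crontab entry; the script names and paths are assumptions, wrapping the sample scripts above:

```
# Full backup every Sunday at 02:00, incremental backup every 30 minutes
0 2 * * 0 /mnt/hbackup/bin/full_backup.sh >> /var/log/hbackup.log 2>&1
*/30 * * * * /mnt/hbackup/bin/incr_backup.sh >> /var/log/hbackup.log 2>&1
```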

Original title and link: Backing Up HBase to Amazon S3 (NoSQL database©myNoSQL)