Introduction to Apache HBase Snapshots
Matteo Bertozzi introduces HBase snapshots:
Prior to CDH 4.2, the only way to back-up or clone a table was to use Copy/Export Table, or after disabling the table, copy all the hfiles in HDFS. Copy/Export Table is a set of tools that uses MapReduce to scan and copy the table but with a direct impact on Region Server performance. Disabling the table stops all reads and writes, which will almost always be unacceptable.
In contrast, HBase snapshots allow an admin to clone a table without data copies and with minimal impact on Region Servers. Exporting the snapshot to another cluster does not directly affect any of the Region Servers; export is just a distcp with an extra bit of logic.
The part that made me really curious and that didn’t make too much sense when first reading the post is “clone a table without data copies”. But the post clarifies what the snapshot is:
A snapshot is a set of metadata information that allows an admin to get back to a previous state of the table. A snapshot is not a copy of the table; it’s just a list of file names and doesn’t copy the data. A full snapshot restore means that you get back to the previous “table schema” and you get back your previous data losing any changes made since the snapshot was taken.
What I still don’t understand is how snapshots are working after a major compaction (which drops deletes and expired cells).
Original title and link: Introduction to Apache HBase Snapshots (©myNoSQL)
via: http://blog.cloudera.com/blog/2013/03/introduction-to-apache-hbase-snapshots/