Matteo Bertozzi introduces HBase snapshots:
Prior to CDH 4.2, the only way to back-up or clone a table was to use
Copy/Export Table, or after disabling the table, copy all the hfiles in
HDFS. Copy/Export Table is a set of tools that uses MapReduce to scan and
copy the table but with a direct impact on Region Server performance.
Disabling the table stops all reads and writes, which will almost always be
unacceptable.
In contrast, HBase snapshots allow an admin to clone a table without data
copies and with minimal impact on Region Servers. Exporting the snapshot to
another cluster does not directly affect any of the Region Servers; export
is just a distcp with an extra bit of logic.
The part that made me curious, and that didn’t make much sense on first reading, is “clone a table without data copies”. But the post clarifies what a snapshot is:
A snapshot is a set of metadata information that allows an admin to get back
to a previous state of the table. A snapshot is not a copy of the table;
it’s just a list of file names and doesn’t copy the data. A full snapshot
restore means that you get back to the previous “table schema” and you get
back your previous data losing any changes made since the snapshot was
taken.
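To make the "list of file names" idea concrete, here is a toy model in Python. It is not HBase code: `Table`, `Snapshot`, `storage` and the helper functions are all made up for illustration, and the only assumption borrowed from the post is that data files are immutable once written, so a file name is enough to identify its contents.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    schema: tuple                               # hypothetical "table schema"
    files: list = field(default_factory=list)   # names of immutable data files


@dataclass(frozen=True)
class Snapshot:
    schema: tuple
    file_names: tuple                           # names only, never the data


# Toy "file system": file name -> immutable contents.
storage = {}


def write_file(table, name, rows):
    """Flush some rows into a new immutable file owned by the table."""
    storage[name] = tuple(rows)
    table.files.append(name)


def take_snapshot(table):
    """Record the schema and the current file names; no data is copied."""
    return Snapshot(schema=table.schema, file_names=tuple(table.files))


def clone(snapshot):
    """Create a new table that points at the same files as the snapshot."""
    return Table(schema=snapshot.schema, files=list(snapshot.file_names))


def restore(table, snapshot):
    """Roll the table back to the snapshot's schema and file list."""
    table.schema = snapshot.schema
    table.files = list(snapshot.file_names)


def read(table):
    return [row for name in table.files for row in storage[name]]


users = Table(schema=("info",))
write_file(users, "hfile-1", ["alice", "bob"])

snap = take_snapshot(users)
files_before_clone = len(storage)
copy = clone(snap)
files_after_clone = len(storage)    # unchanged: cloning wrote no data

# Changes made after the snapshot are lost on restore.
write_file(users, "hfile-2", ["carol"])
users.schema = ("info", "extra")
restore(users, snap)
```

In this sketch the clone and the original share `hfile-1`, which is why cloning is cheap, and the restore brings back both the old schema and the old data while dropping the row written afterwards.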
What I still don’t understand is how snapshots work after a major compaction (which drops deletes and expired cells).
Original title and link: Introduction to Apache HBase Snapshots (©myNoSQL)