In a post introducing HyperDex backups, Robert Escriva compares the different backup solutions available in Cassandra, MongoDB, and Riak:
Cassandra: Cassandra’s backups are inconsistent, as they are taken at each server
independently without coordination. Further, “Restoring from snapshots and
incremental backups temporarily causes intensive CPU and I/O activity on the
node being restored.”
MongoDB: MongoDB provides two backup strategies. The first strategy copies the data
on backup, and re-inserts it on restore. This approach introduces high
overhead because it copies the entire data set without opportunity for
The second approach is to use filesystem-provided snapshots to quickly
backup the data of a mongod instance. This approach requires operating
system support and will produce larger backup sizes.
Riak: Riak backups are inconsistent, as they are taken at each server
independently without coordination, and require care when migrating between
IP addresses. Further, Riak requires that each server be shut down before
backing up LevelDB-powered backends.
Here is how HyperDex’s new backup is described:
The HyperDex backup/restore process is strongly
consistent, doesn’t require shutting down servers, and
enables incremental backup support. Further, the process
is quite efficient; it completes quickly, and does not
consume CPU or I/O for extended periods of time.
The caveat is that HyperDex puts the cluster in read-only mode for backing up. That’s a loss of availability. Considering that what both Cassandra and Riak promise is high availability, their choice was clear.
Update: This comment from Emin Gün Sirer makes me wonder if I missed something:
HyperDex quiesces the network, takes a snapshot, resumes. Whole operation takes sub-second latency.
The key point is that the system is online, available while the data copying is taking place.
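To make the quiesce, snapshot, resume sequence from the comment concrete, here is a minimal sketch in Python. It is a toy, single-process model and not HyperDex's implementation or API: `ToyStore`, `quiesce`, `resume`, `take_snapshot`, and `backup` are all hypothetical names, and an in-memory dictionary copy stands in for whatever cheap snapshot mechanism a real system would use.

```python
import threading


class ToyStore:
    """Hypothetical in-memory key-value store; not HyperDex's API."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()
        self._read_only = False

    def put(self, key, value):
        with self._lock:
            if self._read_only:
                raise RuntimeError("store is quiesced; writes are paused")
            self._data[key] = value

    def get(self, key):
        # Reads are served even while the store is quiesced.
        with self._lock:
            return self._data.get(key)

    def quiesce(self):
        with self._lock:
            self._read_only = True

    def resume(self):
        with self._lock:
            self._read_only = False

    def take_snapshot(self):
        # A shallow copy stands in for a cheap point-in-time snapshot.
        with self._lock:
            return dict(self._data)


def backup(store, destination):
    """Pause writes only long enough to snapshot, then copy while online."""
    store.quiesce()
    try:
        snapshot = store.take_snapshot()
    finally:
        store.resume()  # writes are accepted again from here on
    # The potentially slow copy works from the snapshot, not the live data.
    destination.update(snapshot)
    return destination
```

In this sketch writes are rejected only between `quiesce()` and `resume()`. The copy into `destination` happens afterwards, which is the distinction the comment draws between a brief pause and a cluster that stays read-only for the whole backup.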
Original title and link: Comparing NoSQL backup solutions