


Migrating databases with zero downtime

That’s how you do it!

One of the most detailed descriptions of migrating data while keeping your service available:

The solution we came up with was to split the migration into two parts: writing, then reading.

For each component that we were migrating, we would come up with a data schema that made sense for that part of the system. We would then make a branch off of master, the ‘writes’ branch. The writes branch was responsible for two things. First, it would mirror all writes to Mongo/Titan into its equivalent Cassandra table. […] Next, it would have a migration script that would copy all of our historical data for that component into Cassandra. So once the writes branch was deployed, and the migration script was run, all of our data was in both Mongo/Titan and Cassandra, and anything that was created or updated was also written to both places.
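
The writes phase described above can be sketched roughly as follows. This is only an illustration and not code from the quoted post: `InMemoryStore`, `DualWriteRepository`, and `backfill` are hypothetical names, and the in-memory store stands in for the real Mongo/Titan and Cassandra clients.

```python
class InMemoryStore:
    """Hypothetical stand-in for a database client (old or new)."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)

    def keys(self):
        return list(self.rows)


class DualWriteRepository:
    """Writes phase: every write goes to both stores, reads stay on the old one."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def save(self, key, value):
        self.old.put(key, value)  # the old store is still the source of truth
        self.new.put(key, value)  # mirrored write to the new store

    def load(self, key):
        return self.old.get(key)


def backfill(old, new):
    """Copy historical rows into the new store and return how many were copied.

    Rows already present in the new store are skipped, on the assumption
    that they arrived through the mirrored write path and are at least as fresh.
    """
    copied = 0
    for key in old.keys():
        if new.get(key) is None:
            new.put(key, old.get(key))
            copied += 1
    return copied


old, new = InMemoryStore(), InMemoryStore()
old.put("user:1", {"name": "Ann"})    # historical row, written before the migration
repo = DualWriteRepository(old, new)
repo.save("user:2", {"name": "Bob"})  # new write, mirrored to both stores
backfill(old, new)                    # afterwards both stores hold the same rows
```

Skipping keys that already exist in the new store is a design choice made for this sketch; a real migration script would also have to handle rows that are updated while the backfill is running.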

Next, we would make a branch off of our writes branch; this was our ‘reads’ branch. The reads branch switched all reads from Mongo/Titan to our new Cassandra table(s), removed all references to Mongo/Titan for the migrated component, and stopped all writes to them. In practice, this is the most complex branch to write because of minor variations in the way things come back from the different databases.
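
To give a feel for those "minor variations", here is a small sketch that maps records from both stores onto one shape and reports the fields that still disagree. The record shapes are assumptions made up for this example (an `_id` key and naive UTC timestamps on the old side, a set or null for tags and timezone-aware timestamps on the new side); the quoted post doesn't list the actual differences they ran into.

```python
from datetime import datetime, timezone


def normalize_old(doc):
    """Assumed old-store shape: '_id' key, list of tags, naive UTC datetime."""
    return {
        "id": str(doc["_id"]),
        "tags": sorted(doc.get("tags") or []),
        "created": doc["created"].replace(tzinfo=timezone.utc),
    }


def normalize_new(row):
    """Assumed new-store shape: 'id' key, set of tags or None, aware datetime."""
    return {
        "id": str(row["id"]),
        "tags": sorted(row.get("tags") or []),
        "created": row["created"].astimezone(timezone.utc),
    }


def diff_fields(doc, row):
    """Return the names of the fields that differ once both records are normalized."""
    a, b = normalize_old(doc), normalize_new(row)
    return sorted(k for k in a if a[k] != b[k])


doc = {"_id": 7, "tags": ["b", "a"], "created": datetime(2014, 1, 1, 12, 0)}
row = {"id": "7", "tags": {"a", "b"},
       "created": datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc)}
mismatches = diff_fields(doc, row)  # empty list when the two records agree
```

Running a comparison like this against a sample of records before switching reads is one way to find the variations early, rather than discovering them in production.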

There’s also a “keep-in-mind” list, to which I’d add:

  1. if your application doesn’t use some sort of data access layer, you’ll have a hard time completing this migration. It won’t be because you cannot identify the data access points, but because each of them will have its own expectations and its own way of dealing with exceptional cases;
  2. the more the data models of your source and target databases differ, the more difficult the migration will be; if possible, once you have the write path covered, enable the read paths one by one;
  3. do NOT disable the double write path for a while; there might be subtle but serious bugs that you haven’t discovered, or performance issues that you haven’t predicted or couldn’t predict. There might also be external processes/mini-apps that are rarely used and that you’ve totally forgotten about.
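
Points 2 and 3 can be combined in a small data access layer that keeps writing to both stores while reads are switched over one component at a time. What follows is a minimal sketch under my own assumptions: `ComponentDAL` and the `read_from_new` set are hypothetical names, and plain dicts stand in for the two databases.

```python
class ComponentDAL:
    """Data access layer for one component during the migration.

    Writes always go to both stores; reads come from the new store only
    for components whose name is listed in `read_from_new`.
    """

    def __init__(self, name, old, new, read_from_new):
        self.name = name
        self.old = old                      # dict standing in for the old database
        self.new = new                      # dict standing in for the new database
        self.read_from_new = read_from_new  # shared set of switched component names

    def save(self, key, value):
        # The double write stays on even after reads have moved,
        # so switching back remains possible.
        self.old[key] = value
        self.new[key] = value

    def load(self, key):
        store = self.new if self.name in self.read_from_new else self.old
        return store.get(key)


switched = set()  # no component reads from the new store yet
profiles = ComponentDAL("profiles", {}, {}, switched)
profiles.save("user:1", {"name": "Ann"})
switched.add("profiles")      # enable the new read path for this component only
switched.discard("profiles")  # and fall back just as easily if something looks off
```

Because the old store keeps receiving every write, turning a read path back off is just a flag change and needs no data repair.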

Original title and link: Migrating Databases With Zero Downtime (NoSQL database©myNoSQL)