☞ A simple notice about a short scheduled GitHub downtime resulted in a great dialog about how to perform zero downtime Redis upgrades. The following approaches were suggested:
- using Redis replication (i.e. SLAVEOF) to move existing data from the old Redis instance to the new one, with haproxy transparently routing client requests to the active instance (initially the existing version and, once replication completes, the new one):
- Leave the old redis version running on, say, redis:6379.
- Install and start a new redis on redis:6380 with a different dump file location.
- Execute SLAVEOF redis 6379 against redis:6380 and wait for the first SYNC to complete.
- Switch haproxy over to the new instance:
echo "enable server redis/redis-6380" | socat stdio unix-connect:/var/run/haproxy/admin.sock
echo "disable server redis/redis-6379" | socat stdio unix-connect:/var/run/haproxy/admin.sock
- Execute SLAVEOF NO ONE on redis:6380 to promote it to master.
- Execute SHUTDOWN on redis:6379.
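The steps above can be sketched in Python using redis-py. This is only an illustration of the sequence, not GitHub's actual tooling: the `migrate_redis` function and the `haproxy_cmd` callback are names of my own, and `haproxy_cmd` stands in for sending a line to haproxy's admin socket (as the socat commands above do).

```python
import time

def migrate_redis(old, new, old_host, old_port, haproxy_cmd):
    """Zero-downtime Redis upgrade sketch.

    `old` / `new` are redis-py clients for redis:6379 and redis:6380;
    `haproxy_cmd` sends one command line to haproxy's admin socket.
    """
    # 1. Make the new instance replicate from the old one.
    new.slaveof(old_host, old_port)

    # 2. Wait until the initial SYNC has completed.
    while new.info('replication').get('master_link_status') != 'up':
        time.sleep(0.5)

    # 3. Flip haproxy over to the new instance.
    haproxy_cmd('enable server redis/redis-6380')
    haproxy_cmd('disable server redis/redis-6379')

    # 4. Promote the new instance (redis-py issues SLAVEOF NO ONE
    #    when slaveof() is called with no arguments), stop the old one.
    new.slaveof()
    old.shutdown()
```

Note that clients connecting through haproxy never see the cutover; the only caveat is that writes arriving between steps 3 and 4 briefly hit a read-only slave, which is why keeping the switchover window short matters.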
- using virtual IPs instead of haproxy
A more generic idea shared in the comment thread was to architect the solution so that the overall system continues to work even if some subsystems are temporarily down. In CAP theorem terms, this translates to a system that remains AP (available and partition tolerant) at all times.
While writing this post I remembered reading about a ☞ solution based on node.js: a node.js proxy forwards requests until it is notified that the backend is going down, then temporarily holds incoming connections until it is notified that the backend is back up.
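The hold-while-down behavior can be sketched in a few lines, here in Python rather than node.js. This is a minimal single-request illustration of the idea, not the referenced solution: the `pausing_proxy` name, the one-request-per-connection model, and the use of a `threading.Event` as the up/down notification are all assumptions of mine.

```python
import socket
import threading

def pausing_proxy(listen_sock, backend_addr, resume, bufsize=4096):
    """Accept one client, hold its request while `resume` is cleared
    (backend down), then forward it and relay the backend's reply."""
    client, _ = listen_sock.accept()
    data = client.recv(bufsize)      # read the client's request
    resume.wait()                    # hold while the backend is down
    with socket.create_connection(backend_addr) as backend:
        backend.sendall(data)
        client.sendall(backend.recv(bufsize))
    client.close()
```

During an upgrade the operator would clear `resume` before taking the backend down and set it once the new backend is up; clients see added latency instead of connection errors.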
Any other ideas?