Etsy moved from HTTP to BitTorrent for replicating Solr indexes:
By integrating BitTorrent protocol into Solr we could replace HTTP replication. BitTorrent supports updating and continuation of downloads, which works well for incremental index updates. When we use BitTorrent for replication, all of the slave servers seed index files allowing us to bring up new slaves (or update stale slaves) very quickly.
Our Ops team started experimenting with a BitTorrent package herd, which sits on top of BitTornado. Using herd they transferred our largest search index in 15 minutes. They spent 8 hours tweaking all the variables and making the transfer faster and faster. Using pigz for compression and herd for transfer, they cut the replication time for the biggest index from 60 minutes to just 6 minutes!
Make sure you don’t miss the part where they were experimenting with multicast UDP rsync.
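The quoted point that BitTorrent "supports updating and continuation of downloads" comes down to per-piece hashing: content is split into fixed-size pieces, each with a published SHA-1 digest, so a client only needs to fetch the pieces it is missing or that no longer match. Below is a minimal sketch of that idea in Python. It is not Etsy's code and does not use herd or BitTornado; the function names and the tiny piece size are assumptions chosen for illustration.

```python
import hashlib

# Assumption: a 4-byte piece size keeps the example readable.
# Real torrents use far larger pieces.
PIECE_SIZE = 4


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and return each piece's SHA-1 digest."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def pieces_to_fetch(local: bytes, wanted: list[bytes],
                    piece_size: int = PIECE_SIZE) -> list[int]:
    """Return the indexes of pieces that are missing locally or do not match."""
    have = piece_hashes(local, piece_size)
    return [
        idx for idx, digest in enumerate(wanted)
        if idx >= len(have) or have[idx] != digest
    ]


if __name__ == "__main__":
    # Hypothetical index content published by the master.
    published = b"AAAABBBBCCCCDD"
    wanted = piece_hashes(published)

    # An interrupted download: the first two pieces are already complete.
    print(pieces_to_fetch(b"AAAABBBBCC", wanted))

    # A stale copy in which only the second piece changed.
    print(pieces_to_fetch(b"AAAAXXXXCCCCDD", wanted))
```

In both cases only the mismatched or missing pieces need to be transferred, which is the property that lets a resumed or stale copy catch up without downloading everything again.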
Original title and link: Solr Index Replication at Etsy: From HTTP to BitTorrent (©myNoSQL)
Ars Technica’s Sean Gallagher summarizes a presentation given at the Surge conference on the evolution of Etsy’s architecture: from a centralized solution built on PostgreSQL stored procedures, through a failed attempt at a service-oriented-like architecture, to sharded MySQL:
And the team started to shift feature by feature away from a semi-monolithic Postgres back-end to sharded MySQL databases. “It’s a battle-tested approach,” Snyder said. “Flickr is using it on an enormous scale. It scales horizontally, basically, to near infinity, and there’s no single point of failure—it’s all master to master replication.”
Original title and link: The Story of Etsy’s Architecture (©myNoSQL)