replication: All content tagged as replication in NoSQL databases and polyglot persistence

Redis Partial Resynchronization PSYNC, Plus Philosophy, Trade-Offs, and Making Decisions When Creating Tools for People to Use

There are way, way too many things I’d want to quote from Salvatore’s post. They are about the philosophy of a strong product, they are about the trade-offs that go into engineering solid but friendly products, they are about making decisions and not allowing misconceptions or old bad experiences to have a bad influence on what you are building. You must read it.

Original title and link: Redis Partial Resynchronization PSYNC, Plus Philosophy, Trade-Offs, and Making Decisions When Creating Tools for People to Use (NoSQL database©myNoSQL)

via: http://antirez.com/news/47


MySQL Delayed Replication - Making a Slave Deliberately Lag Behind a Master

Tony Darnell explains the use cases for delayed replication, a feature available in MySQL 5.6, and how to configure it:

  1. Scenario #1 – To protect against user mistakes on the master. A DBA can roll back a delayed slave to the time just before the disaster.
  2. Scenario #2 – To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the slave. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging slave.
  3. Scenario #3 – To inspect what the database looked like long ago, without having to reload a backup. For example, if the delay is one week and the DBA needs to see what the database looked like before the last few days’ worth of development, the delayed slave can be inspected.

The first time I heard about intentional delayed replication was a couple of months ago, from an ex-DBA. My first thought was: “Are you kidding me? Everyone in the database world tries to make replication as fast as possible and you want delays???” After a few seconds of what probably looked like stupid silence, it clicked: there could be real use cases for this weird feature. He also walked me through scenarios similar to the ones above.
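
For the record, the configuration itself is tiny: a single MASTER_DELAY option in a CHANGE MASTER TO statement run on the slave. Here is a minimal sketch using MySQL Connector/Python; the host, credentials, and the one-hour delay are placeholders, not anything taken from Tony Darnell’s post:

```python
import mysql.connector

# Connect to the *slave* that should lag behind the master (placeholder credentials).
conn = mysql.connector.connect(host="slave.example.com", user="repl_admin", password="secret")
cur = conn.cursor()

cur.execute("STOP SLAVE")                            # replication must be stopped to change its settings
cur.execute("CHANGE MASTER TO MASTER_DELAY = 3600")  # apply events one hour after the master executed them
cur.execute("START SLAVE")

cur.close()
conn.close()
```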


Original title and link: MySQL Delayed Replication - Making a Slave Deliberately Lag Behind a Master (NoSQL database©myNoSQL)

via: http://scriptingmysql.wordpress.com/2013/01/02/mysql-5-6-delayed-replication-making-a-slave-deliberately-lag-behind-a-master/


Redis Partial Resyncs and Synchronous Replication

Speaking of what I think is a wrong decision in MongoDB replication, I found the following in a post about Redis replication:

If a slave lost the connection, it connects again, see if the master RUNID is the same, and asks to continue from a given offset. If this is possible, we continue, nothing is lost, and a full resynchronization is not needed. Otherwise if the offset is about data we no longer have in the backlog, we full resync.

The last part made me think that Redis replication behaves exactly like the scenario described by the OP who lost data on the secondary node after setting up replication. But it is the first part that makes the difference: the slave first checks whether the master is the same.
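
To make that decision concrete, here is a rough Python sketch of the master-side logic the quote describes. The names (run_id, backlog, offsets) are illustrative only; this is not Redis’s actual implementation:

```python
class ReplicationState:
    """Toy model of a master's replication backlog (illustrative, not Redis internals)."""

    def __init__(self, run_id: str, backlog_start: int, backlog: bytes):
        self.run_id = run_id                # identifies this particular run of the master
        self.backlog_start = backlog_start  # replication offset of the first byte still buffered
        self.backlog = backlog              # recent slice of the replication stream

    def handle_psync(self, slave_run_id: str, slave_offset: int):
        same_master = slave_run_id == self.run_id
        still_buffered = (
            self.backlog_start <= slave_offset <= self.backlog_start + len(self.backlog)
        )
        if same_master and still_buffered:
            # Partial resync: send only the bytes the slave missed while disconnected.
            return "+CONTINUE", self.backlog[slave_offset - self.backlog_start:]
        # Different master run, or the offset already fell out of the backlog: full resync.
        return "+FULLRESYNC", None
```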

Original title and link: Redis Partial Resyncs and Synchronous Replication (NoSQL database©myNoSQL)

via: http://antirez.com/news/45


Oops Replication - MongoDB Secondary Node Data Loss

From SO:

I have two mongod instances without replication each having same collection name but different data.Now initialized replication between them.Secondary machine copies all data from primary machine and looses it’s original data.Can I recover original data present in secondary machine ?

Leaving aside the typos in the question (and any resentment they might generate), would you consider this the expected behavior? To me, this sounds like a conflict in the setup, and the database should raise an error.

Original title and link: Oops Replication - MongoDB Secondary Node Data Loss (NoSQL database©myNoSQL)


An Overview of RavenDB Replication

A good overview of the main characteristics of RavenDB replication, by John Bennett:

  1. one-way
  2. push-based
  3. asynchronous
  4. secure
  5. batched

Original title and link: An Overview of RavenDB Replication (NoSQL database©myNoSQL)

via: http://jtbennett.com/blog/2013/01/ravendb-replication-an-overview


Benchmarking MySQL Replication With Multi-Threaded Slaves

Mat Keep about the multi-threaded slave replication available in MySQL 5.6:

The multi-threaded slave splits processing between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments.
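
The knob behind this in MySQL 5.6 is the slave_parallel_workers variable. A minimal sketch of turning it on from Python; the host, credentials, and worker count are placeholders:

```python
import mysql.connector

# Placeholder connection to the slave whose SQL thread we want to parallelize.
conn = mysql.connector.connect(host="slave.example.com", user="repl_admin", password="secret")
cur = conn.cursor()

cur.execute("STOP SLAVE SQL_THREAD")
cur.execute("SET GLOBAL slave_parallel_workers = 4")  # up to 4 applier threads, split by schema
cur.execute("START SLAVE SQL_THREAD")

cur.close()
conn.close()
```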

Original title and link: Benchmarking MySQL Replication With Multi-Threaded Slaves (NoSQL database©myNoSQL)

via: https://blogs.oracle.com/MySQL/entry/benchmarking_mysql_replication_with_multi


Solr Index Replication at Etsy: From HTTP to BitTorrent

Etsy went from using HTTP to BitTorrent for replicating Solr indexes:

By integrating BitTorrent protocol into Solr we could replace HTTP replication. BitTorrent supports updating and continuation of downloads, which works well for incremental index updates. When we use BitTorrent for replication, all of the slave servers seed index files allowing us to bring up new slaves (or update stale slaves) very quickly.

[…]

Our Ops team started experimenting with a BitTorrent package herd, which sits on top of BitTornado. Using herd they transferred our largest search index in 15 minutes. They spent 8 hours tweaking all the variables and making the transfer faster and faster. Using pigz for compression and herd for transfer, they cut the replication time for the biggest index from 60 minutes to just 6 minutes!

Make sure you don’t miss the part where they were experimenting with multicast UDP rsync.

Original title and link: Solr Index Replication at Etsy: From HTTP to BitTorrent (NoSQL database©myNoSQL)

via: http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication/


Podcast: MySQL Cluster News: Performance Improvements, New NoSQL Access

Mat Keep and Bernd Ocklin discuss what’s new in the second milestone release of MySQL Cluster 7.2: performance improvements, new NoSQL access (memcached protocol), and cross-data-center scalability. Download the mp3.

Original title and link: Podcast: MySQL Cluster News: Performance Improvements, New NoSQL Access (NoSQL database©myNoSQL)


MongoDB Journaling and Replication Interaction

How do we know our data won’t be rolled back? The answer is that a write is truly committed in a replica set when it has been written at a majority of set members. We can confirm this with the getLastError command. For example, if our write has made it to the journal on two out of three total set members, we know the data is committed even if nodes fail in a cascading sequence, and even if a minority of nodes are permanently lost.

Journaling was added in MongoDB 1.8 for crash safety and recovery. But the way I read this post about how MongoDB journaling and replication interact makes me think that MongoDB data is not safe unless getLastError is always used. And that approach both slows MongoDB down and might lead to unavailability scenarios.
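
For context, here is roughly what requesting that acknowledgement looks like from a client. This sketch uses PyMongo’s write-concern API, which later replaced the explicit getLastError call the post talks about; the connection string and collection names are placeholders:

```python
from pymongo import MongoClient, WriteConcern

# Placeholder replica set connection string.
client = MongoClient("mongodb://node1,node2,node3/?replicaSet=rs0")

orders = client.shop.get_collection(
    "orders",
    # Wait until the write has been journaled on a majority of set members,
    # i.e. the point at which the post considers the data "truly committed".
    write_concern=WriteConcern(w="majority", j=True),
)

orders.insert_one({"sku": "abc", "qty": 1})  # raises if the write cannot be acknowledged
```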

Original title and link: MongoDB Journaling and Replication Interaction (NoSQL database©myNoSQL)

via: http://blog.mongodb.org/post/6254464258/how-journaling-and-replication-interact


RavenDB Filtered Replication

RavenDB gets filtered replication, like the one CouchDB has had for a while:

One of the features I asked about was the ability to filter out parts of the namespace for replication — instead of the ‘all or nothing’ approach used by default. […] Basically — my aim is to allow the developer to set replication filters — so only a part of a namespace is replicated — rather than the whole db.

For now this feature is available as a forked project on GitHub.

Original title and link: RavenDB Filtered Replication (NoSQL databases © myNoSQL)

via: http://blogs.sonatribe.com/wayne/2011/06/07/replication-predicates-in-ravendb/


CouchDB Basic Disaster-Recovery

Iris Couch, the CouchDB hosting spin-off, shows how to set up replication from a hosted CouchDB:

We want to perform one simple task to completely pull the plug, jumping to a different CouchDB system. Thus these are the main objectives:

  • Have a duplicate of the Iris couch, called B-Couch.
  • B-Couch syncs from Iris couch automatically or regularly.
  • Be able to activate B-Couch with a single domain name change.

Thanks to CouchDB’s peer-to-peer replication, setting up online hot or cold copies is mostly a matter of your imagination (and needs).
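
As a rough sketch of how little that takes, this is what kicking off the continuous pull replication could look like against CouchDB’s _replicate endpoint; all host, database, and credential names below are made up:

```python
import requests

# B-Couch continuously pulls changes from the hosted Iris Couch database (placeholder names).
resp = requests.post(
    "http://b-couch.example.com:5984/_replicate",
    json={
        "source": "https://myaccount.iriscouch.com/mydb",  # the hosted database to copy from
        "target": "mydb",
        "continuous": True,     # keep replicating as new changes arrive
        "create_target": True,  # create the local copy if it does not exist yet
    },
    auth=("admin", "secret"),
)
resp.raise_for_status()
print(resp.json())
```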

Original title and link: CouchDB Basic Disaster-Recovery (NoSQL databases © myNoSQL)

via: http://www.iriscouch.com/blog/2011/05/how-to-bail-out-on-iris-couch


MongoDB Replica Sets and Sharding Question List

I’m seeing many questions asked[1] about MongoDB replication and sharding, so I thought it might be a good idea to gather the most interesting ones and submit them to the MongoDB/10gen people to get some answers. So, after reading the documentation on MongoDB replica sets[2] and MongoDB sharding[3], what questions would you want answered?

Please post a comment with your questions and I’ll add them to the list. After accumulating a couple of good questions, I’ll forward them to the MongoDB/10gen people for answers.

To start with, here are a couple from me:

  1. What may cause a replica set member to lag behind the master or become “disconnected” (other than network partitions)?
  2. How would one determine the best size for sharding “chunks”?
  3. Is an unbalanced cluster possible, and what would lead to one?

What are yours? Feel free to forward this to any friends using MongoDB who are looking into replica sets and sharding.

Note: I’m not planning to create yet another forum or Q&A site, so once we get some answers I’ll make sure they are published in a place where everyone interested can find them easily.

Note: If you are interested in getting answers about other NoSQL databases, please let me know and I’ll create the initial list.


  1. Not only on the MongoDB group, but also on Quora.com, StackOverflow.com, and blogs.
  2. MongoDB replica sets resources.
  3. MongoDB sharding resources.

Original title and link: MongoDB Replica Sets and Sharding Question List (NoSQL databases © myNoSQL)