MySQL: All content tagged as MySQL in NoSQL databases and polyglot persistence
Pierre Bailet and Mathieu Poumeyrol of fotopedia (a French photo site) share their experience of operating a small MongoDB cluster since Sep.2009 compared to a MySQL cluster.
Some details about fotopedia:
- fotopedia is 100% on AWS
- Amazon RDS for MySQL
- 4 nodes MongoDB cluster
- 150mil. photo views
- no alter table
- background index creation
- data backup & restoration
- note: as far as I can tell MySQL is able to do the same
- replica sets
- hardware migration
- note: the same procedure can be used for MySQL
Before leaving you with the slides, here is an interesting accepted trade-off:
Quietly losing seconds of writes is preferable to:
- weekly minutes-long maintenance periods
- minutes-long unscheduled downtime and manual failover in case of hardware failures
- MySQL works well enough most of the time that it’s worth using. Twitter values stability over features so they’ve stayed with older releases.
- MySQL doesn’t work for ID generation and graph storage.
- MySQL is used for smaller datasets of < 1.5TB, which is the size of their RAID array, and as a backing store for larger datasets.
- Typical database server config: HP DL380, 72GB RAM, 24 disk RAID10. Good balance of memory and disk.
In my summary of the talk I’ve noted:
- Use MySQL when it works, something else when not - fortunately MySQL often does work
- MySQL is used by Twitter because it’s robust, replication works and it’s easy to use and run
- MySQL doesn’t work good for graphs, auto_increment, replication lag is a problem
- MySQL replication improvements like crash safe multi-threaded slave is what they need
But Twitter is also one of the most prominent use cases of polyglot persistence.While MySQL is an important piece of the Twitter architecture, it is not the only storage or data processing engine.
The following other data solutions get mentioned in Jeremy’s talk:
- Cassandra is used for high velocity writes, and lower velocity reads. The advantage is Cassandra can run on cheaper hardware than MySQL, it can expand easier, and they like schemaless design.
- Hadoop is used to process unstructured and large datasets, hundreds of billions of rows.
- Vertica is being used for analytics and large aggregations and joins so they don’t have to write MapReduce jobs.
Yet that’s not the whole story. Twitter is using Cassandra and Memcached for real-time URL fetchers and they also experimented with using Gizzard for Redis. After buying BackType, Twitter got and then open sourced Storm, a Hadoop-like real-time data processing tool. And who knows what’s in the Twitter labs right now.
I’m embedding below Jeremy Cole’s “Big and Small Data at @Twitter”:
Q: Copy MySQL to RAM as a poor man’s memcached replacement?
A: Use the the MEMORY storage engine on a read only slave to do your reads from, is exactly what you really want and a sane setup. Forget “dumping it to disk” (?!) or other strange things.
You can even put the slave as another instance on your existing server if you can’t afford to setup a dedicated slave, but properly tuning the MySQL parameters for mostly read workloads will bring a significant performance enhancement too!
Original title and link: MySQL MEMORY as Poor Man’s Memcached Replacement ( ©myNoSQL)
Q: Considering performance only, can a MySQL Cluster beat a custom data sharding MySQL solution?
A: I would say that MySQL Cluster could achieve higher throughput/host than sharded MySQL+InnoDB provided that :
- Queries are simple
- All data fits in-memory
In terms of latency, MySQL Cluster should have more stable latency than sharded MySQL. Actual latencies for purely in-memory data could be similar. As queries become more complex, and data is stored on disk, the performance comparison becomes more confusing.
Make sure you read the complete answer as it covers some more MySQL Sharding vs MySQL Cluster pros and cons.
Original title and link: MySQL Sharding vs MySQL Cluster ( ©myNoSQL)