Big and Small Data at Twitter: MySQL CE 2011
Twitter DBA Lead Jeremy Cole’s talk about MySQL at Twitter, from MySQL CE 2011:
Roland Bouman had some interesting notes (nb: actually tweets) from the talk:
-
115 million tweets a day, 1 billion tweets a week, about 50,000 new accounts a day
-
A random server: uptime 212 days, 127 billion questions (6943/s), rows read: 1.36 million/s
-
Use MySQL when it works, something else when not - fortunately MySQL often does work
-
MySQL is used by Twitter because it’s robust, replication works, and it’s easy to use and run
-
MySQL doesn’t work well for graphs or auto_increment; replication lag is a problem
-
MySQL replication improvements such as crash-safe, multi-threaded slaves are exactly what they need
-
Twitter open sourced Snowflake (ID generation system) and Gizzard (distributed data storage)
-
Use soft launches: new code is launched in a disabled state, then turned up slowly and backed down if needed
-
Gizzard builds on MySQL/InnoDB and handles sharding, replication, and job scheduling
-
Twitter also uses Cassandra for some projects: high-velocity writes, schemaless design
-
Twitter uses Hadoop for analyzing extremely large datasets: 10 to 100 billion rows (HTTP logs)
-
Twitter also uses Vertica for analysis of 100 million to 10 billion rows. It runs 100x faster than MySQL
-
MySQL’s happy place: <= 1.5 TB datasets, archive store for larger sets.
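The soft-launch note is the easiest one in the list to picture in code. What follows is only a minimal sketch of the idea, not Twitter's actual tooling: the `SoftLaunch` class, its method names, and the hash-based bucketing are all assumptions made up for illustration. New code ships with the gate at 0%, the percentage is turned up slowly, and it can be backed down to 0% at any time.

```python
import hashlib


class SoftLaunch:
    """Hypothetical percentage-based feature gate (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.percent = 0  # new code is launched in a disabled state

    def set_percent(self, percent: int) -> None:
        # Clamp to 0..100 so the gate can be turned up or backed down safely.
        self.percent = max(0, min(100, percent))

    def _bucket(self, user_id: int) -> int:
        # Stable bucket in 0..99 derived from the feature name and user id,
        # so a given user keeps the same bucket as the percentage changes.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def enabled_for(self, user_id: int) -> bool:
        return self._bucket(user_id) < self.percent


gate = SoftLaunch("new_timeline")  # hypothetical feature name
gate.set_percent(10)               # turn up slowly
gate.set_percent(0)                # back down if needed
```

Because the bucket is derived from a hash rather than a random draw, any user who sees the feature at 10% still sees it at 50%, which keeps the ramp-up predictable.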
Original title and link: Big and Small Data at Twitter: MySQL CE 2011 (NoSQL databases © myNoSQL)