The interview Dmitry Sotnikov had with Monty Widenius was published in so many places that I had a hard time deciding which one to link to. Anyway, there are a couple of comments and corrections that I’d like to suggest:
The whole thing with the “new NoSQL movement” started with a blog post from a Twitter employee that said MySQL was not good enough and they needed “something better,” like Cassandra.
That’s not quite correct. The “NoSQL movement” debuted in 2009 when the guys from Last.fm organized an event about “open source, distributed, non relational databases” where they invited people from companies like Cloudera, LinkedIn, StumbleUpon, etc. to talk about the solutions they were building to respond to their platforms’ special requirements. But as papers like Bigtable: A Distributed Storage System for Structured Data and Dynamo: Amazon’s Highly Available Key-value Store prove, NoSQL solutions were in production well before 2009.
I can’t find the original article, but I did find a follow up a bit later where it was said MySQL would be dropped for Cassandra.
I can help find that article, as it was posted on this blog: Cassandra @ Twitter: An Interview with Ryan King
The main reason Twitter had problems with MySQL back then, was that they were using it incorrectly.
I don’t think there are many examples in the history of software where a private platform benefited from more scaling advice than Twitter. Judging by how many solutions have been suggested, a possible Twitter IPO will be at risk of IP lawsuits.
The current state is that now, three years later, Twitter is still using MySQL as their main storage for tweets. Cassandra was, in the end, not able to replace MySQL.
That’s true. What’s also true is that at that time Cassandra was at version 0.9 and that having to invest in a new database was considered riskier than investing in more hardware and hiring MySQL experts.
The main reason NoSQL became popular is that, in contrast to SQL, you can start using it without having to design anything. This makes it easier to start with NoSQL, but you pay for this later when you find that you don’t have control of your data (if you are not very careful).
I assume that this is how a vendor would present flexible data models as a drawback. It also feeds one of the most dangerous misconceptions about NoSQL, namely that NoSQL databases require no data modeling. The reality is that, most of the time, using a NoSQL database requires a lot more thinking about and analysis of the data models and data access patterns. There are no blueprints, no normal forms, and no ORMs to hide everything away.
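To make the point about access patterns concrete, here is a minimal sketch of what query-driven modeling tends to look like. It is not tied to any particular database: plain Python dictionaries stand in for key-value "tables", and the names (`tweets_by_author`, `timeline_by_reader`, `followers`, `post_tweet`, `read_timeline`) are hypothetical ones I made up for illustration. The idea is that, without joins, each query you need gets its own structure, and you have to decide that up front.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two key-value "tables".
# Each one is keyed by exactly the value a query will look up.
tweets_by_author = defaultdict(list)    # author -> [(ts, text), ...]
timeline_by_reader = defaultdict(list)  # reader -> [(ts, author, text), ...]

# Assumed follower graph for the example: author -> list of readers.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}


def post_tweet(author, ts, text):
    """Write the tweet once per access pattern (denormalized fan-out on write)."""
    tweets_by_author[author].append((ts, text))
    for reader in followers.get(author, []):
        timeline_by_reader[reader].append((ts, author, text))


def read_timeline(reader, limit=10):
    """Single-key lookup, newest first; the write path already did the 'join'."""
    return sorted(timeline_by_reader[reader], reverse=True)[:limit]


post_tweet("alice", 1, "hello")
post_tweet("bob", 2, "hi")
print(read_timeline("carol"))
```

The trade-off is visible even at this size: reads are a single lookup, but every new query (say, tweets by hashtag) means another structure to design, populate, and keep consistent.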
As soon as data can’t fit into memory, SQL generally outperforms NoSQL.
Where’s the proof? According to the data I have, there’s no comparison between, say, Cassandra and MySQL.
For anything else, you have to write a program and it’s very hard to beat a SQL optimizer for complex things, especially things that are automatically generated based on user requests (required for most web sites).
That’s true. Except when:
- most people don’t know how to write those SQL queries (search StackOverflow for a random sample of what I mean)
- getting everything out of your database requires using vendor-specific solutions
- there are those moments when the optimizer decides to change the execution plan in a way that brings down your whole service
The problem with Hadoop is that there is no known business model around it that ensures that the investors will get back 10X money that they expect. Because of that, I have a hard time understanding how Cloudera can survive in the long run.
Everything else in the interview is spot on.
Original title and link: Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB (©myNoSQL)