


mysql: All content tagged as mysql in NoSQL databases and polyglot persistence

Scaling Big Data Mining Infrastructure at Twitter

I almost always enjoy the lessons-learned-style presentations from Twitter’s people. The slides below, by Jimmy Lin and Dmitriy Ryaboy, were presented at Hadoop Summit. Besides the technical and practical details, there are two things that I really like:

DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”

and then the reality check:

  1. Your boss says something vague
  2. You think very hard on how to move the needle
  3. Where’s the data?
  4. What’s in this dataset?
  5. What’s all the f#$#$ crap in the data?
  6. Clean the data
  7. Run some off-the-shelf data mining algorithm
  8. Productionize, act on the insight
  9. Rinse, repeat


Memcached vs InnoDB Memcached in MySQL 5.6

Some numbers from comparing Memcached with InnoDB Memcached in MySQL 5.6:

Keep in mind that the entire data set fits into the buffer pool, so there are no reads from disk. However, there is write activity stemming from the fact that this is using InnoDB under the hood (redo logs, etc).

There is a significant impact on speed, so deciding which solution to use comes down to weighing the cost and complexity of maintaining another tool, the cost of Memcached warmup, and the performance drop of using InnoDB Memcached.
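To make the warmup cost concrete, here is a minimal sketch of a standalone cache sitting in front of a database. Everything in it is an assumption for illustration: the `StandaloneCache` class, the dict standing in for the database, and the key names are hypothetical, and it measures nothing about real Memcached or InnoDB. It only shows that a separate in-memory cache has to be refilled, one miss at a time, after every restart.

```python
class StandaloneCache:
    """Hypothetical in-memory cache in front of a database (a plain dict here)."""

    def __init__(self, database):
        self.database = database  # authoritative store
        self.entries = {}         # volatile cache contents, lost on restart
        self.misses = 0           # reads that fell through to the database

    def get(self, key):
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = self.database[key]
        return self.entries[key]

    def restart(self):
        # A restarted cache process comes back empty.
        self.entries.clear()


database = {f"user:{i}": f"profile-{i}" for i in range(1000)}
cache = StandaloneCache(database)

# First pass: every read is a miss (warmup).
for key in database:
    cache.get(key)
warmup_misses = cache.misses

# Second pass: the cache is warm, so no read reaches the database.
for key in database:
    cache.get(key)
steady_misses = cache.misses - warmup_misses

# After a restart the warmup cost is paid again.
cache.restart()
for key in database:
    cache.get(key)
misses_after_restart = cache.misses - warmup_misses - steady_misses
```

The count of misses after a restart is the part of the bill that a separate cache adds on top of the extra tool to operate.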

Original title and link: Memcached vs InnoDB Memcached in MySQL 5.6 (NoSQL database©myNoSQL)


10gen’s MongoDB Following the Steps of MySQL

10gen has never been shy about their plan: replacing MySQL. That’s a bold goal considering Oracle is now behind MySQL. But this could also make things a bit easier for 10gen.

Anyway, what made me write this separate post is the realization of how closely 10gen is following the MySQL path:

  1. release early and incomplete, then enhance over time
  2. position the product as developer-friendly and fast
  3. introduce an enterprise edition once your adoption has surpassed that of your immediate competitors.

I guess I already know how it’ll end: $2 billion acquisition from a company that gets acquired by Oracle.

While the official announcement of MongoDB 2.4 mentioned the “MongoDB Enterprise” version just in passing, other websites didn’t leave this aspect aside. Actually, it’s what got emphasized about today’s announcement. In case you wonder what’s in 10gen’s enterprise box: Kerberos-based security and an on-premise version of the MongoDB Monitoring Service.

The only question I have now is how soon Oracle will start looking into acquiring 10gen. Or how soon it will dedicate marketing and sales resources to directly address 10gen.

Original title and link: 10gen’s MongoDB Following the Steps of MySQL (NoSQL database©myNoSQL)

Cage Match: MySQL vs NoSQL vs Postgres

A post by Brian Aker about the state of MySQL, Postgres and NoSQL databases.

I had a couple of comments and these evolved into a long rant.

MySQL became less interesting once it was acquired […]

I’ve never been very sure what metric is used to measure how interesting a product is, assuming such a metric exists. Contrary to some suggestions I’m reading, I haven’t seen stories of people moving away from MySQL because Oracle acquired it. The exceptions are Fedora and OpenSUSE replacing MySQL with MariaDB, and this is due to very specific issues (no security information, no access to regression tests).

the number of Postgres deployments is greater then what all of the NoSQL market combined adds up to

Comparing 15 years of PostgreSQL with 3 years of NoSQL isn’t going to give meaningful results (for a similarly unbalanced comparison, try Oracle vs PostgreSQL). I’m not aware of any database that captured a significant market share in the first 3 years of its existence. Except MySQL. Not Postgres.

Would a document model really matter if schemas could be altered online?

Yes, it would definitely remain relevant. Schema flexibility is not only about updating it, but also about the types allowed. PostgreSQL has indeed added support for arrays and JSON. I see this as a confirmation of what’s happening in the NoSQL space and also about the future of storage engines.

no new language has emerged from the NoSQL market that has any size-able adoption

MongoDB’s query language and the aggregation framework are used by a lot of people. It’s probably not the ideal query language and it comes in two different flavors, but it’s there and it’ll most probably evolve. Admitting my bias, I could also point to RethinkDB’s data manipulation language as an example of something that is probably on par with SQL, without SQL’s hidden corner cases. Indeed, none of these can come close to the adoption SQL has acquired in its 30 years of existence.

The bottom line is that I expect bridges to be built between relational databases and NoSQL databases, with each side adopting those features that are useful to its users. I also expect that the “relational databases are crap” vs “NoSQL databases are crap” debate will slowly go away, as people realize that the data space is not a zero-sum game. Vendors will be the last to give up this fight, but customers have a lot of power in making this happen.

Original title and link: Cage Match: MySQL vs NoSQL vs Postgres (NoSQL database©myNoSQL)


MySQL Reference Architectures for Massively Scalable Web Infrastructure

A 25-page whitepaper published by Oracle describing a set of best practices for MySQL deployments to accommodate scenarios from small to very large, according to the following criteria:

  1. queries/second
  2. transactions/second
  3. concurrent read users
  4. concurrent write users
  5. database sizes for 4 types of use cases (sessions, eCommerce, analytics, content management)

Downloading the paper requires registration, but it’s worth reading and thinking about the suggested architectures (even if in a few spots it pushes for the commercial tools offered by Oracle).

Original title and link: MySQL Reference Architectures for Massively Scalable Web Infrastructure (NoSQL database©myNoSQL)

MySQL 5.6 vs. MariaDB 10.0

A post on the SkySQL blog comparing the recently released version of MySQL with MariaDB:

MySQL vs MariaDB

✚ MariaDB 10.0.0 is still alpha with some of the features still under development.

OpenSUSE and Fedora plan to replace MySQL with MariaDB in their corresponding distributions, but that’s not because of the technical capabilities of MySQL.

Original title and link: MySQL 5.6 vs. MariaDB 10.0 (NoSQL database©myNoSQL)


NoSQL on MySQL: Stating the Obvious

Matthew Aslett on Couchbase’s and DataStax’s reactions to Oracle’s announcement of MySQL’s support for a NoSQL API:

Sure, Couchbase and DataStax laid it on a bit thick, but these are corporate blog posts – it goes with the territory.

I’ve already linked to and commented on these: Couchbase’s reaction and DataStax’s reaction. What I didn’t know (more accurately I should probably write “I hoped”) is that these sorts of reactions come with the “corporate” badge. But I’ll keep my hope considering the exhaustive list of reactions from other NoSQL companies.

Original title and link: NoSQL on MySQL: Stating the Obvious (NoSQL database©myNoSQL)


DataStax's Reaction to MySQL 5.6: Oracle’s MySQL Misses the NoSQL Mark

Jonathan Ellis, in a post about MySQL 5.6, on how Oracle got NoSQL all wrong, considering that NoSQL is, in this exact order, about scaling, continuous availability, flexibility, performance, and queryability:

The big news for MySQL 5.6 was the inclusion of “NoSQL” features in the form of a memcached api for get and put operations.

In cases like this, it’s tough to tell whether Oracle got this so wrong deliberately to sow confusion in the market, or because they really think that’s what NoSQL is about.

I know Jonathan Ellis has always had very strong opinions about the technical superiority of Cassandra, and Cassandra is indeed a very solid solution, but I’m always reluctant to call a competitor stupid or to use the myopic argument “if I’m good at X and suck at Y, then what everyone is looking for is only X”.

Original title and link: DataStax’s Reaction to MySQL 5.6: Oracle’s MySQL Misses the NoSQL Mark (NoSQL database©myNoSQL)


Reactions to MySQL 5.6: Couchbase

Bob Wiederhold (Couchbase CEO) about MySQL 5.6, their use of the NoSQL term, and the PR message touting the new version as the solution “combining the best of both worlds”:

What we see is a whole new wave of applications that have very different requirements than applications had just a few years ago. More often than not they are cloud-based, need to support a huge and dynamically changing number of users, need to store huge amounts of data, and need a highly flexible data model that allows them to adjust to rapidly changing data capture requirements and process lots of semi-structured and unstructured data. The fundamentally different architectural decisions embedded in NoSQL technologies – along with the easy scalability, consistently high performance, and flexible data model advantages (along with all the other tradeoffs) NoSQL provides – are turning out to be a better fit for an increasing number of these applications.

That doesn’t mean MySQL (or relational databases) will go away or won’t play a significant role in the database industry in the future.

Bob Wiederhold is also interested in how Oracle positions their products in terms of NoSQL:

As a side note it’s curious that the MySQL team seems out of step with other parts of Oracle. While the MySQL team seems to be convinced MySQL can do it all, Oracle’s NoSQL team seems to feel differently and is busily trying to catch up to NoSQL leaders like Couchbase, MongoDB, and Cassandra with their own NoSQL product. If relational technology is a one size fits all technology, why is Oracle itself making such a big investment in developing its own NoSQL product?

My supposition, expressed in the post MySQL 5.6 - What’s New, is that NoSQL is just a critical checkbox for the marketing and sales departments. Oracle NoSQL database and its precursor Berkeley DB seem to silently live inside the giant.

Original title and link: Reactions to MySQL 5.6: Couchbase (NoSQL database©myNoSQL)


MySQL 5.6 - What’s New

I’ve finally had the time to go through the release notes and documentation of the recent release of MySQL 5.6. My first thoughts when skimming over the announcement were:

  1. why is online DDL support so low on the list?
  2. why is so much of the announcement about performance?
  3. how is Oracle going to position the Memcached-based access to InnoDB considering their other key-value database Oracle NoSQL database?

Here’s the opening part of the “DBA and Developer Guide to MySQL 5.6”:

At a glance, MySQL 5.6 is simply a better MySQL with improvements that enhance every functional area of the database kernel, including:

  • Better Performance and Scalability
    • Improved InnoDB storage engine for better transactional throughput
    • Improved Optimizer for better query execution times and diagnostics
  • Better Application Availability with Online DDL/Schema changes
  • Better Developer Agility with NoSQL Access with Memcached API to InnoDB
  • Improved Replication for high performance, self-healing distributed deployments
  • Improved Performance Schema for better instrumentation
  • Improved Security for worry-free application deployments
  • And other Important Enhancements

Almost half of the document focuses on the performance improvements in InnoDB. If this is the part that interests you, I strongly encourage you to read the doc as my notes about this part are very short:

  • InnoDB received a lot of improvements in how it handles threads and locks
  • this will allow MySQL 5.6 to work more efficiently on beefier machines with over 24 cores. The curve of TPS against CPU threads looks almost linear.
  • the transactional throughput graph shows improvements, but the shape suggests that MySQL 5.6 tops out at around 96 concurrent connections
  • SSDs are mentioned, but after digging a bit deeper, it’s difficult to say how much of a difference these changes make.

The next section covers online DDL/schema changes. To my surprise, it’s only a paragraph long, while I was expecting more details considering how many complaints I’ve heard about this in the past and how advanced PostgreSQL is. There’s indeed another document, “Overview of Online DDL”, that provides more details:

Overview of Online DDL

Basically, starting with this version, many DDL operations do allow concurrent data access, but many of the operations remain very expensive (some requiring copying all data row by row). Better, but not awesome.
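To see why a copy-based schema change is expensive, here is a small sketch of the general technique: build a table with the new shape, copy every row into it, then swap the names. It uses Python’s built-in `sqlite3` module purely as a stand-in; the `users` table, its columns, and the `rebuild_with_new_column` helper are hypothetical, and this is not how MySQL implements its DDL internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user-{i}",) for i in range(1000)],
)
conn.commit()


def rebuild_with_new_column(conn):
    """Copy-based schema change: new table, row-by-row copy, then rename."""
    rows_copied = 0
    with conn:  # commit on success, roll back if anything fails
        conn.execute(
            "CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
        )
        # The work grows with the table size: every existing row is rewritten.
        for row_id, name in conn.execute("SELECT id, name FROM users").fetchall():
            conn.execute(
                "INSERT INTO users_new (id, name, email) VALUES (?, ?, NULL)",
                (row_id, name),
            )
            rows_copied += 1
        conn.execute("DROP TABLE users")
        conn.execute("ALTER TABLE users_new RENAME TO users")
    return rows_copied


rows_copied = rebuild_with_new_column(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

Adding one nullable column here touches every row in the table, which is the cost that remains for the operations that still need a full copy.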

The next section talks about the Memcached-based API for accessing InnoDB data, basically a mechanism offering key-value access that bypasses the SQL layer. I couldn’t find a direct answer to my question “how is Oracle positioning this solution compared to Oracle NoSQL database”. Plus the use of the NoSQL term feels weird: “NoSQL access to InnoDB”, “the new NoSQL API for InnoDB”, “NoSQL benchmarking”. I wouldn’t go as far as to say that Oracle’s marketing is trying to trivialize the term NoSQL, but it definitely feels like it was one of the top checkboxes that the department had to check.
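For a sense of what key-value access without SQL looks like on the wire, here is a sketch of the `set` and `get` commands of the memcached text protocol, which is what a memcached client would send. The helper names and the `user:42` key are hypothetical; how the MySQL plugin maps keys to InnoDB rows is configuration that this sketch does not cover, and nothing here opens a connection.

```python
def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol request: set <key> <flags> <exptime> <bytes>."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get_command(key: str) -> bytes:
    """Build a memcached text-protocol request: get <key>."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(response: bytes):
    """Parse a single-key get response; returns the value, or None on a miss.

    A hit looks like: VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    A miss is just:   END\r\n
    """
    if response == b"END\r\n":
        return None
    header, rest = response.split(b"\r\n", 1)
    _, _key, _flags, length = header.split(b" ")
    return rest[: int(length)]


set_request = build_set_command("user:42", b"alice")
get_request = build_get_command("user:42")
value = parse_get_response(b"VALUE user:42 0 5\r\nalice\r\nEND\r\n")
```

There is no query to parse or plan on this path: the client names a key and gets bytes back, which is the whole point of skipping the SQL layer.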

The last part I was interested in (based on my past experience of completely random and unexplained replication failures) was about replication improvements. I didn’t get much out of this document and I’ll have to read the “MySQL replication: High availability - building a self-healing replication topology” whitepaper:

  • global transaction identifiers: “enable replication transactional integrity to be tracked through a replication master/slave topology”
  • a new set of Python utilities to use global transaction identifiers
  • schema level multi-threaded slave replication
  • new row-based replication
  • new crash-safe slaves: “stores Binlog positional data within tables so slaves can automatically roll back replication to the last committed event before failure, and resume replication without administrator intervention” (nb: this seems to be the issue I’ve seen before when being responsible for a production master-slave x 2 setup).
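The crash-safe idea in the last item can be sketched in a few lines: if the replication position is stored in a table and updated in the same transaction as the replicated change, a crash can never leave the two out of sync. The sketch below uses Python’s built-in `sqlite3` module as a stand-in; the `accounts` and `replication_position` tables and the `apply_event` helper are hypothetical and are not MySQL’s actual tables or code.

```python
import sqlite3


def open_replica():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute(
        "CREATE TABLE replication_position "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), last_applied INTEGER)"
    )
    conn.execute("INSERT INTO replication_position VALUES (1, 0)")
    conn.commit()
    return conn


def apply_event(conn, position, account_id, balance, crash_before_commit=False):
    """Apply one replicated change and record its position atomically."""
    try:
        with conn:  # one transaction: both writes commit, or neither does
            conn.execute(
                "INSERT OR REPLACE INTO accounts VALUES (?, ?)", (account_id, balance)
            )
            conn.execute(
                "UPDATE replication_position SET last_applied = ? WHERE id = 1",
                (position,),
            )
            if crash_before_commit:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        pass  # the transaction was rolled back, as it would be after a real crash


def last_applied(conn):
    return conn.execute("SELECT last_applied FROM replication_position").fetchone()[0]


replica = open_replica()
apply_event(replica, 1, account_id=100, balance=50)
apply_event(replica, 2, account_id=100, balance=75, crash_before_commit=True)

# After the "crash", data and position still agree, so the replica knows
# exactly which event to request next.
resume_from = last_applied(replica) + 1
balance = replica.execute("SELECT balance FROM accounts WHERE id = 100").fetchone()[0]
```

Keeping the position in a file next to the data, by contrast, leaves a window where one is updated and the other is not, which matches the kind of unexplained failure described above.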

Technically, MySQL 5.6 seems a solid improvement over the previous version. But Oracle also needs to address the concerns about lack of openness raised by the Fedora and OpenSUSE communities.

Original title and link: MySQL 5.6 - What’s New (NoSQL database©myNoSQL)

Proposed Fedora 19 Feature: Replace MySQL With MariaDB

Very long discussion on the Fedora mailing list considering and planning the replacement of MySQL with MariaDB:

Recent changes made by Oracle indicate they are moving the MySQL project to be more closed. They are no longer publishing any useful information about security issues (CVEs), and they are not providing complete regression tests any more, and a very large fraction of the mysql bug database is now not public.

From the reply of Andrew Rist (Oracle):

We’ve been following the discussions to replace MySQL with MariaDB in Fedora, and would like to provide additional data to help the community make the most informed decision. Instead of switching the default to MariaDB 5.5 we would like to propose that Fedora instead integrate MySQL 5.6. Switching to MariaDB would be going backwards, as their releases usually lag by at least 6 months. The differences between MariaDB 5.5 and MySQL 5.6 are quite significant, with major improvements in both performance and stability [1], as well as additional features and improved security [2].

Another interesting bit mentioned in the thread by Henrique Junior:

OpenSUSE is dumping MySQL in the next release 12.3 […]

I went through the thread twice and I’m not sure what the conclusion is. But it’s starting to look like Oracle’s approach to managing MySQL is not appreciated by some.

Roland Bouman

Original title and link: Proposed Fedora 19 Feature: Replace MySQL With MariaDB (NoSQL database©myNoSQL)


Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB

The interview Dmitry Sotnikov[1] had with Monty Widenius was published in so many places that I had a hard time deciding which to link to. Anyway, there are a couple of comments and corrections that I’d like to suggest:

The whole thing with the “new NoSQL movement” started with a blog post from a Twitter employee that said MySQL was not good enough and they needed “something better,” like Cassandra.

That’s not quite correct. The “NoSQL movement” debuted in 2009 when the guys from organized an event about “open source, distributed, non relational databases” where they invited people from companies like Cloudera, LinkedIn, StumbleUpon, etc. to talk about the solutions they were building to respond to their platforms’ special requirements. But as papers like Bigtable: A distributed storage system for structured data and Dynamo: Amazon’s Highly Available Key-value Store prove, NoSQL solutions had been in production well before 2009.

I can’t find the original article, but I did find a follow up a bit later where it was said MySQL would be dropped for Cassandra.

I can help find that article as it was posted on this blog: Cassandra @ Twitter: An Interview with Ryan King

The main reason Twitter had problems with MySQL back then, was that they were using it incorrectly.

I don’t think there are many examples in the history of software where a private platform benefited from more scaling advice than Twitter. Judging by how many solutions have been suggested, a possible Twitter IPO will be at risk of IP lawsuits.

The current state is that now, three years later, Twitter is still using MySQL as their main storage for tweets. Cassandra was, in the end, not able to replace MySQL.

That’s true. What’s also true is that at that time Cassandra was at version 0.9 and that investing in a new database was considered riskier than investing in more hardware and hiring MySQL experts.

The main reason NoSQL became popular is that, in contrast to SQL, you can start using it without having to design anything. This makes it easier to start with NoSQL, but you pay for this later when you find that you don’t have control of your data (if you are not very careful).

I assume that this is how a vendor would present flexible data models as a drawback. It is also one of the most dangerous misconceptions about NoSQL, i.e. NoSQL databases require no data modeling. The reality is that most of the time using a NoSQL database will require a lot more thinking and analysis of the data models and data access patterns. There are no blueprints, no normalized forms, and no ORMs to hide everything away.

As soon as data can’t fit into memory, SQL generally outperforms NoSQL.

Where’s the proof? According to the data I have, there’s no comparison between, let’s say, Cassandra and MySQL.

For anything else, you have to write a program and it’s very hard to beat a SQL optimizer for complex things, especially things that are automatically generated based on user requests (required for most web sites).

That’s true. Except when:

  1. most people don’t know how to write those SQL queries (search StackOverflow for a random sample of what I mean)
  2. getting everything out of your database requires using vendor-specific solutions
  3. there are those moments when the optimizer decides to change the execution plan in a way that brings down your whole service

The problem with Hadoop is that there is no known business model around it that ensures that the investors will get back 10X money that they expect. Because of that, I have a hard time understanding how Cloudera can survive in the long run.


Everything else in the interview is spot on.

  1. Dmitry Sotnikov: COO at Jelastic 

Original title and link: Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB (NoSQL database©myNoSQL)