MySQL: All content tagged as MySQL in NoSQL databases and polyglot persistence
Tuesday, 19 February 2013
MySQL 5.6 vs. MariaDB 10.0
A post on the SkySQL blog comparing the recently released version of MySQL with MariaDB:
✚ MariaDB 10.0.0 is still alpha with some of the features still under development.
✚ OpenSUSE and Fedora plan to replace MySQL with MariaDB in their corresponding distributions, but that’s not because of the technical capabilities of MySQL.
Original title and link: MySQL 5.6 vs. MariaDB 10.0 (©myNoSQL)
via: http://www.skysql.com/blogs/max-mether/mysql-56-vs-mariadb-100
Wednesday, 13 February 2013
NoSQL on MySQL: Stating the Obvious
Matthew Aslett about Couchbase’s and DataStax’s reactions to Oracle’s announcement of MySQL support of NoSQL API:
Sure, Couchbase and DataStax laid it on a bit thick, but these are corporate blog posts – it goes with the territory.
I’ve already linked to and commented on these: Couchbase’s reaction and DataStax’s reaction. What I didn’t know (more accurately, I should probably write “I hoped”) is that this sort of reaction comes with the “corporate” badge. But I’ll keep my hope considering the exhaustive list of reactions from other NoSQL companies.
Original title and link: NoSQL on MySQL: Stating the Obvious (©myNoSQL)
DataStax's Reaction to MySQL 5.6: Oracle’s MySQL Misses the NoSQL Mark
Jonathan Ellis in a post about MySQL 5.6 and how Oracle got the whole NoSQL wrong, considering NoSQL is, in this exact order, about scaling, continuous availability, flexibility, performance, and queryability:
The big news for MySQL 5.6 was the inclusion of “NoSQL” features in the form of a memcached api for get and put operations.
In cases like this, it’s tough to tell whether Oracle got this so wrong deliberately to sow confusion in the market, or because they really think that’s what NoSQL is about.
I know Jonathan Ellis has always had very strong opinions about the technical superiority of Cassandra, and Cassandra is indeed a very solid solution, but I’m always reluctant to call a competitor stupid or to use the myopic argument “if I’m good at X and suck at Y, then what everyone is looking for is only X”.
Original title and link: DataStax’s Reaction to MySQL 5.6: Oracle’s MySQL Misses the NoSQL Mark (©myNoSQL)
via: http://www.datastax.com/dev/blog/oracles-mysql-misses-the-nosql-mark
Reactions to MySQL 5.6: Couchbase
Bob Wiederhold (Couchbase CEO) about MySQL 5.6, their use of the NoSQL term, and the PR message touting the new version as the solution “combining the best of both worlds”:
What we see is a whole new wave of applications that have very different requirements than applications had just a few years ago. More often than not they are cloud-based, need to support a huge and dynamically changing number of users, need to store huge amounts of data, and need a highly flexible data model that allows them to adjust to rapidly changing data capture requirements and process lots of semi-structured and unstructured data. The fundamentally different architectural decisions embedded in NoSQL technologies – along with the easy scalability, consistently high performance, and flexible data model advantages (along with all the other tradeoffs) NoSQL provides – are turning out to be a better fit for an increasing number of these applications.
That doesn’t mean MySQL (or relational databases) will go away or won’t play a significant role in the database industry in the future.
Bob Wiederhold is also interested in how Oracle positions their products in terms of NoSQL:
As a side note it’s curious that the MySQL team seems out of step with other parts of Oracle. While the MySQL team seems to be convinced MySQL can do it all, Oracle’s NoSQL team seems to feel differently and is busily trying to catch up to NoSQL leaders like Couchbase, MongoDB, and Cassandra with their own NoSQL product. If relational technology is a one size fits all technology, why is Oracle itself making such a big investment in developing its own NoSQL product?
My supposition, expressed in the post MySQL 5.6 - What’s New, is that NoSQL is just a critical checkbox for the marketing and sales departments. Oracle NoSQL Database and its precursor Berkeley DB seem to live silently inside the giant.
Original title and link: Reactions to MySQL 5.6: Couchbase (©myNoSQL)
via: http://blog.couchbase.com/why-mysql-56-no-real-threat-nosql
MySQL 5.6 - What’s New
I’ve finally had the time to go through the release notes and documentation of the recent release of MySQL 5.6. My first thoughts when skimming over the announcement were:
- why is online DDL support so low on the list?
- why is so much of the announcement about performance?
- how is Oracle going to position the Memcached-based access to InnoDB considering their other key-value database Oracle NoSQL database?
Here’s the opening part of the “DBA and Developer Guide to MySQL 5.6”:
At a glance, MySQL 5.6 is simply a better MySQL with improvements that enhance every functional area of the database kernel, including:
- Better Performance and Scalability
- Improved InnoDB storage engine for better transactional throughput
- Improved Optimizer for better query execution times and diagnostics
- Better Application Availability with Online DDL/Schema changes
- Better Developer Agility with NoSQL Access with Memcached API to InnoDB
- Improved Replication for high performance, self-healing distributed deployments
- Improved Performance Schema for better instrumentation
- Improved Security for worry-free application deployments
- And other Important Enhancements
Almost half of the document focuses on the performance improvements in InnoDB. If this is the part that interests you, I strongly encourage you to read the doc as my notes about this part are very short:
- InnoDB made a lot of improvements in handling threads and locks
- this will allow MySQL 5.6 to work more efficiently on beefier machines with over 24 cores. The shape of the TPS/CPU threads looks almost linear.
- the transactional throughput graph shows improvements, but the shape suggests that MySQL 5.6 tops at around 96 concurrent connections
- SSDs are mentioned but after digging a bit deeper, it’s difficult to say how much of a difference these changes make.
The next section covers online DDL/schema changes. To my surprise, it’s only a paragraph long, while I was expecting more details considering how many complaints I’ve heard about this in the past and how advanced PostgreSQL is. There’s indeed another document, “Overview of Online DDL”, that provides more details:
Basically, starting with this version, many DDL operations do allow concurrent data access, but many of the operations remain very expensive (some requiring copying all data row by row). Better, but not awesome.
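To make the “online” part concrete, here is a minimal sketch of how a statement can ask MySQL 5.6 for an in-place, non-blocking schema change through the `ALGORITHM` and `LOCK` clauses of `ALTER TABLE`. The helper name, the table, and the index are hypothetical; the function only builds the SQL text and does not talk to a server.

```python
ALLOWED_ALGORITHMS = {"DEFAULT", "INPLACE", "COPY"}
ALLOWED_LOCKS = {"DEFAULT", "NONE", "SHARED", "EXCLUSIVE"}


def online_alter(table: str, change: str,
                 algorithm: str = "INPLACE", lock: str = "NONE") -> str:
    """Build an ALTER TABLE statement that requests an online operation.

    `table` and `change` are inserted verbatim, so they must come from
    trusted code, never from user input.
    """
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unknown ALGORITHM: {algorithm}")
    if lock not in ALLOWED_LOCKS:
        raise ValueError(f"unknown LOCK: {lock}")
    return f"ALTER TABLE {table} {change}, ALGORITHM={algorithm}, LOCK={lock}"


if __name__ == "__main__":
    # Hypothetical table and index names, for illustration only.
    print(online_alter("orders", "ADD INDEX idx_created (created_at)"))
```

Spelling out `ALGORITHM=INPLACE, LOCK=NONE` is useful because the server should reject the statement when the operation cannot be done that way, instead of quietly falling back to a table copy.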
The next section talks about the Memcached-based API for accessing InnoDB data, basically a mechanism offering key-value access that bypasses the SQL layer. I couldn’t find a direct answer to my question “how is Oracle positioning this solution compared to Oracle NoSQL Database”. Plus the use of the NoSQL term feels weird: “NoSQL access to InnoDB”, “the new NoSQL API for InnoDB”, “NoSQL benchmarking”. I wouldn’t go as far as to say that Oracle’s marketing is trying to trivialize the term NoSQL, but it definitely feels like it was one of the top checkboxes that the department had to check.
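For readers who haven’t seen it, the key-value access boils down to the plain memcached text protocol. The sketch below only builds and parses protocol messages, so it runs without a server; the function names and the `user:1` key are made up for illustration, and how keys map to InnoDB rows depends on the plugin configuration, which is not covered here.

```python
from typing import Optional


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol `set` command."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Encode a memcached text-protocol `get` command for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes) -> Optional[bytes]:
    """Return the value from a single-key `get` reply, or None on a miss."""
    if raw == b"END\r\n":
        return None
    header, _, rest = raw.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected reply header: {header!r}")
    size = int(parts[3])
    return rest[:size]


if __name__ == "__main__":
    print(build_set("user:1", b"alice"))
    print(parse_get_response(b"VALUE user:1 0 5\r\nalice\r\nEND\r\n"))
```

Sending these bytes over a TCP socket to the memcached port of a server with the plugin enabled would be the next step; any existing memcached client library should work just as well.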
The last part I was interested in (based on my past experience of completely random and unexplained replication failures) was about replication improvements. I didn’t get much out of this document and I’ll have to read the “MySQL replication: High availability - building a self-healing replication topology” whitepaper:
- global transaction identifiers: “enable replication transactional integrity to be tracked through a replication master/slave topology”
- a new set of Python utilities to use global transaction identifiers
- schema level multi-threaded slave replication
- new row-based replication
- new crash-safe slaves: “stores Binlog positional data within tables so slaves can automatically roll back replication to the last committed event before failure, and resume replication without administrator intervention” (nb: this seems to be the issue I’ve seen before when being responsible for a production master-slave x 2 setup).
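As a rough illustration of what turning on global transaction identifiers and crash-safe slaves involves, the sketch below renders a `[mysqld]` configuration fragment. The option names are the MySQL 5.6 server options as I understand them and should be checked against the reference manual before use; the `mysql-bin` log name and the helper function are placeholders.

```python
# Options believed to enable GTIDs and crash-safe slave metadata in MySQL 5.6.
# Treat the exact names and values as assumptions to verify in the manual.
GTID_CRASH_SAFE_OPTIONS = {
    "gtid_mode": "ON",
    "enforce_gtid_consistency": "ON",
    "log_bin": "mysql-bin",            # placeholder binary log base name
    "log_slave_updates": "ON",
    "master_info_repository": "TABLE",      # keep replication metadata in tables
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": "ON",
}


def render_mysqld_section(options: dict) -> str:
    """Render options as a my.cnf [mysqld] section, one `name = value` per line."""
    lines = ["[mysqld]"]
    lines.extend(f"{name} = {value}" for name, value in options.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mysqld_section(GTID_CRASH_SAFE_OPTIONS))
```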
Technically, MySQL 5.6 seems a solid improvement over the previous version. But Oracle also needs to address the lack of openness concerns raised by Fedora and OpenSUSE communities.
Original title and link: MySQL 5.6 - What’s New (©myNoSQL)
Monday, 4 February 2013
Proposed Fedora 19 Feature: Replace MySQL With MariaDB
Very long discussion on the Fedora mailing list considering and planning the replacement of MySQL with MariaDB:
Recent changes made by Oracle indicate they are moving the MySQL project to be more closed. They are no longer publishing any useful information about security issues (CVEs), and they are not providing complete regression tests any more, and a very large fraction of the mysql bug database is now not public.
From the reply of Andrew Rist (Oracle):
We’ve been following the discussions to replace MySQL with MariaDB in Fedora, and would like to provide additional data to help the community make the most informed decision. Instead of switching the default to MariaDB 5.5 we would like to propose that Fedora instead integrate MySQL 5.6. Switching to MariaDB would be going backwards, as their releases usually lag by at least 6 months. The differences between MariaDB 5.5 and MySQL 5.6 are quite significant, with major improvements in both performance and stability [1], as well as additional features and improved security [2].
Another interesting bit mentioned in the thread by Henrique Junior:
OpenSUSE is dumping MySQL in the next release 12.3 […]
I went through the thread twice and I’m not sure what the conclusion is. But it’s starting to look like Oracle’s approach to managing MySQL is not appreciated by some.
Original title and link: Proposed Fedora 19 Feature: Replace MySQL With MariaDB (©myNoSQL)
Monday, 28 January 2013
Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB
The interview Dmitry Sotnikov[1] had with Monty Widenius was published in so many places that I had a hard time deciding which to link to. Anyway, there are a couple of comments and corrections that I’d like to suggest:
The whole thing with the “new NoSQL movement” started with a blog post from a Twitter employee that said MySQL was not good enough and they needed “something better,” like Cassandra.
That’s not quite correct. The “NoSQL movement” debuted in 2009 when the guys from Last.fm organized an event about “open source, distributed, non relational databases” where they invited people from companies like Cloudera, LinkedIn, StumbleUpon, etc. to talk about the solutions they were building to respond to their platforms’ special requirements. But as papers like Bigtable: A distributed storage system for structured data and Dynamo: Amazon’s Highly Available Key-value Store prove, NoSQL solutions had been in production well before 2009.
I can’t find the original article, but I did find a follow up a bit later where it was said MySQL would be dropped for Cassandra.
I can help find that article as it was posted on this blog: Cassandra @ Twitter: An Interview with Ryan King
The main reason Twitter had problems with MySQL back then, was that they were using it incorrectly.
I don’t think there are many examples in the history of software where a private platform benefited from more scaling advice than Twitter. Judging by how many solutions have been suggested, a possible Twitter IPO will be at risk of IP lawsuits.
The current state is that now, three years later, Twitter is still using MySQL as their main storage for tweets. Cassandra was, in the end, not able to replace MySQL.
That’s true. What’s also true is that at that time Cassandra was at version 0.9 and that having to invest in a new database was considered riskier than investing in more hardware and hiring MySQL experts.
The main reason NoSQL became popular is that, in contrast to SQL, you can start using it without having to design anything. This makes it easier to start with NoSQL, but you pay for this later when you find that you don’t have control of your data (if you are not very careful).
I assume that this is how a vendor would present flexible data models as a drawback. It is also one of the most dangerous misconceptions about NoSQL, i.e. NoSQL databases require no data modeling. The reality is that most of the time using a NoSQL database will require a lot more thinking and analysis of the data models and data access patterns. There are no blueprints, no normalized forms, and no ORMs to hide everything away.
As soon as data can’t fit into memory, SQL generally outperforms NoSQL.
Where’s the proof? According to the data I have, there’s no comparison between let’s say Cassandra and MySQL.
For anything else, you have to write a program and it’s very hard to beat a SQL optimizer for complex things, especially things that are automatically generated based on user requests (required for most web sites).
That’s true. Except when:
- most people don’t know how to write those SQL queries (search StackOverflow for a random sample of what I mean)
- getting everything out of your database requires using vendor specific solutions
- there’re those moments when the optimizer decides to change the execution plan in such a way that brings down your whole service
The problem with Hadoop is that there is no known business model around it that ensures that the investors will get back 10X money that they expect. Because of that, I have a hard time understanding how Cloudera can survive in the long run.
???
Everything else in the interview is spot on.
[1] Dmitry Sotnikov: COO at Jelastic
Original title and link: Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB (©myNoSQL)
via: http://blog.jelastic.com/2013/01/21/are-nosql-and-big-data-just-hype/
Tuesday, 22 January 2013
Automating MySQL Backups at Facebook Scale
Eric Barrett (Facebook) describes the process used for backing up Facebook’s MySQL cluster[1]:
Backups are not the most glamorous type of engineering. They are technical, repetitive, and when everything works, nobody notices. They are also cross-discipline, requiring systems, network, and software expertise from multiple teams. But ensuring your memories and connections are safe is incredibly important, and at the end of the day, incredibly rewarding.
If you wanted to make it sound simple, you’d just enumerate the steps:
- Binary logs and mysqldump
- Hadoop DFS
- Long-term storage
Then start asking how you’d accomplish this. With 1 server. With more servers. With more servers while maintaining the availability of the system. See how far you’d be able to answer these questions. At least theoretically.
[1] As a side note, in Fun with numbers: How much data is Facebook ingesting, I guesstimated the number of MySQL servers to be in the 20k range. This post mentions “thousands of database servers in multiple regions”.
Original title and link: Automating MySQL Backups at Facebook Scale (©myNoSQL)
Friday, 11 January 2013
MySQL Delayed Replication - Making a Slave Deliberately Lag Behind a Master
Tony Darnell explains the use cases for delayed replication, a feature available in MySQL 5.6, and how to configure it:
- Scenario #1 – To protect against user mistakes on the master. A DBA can roll back a delayed slave to the time just before the disaster.
- Scenario #2 – To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the slave. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging slave.
- Scenario #3 – To inspect what the database looked like long ago, without having to reload a backup. For example, if the delay is one week and the DBA needs to see what the database looked like before the last few days’ worth of development, the delayed slave can be inspected.
The first time I heard about intentionally delayed replication was a couple of months ago from an ex-DBA. My first thought was: “are you kidding me? Everyone in the database world tries to make replication as fast as possible and you want delays???”. After a few seconds of what probably looked like stupid silence, it clicked. I realized there could be use cases for this weird feature. The guy also taught me about scenarios similar to the ones above.
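For reference, configuring the delay on a MySQL 5.6 slave comes down to a single `CHANGE MASTER TO` option, run while the slave threads are stopped. A minimal sketch that only generates the statements (the helper name is made up, and executing them against a server is left out):

```python
from typing import List


def delayed_replication_statements(delay_seconds: int) -> List[str]:
    """Return the statements that make a slave lag its master by a fixed delay."""
    if delay_seconds < 0:
        raise ValueError("delay must be zero or positive")
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_DELAY = {delay_seconds}",
        "START SLAVE",
    ]


if __name__ == "__main__":
    # One hour of deliberate lag, e.g. for the "user mistake" scenario.
    for statement in delayed_replication_statements(3600):
        print(statement + ";")
```

Afterwards, `SHOW SLAVE STATUS` should report the configured delay in its `SQL_Delay` field.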
Original title and link: MySQL Delayed Replication - Making a Slave Deliberately Lag Behind a Master (©myNoSQL)
Wednesday, 9 January 2013
Redis Mass Data Import: MySQL to Redis in One Step
Derek Watson:
In moving a relatively large table from MySQL to Redis, you may find that extracting, transforming and loading a row at a time can be excruciatingly slow. Here’s a quick trick you can use that pipes the output of the mysql command directly to redis-cli, bypassing middleware and allowing both data stores to operate at their peak speed.
Nice trick, which by the way was documented on the Redis site. With some more work (worthwhile for larger data sets), you could actually generate a Redis RDB file directly.
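The trick relies on emitting commands in the Redis wire protocol so that `redis-cli --pipe` can consume them in bulk. Here is a minimal sketch of that encoding; the `(id, name)` row shape and the `user:<id>` key layout are hypothetical, and the rows would come from whatever MySQL client you use.

```python
def resp_command(*args) -> bytes:
    """Encode one Redis command as an array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # The length prefix counts bytes, not characters.
        out.append(b"$%d\r\n" % len(data))
        out.append(data + b"\r\n")
    return b"".join(out)


def rows_to_protocol(rows) -> bytes:
    """Turn (id, name) rows into a stream of HSET commands."""
    return b"".join(
        resp_command("HSET", f"user:{row_id}", "name", name)
        for row_id, name in rows
    )


if __name__ == "__main__":
    import sys
    # Example: python export.py | redis-cli --pipe
    sys.stdout.buffer.write(rows_to_protocol([(1, "alice"), (2, "bob")]))
```

Computing lengths on the encoded bytes matters as soon as the data contains non-ASCII characters, which is an easy thing to get wrong when the protocol is assembled inside a SQL query.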
Original title and link: Redis Mass Data Import: MySQL to Redis in One Step (©myNoSQL)
via: http://dcw.ca/blog/2013/01/02/mysql-to-redis-in-one-step/
Monday, 17 December 2012
Three Analyst Predictions for 2013: Hadoop, SAP, and MySQL vs NoSQL
The season of predictions is here. Chris Kanaracus in an all-bold post, quoting analysts:
Jon Reed: “Expect SAP to purchase an up-and-coming “big data” product or vendor, and perhaps several, including at least one that specializes in integration with the Hadoop framework for large-scale data processing”.
I’m still scratching my head to come up with the long list of products or vendors specialized in Hadoop integration that SAP could acquire.
Curt Monash: “Expect plenty of additional adoption for Hadoop. Everybody has the ‘big bit bucket’ use case, largely because of machine-generated data. Even today’s technology is plenty good enough for that purpose, and hence justifies initial Hadoop adoption.”
True.
What I hope to see happening is that besides the companies putting together the building blocks to make Hadoop friendly enough (real work) and the companies claiming integration with Hadoop (not that fantastic work), there’ll be some companies that take the Hadoop stack and build tools whose immediate impact on the business can be measured. Basically vertical solutions applying the Hadoop stack to specific markets, segments, and scenarios.
The main challenge of “Big Data” these days is not that there isn’t value behind it. It’s the measurability of this value. What each company looking into Big Data tries to answer is: what value does big data carry for my case? This is a fair question, as not every company has an infinite budget, time, and magic resource pool.
Curt Monash: “Usually when the topic of alternative databases comes up, the incumbent is often Oracle or IBM DB2. But in 2013, MySQL could be playing the latter role. NoSQL and NewSQL products often are developed as MySQL alternatives.”
Until now NoSQL companies have understood that the competition is not with each other. The huge market covered by relational databases has enough potential to welcome a few solid NoSQL solutions, and there’s no long-term need to fight over the few people who have already paid attention to them.
Make your bets.
Original title and link: Three Analyst Predictions for 2013: Hadoop, SAP, and MySQL vs NoSQL (©myNoSQL)
Monday, 1 October 2012
Diff Tools for MySQL Configurations
From the Webyog blog, two tools to compare and diff both the static and runtime configurations of MySQL servers:
- pt-config-diff, part of the Percona Toolkit (free)
- MONyog visual diff and configuration tracker (commercial)
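To show the basic idea behind such tools (this is not how either of them is implemented), here is a small sketch that compares two sets of server variables held as dictionaries, for example collected from `SHOW GLOBAL VARIABLES` on two servers. The function name and the sample values are made up.

```python
def diff_settings(left: dict, right: dict) -> dict:
    """Return {variable: (left_value, right_value)} for every differing variable.

    A variable missing on one side is reported with None for that side.
    """
    differences = {}
    for name in sorted(set(left) | set(right)):
        a, b = left.get(name), right.get(name)
        if a != b:
            differences[name] = (a, b)
    return differences


if __name__ == "__main__":
    # Hypothetical values from two servers.
    server_a = {"max_connections": "151", "innodb_buffer_pool_size": "134217728"}
    server_b = {"max_connections": "500", "innodb_buffer_pool_size": "134217728",
                "query_cache_size": "0"}
    for name, (a, b) in diff_settings(server_a, server_b).items():
        print(f"{name}: {a} -> {b}")
```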
Original title and link: Diff Tools for MySQL Configurations (©myNoSQL)
via: http://www.webyog.com/blog/2012/09/26/monitoring-your-mysql-configuration/