mysql: All content tagged as mysql in NoSQL databases and polyglot persistence
I almost always enjoy the lessons-learned-style presentations from Twitter’s people. The slides below, by Jimmy Lin and Dmitriy Ryaboy, were presented at Hadoop Summit. Besides the technical and practical details, there are two things that I really like:
DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”
and then the reality check:
- Your boss says something vague
- You think very hard on how to move the needle
- Where’s the data?
- What’s in this dataset?
- What’s all the f#$#$ crap in the data?
- Clean the data
- Run some off-the-shelf data mining algorithm
- Productionize, act on the insight
- Rinse, repeat
10gen has never been shy about their plan: replacing MySQL. That’s a bold goal considering Oracle is now behind MySQL. But this could also make things a bit easier for 10gen.
Anyway, what made me write this separate post is the realization of how closely 10gen is following the MySQL path:
- release early and incomplete, then enhance over time
- position the product as developer-friendly and fast
- introduce an enterprise edition once your adoption has surpassed that of your immediate competitors.
I guess I already know how it’ll end: a $2 billion acquisition by a company that then gets acquired by Oracle.
While the official announcement of MongoDB 2.4 mentioned the “MongoDB Enterprise” version just in passing, other websites didn’t leave this aspect aside. Actually, it’s what got emphasized about today’s announcement. In case you wonder what’s in 10gen’s enterprise box: Kerberos-based security and an on-premise version of the MongoDB Monitoring Service.
The only question I have now is how soon Oracle will start looking into acquiring 10gen. Or how soon it will dedicate marketing and sales resources to directly address 10gen.
Original title and link: 10gen’s MongoDB Following the Steps of MySQL ( ©myNoSQL)
A 25-page whitepaper published by Oracle describing a set of best practices for MySQL deployments, covering scenarios from small to very large according to the following criteria:
- concurrent read users
- concurrent write users
- database sizes for 4 types of use cases (sessions, eCommerce, analytics, content management)
Downloading the paper requires registration, but it’s worth reading and thinking about the suggested architectures (even if in a few spots it pushes for the commercial tools offered by Oracle).
Original title and link: MySQL Reference Architectures for Massively Scalable Web Infrastructure ( ©myNoSQL)
I’ve finally had the time to go through the release notes and documentation of the recent release of MySQL 5.6. My first thoughts when skimming over the announcement were:
- why is online DDL support so low on the list?
- why is so much of the announcement about performance?
- how is Oracle going to position the Memcached-based access to InnoDB considering their other key-value database, Oracle NoSQL Database?
Here’s the opening part of the “DBA and Developer Guide to MySQL 5.6”:
At a glance, MySQL 5.6 is simply a better MySQL with improvements that enhance every functional area of the database kernel, including:
- Better Performance and Scalability
- Improved InnoDB storage engine for better transactional throughput
- Improved Optimizer for better query execution times and diagnostics
- Better Application Availability with Online DDL/Schema changes
- Better Developer Agility with NoSQL Access with Memcached API to InnoDB
- Improved Replication for high performance, self-healing distributed deployments
- Improved Performance Schema for better instrumentation
- Improved Security for worry-free application deployments
- And other Important Enhancements
Almost half of the document focuses on the performance improvements in InnoDB. If this is the part that interests you, I strongly encourage you to read the doc as my notes about this part are very short:
- InnoDB made a lot of improvements in handling threads and locks
- this will allow MySQL 5.6 to work more efficiently on beefier machines with over 24 cores. The shape of the TPS vs. CPU threads graph looks almost linear.
- the transactional throughput graph shows improvements, but the shape suggests that MySQL 5.6 tops out at around 96 concurrent connections
- SSDs are mentioned but after digging a bit deeper, it’s difficult to say how much of a difference these changes make.
The next section covers online DDL/schema changes. To my surprise, it’s only a paragraph long, while I was expecting more details considering how many complaints I’ve heard about this in the past and how advanced PostgreSQL is. There’s indeed another document, “Overview of Online DDL”, that provides more details:
Basically, starting with this version, many DDL operations do allow concurrent data access, but many of the operations remain very expensive (some requiring copying all data row by row). Better, but not awesome.
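To make the online DDL point a bit more concrete: in MySQL 5.6 an ALTER TABLE statement can carry ALGORITHM and LOCK clauses, and when the requested combination isn’t possible the server rejects the statement instead of quietly falling back to a full table copy. The sketch below only builds such a statement as a string; the helper functions and the table, index, and column names are hypothetical and exist just for illustration.

```python
# Minimal sketch: build a MySQL 5.6 online DDL statement as a string.
# The helper names are hypothetical; nothing here talks to a server.

ALLOWED_ALGORITHMS = {"DEFAULT", "INPLACE", "COPY"}
ALLOWED_LOCKS = {"DEFAULT", "NONE", "SHARED", "EXCLUSIVE"}


def quote_identifier(name):
    """Backtick-quote an identifier, rejecting names we cannot quote safely."""
    if not name or "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def online_add_index(table, index, columns, algorithm="INPLACE", lock="NONE"):
    """Return an ALTER TABLE ... ADD INDEX statement with explicit
    ALGORITHM and LOCK clauses."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unknown ALGORITHM value: {algorithm!r}")
    if lock not in ALLOWED_LOCKS:
        raise ValueError(f"unknown LOCK value: {lock!r}")
    cols = ", ".join(quote_identifier(c) for c in columns)
    return (
        f"ALTER TABLE {quote_identifier(table)} "
        f"ADD INDEX {quote_identifier(index)} ({cols}), "
        f"ALGORITHM={algorithm}, LOCK={lock}"
    )
```

Spelling out ALGORITHM=INPLACE, LOCK=NONE is a way of saying “do this online or not at all”, which is usually what you want to know before touching a large production table.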
The next section talks about the Memcached-based API for accessing InnoDB data, basically a mechanism offering key-value access that bypasses the SQL layers. I couldn’t find a direct answer to my question “how is Oracle positioning this solution compared to Oracle NoSQL Database”. Plus the use of the term NoSQL feels weird: “NoSQL access to InnoDB”, “the new NoSQL API for InnoDB”, “NoSQL benchmarking”. I wouldn’t go as far as to say that Oracle’s marketing is trying to trivialize the term NoSQL, but it definitely feels like it was one of the top checkboxes that the department had to check.
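For a feel of what “key-value access that skips SQL” means on the wire, here is a small sketch of the memcached text protocol that such a client would speak. It only builds and parses the byte strings; it doesn’t open a connection, and the key `user:42` and the function names are made up for the example. How keys map onto InnoDB tables and columns is a matter of plugin configuration that isn’t covered here.

```python
# Minimal sketch of the memcached text protocol (set/get) as raw bytes.
# Function names and the example key are hypothetical.

def build_set(key, value, flags=0, exptime=0):
    """Encode a 'set' command: header line, data block, trailing CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key):
    """Encode a single-key 'get' command."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data):
    """Decode the reply to a single-key get.

    A hit looks like b'VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\nEND\\r\\n';
    a miss is just b'END\\r\\n'. Returns the value bytes, or None on a miss.
    """
    if data == b"END\r\n":
        return None
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected reply: {header!r}")
    length = int(parts[3])
    return rest[:length]
```

There is no parser, optimizer, or result set anywhere in that exchange, which is the whole appeal for simple lookups by primary key.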
The last part I was interested in (based on my past experience of completely random and unexplained replication failures) was about replication improvements. I didn’t get much out of this document and I’ll have to read the “MySQL replication: High availability - building a self-healing replication topology” whitepaper:
- global transaction identifiers: “enable replication transactional integrity to be tracked through a replication master/slave topology”
- a new set of Python utilities to use global transaction identifiers
- schema level multi-threaded slave replication
- new row-based replication
- new crash-safe slaves: “stores Binlog positional data within tables so slaves can automatically roll back replication to the last committed event before failure, and resume replication without administrator intervention” (nb: this seems to be the issue I’ve seen before when being responsible for a production master-slave x 2 setup).
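To illustrate the global transaction identifiers item: a GTID is written as a server UUID and a transaction number joined by a colon, and a set of them can be abbreviated with intervals such as `uuid:1-5:11-18`. The sketch below parses that textual form and answers “has this transaction been seen?”. The function names are hypothetical and the UUID is just a sample value; this is not how the Python utilities mentioned above are implemented.

```python
# Minimal sketch: parse the textual form of a GTID set and test membership.
# Function names are hypothetical; the parsing is deliberately simplistic.

def parse_gtid_set(text):
    """Parse 'uuid:1-5:11-18, uuid2:7' into {uuid: [(first, last), ...]}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = []
        for interval in intervals:
            start, _, end = interval.partition("-")
            first = int(start)
            last = int(end) if end else first  # '7' means the single txn 7
            ranges.append((first, last))
        result.setdefault(uuid.lower(), []).extend(ranges)
    return result


def contains(gtid_set, uuid, txn_id):
    """Return True if transaction txn_id from server uuid is in the set."""
    return any(lo <= txn_id <= hi for lo, hi in gtid_set.get(uuid.lower(), []))
```

Because every transaction carries the identity of the server it originated on, a slave can work out what it is missing by comparing sets like these rather than by tracking binlog file names and offsets.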
Technically, MySQL 5.6 seems a solid improvement over the previous version. But Oracle also needs to address the concerns about lack of openness raised by the Fedora and OpenSUSE communities.
Original title and link: MySQL 5.6 - What’s New ( ©myNoSQL)