NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



mysql: All content tagged as mysql in NoSQL databases and polyglot persistence

Continuent Replication to Hadoop – Now in Stereo!

Hopefully by now you have already seen that we are working on Hadoop replication. I’m happy to say that it is going really well. I’ve managed to push a few terabytes of data and different data sets through into Hadoop on Cloudera, HortonWorks, and Amazon’s Elastic MapReduce (EMR). For those who have been following my long association with the IBM InfoSphere BigInsights Hadoop product, and I’m pleased to say that it’s working there too.

Continuent is the company behing Tungsten connector and replicator products which, in their words:

Continuent Tungsten allows enterprises running business- critical MySQL applications to provide high-availability (HA) and globally reduntant disaster recover (DR) capabilities for cloud-based and private data center installations. Tungsten Replicator provides high performance open source data replication for MySQL and Oracle and is a key part of Continuent Tungsten.

Original title and link: Continuent Replication to Hadoop – Now in Stereo! (NoSQL database©myNoSQL)


Storage technologies at HipChat - CouchDB, ElasticSearch, Redis, RDS

As per the list below, HipChat’s storage solution is based on a couple of different solutions:

  • Hosting: AWS EC2 East with 75 Instance currently all Ubuntu 12.04 LTS
  • Database: CouchDB currently for Chat History, transitioning to ElasticSearch. MySQL-RDS for everything else
  • Caching: Redis
  • Search: ElasticSearch
  1. This post made me wonder what led HipChat team to use CouchDB in the first place. I’m tempted to say that it was the master-master replication and the early integration with Lucene.
  2. This is only the 2nd time in quite a while I’m reading an article mentioning CouchDB — after the February “no-releases-but-we’re-still-merging-BigCouch” report for ASF. And according to the story, CouchDB is on the way out.

Original title and link: Storage technologies at HipChat - CouchDB, ElasticSearch, Redis, RDS (NoSQL database©myNoSQL)


Count Distinct Compared on Top 4 SQL Databases

Performance and query plans for count distinct :

Truly, the gauntlet had been thrown, and we are here to answer. We ran the queries on Postgres 9.3, MySQL 5.6, SQL Server 2012 SE 11.0, and Oracle SE1 11.2.

count distinct performance

Interestingly, but quite expected, the query plans for queries in SQL Server and Oracle were identical. What’s intriguing is how with a more “naïve” query plan, they both outperformed MySQL and PostgreSQL.

Original title and link: Count Distinct Compared on Top 4 SQL Databases (NoSQL database©myNoSQL)


MySQL backup improvements based on Dropbox's recent outage

Dropbox’s service has been affected over the weekend due to a faulty upgrade procedure that created duplicated master/slave MySQL setups:

When running infrastructure at large scale, the standard practice of running multiple slaves provides redundancy. However, should those slaves fail, the only option is to restore from backup. The standard tool used to recover MySQL data from backups is slow when dealing with large data sets.

To speed up our recovery, we developed a tool that parallelizes the replay of binary logs. This enables much faster recovery from large MySQL backups. We plan to open source this tool so others can benefit from what we’ve learned.

  1. A backup and restore strategy that is not continuously tested and timed is of (almost) no value for services that require high availability.
  2. This is a good example of why highly available services are choosing solutions where there are no special nodes.

Original title and link: MySQL backup improvements based on Dropbox’s recent outage (NoSQL database©myNoSQL)


MySQL is a great Open Source project. How about open source NoSQL databases?

In a post titled Some myths on Open Source, the way I see it, Anders Karlsson writes about MySQL:

As far as code, adoption and reaching out to create an SQL-based RDBMS that anyone can afford, MySQL / MariaDB has been immensely successful. But as an Open Source project, something being developed together with the community where everyone work on their end with their skills to create a great combined piece of work, MySQL has failed. This is sad, but on the other hand I’m not so sure that it would have as much influence and as wide adoption if the project would have been a “clean” Open Source project.

The article offers a very black-and-white perspective on open source versus commercial code. But that’s not why I’m linking to it.

The above paragraph made me think about how many of the most popular open source NoSQL databases would die without the companies (or people) that created them.

Here’s my list: MongoDB, Riak, Neo4j, Redis, Couchbase, etc. And I could continue for quite a while considering how many there are out there: RavenDB, RethinkDB, Voldemort, Tokyo, Titan.

Actually if you reverse the question, the list would get extremely short: Cassandra, CouchDB (still struggling though), HBase. All these were at some point driven by community. Probably the only special case could be LevelDB.

✚ As a follow up to Anders Karlsson post, Robert Hodges posted The Scale-Out Blog: Why I Love Open Source.

Original title and link: MySQL is a great Open Source project. How about open source NoSQL databases? (NoSQL database©myNoSQL)


Maybe Oracle isn't the MySQL villain so many people think

Matt Asay digs a bit under the quite widely spread not really confirmed gut feelings that Oracle is screwing MySQL:

In sum, I suspect most MySQL users today are grateful for the Oracle’s contributions to MySQL. Its backtracking on core community best practices are regrettable but understandable, in light of the company’s security policies. Arguably, these should be revisited so that MySQL can benefit from Oracle’s technical leadership while giving the MySQL community the unfettered access to information that will increase its trust in Oracle’s technical leadership.

With the risk of saying “I’ve told you”, I’ve always said that Oracle has no interest in killing MySQL. Oracle didn’t kill BerkleyDB and they didn’t kill InnoDB while MySQL was still independent or under Sun. Killing it right now when it’s bringing potential customers into the door makes no sense.

The fact that Oracle’s policies and management practices are not community friendly is a different matter. But I’d bet that digging deeper into these would reveal that other companies that are perceived as open and community friendly are not very different.

Original title and link: Maybe Oracle isn’t the MySQL villain so many people think (NoSQL database©myNoSQL)


Level up your MySQL query tuning

Great article by Alexander Rubin that goes through a brief intro of B-trees (the basis of many databases indexes) and gets into the details of understanding and optimizing MySQL queries:

In this article we will talk about query optimization with the focus on the queries with GROUP BY and ORDER BY. We will start with the basic concepts (Indexes, Explain plan in MySQL, etc) and then will talk about the advanced query optimization techniques. We will cover “loose index scan” and “tight index scan” optimizations and show the benchmark results.

Original title and link: Level up your MySQL query tuning (NoSQL database©myNoSQL)


From MySQL to MongoDB and back - The world’s biggest biometrics database

The main subject of the article “Inside India’s Aadhar, The World’s Biggest Biometrics Database” published on TechCrunch is about possible information leaks, privacy issues, etc.. But I have found some interesting bits about the databases used towards its end:

Sudhir Narayana, assistant director general at Aadhar’s technology center, told me that MongoDB was among several database products, apart from MySQL, Hadoop and HBase, originally procured for running the database search. Unlike MySQL, which could only store demographic data, MongoDB was able to store pictures.

That’s the warning sign right there. You can already see what follows:

However, Aadhar has been slowly shifting most of its database related work to MySQL, after realizing that MongoDB was not being able to cope with massive chunks of data, millions of packets.

✚ You can see more details about Aadhaar’s complex database architecture in Big Data at Aadhaar With Hadoop, HBase, MongoDB, MySQL, and Solr

Original title and link: From MySQL to MongoDB and back - The world’s biggest biometrics database (NoSQL database©myNoSQL)


MySQL slow query collection sources

Morgan Tocker:

The other day it struck me that MySQL applications have no fewer than four sources to be able to collect potentially slow queries for analysis, and that I actually find myself using 3/4 methods available.

The source listed:

  1. application logging/monitoring
  2. performance schema
  3. slow query log file
  4. slow query log table — I didn’t know about this one.

For the details of each of these you must read the post.

Original title and link: MySQL slow query collection sources (NoSQL database©myNoSQL)


Dropbox: Challenges in mirroring large MySQL systems to HBase

A presentation by Todd Eisenberger about the archival system used by Dropbox based on MySQL and HBase:

MySQL benefits:

  • fast queries for known keys over a (relatively) small dataset
  • high read throughput

HBase benetits:

  • high write throughput
  • large suite of pre-existing tools for distributed computation
  • easier to perform large processing tasks

✚ Both are consistent

✚ Most of the benefits in HBase’s section point in the direction of data processing benefits (and not data storage benefits)

The SSL performance overhead in MongoDB and MySQL

How to use MongoDB with SSL:

As you can see the SSL overhead is clearly visible being about 0.05ms slower than a plain connection. The median for the inserts with SSL is 0.28ms. Plain connections have a median at around 0.23ms. So there is a performance loss of about 25%. These are all just rough numbers. Your mileage may vary.

Then 2 posts on “MySQL Performance Blog“: SSL Performance Overhead in MySQL and MySQL encryption performance, revisited:

Some of you may recall my security webinar from back in mid-August; one of the follow-up questions that I was asked was about the performance impact of enabling SSL connections. My answer was 25%, based on some 2011 data that I had seen over on yaSSL’s website, but I included the caveat that it is workload-dependent, because the most expensive part of using SSL is establishing the connection.

These 2 articles are diving much deeper and more scientifically into the impact of using SSL with MySQL. The results are interesting and the recommendations are well worth spending the time reading them.

Original title and link: The SSL performance overhead in MongoDB and MySQL (NoSQL database©myNoSQL)

MySQL Error: Too many connections

Just a strange number from MySQL:

By default 151 is the maximum permitted number of simultaneous client connections in MySQL 5.5

And then the issue related to it:

MySQL uses one thread per client connection and many active threads are performance killer. Usually a high number of concurrent connections executing queries in parallel can cause significant slowdown and increase chances for deadlocks. Prior to MySQL 5.5, it doesn’t scale well although MySQL is getting better and better since then — but still if you have hundreds of active connections doing real work (this doesn’t count sleeping [idle] connections) then the memory usage will grow. Each connection requires per thread buffers. Also implicit in memory tables require more memory plus memory requirement for global buffers. On top of that, tmp_table_size/max_heap_table_size that each connection may use, although they are not allocated immediately per new connection.

Facebook has been doing a lot of work in this area. Just one example: Making MySQL accept connections faster; I strongly encourage you to read if you have a busy MySQL server.

Original title and link: MySQL Error: Too many connections (NoSQL database©myNoSQL)