


MySQL: All content tagged as MySQL in NoSQL databases and polyglot persistence

You want NoSQL? I’ll give you memcached

Tony Darnell in Use MySQL to store NoSQL and SQL data in the same database using memcached and InnoDB | Scripting MySQL:

With MySQL version 5.6 (and above), you have the ability to store and retrieve NoSQL data, using NoSQL commands, while keeping the data inside a MySQL InnoDB database. So, you can use NoSQL and SQL at the same time, on the same data, stored in the same database. And the beauty is that it takes just a few minutes to set up. This post will provide you with a quick lesson on how to set up NoSQL on a MySQL InnoDB database.
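To make "NoSQL commands" concrete: the interface in question is the memcached text protocol, so a value written with `set` can later be read back with plain SQL. Below is a minimal sketch of the two sides. The port (11211) and the table and column names (`test.demo_test`, key in `c1`, value in `c2`) are assumptions based on the plugin's demo configuration, not details taken from the quoted post, and `send_command` is a hypothetical helper.

```python
import socket


def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol `set` request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get_command(key: str) -> bytes:
    """Encode a memcached text-protocol `get` request."""
    return f"get {key}\r\n".encode("ascii")


def send_command(payload: bytes, host: str = "127.0.0.1", port: int = 11211) -> bytes:
    """Send one request to a memcached-compatible endpoint (assumed to be running)."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(payload)
        return conn.recv(4096)


# The SQL side of the same data, assuming the demo mapping described above.
SQL_READ = "SELECT c1 AS cache_key, c2 AS cache_value FROM test.demo_test WHERE c1 = %s"
```

With a server running, `send_command(build_set_command("greeting", b"hello"))` would write the row and `SQL_READ` would fetch it through any MySQL client.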

I see this trivialization of the term NoSQL quite frequently in the communications signed by Oracle: “Oh, you want NoSQL? Take memcached. Now shut up!” This is quite disrespectful to their customers and the developer community in general.

Original title and link: You want NoSQL? I’ll give you memcached (NoSQL database©myNoSQL)

MySQL automation at Facebook

An old, but nonetheless very interesting article from Facebook on the tools they’ve built to automate the management of their MySQL cluster — most probably one of the largest in operation:

MPS is a sophisticated state machine written mostly in Python. It replaces a DBA for many routine tasks and enables us to perform maintenance operations in bulk with little or no human intervention.
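For readers who have not built one, here is a toy sketch of the idea of a state machine that manages server lifecycles. The states and transitions are entirely hypothetical and are not taken from Facebook's MPS; the point is only that every operation is an explicit, validated transition, which is what makes bulk automation safe.

```python
from enum import Enum


class State(Enum):
    # Hypothetical lifecycle states, for illustration only.
    SPARE = "spare"
    PROVISIONING = "provisioning"
    PRODUCTION = "production"
    DRAINING = "draining"


# Allowed transitions: anything not listed here is rejected.
TRANSITIONS = {
    State.SPARE: {State.PROVISIONING},
    State.PROVISIONING: {State.PRODUCTION, State.SPARE},
    State.PRODUCTION: {State.DRAINING},
    State.DRAINING: {State.SPARE},
}


class Server:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.SPARE
        self.history = [State.SPARE]

    def advance(self, target: State) -> None:
        """Move to `target` if the transition is allowed, else raise ValueError."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: {self.state.value} -> {target.value} not allowed")
        self.state = target
        self.history.append(target)
```

A maintenance job then becomes a loop over many `Server` objects calling `advance`, with illegal moves failing loudly instead of silently corrupting the fleet's state.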



Continuent Replication to Hadoop – Now in Stereo!

Hopefully by now you have already seen that we are working on Hadoop replication. I’m happy to say that it is going really well. I’ve managed to push a few terabytes of data and different data sets through into Hadoop on Cloudera, HortonWorks, and Amazon’s Elastic MapReduce (EMR). For those who have been following my long association with the IBM InfoSphere BigInsights Hadoop product, I’m pleased to say that it’s working there too.

Continuent is the company behind the Tungsten connector and replicator products which, in their words:

Continuent Tungsten allows enterprises running business-critical MySQL applications to provide high-availability (HA) and globally redundant disaster recovery (DR) capabilities for cloud-based and private data center installations. Tungsten Replicator provides high performance open source data replication for MySQL and Oracle and is a key part of Continuent Tungsten.



Storage technologies at HipChat - CouchDB, ElasticSearch, Redis, RDS

As the list below shows, HipChat’s storage layer combines several different technologies:

  • Hosting: AWS EC2 East with 75 instances, currently all Ubuntu 12.04 LTS
  • Database: CouchDB currently for Chat History, transitioning to ElasticSearch. MySQL-RDS for everything else
  • Caching: Redis
  • Search: ElasticSearch

  1. This post made me wonder what led the HipChat team to use CouchDB in the first place. I’m tempted to say that it was the master-master replication and the early integration with Lucene.
  2. This is only the second article mentioning CouchDB that I have read in quite a while, the first being the February “no-releases-but-we’re-still-merging-BigCouch” report for the ASF. And according to the story, CouchDB is on the way out.



Count Distinct Compared on Top 4 SQL Databases

Performance and query plans for count distinct:

Truly, the gauntlet had been thrown, and we are here to answer. We ran the queries on Postgres 9.3, MySQL 5.6, SQL Server 2012 SE 11.0, and Oracle SE1 11.2.

[Figure: count distinct performance]

Interestingly, though not unexpectedly, the query plans for the queries in SQL Server and Oracle were identical. What’s intriguing is how, with a more “naïve” query plan, they both outperformed MySQL and PostgreSQL.
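The excerpt does not reproduce the queries, so here is a small sketch of the two query shapes that a count-distinct comparison typically contrasts: a direct `COUNT(DISTINCT ...)` over a join, and a version that deduplicates and aggregates before joining. The `dashboards` and `visits` tables are hypothetical, and SQLite is used only so the example runs anywhere; it says nothing about how the four benchmarked databases plan or execute these queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dashboards (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (dashboard_id INTEGER, user_id INTEGER);
INSERT INTO dashboards VALUES (1, 'sales'), (2, 'ops');
INSERT INTO visits VALUES (1, 10), (1, 10), (1, 11), (1, 13),
                          (2, 10), (2, 12), (2, 12);
""")

# Shape 1: join first, then count distinct users per group.
NAIVE = """
SELECT d.name, COUNT(DISTINCT v.user_id) AS users
FROM visits v
JOIN dashboards d ON d.id = v.dashboard_id
GROUP BY d.name
ORDER BY d.name
"""

# Shape 2: deduplicate and aggregate on the fact table, then join the small result.
PREAGGREGATED = """
SELECT d.name, per_dashboard.users
FROM (
    SELECT dashboard_id, COUNT(*) AS users
    FROM (SELECT DISTINCT dashboard_id, user_id FROM visits) AS pairs
    GROUP BY dashboard_id
) AS per_dashboard
JOIN dashboards d ON d.id = per_dashboard.dashboard_id
ORDER BY d.name
"""

naive_rows = conn.execute(NAIVE).fetchall()
preaggregated_rows = conn.execute(PREAGGREGATED).fetchall()
```

Both queries return the same rows; what differs between engines is how much work the planner does to get there, which is exactly what the linked benchmark measures.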



MySQL backup improvements based on Dropbox's recent outage

Dropbox’s service was affected over the weekend by a faulty upgrade procedure that created duplicated master/slave MySQL setups:

When running infrastructure at large scale, the standard practice of running multiple slaves provides redundancy. However, should those slaves fail, the only option is to restore from backup. The standard tool used to recover MySQL data from backups is slow when dealing with large data sets.

To speed up our recovery, we developed a tool that parallelizes the replay of binary logs. This enables much faster recovery from large MySQL backups. We plan to open source this tool so others can benefit from what we’ve learned.
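The quote does not say how the replay is parallelized. One plausible approach, sketched here purely as an assumption, is to split the log into independent lanes (for example one per schema), keep the original order inside each lane, and replay the lanes concurrently. The `(schema, statement)` event shape and the `apply_statement` callback are hypothetical stand-ins for real binary log events and a real database connection.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_schema(events):
    """Group (schema, statement) events into per-schema lanes, preserving log order."""
    lanes = defaultdict(list)
    for schema, statement in events:
        lanes[schema].append(statement)
    return dict(lanes)


def replay_in_parallel(events, apply_statement, max_workers=4):
    """Replay each lane sequentially, with different lanes running concurrently.

    Assumes statements touching different schemas are independent of each other.
    Returns the number of statements applied per schema.
    """
    lanes = partition_by_schema(events)

    def replay_lane(item):
        schema, statements = item
        for statement in statements:
            apply_statement(schema, statement)
        return schema, len(statements)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(replay_lane, lanes.items()))
```

The independence assumption is the whole game: any cross-schema transaction would have to be detected and serialized, which this sketch does not attempt.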

  1. A backup and restore strategy that is not continuously tested and timed is of (almost) no value for services that require high availability.
  2. This is a good example of why highly available services are choosing solutions where there are no special nodes.



MySQL is a great Open Source project. How about open source NoSQL databases?

In a post titled Some myths on Open Source, the way I see it, Anders Karlsson writes about MySQL:

As far as code, adoption and reaching out to create an SQL-based RDBMS that anyone can afford, MySQL / MariaDB has been immensely successful. But as an Open Source project, something being developed together with the community where everyone works on their end with their skills to create a great combined piece of work, MySQL has failed. This is sad, but on the other hand I’m not so sure that it would have as much influence and as wide adoption if the project would have been a “clean” Open Source project.

The article offers a very black-and-white perspective on open source versus commercial code. But that’s not why I’m linking to it.

The above paragraph made me think about how many of the most popular open source NoSQL databases would die without the companies (or people) that created them.

Here’s my list: MongoDB, Riak, Neo4j, Redis, Couchbase, etc. And I could continue for quite a while considering how many there are out there: RavenDB, RethinkDB, Voldemort, Tokyo, Titan.

Actually if you reverse the question, the list would get extremely short: Cassandra, CouchDB (still struggling though), HBase. All these were at some point driven by community. Probably the only special case could be LevelDB.

✚ As a follow-up to Anders Karlsson’s post, Robert Hodges posted The Scale-Out Blog: Why I Love Open Source.



Maybe Oracle isn't the MySQL villain so many people think

Matt Asay digs into the widespread but largely unconfirmed gut feeling that Oracle is screwing MySQL:

In sum, I suspect most MySQL users today are grateful for Oracle’s contributions to MySQL. Its backtracking on core community best practices is regrettable but understandable, in light of the company’s security policies. Arguably, these should be revisited so that MySQL can benefit from Oracle’s technical leadership while giving the MySQL community the unfettered access to information that will increase its trust in Oracle’s technical leadership.

At the risk of saying “I told you so”, I’ve always said that Oracle has no interest in killing MySQL. Oracle didn’t kill Berkeley DB and they didn’t kill InnoDB while MySQL was still independent or under Sun. Killing it right now, when it’s bringing potential customers through the door, makes no sense.

The fact that Oracle’s policies and management practices are not community friendly is a different matter. But I’d bet that digging deeper into these would reveal that other companies that are perceived as open and community friendly are not very different.



Level up your MySQL query tuning

Great article by Alexander Rubin that starts with a brief intro to B-trees (the basis of many database indexes) and then gets into the details of understanding and optimizing MySQL queries:

In this article we will talk about query optimization with the focus on the queries with GROUP BY and ORDER BY. We will start with the basic concepts (Indexes, Explain plan in MySQL, etc) and then will talk about the advanced query optimization techniques. We will cover “loose index scan” and “tight index scan” optimizations and show the benchmark results.
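As a rough mental model of the difference between the two scans mentioned in the quote, here is a simplified simulation over a sorted list standing in for a composite index on `(group, value)`. It is not MySQL's implementation, and the function names are made up for this sketch. It only shows why, for something like `MIN(value) ... GROUP BY group`, jumping from one group to the next touches far fewer index entries than walking all of them.

```python
from bisect import bisect_right
from math import inf


def tight_scan_min(index):
    """Walk every index entry; the first entry seen per group is its minimum."""
    result, reads = {}, 0
    for group, value in index:
        reads += 1
        result.setdefault(group, value)
    return result, reads


def loose_scan_min(index):
    """Read the first entry of each group, then seek straight to the next group."""
    result, reads, pos = {}, 0, 0
    while pos < len(index):
        group, value = index[pos]
        reads += 1
        result[group] = value
        # Binary-search past every remaining entry of the current group.
        pos = bisect_right(index, (group, inf), lo=pos)
    return result, reads


# A sorted list of (group, value) pairs plays the role of the index.
index = sorted([("a", 3), ("a", 1), ("a", 7), ("b", 2), ("b", 9), ("c", 5)])
tight_result, tight_reads = tight_scan_min(index)
loose_result, loose_reads = loose_scan_min(index)
```

On this toy data both scans return the same minimums, but the loose variant reads one entry per group while the tight variant reads every entry. The gap grows with the number of rows per group.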



From MySQL to MongoDB and back - The world’s biggest biometrics database

The article “Inside India’s Aadhar, The World’s Biggest Biometrics Database”, published on TechCrunch, is mainly about possible information leaks, privacy issues, and so on. But towards its end I found some interesting bits about the databases used:

Sudhir Narayana, assistant director general at Aadhar’s technology center, told me that MongoDB was among several database products, apart from MySQL, Hadoop and HBase, originally procured for running the database search. Unlike MySQL, which could only store demographic data, MongoDB was able to store pictures.

That’s the warning sign right there. You can already see what follows:

However, Aadhar has been slowly shifting most of its database related work to MySQL, after realizing that MongoDB was not being able to cope with massive chunks of data, millions of packets.

✚ You can see more details about Aadhaar’s complex database architecture in Big Data at Aadhaar With Hadoop, HBase, MongoDB, MySQL, and Solr.



MySQL slow query collection sources

Morgan Tocker:

The other day it struck me that MySQL applications have no fewer than four sources to be able to collect potentially slow queries for analysis, and that I actually find myself using 3/4 methods available.

The sources listed:

  1. application logging/monitoring
  2. performance schema
  3. slow query log file
  4. slow query log table — I didn’t know about this one.
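For the third source, the log file, a few lines of Python are enough to get started. This is a minimal sketch that assumes the common `# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ...` header layout; the sample log is made up for illustration, and real logs may carry extra fields or `use db;` lines that this parser simply folds into the statement text.

```python
import re

QUERY_TIME = re.compile(
    r"^# Query_time:\s+(?P<query_time>[\d.]+)\s+Lock_time:\s+(?P<lock_time>[\d.]+)"
    r"\s+Rows_sent:\s+(?P<rows_sent>\d+)\s+Rows_examined:\s+(?P<rows_examined>\d+)"
)


def parse_slow_log(text):
    """Return one dict per logged statement with its timing and SQL text."""
    entries, current = [], None
    for line in text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            current = {
                "query_time": float(match["query_time"]),
                "rows_examined": int(match["rows_examined"]),
                "sql": [],
            }
            entries.append(current)
        elif line.startswith("#"):
            continue  # other header lines such as "# Time:" or "# User@Host:"
        elif current is not None and not line.startswith("SET timestamp="):
            current["sql"].append(line)
    for entry in entries:
        entry["sql"] = " ".join(entry["sql"]).strip()
    return entries


# Made-up sample in the assumed format.
SAMPLE_LOG = """\
# Time: 140101 12:00:00
# User@Host: app[app] @ localhost []
# Query_time: 2.500000  Lock_time: 0.000100 Rows_sent: 1  Rows_examined: 500000
SET timestamp=1388577600;
SELECT COUNT(*) FROM orders WHERE status = 'open';
# User@Host: app[app] @ localhost []
# Query_time: 0.200000  Lock_time: 0.000050 Rows_sent: 10  Rows_examined: 10
SET timestamp=1388577601;
SELECT * FROM users LIMIT 10;
"""

entries = parse_slow_log(SAMPLE_LOG)
slowest_first = sorted(entries, key=lambda e: e["query_time"], reverse=True)
```

Sorting by `query_time` or by `rows_examined` is usually the first cut at deciding which queries deserve an `EXPLAIN`.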

For the details of each of these you must read the post.



Dropbox: Challenges in mirroring large MySQL systems to HBase

A presentation by Todd Eisenberger about the archival system used by Dropbox based on MySQL and HBase:

MySQL benefits:

  • fast queries for known keys over a (relatively) small dataset
  • high read throughput

HBase benefits:

  • high write throughput
  • large suite of pre-existing tools for distributed computation
  • easier to perform large processing tasks

✚ Both are consistent

✚ Most of the benefits listed for HBase concern data processing rather than data storage.