


MySQL: All content tagged as MySQL in NoSQL databases and polyglot persistence

Automating MySQL Backups at Facebook Scale

Eric Barrett (Facebook) describes the process used for backing up Facebook’s MySQL cluster[1]:

Backups are not the most glamorous type of engineering. They are technical, repetitive, and when everything works, nobody notices. They are also cross-discipline, requiring systems, network, and software expertise from multiple teams. But ensuring your memories and connections are safe is incredibly important, and at the end of the day, incredibly rewarding.

If you want to make it sound simple, just enumerate the steps:

  1. Binary logs and mysqldump
  2. Hadoop DFS
  3. Long-term storage

Then start asking how you’d accomplish this. With one server. With more servers. With more servers while keeping the system available. See how far you can get answering these questions, at least theoretically.
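To make the one-server case a bit more concrete, here is a minimal Python sketch of the first two steps: a consistent `mysqldump`, compressed and streamed straight into HDFS. It is not Facebook’s system. The host name, HDFS directory layout, and function names are made up for illustration, and both the binary-log half (for example `mysqlbinlog --read-from-remote-server --raw --stop-never`, available since MySQL 5.6) and the long-term storage step are left out.

```python
import subprocess
from datetime import datetime
from typing import List


def backup_stages(db_host: str, hdfs_dir: str, when: datetime) -> List[List[str]]:
    """Build the argv of each stage: mysqldump | gzip | hdfs dfs -put - <target>.
    Host, directory layout and naming scheme are illustrative assumptions."""
    stamp = when.strftime("%Y%m%d-%H%M%S")
    target = f"{hdfs_dir.rstrip('/')}/{db_host}/{stamp}.sql.gz"
    return [
        # --single-transaction: consistent InnoDB snapshot without holding
        # table locks for the whole dump.
        # --master-data=2: write the binlog coordinates as a comment, so the
        # dump can later be rolled forward with the binary logs.
        ["mysqldump", f"--host={db_host}", "--single-transaction",
         "--master-data=2", "--all-databases"],
        ["gzip", "-c"],
        # "-" makes `hdfs dfs -put` read the file contents from stdin.
        ["hdfs", "dfs", "-put", "-", target],
    ]


def run_pipeline(stages: List[List[str]]) -> List[int]:
    """Chain the stages like a shell pipe and return each exit code."""
    procs = []
    upstream = None
    for i, argv in enumerate(stages):
        last = i == len(stages) - 1
        proc = subprocess.Popen(argv, stdin=upstream,
                                stdout=None if last else subprocess.PIPE)
        if upstream is not None:
            upstream.close()  # only the downstream child keeps the read end open
        upstream = proc.stdout
        procs.append(proc)
    return [p.wait() for p in procs]
```

Something like `run_pipeline(backup_stages("db1", "/backups/mysql", datetime.now()))` covers one server; the harder questions above (scheduling thousands of these, throttling, verifying restores) are exactly what such a sketch does not answer.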

  1. As a side note, in Fun with numbers: How much data is Facebook ingesting, I’ve guesstimated the number of MySQL servers to be in the 20k range. This post mentions “thousands of database servers in multiple regions”.

Original title and link: Automating MySQL Backups at Facebook Scale (NoSQL database©myNoSQL)


MySQL Delayed Replication - Making a Slave Deliberately Lag Behind a Master

Tony Darnell explains when and how to configure delayed replication, a feature available in MySQL 5.6:

  1. Scenario #1 – To protect against user mistakes on the master. A DBA can roll back a delayed slave to the time just before the disaster.
  2. Scenario #2 – To test how the system behaves when there is a lag. For example, in an application, a lag might be caused by a heavy load on the slave. However, it can be difficult to generate this load level. Delayed replication can simulate the lag without having to simulate the load. It can also be used to debug conditions related to a lagging slave.
  3. Scenario #3 – To inspect what the database looked like long ago, without having to reload a backup. For example, if the delay is one week and the DBA needs to see what the database looked like before the last few days’ worth of development, the delayed slave can be inspected.
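The configuration itself is small: the delay is set in seconds with `CHANGE MASTER TO MASTER_DELAY = N` while replication is stopped, and `SHOW SLAVE STATUS` then reports `SQL_Delay` and `SQL_Remaining_Delay`. The Python sketch below just renders the statements, including the seconds arithmetic for the one-week example from Scenario #3. The helper names are hypothetical, and the Scenario #1 recovery sequence is one plausible approach; check the MySQL manual for exactly which event `START SLAVE UNTIL` stops at before relying on it.

```python
from datetime import timedelta
from typing import List

# MASTER_DELAY is given in seconds; MySQL 5.6 accepts 0 to 2^31 - 1.
MAX_MASTER_DELAY = 2**31 - 1


def delayed_replica_statements(delay: timedelta) -> List[str]:
    """SQL to run on a slave so its SQL thread applies events `delay` late."""
    seconds = int(delay.total_seconds())
    if not 0 <= seconds <= MAX_MASTER_DELAY:
        raise ValueError(f"MASTER_DELAY must be 0..{MAX_MASTER_DELAY} seconds")
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_DELAY = {seconds};",
        "START SLAVE;",
    ]


def replay_until(master_log_file: str, master_log_pos: int) -> List[str]:
    """Scenario #1 (hypothetical helper): remove the delay, then apply events
    only up to a master binlog position found just before the bad statement."""
    return [
        "STOP SLAVE;",
        "CHANGE MASTER TO MASTER_DELAY = 0;",
        f"START SLAVE UNTIL MASTER_LOG_FILE = '{master_log_file}', "
        f"MASTER_LOG_POS = {master_log_pos};",
    ]
```

For a one-week delay, `delayed_replica_statements(timedelta(weeks=1))` yields `MASTER_DELAY = 604800` (7 × 86,400 seconds).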

The first time I heard about intentional delayed replication was a couple of months ago, from an ex-DBA guy. My first thought was: “are you kidding me? Everyone in the database world tries to make replication as fast as possible and you want delays???”. After a few seconds of what probably looked like stupid silence, it clicked: there could be real use cases for this weird feature. He also taught me about scenarios similar to the ones above.

Mat Keep

Original title and link: MySQL Delayed Replication - Making a Slave Deliberately Lag Behind a Master (NoSQL database©myNoSQL)


Redis Mass Data Import: MySQL to Redis in One Step

Derek Watson:

In moving a relatively large table from MySQL to Redis, you may find that extracting, transforming and loading a row at a time can be excruciatingly slow. Here’s a quick trick you can use that pipes the output of the mysql command directly to redis-cli, bypassing middleware and allowing both data stores to operate at their peak speed.

Nice trick, which, by the way, is documented on the Redis site. With some more work (worth it for larger data sets), you could actually generate a Redis RDB file directly.
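What makes the trick work is that `redis-cli --pipe` reads raw Redis protocol (RESP) from stdin, so the only remaining job is turning rows into protocol. The Python sketch below does that transformation as a filter between the two commands, assuming `mysql -B -N` output (tab-separated, no header) with the row id in the first column. The key prefix and field names are illustrative; multi-field `HSET` needs Redis 4.0 or later (use `HMSET` on older servers), and values containing tabs, newlines, or `NULL` would need extra handling.

```python
from typing import Iterable, Iterator, Sequence


def resp_command(*args) -> bytes:
    """Encode one Redis command in RESP: an array header, then one bulk
    string (byte length + payload) per argument."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def tsv_rows_to_resp(lines: Iterable[str], key_prefix: str,
                     fields: Sequence[str]) -> Iterator[bytes]:
    """Turn tab-separated rows (id first) into one HSET per row,
    keyed as '<key_prefix>:<id>'. Blank lines are skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        row_id, *values = line.split("\t")
        args = ["HSET", f"{key_prefix}:{row_id}"]
        for name, value in zip(fields, values):
            args += [name, value]
        yield resp_command(*args)
```

Wrapped in a small script that reads `sys.stdin` and writes each command to `sys.stdout.buffer`, it would sit in a pipeline like `mysql -B -N -e "SELECT id, name, email FROM users" mydb | python to_resp.py | redis-cli --pipe` (table, database, and script names made up).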

Original title and link: Redis Mass Data Import: MySQL to Redis in One Step (NoSQL database©myNoSQL)


Three Analyst Predictions for 2013: Hadoop, SAP, and MySQL vs NoSQL

The season of predictions is here. Chris Kanaracus in an all-bold post, quoting analysts:

Jon Reed: “Expect SAP to purchase an up-and-coming “big data” product or vendor, and perhaps several, including at least one that specializes in integration with the Hadoop framework for large-scale data processing”.

I’m still scratching my head trying to come up with the long list of products or vendors specializing in Hadoop integration that SAP could acquire.

Curt Monash: “Expect plenty of additional adoption for Hadoop. Everybody has the ‘big bit bucket’ use case, largely because of machine-generated data. Even today’s technology is plenty good enough for that purpose, and hence justifies initial Hadoop adoption.”


What I hope to see is that, besides the companies putting together the building blocks to make Hadoop friendly enough (real work) and the companies claiming integration with Hadoop (not that fantastic work), there’ll be some companies that take the Hadoop stack and build tools whose impact on the business can be measured immediately. Basically, vertical solutions applying the Hadoop stack to specific markets, segments, and scenarios.

The main challenge of “Big Data” these days is not that there isn’t value behind it. It’s the measurability of this value. The question every company looking into Big Data is trying to answer is: what value does big data carry in my case? It’s a well-founded question, as not every company has an infinite budget, unlimited time, and a magic resource pool.

Curt Monash: “Usually when the topic of alternative databases comes up, the incumbent is often Oracle or IBM DB2. But in 2013, MySQL could be playing the latter role. NoSQL and NewSQL products often are developed as MySQL alternatives.”

So far, NoSQL companies have understood that they are not competing with each other. The huge market that relational databases cover has enough potential to welcome a few solid NoSQL solutions, and there’s no long-term need to fight over the few people who have already paid attention to them.

Place your bets.

Original title and link: Three Analyst Predictions for 2013: Hadoop, SAP, and MySQL vs NoSQL (NoSQL database©myNoSQL)

Diff Tools for MySQL Configurations

From the Webyog blog, two tools for comparing and diffing both the static and runtime configurations of MySQL servers:

  1. pt-config-diff, part of Percona Toolkit (free)
  2. MONyog’s visual diff and configuration tracker (commercial)
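Conceptually, the static half of such a diff is simple: parse both option files, normalize option names (MySQL treats `-` and `_` in option names as equivalent), and compare. Below is a rough Python sketch using the standard `configparser`, assuming plain `my.cnf` files without `!include` directives or inline comments. It is not how either tool works internally, it ignores runtime values (`SHOW GLOBAL VARIABLES`), and it reports `128M` and `134217728` as different, an equivalence a real tool would need to handle.

```python
import configparser
from typing import Dict, Optional, Tuple

UNSET = "<unset>"  # marker for an option present in only one file


def load_options(text: str, section: str = "mysqld") -> Dict[str, Optional[str]]:
    """Parse one option-file section into {normalized_name: value}.
    Flag-style options such as skip-name-resolve get the value None."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    # Lower-case (configparser's default) and treat '-' and '_' alike.
    parser.optionxform = lambda name: name.strip().lower().replace("-", "_")
    parser.read_string(text)
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))


def diff_options(a: Dict[str, Optional[str]],
                 b: Dict[str, Optional[str]]) -> Dict[str, Tuple[object, object]]:
    """Return {option: (value_in_a, value_in_b)} for every option that differs."""
    diffs = {}
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name, UNSET), b.get(name, UNSET)
        if va != vb:
            diffs[name] = (va, vb)
    return diffs
```

Normalizing names first is what keeps `max-connections` in one file and `max_connections` in the other from showing up as two unrelated options.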

Original title and link: Diff Tools for MySQL Configurations (NoSQL database©myNoSQL)


Test Driving Database Indexes

Myron Marston:

Database indexes are conceptually very simple, but in practice, I’ve found that it’s hard to predict when they’ll get used and what indexes a given table needs. On a project at work I came up with the idea to test-drive my database indexes, just like I test-drive the rest of my code. I’d like to share the approach I came up with.

A very interesting idea, at least for MySQL users.
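The idea translates into a test that asserts on the query plan rather than on query results. With MySQL you would check the `key` column of `EXPLAIN`; to keep the sketch self-contained and runnable, it swaps in Python’s built-in `sqlite3` and SQLite’s `EXPLAIN QUERY PLAN`. This is not Myron Marston’s code, and the table, query, and index names are hypothetical.

```python
import sqlite3
from typing import List


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> List[str]:
    """Return the human-readable detail lines of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


def uses_index(conn: sqlite3.Connection, sql: str, index_name: str,
               params=()) -> bool:
    """True if any step of the plan mentions the given index."""
    return any(index_name in detail for detail in query_plan(conn, sql, params))


def make_schema(conn: sqlite3.Connection) -> None:
    """Schema under test; the index is what the test below drives out."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.execute("CREATE INDEX idx_users_email ON users (email)")


def test_lookup_by_email_uses_index() -> None:
    conn = sqlite3.connect(":memory:")
    make_schema(conn)
    sql = "SELECT id, name FROM users WHERE email = ?"
    params = ("a@example.com",)
    assert uses_index(conn, sql, "idx_users_email", params), query_plan(conn, sql, params)
```

Written before the `CREATE INDEX` line exists, the test fails with the plan in its message; adding the index turns it green, which is the usual red/green loop applied to the schema.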

Original title and link: Test Driving Database Indexes (NoSQL database©myNoSQL)


Provisioned IOPS for Amazon RDS

Werner Vogels:

Following the huge success of being able to provision a consistent, user-requested I/O rate for DynamoDB and Elastic Block Store (EBS), the AWS Database Services team has now released Provisioned IOPS, a new high performance storage option for the Amazon Relational Database Service (Amazon RDS). Customers can provision up to 10,000 IOPS (input/output operations per second) per database instance to help ensure that their databases can run the most stringent workloads with rock solid, consistent performance.

Amazon is the first company I know of that champions guaranteed performance SLAs. Until recently, most SLAs referred to availability, resilience, and redundancy. But I expect performance-based SLAs to soon become the norm for other service providers too, and appliance vendors to be asked for similar guarantees sooner rather than later.

Original title and link: Provisioned IOPS for Amazon RDS (NoSQL database©myNoSQL)


Big Data at Aadhaar With Hadoop, HBase, MongoDB, MySQL, and Solr

It’s unfortunate that the post focuses mostly on the usage of Spring and RabbitMQ and that the slide deck doesn’t dive deeper into the architecture, data flows, and data stores, but the diagrams below should give you an idea of this truly polyglot persistence architecture:

Architecture of Big Data at Aadhaar

Big Data at Aadhaar Data Stores

The slide deck presenting the architecture principles and some numbers about the platform is after the break.

A Quick Test of the New MySQL Memcached Plugin With (J)Ruby

Gabor Vitez:

With a new post hitting Hacker News again on MySQL’s memcached plugin, I really wanted to do a quick-and-dirty benchmark on it, just to see what good it is – does this interface offer any extra speed when compared to SQL+ActiveRecord? Does it have its place in the software stack? How much work is needed to get this combination off the ground?

When I run into micro-benchmarks, I try my best to figure out whether they hide any value. But it’s hard to find any in one that uses different libraries, runs both the benchmark and the server on the same machine, and uses no concurrency.
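For contrast, a harness that at least exercises concurrency and reports latency percentiles, rather than a single number, fits in a few lines of Python. `op` is a placeholder for whatever is being measured (say, a hypothetical memcached-plugin `get` versus an ActiveRecord lookup), and the thread and request counts are arbitrary. This is not the benchmark from the post. It should also run on a different machine than the server, and because Python threads share the GIL, a CPU-heavy client library would need several processes or client machines instead.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_benchmark(op: Callable[[], None], concurrency: int,
                  requests_per_worker: int) -> Dict[str, float]:
    """Call `op` from `concurrency` threads; summarize per-call latency."""
    if concurrency < 1 or requests_per_worker < 1:
        raise ValueError("need at least one worker and one request")

    def worker() -> List[float]:
        latencies = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            op()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        samples = sorted(lat for f in futures for lat in f.result())
    wall = time.perf_counter() - wall_start
    p99_index = max(0, math.ceil(0.99 * len(samples)) - 1)  # nearest-rank p99
    return {
        "requests": len(samples),
        "throughput_per_s": len(samples) / wall,
        "p50_ms": statistics.median(samples) * 1000,
        "p99_ms": samples[p99_index] * 1000,
    }
```

Running it for each access path at several concurrency levels (1, 8, 32, ...) shows where the two diverge, which a single-threaded loop on one box cannot.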

Original title and link: A Quick Test of the New MySQL Memcached Plugin With (J)Ruby (NoSQL database©myNoSQL)


Klout Data Architecture: MySQL, HBase, Hive, Pig, Elastic Search, MongoDB, SSAS

I just found a slide deck (embedded below) describing the data workflow at Klout. Their architecture includes many interesting pieces, combining NoSQL and relational databases with Hadoop, Hive, Pig, and traditional BI; even Excel gets a mention in the slides. The main components:

  1. Pig and Hive
  2. HBase
  3. Elastic Search
  4. MongoDB
  5. MySQL

Klout Data Architecture

Generating Meaningful Test Data Using a MySQL Function

Ronald Speelman:

You can use this MySQL function to generate names, (e-mail) addresses, phone numbers, urls, bit values, colors, IP address, etc. As usual, the code is provided in a zipfile and the code is fully documented.

For the last couple of days I’ve been looking for a way to generate good test data in JSON format[1], so if you are aware of something, please drop me a note.

  1. Right now I’m using as input a corpus of combined JSON files I found online, but I’m not happy with this solution.
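For the JSON side, one low-tech option is to build records from small word lists with a seeded random generator, so the fixtures are reproducible from run to run. A minimal Python sketch follows; it is not a port of Ronald Speelman’s MySQL function, and the field names and word lists are made up for illustration.

```python
import json
import random
from typing import Dict, List

# Small illustrative word lists; extend them for more variety.
FIRST_NAMES = ["Ada", "Alan", "Grace", "Edsger", "Barbara", "Donald"]
LAST_NAMES = ["Lovelace", "Turing", "Hopper", "Dijkstra", "Liskov", "Knuth"]
DOMAINS = ["example.com", "example.org", "example.net"]  # reserved for examples
COLORS = ["red", "green", "blue", "orange", "purple"]


def fake_person(rng: random.Random) -> Dict[str, object]:
    """One JSON-serializable record mixing strings, numbers, booleans and lists."""
    first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
    return {
        "name": f"{first} {last}",
        "email": f"{first}.{last}@{rng.choice(DOMAINS)}".lower(),
        "phone": "555-%04d" % rng.randrange(10000),
        "ip": ".".join(str(rng.randint(1, 254)) for _ in range(4)),
        "age": rng.randint(18, 90),
        "active": rng.random() < 0.5,
        "colors": rng.sample(COLORS, k=rng.randint(0, 3)),
    }


def fake_people(count: int, seed: int = 42) -> List[Dict[str, object]]:
    """`count` records; the fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [fake_person(rng) for _ in range(count)]


def to_json_lines(records: List[Dict[str, object]]) -> str:
    """One JSON document per line, convenient for bulk loaders."""
    return "\n".join(json.dumps(record) for record in records)
```

`print(to_json_lines(fake_people(1000)))` would produce a thousand documents; changing the seed gives a different but equally reproducible data set.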

Original title and link: Generating Meaningful Test Data Using a MySQL Function (NoSQL database©myNoSQL)


MySQL Is Bazillion Times Faster Than MemSQL

Domas Mituzas about the MemSQL vs MySQL benchmark:

Though I usually understand that those claims don’t make any sense, I was wondering what did they do wrong. Apparently they got MySQL with default settings running and MemSQL with default settings running, then compared the two. They say it is a good benchmark, as it compares what users get just by installing standard packages.

That is already cheating, because systems are forced to work in completely different profiles.

The first paragraph of the post sums up the general feeling about benchmarks very well:

I don’t like stupid benchmarks, as they waste my time.

I think most generic benchmarks are stupid, even if software engineers find some of the generic numbers interesting. Benchmarks designed around specific application scenarios will usually give more realistic results. But even those are difficult to design so that they account for all the configuration options, scaling, and changes in use cases.

Original title and link: MySQL Is Bazillion Times Faster Than MemSQL (NoSQL database©myNoSQL)