ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

MySQL: All content tagged as MySQL in NoSQL databases and polyglot persistence

Provisioned IOPS for Amazon RDS

Werner Vogels:

Following the huge success of being able to provision a consistent, user-requested I/O rate for DynamoDB and Elastic Block Store (EBS), the AWS Database Services team has now released Provisioned IOPS, a new high performance storage option for the Amazon Relational Database Service (Amazon RDS). Customers can provision up to 10,000 IOPS (input/output operations per second) per database instance to help ensure that their databases can run the most stringent workloads with rock solid, consistent performance.

Amazon is the first company I know of championing guaranteed performance SLAs. Until recently most of the SLAs were referring to availability, resilience, and redundancy. But soon performance-based SLAs will become the norm for other service providers. I’d also expect appliance vendors to be asked for similar guarantees sooner than later.

Original title and link: Provisioned IOPS for Amazon RDS (NoSQL database©myNoSQL)

via: http://www.allthingsdistributed.com/2012/09/provisioned-iops-rds.html


Big Data at Aadhaar With Hadoop, HBase, MongoDB, MySQL, and Solr

It’s unfortunate that the post focuses mostly on the usage of Spring and RabitMQ and the slidedeck doesn’t dive deeper into the architecture, data flows, and data stores, but the diagrams below should give you an idea of this truly polyglot persistentency architecture:

Architecture of Big Data at Aadhaar

Big Data at Aadhaar Data Stores

The slide deck presenting architecture principles and numbers about the platform after the break.


A Quick Test of the New MySQL Memcached Plugin With (J)Ruby

Gabor Vitez:

With a new post hitting Hacker News again on MySQL’s memcached plugin, I really wanted to do a quick-and-dirty benchmark on it, just to see what good it is – does this interface offer any extra speed when compared to SQL+ActiveRecord? Does it have it’s place in the software stack? How much work is needed to get this combination off the ground?

When running into micro-benchmarks, I really try my best to figure out if they hide any value. But it’s hard to find any in one that uses different libraries, runs both the benchmark and the server on the same machine and uses no concurrency.

Original title and link: A Quick Test of the New MySQL Memcached Plugin With (J)Ruby (NoSQL database©myNoSQL)

via: http://elevat.eu/blog/2012/08/a-quick-test-of-the-new-mysql-memcached-plugin-with-ruby/


Klout Data Architecture: MySQL, HBase, Hive, Pig, Elastic Search, MongoDB, SSAS

Just found slideck (embedded below) describing the data workflow at Klout. Their architecture includes many interesting pieces combining both NoSQL and relational databases with Hadoop and Hive and Pig and traditional BI. Even Excel gets a mention in the slides:

  1. Pig and Hive
  2. HBase
  3. Elastic Search
  4. MongoDB
  5. MySQL

Klout Data Architecture


Generating Meaningful Test Data Using a MySQL Function

Ronald Speelman:

You can use this MySQL function to generate names, (e-mail)addresses, phone numbers, urls, bit values, colors, IP address, etc.. As usual, the code is provided in a zipfile and the code is fully documented.

The last couple of days I’ve been looking for generating some good test data in JSON format1, so if you are aware of something please drop me a note.


  1. Right now I’m using as input a corpus of combines JSON files I’ve found online, but I’m not happy with the solution. 

Original title and link: Generating Meaningful Test Data Using a MySQL Function (NoSQL database©myNoSQL)

via: http://moinne.com/blog/ronald/mysql/howto-generate-meaningful-test-data-using-a-mysql-function


MySQL Is Bazillion Times Faster Than MemSQL

Domas Mituzas about the MemSQL vs MySQL benchmark:

Though I usually understand that those claims don’t make any sense, I was wondering what did they do wrong. Apparently they got MySQL with default settings running and MemSQL with default settings running, then compared the two. They say it is a good benchmark, as it compares what users get just by installing standard packages.

That is already cheating, because systems are forced to work in completely different profiles.

The first paragraph of the post summarizes very well the general feeling about benchmarks:

I don’t like stupid benchmarks, as they waste my time.

I think that most of the generic benchmarks are stupid, even if some generic numbers are considered interesting by software engineers. Benchmarks designed around specific scenarios of applications will most of the time give more realistic results. But even those are difficult to design and account for all the configuration options, scaling, or changes of the use cases.

Original title and link: MySQL Is Bazillion Times Faster Than MemSQL (NoSQL database©myNoSQL)

via: http://dom.as/2012/06/26/memsql-rage/


A Tragically Comedic Security Flaw in MySQL

In short, if you try to authenticate to a MySQL server affected by this flaw, there is a chance it will accept your password even if the wrong one was supplied. The following one-liner in bash will provide access to an affected MySQL server as the root user account, without actually knowing the password.

  $ for i in `seq 1 1000`; do mysql -u root --password=bad -h 127.0.0.1 2>/dev/null; done

Don’t try this at home. Or if you try it, don’t tell anyone the result.

Original title and link: A Tragically Comedic Security Flaw in MySQL (NoSQL database©myNoSQL)

via: https://community.rapid7.com/community/metasploit/blog/2012/06/11/cve-2012-2122-a-tragically-comedic-security-flaw-in-mysql


Tumblr Jetpants: A Toolkit for Huge MySQL Topologies

Jetpants:

Today, we’re happy to announce the open source release of Jetpants, Tumblr’s in-house toolchain for managing huge MySQL database topologies. Jetpants offers a command suite for easily cloning replicas, rebalancing shards, and performing master promotions. It’s also a full Ruby library for use in developing custom billion-row migration scripts, automating database manipulations, and copying huge files quickly to multiple remote destinations.

This toolkit helps Tumblr manage its 60bn rows totaling 21TB of data in a MySQL cluster.

Original title and link: Tumblr Jetpants: A Toolkit for Huge MySQL Topologies (NoSQL database©myNoSQL)

via: http://engineering.tumblr.com/post/24612921290/jetpants-a-toolkit-for-huge-mysql-topologies


MySQL KILL Command: How It Works

Have you ever tried to kill a query, but rather than just go away, it remained among the running ones for an extended period of time? Or perhaps you have noticed some threads makred with killed showing up from time to time and not actually dying. What are these zombies? Why does MySQL sometimes seem to fail to terminate queries quickly? Is there any way to force the kill command to actually work instantaneously?

Bookmarked.

Marco Tusa

Original title and link: MySQL KILL Command: How It Works (NoSQL database©myNoSQL)

via: http://www.dbasquare.com/2012/05/15/why-do-threads-sometimes-stay-in-killed-state-in-mysql/


MySQL Is Done. NoSQL Is Done. It's the Postgres Age

Jeff Dickey enumerates some of the new features available in PostgreSQL—schema-less data, array columns, queuing, full-text searching, geo-spatial indexing—concluding that PosgreSQL has now everything an application needs:

Postgres has taken the features out of all of these tools and integrate it right inside the platform. Now you don’t need to spin up a mongo cluster for non-rel data, rabbitmq cluster for queueing, solr box for searching. You can just have a single postgres server. That saves a huge ops headache since each of those clusters/boxes have to be durable, replicated, and scalable.

Sounds a bit too optimistic? As we’ve learned from the NoSQL space there are no silver bullets:

Now obviously, there’s a glaring downside with this approach: you get one box. Maybe a read slave or something, but really, you can’t scale it.

As you can imagine I disagree with most of the points, the only exception being that it is great to see so many useful features packaged with PostgreSQL—these are definitely going to make like easier for some of the developers.

But when talking about MySQL and NoSQL being done:

  1. MySQL is done, except it has a huge community, there are tons of developers very familiar with it, and last but not least MySQL powers massive deployments. This last part matters a lot.
  2. NoSQL is done, except many NoSQL solutions tackle different problem spaces providing optimal solutions for these by staying focused. Neither Oracle, nor MongoDB, nor PosgreSQL will be able to solve all problems. The wider range of problems they are covering, the less optimal solutions they are providing for corner case or extreme scenarios.

Original title and link: MySQL Is Done. NoSQL Is Done. It’s the Postgres Age (NoSQL database©myNoSQL)

via: http://dickey.xxx/mysql-is-done-it-s-the-postgres-age


Pinterest Architecture Numbers

Todd Hoff caught some new numbers about Pinterest architecture and from those the ones interesting from the data point of view:

  • 125 EC2 memcached instances, from which 90 for production and 35 for internal usage:

    Another 90 EC2 instances are dedicated towards caching, through memcache. “This allows us to keep a lot of data in memory that is accessed very often, so we can keep load off of our database system,” Park said. Another 35 instances are used for internal purposes.

  • 70 master MySQL databases on EC2

    • sharded at 50% capacity
    • backup databases in different regions

    Behind the application, Pinterest runs about 70 master databases on EC2, as well as another set of backup databases located in different regions around the world for redundancy.

    In order to serve its users in a timely fashion, Pinterest sharded its database tables across multiple servers. When a database server gets more than 50% filled, Pinterest engineers move half its contents to another server, a process called sharding. Last November, the company had eight master-slave database pairs. Now it has 64 pairs of databases. “The sharded architecture has let us grow and get the I/O capacity we need,” Park said.

  • 80 million/410TB objects stored in S3

  • no details about Redis

Original title and link: Pinterest Architecture Numbers (NoSQL database©myNoSQL)


NoSQL and Relational Databases Podcast With Mathias Meyer

EngineYard’s Ines Sombra recorded a conversation with Mathias Meyer about NoSQL databases and their evolution towards more friendlier functionality, relational databases and their steps towards non-relational models, and a bit more on what polyglot persistence means.

Mathias Meyer is one of the people I could talk for days about NoSQL and databases in general with different infrastructure toppings and he has some of the most well balanced thoughts when speaking about this exciting space—see this conversation I’ve had with him in the early days of NoSQL. I strongly encourage you to download the mp3 and listen to it.

Original title and link: NoSQL and Relational Databases Podcast With Mathias Meyer (NoSQL database©myNoSQL)