mysql: All content tagged as mysql in NoSQL databases and polyglot persistence

MongoDB vs MySQL: A DevOps point of view

Pierre Bailet and Mathieu Poumeyrol of fotopedia (a French photo site) share their experience of operating a small MongoDB cluster since September 2009, compared with operating a MySQL cluster.

Some details about fotopedia:

  • fotopedia is 100% on AWS
  • Amazon RDS for MySQL
  • 4-node MongoDB cluster
  • 150 million photo views

MongoDB advantages:

  • no alter table
  • background index creation
  • data backup & restoration
    • note: as far as I can tell MySQL is able to do the same
  • replica sets
  • hardware migration
    • note: the same procedure can be used for MySQL

Before leaving you with the slides, here is an interesting accepted trade-off:

Quietly losing seconds of writes is preferable to:

  • weekly minutes-long maintenance periods
  • minutes-long unscheduled downtime and manual failover in case of hardware failures


Where Does Xeround Fit In The CAP Theorem?

Itamar Haber on the Xeround blog:

Q: Is Xeround Inconsistent? Xeround employs a set of majority-based algorithms to facilitate its reading and writing of data from/to multiple, distributed nodes. […] Via the use of these algorithms we are ensured that all access to the data is consistent so inconsistency is not an issue.

Q: Is Xeround Unavailable? There is no single point of failure in Xeround and every component that the system consists of is redundant and replaceable.

Q: Is Xeround Partitioning-Intolerant? Yes, to a certain extent it is.

After reading it, I got the same impression as VoltDB’s John Hugg who commented:

It sounds like you’ve gotten this backwards. According to you, in the face of a network event, the system becomes unavailable, but remains consistent. I think you have partition tolerance, but with reduced availability.

Instead of focusing strictly on the CAP characteristics of a distributed database, one should focus on the behavior their system requires and look for the database solution that offers the guarantees they need.
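To make John Hugg's point concrete, here is a minimal sketch of a majority-based store. It is not Xeround's implementation (the post does not describe its algorithms in detail); the `MajorityStore`, `Replica`, and `Unavailable` names are hypothetical, and the whole thing is an in-memory toy model. It only illustrates the behavior under discussion: when a majority of nodes cannot be reached, the system refuses requests (reduced availability) rather than serving possibly stale data (inconsistency).

```python
from dataclasses import dataclass
from typing import List, Optional


class Unavailable(Exception):
    """Raised when fewer than a majority of replicas can be reached."""


@dataclass
class Replica:
    # Hypothetical replica model: one versioned value per node.
    reachable: bool = True
    version: int = 0
    value: Optional[str] = None


class MajorityStore:
    """Toy store that requires a majority of replicas for every read and write."""

    def __init__(self, replica_count: int) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(replica_count)]
        self.majority = replica_count // 2 + 1

    def _reachable_majority(self) -> List[Replica]:
        nodes = [r for r in self.replicas if r.reachable]
        if len(nodes) < self.majority:
            # Minority side of a partition: stay consistent, give up availability.
            raise Unavailable(
                f"only {len(nodes)} of {len(self.replicas)} replicas reachable"
            )
        return nodes

    def write(self, value: str) -> None:
        nodes = self._reachable_majority()
        new_version = max(r.version for r in nodes) + 1
        for r in nodes:
            r.version, r.value = new_version, value

    def read(self) -> Optional[str]:
        # Any two majorities overlap in at least one replica, so the highest
        # version among the reachable nodes is the latest acknowledged write.
        nodes = self._reachable_majority()
        return max(nodes, key=lambda r: r.version).value
```

With five replicas, marking three of them unreachable makes both `write` and `read` raise `Unavailable`, while any three reachable replicas are enough to return the last acknowledged write.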

Original title and link: Where Does Xeround Fit In The CAP Theorem? (NoSQL database©myNoSQL)

via: http://xeround.com/blog/2012/01/xeround-and-the-cap-theorem


Jelastic Database Marketshare: MySQL, MongoDB, MariaDB

Jelastic, a company offering a cloud platform for Java server hosting, has published some stats about the databases used by its more than 7,000 users:

Jelastic Database Marketshare

While it would be wrong to generalize these results to absolute database market share, it is interesting nonetheless to see that MongoDB has already overtaken PostgreSQL as the second most used database, and that CouchDB, which was added only one month ago, is already used by 5% of Jelastic’s users. MySQL holds first place with over 40% of users, or, put differently, double the share of second-placed MongoDB.

These numbers would be even more interesting if they accounted for real usage stats such as database sizes or query volumes.

Mat Keep

Original title and link: Jelastic Database Marketshare: MySQL, MongoDB, MariaDB (NoSQL database©myNoSQL)

via: http://blog.jelastic.com/2012/01/23/database-marketshare-january-2012/


MySQL at Twitter: Storing 250mil Tweets Daily

Todd Hoff took the time to dissect Jeremy Cole’s talk “Big and Small Data at @Twitter” from the O’Reilly MySQL conference and extract the interesting bits in a post:

  • MySQL works well enough most of the time that it’s worth using. Twitter values stability over features so they’ve stayed with older releases.
  • MySQL doesn’t work for ID generation and graph storage.
  • MySQL is used for smaller datasets of < 1.5TB, which is the size of their RAID array, and as a backing store for larger datasets.
  • Typical database server config: HP DL380, 72GB RAM, 24 disk RAID10. Good balance of memory and disk.

In my summary of the talk I’ve noted:

  • Use MySQL when it works, something else when not - fortunately MySQL often does work
  • MySQL is used by Twitter because it’s robust, replication works and it’s easy to use and run
  • MySQL doesn’t work well for graphs or auto_increment, and replication lag is a problem
  • MySQL replication improvements like a crash-safe, multi-threaded slave are what they need

But Twitter is also one of the most prominent use cases of polyglot persistence. While MySQL is an important piece of the Twitter architecture, it is not the only storage or data processing engine.

The following other data solutions get mentioned in Jeremy’s talk:

  • Cassandra is used for high velocity writes, and lower velocity reads. The advantage is Cassandra can run on cheaper hardware than MySQL, it can expand easier, and they like schemaless design.
  • Hadoop is used to process unstructured and large datasets, hundreds of billions of rows.
  • Vertica is being used for analytics and large aggregations and joins so they don’t have to write MapReduce jobs. 

Yet that’s not the whole story. Twitter is using Cassandra and Memcached for real-time URL fetchers and they also experimented with using Gizzard for Redis. After buying BackType, Twitter got and then open sourced Storm, a Hadoop-like real-time data processing tool. And who knows what’s in the Twitter labs right now.

I’m embedding below Jeremy Cole’s “Big and Small Data at @Twitter”:


Database Sharding Using a Proxy

ScaleBase’s Liran Zelkha is making the case for database sharding using a proxy:

First and foremost, since the sharding logic is not embedded inside the application, third party applications can be used, be it MySQL Workbench, MySQL command line interface or any other third party product. This translates to a huge saving in the day-to-day costs of both developers and system administrators.

Compare ScaleBase’s proxy-based sharding:

ScaleBase Proxy Sharding

with MongoDB’s sharding:

MongoDB sharding

Another example would be the Hadoop HDFS NameNode, which provides somewhat similar functionality.
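The idea of keeping the sharding logic out of the application can be sketched in a few lines. This is not ScaleBase’s implementation; `ShardingProxy`, its methods, and the hash-based placement are assumptions made purely for illustration. The client hands a statement (and, when it has one, a shard key) to the proxy, and the proxy decides which backend databases should receive it.

```python
import hashlib
from typing import List, Optional, Sequence


class ShardingProxy:
    """Hypothetical proxy that picks target shards so the client does not have to."""

    def __init__(self, shard_names: Sequence[str]) -> None:
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)

    def shard_for(self, shard_key: object) -> str:
        # Stable hash of the key, so every client is routed the same way.
        digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]

    def targets(self, statement: str, shard_key: Optional[object] = None) -> List[str]:
        """Return the shards that should execute `statement`."""
        if shard_key is None:
            # No key available: fan the statement out to every shard.
            return list(self.shard_names)
        return [self.shard_for(shard_key)]
```

Because the routing lives in the proxy, a tool such as the MySQL command line client can connect to it as if it were a single database, which is the saving the quote describes. Simple modulo placement like this reshuffles most keys when a shard is added; a real proxy would need a more careful placement scheme.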

Original title and link: Database Sharding Using a Proxy (NoSQL database©myNoSQL)

via: http://www.scalebase.com/making-the-case-for-sharding-using-a-database-proxy/


MySQL MEMORY as Poor Man’s Memcached Replacement

ServerFault Q&A:

Q: Copy MySQL to RAM as a poor man’s memcached replacement?

A: Use the MEMORY storage engine on a read only slave to do your reads from, is exactly what you really want and a sane setup. Forget “dumping it to disk” (?!) or other strange things.

You can even put the slave as another instance on your existing server if you can’t afford to setup a dedicated slave, but properly tuning the MySQL parameters for mostly read workloads will bring a significant performance enhancement too!

Jiminy

Original title and link: MySQL MEMORY as Poor Man’s Memcached Replacement (NoSQL database©myNoSQL)


Facebook: There Are No Published Cases of NoSQL Databases Operating at the Scale of Facebook’s MySQL Database

Joe Maguire, referring to the Facebook talk “MySQL and HBase” embedded below:

if Facebook doesn’t need NoSQL, who does?

My answer: many of those that cannot employ a specialized team to hack the hell out of MySQL to make it work at that scale.

On the flip side, many other companies don’t have the time or engineering power to grow their product together with a NoSQL database.

Original title and link: Facebook: There Are No Published Cases of NoSQL Databases Operating at the Scale of Facebook’s MySQL Database (NoSQL database©myNoSQL)

via: http://josephmaguire.blogspot.com/2011/12/facebook-there-are-no-published-cases.html


MySQL Sharding vs MySQL Cluster

StackExchange Q&A:

Q: Considering performance only, can a MySQL Cluster beat a custom data sharding MySQL solution?

A: I would say that MySQL Cluster could achieve higher throughput/host than sharded MySQL+InnoDB provided that :

  • Queries are simple
  • All data fits in-memory

In terms of latency, MySQL Cluster should have more stable latency than sharded MySQL. Actual latencies for purely in-memory data could be similar. As queries become more complex, and data is stored on disk, the performance comparison becomes more confusing.

Make sure you read the complete answer as it covers some more MySQL Sharding vs MySQL Cluster pros and cons.

Mat Keep

Original title and link: MySQL Sharding vs MySQL Cluster (NoSQL database©myNoSQL)


MongoDB, Data Modeling, and Adoption

Micheal Shallop describes in this post how he “built and re-built” a geospatial table, replacing several tables in MySQL with MongoDB:

The mongo geospatial repository will be replacing several tables in the legacy mySQL system – as you may know, mongodb comes with full geospatial support so executing queries against a collection (table) built in this manner is shocking in terms of it’s response speeds — especially when you compare those speeds to the traditional mySQL algorithms for extracting geo-points based on distance ranges for lat/lon coordinates.  The tl;dr for this paragraph is: no more hideous trigonometric mySQL queries!
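For readers who have not met the “trigonometric queries” the quote refers to, here is a minimal sketch of the kind of great-circle calculation they usually embed, written in Python rather than SQL. The haversine formula is one common choice; the post does not say which formula the legacy system used, and the function names and the mean Earth radius constant are assumptions.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(
    points: Iterable[Tuple[float, float]],
    center: Tuple[float, float],
    radius_km: float,
) -> List[Tuple[float, float]]:
    """Keep the (lat, lon) points no farther than radius_km from center."""
    return [
        p for p in points
        if haversine_km(center[0], center[1], p[0], p[1]) <= radius_km
    ]
```

Written as SQL, the same expression has to be evaluated for every candidate row unless the query first narrows the candidates with a bounding box, which is why a database with a native geospatial index feels so much faster.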

But what actually caught my attention was this paragraph:

What I learned in this exercise was that the key to architecting a mongo collection requires you to re-think how data is stored.  Mongo stores data as a collection of documents.  The key to successful thinking, at least in terms of mongo storage, is denormalization of your data objects.

This made me realize that MongoDB adoption benefits hugely from the fact that its data model and querying are the closest to those of relational databases: neither requires a radical mind shift from developers who have touched a database at least once. It is like knowing one programming language and learning a second that follows almost the same paradigms.

The same cannot be said about key-value stores, multi-dimensional maps, MapReduce algorithms, or graph databases. Any of these would require one to dismiss pretty much everything learned in the relational model and completely remodel the world. It’s a tougher job, but when done right the effort pays off.
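As a rough illustration of the denormalization advice quoted above, the following sketch folds rows from two relational-style tables into self-contained documents. The table layouts, field names, and the `to_documents` helper are all hypothetical; the post does not show the actual schema.

```python
from typing import Dict, List


def to_documents(place_rows: List[dict], tag_rows: List[dict]) -> List[dict]:
    """Embed each place's coordinates and tags into a single document."""
    tags_by_place: Dict[int, List[str]] = {}
    for row in tag_rows:
        tags_by_place.setdefault(row["place_id"], []).append(row["tag"])

    return [
        {
            "_id": place["id"],
            "name": place["name"],
            # Coordinates live inside the document instead of a joined table.
            "loc": {"lon": place["lon"], "lat": place["lat"]},
            # The one-to-many relation becomes an embedded array.
            "tags": tags_by_place.get(place["id"], []),
        }
        for place in place_rows
    ]
```

The result is still recognizably the relational data, only pre-joined, which is part of why the step from tables to documents feels small.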

Original title and link: MongoDB, Data Modeling, and Adoption (NoSQL database©myNoSQL)


Typekit Architecture Includes Redis, MongoDB, and MySQL

As revealed by Ryan Carver in a web pulp TV interview:

  • Besides MySQL, the stack also contains Redis and MongoDB.
  • Redis is used for stashing Resque data, Vanity metrics, etc.
  • MongoDB is used for storing CDN logs, basic analytics data, traffic-tracking data, etc.
  • Typekit has a unique type of revenue-share deal with its Type foundry partners, distributing revenues based on the popularity/usage of font faces.
  • MongoDB is particularly used for such usage-based data collection and calculation along with its built-in MapReduce framework for reporting.
  • Ryan thinks on-the-fly-report-generation is technically very much possible with MapReduce.
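The usage-based revenue calculation mentioned in the list above can be pictured as a small map/reduce job. This is only an in-memory Python sketch of the idea: MongoDB’s built-in MapReduce runs JavaScript functions on the server, and the log entry fields (`font`, `requests`), the function names, and the proportional split are assumptions rather than Typekit’s actual scheme.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_log_entry(entry: dict) -> Iterator[Tuple[str, int]]:
    """Map step: emit (font, request count) for one log entry."""
    yield entry["font"], entry["requests"]


def reduce_counts(values: List[int]) -> int:
    """Reduce step: total the request counts emitted for one font."""
    return sum(values)


def usage_by_font(log_entries: Iterable[dict]) -> Dict[str, int]:
    """Run the map and reduce steps over all entries, grouping by font."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for entry in log_entries:
        for key, value in map_log_entry(entry):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


def revenue_shares(usage: Dict[str, int], revenue_pool: float) -> Dict[str, float]:
    """Split revenue_pool between fonts in proportion to their usage."""
    total = sum(usage.values())
    if total == 0:
        return {font: 0.0 for font in usage}
    return {font: revenue_pool * count / total for font, count in usage.items()}
```

Because the reduce step is a plain sum, partial results can be combined in any order, which is what makes on-the-fly report generation plausible.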

About Typekit infrastructure:

  • Ryan says Typekit currently has about a dozen servers in total, hosted on Slicehost.
  • Typekit plans to shift to an EC2 environment in near future because of the easy scaling and flexibility of EC2.
  • They are currently preparing a cloud formation with Chef, rebuilding Typekit’s operations infrastructure.

Now go watch the whole interview.

Original title and link: Typekit Architecture Includes Redis, MongoDB, and MySQL (NoSQL database©myNoSQL)


DataSift Using MySQL, HBase, Memcached to Deal With Twitter Firehose

Another great article from Todd Hoff, this time dissecting the DataSift architecture:

DataSift architecture

In terms of data store, DataSift architecture includes:

  • MySQL (Percona server) on SSD drives
  • HBase cluster (currently, ~30 hadoop nodes, 400TB of storage)
  • Memcached (cache)
  • Redis (still used for some internal queues, but probably going to be dismissed soon)

Leave whatever you were doing and go read it now.

Original title and link: DataSift Using MySQL, HBase, Memcached to Deal With Twitter Firehose (NoSQL database©myNoSQL)