mysql: All content tagged as mysql in NoSQL databases and polyglot persistence

Nokia’s Big Data Ecosystem: Hadoop, Teradata, Oracle, MySQL

Nokia’s big data ecosystem consists of a centralized, petabyte-scale Hadoop cluster that is interconnected with a 100-TB Teradata enterprise data warehouse (EDW), numerous Oracle and MySQL data marts, and visualization technologies that allow Nokia’s 60,000+ users around the world to tap into the massive data store. Multi-structured data is constantly being streamed into Hadoop from the relational systems, and hundreds of thousands of Scribe processes run every day to move data from, for example, servers in Singapore to a Hadoop cluster in the UK. Nokia is also a big user of Apache Sqoop and Apache HBase.

In the coming years you’ll hear more and more stories (sales pitches, really) about single unified platforms solving all these problems at once. But the platforms that will survive and thrive are those that accomplish two things:

  1. keep the data gates open: in and out.
  2. work with other platforms to make this efficient for users.

Original title and link: Nokia’s Big Data Ecosystem: Hadoop, Teradata, Oracle, MySQL (NoSQL database©myNoSQL)

via: http://blog.cloudera.com/blog/2013/04/customer-spotlight-nokias-big-data-ecosystem-connects-cloudera-teradata-oracle-and-others/


MySQL 5.6, InnoDB and fast storage: 240k QPS

Mark Callaghan runs some benchmarks against MySQL 5.6.11:

Using MySQL 5.6.11 and InnoDB with a few hacks the peak throughput was about 240,000 QPS and 210,000 block reads/second. The test server has 32 cores (16 physical cores, 32 logical cores with HT enabled). This is a great result that can probably be even better. Contention on fil_system->mutex was the bottleneck and I think that can be improved (see feature request #69276). I wonder if 400,000 block reads/second is possible?
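Dividing the quoted figures gives a rough sense of scale. This is only arithmetic on the reported numbers, assuming the load is spread evenly over all 32 logical cores:

```latex
\begin{aligned}
\frac{240{,}000\ \text{queries/s}}{32\ \text{logical cores}} &= 7{,}500\ \text{queries/s per logical core} \\
\frac{210{,}000\ \text{block reads/s}}{240{,}000\ \text{queries/s}} &= 0.875\ \text{block reads per query} \\
\frac{400{,}000\ \text{block reads/s}}{210{,}000\ \text{block reads/s}} &\approx 1.9\times\ \text{the measured read rate}
\end{aligned}
```

So the 400,000 block reads/second target would mean nearly doubling the measured rate.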

Original title and link: MySQL 5.6, InnoDB and fast storage: 240k QPS (NoSQL database©myNoSQL)

via: http://mysqlha.blogspot.com/2013/05/mysql-56-innodb-and-fast-storage.html


Wikipedia Adopts MariaDB

The technical details of Wikipedia’s migration from MySQL to MariaDB:

As a read-heavy site, Wikipedia aggressively uses edge caching. Approximately 90% of pageviews are served entirely from the edge while at the application layer, we utilize both memcached and redis in addition to MySQL. Despite that, the MySQL databases serving English Wikipedia alone reach a daily peak of ~50k queries/second. Most are read queries served by load-balanced slaves, depending on consistency requirements. 80% of the English Wikipedia query load (up to 40k qps) are typically handled by just two database servers at any given time. Our most common query type (40% of all) has a median execution time of ~0.2ms and a 95th percentile time of ~50ms. To successfully use MariaDB in production, we need it to keep up with the level of performance obtained from Facebook’s MySQL fork, and to behave consistently as traffic patterns change.
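Wikipedia doesn’t describe its routing code here. The following is a minimal sketch, assuming a simple policy: writes and reads that need up-to-date data go to the master, and all other reads rotate round-robin across replicas. The class, server, and table names are hypothetical.

```python
from itertools import cycle
from typing import List


class QueryRouter:
    """Hypothetical read/write router: writes and consistency-sensitive reads go to
    the master; all other reads rotate round-robin across the replicas."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # cycle() yields the replicas in order, forever: simple round-robin.
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql: str, needs_consistency: bool = False) -> str:
        # Naive read detection, good enough for a sketch.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or needs_consistency or self._replicas is None:
            return self.master
        return next(self._replicas)


router = QueryRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM articles WHERE id = 1"))   # db-replica-1
print(router.route("UPDATE articles SET views = views + 1"))  # db-master
print(router.route("SELECT 1", needs_consistency=True))       # db-master
print(router.route("SELECT * FROM articles WHERE id = 2"))   # db-replica-2
```

With the figures above, 80% of ~50k qps is ~40k qps, so if the load were split evenly each of the two busiest servers would handle roughly 20k qps.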

As you can see in this post, the only “political” point is tucked in among the real, technical reasons:

Equally important, as supporters of the free culture movement, the Wikimedia Foundation strongly prefers free software projects; that includes a preference for projects without bifurcated code bases between differently licensed free and enterprise editions. We welcome and support the MariaDB Foundation as a not-for-profit steward of the free and open MySQL related database community.

Slightly different from “Wikipedia Migrates to MariaDB”.

Original title and link: Wikipedia Adopts MariaDB (NoSQL database©myNoSQL)

via: https://blog.wikimedia.org/2013/04/22/wikipedia-adopts-mariadb/


MySQL in the Cloud: Discontinuing of Xeround Cloud Database Public Service

Cloud and MySQL related:

We are deeply sorry to announce that Xeround’s public cloud offering will be discontinued soon. All Xeround FREE database instances will be terminated on May 8th, and the paid plans terminated on May 15th.

This was announced on May 1st.

✚ This only means more customers for Amazon RDS.

Original title and link: MySQL in the Cloud: Discontinuing of Xeround Cloud Database Public Service (NoSQL database©myNoSQL)

via: http://xeround.com/blog/2013/05/discontinuing-of-xeround-cloud-database-public-service


Wikipedia Migrates to MariaDB... but facts are facts

Jon Buys:

There was, and continues to be, concern over Oracle’s treatment of the open source competitor to their own Oracle database. I personally have wondered what motivation, if any, Oracle has to maintain MySQL. They may simply be milking the revenue stream created by MySQL AB until the well goes dry. Since MariaDB is surpassing MySQL in performance and community goodwill, that day may come sooner rather than later.

A couple of little-known facts:

  1. Oracle has been the home of InnoDB since 2005. InnoDB was the recommended engine for MySQL before Oracle acquired MySQL through Sun Microsystems, and after the acquisition it became the default engine (as of MySQL 5.5).
  2. Oracle has been the home of Sleepycat’s Berkeley DB since 2006. Those products are definitely not dead. Community-wise, Oracle may not have put much effort into extending them.

Facts are facts.

Original title and link: Wikipedia Migrates to MariaDB… but facts are facts (NoSQL database©myNoSQL)

via: http://ostatic.com/blog/wikipedia-migrates-to-mariadb


Amazon Web Services Annual Revenue Estimation

Over the weekend, Christopher Mims published an article in which he derives an estimate of Amazon Web Services’ annual revenue: $2.4 billion:

Amazon is famously reticent about sales figures, dribbling out clues without revealing actual numbers. But it appears the company has left enough hints to, finally, discern how much revenue it makes on its cloud computing business, known as Amazon Web Services, which provides the backbone for a growing portion of the internet: about $2.4 billion a year.

There’s no way to decompose this number into the revenue of each AWS service. For the data space I’d be interested in:

  1. S3 revenues. This is the space Basho’s Riak CS competes in.

    After writing my first post about Riak CS, I learned that in Japan, where Riak CS powers Yahoo!’s new cloud storage, Gemini Mobile Technologies has been offering local ISPs a similar S3-like service built on top of Cassandra.

  2. Redshift is pretty new and, while I’m not aware of immediate competitors (what am I missing?), I don’t think it accounts for a significant part of this revenue, even if some early users, like Airbnb, report getting very good performance and costs from it.

    Redshift is powered by ParAccel, which was acquired by Actian over the weekend.

  3. Amazon Elastic MapReduce. This is another interesting space in which Microsoft wants a share, with its Azure HDInsight developed in collaboration with Hortonworks.

    In this space there’s also the MapR and Google Compute Engine combination, which seems to be extremely performant.

  4. Interestingly, Amazon also makes money from some of the competitors of its Amazon DynamoDB and RDS services. The advantage of owning the infrastructure.

Original title and link: Amazon Web Services Annual Revenue Estimation (NoSQL database©myNoSQL)


Using Redis to Optimize MySQL Queries

I somehow missed this post from the Flickr team describing their use of (app-enforced) capped sorted sets in Redis as a sort of reduced, optimized secondary index for MySQL:

[…] the bottleneck was not in generating the list of photos for your most recently active contact, it was just in finding who your most recently active contact was (specifically if you have thousands or tens of thousands of contacts). What if, instead of fully denormalizing, we just maintain a list of your recently active contacts? That would allow us to optimize the slow query, much like a native MySQL index would; instead of needing to look through a list of 20,000 contacts to see which one has uploaded a photo recently, we only need to look at your most recent 5 or 10 (regardless of your total contacts count)!

This is the first time I’ve encountered this approach, where a NoSQL database is used not to directly provide the final data (usually in a denormalized format), but rather to optimize access to the master data. Basically, this is a metadata optimization layer. Neat!
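The Flickr post has the real implementation; what follows is only a sketch, assuming a per-user set of contacts scored by each contact’s last upload timestamp. To stay self-contained and testable it uses a small in-memory class that mimics the Redis sorted set commands named in the comments instead of a live Redis connection. The class and method names are hypothetical.

```python
from typing import Dict, List


class CappedRecentSet:
    """In-memory stand-in for an app-enforced capped Redis sorted set
    (roughly ZADD followed by ZREMRANGEBYRANK key 0 -(cap+1))."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self._scores: Dict[str, float] = {}  # member -> score (e.g. last upload time)

    def add(self, member: str, score: float) -> None:
        # Like ZADD: insert the member or update its score.
        self._scores[member] = score
        # Enforce the cap by dropping the lowest-scored (least recent) members.
        # Ties are broken by insertion order here; Redis orders ties by member name.
        if len(self._scores) > self.cap:
            ordered = sorted(self._scores.items(), key=lambda kv: kv[1])
            for old_member, _ in ordered[: len(self._scores) - self.cap]:
                del self._scores[old_member]

    def most_recent(self, n: int) -> List[str]:
        # Like ZREVRANGE key 0 n-1: highest scores first.
        ordered = sorted(self._scores.items(), key=lambda kv: kv[1], reverse=True)
        return [member for member, _ in ordered[:n]]


recent = CappedRecentSet(cap=3)
for contact, uploaded_at in [("alice", 100), ("bob", 200), ("carol", 300), ("dave", 400)]:
    recent.add(contact, uploaded_at)
print(recent.most_recent(10))  # ['dave', 'carol', 'bob'] (alice was trimmed)
```

In a real deployment the few IDs returned by the sorted set would then restrict the MySQL query, so its cost no longer depends on the user’s total number of contacts.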

Original title and link: Using Redis to Optimize MySQL Queries (NoSQL database©myNoSQL)

via: http://code.flickr.net/2013/03/26/using-redis-as-a-secondary-index-for-mysql/


Scaling Big Data Mining Infrastructure at Twitter

I almost always enjoy the lessons-learned-style presentations from people at Twitter. The slides, by Jimmy Lin and Dmitriy Ryaboy, were presented at Hadoop Summit. Besides the technical and practical details, there are two things I really like:

DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”

and then the reality check:

  1. Your boss says something vague
  2. You think very hard on how to move the needle
  3. Where’s the data?
  4. What’s in this dataset?
  5. What’s all the f#$#$ crap in the data?
  6. Clean the data
  7. Run some off-the-shelf data mining algorithm
  8. Productionize, act on the insight
  9. Rinse, repeat

Enjoy!


Memcached vs InnoDB Memcached in MySQL 5.6

Some numbers from comparing Memcached with InnoDB Memcached in MySQL 5.6:

Keep in mind that the entire data set fits into the buffer pool, so there are no reads from disk. However, there is write activity stemming from the fact that this is using InnoDB under the hood (redo logs, etc).

There is a significant impact on speed, so deciding which solution to use comes down to analysing the costs and complexity of maintaining another tool, the cost of Memcached warmup and the performance drop of using InnoDB Memcached.
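The quoted post doesn’t include application code. As a hedged illustration of the “cost of Memcached warmup” it mentions, here is a minimal cache-aside sketch (an assumed access pattern, not necessarily how the benchmark was set up): with a separate cache, the first read of every key falls through to MySQL until the cache is warm. The dict stands in for a memcached client and `load_from_db` for a MySQL query; both are hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable


class CacheAside:
    """Minimal cache-aside read path with a count of database reads."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}
        self._load = load_from_db
        self.db_reads = 0  # how many requests fell through to the database

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]       # hit: the database is not touched
        self.db_reads += 1                # miss: pay the database round trip
        value = self._load(key)
        self._cache[key] = value          # populate so the next read is a hit
        return value

    def warm(self, keys: Iterable[str]) -> None:
        # Warmup: pre-load hot keys before taking traffic. This still costs one
        # database read per key, just earlier and under our control.
        for key in keys:
            self.get(key)


cold = CacheAside(lambda key: f"row for {key}")
for key in ["a", "b", "a", "a"]:
    cold.get(key)
print(cold.db_reads)  # 2: only the first read of each key reached the database
```

Warming shifts those misses to before the cache takes traffic; it doesn’t remove them.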

Original title and link: Memcached vs InnoDB Memcached in MySQL 5.6 (NoSQL database©myNoSQL)

via: http://www.mysqlperformanceblog.com/2013/03/29/mysql-5-6-innodb-memcached-plugin-as-a-caching-layer/


10gen’s MongoDB Following the Steps of MySQL

10gen has never been shy about their plan: replacing MySQL. That’s a bold goal considering Oracle is now behind MySQL. But this could also make things a bit easier for 10gen.

Anyway, what made me write this separate post is realizing how closely 10gen is following the MySQL path:

  1. release early and incomplete; enhance over time
  2. position the product as developer-friendly and fast
  3. introduce an enterprise edition once your adoption surpasses that of your immediate competitors.

I guess I already know how it’ll end: a $2 billion acquisition by a company that then gets acquired by Oracle.

While the official announcement of MongoDB 2.4 mentioned “MongoDB Enterprise” only in passing, other websites didn’t overlook it. In fact, it’s what got emphasized in the coverage of today’s announcement. In case you wonder what’s in 10gen’s enterprise box: Kerberos-based security and an on-premise version of the MongoDB Monitoring Service.

The only question I have now is how soon Oracle will start looking into acquiring 10gen. Or how soon it will dedicate marketing and sales resources to directly address 10gen.

Original title and link: 10gen’s MongoDB Following the Steps of MySQL (NoSQL database©myNoSQL)


Cage Match: MySQL vs NoSQL vs Postgres

A post by Brian Aker about the state of MySQL, Postgres and NoSQL databases.

I had a couple of comments and these evolved into a long rant.

MySQL became less interesting once it was acquired […]

I’ve never been quite sure what metric is used to measure how interesting a product is. Contrary to some suggestions I’m reading, I haven’t seen stories of people moving away from MySQL because Oracle acquired it. The exceptions are Fedora and openSUSE replacing MySQL with MariaDB, and this was due to very specific issues (no security information, no access to regression tests).

the number of Postgres deployments is greater than what all of the NoSQL market combined adds up to

Comparing 15 years of PostgreSQL with 3 years of NoSQL isn’t going to give meaningful results (for a similarly unbalanced comparison, try Oracle vs PostgreSQL). I’m not aware of any database that captured a significant market share in the first 3 years of its existence. Except MySQL. Not Postgres.

Would a document model really matter if schemas could be altered online?

Yes, it would definitely remain relevant. Schema flexibility is not only about altering the schema, but also about the types of values allowed. PostgreSQL has indeed added support for arrays and JSON, which I see as a confirmation of what’s happening in the NoSQL space and a hint about the future of storage engines.

no new language has emerged from the NoSQL market that has any size-able adoption

MongoDB’s query language and the aggregation framework are used by a lot of people. It’s probably not the ideal query language and it comes in two different flavors, but it’s there and it’ll most probably evolve. Admittedly biased, I could also point to RethinkDB’s data manipulation language as an example of something that is probably on par with SQL, without SQL’s hidden corner cases. Indeed, none of these come close to the adoption SQL has achieved in its 30 years of existence.

Bottom line is that I expect bridges to be built between relational databases and NoSQL databases, with each side adopting the features that are useful to its users. I also expect the “relational databases are crap” vs “NoSQL databases are crap” debate to slowly go away, as people realize that the data space is not a zero-sum game. Vendors will be the last to give up this fight, but customers have a lot of power in making this happen.

Original title and link: Cage Match: MySQL vs NoSQL vs Postgres (NoSQL database©myNoSQL)

via: http://blog.krow.net/2013/03/mysql-vs-nosql-vs-postgres-vs-sql.html


MySQL Reference Architectures for Massively Scalable Web Infrastructure

A 25-page whitepaper published by Oracle describing best practices for MySQL deployments, covering scenarios from small to very large according to the following criteria:

  1. queries/second
  2. transactions/second
  3. concurrent read users
  4. concurrent write users
  5. database sizes for 4 types of use cases (sessions, eCommerce, analytics, content management)

Downloading the paper requires registration, but it’s worth reading and thinking about the suggested architectures (even if in a few spots it pushes the commercial tools offered by Oracle).

Original title and link: MySQL Reference Architectures for Massively Scalable Web Infrastructure (NoSQL database©myNoSQL)