scalability: All content tagged as scalability in NoSQL databases and polyglot persistence

Paper: An Analysis of Linux Scalability to Many Cores

A paper authored by a team from MIT CSAIL whose goal is to identify various scalability issues in the Linux kernel:

This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48-core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques—this paper introduces one new technique, sloppy counters—these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.

Interesting choice of tools. Note that the team used an in-memory file system to eliminate the disk-related bottlenecks.
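
The sloppy counters mentioned in the abstract can be illustrated with a minimal, single-threaded sketch. It assumes the scheme as the paper describes it: one shared central counter plus per-core counts of spare references, where a core takes references from its local spares when it can and only touches the central counter otherwise. The class name, method names, and the threshold parameter are hypothetical, and cores are simulated with list indexes instead of real per-CPU data, so this shows the bookkeeping only, not the cache behavior.

```python
class SloppyCounter:
    """Simulated sloppy reference counter (illustrative names, not kernel code)."""

    def __init__(self, num_cores, threshold):
        self.central = 0                # shared counter, expensive to touch
        self.spare = [0] * num_cores    # per-core spare references, cheap to touch
        self.threshold = threshold      # max spares a core may hoard (assumed knob)
        self.central_ops = 0            # how often the shared counter was modified

    def acquire(self, core, v=1):
        """Take v references on behalf of the given core."""
        if self.spare[core] >= v:
            # Enough local spares: no shared-counter traffic at all.
            self.spare[core] -= v
        else:
            # Otherwise the references come from the central counter.
            self.central += v
            self.central_ops += 1

    def release(self, core, v=1):
        """Give back v references; they become local spares first."""
        self.spare[core] += v
        if self.spare[core] > self.threshold:
            # Too many spares hoarded: return the excess to the central counter.
            excess = self.spare[core] - self.threshold
            self.spare[core] -= excess
            self.central -= excess
            self.central_ops += 1

    def in_use(self):
        """Invariant: central == references in use + sum of per-core spares."""
        return self.central - sum(self.spare)
```

With this sketch, an acquire that follows a release on the same core is served from the local spare and never modifies the central counter, which is where contention would be avoided on real hardware.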

Original title and link: Paper: An Analysis of Linux Scalability to Many Cores (NoSQL database©myNoSQL)

via: http://www.stanford.edu/class/cs240/readings/analysis-linux-scalability.pdf


How Do I Freaking Scale Oracle?

Andrew Oliver for InfoWorld:

That said, many companies I work with have spent 20 years painting themselves into an Oracle corner. While they may have one eye on a brighter future, they still must ensure their Oracle database is high-performance and highly available — and scales as well as possible. Despite what you may read in NoSQL vendor marketing materials (or even in my blog [2]), it is possible to scale Oracle.

If you can actually find the answer to the title's question in the article, please teach me or send me links. The article left me in the dark. Or rather, the only light I saw involves far too many additional products, which most probably cost quite a bit.

Original title and link: How Do I Freaking Scale Oracle? (NoSQL database©myNoSQL)

via: http://www.infoworld.com/print/212392


The Myth of Auto Scaling as a Capacity Planning Approach

An older but very instructive post by James Golick dissecting the myth of extra server capacity added just in time:

There’s this idea floating around that we can scale out our data services “just in time”. Proponents of cloud computing frequently tout this as an advantage of such a platform. Got a load spike? No problem, just spin up a few new instances to handle the demand. It’s a great sounding story, but sadly, things don’t quite work that way.

This is the Mythical Man-Month of the IT department.

John Allspaw

Original title and link: The Myth of Auto Scaling as a Capacity Planning Approach (NoSQL database©myNoSQL)

via: http://jamesgolick.com/2010/10/27/we-are-experiencing-too-much-load-lets-add-a-new-server..html


Two Sides of the OMGPOP Cloud and Couchbase Scalability Story

On Friday, many media sites (GigaOm, BusinessInsider) published the press release about OMGPOP's growth story, which cites cloud services and Couchbase as the company's scaling solution.

While reading it, I jotted down:

  1. The good: using a combination of cloud and a NoSQL database (Couchbase) allowed OMGPOP to scale
  2. The bad: OMGPOP had to call in people from Couchbase to help out with scaling

The question is: if you can throw in more iron and hire experts, wouldn't many other database solutions be able to cope with OMGPOP's growth?

Original title and link: Two Sides of the OMGPOP Cloud and Couchbase Scalability Story (NoSQL database©myNoSQL)


What other popular paradigms/architectures can handle large scale computational problems?

Interesting answers on Quora, mostly expanding on Krishna Sankar's short answer:

There are two ways one can address large scale computational problems:

  • Task Parallelism : This is where MPI and so forth fit in
  • Data Parallelism : This is the sweet spot for map/reduce
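
The distinction in the two bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a thread pool stands in for a real cluster, and the task-parallel half uses plain threads, not MPI. In CPython the threads will not speed up CPU-bound work, so the sketch shows the shape of each approach, not its performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk):
    """Map step: the same function runs on every chunk of the data."""
    return Counter(chunk.split())


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def data_parallel_word_count(chunks, workers=4):
    # Data parallelism: one operation applied to many pieces of data,
    # followed by a reduction of the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return reduce(merge_counts, partial_counts, Counter())


def task_parallel_stats(numbers):
    # Task parallelism: different operations run concurrently,
    # here all of them over the same input.
    with ThreadPoolExecutor(max_workers=3) as pool:
        total = pool.submit(sum, numbers)
        smallest = pool.submit(min, numbers)
        largest = pool.submit(max, numbers)
        return total.result(), smallest.result(), largest.result()
```

In the first function the work is split by data (each chunk gets the same treatment); in the second it is split by task (each worker does something different).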

Original title and link: What other popular paradigms/architectures can handle large scale computational problems? (NoSQL database©myNoSQL)


Auto Scaling in the Amazon Cloud: Netflix's Approach and Lessons Learned

Another great post for today from the engineering team at Netflix:

Auto scaling is a very powerful tool, but it can also be a double-edged sword. Without the proper configuration and testing it can do more harm than good. A number of edge cases may occur when attempting to optimize or make the configuration more complex. As seen above, when configured carefully and correctly, auto scaling can increase availability while simultaneously decreasing overall costs.

Original title and link: Auto Scaling in the Amazon Cloud: Netflix’s Approach and Lessons Learned (NoSQL database©myNoSQL)

via: http://techblog.netflix.com/2012/01/auto-scaling-in-amazon-cloud.html


Asking for Performance and Scalability Advice on StackOverflow

How many times have you gotten an answer that applies to your specific scenario after providing only a short list of performance and scalability requirements? MySQL/InnoDB can do 750k qps, Cassandra scales linearly, MongoDB can do 8 million ops/s. Is any of these the answer for your application?

Actually:

  • How many times did you get all the requirements right at spec time?

  • How many times did the requirements remain the same during the development cycle?

  • How many times did production reality confirm your bullet-list requirements?

Original title and link: Asking for Performance and Scalability Advice on StackOverflow (NoSQL database©myNoSQL)


Podcast: MySQL Cluster News: Performance Improvements, New NoSQL Access

Mat Keep and Bernd Ocklin discuss what’s new in the second milestone release of MySQL Cluster 7.2: performance improvements, new NoSQL access (memcached protocol), and cross-data-center scalability. Download the mp3.

Original title and link: Podcast: MySQL Cluster News: Performance Improvements, New NoSQL Access (NoSQL database©myNoSQL)


The Story of Etsy's Architecture

Ars Technica’s Sean Gallagher summarizes a presentation given at the Surge conference covering the evolution of Etsy’s architecture: from a centralized solution based on PostgreSQL stored procedures, through a failed service-oriented-like architecture, to sharded MySQL:

And the team started to shift feature by feature away from a semi-monolithic Postgres back-end to sharded MySQL databases. “It’s a battle-tested approach,” Snyder said. “Flickr is using it on an enormous scale. It scales horizontally, basically, to near infinity, and there’s no single point of failure—it’s all master to master replication.”

Original title and link: The Story of Etsy’s Architecture (NoSQL database©myNoSQL)

via: http://arstechnica.com/business/news/2011/10/when-clever-goes-wrong-how-etsy-overcame-poor-architectural-choices.ars


Help CouchDB Break the C10K Barrier

Over the weekend, I was experimenting with CouchDB to see if it can pass the C10K barrier. Some of the performance optimizations I made along the way are really OS-level optimizations that affect MochiWeb (erlang web server) and fairly well documented in many blogs. This one by @metabrew in particular is a pretty good read, since it focuses on Erlang and MochiWeb. While I am a performance junkie, I am not an Erlang hacker. So this is a call for help to the CouchDB hackers for recommendations on scaling out CouchDB.

The initial tweaks made by the blitz.io guys took CouchDB from under 1,000 concurrent users to around 2,300. There’s still a long way to go to reach 10k concurrent users, and they’d appreciate your help.

Original title and link: Help CouchDB Break the C10K Barrier (NoSQL database©myNoSQL)

via: http://blog.mudynamics.com/2011/09/05/help-couchdb-break-the-c10k-barrier/


What Scales Best?

Tony Bain:

What is best?  Well that comes down to the resulting complexity, cost, performance and other trade-offs.  Trade-offs are key as there are almost always significant concessions to be made as you scale up.

[…]

So what is my point? Well I guess what I am saying is physical scalability is of course an important consideration in determining what is best. But it is only one side of the coin. What it “costs” you in terms of complexity, actual dollars, performance, flexibility, availability, consistency etc, etc are all important too. And these are often relative, what is complex for you may not be complex for someone else.

I concur—a long time ago I wrote: Complexity is a dimension of scalability.

Original title and link: What Scales Best? (NoSQL database©myNoSQL)

via: http://blog.tonybain.com/tony_bain/2011/07/what-scales-best.html