performance: All content tagged as performance in NoSQL databases and polyglot persistence
Wednesday, 21 March 2012
When More Machines Equals Worse Results
Galileo observed how things broke if they were naively scaled up.
Google found the larger the scale the greater the impact of latency variability. When a request is implemented by work done in parallel, as is common with today’s service oriented systems, the overall response time is dominated by the long tail distribution of the parallel operations. Every response must have a consistent and low latency or the overall operation response time will be tragically slow.
Fantastic post from Todd Hoff on the (hopefully) well-known truth: “the response time in a distributed parallel system is the time of the slowest component”.
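A rough sketch of why the slowest component dominates, assuming independent sub-calls and a made-up latency model (10 ms typical, 1% of calls taking 1000 ms). Neither the numbers nor the function names come from the post; they are only there to make the arithmetic concrete:

```python
import random


def p_slow_fanout(p_slow_single: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent calls is slow."""
    return 1.0 - (1.0 - p_slow_single) ** fanout


def median_fanout_latency(fanout: int, trials: int = 2000, seed: int = 0) -> float:
    """Median end-to-end latency (ms) when a request waits for all sub-calls."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Hypothetical model: 1% of sub-calls take 1000 ms, the rest 10 ms.
        calls = [1000.0 if rng.random() < 0.01 else 10.0 for _ in range(fanout)]
        # The request finishes only when its slowest sub-call finishes.
        totals.append(max(calls))
    totals.sort()
    return totals[len(totals) // 2]


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(p_slow_fanout(0.01, n), 3), median_fanout_latency(n))
```

Under these assumptions, a single call is slow 1% of the time, but a request that fans out to 100 calls hits at least one slow call roughly 63% of the time, so the "rare" tail becomes the typical case.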
Original title and link: When More Machines Equals Worse Results (©myNoSQL)
Wednesday, 15 February 2012
Possible 100-fold increase in data storage speed
European researchers may have found a way to speed up data storage 100-fold, breaking one barrier holding back how fast data can be transferred. […] The researchers at York University in the U.K. and Nijmegen University in the Netherlands accomplished the feat by heating a magnetic material with laser bursts that alter what is called the magnetic spin of the material at the atomic level, according to an explanation by York University. There are two possible spins, parallel and anti-parallel, and in storage, these binary states would represent the ones and zeros that designate bit types.
I still find the salmon storage more mouthwatering.
via: http://www.networkworld.com/news/2012/020812-data-storage-speed-255864.html
Friday, 27 January 2012
MapReduce With Hadoop: What Happens During Mapping
An interesting look at what happens during the map phase in Hadoop and the impact of emitting key-value pairs:
- a direct negative impact on the map time and CPU usage, due to more serialization
- an indirect negative impact on CPU due to more spilling and additional deserialization in the combine step
- a direct impact on the map task, due to more intermediate files, which makes the final merge more expensive

The main point of the dynaTrace blog post is that even if Hadoop makes it easy to throw more hardware at a problem, wasting resources with bad code in MapReduce tasks comes with a noticeable and measurable cost.
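One common way to emit fewer key-value pairs is to aggregate inside the mapper before emitting. The sketch below is plain Python, not Hadoop code, and the function names are made up for illustration; it only contrasts how many pairs each approach hands to the serialization and spill machinery, and it is not necessarily the fix the dynaTrace post recommends:

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def naive_map(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit one (word, 1) pair per token; every pair has to be serialized."""
    for line in lines:
        for word in line.split():
            yield word, 1


def combining_map(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Aggregate in memory first, then emit one pair per distinct word."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    yield from counts.items()


if __name__ == "__main__":
    data = ["a b a", "b a"]
    print(len(list(naive_map(data))))      # number of pairs emitted naively
    print(sorted(combining_map(data)))     # pairs emitted after aggregation
```

The trade-off is memory: the in-mapper table grows with the number of distinct keys, so this only helps when that set is small enough to hold per task.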
via: http://blog.dynatrace.com/2012/01/25/about-the-performance-of-map-reduce-jobs/
Tuesday, 17 January 2012
Asking for Performance and Scalability Advice on StackOverflow
How many times have you got an answer that applies to your specific scenario when providing a short list of performance and scalability requirements? MySQL/InnoDB can do 750k qps, Cassandra is scaling linearly, MongoDB can do 8 mil ops/s. Are any of these the answer for your application?
Actually:
- How many times did you get all the requirements right at spec time?
- How many times did requirements remain the same during the development cycle?
- How many times did production reality confirm your bullet-list requirements?
Wednesday, 8 June 2011
Optimizing MongoDB: Lessons Learned at Localytics
These slides have generated quite a reaction on Twitter. I’ll let you decide for yourself the reasons:
While there have been lots of retweets, here’s just a glimpse of what type of reactions I’m referring to:
Monday, 17 January 2011
VoltDB: 3 Concepts that Makes it Fast
John Hugg lists the 3 concepts that make VoltDB fast:
- Exploit repeatable workloads: VoltDB exclusively uses a stored procedure interface.
- Partition data to horizontally scale: VoltDB divides data among a set of machines (or nodes) in a cluster to achieve parallelization of work and near linear scale-out.
- Build a SQL executor that’s specialized for the problem you’re trying to solve: If stored procedures take microseconds, why interleave their execution with a complex system of row and table locks and thread synchronization? It’s much faster and simpler just to execute work serially.
Let’s take a quick look at these.
Using stored procedures, instead of allowing free-form queries, would allow the system:
- to completely skip query parsing and the creation and optimization of execution plans at runtime
- to analyze the set of stored procedures at deploy time and possibly generate the appropriate indexes
The benefits of horizontally partitioned data are well understood: parallelization and easier, more cost-effective hardware usage.
Single threaded execution can also help by removing the need for locking and reducing data access contention.
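To make the three concepts concrete, here is a toy sketch. It is not VoltDB's API: `Cluster`, `Partition`, `register`, `call`, and `deposit` are hypothetical names, and the CRC32-based routing is just an assumption for illustration. Procedures are registered ahead of time, each key is routed to one partition, and a partition runs one procedure at a time so it needs no locks:

```python
import zlib
from typing import Any, Callable, Dict, List

Procedure = Callable[..., Any]


class Partition:
    """Owns one slice of the data and runs procedures one at a time."""

    def __init__(self) -> None:
        self.rows: Dict[str, Any] = {}

    def execute(self, procedure: Procedure, key: str, *args: Any) -> Any:
        # No locking: in this sketch only one procedure touches self.rows at a time.
        return procedure(self.rows, key, *args)


class Cluster:
    def __init__(self, partition_count: int) -> None:
        self.partitions: List[Partition] = [Partition() for _ in range(partition_count)]
        self.procedures: Dict[str, Procedure] = {}

    def register(self, name: str, procedure: Procedure) -> None:
        """Procedures are known up front, so nothing is parsed or planned per call."""
        self.procedures[name] = procedure

    def partition_for(self, key: str) -> Partition:
        # Deterministic hash routing (an assumption, not VoltDB's actual scheme).
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def call(self, name: str, key: str, *args: Any) -> Any:
        return self.partition_for(key).execute(self.procedures[name], key, *args)


def deposit(rows: Dict[str, Any], key: str, amount: int) -> int:
    """Example 'stored procedure': add to a balance and return the new value."""
    rows[key] = rows.get(key, 0) + amount
    return rows[key]


if __name__ == "__main__":
    cluster = Cluster(partition_count=4)
    cluster.register("deposit", deposit)
    cluster.call("deposit", "alice", 10)
    print(cluster.call("deposit", "alice", 5))
```

Here calls simply run serially in the calling thread; a real system would give each partition its own single-threaded executor so that partitions work in parallel while each one stays lock-free.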
While these 3 solutions make a lot of sense and can definitely make a system faster, there’s one major aspect of VoltDB that’s missing from the above list and which I think is critical to explaining its speed: VoltDB is an in-memory storage solution.
Here are a couple of examples of other NoSQL databases that benefit from being in memory (or as close as possible to it). MongoDB, while being a lot more liberal with the queries it accepts, can deliver very fast results by keeping as much data in memory as possible — remember what happened when it had to hit the disk more often? — and using appropriate indexes where needed. Redis and Memcached can deliver amazingly fast results because they keep all data in-memory. And Redis is single threaded while Memcached is not.
Friday, 7 January 2011
High Rate insertion with MySQL and Innodb
On 8 core Opteron Box we were able to achieve 275K inserts/sec at which time we started to see load to get IO bound because of log writes and flushing dirty buffers. I’m confident you can get to 400K+ inserts/sec on faster hardware and disks (say better RAID or Flash) which is a very cool number. Of course, mind you this is in memory insertion in the simple table and table with long rows and bunch of indexes will see lower numbers.
There are more caveats in the article. Not sure though how to compare this number with the 750k qps on the NoSQLish MySQL.
via: http://www.mysqlperformanceblog.com/2011/01/07/high-rate-insertion-with-mysql-and-innodb/
Thursday, 16 December 2010
Deferring Processing Updates to Increase HBase Write Performance
Alex Baranau:
The idea behind deferred updates processing is to postpone updating of the existing record and store incoming deltas as a new record. Thus, record update operations become simple write operations with corresponding performance. The deferred updates technique elaborated here fits well when the system handles a lot of updates of stored data and write performance is the main concern, while reading speed requirements are not that strict. The following cases (each of them separately or any combination of them) may indicate that one can benefit from using the technique (the list is not complete):
- updates are well spread over the whole and large dataset
- lower rate (among write operations) of “true updates” (i.e. a high percentage of writes are for completely new data, not really updates of existing data)
- good portion of data stored/updated may never be accessed
- system should be able to handle high write peaks without major performance degradation
Sounds like CQRS event stores.
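A minimal in-memory sketch of the idea, assuming numeric deltas that can be merged by summing. `DeferredCounterStore` and its methods are hypothetical names for illustration and are unrelated to the actual project's API or to HBase client calls:

```python
from collections import defaultdict
from typing import DefaultDict, List


class DeferredCounterStore:
    """Writes append deltas; merging is deferred to read or compaction time."""

    def __init__(self) -> None:
        self._deltas: DefaultDict[str, List[int]] = defaultdict(list)

    def write(self, key: str, delta: int) -> None:
        # A plain append: no read-modify-write on the existing record.
        self._deltas[key].append(delta)

    def read(self, key: str) -> int:
        # The reader pays the merge cost by folding all pending deltas.
        return sum(self._deltas.get(key, []))

    def compact(self) -> None:
        """Background step: collapse each key's deltas into a single record."""
        for key, deltas in self._deltas.items():
            if len(deltas) > 1:
                self._deltas[key] = [sum(deltas)]

    def record_count(self, key: str) -> int:
        return len(self._deltas.get(key, []))


if __name__ == "__main__":
    store = DeferredCounterStore()
    store.write("page:1", 1)
    store.write("page:1", 2)
    print(store.read("page:1"), store.record_count("page:1"))
    store.compact()
    print(store.read("page:1"), store.record_count("page:1"))
```

Writes stay cheap at the cost of slower reads until compaction runs, which matches the stated precondition that read speed requirements are not strict.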
Project available on ☞ GitHub.
Thursday, 3 June 2010
Project Voldemort Performance Tool
Project Voldemort gets a performance tool from Roshan Sumbaly:
- Run using bin/voldemort-performance-tool.sh
- Has a warmup phase to insert records (--record-count)
- Various record selection distributions
- Can fix client throughput to measure latency under certain load
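As a generic illustration of the last three points (not the Voldemort tool itself), the sketch below runs a warmup phase, picks keys from a uniform distribution, and paces requests at a fixed rate while recording per-operation latency. `run_benchmark`, `operation`, and all parameter names are assumptions made up for this example:

```python
import random
import time
from typing import Callable, List


def run_benchmark(
    operation: Callable[[str, int], None],
    record_count: int,
    ops_count: int,
    target_ops_per_sec: float,
    seed: int = 0,
) -> List[float]:
    """Return per-operation latencies (seconds) measured at a fixed request rate."""
    rng = random.Random(seed)

    # Warmup phase: insert the records that the measured phase will read.
    for key in range(record_count):
        operation("write", key)

    interval = 1.0 / target_ops_per_sec
    latencies: List[float] = []
    next_start = time.perf_counter()
    for _ in range(ops_count):
        # Pace the client: wait until the next scheduled start time.
        now = time.perf_counter()
        if now < next_start:
            time.sleep(next_start - now)
        key = rng.randrange(record_count)  # uniform record selection
        started = time.perf_counter()
        operation("read", key)
        latencies.append(time.perf_counter() - started)
        next_start += interval
    return latencies


if __name__ == "__main__":
    store = {}

    def op(kind: str, key: int) -> None:
        if kind == "write":
            store[key] = b"x" * 100
        else:
            store[key]

    results = run_benchmark(op, record_count=100, ops_count=200, target_ops_per_sec=1000)
    print(len(results), max(results))
```

Fixing the request rate, rather than sending requests as fast as possible, lets latency be compared across runs at the same offered load.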
via: http://github.com/voldemort/voldemort/commit/8462031d2d8b676b27f533f8f7de5631c8eb70dd
Friday, 26 March 2010
Release: Production Ready MongoDB 1.4 Released
Judging by the number of posts I’ve seen around I’d guess you’ve already heard about the MongoDB 1.4 release[1]. Anyways, I definitely had to include it here as myNoSQL covers all major NoSQL projects and follows closely all things related to the NoSQL ecosystem.
While some MongoDB users seemed quite excited about the addition of ☞ geospatial indexing, others about some ☞ query language improvements, the things that caught my attention were:
- background indexing and indexing improvements
- concurrency improvements
- the lack of autosharding (still alpha, still pushing, still…)
- the lack of improvements or alternatives for the MongoDB durability tradeoff
Speaking of performance, the 10gen people[2] have run some benchmarks comparing MongoDB 1.2 with MongoDB 1.4. With a couple of exceptions, performance hasn’t improved radically, so I’d speculate that there is still a lot of locking involved. The benchmark source code was made available[3] so you can dig deeper into it.
All in all, good and exciting news for the NoSQL world!
References
- [1] ☞ MongoDB 1.4 Release Notes
- [2] ☞ MongoDB 1.4 Performance
- [3] ☞ Benchmark code
Thursday, 24 December 2009
CouchDB vs MongoDB: An attempt for a More Informed Comparison
After posting about Scott Motte’s comparison of MongoDB and CouchDB, I thought there should be some more informative sources out there, so I’ve started to dig.
The first I came upon (thanks to Debasish Ghosh @debasishg) is an article about ☞ Raindrop requirements and the issues faced while attacking them with CouchDB and the pros and cons of possibly replacing CouchDB with MongoDB:
[Pros]
- Uses update-in-place, so the file system impact/need for compaction is less, and storing our schemas in one document is likely to work better.
- Queries are done at runtime. Some indexes are still helpful to set up ahead of time though.
- Has a binary format for passing data around. One of the issues we have seen is the JSON encode/decode times as data passes around through couch and to our API layer. This may be improving though.
- Uses language-specific drivers. While the simplicity of REST with CouchDB sounds nice, due to our data model, the megaview and now needing a server API layer means that querying the raw couch with REST calls is actually not that useful. The harder issue is trying to figure out the right queries to do and how to do the “joins” effectively in our API app code.
[Cons]
- No easy master-master replication. However, for me personally, this is not so important. […] So while we need backups, we probably are fine with master-slave. To support the sometimes-offline case, I think it is more likely that using HTML5 local storage is the path there. But again, that is just my opinion.
- ad-hoc query cost may still be too high. It is nice to be able to pass back a JavaScript function to do the query work. However, it is not clear how expensive that really is. On the other hand, at least it is a formalized query language — right now we are on the path to inventing our own with the server API with a “query language” made up of other API calls.
Anyway, while some of the points above are generic, you should consider them from the perspective of the Raindrop requirements, which you can read more about here.
Another article comparing MongoDB and CouchDB is hosted by ☞ MongoDB docs. I find it well balanced and you should read it all as it covers a lot of different aspects: horizontal scalability, query expressions, atomicity, durability, mapreduce support, javascript, performance, etc.
I’d also mention this ☞ benchmark comparing the performance of MongoDB, CouchDB, and Tokyo Cabinet/Tyrant, which uses MySQL results as a reference (note: the author of the benchmark categorizes Tokyo Cabinet as a document database, while it is a key-value store).
In case you have other resources that you think would be worth including do not hesitate to send them over.
Update: Just found a nice comparison matrix [1].
As a teaser, very soon I will introduce you to a new solution available in this space, so make sure to check myNoSQL regularly.
Update: The main article about this new document store has been published: Terrastore: A Consistent, Partitioned and Elastic Document Database. I would strongly encourage you to check it out, as Terrastore is looking quite promising.
