NoSQL comparison: All content tagged as NoSQL comparison in NoSQL databases and polyglot persistence
Thursday, 10 March 2011
From CouchDB to Riak at Linkfluence
We were already aware of Riak before we started using CouchDB, but we weren’t sure about trusting a new product at this point, so we decided, after some benchmark, to go for CouchDB.
After the first couple of months, it was obvious that this was a bad choice.
Our main problems with CouchDB is scalability, versioning and stability.
I am wondering how using BigCouch would have addressed Linkfluence’s requirements:
- easy to replicate (CouchDB already has replication)
- REST interface (CouchDB has that)
- master/slave (CouchDB replication is peer-to-peer)
- sharding (BigCouch was created to make CouchDB horizontally scalable)
and the stability/maintenance issues.
The article also gives an overview of Linkfluence’s polyglot persistence architecture:
- PostgreSQL: some indexes on documents’ ID
- MongoDB: store tweets relationships and some indexes
- Riak (replacing CouchDB) for content and metadata
- Redis for caching
- Solr for search indexes
- ElasticSearch for secondary indexes
You might also enjoy some of the comments on the Hacker News thread.
Original title and link: From CouchDB to Riak at Linkfluence (NoSQL databases © myNoSQL)
via: http://labs.linkfluence.net/nosql/2011/03/07/moving_from_couchdb_to_riak.html
Friday, 4 March 2011
Redis: Le système de cache parfait
I love how this sounds in French:
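Après 3 ans d’une histoire d’amour fidèle avec Memcached; le serveur de cache notamment utilisé par Facebook, Youtube ou Twitter; je suis au bord de la rupture après avoir rencontré redis.
(In English: “After three years of a faithful love affair with Memcached, the cache server used notably by Facebook, YouTube and Twitter, I am on the verge of breaking up after meeting Redis.”)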
The author, Julien Crouzet, mentions three key features of Redis:
- non-volatile data
- performance
- support for data types
On these points:
- MemcacheDB and Membase are just two solutions that have solved the memcached data volatility.
- Benchmarking Redis and memcached performance has resulted in a heated conversation.
- But Redis’ support for data types (lists, sets, sorted sets, and hashes) is not up for debate.
Original title and link: redis : Le système de cache parfait (NoSQL databases © myNoSQL)
via: http://blog.juliencrouzet.fr/484/redis-le-systeme-de-cache-parfait/
Wednesday, 2 March 2011
Adku's Choice: Cassandra or HBase
The 6 reasons (originally 8) Adku prefers Cassandra to HBase:
- Reliability
- Performance
- Consistency
- Hot spot problem
- Simpler, Hackable
- Community support
Struck out: Single point of failure, MapReduce.
Before jumping to any conclusions make sure you read the disclaimer:
While these decisions apply to Adku, they might not apply to your situation. Always do your own investigation and experimentation before choosing any large part of your system.
Update: JD Cryans commented on the points listed above (thanks JD):
This comparison reminds me of the pain we went through in the late 2009 when lots of similar comparisons came out from all sides — the “NoSQL war”. Unfortunately as we all found out, no one wins.
But let’s look at the points mentioned in this post.
Reliability: As far as I can tell that’s not a reliability test. The first thing that raises questions is the large number of crashes of the region servers. Considering the data set used (1 million rows of the full “Alice in Wonderland” text) is small compared to the ones other HBase users (StumbleUpon, Mozilla) are handling, that would point out to a configuration problem that wasn’t taken care of.
One could say it’s because HBase is hard to configure or that the default configurations aren’t good, and to some extent I agree, but you don’t quantify reliability based on these.
Hot Spot Problem: This point is an interesting one, and more likely falls into the disclaimer.
Distribution based on timestamp row keys will be better with Cassandra. But usually when using timestamps you also want range scans which is impossible with hashing. For example OpenTSDB provides a very efficient way to store time series by using a clever row key design. A design that you’ll probably also use if you need scans in Cassandra.
Not to mention that using MapReduce will require sorted row keys anyways.
Community Support: Comparing communities only based on the number of IRC users is too much of a simplification. Someone looking to use an open source project should spend some time getting to know and interact with the users before stating that “one community is more helpful” than the other — a message that could also be perceived as disrespectful.
There are also a couple of points that are mentioned in the post even if HBase is the “winner” (MapReduce) or the feature is not a hard requirement (consistency).
I left performance last as the post mentions similar write performance results. But there is too little information about the benchmark to be able to comment on it. At first glance those results look weird considering they weren’t using a Hadoop version that supports append, which as shown by the original YCSB paper would make quite a difference.
After the Adku blog came out, Edward Capriolo wrote this response (rant?) to all who try to do the same as them and I think it’s worth the read: http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/myths_rumors_fud_hate_nosql
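To make the hot spot point from the comment above more concrete, here is a minimal sketch in Python. It is not HBase or Cassandra code: the three-node cluster, the split points, and the helpers `range_node`, `hash_node` and `series_key` are all assumptions made up for illustration. It shows timestamp row keys piling onto one node under sorted (range) partitioning, hashing spreading the writes but scattering a range scan, and a metric-prefixed key, loosely in the spirit of the OpenTSDB design mentioned above, that keeps each series sorted by time.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NODES = 3
# Hypothetical split points for sorted (range) partitioning.
SPLITS = ["1300000000", "1400000000"]

def range_node(key: str) -> int:
    """Sorted partitioning: a key goes to the range that contains it."""
    return bisect_right(SPLITS, key)

def hash_node(key: str) -> int:
    """Hash partitioning: a key goes to a node derived from its hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NODES

def series_key(metric: str, ts: int) -> str:
    """Metric-prefixed row key: one series stays contiguous and time-ordered."""
    return f"{metric}:{ts:010d}"

# 1,000 consecutive timestamps used directly as row keys.
keys = [str(1299000000 + i) for i in range(1000)]

# Write distribution: how many keys each node receives.
range_load = Counter(range_node(k) for k in keys)
hash_load = Counter(hash_node(k) for k in keys)

# Nodes that a scan over 100 consecutive keys has to touch.
scan = keys[100:200]
range_scan_nodes = {range_node(k) for k in scan}
hash_scan_nodes = {hash_node(k) for k in scan}

print("range load:", dict(range_load))
print("hash load:", dict(hash_load))
print("nodes touched by scan (range / hash):",
      len(range_scan_nodes), "/", len(hash_scan_nodes))
```

With range partitioning every write lands on the same node, but the scan touches only that node; with hashing the writes spread out and the scan has to visit several nodes.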
Original title and link: Adku’s Choice: Cassandra or HBase (NoSQL databases © myNoSQL)
Monday, 28 February 2011
Benchmarking MongoDB
The code is purposely a naive implementation, to test how fast each back end is without resorting to optimizations, hacks or tricks. There are probably ways of making it much faster. And even though the production code will be very different to this early experiment, it is not an evil, synthetic micro-benchmark: on the contrary, it is a real application!
You could say that being a benchmark for a specific scenario the results are relevant in that context. But I’d also include the following two checks:
- insert some rogue data and try to recover
- run a kill -9 midway through the import
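The second check can be scripted. Below is a minimal sketch, assuming a POSIX system; `kill_midway` is a hypothetical helper, and the child process that just sleeps is only a stand-in for the real import job, which is not shown in the original post.

```python
import signal
import subprocess
import sys
import time

def kill_midway(cmd, delay_seconds):
    """Start cmd, wait delay_seconds, then send SIGKILL (what `kill -9` sends)."""
    proc = subprocess.Popen(cmd)
    time.sleep(delay_seconds)
    proc.send_signal(signal.SIGKILL)
    # On POSIX, a negative return code means the child died from that signal.
    return proc.wait()

# Stand-in for the real import (an assumption): a child that sleeps for 30 seconds.
fake_import = [sys.executable, "-c", "import time; time.sleep(30)"]
exit_code = kill_midway(fake_import, delay_seconds=0.5)
print("import process exit code:", exit_code)
```

The interesting part comes afterwards: restart the database, count what was actually persisted, and compare it with what the import had acknowledged before it was killed.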
Original title and link: Benchmarking MongoDB (NoSQL databases © myNoSQL)
via: http://tobami.wordpress.com/2011/02/28/benchmarking-mongodb/
Monday, 21 February 2011
Project Voldemort and Terrastore: Key-Value vs Document Stores
It is an apples to oranges comparison, but it underlines, from a beginner perspective, the major differences between a pure key-value store (Project Voldemort) and a document database (Terrastore):
Being a simpler KV store than Terrastore, to my understanding Project Voldemort offers no ability to leverage the server to evaluate the Values. In order to, for example, produce a list of documents whose “publish date” is in the past, it is necessary to either fetch all documents and evaluate the publish date each time this operation is needed — or — manage a lookup list of document IDs that were “published” when the lookup list was created.
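The two workarounds described in the quote can be sketched in a few lines of Python. This is not Project Voldemort client code: a plain dict stands in for the key-value store, and the key names (`doc:1`, `index:published`) and function names are assumptions chosen for illustration.

```python
import json
from datetime import date

# A pure key-value store only maps keys to opaque values (here: JSON strings).
kv = {
    "doc:1": json.dumps({"title": "A", "publish_date": "2011-01-15"}),
    "doc:2": json.dumps({"title": "B", "publish_date": "2011-06-01"}),
    "doc:3": json.dumps({"title": "C", "publish_date": "2010-12-31"}),
}

def published_by_full_scan(store, today):
    """Option 1: fetch every value and evaluate the publish date on the client."""
    result = []
    for key, raw in store.items():
        if not key.startswith("doc:"):
            continue
        doc = json.loads(raw)
        if date.fromisoformat(doc["publish_date"]) < today:
            result.append(key)
    return sorted(result)

def rebuild_lookup_list(store, today):
    """Option 2: keep the list of published IDs under a key of its own."""
    store["index:published"] = json.dumps(published_by_full_scan(store, today))

def published_by_lookup(store):
    """Read the precomputed list; it is only as fresh as its last rebuild."""
    return json.loads(store["index:published"])

today = date(2011, 2, 21)
rebuild_lookup_list(kv, today)
print(published_by_lookup(kv))
```

The lookup list turns the query into a single read, but it silently goes stale: a document whose publish date passes after the last rebuild will not show up until the list is rebuilt. A document database that can evaluate values on the server avoids both the full fetch and the bookkeeping.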
In the end, the author also emphasizes how important the first impression is: clean documentation, simple installation, etc. Put differently, end users judge a project by how fast they can start using it.
Original title and link: Project Voldemort and Terrastore: Key-Value vs Document Stores (NoSQL databases © myNoSQL)
via: http://groups.google.com/group/terrastore-discussions/msg/8e16342222deadbf
Friday, 31 December 2010
NoSQL Comparison: Cassandra, CouchDB, HBase, MongoDB, Redis, Riak
Just before the end of year, a brief comparison — bullet style — of Cassandra, CouchDB, HBase, MongoDB, Redis, and Riak:
But the differences between “NoSQL” databases are much bigger than it ever was between one SQL database and another. This means that it is a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning.
Original title and link: NoSQL Comparison: Cassandra, CouchDB, HBase, MongoDB, Redis, Riak (NoSQL databases © myNoSQL)
via: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Thursday, 9 December 2010
Planning for Data Migration
From the Amazon ☞ Migrating your Existing Applications to the AWS Cloud paper (PDF):
- What are the different storage options available in the cloud today?
- What are the different RDBMS (commercial and open source) options available in the cloud today?
- What is my data segmentation strategy? What trade-offs do I have to make?
- How much effort (in terms new development, one-off scripts) is required to migrate all my data to the cloud?
When choosing the appropriate storage option, one size does not fit all (nb: my emphasis). There are several dimensions that you might have to consider so that your application can scale to your needs appropriately with minimal effort. You have to make the right tradeoffs among various dimensions - cost, durability, query-ability, availability, latency, performance (response time), relational (SQL joins), size of object stored (large, small), accessibility, read heavy vs. write heavy, update frequency, cache-ability, consistency (strict, eventual) and transience (short-lived).
Just replace the words “cloud” and “AWS” with NoSQL database and you get a good base for your migration plan.
Original title and link: Planning for Data Migration (NoSQL databases © myNoSQL)
Friday, 3 December 2010
Why NoSQL … Why Not
Interesting article from Xeround’s Avi Kapuya: ☞ NoSQL: The Sequel. A couple of comments, though:
Why NoSQL
In other words, in SQL, the data model does not enforce a specific way to work with the data — it is built with an emphasis on data integrity, simplicity, data normalization and abstraction, which are all extremely important for large complex applications.
I’d say that data normalization is not a goal per se, but a solution to a problem (data duplication, frequent updates to common entities). But what if this solution is introducing another bigger problem (read JOINs)?
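A small sketch of that trade-off, using Python’s built-in `sqlite3` module for the normalized side and a plain dict for the denormalized side. The `authors`/`posts` schema and all names are assumptions invented for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice');
    INSERT INTO posts VALUES (10, 1, 'first'), (11, 1, 'second');
""")

# Normalized: the author name lives in one place, so reads need a JOIN.
joined = db.execute("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON authors.id = posts.author_id
    ORDER BY posts.id
""").fetchall()

# Denormalized: each document carries its own copy of the author name.
documents = {
    10: {"title": "first", "author_name": "alice"},
    11: {"title": "second", "author_name": "alice"},
}
single_read = documents[10]  # one lookup, no JOIN

# The price: renaming the author is a single UPDATE in the normalized model...
db.execute("UPDATE authors SET name = 'alicia' WHERE id = 1")

# ...but one write per document that embeds the old name.
rewritten = 0
for doc in documents.values():
    if doc["author_name"] == "alice":
        doc["author_name"] = "alicia"
        rewritten += 1

print(joined, single_read, rewritten)
```

Which side wins depends on the workload: frequent updates to shared entities favor normalization, while read-heavy access to whole documents favors the duplicated form.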
The NoSQL approach presents huge advantages over SQL databases because it allows one to scale an application to new levels
Plus it may give you more flexibility in your data model, plus it may be a better (as in operational, complexity, performance, etc.) storage for different formats of data.
Why not NoSQL
At the system level, data models are key. Not having a skilled authority to design a single, well-defined data model, regardless of the technology used, has its drawbacks.
Actually I think the reality might be a bit different. Because NoSQL imposes a “narrow predefined access pattern” it will require one to spend more time understanding and organizing data. Secondly, the final model will reflect and be based on the reality of the application, and not only on pure theory (as is the case with most initial relational model designs).
At the architecture level, two major issues are interfaces and interoperability. Interfaces for the NoSQL data services are yet to be standardized.
The interface limitation is a temporary issue in terms of getting more/better/quicker tooling support and probably a longer term issue for developers needing to learn different models. But as we’ve agreed, NoSQL has a small, predefined access mode and so we are not talking about learning completely new languages.
Personally, I think the real issue is the steep learning curve of understanding each NoSQL database’s semantics and operational behavior, rather than the lack of a common API.
Interoperability is an important point, especially when data needs to be accessed by multiple services.
I’m not seeing the problem here. As far as I know, each relational database comes with its own per-language drivers. On the NoSQL side, there are already quite a few products using standard protocols.
Moving to the operational realm, here, from my experience, lies the toughest resistance, and rightfully so… The operational environment requires a set of tools that is not only scalable but also manageable and stable, be it on the cloud or on a fixed set of servers. […] Operation needs to be systematic and self contained.
Now, this is completely the other way around. If you read any large scale application story, you’ll notice the pattern: operational costs were a significant factor in deciding to use NoSQL. Just check the stories of Twitter, Adobe, Adobe products, Facebook. Complexity is a fundamental dimension of scalability and right now the balance tilts towards NoSQL databases.
It is my opinion that a SQL database built on NoSQL foundations can provide the highest value to customers who wish to be both agile and efficient while they grow.
Unfortunately I don’t think that’s actually possible, or at least not for all solutions. But if we just want some common access language, we will probably get it.
If, on the other hand, what we want is more tunable and scenario specific engines, we will probably get these too. (nb: as far as I’ve heard the PostgreSQL community is learning a lot from the various NoSQL databases and trying to bring in as many of the good ideas they can).
Conclusion
My conclusion is simple. As with programming languages where we are not stuck with COBOL, polyglot persistence is here to stay and it’ll only get better.
Original title and link: Why NoSQL … Why Not (NoSQL databases © myNoSQL)
Tuesday, 23 November 2010
Another NoSQL Comparison: Evaluation Guide
The requirements were clear:
- Fast data insertion.
- Extremely fast random reads on large datasets.
- Consistent read/write speed across the whole data set.
- Efficient data storage.
- Scale well.
- Easy to maintain.
- Have a network interface.
- Stable, of course.
The list of NoSQL databases to be compared (Tokyo Cabinet, BerkeleyDB, MemcacheDB, Project Voldemort, Redis, and MongoDB): not so clear.
The evaluation methodology and the results: definitely not clear at all.
And the conclusion is quite wrong:
Although MongoDB is the solution for most NoSQL use cases, it’s not the only solution for all NoSQL needs.
There were a couple of people asking for more details about my comments on this NoSQL comparison, so here they are:
- the initial list of NoSQL databases to be evaluated looks at first glance a bit random. It includes some not so used solutions (MemcacheDB), some that are not , while leaving aside others that at least at a high level would correspond to the characteristics of others in the list (Riak, Membase)
- another reason for considering the initial choice a bit random is that while scaling is listed as one of the requirements, the only truly scalable one in the list would be Project Voldemort. The recently added auto-sharding and replica sets would make MongoDB a candidate too, but a search on the MongoDB group would show that the solution is still young
- even if the set of requirements is clear, there’s no indication of what kind of evaluation was performed or how. Without knowing what was measured and how, it is impossible to consider the results relevant.
- as Janl was writing about benchmarks, most of the time you are doing it wrong. Creating good, trustworthy, useful, relevant benchmarks is very difficult.
- the matrix lists characteristics that are difficult to measure. And there are no comments on how the thumbs up were given. Examples: what is manageability and how was that measured? Same questions for stability and feature set.
- because most of it sounds speculative, here are a couple of speculations:
  - judging by the thumbs up MongoDB received for insertion/random reads on large data sets, I can assume that the data didn’t exceed the available memory. But on the other hand, Redis was dismissed and received fewer votes due to its “more” in-memory character
  - Tokyo Cabinet and Redis project activity and community were ranked the same. When was the last release of Tokyo Cabinet?
- I’m leaving it up to you to decide why the conclusion (“Although MongoDB is the solution for most NoSQL use cases”) is wrong.
Original title and link: Another NoSQL Comparison: Evaluation Guide (NoSQL databases © myNoSQL)
via: http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart
Monday, 22 November 2010
RavenDB and CouchDB Compared
A fair emphasis on what differentiates RavenDB from CouchDB (nb: coming from RavenDB’s creator). Just to mention the most interesting ones:
- transactions: support for single document, document batch, multi request, multi node transactions […]
- set based operations: update active = false where last_login < '2010-10-01'
- includes and live projections (local data only)
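To illustrate what a set based operation buys you, here is a minimal Python sketch. It does not use RavenDB’s or CouchDB’s actual API: `TinyStore` is a hypothetical in-memory store that merely counts client requests, so that the document-at-a-time approach can be compared with a single set based update equivalent to the statement in the list above.

```python
class TinyStore:
    """Hypothetical in-memory document store that counts client requests."""

    def __init__(self, docs):
        self.docs = docs
        self.requests = 0

    def ids(self):
        self.requests += 1
        return list(self.docs)

    def get(self, doc_id):
        self.requests += 1
        return dict(self.docs[doc_id])

    def put(self, doc_id, doc):
        self.requests += 1
        self.docs[doc_id] = doc

    def update_where(self, predicate, changes):
        """Set based update: one request, applied to every matching document."""
        self.requests += 1
        matched = 0
        for doc in self.docs.values():
            if predicate(doc):
                doc.update(changes)
                matched += 1
        return matched

def make_store():
    return TinyStore({
        "users/1": {"last_login": "2010-09-12", "active": True},
        "users/2": {"last_login": "2010-11-03", "active": True},
        "users/3": {"last_login": "2010-08-30", "active": True},
    })

def is_stale(doc):
    # ISO-formatted dates compare correctly as plain strings.
    return doc["last_login"] < "2010-10-01"

# Document at a time: list the IDs, then one read per document and one write per match.
one_by_one = make_store()
for doc_id in one_by_one.ids():
    doc = one_by_one.get(doc_id)
    if is_stale(doc):
        doc["active"] = False
        one_by_one.put(doc_id, doc)

# Set based: the predicate and the change travel to the store in one request.
set_based = make_store()
matched = set_based.update_where(is_stale, {"active": False})

print(one_by_one.requests, set_based.requests, matched)
```

Both stores end up in the same state; the difference is the number of round trips, which grows with the number of documents in the first approach and stays constant in the second.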
Original title and link: RavenDB and CouchDB Compared (NoSQL databases © myNoSQL)
via: http://ayende.com/Blog/archive/2010/10/17/ravendb-in-comparison-to-couchdb.aspx
Monday, 15 November 2010
Railo Cache Benchmark - CouchDB, MongoDB, RAM
They’re all fast, but what amazes me is how little difference there is between RAM vs MongoDB performance!
Not sure why that’d be amazing, considering MongoDB will keep all that data in memory. In fact, I’d say that the interesting part is CouchDB’s performance, considering it goes to the disk for each read.
Original title and link: Railo Cache Benchmark - CouchDB, MongoDB, RAM (NoSQL databases © myNoSQL)
via: http://zefer.posterous.com/railo-cache-benchmark-couchdb-mongodb-ram
