
mongodb: All content about mongodb in NoSQL databases and polyglot persistence

Comparing NoSQL backup solutions

In a post introducing HyperDex backups, Robert Escriva compares the different backup solutions available in Cassandra, MongoDB, and Riak:

Cassandra: Cassandra’s backups are inconsistent, as they are taken at each server independently without coordination. Further, “Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the node being restored.”

MongoDB: MongoDB provides two backup strategies. The first strategy copies the data on backup, and re-inserts it on restore. This approach introduces high overhead because it copies the entire data set without opportunity for incremental backup.

The second approach is to use filesystem-provided snapshots to quickly backup the data of a mongod instance. This approach requires operating system support and will produce larger backup sizes.

Riak: Riak backups are inconsistent, as they are taken at each server independently without coordination, and require care when migrating between IP addresses. Further, Riak requires that each server be shut down before backing up LevelDB-powered backends.
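
The two MongoDB strategies above map onto familiar tooling. Here is a minimal sketch of both, assuming a Linux host with the data files on an LVM volume; the hosts, paths, and volume names are placeholders, not from the post:

```python
# A sketch of the two MongoDB backup strategies described above.
# Hosts, paths, and the LVM volume name are placeholders.
import subprocess

# Strategy 1: logical copy. mongodump reads and copies the entire data
# set; mongorestore re-inserts it on restore. High overhead, and no
# incremental option.
subprocess.check_call([
    "mongodump", "--host", "localhost:27017", "--out", "/backups/dump",
])

# Strategy 2: point-in-time filesystem snapshot of the dbpath volume.
# Needs OS/volume-manager support (and journaling enabled, otherwise
# the database must be flushed and locked first); it captures whole
# volumes, hence the larger backup sizes.
subprocess.check_call([
    "lvcreate", "--snapshot", "--size", "10G",
    "--name", "mongo-backup", "/dev/vg0/mongodb",
])
```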

Here’s how HyperDex’s new backup is described:

The HyperDex backup/restore process is strongly consistent, doesn’t require shutting down servers, and enables incremental backup support. Further, the process is quite efficient; it completes quickly, and does not consume CPU or I/O for extended periods of time.

The caveat is that HyperDex puts the cluster in read-only mode while backing up. That’s a loss of availability. Considering that both Cassandra and Riak promise high availability, their choice was clear.

Update: This comment from Emin Gün Sirer makes me wonder if I missed something:

HyperDex quiesces the network, takes a snapshot, resumes. Whole operation takes sub-second latency.

The key point is that the system stays online and available while the data copying takes place.

Original title and link: Comparing NoSQL backup solutions (NoSQL database©myNoSQL)

via: http://hackingdistributed.com/2014/01/14/back-that-nosql-up/


MySQL is a great Open Source project. How about open source NoSQL databases?

In a post titled “Some myths on Open Source, the way I see it”, Anders Karlsson writes about MySQL:

As far as code, adoption and reaching out to create an SQL-based RDBMS that anyone can afford, MySQL / MariaDB has been immensely successful. But as an Open Source project, something being developed together with the community where everyone works on their end with their skills to create a great combined piece of work, MySQL has failed. This is sad, but on the other hand I’m not so sure that it would have as much influence and as wide adoption if the project would have been a “clean” Open Source project.

The article offers a very black-and-white perspective on open source versus commercial code. But that’s not why I’m linking to it.

The above paragraph made me think about how many of the most popular open source NoSQL databases would die without the companies (or people) that created them.

Here’s my list: MongoDB, Riak, Neo4j, Redis, Couchbase, etc. And I could continue for quite a while considering how many there are out there: RavenDB, RethinkDB, Voldemort, Tokyo, Titan.

Actually, if you reverse the question, the list gets extremely short: Cassandra, CouchDB (still struggling, though), HBase. All of these were at some point driven by the community. Probably the only special case is LevelDB.

✚ As a follow-up to Anders Karlsson’s post, Robert Hodges posted Why I Love Open Source on The Scale-Out Blog.

Original title and link: MySQL is a great Open Source project. How about open source NoSQL databases? (NoSQL database©myNoSQL)

via: http://karlssonondatabases.blogspot.com/2014/01/some-myths-on-open-source-way-i-see-it.html


Look how fast it is… actually it’s not, but who cares

This is how it goes:

  1. Someone declares a solution fast. It’s usually a micro-benchmark presented with almost no context.
  2. Then someone else shows better numbers from a competing product. It’s a similar micro-benchmark performed on completely different hardware. An apples-to-oranges comparison.
  3. The first person revisits the topic and says that actually performance doesn’t matter.

What’s wrong with this?

  1. Most readers will only see the first post. The attraction of numbers is irresistible.
  2. The very few people who see the second kind of post will already have picked sides and will dismiss the other results.

The bottom line is that we end up with two posts of irrelevant numbers that each group can use to claim theirs is bigger than the other’s. And very few people actually learn what’s so (completely) wrong with them.

Original title and link: Look how fast it is… actually it’s not, but who cares (NoSQL database©myNoSQL)


From MySQL to MongoDB and back - The world’s biggest biometrics database

The main subject of the article “Inside India’s Aadhar, The World’s Biggest Biometrics Database”, published on TechCrunch, is possible information leaks, privacy issues, etc. But I found some interesting bits about the databases used towards its end:

Sudhir Narayana, assistant director general at Aadhar’s technology center, told me that MongoDB was among several database products, apart from MySQL, Hadoop and HBase, originally procured for running the database search. Unlike MySQL, which could only store demographic data, MongoDB was able to store pictures.

That’s the warning sign right there. You can already see what follows:

However, Aadhar has been slowly shifting most of its database related work to MySQL, after realizing that MongoDB was not being able to cope with massive chunks of data, millions of packets.

✚ You can see more details about Aadhaar’s complex database architecture in Big Data at Aadhaar With Hadoop, HBase, MongoDB, MySQL, and Solr

Original title and link: From MySQL to MongoDB and back - The world’s biggest biometrics database (NoSQL database©myNoSQL)

via: http://techcrunch.com/2013/12/06/inside-indias-aadhar-the-worlds-biggest-biometrics-database/


TokuMX transactions for MongoDB

In two posts, the Tokutek guys explain how transactions work in TokuMX, the replacement engine they propose to MongoDB users (remember that Vadim Tkachenko of the “MySQL Performance Blog” called TokuMX the InnoDB for MongoDB?):

  1. the what: Introducing TokuMX transactions for MongoDB applications

    • For each statement that tries to modify a TokuMX collection, either the entire statement is applied, or none of the statement is applied. A statement is never partially applied.
    • Commands `beginTransaction`, `commitTransaction`, and `rollbackTransaction` have been added to allow users to perform multi-statement transactions.
    • TokuMX queries use multi-version concurrency control (MVCC). That is, queries operate on a snapshot of the system that does not change for the duration of the query. Concurrent inserts, updates, and deletes do not affect query results (note this does not include file operations like removing a collection).
  2. the benefits: Four benefits of TokuMX transactions for MongoDB applications:

    1. cursors represent a true snapshot of the system
    2. simpler to batch inserts together for performance
    3. simpler for applications to update multiple documents with a single statement
    4. no need to combine documents together for the purpose of atomicity
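
For the curious, here is a minimal sketch of what a multi-statement transaction built on the commands above might look like from PyMongo. The beginTransaction/commitTransaction/rollbackTransaction commands are TokuMX additions (they don’t exist in stock MongoDB), and the collection, fields, and amounts are illustrative:

```python
# A hypothetical money transfer using TokuMX's multi-statement
# transaction commands. TokuMX targets MongoDB 2.x, hence the old
# update() API. Note: TokuMX ties a transaction to a single
# connection, so a real client must make sure all the commands below
# go over the same socket (PyMongo's pooling can get in the way).
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").bank

db.command("beginTransaction")
try:
    # Either both updates become visible, or neither does.
    db.accounts.update({"_id": "alice"}, {"$inc": {"balance": -100}})
    db.accounts.update({"_id": "bob"}, {"$inc": {"balance": 100}})
    db.command("commitTransaction")
except Exception:
    db.command("rollbackTransaction")
    raise
```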

✚ I’d find TokuMX’s transactions even more interesting if they worked by default at the shard level instead of the cluster level. Users would then have to configure cluster-wide transactions manually, thus remaining in control of performance and availability.

✚ I still have my doubts about TokuMX’s positioning, but that’s a business & marketing story.

Original title and link: TokuMX transactions for MongoDB (NoSQL database©myNoSQL)


MongoDB performance bottlenecks and optimization strategies for MongoDB

Mikita Manko goes through a list of bottlenecks in MongoDB and suggests different ways to alleviate the pain (when possible):

I will try to describe here all potential performance bottlenecks and possible solutions and tips for performance optimization, but first of all – you should ensure that MongoDB was the right choice for your project.
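
The first check in most bottleneck hunts of this kind is whether hot queries actually use an index. A minimal sketch with a recent MongoDB and PyMongo; the collection and field names are illustrative, not taken from Manko’s post:

```python
# Use explain() to detect full collection scans on a hot query.
from pymongo import MongoClient

orders = MongoClient().shop.orders

plan = orders.find({"customer_id": 42}).explain()
# "COLLSCAN" here means the query walks the whole collection.
print(plan["queryPlanner"]["winningPlan"]["stage"])

# Add the missing index; the winning plan should now be an index scan
# ("IXSCAN", usually under a "FETCH" stage).
orders.create_index("customer_id")
print(orders.find({"customer_id": 42}).explain()
      ["queryPlanner"]["winningPlan"]["stage"])
```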

Original title and link: MongoDB performance bottlenecks and optimization strategies for MongoDB (NoSQL database©myNoSQL)

via: http://www.mikitamanko.com/blog/2013/12/06/mongodb-performance-bottlenecks-optimization-strategies-for-mongodb/


Defending Mongodb

Siddharth Ravichandran:

My post however, aims at highlighting the areas where Mongodb works and how it performed brilliantly for us. As someone leading the engineering efforts for a shipping and logistics company I wasn’t too happy initially to see Mongodb being used as the primary datastore but after 2 years I’m more than sure that this was definitely the datastore for us. I’ve outlined areas that confused me when I first encountered them only to learn that they were actually invaluable features that were available to me.

I fail to understand how someone can defend the poor use of locks or the fire-and-forget behavior that is no longer the default. Maybe that’s why the original title is “Mongodb — not evil, just misunderstood”.

Original title and link: Defending Mongodb (NoSQL database©myNoSQL)

via: http://siddharth-ravichandran.com/2013/12/04/mongodb-not-evil-just-misunderstood/


Tuning MongoDB Performance with MMS

MongoLab:

At MongoLab we manage thousands of MongoDB clusters and regularly help customers optimize system performance. Some of the best tools available for gaining insight into our MongoDB deployments are the monitoring features of MongoDB Management Service (MMS). […] Here we focus primarily on the metrics provided by MMS but augment our analysis with specific log file metrics as well.

There’s definitely something you can learn from guys whose business is running MongoDB.

✚ I continue to be impressed with MongoDB Inc.’s (formerly 10gen) MMS service.

Original title and link: Tuning MongoDB Performance with MMS (NoSQL database©myNoSQL)

via: http://mongodb.info/2013/12/04/new-blog-tuning-mongodb-performance-with-mms/


Does Meteor scale? Polling for changes in MongoDB

Jon James in an article looking at the scalability of the Meteor framework, which uses MongoDB as its database:

Meteor is all real-time, which it currently achieves by fetching and comparing documents after every database write operation. Meteor also polls the database for changes every 10 seconds. These are the main bottlenecks when scaling Meteor, and they introduce two main issues:

  1. The polling and comparing logic takes a lot of CPU power and network I/O.
  2. After a write operation, there is no way to propagate changes to other Meteor instances in real-time. Changes will only be noticed the next time Meteor polls (~10 seconds).

While from a developer’s perspective getting automatic updates is probably a pretty cool feature, polling the database is not going to get them very far. The author suggests using MongoDB’s oplog as a source of changes. That could work.
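
Tailing the oplog is a well-known pattern. A minimal sketch with a recent PyMongo, assuming a replica set (the oplog only exists on replica set members); the connection details are placeholders:

```python
# Tail the replica set oplog instead of polling collections.
from pymongo import MongoClient, CursorType

client = MongoClient("mongodb://localhost:27017", replicaSet="rs0")
oplog = client.local["oplog.rs"]

# Start after the most recent entry and tail forward.
last = oplog.find().sort("$natural", -1).limit(1).next()

cursor = oplog.find({"ts": {"$gt": last["ts"]}},
                    cursor_type=CursorType.TAILABLE_AWAIT)

for op in cursor:
    # op["op"] is "i" (insert), "u" (update), or "d" (delete);
    # op["ns"] is the "db.collection" namespace it applies to.
    print(op["op"], op["ns"])
```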

Original title and link: Does Meteor scale? Polling for changes in MongoDB (NoSQL database©myNoSQL)

via: http://meteorhacks.com/does-meteor-scale.html


The SSL performance overhead in MongoDB and MySQL

How to use MongoDB with SSL:

As you can see the SSL overhead is clearly visible being about 0.05ms slower than a plain connection. The median for the inserts with SSL is 0.28ms. Plain connections have a median at around 0.23ms. So there is a performance loss of about 25%. These are all just rough numbers. Your mileage may vary.
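
Numbers of this kind are easy to reproduce roughly with a recent PyMongo. A sketch, with placeholder hosts and certificate paths; as the author says, your mileage may vary:

```python
# Compare median insert latency over plain vs. TLS connections.
import time
from pymongo import MongoClient

def median_insert_ms(client, n=1000):
    coll = client.test.ssl_bench
    timings = []
    for i in range(n):
        start = time.perf_counter()
        coll.insert_one({"i": i})
        timings.append((time.perf_counter() - start) * 1000)
    return sorted(timings)[n // 2]

plain = MongoClient("mongodb://localhost:27017")
secure = MongoClient("mongodb://localhost:27018",
                     tls=True, tlsCAFile="/path/to/ca.pem")

print("plain median: %.2f ms" % median_insert_ms(plain))
print("tls   median: %.2f ms" % median_insert_ms(secure))
```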

Then there are two posts on the “MySQL Performance Blog”: SSL Performance Overhead in MySQL and MySQL encryption performance, revisited:

Some of you may recall my security webinar from back in mid-August; one of the follow-up questions that I was asked was about the performance impact of enabling SSL connections. My answer was 25%, based on some 2011 data that I had seen over on yaSSL’s website, but I included the caveat that it is workload-dependent, because the most expensive part of using SSL is establishing the connection.

These two articles dive much deeper, and more scientifically, into the impact of using SSL with MySQL. The results are interesting and the recommendations are well worth your time.

Original title and link: The SSL performance overhead in MongoDB and MySQL (NoSQL database©myNoSQL)


A MongoDB data recovery tale

Excellent write up by MongoHQ’s Paul Rubin of a data recovery story:

The friend-of-a-friend had not been running MongoDB with us, but had been running MongoDB at a budget VPS host. Their database on the budget VPS host had worked fine, until the host had a hardware crash. And as is all too usual in these stories, their self-hosted platform had no usable backups less than about six weeks old.

While this story might be helpful, I really hope you’ll never need it; but bookmark it just in case.

Original title and link: A MongoDB data recovery tale (NoSQL database©myNoSQL)

via: http://blog.mongohq.com/shipwrecked-a-mongodb-data-recovery-tale/


Efficient techniques for fuzzy and partial matching in MongoDB

A pretty interesting post by John Page:

This blogpost describes a number of techniques, in MongoDB, for efficiently finding documents that have a number of similar attributes to a supplied query whilst not being an exact match. This concept of “Fuzzy” searching allows users to avoid the risks of failing to find important information due to slight differences in how it was entered.

As presented, these four solutions are MongoDB-specific, but some of them could easily be generalized; one example of the genre is sketched below.
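
One classic fuzzy-matching technique (not necessarily one of the post’s four) is to store a precomputed phonetic key next to the raw value and index it. A minimal sketch with a simplified Soundex; the collection and field names are illustrative:

```python
# Index a phonetic key so "John" finds "Jon" despite the spelling.
from pymongo import MongoClient

def soundex(word):
    # Simplified Soundex: keep the first letter, then up to three digit
    # codes, dropping vowels and collapsing adjacent duplicates.
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    out, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]

people = MongoClient().test.people
people.create_index("name_soundex")
people.insert_one({"name": "Jon", "name_soundex": soundex("Jon")})

# Both "Jon" and "John" encode to J500, so the lookup still hits.
for doc in people.find({"name_soundex": soundex("John")}):
    print(doc["name"])
```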

Original title and link: Efficient techniques for fuzzy and partial matching in MongoDB (NoSQL database©myNoSQL)

via: http://ilearnasigoalong.blogspot.com/2013/10/efficient-techniques-for-fuzzy-and.html