mongodb: All content about mongodb in NoSQL databases and polyglot persistence

Does Meteor scale? Polling for changes in MongoDB

Jon James, in an article looking at the scalability of the Meteor framework, which uses MongoDB as its database:

Meteor is all real-time, which it currently achieves by fetching and comparing documents after every database write operation. Meteor also polls the database for changes every 10 seconds. These are the main bottlenecks when scaling Meteor, and they introduce two main issues:

  1. The polling and comparing logic takes a lot of CPU power and network I/O.
  2. After a write operation, there is no way to propagate changes to other Meteor instances in real-time. Changes will only be noticed the next time Meteor polls (~10 seconds).

While from a developer’s perspective getting automatic updates is probably a pretty cool feature, polling the database is not going to get them very far. The author suggests using MongoDB’s oplog as a source of changes. That could work.
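For the curious, tailing the oplog from a client takes only a little code. Here’s a minimal sketch using pymongo against a replica set; the connection string is a placeholder and error handling is omitted:

```python
from pymongo import CursorType, MongoClient

# Connect to a replica-set member; the oplog lives in the "local" database.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
oplog = client.local["oplog.rs"]

# Start from the newest entry and tail forward from there.
last = next(oplog.find().sort("$natural", -1).limit(1))
ts = last["ts"]

cursor = oplog.find(
    {"ts": {"$gt": ts}},
    cursor_type=CursorType.TAILABLE_AWAIT,
)

while cursor.alive:
    for entry in cursor:
        # entry["op"] is "i"/"u"/"d" for insert/update/delete;
        # entry["ns"] is the "db.collection" namespace that changed.
        print(entry["op"], entry["ns"])
```

(Interestingly, an oplog-based observe driver is exactly the direction Meteor later took.)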

Original title and link: Does Meteor scale? Polling for changes in MongoDB (NoSQL database©myNoSQL)

via: http://meteorhacks.com/does-meteor-scale.html


The SSL performance overhead in MongoDB and MySQL

How to use MongoDB with SSL:

As you can see the SSL overhead is clearly visible being about 0.05ms slower than a plain connection. The median for the inserts with SSL is 0.28ms. Plain connections have a median at around 0.23ms. So there is a performance loss of about 25%. These are all just rough numbers. Your mileage may vary.
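For reference, the client side of this is tiny; a minimal pymongo sketch of connecting over TLS/SSL, assuming the server was started with SSL enabled (the host name and CA file path are hypothetical, and older driver versions spell the options ssl=/ssl_ca_certs= instead):

```python
from pymongo import MongoClient

# Encrypted connection; host and certificate path are placeholders.
client = MongoClient(
    "mongodb://db.example.com:27017/",
    tls=True,                             # negotiate TLS/SSL with the server
    tlsCAFile="/etc/ssl/mongodb-ca.pem",  # CA used to verify the server certificate
)

# Every operation now pays the handshake/encryption overhead measured above.
client.test.timings.insert_one({"probe": 1})
```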

Then two posts on the “MySQL Performance Blog”: SSL Performance Overhead in MySQL and MySQL encryption performance, revisited:

Some of you may recall my security webinar from back in mid-August; one of the follow-up questions that I was asked was about the performance impact of enabling SSL connections. My answer was 25%, based on some 2011 data that I had seen over on yaSSL’s website, but I included the caveat that it is workload-dependent, because the most expensive part of using SSL is establishing the connection.

These two articles dive much deeper and more scientifically into the impact of using SSL with MySQL. The results are interesting and the recommendations are well worth the time spent reading them.

Original title and link: The SSL performance overhead in MongoDB and MySQL (NoSQL database©myNoSQL)


A MongoDB data recovery tale

Excellent write-up by MongoHQ’s Paul Rubin of a data recovery story:

The friend-of-a-friend had not been running MongoDB with us, but had been running MongoDB at a budget VPS host. Their database on the budget VPS host had worked fine, until the host had a hardware crash. And as is all too usual in these stories, their self-hosted platform had no usable backups less than about six weeks old.

While this story might be helpful, I really hope you’ll never need it; but bookmark it just in case.

Original title and link: A MongoDB data recovery tale (NoSQL database©myNoSQL)

via: http://blog.mongohq.com/shipwrecked-a-mongodb-data-recovery-tale/


Efficient techniques for fuzzy and partial matching in MongoDB

A pretty interesting post by John Page:

This blogpost describes a number of techniques, in MongoDB, for efficiently finding documents that have a number of similar attributes to a supplied query whilst not being an exact match. This concept of “Fuzzy” searching allows users to avoid the risks of failing to find important information due to slight differences in how it was entered.

In the presented incarnation, these 4 solutions are MongoDB-specific, but some of them could easily be generalized.
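To give a flavor of the general idea (this is not necessarily one of the four techniques from the post), here is a hedged sketch that scores documents by how many of the queried attributes they match, using the aggregation framework; the collection and field names are made up:

```python
from pymongo import MongoClient

client = MongoClient()
products = client.shop.products  # hypothetical collection

query = {"color": "red", "size": "M", "brand": "Acme"}

# Pre-filter to documents matching at least one attribute, then score each
# document by the number of queried attributes it matches exactly.
pipeline = [
    {"$match": {"$or": [{k: v} for k, v in query.items()]}},
    {"$project": {
        "name": 1,
        "score": {"$add": [
            {"$cond": [{"$eq": ["$" + k, v]}, 1, 0]} for k, v in query.items()
        ]},
    }},
    {"$sort": {"score": -1}},
    {"$limit": 10},
]

for hit in products.aggregate(pipeline):
    print(hit["score"], hit.get("name"))
```

A document matching all three attributes scores 3; one that only matches the brand scores 1 but is still returned, which is the “fuzzy” part.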

Original title and link: Efficient techniques for fuzzy and partial matching in MongoDB (NoSQL database©myNoSQL)

via: http://ilearnasigoalong.blogspot.com/2013/10/efficient-techniques-for-fuzzy-and.html


Forbes Top 10 Most Funded Big Data Startups

  • MongoDB (formerly 10gen) $231m Document-oriented database
  • Mu Sigma $208m Data-Science-as-a-Service
  • Cloudera $141m Hadoop-based software, services and training
  • Opera Solutions $114m Data-Science-as-a-Service
  • Hortonworks $98m Hadoop-based software, services and training
  • Guavus $87m Big data analytics solution
  • DataStax $83.7m Cassandra-based big data platform
  • GoodData $75.5m Cloud-based platform and big data apps
  • Talend $61.6m App and business process integration platform
  • Couchbase $56m Document-oriented database

I’m not really sure there are any conclusions one could draw based only on this data.

Original title and link: Forbes Top 10 Most Funded Big Data Startups (NoSQL database©myNoSQL)

via: http://www.forbes.com/sites/gilpress/2013/10/30/top-10-most-funded-big-data-startups-updated/


Scaling MongoDB at Mailbox

The story—quite a long and interesting one—of moving a MongoDB collection from one cluster to a new one:

While MongoDB allows you to add shards to a MongoDB cluster easily, we wanted to spare ourselves potential long-term pain by moving one of the most frequently updated MongoDB collections, which stores email-related data, to its own cluster. We theorized that this would, at a minimum, cut the amount of write lock contention in half. While we could have chosen to scale by adding more shards, we wanted to be able to independently optimize and administer the different types of data separately.

I’m not an ops person and I don’t know what the optimal process is. Hopefully readers will share their expectations.
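For contrast, the “just add more shards” route mentioned in the quote boils down to a couple of admin commands against a mongos router. A rough pymongo sketch, with the router address, database, collection, and shard key all hypothetical:

```python
from pymongo import MongoClient

# Connect to a mongos router of the existing cluster (address is a placeholder).
client = MongoClient("mongodb://mongos.example.com:27017/")

# Enable sharding on the database, then shard the hot collection on a chosen key.
# The collection needs an index on the shard key (or must be empty).
client.admin.command("enableSharding", "mailbox")
client.admin.command("shardCollection", "mailbox.emails",
                     key={"user_id": 1, "_id": 1})
```

What Mailbox chose instead, moving the collection to a separate cluster, trades that simplicity for the ability to tune and administer the email data independently.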

Original title and link: Scaling MongoDB at Mailbox (NoSQL database©myNoSQL)

via: https://tech.dropbox.com/2013/09/scaling-mongodb-at-mailbox/


Moving from MongoDB to Riak

The Basho guys, summarizing Customer.io’s migration from MongoDB to Riak:

Yesterday, Customer.io announced that they upgraded their architecture – moving from MongoDB to Riak. As described in their recent blog post, the move to Riak has provided an immediate and dramatic performance boost.

Wait, it’s not a migration but an upgrade.

Original title and link: Moving from MongoDB to Riak (NoSQL database©myNoSQL)

via: http://basho.com/customer-io-gains-6x-speed-improvement-by-moving-from-mongodb-to-riak/


10gen changes name to MongoDB Inc

That’s all.

Well, except I couldn’t miss this one:

Original title and link: 10gen changes name to MongoDB Inc (NoSQL database©myNoSQL)


Top Five MongoDB Alerts

The 5 alerts 10gen recommends using with their MongoDB Management Service (MMS):

  • Host Recovering (All, but by definition Secondary)
  • Repl Lag (Secondary)
  • Connections (All mongos, mongod)
  • Lock % (Primary, Secondary)
  • Replica (Primary, Secondary)

  1. It’s great that MMS offers help to their customers with these alerts.
  2. These also represent the top 5 problems you might have with a MongoDB deployment. And alerting is not going to help you fix them. So you’d better have a well-established and rehearsed plan for each (for one of them, replication lag, a minimal do-it-yourself check is sketched below).
  3. Or you could use one of those solutions, like this or this, that don’t wake you up at night.
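Since replication lag is the alert most teams end up scripting around anyway, here is a minimal sketch of computing it yourself from replSetGetStatus with pymongo; the connection string is a placeholder:

```python
from pymongo import MongoClient

# Connect to any member of the replica set (address is a placeholder).
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

status = client.admin.command("replSetGetStatus")

# optimeDate is the wall-clock time of the last operation applied on each member;
# a secondary's lag is its distance behind the primary.
primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
for member in status["members"]:
    if member["stateStr"] == "SECONDARY":
        lag = (primary["optimeDate"] - member["optimeDate"]).total_seconds()
        print("%s is %.0f seconds behind the primary" % (member["name"], lag))
```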

Original title and link: Top Five MongoDB Alerts (NoSQL database©myNoSQL)

via: http://www.10gen.com/blog/post/five-mms-monitoring-alerts-keep-your-mongodb-deployment-track


What's really in it for MongoDB's 3rd parties?

Luca Olivari, Director of Business Development at 10gen:

With MongoDB you can cover 80% of the use cases of Relational plus NoSQL databases.

Leaving aside for a second the last part of this sentence, which is obviously not accurate, let’s look at what the first part might mean:

  1. fewer than 20% of the use cases need strong transactional semantics
  2. fewer than 20% of the use cases need strong data integrity constraints
  3. fewer than 20% of the use cases require integration with other existing data processing tools that imply SQL access
  4. fewer than 20% of the use cases require one or more of the still unique to relational database features (triggers, materialized views, etc.)
  5. fewer than 20% of the use cases need to be always available.

I’d (probably) be OK with each of the above being true on its own, but I don’t believe that adding all these cases together amounts to only 20% of the use cases.

So, what’s another answer to the question:

If you were to choose a new technology, what would you choose? There’s a chance that you’ll pick the one that gives you more advantages in more cases.

It’s well known to many that adoption, and thus opportunity, is not always related to technological merit. In fact, most of the time a 3rd-party business opportunity is directly connected to the complexity, incompleteness, or fragility of a technology.

If you were a business, wouldn’t you choose a market where there is a sizable opportunity, the competition (nb: your competition, not the competition between solutions) is not that strong, and there’s a chance for recurring business? (A business that requires a client to call back multiple times is definitely better than one which, once delivered, just works.)

Original title and link: What’s really in it for MongoDB’s 3rd parties? (NoSQL database©myNoSQL)

via: http://dataandco.blogspot.com/2013/08/mongodb-alliances-series-part-i-what.html


How to speed up MongoDB Map Reduce by 20x

Antoine Girbal:

Looking back, we’ve started at 1200s and ended at 60s for the same MR job, which represents a 20x improvement! This improvement should be available to most use cases, even if some of the tricks are not ideal (e.g. using multiple output dbs / collections). Nevertheless this can give people ideas on how to speed up their MR jobs and hopefully some of those features will be made easier to use in the future. The following ticket will make ‘splitVector’ command more available, and this ticket will improve multiple MR jobs on the same database.

Looking back at the article, it reads like a series of tricks to work around the limitations of MongoDB’s MapReduce implementation (one of them, splitting the input across several jobs with separate output collections, is sketched after the list below):

  1. single-threaded execution of MapReduce jobs
  2. lock contention
  3. BSON-to-JSON-and-back serializations
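To make that trick concrete, here’s a hedged sketch of splitting one MapReduce job into several that run over disjoint query ranges and write to separate output collections; the database, collection, field, and range values are all made up, and the partial results would still need a final merge step:

```python
from concurrent.futures import ThreadPoolExecutor

from bson.code import Code
from pymongo import MongoClient

client = MongoClient()
db = client.analytics  # hypothetical database

mapper = Code("function () { emit(this.user_id, 1); }")
reducer = Code("function (key, values) { return Array.sum(values); }")

# Hypothetical split points over an indexed "day" field; each sub-job scans a
# disjoint range and writes its partial result to its own output collection.
ranges = [(0, 10), (10, 20), (20, 31)]

def run_chunk(args):
    i, lo, hi = args
    return db.command(
        "mapReduce", "events",
        map=mapper, reduce=reducer,
        query={"day": {"$gte": lo, "$lt": hi}},
        out="counts_part_%d" % i,
    )

with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    results = list(pool.map(run_chunk,
                            [(i, lo, hi) for i, (lo, hi) in enumerate(ranges)]))
```

(Worth noting: mapReduce has since been deprecated in favor of the aggregation pipeline, so this mostly matters for historical or legacy deployments.)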

Original title and link: How to speed up MongoDB Map Reduce by 20x (NoSQL database©myNoSQL)

via: http://edgystuff.tumblr.com/post/54709368492/how-to-speed-up-mongodb-map-reduce-by-20x


TokuMX means for MongoDB the same as InnoDB to MySQL

Vadim Tkachenko (MySQL Performance Blog) about TokuMX, the fractal-tree-based storage engine for MongoDB:

Why is TokuMX interesting? A few reasons:

  • It comes with transactions, and all that good stuff that transactions provide: a concurrent access to documents (no more global write-lock in MongoDB); crash recovery; atomicity
  • Performance in IO-bound operations
  • A good compression rate, which is a money-saver if you use SSD/Flash
  • But it is also SSD/Flash life-time friendly, which is double money-saver

Some thoughts:

  1. TokuMX brings to the table some features that might not be top priorities for 10gen, or even features that 10gen wants in MongoDB.
    1. I seriously doubt 10gen engineering or sales are recommending TokuMX.
    2. While the advantages of the TokuMX engine are quite interesting, how is Tokutek closing sales (considering 10gen is not sharing their list of customers)?
  2. How would this mix of 10gen and Tokutek work at the business level? I don’t think Tokutek wants to sell or that 10gen is ready to acquire/merge with Tokutek.
  3. How would this work for customers? The InnoDB-MySQL and TokuMX-MongoDB parallel looks good on paper, but I cannot imagine how a user would interact with these 2 providers. Buy a license from Tokutek, pay 10gen for MongoDB support, and then call both?
  4. How will this integration work long term considering the complete control 10gen has over the core MongoDB? While 10gen could come up with a compatibility certification, I don’t think they’ll actually do it (see point 1).

Original title and link: TokuMX means for MongoDB the same as InnoDB to MySQL (NoSQL database©myNoSQL)

via: http://www.mysqlperformanceblog.com/2013/06/25/tokumx-is-mongodb-on-steroids/