MongoDB: All content tagged as MongoDB in NoSQL databases and polyglot persistence
Friday, 8 February 2013
MongoDB 2.4 Highlights
MongoDB 2.4 is just around the corner:
From Mike Friedman’s Roadmap slide deck.
Original title and link: MongoDB 2.4 Highlights (©myNoSQL)
Thursday, 7 February 2013
MongoDB Is Still Broken by Design 5-0
My score after the first period was 4-1, but Emin Gün Sirer contested the 1 in his follow-up post to 10gen’s reply:
Until recently, MongoDB did not talk about requestStart() and requestDone() in any context except when talking about how to ensure a very weak consistency requirement. Namely, if you don’t use this pair of operations, then a write to the database followed by a read from the database, by the same client, can return old values. So, I write 42 for key k with a WriteConcern.SAFE, read key k, and get some other number, because the Mongo driver can, by default, very well send the first request to one node over one connection, and the second one to another, over another connection. So requestStart() and requestDone() were billed as a mechanism to avoid that scenario; I saw no mention that they were required for correctness in multithreaded settings. I bet there is plenty of multithreaded code that does not follow that pattern. Such code is broken; if you’re a Mongo user, it’d be a good idea to check if you ever use getLastError without a bracketing requestStart() and Done().
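To make the scenario in the quote concrete, here is a toy, in-process model in Python (no MongoDB involved). `ToyConnection`, `ToyPool`, `request_start()` and `request_done()` are hypothetical stand-ins loosely named after the driver calls in the quote; they are not the real driver API. The sketch assumes each connection remembers only the error of its own last write, and that an unpinned pool hands out connections round-robin.

```python
from itertools import cycle


class ToyConnection:
    """Hypothetical connection: remembers only the error of its own last write."""

    def __init__(self, name):
        self.name = name
        self.last_error = None

    def insert(self, store, key, value):
        # Simulate a unique-key constraint: inserting an existing key fails.
        if key in store:
            self.last_error = f"duplicate key {key!r}"
        else:
            store[key] = value
            self.last_error = None

    def get_last_error(self):
        # Reports on the previous operation sent over *this* connection only.
        return self.last_error


class ToyPool:
    """Hypothetical pool: round-robin over connections unless one is pinned."""

    def __init__(self, size):
        self._conns = cycle([ToyConnection(f"conn{i}") for i in range(size)])
        self._pinned = None

    def request_start(self):
        # Pin a single connection for every operation until request_done().
        self._pinned = next(self._conns)

    def request_done(self):
        self._pinned = None

    def connection(self):
        return self._pinned if self._pinned is not None else next(self._conns)


def insert_then_check(pool, store, key, value, pinned):
    """Insert, then ask for the last error, optionally bracketed by a request."""
    if pinned:
        pool.request_start()
    try:
        pool.connection().insert(store, key, value)
        return pool.connection().get_last_error()
    finally:
        if pinned:
            pool.request_done()
```

With two pooled connections, the unbracketed check lands on the other connection and reports no error even though the insert failed; bracketing the pair pins one connection, and the duplicate-key error shows up.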
5-0.
Original title and link: MongoDB Is Still Broken by Design 5-0 (©myNoSQL)
via: http://hackingdistributed.com/2013/02/07/10gen-response/#id2
10gen: MongoDB’s Fault Tolerance Is Not Broken…
Sitting comfortably? Check. Popcorn? Check. Let’s press play.
In an interview for InfoQ, 10gen’s Jared Rosoff replies to Emin Gün Sirer: “Broken by Design: MongoDB Fault Tolerance”.
-
MongoDB lies when it says that a write has succeeded
JR: “[…] Today the default behavior of official MongoDB drivers is Receipt Acknowledged, which means that you wait until the server has processed your write before returning to the client.”
The key word here is “today”. As Sirer clearly explained, the behavior he described was the default in all versions of MongoDB and all the drivers until the 2.2 release and the driver updates of November 2012.
Basically, the “fire-and-forget” behavior (nb: even this description is not accurate; a better one would be “load-and-forget”) had been the default for almost 3 years. With the 2.2 release and the corresponding driver updates, it was changed to “Receipt Acknowledged”. But the new default only acknowledges that the data was received by the server, not that it was written anywhere. If you want your data to exist on multiple machines or to be on disk, you need to use different settings.
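A toy model of the difference between the two defaults, assuming a single node that applies writes to memory and persists them only on an explicit `flush()`. `ToyServer` and `write()` are hypothetical. Real fire-and-forget writes may not even have been applied when the client moves on, which this synchronous sketch glosses over.

```python
class ToyServer:
    """Hypothetical single node: writes land in memory; flush() persists them."""

    def __init__(self):
        self.memory = {}
        self.disk = {}

    def apply(self, key, value):
        # Reject duplicate keys, like a unique index would.
        if key in self.memory:
            return f"duplicate key {key!r}"
        self.memory[key] = value
        return None

    def flush(self):
        self.disk = dict(self.memory)

    def crash(self):
        # Anything not yet flushed to disk is gone.
        self.memory = dict(self.disk)


def write(server, key, value, acknowledged):
    """Return what the *client* believes happened to its write."""
    error = server.apply(key, value)
    if not acknowledged:
        # Fire-and-forget: no reply is awaited, so the client assumes success.
        return "ok"
    # Acknowledged: the client learns the server's verdict, but "ok" only
    # means the write was received and applied in memory, not persisted.
    return "ok" if error is None else error
```

An unacknowledged duplicate insert "succeeds" from the client's point of view; an acknowledged one surfaces the error, yet an acknowledged write can still vanish if the node crashes before flushing.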
-
Using getLastError slows down write operations.
JR: “GetLastError is underlying command in the MongoDB protocol that is used to implement write concerns. Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.”
The problem here is not that the operation slows down because it has to wait for the acknowledgement. The real problem is that getLastError requires an extra network round trip. Not to mention that your old code was probably littered with all these extra calls.
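A back-of-the-envelope sketch of that cost, assuming each getLastError is a separate command whose reply the client must wait for (one full network round trip), while sending a write without waiting costs roughly nothing on the client side. `total_wait_ms` and its parameters are illustrative, and server processing time is ignored.

```python
def total_wait_ms(n_writes, rtt_ms, check_every):
    """Rough client-side waiting time for n_writes sent over one connection.

    check_every=0 means fire-and-forget (never wait for a reply);
    check_every=k means one getLastError round trip after every k writes
    (plus one for a final partial batch).
    """
    if check_every == 0:
        return 0.0
    checks = -(-n_writes // check_every)  # ceiling division
    return checks * rtt_ms
```

With a 0.5 ms round trip, checking after each of 1,000 writes adds about 500 ms of waiting, while checking every 100 writes adds about 5 ms.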
-
getLastError doesn’t work pipelined
JR: “[…] For many bulk loads, performing multiple inserts with periodic checks of getLastError is the right choice. […]”
Read the above.
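The catch with periodic checks, as I read Sirer's point, is that getLastError reports only on the most recent operation on the connection, so a failure in the middle of a batch can go unnoticed. A toy sketch under that assumption; `pipelined_insert` is hypothetical and simulates getLastError in-process.

```python
def pipelined_insert(store, docs, check_every):
    """Insert a list of (key, value) pairs, checking for errors periodically.

    A simulated getLastError runs after every check_every writes and after
    the final write; like the real command (per the quote above), it only
    sees the most recent write. Returns the errors the client observed.
    """
    observed = []
    last_error = None
    for i, (key, value) in enumerate(docs, start=1):
        if key in store:
            last_error = f"duplicate key {key!r}"
        else:
            store[key] = value
            last_error = None
        if i % check_every == 0 or i == len(docs):
            if last_error is not None:
                observed.append(last_error)
    return observed
```

With a duplicate in the middle of a three-document batch, checking only at the end reports nothing; checking after every write catches it.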
-
getLastError doesn’t work multi-threaded
JR: “Threads do not see getLastError responses for other thread’s operations. MongoDB’s getLastError command applies to the previous operation on a connection to the database, and not simply whatever random operation was performed last. […]”
If I’m reading this correctly, it seems Sirer’s hypothesis was that connections can be shared across threads. I have to agree that many drivers do not provide thread-safe connections, so tying getLastError behavior to the current connection seems OK.
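A deterministic sketch of the two readings, using a scripted interleaving instead of real threads so the outcome is reproducible. `run_schedule` is hypothetical; with a shared connection there is a single last-error slot, otherwise each thread has its own.

```python
def run_schedule(schedule, shared):
    """Replay an interleaving of (thread, action, key) steps.

    action is "insert" or "check". With shared=True all threads use one
    connection (one last-error slot); otherwise each thread has its own.
    Returns {thread: error seen by that thread's "check"}.
    """
    store = {}
    last_error = {}  # connection id -> error of its last write
    seen = {}
    for thread, action, key in schedule:
        conn = "shared" if shared else thread
        if action == "insert":
            if key in store:
                last_error[conn] = f"duplicate key {key!r}"
            else:
                store[key] = thread
                last_error[conn] = None
        elif action == "check":
            seen[thread] = last_error.get(conn)
    return seen
```

If thread A's insert fails and thread B then writes successfully on the same connection, A's check sees B's result; with one connection per thread, A sees its own error.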
-
Write Concerns are broken
JR: “As described in the above sections, WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.”
Let’s look at what Sirer wrote:
[…] one could use WriteConcern.SAFE, FSYNC_SAFE or REPLICAS_SAFE for the insert operation [2]. There are 13 different concern levels, 8 of which seem to be distinct and presumably the remaining 5 are just kind of there in case you mistype one of the other ones. WriteConcern is at least well-named: it corresponds to “how concerned would you be if we lost your data?” and the potential answers are “not at all!”, “be my guest”, and “well, look like you made an effort, but it’s ok if you drop it.” Specifically, that’s three different kinds of SAFE, but none of them give you what you want: (1) SAFE means acknowledged by one replica but not written to disk, so a node failure can obliterate that data, (2) FSYNC_SAFE means written to a single disk, so a single disk crash can obliterate that data, and (3) REPLICAS_SAFE means it has been written to two replicas, but there is no guarantee that you will be able to retrieve it later.
If you want a different explanation: even though there are 13 different WriteConcern types available, none of them offers the option of having the data written to disk on multiple replicas.
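Sirer's three levels can be written down as a small worst-case durability check. The numbers in `LEVELS` encode the quote's description (a simplification, not the driver's own definitions), and `WANTED` is a hypothetical level standing for what the post says is missing: the write on disk on more than one replica. The model assumes failures hit exactly the nodes holding the write, and that losing a disk implies that node also lost its memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guarantee:
    in_memory_copies: int  # nodes that acknowledged holding the write in RAM
    on_disk_copies: int    # of those, nodes that acknowledged it on disk


# Encodes the quote's description of each level (a simplification).
LEVELS = {
    "SAFE": Guarantee(in_memory_copies=1, on_disk_copies=0),
    "FSYNC_SAFE": Guarantee(in_memory_copies=1, on_disk_copies=1),
    "REPLICAS_SAFE": Guarantee(in_memory_copies=2, on_disk_copies=0),
    # Hypothetical: on disk on two replicas, which the post says is missing.
    "WANTED": Guarantee(in_memory_copies=2, on_disk_copies=2),
}


def survives(g, ram_lost, disks_lost):
    """Worst case: is an acknowledged write still recoverable?

    ram_lost: acknowledging nodes that lost their memory (crash, power cut).
    disks_lost: of those, nodes that also lost their disk (<= ram_lost).
    The write is gone only if every acknowledging node lost its memory
    AND every on-disk copy lost its disk.
    """
    return ram_lost < g.in_memory_copies or disks_lost < g.on_disk_copies
```

In this model REPLICAS_SAFE survives one crashed node but not a power cut that takes down both, and FSYNC_SAFE survives a crash but not the loss of its single disk.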
The period is over, and in my mind the score is clearly 4-1. But I know this is not a real game, and some will conclude that I’m not getting it. I’m OK living with that, though.
Original title and link: 10gen: MongoDB’s Fault Tolerance Is Not Broken… (©myNoSQL)
Monday, 4 February 2013
When Data Is Worthless - Give MongoDB What Is MongoDB's
Emin Gün Sirer concluding a follow up post to MongoDB fault tolerance is broken by design:
So let us give onto Mongo what is clearly its: it’s mature software with a large install base. If it loses data, it’ll likely do so because of a deliberate design decision, rather than a bug. It’s easy to find Mongo hosting and it’s relatively easy to find people who are experienced with it. So if all your data is really of equal and low value, and you can afford to lose some of it, and your app’s needs are unlikely to grow, then MongoDB can be a fine pick for your application.
Original title and link: When Data Is Worthless - Give MongoDB What Is MongoDB’s (©myNoSQL)
via: http://hackingdistributed.com/2013/02/03/when-data-is-worthless/
MongoDB Fault Tolerance - Broken by Design
Emin Gün Sirer:
So, MongoDB is broken, and not just superficially so; it’s broken by design. If you’re relying on its fault tolerance, you’re probably using it wrong.
He’s not the first to write about critical issues in MongoDB. Every once in a while, a post detailing one of them shows up. But none of them has stopped MongoDB’s adoption. Why? The only explanations I can come up with are:
- these posts are not reaching enough people;
- MongoDB is used mostly for simple scenarios where the occurrence of such errors can be ignored;
- those using MongoDB for more complicated scenarios have developed internal, application-specific workarounds.
Original title and link: MongoDB Fault Tolerance - Broken by Design (©myNoSQL)
Thursday, 31 January 2013
MongoDB Sequence Number Generators Using findAndModify and Spring Data
Yuan Ji:
I want a sequence number generator from MongoDB to give me unique sequence number. The operations are to return current sequence number and also increase the sequence number in the database. In MongoDB, command findAndModify atomically modifies and returns a single document. So we can use this command to query the sequence number and increase it by $inc function.
The solution is interesting, and the post includes the Spring-ified code for it. But there’s no mention of the contention issues this solution would run into, since every request has to update the same document. Even with row-level locking, things would be pretty bad for an app needing lots of sequence numbers; with MongoDB’s database-level locking, things could get pretty ugly.
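A minimal in-process sketch of the pattern: atomically increment a per-name counter and return the new value, with a single lock standing in for MongoDB's database-level lock. `ToyCounters` is hypothetical; against a real server, the rough PyMongo (3.x and later) equivalent would be `collection.find_one_and_update({"_id": name}, {"$inc": {"seq": 1}}, upsert=True, return_document=ReturnDocument.AFTER)`. The thing to notice is that every caller, for every counter, queues on the same lock.

```python
import threading


class ToyCounters:
    """Hypothetical in-process stand-in for a counters collection.

    next_value() mimics findAndModify with $inc: atomically increment a
    counter and return the new value. One lock guards the whole "database",
    loosely mirroring database-level locking, so every caller serializes.
    """

    def __init__(self):
        self._docs = {}                  # counter name -> current sequence
        self._db_lock = threading.Lock()

    def next_value(self, name):
        with self._db_lock:
            # Upsert semantics: a missing counter starts from 0, so it returns 1.
            seq = self._docs.get(name, 0) + 1
            self._docs[name] = seq
            return seq
```

The values come out unique no matter how many threads draw from the counter, but throughput is capped by how fast that one lock can be acquired and released.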
Original title and link: MongoDB Sequence Number Generators Using findAndModify and Spring Data (©myNoSQL)
via: http://www.jiwhiz.com/post/2013/1/Add_A_Counter_To_MongoDB_With_Spring_Data
Wednesday, 30 January 2013
MongoDB Is Abusing JSON With Its Query Language
SM Sohan expresses his discontent with MongoDB’s JSON-based query representation:
I find the MongoDB API is abusing JSON in a really bad way. JSON is probably a good format for storing the documents in MongoDB, but using JSON for it’s weird API is simply a terrible idea.
Using JSON (nb: BSON) to represent queries works well for basic equality matching. But I agree that for more advanced queries it looks quite forced. While definitely not perfect, I (biasedly) prefer RethinkDB’s API. On the other hand, I’m not aware of any proposals to improve MongoDB’s query language, nor of any willingness to change it.
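To illustrate the point, here is a tiny matcher for a subset of MongoDB-style query documents. `matches` and `OPS` are hypothetical and ignore much of MongoDB's real semantics (arrays, dotted paths, type ordering). Equality is just `{"city": "Oslo"}`, but an OR already needs nested operator documents.

```python
# Comparison operators supported by this sketch (a small subset).
OPS = {
    "$gt": lambda v, a: v > a,
    "$gte": lambda v, a: v >= a,
    "$lt": lambda v, a: v < a,
    "$lte": lambda v, a: v <= a,
    "$in": lambda v, a: v in a,
}


def matches(doc, query):
    """Evaluate a small subset of MongoDB-style query documents against a dict."""
    for field, cond in query.items():
        if field == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif field == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict) and any(k.startswith("$") for k in cond):
            # Operator document, e.g. {"age": {"$gte": 30, "$lte": 40}}.
            if field not in doc:
                return False
            if not all(OPS[op](doc[field], arg) for op, arg in cond.items()):
                return False
        elif doc.get(field) != cond:
            # Plain equality, e.g. {"city": "Oslo"}.
            return False
    return True


# SQL: WHERE age < 20 OR city IN ('Lima', 'Quito')
query = {"$or": [{"age": {"$lt": 20}}, {"city": {"$in": ["Lima", "Quito"]}}]}
```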
Update: The (always) entertaining HN thread.
Original title and link: MongoDB Is Abusing JSON With Its Query Language (©myNoSQL)
Social Network Analysis of Apache CloudStack
Nice data experiment run by Sebastien Goasguen against the CloudStack mailing list:
To get the graphs I grabbed the emails archive from Apache. I used Python to load the mbox files into single Mongo collections. I cleaned the data to avoid replications of senders as well as remove JIRA and Review Board entries. Then with a little bit of PyMongo I made the queries and build the graph with NetworkX. Finished up with the graph visualization and calculations using Gephi. Since there are thousands of emails and threads, there is still some work to pre-process the data, avoid duplicates and match individuals to multiple email addresses.
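A sketch of the first step of such a pipeline using only Python's standard library: reading an mbox and counting who replies to whom via `In-Reply-To` headers. Unlike the post, it skips MongoDB entirely, and `reply_graph` and `load_mbox` are hypothetical helpers. The resulting weighted edge list could be fed to NetworkX or loaded into a graph database. Address normalization is limited to lowercasing, so matching individuals to multiple addresses, which the quote mentions, is still left to do.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def reply_graph(messages):
    """Count replies between senders: Counter[(replier, original_sender)].

    messages: iterable of email.message.Message objects, e.g. an mbox.
    Addresses are lowercased; self-replies and unknown parents are skipped.
    """
    messages = list(messages)
    sender_of = {}  # Message-ID -> sender address
    for msg in messages:
        msg_id = msg.get("Message-ID")
        if msg_id:
            sender_of[msg_id.strip()] = parseaddr(msg.get("From", ""))[1].lower()
    edges = Counter()
    for msg in messages:
        parent = (msg.get("In-Reply-To") or "").strip()
        if parent not in sender_of:
            continue
        replier = parseaddr(msg.get("From", ""))[1].lower()
        original = sender_of[parent]
        if replier and original and replier != original:
            edges[(replier, original)] += 1
    return edges


def load_mbox(path):
    """Build the reply graph for one mbox archive file."""
    return reply_graph(mailbox.mbox(path))
```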
Three questions:
- would using a graph database have made this experiment easier?
- would Linkurious be able to generate these graphics?
- is the code available anywhere so someone else could try to use a graph database and maybe run other types of visualizations?
Original title and link: Social Network Analysis of Apache CloudStack (©myNoSQL)
via: http://sebgoa.blogspot.ch/2013/01/social-network-analysis-of-apache.html
Monday, 28 January 2013
What Is MongoDB's Architecture Based On?
Eric Knorr interviewed Dwight Merriman for InfoWorld: “10gen CEO: Why we’re the NoSQL leader”.
Q: You created the first widely successful NoSQL document database. What did you base that architecture on?
A: It wasn’t based on anything in particular; it’s a career’s worth of learning what works and what doesn’t. We were looking at cloud computing — at needs for horizontal scalability and how we wanted to write code — and we couldn’t find tools that did what we wanted.
Recently, new MongoDB hosting company ObjectRocket launched with the message: “The Cloud Is Broken for MongoDB”. Was it simply bad PR, or did someone get MongoDB’s history wrong?
Original title and link: What Is MongoDB’s Architecture Based On? (©myNoSQL)
Monday, 21 January 2013
11 Interesting Releases From the First Weeks of January
The list of releases I wanted to post about has been growing fast these last couple of weeks, so instead of waiting any longer, here it is (in no particular order¹):
- (Jan.2nd) Cassandra 1.2 — announcement on DataStax’s blog. I’m currently learning and working on a post looking at what’s new in Cassandra 1.2.
- (Jan.10th) Apache Pig 0.10.1 — Hortonworks wrote about it
- (Jan.10th) DataStax Community Edition 1.2 and OpsCenter 2.1.3 — DataStax announcement
- (Jan.10th) CouchDB 1.0.4, 1.1.2, and 1.2.1 — releases fixing some security vulnerabilities
- (Jan.11th) MongoDB 2.3.2 unstable — announcement. This dev release includes support for full-text indexing. For more details you can check:
  - MongoDB Full Text Search Explained and MongoDB Text Search Tutorial
  - Full text search in MongoDB: details about supported languages and queries
  - Indexing a Markdown blog using MongoDB full text indexing
  - Short demo of MongoDB text search and hashed shard keys
- (Jan.12th) Apache HBase 0.94.4 — announcement and release notes
- (Jan.14th) Apache Hive 0.10.0: Hortonworks’s post about it
- (Jan.15th) Hortonworks Data Platform 1.2 featuring Apache Ambari — official PR announcement
- (Jan.16th) Redis 2.6.9 — release notes
- (Jan.16th) HyperDex 1.0RC1 — no docs
- (Jan.16th) Klout’s Brickhouse — announcement:
[…] an open source project extending Hadoop and Hive with a collection of useful user-defined-functions. Its aim is to make the Hive Big Data developer more productive, and to enable scalable and robust dataflows.
¹ I’ve tried to order it chronologically, but most probably I’ve failed.
Original title and link: 11 Interesting Releases From the First Weeks of January (©myNoSQL)
Friday, 18 January 2013
The Cloud Is Broken for MongoDB
The cloud is broken. It’s not designed to properly run persistent data stores like MongoDB. ObjectRocket is designed from the ground up to fix this problem.
Any snarky comments fit perfectly.
Original title and link: The Cloud Is Broken for MongoDB (©myNoSQL)
Thursday, 17 January 2013
How to Monitor MongoDB
A post by Pandora FMS team about monitoring options for MongoDB:
If MongoDB goes wrong, all your apps will fail. So monitoring the main variables and configuration parameters of your database is the best option to make sure that your values are right and your users are happy.
The post covers the tools that come with MongoDB (mongotop, mongostat, the stats available through the MongoDB shell, log files, etc.), and also introduces their PandoraFMS library. There’s no mention of 10gen’s hosted MongoDB Monitoring Service, of other MongoDB monitoring utilities, or of the recent MongoMem library for measuring per-collection memory usage.
As for which stats are the most interesting, Simon Maynard’s “5 Things to Monitor in MongoDB” and these other 3 metrics should be a good start.
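mongostat's insert/query/update/delete columns are, as far as I know, per-second rates derived from cumulative server counters. A minimal sketch of that calculation, assuming two snapshots of the `opcounters` section of `serverStatus` (for example `db.command("serverStatus")["opcounters"]` in PyMongo) taken `interval_s` seconds apart; `ops_per_second` is hypothetical.

```python
OP_TYPES = ("insert", "query", "update", "delete", "getmore", "command")


def ops_per_second(before, after, interval_s):
    """Per-second operation rates from two opcounters snapshots.

    The counters are cumulative since the server process started, so the
    rate is the difference between snapshots divided by the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {op: (after[op] - before[op]) / interval_s for op in OP_TYPES}
```

Because the counters restart from zero when the server restarts, a negative difference means the two snapshots straddle a restart and should be discarded.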
Original title and link: How to Monitor MongoDB (©myNoSQL)