MongoDB: All content tagged as MongoDB in NoSQL databases and polyglot persistence

MongoDB 2.4 Highlights

MongoDB 2.4 is just around the corner:

[Slide: MongoDB 2.4 highlights]

From Mike Friedman’s Roadmap slidedeck.

Original title and link: MongoDB 2.4 Highlights (NoSQL database©myNoSQL)


MongoDB Is Still Broken by Design 5-0

My score after the first period was 4-1. But Emin Gün Sirer contested the 1 in the follow up post to 10gen’s reply:

Until recently, MongoDB did not talk about requestStart() and requestDone() in any context except when talking about how to ensure a very weak consistency requirement. Namely, if you don’t use this pair of operations, then a write to the database followed by a read from the database, by the same client, can return old values. So, I write 42 for key k with a WriteConcern.SAFE, read key k, and get some other number, because the Mongo driver can, by default, very well send the first request to one node over one connection, and the second one to another, over another connection. So requestStart() and requestDone() were billed as a mechanism to avoid that scenario; I saw no mention that they were required for correctness in multithreaded settings. I bet there is plenty of multithreaded code that does not follow that pattern. Such code is broken; if you’re a Mongo user, it’d be a good idea to check if you ever use getLastError without a bracketing requestStart() and Done().

5-0.

Original title and link: MongoDB Is Still Broken by Design 5-0 (NoSQL database©myNoSQL)

via: http://hackingdistributed.com/2013/02/07/10gen-response/#id2


10gen: MongoDB’s Fault Tolerance Is Not Broken…

Sitting comfortably? Check. Popcorn? Check. Let’s press play.

In an interview for InfoQ, 10gen’s Jared Rosoff replies to Emin Gün Sirer: “Broken by Design: MongoDB Fault Tolerance”.

  1. MongoDB lies when it says that a write has succeeded

    JR: “[…] Today the default behavior of official MongoDB drivers is Receipt Acknowledged, which means that you wait until the server has processed your write before returning to the client.”

    The key word here is today. As clearly explained by Sirer, the behavior he described was in all versions of MongoDB and all the drivers until the 2.2 release and the drivers update in Nov.2012.

    Basically the “fire-and-forget” behavior (nb: even this description is not accurate; a better one would be “load-and-forget”) has been the default for almost 3 years. With the 2.2 release and the corresponding drivers update it was changed to “Receipt acknowledged”. But the new default only acknowledges that the data was received by the server, not that it was written anywhere. If you want your data to exist on multiple machines or to be on disk, you need to use different settings.

  2. Using getLastError slows down write operations.

    JR: “GetLastError is underlying command in the MongoDB protocol that is used to implement write concerns. Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.”

    The problem here is not that the operation slows down because it has to wait for the acknowledgement. The real problem is that getLastError requires an extra network roundtrip. Not to mention that your old code was probably polluted by all these extra calls.

  3. getLastError doesn’t work pipelined

    JR: “[…] For many bulk loads, performing multiple inserts with periodic checks of getLastError is the right choice. […]”

    Read the above.

  4. getLastError doesn’t work multi-threaded

    JR: “Threads do not see getLastError responses for other thread’s operations. MongoDB’s getLastError command applies to the previous operation on a connection to the database, and not simply whatever random operation was performed last. […]”

    If I’m reading this correctly, it seems like Sirer’s hypothesis was that connections can be shared across threads. I have to agree that many drivers do not provide thread-safe connections. So, linking getLastError behavior to the current connection seems OK.

  5. Write Concerns are broken

    JR: “As described in the above sections, WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.”

    Let’s look at what Sirer wrote:

    […] one could use WriteConcern.SAFE, FSYNC_SAFE or REPLICAS_SAFE for the insert operation [2]. There are 13 different concern levels, 8 of which seem to be distinct and presumably the remaining 5 are just kind of there in case you mistype one of the other ones. WriteConcern is at least well-named: it corresponds to “how concerned would you be if we lost your data?” and the potential answers are “not at all!”, “be my guest”, and “well, look like you made an effort, but it’s ok if you drop it.” Specifically, that’s three different kinds of SAFE, but none of them give you what you want: (1) SAFE means acknowledged by one replica but not written to disk, so a node failure can obliterate that data, (2) FSYNC_SAFE means written to a single disk, so a single disk crash can obliterate that data, and (3) REPLICAS_SAFE means it has been written to two replicas, but there is no guarantee that you will be able to retrieve it later.

    If you want a different explanation: even if there are 13 different WriteConcern types available, there is none that offers the option of having the data written to disk on more replicas.
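To make the point about the three SAFE levels concrete, here is a minimal Python sketch. It assumes, based only on the descriptions quoted above, that the named levels translate to the `w` and `fsync` options of the `getLastError` command; the `NAMED_LEVELS` table and both helper functions are hypothetical names made up for illustration, not part of any driver.

```python
# Assumed mapping from the three named levels quoted above to
# getLastError options (an illustration, not a driver's source code).
NAMED_LEVELS = {
    "SAFE": {"w": 1},                       # acknowledged by one member
    "FSYNC_SAFE": {"w": 1, "fsync": True},  # one member, synced to disk
    "REPLICAS_SAFE": {"w": 2},              # acknowledged by two members
}


def get_last_error_command(level):
    """Build the command document that would follow a write at this level."""
    command = {"getLastError": 1}
    command.update(NAMED_LEVELS[level])
    return command


def named_levels_with_disk_and_replicas():
    """Return the named levels that wait for 2+ members AND a disk sync."""
    return [
        name
        for name, opts in NAMED_LEVELS.items()
        if opts.get("w", 0) >= 2 and (opts.get("fsync") or opts.get("j"))
    ]
```

Under these assumptions, `named_levels_with_disk_and_replicas()` returns an empty list, which is the gap the paragraph above describes: none of the three named levels combines replication with a disk sync.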

The period is over and in my mind the score is clearly 4-1. But I know that this is not a real game and there will be some concluding that I’m not getting it. I’m OK living with that though.

Original title and link: 10gen: MongoDB’s Fault Tolerance Is Not Broken… (NoSQL database©myNoSQL)


When Data Is Worthless - Give MongoDB What Is MongoDB's

Emin Gün Sirer concluding a follow up post to MongoDB fault tolerance is broken by design:

So let us give onto Mongo what is clearly its: it’s mature software with a large install base. If it loses data, it’ll likely do so because of a deliberate design decision, rather than a bug. It’s easy to find Mongo hosting and it’s relatively easy to find people who are experienced with it. So if all your data is really of equal and low value, and you can afford to lose some of it, and your app’s needs are unlikely to grow, then MongoDB can be a fine pick for your application.

Looks like we mostly agree.

Original title and link: When Data Is Worthless - Give MongoDB What Is MongoDB’s (NoSQL database©myNoSQL)

via: http://hackingdistributed.com/2013/02/03/when-data-is-worthless/


MongoDB Fault Tolerance - Broken by Design

Emin Gün Sirer:

So, MongoDB is broken, and not just superficially so; it’s broken by design. If you’re relying on its fault tolerance, you’re probably using it wrong.

He’s not the first to write about some of the critical issues in MongoDB. Every once in a while, there has been at least one post detailing one of these. But none of them stopped MongoDB’s adoption. Why? My only explanations are that:

  1. these posts are not reaching enough people;
  2. MongoDB is used mostly for simple scenarios where the occurrence of such errors can be ignored;
  3. those using MongoDB for more complicated scenarios have developed internal, application-specific workarounds.

Original title and link: MongoDB Fault Tolerance - Broken by Design (NoSQL database©myNoSQL)

via: http://hackingdistributed.com/2013/01/29/mongo-ft/


MongoDB Sequence Number Generators Using findAndModify and Spring Data

Yuan Ji:

I want a sequence number generator from MongoDB to give me unique sequence number. The operations are to return current sequence number and also increase the sequence number in the database. In MongoDB, command findAndModify atomically modifies and returns a single document. So we can use this command to query the sequence number and increase it by $inc function.

The solution is interesting and the post includes the Spring-ified code for it. But there’s no word about the issues this solution could run into from contention on updates to the same record. Even with row-level locking, things would be pretty bad for an app needing lots of sequence numbers. With MongoDB’s database-level locking, things could get pretty ugly.
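The contention concern can be illustrated without a database. The following Python sketch is an in-memory stand-in, not MongoDB and not the code from the linked post: `InMemoryCounters`, `find_and_increment`, and `run_workers` are hypothetical names, and a single `threading.Lock` plays the role of the server-side lock that every increment has to pass through.

```python
import threading


class InMemoryCounters:
    """Stand-in for a 'counters' collection (hypothetical, not a driver API)."""

    def __init__(self):
        self._docs = {}                # counter name -> current value
        self._lock = threading.Lock()  # plays the role of the server-side lock

    def find_and_increment(self, name, step=1):
        # Atomically bump the counter and return the new value, in the
        # spirit of findAndModify with an $inc update and upsert.
        with self._lock:
            value = self._docs.get(name, 0) + step
            self._docs[name] = value
            return value


def draw_numbers(counters, name, count, out):
    # Each worker asks for `count` sequence numbers and records them.
    for _ in range(count):
        out.append(counters.find_and_increment(name))


def run_workers(workers=8, per_worker=1000):
    counters = InMemoryCounters()
    results = [[] for _ in range(workers)]
    threads = [
        threading.Thread(
            target=draw_numbers,
            args=(counters, "invoice", per_worker, results[i]),
        )
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Flatten the per-worker lists into one list of issued numbers.
    return [n for chunk in results for n in chunk]
```

The numbers come out unique and gap-free, but only because every caller waits on the same lock: adding workers adds waiting, not throughput. With a real deployment, the PyMongo call that would play this role is `Collection.find_one_and_update` with an `$inc` update and `upsert=True`; that part is not exercised here.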

Original title and link: MongoDB Sequence Number Generators Using findAndModify and Spring Data (NoSQL database©myNoSQL)

via: http://www.jiwhiz.com/post/2013/1/Add_A_Counter_To_MongoDB_With_Spring_Data


MongoDB Is Abusing JSON With Its Query Language

SM Sohan expresses his discontent with MongoDB’s JSON-based query representation:

I find the MongoDB API is abusing JSON in a really bad way. JSON is probably a good format for storing the documents in MongoDB, but using JSON for it’s weird API is simply a terrible idea.

Using JSON (nb: BSON) for representing queries works well for basic equality matching. But I agree that for more advanced queries it looks quite forced. While definitely not perfect, I prefer (biasedly) RethinkDB’s API. On the other hand, I’m not aware of any proposals to make MongoDB’s query language better, nor of any willingness to change it.

Update: The (always) entertaining HN thread.

Original title and link: MongoDB Is Abusing JSON With Its Query Language (NoSQL database©myNoSQL)

via: http://smsohan.com/blog/2013/01/17/abusing-json/


Social Network Analysis of Apache CloudStack

Nice data experiment run by Sebastien Goasguen against the CloudStack mailing list:

To get the graphs I grabbed the emails archive from Apache. I used Python to load the mbox files into single Mongo collections. I cleaned the data to avoid replications of senders as well as remove JIRA and Review Board entries. Then with a little bit of PyMongo I made the queries and build the graph with NetworkX. Finished up with the graph visualization and calculations using Gephi. Since there are thousands of emails and threads, there is still some work to pre-process the data, avoid duplicates and match individuals to multiple email addresses.

[Figure: csusers]

Three questions:

  1. would using a graph database have made this experiment easier?
  2. would Linkurious be able to generate these graphics?
  3. is the code available anywhere so someone else could try to use a graph database and maybe run other types of visualizations?
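Since the original code is not linked, here is a rough Python sketch of the core step described in the quote: turning mailing list messages into "who replied to whom" edges, with sender addresses normalized to reduce duplicates. It uses only the standard library; the function names are hypothetical, and matching replies through the `Message-ID` and `In-Reply-To` headers is an assumption about how threads could be linked, not a description of what the author did.

```python
from collections import Counter
from email.message import Message
from email.utils import parseaddr


def sender_of(msg):
    # Normalize the From header to a lowercase address so that
    # "Alice <a@x.org>" and "A@X.org" count as one sender.
    return parseaddr(msg.get("From", ""))[1].lower()


def reply_edges(messages):
    """Count (replier, original author) pairs across a set of messages."""
    messages = list(messages)

    # First pass: remember who wrote each message.
    author_by_id = {}
    for msg in messages:
        msg_id = msg.get("Message-ID")
        if msg_id:
            author_by_id[msg_id.strip()] = sender_of(msg)

    # Second pass: link each reply to the author of its parent message.
    edges = Counter()
    for msg in messages:
        parent = (msg.get("In-Reply-To") or "").strip()
        replier = sender_of(msg)
        author = author_by_id.get(parent)
        if replier and author and replier != author:
            edges[(replier, author)] += 1
    return edges


def make_message(sender, msg_id, in_reply_to=None):
    # Small helper for building messages in memory (for experiments).
    msg = Message()
    msg["From"] = sender
    msg["Message-ID"] = msg_id
    if in_reply_to:
        msg["In-Reply-To"] = in_reply_to
    return msg
```

For a real archive, the messages could come from iterating over `mailbox.mbox(path)` from the standard library, and the weighted edges could then be loaded into NetworkX or a graph database for the visualization step.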

Original title and link: Social Network Analysis of Apache CloudStack (NoSQL database©myNoSQL)

via: http://sebgoa.blogspot.ch/2013/01/social-network-analysis-of-apache.html


What Is MongoDB's Architecture Based On?

Eric Knorr interviewed Dwight Merriman for InfoWorld: “10gen CEO: Why we’re the NoSQL leader”.

Q: You created the first widely successful NoSQL document database. What did you base that architecture on?

A: It wasn’t based on anything in particular; it’s a career’s worth of learning what works and what doesn’t. We were looking at cloud computing — at needs for horizontal scalability and how we wanted to write code — and we couldn’t find tools that did what we wanted.

Recently, the new MongoDB hosting company ObjectRocket launched with the message: “The Cloud Is Broken for MongoDB”. Was it simply bad PR, or did someone not get the history of MongoDB right?

Original title and link: What Is MongoDB’s Architecture Based On? (NoSQL database©myNoSQL)

via: http://www.infoworld.com/print/211201


11 Interesting Releases From the First Weeks of January

The list of releases I wanted to post about has been growing fast these last couple of weeks, so instead of waiting any longer, here it is (in no particular order [1]):

  1. (Jan.2nd) Cassandra 1.2 — announcement on DataStax’s blog. I’m currently learning and working on a post looking at what’s new in Cassandra 1.2.
  2. (Jan.10th) Apache Pig 0.10.1 — Hortonworks wrote about it
  3. (Jan.10th) DataStax Community Edition 1.2 and OpsCenter 2.1.3 — DataStax announcement
  4. (Jan.10th) CouchDB 1.0.4, 1.1.2, and 1.2.1 — releases fixing some security vulnerabilities
  5. (Jan.11th) MongoDB 2.3.2 unstable — announcement. This dev release includes support for full text indexing. For more details you can check:

    […] an open source project extending Hadoop and Hive with a collection of useful user-defined-functions. Its aim is to make the Hive Big Data developer more productive, and to enable scalable and robust dataflows.


  [1] I’ve tried to order it chronologically, but most probably I’ve failed.

Original title and link: 11 Interesting Releases From the First Weeks of January (NoSQL database©myNoSQL)


The Cloud Is Broken for MongoDB

The cloud is broken. It’s not designed to properly run persistent data stores like MongoDB. ObjectRocket is designed from the ground up to fix this problem.

Any snarky comments fit perfectly.

James Watters

Original title and link: The Cloud Is Broken for MongoDB (NoSQL database©myNoSQL)

via: http://www.objectrocket.com/details


How to Monitor MongoDB

A post by Pandora FMS team about monitoring options for MongoDB:

If MongoDB goes wrong, all your apps will fail. So monitoring the main variables and configuration parameters of your database is the best option to make sure that your values are right and your users are happy.

The post talks about the tools that come with MongoDB (mongotop, mongostat, the stats available through the MongoDB shell, logfiles, etc.), but also introduces their PandoraFMS library. There’s no word about 10gen’s hosted MongoDB Monitoring Service, nor other MongoDB utilities for monitoring or the latest MongoMem collection memory usage library.

As for which stats are the most interesting, Simon Maynard’s 5 things to Monitor in MongoDB and these other 3 metrics should be a good start.

Original title and link: How to Monitor MongoDB (NoSQL database©myNoSQL)

via: http://blog.pandorafms.org/?p=821&buffer_share=0406c