MongoDB: The Size of the Document and Why it Matters

Kyle Banker (@Hwaet) explains some of the possible implications of using very large documents in MongoDB:

  1. If you’re doing a full-document, replace-style update, the entire document (say, 500k) needs to be serialized and sent across the wire. This could get expensive on an update-heavy deployment.
  2. Same goes for queries. If you’re pulling back 500k at a time, that has to go across the network and be deserialized on the driver side.
  3. While most atomic updates happen in place, the document usually has to be rewritten in place on the server, as this is dictated by the BSON format. If you’re doing lots of $push operations on a very large document, that document will have to be rewritten server-side, which, again, on a heavy deployment, could get expensive.
  4. If an inner-document is frequently manipulated on its own, it can be less computationally expensive both client-side and server-side simply to store that “many” relationship in its own collection. It’s also frequently easier to manipulate the “many” side of a relationship when it’s in its own collection.
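
To get a rough feel for the first point, here is a minimal sketch that compares how many bytes a client would have to serialize for a replace-style update versus a targeted $push. It is an illustration under stated assumptions, not a benchmark: compact JSON stands in for BSON, and the post document, its fields, and the payload_size helper are all hypothetical.

```python
import json


def payload_size(obj) -> int:
    """Bytes needed to serialize obj as compact JSON (a stand-in for BSON)."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Hypothetical large document: a post with many embedded comments.
post = {
    "_id": 1,
    "title": "Large documents",
    "comments": [{"author": f"user{i}", "text": "x" * 100} for i in range(4000)],
}

new_comment = {"author": "user4000", "text": "y" * 100}

# Replace-style update: the client sends the whole modified document.
replacement = dict(post, comments=post["comments"] + [new_comment])
full_replace_bytes = payload_size(replacement)

# Targeted update: the client sends only a filter and a $push operator.
push_update = [{"_id": 1}, {"$push": {"comments": new_comment}}]
targeted_bytes = payload_size(push_update)

print("replace-style payload:", full_replace_bytes, "bytes")
print("targeted $push payload:", targeted_bytes, "bytes")
```

The targeted form only shrinks what crosses the wire. As the third point notes, the server may still have to rewrite the large document when the $push is applied.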

If going embedded all the way works for your use case, then there’s probably no problem with it. But with these extra-large documents, and a heavy load, you may start to see consequences in terms of performance and/or manipulability.
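
The trade-off between embedding and giving the “many” side its own collection can be sketched with plain Python dictionaries. This is only an assumed data layout for illustration: the field names (post_id, comments), the document counts, and the doc_size helper are hypothetical, and JSON again stands in for BSON.

```python
import json


def doc_size(doc) -> int:
    """Approximate serialized size of a document, using JSON as a stand-in for BSON."""
    return len(json.dumps(doc).encode("utf-8"))


# Embedded layout: one document carries the whole "many" side.
embedded_post = {
    "_id": 1,
    "title": "Large documents",
    "comments": [{"author": f"user{i}", "text": "x" * 100} for i in range(1000)],
}

# Referenced layout: each comment is its own small document that
# points back to the post through a hypothetical post_id field.
post = {"_id": 1, "title": "Large documents"}
comments = [
    {"_id": i, "post_id": 1, "author": f"user{i}", "text": "x" * 100}
    for i in range(1000)
]

# Fetching or rewriting a single comment involves the whole post in the
# embedded layout, but only one small document in the referenced layout.
embedded_unit = doc_size(embedded_post)
referenced_unit = doc_size(comments[0])

print("embedded unit of work:", embedded_unit, "bytes")
print("referenced unit of work:", referenced_unit, "bytes")
```

The referenced layout costs an extra query to assemble a post with its comments, so it pays off mainly when the inner documents are frequently manipulated on their own.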

I’d say that these probably apply to most of the document databases out there.