In a previous post, I argued that data modeling will remain an “art” whether or not we are talking about NoSQL systems. Recently I’ve noticed a couple of posts that have resurfaced this idea in the context of document databases and parent-child models.
Both CouchDB and MongoDB spend some time in their documentation explaining the different approaches for mapping one-to-many and many-to-many relationships, along with some of the pros and cons of each.
Unfortunately, there are tons of posts out there that show just one of the possible solutions and neither detail its pros and cons nor at least ask the reader to investigate the topic further. One of the most used examples is representing child collections as IDs in the parent entity. Another is representing child entities as an embedded collection on the parent. What I couldn’t find in such posts was a discussion of the trade-offs. For example, IDs on the parent lead to the well-known N+1 access issue (one read for the parent plus one for each child), while embedded collections can lead to larger documents that are more expensive to manipulate, or to child entities that are unreachable on their own, and so on.
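To make the trade-off a bit more concrete, here is a minimal sketch in Python. It does not use a real CouchDB or MongoDB client: `CountingStore` is a hypothetical in-memory stand-in that only counts reads, and the `post`/`comment` documents and the helper names are made up for illustration.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a document store; counts reads."""

    def __init__(self):
        self.docs = {}
        self.reads = 0

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc

    def get(self, doc_id):
        self.reads += 1  # every lookup is treated as one round trip
        return self.docs[doc_id]


def load_post_by_reference(store, post_id):
    """Parent holds child IDs: 1 read for the parent + N reads for children."""
    post = store.get(post_id)
    comments = [store.get(cid) for cid in post["comment_ids"]]
    return post, comments


def load_post_embedded(store, post_id):
    """Parent embeds its children: a single read returns everything."""
    post = store.get(post_id)
    return post, post["comments"]


def build_stores(n_comments):
    """Store the same post twice, once per modeling approach."""
    referenced = CountingStore()
    comment_ids = []
    for i in range(n_comments):
        cid = f"comment:{i}"
        referenced.put(cid, {"text": f"comment {i}"})
        comment_ids.append(cid)
    referenced.put("post:1", {"title": "Hello", "comment_ids": comment_ids})

    embedded = CountingStore()
    embedded.put(
        "post:1",
        {
            "title": "Hello",
            "comments": [{"text": f"comment {i}"} for i in range(n_comments)],
        },
    )
    return referenced, embedded


if __name__ == "__main__":
    referenced, embedded = build_stores(5)
    load_post_by_reference(referenced, "post:1")
    load_post_embedded(embedded, "post:1")
    print("reads with IDs on the parent:", referenced.reads)
    print("reads with embedded children:", embedded.reads)
```

The read counts are only half of the story. In the embedded version a comment has no ID of its own, so it can only be reached through its post, and the post document grows with every comment added. Which side of the trade-off hurts less depends on how your application actually reads and writes the data.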
So, my advice is that before you start dumping your data into your favorite document store:
- spend some extended time understanding how to model your data and relationships with your storage solution
- think a lot about what data access patterns will be needed in your application
- don’t just trust every “look ma’, this solution is so cool” post. Dig into the topic a bit more.
Otherwise you might just end up with a cool NoSQL system performing rather badly because you have (mis)modeled your data.