


Document-oriented Databases and Normalization

Curt Monash ☞ in a recent post (nb: reformatted for better readability):

When normalization is good and denormalization is bad, one or both of two reasons are commonly in play:

  • The logical burden of keeping straight all the different places you’d have to update the same data is too great for the poor, overburdened programmers.
    • For the logical reason to have great force, there has to be a pretty complex schema, or else a frequently changing one. But when schemas change frequently, relational designs have their own problems.
  • The performance burden of doing all that updating is too great for the poor, overburdened hardware.
    • The physical reason automatically has great force if you have huge update volumes and keep many copies of the same data. Otherwise, its strength has a lot to do with the specific architecture of the DBMS. E.g., if it’s a lot cheaper to update a small record than a big one, short rows are better. But otherwise, denormalization may not have that much effect on performance.


That’s just a different way of saying that normalization is one possible solution for dealing with data redundancy. Whether you actually need it depends on a lot of factors: data volumes, update volumes, and access patterns.
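To make the trade-off concrete, here is a minimal sketch using plain Python dictionaries in place of a real document store. The data, field names, and helper functions are all hypothetical and only illustrate the idea: with a normalized layout, an update touches one record but a read needs an extra lookup; with a denormalized (embedded) layout, a read is self-contained but an update has to touch every copy.

```python
# Hypothetical in-memory data; all names here are illustrative only.

# Normalized: the author's name lives in one place, posts reference it by id.
authors = {1: {"name": "Ann"}}
posts_normalized = [
    {"id": 10, "title": "Intro", "author_id": 1},
    {"id": 11, "title": "Follow-up", "author_id": 1},
]

# Denormalized (document-style): each post embeds its own copy of the author.
posts_denormalized = [
    {"id": 10, "title": "Intro", "author": {"id": 1, "name": "Ann"}},
    {"id": 11, "title": "Follow-up", "author": {"id": 1, "name": "Ann"}},
]


def rename_normalized(authors, author_id, new_name):
    """One write, no matter how many posts the author has."""
    authors[author_id]["name"] = new_name
    return 1  # number of records written


def rename_denormalized(posts, author_id, new_name):
    """One write per embedded copy; returns the number of documents touched."""
    writes = 0
    for post in posts:
        if post["author"]["id"] == author_id:
            post["author"]["name"] = new_name
            writes += 1
    return writes


def read_post_normalized(post, authors):
    """Reading needs a second lookup (the equivalent of a join)."""
    return post["title"], authors[post["author_id"]]["name"]


def read_post_denormalized(post):
    """Reading needs only the document itself."""
    return post["title"], post["author"]["name"]
```

With two posts the difference is trivial. It only starts to matter when the number of copies and the update rate are both large, which is the physical reason given in the quote above.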

Pat Helland’s ☞ Normalization is for Sissies talks about the pros and cons of denormalization. This ☞ Lambda the Ultimate article also pointed me to a paper, ☞ Why Normalization failed to become the ultimate guide for data designers?, whose abstract emphasizes real-life normalization issues:

With an impressive theoretical foundation, normalization was supposed to bring rigor and relevance into such a slippery domain as database design is. Almost every database textbook treats normalization in a certain extent, usually suggesting that the topic is so clear and consolidated that it does not deserve deeper discussions. But the reality is completely different. After more than three decades, normalization not only has lost much of its interest in the research papers, but also is still looking for practitioners to apply it effectively. Despite the vast amount of database literature, comprehensive books illustrating the application of normalization to effective real-world applications are still waited. This paper reflects the point of view of an Information Systems academic who incidentally has been for almost twenty years a practitioner in developing database applications. It outlines the main weaknesses of normalization and offers some explanations about the failure of a generous framework in becoming the so much needed universal guide for database designers. Practitioners might be interested in finding out (or confirming) some of the normalization misformulations, misinterpretations, inconsistencies and fallacies. Theorists could find useful the presentation of some issues where the normalization theory was proved to be inadequate, not relevant, or source of confusion.


So I guess normalization will never replace good design and a solid understanding of your data and its access patterns.

Original title and link: Document-oriented Databases and Normalization (NoSQL databases © myNoSQL)