One CouchDB trick that is probably not so well known:
Now, if the data inserted into the B-tree (in this case, the DocIDs) is random, the tree fans out quickly. Since the minimum fill rate is 1/2 for every internal node, and random data spreads evenly across the tree, most nodes end up only about half full, which generates more internal nodes than an orderly fill would need.
The B-tree is why using (semi-)sequential IDs is a real lifesaver. The new model causes the database to be filled in an orderly fashion, and the buckets (i.e. leaves) are filled up instead of being left half full. The best part is that the IDs auto-generated by CouchDB (which were not an option for us) already use the sequential ID scheme, so if you use those IDs you don’t really need to worry about a thing.
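As a rough illustration of what a (semi-)sequential ID scheme could look like when you have to generate IDs yourself, here is a minimal Python sketch. The function name `make_semi_sequential_id` and the layout (a millisecond timestamp prefix followed by a random suffix) are assumptions made for illustration only; this is neither CouchDB's own ID algorithm nor necessarily the scheme the quoted post ended up using.

```python
import os
import time


def make_semi_sequential_id(now_ms=None):
    """Return a 32-character hex ID that sorts roughly by creation time.

    Hypothetical layout (an assumption, not CouchDB's algorithm):
      - 12 hex chars: millisecond timestamp, zero-padded, so that
        lexicographic order matches chronological order
      - 20 hex chars: random bytes, so IDs created in the same
        millisecond are very unlikely to collide
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Keep the prefix at exactly 48 bits (12 hex characters).
    prefix = format(now_ms & 0xFFFFFFFFFFFF, "012x")
    suffix = os.urandom(10).hex()
    return prefix + suffix


if __name__ == "__main__":
    # IDs generated later start with a larger (or equal) prefix.
    for _ in range(3):
        print(make_semi_sequential_id())
```

With a scheme like this, documents created close together in time get IDs that sit next to each other in sort order, so new inserts land at the right-hand edge of the tree instead of being scattered across it. IDs created within the same millisecond are still ordered randomly among themselves, hence "semi"-sequential.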
So remember, kids: if you cram loads of data into your CouchDB, select your document ID scheme carefully!
The only thing that is missing is a pointer to what makes a good ID for CouchDB.
Original title and link: The impact of document IDs on performance of CouchDB (NoSQL databases © myNoSQL)