Jan Aerts, a genetics researcher at Cambridge, is using MongoDB to run experiments on data from the 1000genomes project. I am not sure the motivation that brought Jan to MongoDB is the best one, but my purpose is not to discourage NoSQL experiments, only to report them:
However we would end up with a lot of NULLs in that table. […] This is where you can start thinking of using a document-oriented database for storing these SNP data: each document will be tailored to a specific SNP and will e.g. not refer to the JPTCHB population if it is not present in that population. Enter mongodb.
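To make the NULL argument concrete, here is a minimal sketch in plain Python (no database involved) that lays out the same toy SNP records two ways: as relational rows with one column per population, and as per-SNP documents that only mention the populations where the SNP was observed. Everything here is hypothetical: the field names (`_id`, `populations`), the CEU and YRI population codes next to the JPTCHB one from the quote, and the frequency values, which are made up for illustration only and are not Jan's schema or data.

```python
# Hypothetical population columns; JPTCHB comes from the quoted post,
# CEU and YRI are assumed here for illustration.
POPULATIONS = ["CEU", "YRI", "JPTCHB"]

# Made-up toy records: SNP id -> {population: allele frequency}.
# Only populations in which the SNP was observed are listed.
observations = {
    "snp1": {"CEU": 0.12, "YRI": 0.40, "JPTCHB": 0.05},
    "snp2": {"CEU": 0.33},
    "snp3": {"YRI": 0.21, "JPTCHB": 0.18},
}


def to_table_rows(obs):
    """Relational layout: one column per population, None (NULL) when absent."""
    return [
        [snp_id] + [freqs.get(pop) for pop in POPULATIONS]
        for snp_id, freqs in sorted(obs.items())
    ]


def to_documents(obs):
    """Document layout: each document only mentions populations it has."""
    return [
        {"_id": snp_id, "populations": dict(freqs)}
        for snp_id, freqs in sorted(obs.items())
    ]


rows = to_table_rows(observations)
docs = to_documents(observations)

# Count the NULL cells the wide table needs (skip the id column).
null_count = sum(cell is None for row in rows for cell in row[1:])
print("NULL cells in table layout:", null_count)
print(docs[1])
```

With three SNPs the table already needs three NULL cells, while the `snp2` document simply has no JPTCHB key; with thousands of populations and millions of SNPs that difference is what pushes toward a document store.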
The post includes code for loading the data into MongoDB and for using MapReduce to extract results. Some additional notes from the post:
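For readers new to MapReduce, the following sketch mimics the map/emit/reduce pattern in-process in Python for a hypothetical task: counting how many SNPs are present in each population. It is not Jan's script and not MongoDB's API; MongoDB's `mapReduce` command runs map and reduce as JavaScript functions inside the database. The document shape and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical SNP documents: only observed populations appear.
docs = [
    {"_id": "snp1", "populations": {"CEU": 0.12, "YRI": 0.40, "JPTCHB": 0.05}},
    {"_id": "snp2", "populations": {"CEU": 0.33}},
    {"_id": "snp3", "populations": {"YRI": 0.21}},
]


def map_snp(doc):
    """Map: emit (population, 1) for every population present in the SNP."""
    for pop in doc["populations"]:
        yield pop, 1


def reduce_counts(key, values):
    """Reduce: sum counts; the result can be fed back in as a partial value."""
    return sum(values)


def map_reduce(documents, mapper, reducer):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


counts = map_reduce(docs, map_snp, reduce_counts)
print(counts)
```

The reduce step returns the same kind of value it receives, because MongoDB may call reduce several times on partial results for the same key; a plain sum satisfies that.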
This script (nb the MapReduce) takes 50 minutes to run using a mongo database on my MacBook laptop.
He also points to the excellent MongoDB aggregation tutorial.