ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

MongoDB Replica Sets and Sharding for GridFS as a Distributed File System

Contrary to many MongoDB deployments, we primarily use it for storing files in GridFS. We switched over to MongoDB after searching for a good distributed file system for years. Prior to MongoDB we used a regular NFS share, sitting on top of a HAST-device. That worked great, but it didn’t allow us to scale horizontally the way a distributed file system allows.

No doubt GridFS is a useful feature of MongoDB, but I’m pretty sure the experts in distributed file systems have better solutions for this—I just hope they’ll share it with us.

Update: Jeff Darcy1:

Yes, we do have better solutions for this particular kind of use case.  So do object/blob stores like Swift.  

Honestly, I don’t think the “searching for a good distributed filesystem” part is even credible. How can someone be that bad at finding readily available information?  For example, it’s easier to set up sharding and replication with GlusterFS than with MongoDB and GridFS, plus you’ll get striping and RDMA and generally better performance for this type of workload.  On top of all that, you won’t need to use special libraries to interface with it because it’s a regular POSIX filesystem.  Lastly, it’s not like there hasn’t been a lot of press about it.  Even considering their obvious FreeBSD bias and the fact that FreeBSD is weak in this area, the second i tem for “FreeBSD distributed filesystem” points to GlusterFS.  If they didn’t find it, they just didn’t look very hard before they reached for the New Shiny.  

It’s not just GlusterFS, either.  MogileFS might not be a real filesystem but it’s user space so it would probably run just fine in their environment - as would the aforementioned Swift.  I have more of a problem with the anti-Mongo haters than with Mongo itself, it’s wonderful that these guys found a Mongo-based solution that works for them, but it seems like a bit of an odd choice nonetheless.


  1. Jeff Darcy is a member of the gluster.org advisory board, and works on GlusterFS full time at Red Hat. He’s also the person I direct all my questions related to distributed file systems (and not only). 

Original title and link: MongoDB Replica Sets and Sharding for GridFS as a Distributed File System (NoSQL database©myNoSQL)

via: http://viktorpetersson.com/2012/01/29/notes-on-mongodb-gridfs-and-sharding-in-the-cloud/