NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



The Evolution of Google File System (GFS)

Firstly it was the Google File System described in this 2003 paper (PDF):

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

This conversation between Marshall Kirk McKusick and Sean Quilan dives into some of the pluses of GFS, but also what led to the next version of Google File System:

Because GFS was designed initially to enable a crawling and indexing system, throughput was everything. […] While these instances—where you have to provide for failover and error recovery—may have been acceptable in the batch situation, they’re definitely not OK from a latency point of view for a user-facing application. Another issue here is that there are places in the design where we’ve tried to optimize for throughput by dumping thousands of operations into a queue and then just processing through them. That leads to fine throughput, but it’s not great for latency. You can easily get into situations where you might be stuck for seconds at a time in a queue just waiting to get to the head of the queue.

In this ZDNet interview, Urs Hölze confirms the migration to the new version of the filesystem:

[…] most applications don’t use [Google File System (GFS)] today. In fact, we’re phasing out GFS in favour of the next-generation file system that is very similar, but it’s not GFS anymore. It scales better and has better latency properties as well.

And according to the same interview, Google is already looking into building another version of its distributed filesystem that would take advatange of flash memory:

I think three years from now we’ll try to retire that because flash memory is coming and faster networks and faster CPUs are on the way and that will change how we want to do things.

Original title and link: The Evolution of Google File System (GFS) (NoSQL database©myNoSQL)