The Evolution of Google File System (GFS)
Firstly it was the Google File System described in this 2003 paper (PDF):
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
This conversation between Marshall Kirk McKusick and Sean Quilan dives into some of the pluses of GFS, but also what led to the next version of Google File System:
Because GFS was designed initially to enable a crawling and indexing system, throughput was everything. […] While these instances—where you have to provide for failover and error recovery—may have been acceptable in the batch situation, they’re definitely not OK from a latency point of view for a user-facing application. Another issue here is that there are places in the design where we’ve tried to optimize for throughput by dumping thousands of operations into a queue and then just processing through them. That leads to fine throughput, but it’s not great for latency. You can easily get into situations where you might be stuck for seconds at a time in a queue just waiting to get to the head of the queue.
In this ZDNet interview, Urs Hölze confirms the migration to the new version of the filesystem:
[…] most applications don’t use [Google File System (GFS)] today. In fact, we’re phasing out GFS in favour of the next-generation file system that is very similar, but it’s not GFS anymore. It scales better and has better latency properties as well.
And according to the same interview, Google is already looking into building another version of its distributed filesystem that would take advatange of flash memory:
I think three years from now we’ll try to retire that because flash memory is coming and faster networks and faster CPUs are on the way and that will change how we want to do things.
Original title and link: The Evolution of Google File System (GFS) (NoSQL database©myNoSQL)