In a reply to Cloudera’s defense of HDFS, Jeff Darcy comments about the portability of HDFS:
This is also not an HDFS exclusive. Any of the alternatives that were developed outside the Hadoopiverse have this quality as well. If you have data in Cassandra or Ceph you can keep it in Cassandra or Ceph as you go Hadoop-distro shopping. The biggest data-portability wall here is HDFS’s, because it’s one of only two such systems (the other being MapR) that’s Hadoop-specific. It doesn’t even try to be a general-purpose filesystem or database. A tremendous amount of work has gone into several excellent tools to import data into HDFS, but that work wouldn’t even be necessary with some of the alternatives. That’s not just a waste of machine cycles; it’s also a waste of engineer cycles. If they hadn’t been stuck in the computer equivalent of shipping and receiving, the engineers who developed those tools might have created something even more awesome. I know some of them, and they’re certainly capable of it. Each application can write the data it generates using some set of interfaces. If HDFS isn’t one of those, or if HDFS through that interface is unbearably slow because the HDFS folks treat anything other than their own special snowflake as second class, then you’ll be the one copying massive amounts of data before you can analyze it … not just once, but every time.
Original title and link: Attacking HDFS’s Defense: Why Does Cloudera *Really* Use HDFS? ( ©myNoSQL)