hdfs: All content tagged as hdfs in NoSQL databases and polyglot persistence
The post is a bit old, but its data comparing different compression methods is still helpful:
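As a rough illustration of how such a comparison might be run (this is not the data from the original post), here is a minimal Python sketch that measures compression ratio and time for three codecs from the standard library. The function name `compare_codecs` and the sample log line are assumptions made for the example. Codecs often used with Hadoop, such as LZO and Snappy, are not in the standard library and are left out.

```python
import bz2
import lzma
import time
import zlib


def compare_codecs(data):
    """Compress `data` with each codec and record size ratio and elapsed time."""
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),  # DEFLATE, the algorithm behind gzip
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must give back the original bytes.
        assert decompress(packed) == data
        results[name] = {"ratio": len(packed) / len(data), "seconds": elapsed}
    return results


# Hypothetical sample: a repetitive, log-like payload.
sample = b"2012-01-15\tGET\t/index.html\t200\n" * 20000
results = compare_codecs(sample)
for name, stats in results.items():
    print(f"{name}: ratio={stats['ratio']:.4f} time={stats['seconds']:.4f}s")
```

Real numbers depend heavily on the data, so a comparison like this is only meaningful when run against a sample of your own files.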
Original title and link: Comparing File Formats and Compression Methods in HDFS and Hive ( ©myNoSQL)
As I’m slowly recovering from a severe poisoning that I initially ignored but that finally put me in bed for almost a week, I’m going to post some of the most interesting articles I’ve read while resting.
Hadoop Namenode’s single point of failure has always been mentioned as one of the weaknesses of Hadoop and also as a differentiator for other Hadoop-based commercial offerings. But now the Namenode HA branch has been merged into trunk, and while it will take a couple of cycles to complete the tests, this will soon become part of the Hadoop distribution.
Significant enhancements were completed to make HOT Failover work:
- Configuration changes for HA
- Notion of active and standby states were added to the Namenode
- Client-side redirection
- Standby processing journal from Active
- Dual block reports to Active and Standby
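To make the active/standby and client-side redirection ideas in the list above more concrete, here is a minimal simulated sketch in Python. It does not use any real Hadoop API: `FakeNamenode`, `StandbyError`, and `FailoverClient` are hypothetical names invented for this example, and the retry policy is an assumption, not the one implemented in the HA branch.

```python
class StandbyError(Exception):
    """Raised by a (simulated) Namenode that is in standby state."""


class FakeNamenode:
    """Hypothetical stand-in for a Namenode; not a real Hadoop API."""

    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active" or "standby"

    def get_file_info(self, path):
        # Only the active node serves client requests.
        if self.state != "active":
            raise StandbyError(f"{self.name} is in standby state")
        return {"path": path, "served_by": self.name}


class FailoverClient:
    """Tries the last known active Namenode first, then redirects to the others."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)
        self.current = 0  # index of the node we believe is active

    def get_file_info(self, path):
        for _ in range(len(self.namenodes)):
            node = self.namenodes[self.current]
            try:
                return node.get_file_info(path)
            except StandbyError:
                # Redirect: move on to the next configured Namenode.
                self.current = (self.current + 1) % len(self.namenodes)
        raise RuntimeError("no active Namenode found")


nn1 = FakeNamenode("nn1", "active")
nn2 = FakeNamenode("nn2", "standby")
client = FailoverClient([nn1, nn2])

first = client.get_file_info("/data/file")

# Simulate a failover: the two nodes swap roles.
nn1.state, nn2.state = "standby", "active"
second = client.get_file_info("/data/file")

print(first["served_by"], second["served_by"])
```

The client remembers which node answered last, so after a failover only the first request pays the cost of the redirect.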
As argued in a follow-up post to Gartner’s article Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?, the advantage of using custom distributions will slowly vanish and the open source version will be the one you’ll want to have in production.
Original title and link: Hadoop Namenode High Availability Merged to HDFS Trunk ( ©myNoSQL)
Edd Dumbill enumerates the various components of the Hadoop ecosystem:
Original title and link: The components and their functions in the Hadoop ecosystem ( ©myNoSQL)
A picture is worth a thousand words. A comic-like explanation of HDFS is worth some too:
See it in full size. Credit Maneesh Varshney
Original title and link: Hadoop Distributed File System HDFS: A Cartoon Is Worth A ( ©myNoSQL)
Steve Loughran starts with a critical look at the NetApp Open Solution for Hadoop paper:
Actually it is weirder than I first thought. This is still HDFS, just running on more expensive hardware. You get the (current) HDFS limitations: no native filesystem mounting, a namenode to care about, security on a par with NFS, without the cost savings of pure-SATA-no-licensing-fees. Instead you have to use RAID everywhere, which not only bumps up your cost of storage, puts you at risk of RAID controller failure and errors in the OS drivers for those controller (hence their strict rules about which Linux releases to trust). If you do follow their recommendations and rely on hardware for data integrity, you’ve cut down the probability of node-local job execution, so all FUD about replication traffic is now moot as at least 1/3 more of your tasks will be running remote -possibly even with the Fair Scheduler, which waits for a bit to see if a local slot becomes free. What they are doing then is adding some HA hardware underneath a filesystem that is designed to give strong availability out of medium availability hardware. I have seen such a design before, and thought it sucked then too. Information week says this is a response to EMC, but it looks more like NetApp’s strategy to stay relevant, and Cloudera are partnering with them as NetApp offered them money and if it sells into more “enterprise customers” then why not? With the extra hardware costs of NetApp the cloudera licenses will look better value, and clearly both NetApp and their customers are in need of the hand-holding that Cloudera can offer.
Then, in a follow-up post, he looks at a couple of alternatives (Lustre, GPFS, IBRIX, etc.):
I’m not against running MapReduce—or the entire Hadoop stack—against alternate filesystems. There are some good cases where it makes sense. Other filesystems offer security, NFS mounting, the ability to be used by other applications and other features. HDFS is designed to scale well on “commodity” hardware, (where servers containing Xeon E5 series parts with 64GB RAM, 10GbE and 8-12 SFF HDDs are considered a subset of “commodity”).
Original title and link: A Short Incursion Into Alternate Hadoop Filesystems ( ©myNoSQL)