WebHDFS: All content tagged as WebHDFS in NoSQL databases and polyglot persistence

HttpFS: Another Hadoop File System Over HTTP

HttpFS is yet another HTTP interface to the Hadoop file system. The main differences between HttpFS and WebHDFS are that HttpFS was created by Cloudera rather than Hortonworks (it is built on top of Cloudera's previous Hoop library) and:

HttpFs is a proxy so, unlike WebHDFS, it does not require clients be able to access every machine in the cluster. This allows clients to access a cluster that is behind a firewall via the WebHDFS REST API.
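To make the proxy distinction concrete, here is a minimal sketch of how the same REST call would be addressed in each setup. The `/webhdfs/v1/<path>?op=...` URL shape is the one the WebHDFS REST API uses; the host names, the file path, the user name, and the `rest_url` helper are hypothetical, and the ports (50070 for the NameNode web interface, 14000 for HttpFS) are the usual defaults of that era and may differ in a given cluster.

```python
from urllib.parse import quote, urlencode


def rest_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS-style URL: http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Direct WebHDFS (hypothetical host): the request goes to the NameNode, and
# reads are then redirected to a DataNode, so the client must be able to
# reach the cluster machines themselves.
direct = rest_url("namenode.example.com", 50070, "/user/alice/data.csv", "OPEN")

# HttpFS (hypothetical host): same path and operation, but every request goes
# to the single gateway, which talks to the cluster on the client's behalf.
proxied = rest_url(
    "httpfs-gateway.example.com", 14000, "/user/alice/data.csv", "OPEN",
    **{"user.name": "alice"},
)

print(direct)
print(proxied)
```

Only the host and port change between the two URLs, which is what API compatibility amounts to from the client's side.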

The question is: if they are API compatible and both open source, why not unify them?

Original title and link: HttpFS: Another Hadoop File System Over HTTP (NoSQL database©myNoSQL)



HTTP REST Access to HDFS: WebHDFS

Nicholas Sze:

Apache Hadoop provides a high performance native protocol for accessing HDFS. While this is great for Hadoop applications running inside a Hadoop cluster, users often want to connect to HDFS from the outside. […] To address this we have developed an additional protocol to access HDFS using an industry standard RESTful mechanism, called WebHDFS. As part of this, WebHDFS takes advantages of the parallelism that a Hadoop cluster offers. Further, WebHDFS retains the security that the native Hadoop protocol offers. It also fits well into the overall strategy of providing web services access to all Hadoop components.

WebHDFS opens up opportunities for many new tools. For example, tools like FUSE or C/C++ client libraries using WebHDFS are fairly straightforward to be written. It allows existing Unix/Linux utilities and non-Java applications to interact with HDFS. Besides, there is no Java binding in those tools and Hadoop installation is not required.
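As a rough illustration of why such clients are easy to write, the sketch below lists a directory using nothing but the Python standard library. It assumes an unsecured cluster and the `LISTSTATUS` operation of the WebHDFS REST API; the helper names `list_dir` and `parse_liststatus` are hypothetical, and the sample JSON is a trimmed-down version of the response shape (real responses carry more fields such as owner, permission, and modification time).

```python
import json
from urllib.request import urlopen


def parse_liststatus(body):
    """Extract (name, type, length) tuples from a LISTSTATUS JSON response."""
    doc = json.loads(body)
    return [
        (entry["pathSuffix"], entry["type"], entry["length"])
        for entry in doc["FileStatuses"]["FileStatus"]
    ]


def list_dir(host, port, hdfs_path):
    """List an HDFS directory over HTTP; no Hadoop installation or Java needed.

    Assumes an unsecured cluster reachable at host:port (hypothetical values).
    """
    url = f"http://{host}:{port}/webhdfs/v1{hdfs_path}?op=LISTSTATUS"
    with urlopen(url) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))


# Offline demonstration with a trimmed sample response (not real cluster output).
sample = (
    '{"FileStatuses": {"FileStatus": ['
    '{"pathSuffix": "a.txt", "type": "FILE", "length": 24930},'
    '{"pathSuffix": "logs", "type": "DIRECTORY", "length": 0}'
    "]}}"
)
entries = parse_liststatus(sample)
for name, kind, length in entries:
    print(f"{kind:9} {length:>8} {name}")
```

Against a live cluster one would call something like `list_dir("namenode.example.com", 50070, "/user/alice")`, with the host and port adjusted to the deployment.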

I think Andre Luckow’s webhdfs-py is the first library (in Python) to take advantage of WebHDFS.

Original title and link: HTTP REST Access to HDFS: WebHDFS (NoSQL database©myNoSQL)