ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

An introduction to the Hadoop Distributed File System

An excellent article covering:

  • HDFS architecture
  • Data replication
  • Data organization
  • Data storage reliability

HDFS has many goals. Here are some of the most notable:

  • Fault tolerance by detecting faults and applying quick, automatic recovery
  • Data access via MapReduce streaming
  • Simple and robust coherency model
  • Processing logic close to the data, rather than the data close to the processing logic
  • Portability across heterogeneous commodity hardware and operating systems
  • Scalability to reliably store and process large amounts of data
  • Economy by distributing data and processing across clusters of commodity personal computers
  • Efficiency by distributing data and logic to process it in parallel on nodes where data is located
  • Reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures
HDFS architecture

Credit J. Jeffrey Hanson

Original title and link: An introduction to the Hadoop Distributed File System (NoSQL databases © myNoSQL)

via: http://www.ibm.com/developerworks/web/library/wa-introhdfs/index.html?ca=drs-