An introduction to the Hadoop Distributed File System
An excellent article covering:
- HDFS architecture
- Data replication
- Data organization
- Data storage reliability
HDFS has many goals. Here are some of the most notable:
- Fault tolerance by detecting faults and applying quick, automatic recovery
- Data access via MapReduce streaming
- Simple and robust coherency model
- Processing logic close to the data, rather than the data close to the processing logic
- Portability across heterogeneous commodity hardware and operating systems
- Scalability to reliably store and process large amounts of data
- Economy by distributing data and processing across clusters of commodity personal computers
- Efficiency by distributing data and logic to process it in parallel on nodes where data is located
- Reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures
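
To make the reliability and locality goals above a little more concrete, here is a minimal toy sketch in Python. It is not HDFS code and does not use any Hadoop API: the function names (`place_blocks`, `handle_node_failure`, `pick_local_node`), the node and block names, and the random placement are all assumptions made for illustration. Real HDFS placement is rack-aware and driven by the NameNode; the only detail borrowed from HDFS is the default replication factor of 3.

```python
import random

REPLICATION = 3  # HDFS's default replication factor


def place_blocks(block_ids, nodes, replication=REPLICATION, rng=None):
    """Assign each block to `replication` distinct nodes.

    Toy random placement; real HDFS uses a rack-aware policy instead.
    """
    rng = rng or random.Random(0)
    return {b: set(rng.sample(sorted(nodes), replication)) for b in block_ids}


def handle_node_failure(placement, live_nodes, failed,
                        replication=REPLICATION, rng=None):
    """Remove a failed node and copy under-replicated blocks to other live nodes.

    Mutates `placement` in place and returns the new set of live nodes.
    """
    rng = rng or random.Random(1)
    live = set(live_nodes) - {failed}
    for holders in placement.values():
        holders.discard(failed)
        missing = replication - len(holders)
        if missing > 0:
            # Only nodes that do not already hold the block are candidates.
            candidates = sorted(live - holders)
            holders.update(rng.sample(candidates, min(missing, len(candidates))))
    return live


def pick_local_node(placement, block, live_nodes):
    """Pick a live node that already stores the block, so the processing
    logic moves to the data rather than the data to the logic."""
    local = sorted(placement[block] & set(live_nodes))
    return local[0] if local else None


if __name__ == "__main__":
    nodes = {"node1", "node2", "node3", "node4", "node5"}  # hypothetical cluster
    placement = place_blocks(["blk_1", "blk_2", "blk_3"], nodes)
    live = handle_node_failure(placement, nodes, failed="node2")
    for block in sorted(placement):
        print(block, sorted(placement[block]),
              "-> run task on", pick_local_node(placement, block, live))
```

After the simulated failure, every block is back at three copies on the surviving nodes, and each task is scheduled on a node that already holds its block.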
via: http://www.ibm.com/developerworks/web/library/wa-introhdfs/index.html?ca=drs-
