A quick description of the most important terms and components of Hadoop (HDFS, NameNode, DataNode, MapReduce, JobTracker, TaskTracker) and its high-level design principles:
- The system must distribute data across the cluster evenly and safely.
- The system must tolerate partial failure. If a node goes down, operations within the cluster continue and the final outcome is unchanged.
- If there is a failure, the system should be able to recover the data from backup copies (later referred to as replicated blocks).
- When a node is brought back online, it should be able to rejoin the system immediately.
- The system must maintain linear scalability: adding resources increases performance linearly, just as removing resources decreases it linearly.
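
To make the replication and partial-failure principles concrete, here is a minimal toy sketch in Python. It is not how HDFS is implemented: the function names (`place_blocks`, `readable_blocks`, `load_per_node`), the node names, and the simple round-robin placement are all assumptions made for illustration. Real HDFS placement is decided by the NameNode and takes rack topology into account.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(nodes) are distinct because
        # replication <= len(nodes).
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, live_nodes):
    """Return the set of blocks that still have a replica on a live node."""
    live = set(live_nodes)
    return {b for b, replicas in placement.items() if live.intersection(replicas)}


def load_per_node(placement):
    """Count how many block replicas each node holds."""
    counts = {}
    for replicas in placement.values():
        for node in replicas:
            counts[node] = counts.get(node, 0) + 1
    return counts


# Hypothetical four-node cluster holding eight blocks, three replicas each.
nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_blocks(8, nodes, replication=3)

# Simulate one DataNode going down: every block should remain readable.
survivors = [n for n in nodes if n != "dn2"]
still_readable = readable_blocks(placement, survivors)
print(len(still_readable), "of", len(placement), "blocks readable")
print(load_per_node(placement))
```

With three replicas per block, any two nodes in this toy cluster can fail without losing a block. Losing three of the four nodes can make some blocks unreadable, which is why a real system also re-replicates blocks after a failure.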