Hadoop prefers a set of separate disks to the same set managed as a RAID-0 disk array. Read speeds are particularly important to the performance of a Hadoop cluster, and in his post, Steve makes the point that since drive speeds vary, and RAID-0 reads occur at the speed of the slowest disk in the array, a RAID-0 configuration may well be slower than a non-RAID configuration. The bigger issue, in my opinion, is reliability. If a set of disks is configured as a RAID-0 array, then one disk failure in that array will take that entire volume down, and if all the disks in a node are configured as a single RAID-0 array, then a single disk failure will take all the node’s data down. By configuring multiple disks in a RAID-0 array, you magnify the probability of that volume going offline due to a single disk failure and you maximize the amount of data that goes offline when that single failure occurs.
- Why not RAID-0? It’s about Time and Snowflakes
- Proper Care and Feeding of Drives in a Hadoop Cluster: A Conversation with StackIQ’s Dr. Bruno
Original title and link: Hadoop Storage: JBOD vs RAID-0 ( ©myNoSQL)