John Webster, reporting what he learned at the Structure Big Data event:
This conference only confirmed a suspicion that’s been building for the last few months as I’ve been following the big-data wave: big-data practitioners are generally hostile to shared storage. They like direct-attached storage (DAS) in various forms, from solid state disk (SSD) to high-capacity SATA disk buried inside parallel processing nodes. SANs (storage area networks) need not apply.
Why? There are two interrelated reasons. First, most if not all of the attendees here would include real- or near-real-time information delivery as one of the defining characteristics of big-data analytics. Latency is therefore avoided whenever and wherever possible. Data in memory is good. Data on spinning disk at the other end of a SAN connection is not, unless perhaps it’s a secondary copy of data. (I’ll get to that in a minute.) And while some here believed that it was theoretically possible to get high-performance shared storage to stand up to the low-latency requirement, the cost of such a SAN at the scale these people need was seen to be prohibitive.
Squeezing every drop of performance is one aspect; cost is the second. But I think there is also a CAP dimension, in the sense that data locality increases the reliability of a distributed system.
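The data-locality idea above is the one popularized by Hadoop-style systems: the scheduler tries to run each task on a node whose direct-attached disk already holds a replica of the task's input block, avoiding a network (or SAN) hop entirely. Here is a minimal, hypothetical sketch of that scheduling decision; the function and parameter names are illustrative, not any real scheduler's API.

```python
# Minimal sketch of locality-aware task scheduling, the mechanism that
# makes direct-attached storage attractive: run each task on a node
# that already holds a replica of its input data block.
# All names are illustrative, not taken from any real scheduler API.

def schedule(tasks, block_locations):
    """Assign each task to a node holding its input block when possible.

    tasks           -- list of (task_id, block_id) pairs
    block_locations -- dict mapping block_id -> set of node names
    """
    assignments = {}
    for task_id, block_id in tasks:
        replicas = block_locations.get(block_id, set())
        if replicas:
            # Data-local: the task reads from the node's own disk,
            # with no network or SAN hop in the data path.
            assignments[task_id] = sorted(replicas)[0]
        else:
            # No local replica available: fall back to a remote read,
            # accepting the extra latency.
            assignments[task_id] = "remote"
    return assignments

blocks = {"b1": {"node-a", "node-c"}, "b2": {"node-b"}}
print(schedule([("t1", "b1"), ("t2", "b2"), ("t3", "b9")], blocks))
```

Note that replication serves both goals at once: with several nodes holding copies of each block, the loss of one node costs neither data nor the ability to schedule locally, which is the reliability angle of the CAP point above.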
Original title and link: Big Data and Storage Area Networks (NoSQL databases © myNoSQL)