SAN: All content tagged as SAN in NoSQL databases and polyglot persistence

Hadoop on SAN? Never, ever do this to Hadoop

Andrew C. Oliver in an article for InfoWorld:

I’ve done this myself, figuring we’d kick off the project and show how we could “optimize” to local disks later. Let me say this unequivocally: You absolutely should not use a SAN or NAS with Hadoop.

As simple as that.

Original title and link: Hadoop on SAN? Never, ever do this to Hadoop (NoSQL database © myNoSQL)

via: http://www.infoworld.com/d/application-development/never-ever-do-hadoop-232090


Big Data and Storage Area Networks

John Webster, reporting what he learned at the Structure Big Data event:

This conference only confirmed a suspicion that’s been building for the last few months as I’ve been following the big-data wave: Big-data practitioners are generally hostile to shared storage. They like direct-attached storage (DAS) in various forms from solid state disk (SSD) to high-capacity SATA disk buried inside parallel processing nodes. SANs (storage area networks) need not apply.

[…]

Why? There are two reasons that are interrelated. First, most if not all of the attendees here would include real- or near-real-time information delivery as one of the defining characteristics of big-data analytics. Latency is therefore avoided whenever and wherever possible. Data in memory is good. Data on spinning disk at the other end of a SAN connection is not, unless perhaps it’s a secondary copy of data. (I’ll get to that in a minute.) And while some here believed that it was theoretically possible to get high-performance shared storage to stand up to the low-latency requirement, the cost of such a SAN at the scale these people need was seen to be prohibitive.

Squeezing out every drop of performance is one aspect. Cost is the second. But I think there is also a CAP dimension, in the sense that data locality increases the reliability of a distributed system.

Original title and link: Big Data and Storage Area Networks (NoSQL databases © myNoSQL)

via: http://news.cnet.com/8301-21546_3-20049693-10253464.html