


Lustre: All content tagged as Lustre in NoSQL databases and polyglot persistence

Hadoop on top of… Intel adds Lustre support to Hadoop

Intel adds Lustre support to Hadoop:

We abstracted out an HDFS layer but underneath that it is actually talking to Lustre.

This is not the first project based on the principle “we already have this distributed system, file system or database, so why not reuse it for Hadoop?”. What would be the first step of such a project? Provide an HDFS API-compatible layer on top of your existing system. But what about the other assumptions built into HDFS: large blocks, sequential access, data locality, etc.? How do you guarantee that your integration addresses all of them?
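To make the question concrete, here is a minimal sketch, assuming a drastically simplified HDFS-style API. Every name in it (`FileSystemLayer`, `PosixMountLayer`, `BlockLocation`, `check_assumptions`) is hypothetical and is not Hadoop's real `FileSystem` class or Intel's implementation. The adapter serves the API from a POSIX mount (such as a Lustre client mount, simulated here with a temporary directory), and the toy checks show that passing the API round trip says nothing about the locality assumption.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BlockLocation:
    """Hypothetical stand-in for HDFS block metadata: byte range plus hosts holding it."""
    offset: int
    length: int
    hosts: list


class FileSystemLayer(ABC):
    """Hypothetical, heavily simplified HDFS-style API that jobs would code against."""

    @abstractmethod
    def create(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def append(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def block_locations(self, path: str) -> list: ...


class PosixMountLayer(FileSystemLayer):
    """Adapter serving the API from a POSIX mount (assumed: e.g. a Lustre client mount)."""

    def __init__(self, root: str, block_size: int = 128 * 1024 * 1024):
        self.root = root
        self.block_size = block_size  # assumed "large block" size, used only for reporting

    def _resolve(self, path: str) -> str:
        return os.path.join(self.root, path.lstrip("/"))

    def create(self, path, data):
        full = self._resolve(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def append(self, path, data):
        with open(self._resolve(path), "ab") as f:
            f.write(data)

    def read(self, path):
        with open(self._resolve(path), "rb") as f:
            return f.read()

    def block_locations(self, path):
        # A shared filesystem is reachable from every node, so there is no
        # node-local replica to report: hosts is empty for every block.
        size = os.path.getsize(self._resolve(path))
        return [BlockLocation(off, min(self.block_size, size - off), [])
                for off in range(0, size, self.block_size)]


def check_assumptions(fs: FileSystemLayer) -> dict:
    """Toy compatibility checks; a real test kit would cover far more (rename, seek, permissions...)."""
    results = {}
    fs.create("/tck/a.bin", b"hello ")
    results["round_trip"] = fs.read("/tck/a.bin") == b"hello "
    fs.append("/tck/a.bin", b"world")
    results["append"] = fs.read("/tck/a.bin") == b"hello world"
    blocks = fs.block_locations("/tck/a.bin")
    results["reports_locality"] = any(b.hosts for b in blocks)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mount:  # stands in for a real mount point
        print(check_assumptions(PosixMountLayer(mount)))
```

Running it prints `{'round_trip': True, 'append': True, 'reports_locality': False}`: the layer is "API compatible", yet a scheduler relying on block hosts for node-local execution gets nothing to work with.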

If this trend continues, I could see one of the companies behind open source Hadoop, Cloudera or Hortonworks or both, coming up with a TCK (Technology Compatibility Kit) sold to any company that claims HDFS compatibility.

Original title and link: Hadoop on top of… Intel adds Lustre support to Hadoop (NoSQL database©myNoSQL)

A Short Incursion Into Alternate Hadoop Filesystems

Steve Loughran starts with a critical look at the NetApp Open Solution for Hadoop paper:

Actually it is weirder than I first thought. This is still HDFS, just running on more expensive hardware. You get the (current) HDFS limitations: no native filesystem mounting, a namenode to care about, security on a par with NFS, without the cost savings of pure-SATA-no-licensing-fees. Instead you have to use RAID everywhere, which not only bumps up your cost of storage but also puts you at risk of RAID controller failure and errors in the OS drivers for those controllers (hence their strict rules about which Linux releases to trust). If you do follow their recommendations and rely on hardware for data integrity, you’ve cut down the probability of node-local job execution, so all the FUD about replication traffic is now moot, as at least 1/3 more of your tasks will be running remote (possibly even with the Fair Scheduler, which waits for a bit to see if a local slot becomes free). What they are doing, then, is adding some HA hardware underneath a filesystem that is designed to give strong availability out of medium-availability hardware. I have seen such a design before, and thought it sucked then too.

InformationWeek says this is a response to EMC, but it looks more like NetApp’s strategy to stay relevant, and Cloudera are partnering with them as NetApp offered them money, and if it sells into more “enterprise customers” then why not? With the extra hardware costs of NetApp the Cloudera licenses will look better value, and clearly both NetApp and their customers are in need of the hand-holding that Cloudera can offer.

Then, in a follow-up post, he looks at a few alternatives (Lustre, GPFS, IBRIX, etc.):

I’m not against running MapReduce, or the entire Hadoop stack, against alternate filesystems. There are some good cases where it makes sense. Other filesystems offer security, NFS mounting, the ability to be used by other applications and other features. HDFS is designed to scale well on “commodity” hardware (where servers containing Xeon E5 series parts with 64GB RAM, 10GbE and 8-12 SFF HDDs are considered a subset of “commodity”).

Original title and link: A Short Incursion Into Alternate Hadoop Filesystems (NoSQL database©myNoSQL)