


Cloud Computing: All content tagged as Cloud Computing in NoSQL databases and polyglot persistence

Hadoop in the Cloud: Pros and Cons

Steve Loughran covers the arguments for and against running Hadoop in a cloud environment:

  1. If your data is stored in a cloud provider’s storage infrastructure, doing the analysis locally is the only rational action. It’s that “work near the data” philosophy.
  2. If you are only doing some computation -say nightly- then you can rent some cluster time. Even if compute performance is worse, you can just rent some more machines to compensate.
  3. You may be able to achieve better security through isolation of clusters (depends on your IaaS vendor’s abilities).
  4. No upfront capex; fund from ongoing revenue.
  5. Easier to expand your cluster; no need to buy more racks, find more rack space.
  6. You don’t need to care about the problems of networking.
  7. Less of a problem of heterogeneous clusters if you expand later.

Interestingly, the list of counter-arguments is much shorter, and the important bit, further detailed in the post, is: “Hadoop contains lots of assumptions about running in a static infrastructure; its scheduling and recovery algorithms assume this.”

Original title and link: Hadoop in the Cloud: Pros and Cons (NoSQL database©myNoSQL)


OpenStack-based SDSC Cloud Storage Services

The San Diego Supercomputer Center (SDSC) at the University of California, San Diego announced a cloud storage solution based on OpenStack Swift Object Storage:

SDSC’s Cloud Storage provides academic and industry users with a convenient and affordable way to store, share, and archive data, including extremely large data sets. The object based storage system and multiple interface methods make the SDSC Cloud easy to use for the average user, but also provide a flexible, configurable, and expandable solution to meet the needs of more demanding applications.

Check out the project homepage for a short description of this new cloud offering’s characteristics.


Tanuki: A 30000 Cores AWS Cluster

Sometimes the only valid comment is wow.

We have now launched a cluster 3 times the size of Tanuki, or 30,000 cores, which cost $1279/hour to operate for a Top 5 Pharma. It performed genuine scientific work — in this case molecular modeling — and a ton of it. The complexity of this environment did not necessarily scale linearly with the cores.

In fact, we had to implement a triad of features within CycleCloud to make it a reality:

  1. MultiRegion support: To achieve the mind boggling core count of this cluster, we launched in three distinct AWS regions simultaneously, including Europe.
  2. Massive Spot instance support: This was a requirement given the potential savings at this scale by going through the spot market. Besides, our scheduling environment and the workload had no issues with the possibility of early termination and rescheduling.
  3. Massive CycleServer monitoring & Grill GUI app for Chef monitoring: There is no way that any mere human could keep track of all of the moving parts on a cluster of this scale.

Facebook runs a 30PB Hadoop analytic data warehouse and Yahoo! has a Hadoop cluster of 100,000 cores on 40,000 machines. I’m wondering what the largest Amazon Elastic MapReduce jobs ever run are. Any ideas?
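For a sense of scale, the quoted figures can be turned into a few per-unit numbers. This is only back-of-the-envelope arithmetic on the numbers above; the per-core cost is a plain average and says nothing about the actual mix of spot prices or instance types, which the quote does not give.

```python
# Figures taken from the quoted text above.
cluster_cores = 30_000          # "30,000 cores"
cluster_cost_per_hour = 1279    # "$1279/hour"
size_multiple_of_tanuki = 3     # "3 times the size of Tanuki"

yahoo_cores = 100_000
yahoo_machines = 40_000

# Simple averages; real spot pricing varies by region and instance type.
cost_per_core_hour = cluster_cost_per_hour / cluster_cores
tanuki_cores = cluster_cores // size_multiple_of_tanuki
yahoo_cores_per_machine = yahoo_cores / yahoo_machines

print(f"Average cost: ${cost_per_core_hour:.4f} per core-hour")
print(f"Implied size of Tanuki: {tanuki_cores} cores")
print(f"Yahoo! cluster: {yahoo_cores_per_machine} cores per machine")
```

That works out to roughly four cents per core-hour on average.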



Will Oracle Win the NoSQL Competition

I agree this title is misleading, but the problem is clear: today Oracle does not provide any product that can compete with new cloud computing needs and with the NoSQL movement. It is not possible to think that Oracle’s RAC technology can actually be used in a cloud environment, nor can a cloud service be deployed over an Exadata.

The real question, though, is whether Oracle is really interested in the market currently served by NoSQL databases and/or hybrid solutions. Judging by the latest versions of MySQL and MySQL Cluster[1], it looks like they are testing the waters.

  1. Latest versions of MySQL and MySQL Cluster are adding support for using the Memcached protocol. See NoSQL to MySQL with Memcached  



Running MongoDB on the Cloud

I’ve been posting a lot about deployments in the cloud, and especially about deploying MongoDB in the Amazon cloud.

In this video Jared Rosoff covers topics like scaling and performance characteristics of running MongoDB in the cloud and he also shares some best practices when using Amazon EC2.

Memcached in the Cloud: Amazon ElastiCache

Amazon announced today a new service, Amazon ElastiCache, or Memcached in the cloud. The new service is still in beta and available only in the US East (Virginia) Region.

While many will find this new service useful, it is a bit of a disappointment that Amazon took the safe route and went with pure Memcached. The only notable feature of Amazon ElastiCache is automatic failure detection and recovery. Compared with Membase (and the soon-to-be-released Couchbase 2.0), it is missing clustering, replication, support for virtual nodes, etc. Even though it advertises push-button scaling, ElastiCache will lose cached data when instances are added or removed.
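To see why resizing a plain Memcached pool can effectively throw away the cache, here is a minimal sketch. It assumes the client picks a node with a simple `hash(key) % node_count` scheme; that is a common client-side strategy, not something the ElastiCache announcement specifies, and the `node_for` and `remapped_fraction` helpers and the `user:N` keys are hypothetical.

```python
import hashlib

def node_for(key: str, node_count: int) -> int:
    """Pick a node index with naive modulo hashing (assumed client strategy)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def remapped_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node changes when the pool is resized."""
    moved = sum(
        1 for key in keys
        if node_for(key, old_count) != node_for(key, new_count)
    )
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
fraction = remapped_fraction(keys, old_count=4, new_count=5)
print(f"{fraction:.0%} of keys map to a different node after growing 4 -> 5")
```

Under this assumption most keys land on a different node after the resize, so lookups miss even though the old entries still sit in memory somewhere. Consistent hashing or virtual nodes, the kind of feature the post finds missing, keep that fraction much smaller.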

The pace at which Amazon is launching new services is indeed impressive. I’m wondering what will be the first NoSQL database that will get official Amazon support.


Reliable, Scalable, and Kinda Sorta Cheap: A Cloud Hosting Architecture for MongoDB

Using MongoDB replica sets:

At Famigo, we house all of our valuable data in MongoDB and we also serve all requests from Amazon EC2 instances. We’ve devoted many mental CPU cycles to finding the right architecture for our data in the cloud, focusing on 3 main factors: cost, reliability, and performance.
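The excerpt does not describe Famigo’s actual layout, so the following is only a generic sketch of what a replica set configuration document looks like, built here as a Python dictionary. The set name `rs0`, the host names, and the `replica_set_config` helper are all hypothetical; the choice of two data-bearing members plus an arbiter is just one common way to trade cost against reliability.

```python
import json

def replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set configuration document (hypothetical helper)."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # An arbiter votes in elections but stores no data, so it can run
        # on a much smaller (cheaper) instance.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    return {"_id": name, "members": members}

# Hypothetical host names, for illustration only.
config = replica_set_config(
    "rs0",
    ["db1.example.internal:27017", "db2.example.internal:27017"],
    arbiter_host="arbiter.example.internal:27017",
)
print(json.dumps(config, indent=2))
```

A document of this shape is what gets passed to `rs.initiate()` in the mongo shell when the set is first created.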



Data Integrity in the Cloud

Chris Marsh:

Cloud storage can be an attractive means of outsourcing the day-to-day management of data, but ultimately the responsibility and liability for that data falls on the company that owns the data, not the hosting provider. With this in mind, it is important to understand some of the causes of data corruption, how much responsibility a cloud service provider holds, some basic best practices for utilizing cloud storage safely, and some methods and standards for monitoring the integrity of data regardless of whether that data resides locally or in the cloud.

This reminded me of how the Adobe SaaS Infrastructure Team has tested HBase.
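One basic way to monitor integrity, wherever the data lives, is to record a cryptographic digest before handing the data to the provider and compare it against what comes back. The sketch below uses SHA-256 from Python’s standard library; the sample payloads and the `sha256_hex` and `is_intact` helpers are made up for illustration, and the upload and download steps are only simulated.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, expected_digest: str) -> bool:
    """Check retrieved data against the digest recorded before upload."""
    return sha256_hex(data) == expected_digest

original = b"customer records, batch 42"
# Keep the digest somewhere other than the object it protects.
recorded_digest = sha256_hex(original)

# Simulated results of reading the object back from cloud storage.
retrieved_ok = b"customer records, batch 42"
retrieved_bad = b"customer records, batch 43"

print(is_intact(retrieved_ok, recorded_digest))   # True
print(is_intact(retrieved_bad, recorded_digest))  # False
```

Running such a check periodically, and not only on read, is what catches silent corruption before the last good copy is gone.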



MongoDB Positioning: Big Data and Development Agility

Max Schireson positions MongoDB as a solution for Big Data and development agility:

The Server Architecture Debate Rages On

Big processors or little processors, scale-up or scale-out, on-premise or in the cloud […] The plethora of choices for application architecture and delivery model are great if you like variety, but I don’t envy anyone tasked with choosing which system on which to spend their limited budget dollars.

Too few options are bad[1]. Too many options are paralyzing[2]. Then what’s the solution? I think the only answer is to build experience by trying, failing, learning, and sharing with everyone else.



MongoDB Hosting Matrix

Maurício Maia put together a price comparison for MongoDB hosting:

[Image: MongoDB hosting matrix. Credit: Maurício Maia]

There’s also a matrix of MongoDB hosting features. While far from exhaustive, these MongoDB hosting matrices are meant to give you an idea of what options are out there.

Unfortunately, they don’t include MongoDB hosting on:

Each of these offers a free plan, but the pricing will depend on many factors. On the other hand, they also offer application hosting, which basically means co-locating your app and data, and that is better than putting the whole internet between the two.
