


Cloud computing: All content tagged as Cloud computing in NoSQL databases and polyglot persistence

Zynga, Data Centers, Polyglot Persistence, and Big Data

Cadir Lee (CTO of Zynga), quoted in a VentureBeat post:

It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud.

A couple of thoughts:

  1. Zynga is going in the opposite direction from Netflix. While Netflix is concentrating on Amazon for most of its infrastructure, Zynga is diversifying by building its own data centers.
  2. Zynga’s applications are great examples of where fully distributed NoSQL databases fit. Availability is key.
  3. My answer to the question “how many Zyngas are out there?” would be: “enough to ensure some good business for the most reliable and scalable distributed databases.”
  4. Zynga has contributed to, and is an investor in, Membase, the company that merged with CouchOne to form Couchbase. However, Zynga was using a custom version of Membase.
  5. Zynga also operates a large MySQL cluster.
  6. Zynga processes over 15 terabytes of game data every day (according to its SEC filing). That’s Hadoop’s sweet spot.
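For a sense of scale, a rough back-of-the-envelope conversion of that daily figure into a sustained rate. It assumes decimal terabytes (10^12 bytes) and data arriving evenly over the day; neither assumption comes from the filing, and real traffic is certainly burstier:

```latex
\frac{15 \times 10^{12}\ \text{bytes}}{86\,400\ \text{s}} \approx 1.74 \times 10^{8}\ \text{bytes/s} \approx 174\ \text{MB/s}
```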

PS: I’d love to talk to someone from Zynga about their data storage approach. If you have any connections I’d really appreciate an introduction.

Original title and link: Zynga, Data Centers, Polyglot Persistence, and Big Data (NoSQL database©myNoSQL)

CouchDB Queue Service: Amazon SQS API-compatible

Neat idea that could prove pretty useful in development environments:

CQS is a message queue system, using Apache CouchDB. It is exactly like Amazon Simple Queue Service (SQS). The API is the same. Everything is exactly the same, it just runs on CouchDB.
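To illustrate the SQS semantics CQS reproduces (send a message, receive it with a visibility timeout, delete it once processed), here is a toy in-memory sketch in Python. It is a hypothetical illustration, not CQS’s implementation and not Amazon’s API: the class and method names are my own, and nothing is persisted to CouchDB.

```python
import time
import uuid


class InMemoryQueue:
    """Toy queue mimicking SQS-style send/receive/delete semantics.

    Hypothetical illustration only: not CQS's code and not the SQS API.
    """

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        self.messages = {}  # message id -> [body, invisible_until]
        self.receipts = {}  # receipt handle -> message id

    def send_message(self, body):
        msg_id = str(uuid.uuid4())
        self.messages[msg_id] = [body, 0.0]  # visible immediately
        return msg_id

    def receive_message(self):
        now = self.clock()
        for msg_id, entry in self.messages.items():
            if entry[1] <= now:
                # Hide the message for visibility_timeout seconds; if the
                # consumer never deletes it, it becomes receivable again.
                entry[1] = now + self.visibility_timeout
                handle = str(uuid.uuid4())
                self.receipts[handle] = msg_id
                return handle, entry[0]
        return None  # nothing currently visible

    def delete_message(self, handle):
        # Deleting acknowledges successful processing.
        msg_id = self.receipts.pop(handle, None)
        if msg_id is not None:
            self.messages.pop(msg_id, None)
```

The injectable `clock` makes the timeout easy to exercise: a message received but never deleted reappears once the clock passes its visibility deadline, which is what lets another worker pick up a job whose first consumer crashed.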



Hadoop Chaos Monkey: The Fault Injection Framework

Do you remember the 5 lessons Netflix learned while using Amazon Web Services, where they talked about the Chaos Monkey? (Judging by how much Netflix has shared about its experience in the cloud, including with Amazon SimpleDB, I’d say these 5 are only the tip of the iceberg.)

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.
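A minimal sketch of the idea in Python: walk the fleet and terminate each instance with some probability, so the rest of the system has to keep working without it. This is not Netflix’s code; the `kill` callback and the instance ids are hypothetical stand-ins for a real cloud API.

```python
import random


def chaos_monkey(instances, kill, probability=0.1, rng=random):
    """Randomly terminate instances, Chaos Monkey style (hypothetical sketch).

    `instances` is a list of instance ids and `kill` is a caller-supplied
    function that terminates one instance; both stand in for real cloud APIs.
    Returns the ids that were killed.
    """
    killed = []
    for instance_id in instances:
        # rng.random() is uniform in [0, 1), so each instance is
        # independently killed with the given probability.
        if rng.random() < probability:
            kill(instance_id)
            killed.append(instance_id)
    return killed
```

Passing a seeded `random.Random` as `rng` makes a chaos run reproducible, which helps when you need to replay the failure that broke something.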

Hadoop provides a similar tool, its Fault Injection (FI) Framework:

The idea of fault injection is fairly simple: it is an infusion of errors and exceptions into an application’s logic to achieve a higher coverage and fault tolerance of the system. Different implementations of this idea are available today. Hadoop’s FI framework is built on top of Aspect Oriented Paradigm (AOP) implemented by AspectJ toolkit.
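The core of the technique is wrapping an operation so that it sometimes fails on purpose. Hadoop does this in Java with AspectJ advice woven into the code; a rough Python analogue is a decorator. This is a hypothetical sketch, not Hadoop’s API, and `write_block` is an illustrative stand-in for a real I/O call.

```python
import functools
import random


def inject_fault(exception_type, probability, rng=random):
    """Wrap a function so that it sometimes raises instead of running.

    Plays the role an AspectJ "around" advice plays in Hadoop's FI
    framework; hypothetical sketch, not Hadoop's API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rng.random() < probability:
                raise exception_type(f"injected fault in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_fault(IOError, probability=0.2)
def write_block(data):
    # Stand-in for a real I/O operation such as writing a block to disk.
    return len(data)
```

One design difference worth noting: AspectJ weaves the faults in without touching the source of the target class, whereas a decorator has to be applied explicitly (or patched in from test code), so it is less invasive to keep it out of production builds.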

As a side note, this is one of the neatest uses of AspectJ I’ve read about.

Update: Abhijit Belapurkar points out that fault injection using AOP was part of the Recovery-Oriented Computing research at Stanford/UC Berkeley many years ago: JAGR: An Autonomous Self-Recovering Application Server.


The Future of Cloud Services: IDC Report

An IDC report about the impact of cloud computing on the IT market:

In 2015, public cloud services will account for 46 percent of net new growth in overall IT spending in five key product categories – applications, application development and deployment, systems infrastructure software, basic storage, and servers, according to the report.

Software-oriented cloud services (SaaS) will account for roughly three quarters of all spending on public cloud IT services throughout the forecast. This includes all three software-oriented cloud categories, not just applications. Spending on hardware-oriented cloud services (servers and storage) will be largely driven by SaaS providers building out their infrastructure.

If I read this correctly, there’s no word about Database-as-a-Service.



Sinatra with Redis on Cloud Foundry

The workshop takes you through creating a Sinatra application using sample code from here. Once the Sinatra application, which leverages Twitter, is working, the workshop then takes you through adding Redis to your application. Finally, the workshop ends by taking you through scaling your application instances up and then back down.

It takes only 15 minutes to get it up and running.


As the minicomputer brought Oracle, will the cloud bring a new database?

The question I think about a lot is what changes will cloud computing bring to the database industry. Will cloud be the platform change that ushers in a new generation of database technologies, and if so which ones and how sweeping will the change be? It is both intellectually interesting and one of the largest external factors affecting the company I run (10gen, the company that builds mongoDB; yes, this is an obvious source of bias for me in this post).

Thought-provoking.



Cloud Foundry, NoSQL Databases, and Polyglot Persistence

VMware’s Cloud Foundry has the potential to become the preferred PaaS solution. It bundles together a set of services that took other PaaS providers (Google App Engine, Microsoft Azure) years to offer. And it seems that Cloud Foundry has much less vendor lock-in (or none at all)[1].

From a storage perspective, Cloud Foundry is encouraging polyglot persistence right from the start offering access to a relational database (MySQL), a super-fast smart key-value store (Redis), and a popular document database (MongoDB). The only bit missing is a graph database[2].

I think the first graph database to get there will see an immediate bump in its adoption.
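In practice, polyglot persistence on Cloud Foundry means the application picks up credentials for several bound services and routes each kind of data to the store that suits it. A minimal Python sketch, assuming the bindings arrive as JSON in the `VCAP_SERVICES` environment variable in the shape described in the comments (labels and credential fields vary by Cloud Foundry version); the data-to-store mapping and the function names are hypothetical.

```python
import json
import os

# Hypothetical routing table: which store each kind of data lives in.
STORE_FOR = {
    "accounts": "mysql",     # relational data that needs transactions
    "sessions": "redis",     # hot, short-lived key-value data
    "documents": "mongodb",  # schema-flexible documents
}


def bound_services(env=os.environ):
    """Group bound service credentials by store type (sketch).

    Assumes VCAP_SERVICES maps a service label (e.g. "redis-2.2") to a
    list of instances, each carrying a "credentials" object.
    """
    services = json.loads(env.get("VCAP_SERVICES", "{}"))
    by_kind = {}
    for label, instances in services.items():
        kind = label.split("-")[0]  # "redis-2.2" -> "redis"
        for instance in instances:
            by_kind.setdefault(kind, []).append(instance.get("credentials", {}))
    return by_kind


def credentials_for(data_kind, env=os.environ):
    """Return the first bound credentials for the store assigned to data_kind."""
    instances = bound_services(env).get(STORE_FOR[data_kind], [])
    return instances[0] if instances else None
```

Keeping the routing table in one place makes the polyglot choice explicit; adding a graph database later would mean one more entry rather than changes scattered through the code.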

  1. These comments are based on what I’ve read about VMware Cloud Foundry, as I haven’t yet received my invitation.  

  2. I don’t think wide-column databases (Cassandra, HBase) are a good fit for PaaS.  


Amazon EC2 Cassandra Cluster with DataStax AMI

This AMI does the following:

  • installs Cassandra 0.7.4 on an Ubuntu 10.10 image
  • configures ephemeral disks in RAID 0, if applicable (EBS is a bad fit for Cassandra)
  • configures Cassandra to use the root volume for the commitlog and the ephemeral disks for data files
  • configures Cassandra to use the local interface for intra-cluster communication
  • configures all Cassandra nodes with the same seed for gossip discovery
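The per-node choices in that list can be sketched as a small settings generator. This is a hypothetical Python sketch, not the AMI’s actual scripts: the keys `commitlog_directory`, `data_file_directories`, and `listen_address` mirror cassandra.yaml option names, while `seeds` is simplified (the seed configuration format differs between Cassandra versions), and the mount paths and IP addresses are assumptions.

```python
def cassandra_node_settings(local_ip, seed_ip, ephemeral_mount="/mnt/cassandra"):
    """Per-node settings mirroring the AMI's choices (hypothetical sketch).

    Paths are assumptions, not the AMI's actual layout; "seeds" is a
    simplified stand-in for the version-specific seed configuration.
    """
    return {
        # Commit log on the root volume, so its sequential appends don't
        # compete with data-file I/O on the ephemeral RAID 0 array.
        "commitlog_directory": "/var/lib/cassandra/commitlog",
        # Data files on the ephemeral disks, not on EBS.
        "data_file_directories": [ephemeral_mount + "/data"],
        # Intra-cluster (gossip) traffic on the node's local interface.
        "listen_address": local_ip,
        # Every node uses the same seed for gossip discovery.
        "seeds": [seed_ip],
    }


cluster = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
configs = [cassandra_node_settings(ip, seed_ip=cluster[0]) for ip in cluster]
```

Using the same seed everywhere keeps bootstrapping simple: each new node contacts one known address and learns the rest of the ring through gossip.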

Note the remark that “EBS is a bad fit for Cassandra”. Adrian Cockcroft explains why in Multi-tenancy and Cloud Storage Performance.



Multi-tenancy and Cloud Storage Performance

Adrian Cockcroft[1] has a great explanation of how multi-tenancy affects cloud storage performance. The connection to NoSQL databases lies less in his Amazon EBS and SSD price, performance, and QoS comparison than in this:


If you ever see public benchmarks of AWS that only use m1.small, they are useless, it shows that the people running the benchmark either didn’t know what they were doing or are deliberately trying to make some other system look better. You cannot expect to get consistent measurements of a system that has a very high probability of multi-tenant interference.
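One way to act on this before trusting any benchmark number is to repeat the run and check how much the results scatter. A minimal Python sketch using the coefficient of variation (sample standard deviation over mean); the 10% threshold is an arbitrary assumption, not something from Cockcroft’s post, and high scatter only hints at interference rather than proving it.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (needs >= 2 samples)."""
    return statistics.stdev(samples) / statistics.mean(samples)


def looks_consistent(samples, max_cv=0.1):
    # max_cv is an arbitrary illustrative threshold; tune it per workload.
    return coefficient_of_variation(samples) <= max_cv
```

Feeding it, say, throughput from ten identical runs gives a quick sanity check: a result that swings widely between runs on the same instance type says more about the neighbours than about the database under test.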

  1. Adrian Cockcroft: Netflix, @adrianco  



Open-Source VoIP Cloud Services with Erlang

There’s a bit of CouchDB in the project:

We’ve built an open-source product that automatically deploys, scales and distributes VoIP calls across the Internet on commodity or virtualized servers. It fully utilizes Erlang for VoIP logic as well as relies on other Erlang products like CouchDB and RabbitMQ. It’s got an awesome set of APIs and some other nifty features.


Netflix: Run Consistency Checkers All The Time To Fixup Transactions

Todd Hoff on NoSQL and the cloud at Netflix:

You might have consistency problems if you have: multiple datastores in multiple datacenters, without distributed transactions, and with the ability to alternately execute out of each datacenter; syncing protocols that can fail or sync stale data; distributed clients that cache data and then write old data back to the central store; a NoSQL database that doesn’t have transactions between updates of multiple related key-value records; application level integrity checks; client driven optimistic locking.
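The heart of such a checker is a scan that finds records violating a cross-record invariant and repairs them. A minimal Python sketch, with plain dicts standing in for the datastore; the user/order schema, the invariant, and the repair policy (treat each order’s `user_id` as the source of truth) are hypothetical illustrations, not Netflix’s design.

```python
def check_and_repair(users, orders):
    """Reconcile two related key-value collections (hypothetical sketch).

    users:  user_id  -> {"order_ids": set of order ids}
    orders: order_id -> {"user_id": owning user id}
    Without multi-record transactions, a failure between the two writes
    can leave the collections disagreeing. Returns the repairs made.
    """
    repairs = []
    # Orders whose owner's record is missing the back-reference.
    for order_id, order in orders.items():
        user = users.get(order["user_id"])
        if user is not None and order_id not in user["order_ids"]:
            user["order_ids"].add(order_id)
            repairs.append(("add_ref", order["user_id"], order_id))
    # Users referencing orders that were never written or were deleted.
    for user_id, user in users.items():
        dangling = {oid for oid in user["order_ids"] if oid not in orders}
        for oid in sorted(dangling):
            user["order_ids"].discard(oid)
            repairs.append(("drop_ref", user_id, oid))
    return repairs
```

Running it “all the time” is practical because the check is idempotent: a second pass over already repaired data finds nothing to change, so it can be scheduled as often as the data’s churn requires.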
