NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Netflix: All content tagged as Netflix in NoSQL databases and polyglot persistence

NetflixGraph: In-Memory Directed Graph Data

Another open source project from Netflix: NetflixGraph:

NetflixGraph is a compact in-memory data structure used to represent directed graph data. You can use NetflixGraph to vastly reduce the size of your application’s memory footprint, potentially by an order of magnitude or more. If your application is I/O bound, you may be able to remove that bottleneck by holding your entire dataset in RAM. You’ll likely be very surprised by how little memory is actually required to represent your data.

At first glance it sounds sort of a Redis for graph data. Available on GitHub.

Original title and link: NetflixGraph: In-Memory Directed Graph Data (NoSQL database©myNoSQL)


From SimpleDB to Cassandra: Data Migration for a High Volume Web Application at Netflix

Prasanna Padmanabhan and Shashi Madapp posted an article on the Netflix blog describing the process used to migrate data from Amazon SimpleDB to Cassandra:

There will come a time in the life of most systems serving data, when there is a need to migrate data to a more reliable, scalable and high performance data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems.

The steps involved are what you’d expect for a large data set migration:

  1. forklift
  2. incremental replication
  3. consistency checking
  4. shadow writes
  5. shadow writes and shadow reads for validation
  6. end of life of the original data store (SimpleDB)

If you think of it, this is how a distributed, eventually consistent storage works (at least in big lines) when replicating data across the cluster. The main difference is that inside a storage engine you deal with a homogeneous system with a single set of constraints, while data migration has to deal with heterogenous systems most often characterized by different limitations and behavior.

In 2009, Netflix performed a similar massive data migration operation. At that time it involved moving data from its own hosted Oracle and MySQL databases to SimpleDB. The challenges of operating this hybrid solution were described in a the paper Netflix’s Transition to High-Availability Storage Systems authored by Sid Anand.

Sid Anand is now working at LinkedIn where they use Databus for low latency data transfer. But Databus’s approach is very similar.

Original title and link: From SimpleDB to Cassandra: Data Migration for a High Volume Web Application at Netflix (NoSQL database©myNoSQL)


Redis-Stat: Redis Monitoring With Netflix’s Hystrix Flavor

A very early stage of a Redis monitoring tool using hiredis1and express2 on Node.js presenting a dashboard inspired by Netflix’s Hystrix3:


The project is on GitHub so you can send some pull requests with improvements.

Original title and link: Redis-Stat: Redis Monitoring With Netflix’s Hystrix Flavor (NoSQL database©myNoSQL)

The Best Defense Against Major Unexpected Failures Is to Fail Often: Netflix Open Sources Chaos Monkey

Why would anyone use Netflix’s Chaos Monkey?

Failures happen and they inevitably happen when least desired or expected. If your application can’t tolerate an instance failure would you rather find out by being paged at 3am or when you’re in the office and have had your morning coffee? Even if you are confident that your architecture can tolerate an instance failure, are you sure it will still be able to next week? How about next month? Software is complex and dynamic and that “simple fix” you put in place last week could have undesired consequences. Do your traffic load balancers correctly detect and route requests around instances that go offline? Can you reliably rebuild your instances? Perhaps an engineer “quick patched” an instance last week and forgot to commit the changes to your source repository?

GitHub repository is here

Original title and link: The Best Defense Against Major Unexpected Failures Is to Fail Often: Netflix Open Sources Chaos Monkey (NoSQL database©myNoSQL)


Wordnik: Migrating From a Monolythic Platform to Micro Services

The story of how Wordnik changed a monolithic platform to one based on Micro Services and the implications at the data layer (MongoDB):

To address this, we made a significant architectural shift. We have split our application stack into something called Micro Services — a term that I first heard from the folks at Netflix. […] This translates to the data tier as well. We have low cost servers, and they work extremely well when they stay relatively small. Make them too big and things can go sour, quickly. So from the data tier, each service gets its own data cluster. This keeps services extremely focused, compact, and fast — there’s almost no fear that some other consumer of a shared data tier is going to perform some ridiculously slow operation which craters the runtime performance. Have you ever seen what happens when a BI tool is pointed at the runtime database? This is no different.

Original title and link: Wordnik: Migrating From a Monolythic Platform to Micro Services (NoSQL database©myNoSQL)


Automating Cassandra Operations and Management With Netflix's Priam Tool

A new open source tool from Netflix, Priam—back in November, Netflix has released Curator, a ZooKeeper library—, used to simplify and automate the operations and management of a Cassandra cluster:

Priam is a co-process that runs alongside Cassandra on every node to provide the following functionality:

  • Backup and recovery
    • snapshot and incremental backups
    • compression and multipart off-site uploading
    • data recovery and data testing
  • Bootstrapping and automated token assignment

    Priam automates the assignment of tokens to Cassandra nodes as they are added, removed or replaced in the ring. Priam relies on centralized external storage (SimpleDB/Cassandra) for storing token and membership information, which is used to bootstrap nodes into the cluster. It allows us to automate replacing nodes without any manual intervention, since we assume failure of nodes, and create failures using Chaos Monkey. The external Priam storage also provides us valuable information for the backup and recovery process.

  • Centralized configuration management: All our clusters are centrally configured via properties stored in SimpleDB, which includes setup of critical JVM settings and Cassandra YAML properties.

  • RESTful monitoring and metrics: provides hooks that support external monitoring and automation scripts. They provide the ability to backup, restore a set of nodes manually and provide insights into Cassandra’s ring information. They also expose key Cassandra JMX commands such as repair and refresh.

Original title and link: Automating Cassandra Operations and Management With Netflix’s Priam Tool (NoSQL database©myNoSQL)


Auto Scaling in the Amazon Cloud: Netflix's Approach and Lessons Learned

Another great post for today from the engineering team at Netflix:

Auto scaling is a very powerful tool, but it can also be a double-edged sword. Without the proper configuration and testing it can do more harm than good. A number of edge cases may occur when attempting to optimize or make the configuration more complex. As seen above, when configured carefully and correctly, auto scaling can increase availability while simultaneously decreasing overall costs.

Original title and link: Auto Scaling in the Amazon Cloud: Netflix’s Approach and Lessons Learned (NoSQL database©myNoSQL)


Netflix Open Sources Curator ZooKeeper Library

I’m excited to read that Netflix is getting even more involved in the open source world and the team there is starting to open source some of the tools developed internally. As some of you may already know, Netflix has been experimenting with quite a few and it is a heavy user of NoSQL databases running the majority of they services in the cloud.

The first project announced and available already on GitHub is Curator,a ZooKeeper client wrapper and rich ZooKeeper framework (nb: ZooKeeper just released version 3.4.0).

Curator deals with ZooKeeper complexity in the following ways:

  • Retry Mechanism: Curator supports a pluggable retry mechanism. All ZooKeeper operations that generate a recoverable error get retried per the configured retry policy. Curator comes bundled with several standard retry policies (e.g. exponential backoff).
  • Connection State Monitoring: Curator constantly monitors the ZooKeeper connection. Curator users can listen for state changes in the connection and respond accordingly.
  • ZooKeeper Instance Management: Curator manages the actual connection to the ZooKeeper cluster using the standard ZooKeeper class. However, the instance is managed internally (though you can access it if needed) and recreated as needed. Thus, Curator provides a reliable handle to the ZooKeeper cluster (unlike the built-in implementation).
  • Correct, Reliable Recipes: Curator comes bundled with implementations of most of the important ZooKeeper recipes (and some additional recipes as well). The implementations are written using ZooKeeper best practices and take account of all known edge cases (as mentioned above).
  • Curator’s focus on recipes makes your code more resilient as you can focus strictly on the ZooKeeper feature you’re interested in without worrying about correctly implementing ZooKeeper housekeeping requirements.

Next on the list of open source projects from Netflix:

  • Astyanax - The Netflix Cassandra Client
  • Priam - Co-Process for backup/recovery, Token Management, and Centralized Configuration management for Cassandra
  • CassJMeter - JMeter plugin to run cassandra tests

Original title and link: Netflix Open Sources Curator ZooKeeper Library (NoSQL database©myNoSQL)


Hadoop Chaos Monkey: The Fault Injection Framework

Do you remember the 5 lessons Netflix learned while using the Amazon Web Services—judging by how much Netflix shared about their experience in the cloud including Amazon SimpleDB I’d say these 5 are only the tip of the iceberg—where they talked about the Chaos Monkey?

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.

Hadoop provides a similar framework: Fault Injection Framework :

The idea of fault injection is fairly simple: it is an infusion of errors and exceptions into an application’s logic to achieve a higher coverage and fault tolerance of the system. Different implementations of this idea are available today. Hadoop’s FI framework is built on top of Aspect Oriented Paradigm (AOP) implemented by AspectJ toolkit.

As a sidenote, this is one of the neatest usages of AspectJ I’ve read about.

Update: Abhijit Belapurkar says that Fault injection using AOP was part of Recovery Oriented Computing research at Stanford/UCB many years ago: JAGR: An Autonomous Self-Recovering Application Server.

Original title and link: Hadoop Chaos Monkey: The Fault Injection Framework (NoSQL database©myNoSQL)

Netflix: Run Consistency Checkers All The Time To Fixup Transactions

Todd Hoff about NoSQL and Cloud at Netflix:

You might have consistency problems if you have: multiple datastores in multiple datacenters, without distributed transactions, and with the ability to alternately execute out of each datacenter;  syncing protocols that can fail or sync stale data; distributed clients that cache data and then write old back to the central store; a NoSQL database that doesn’t have transactions between updates of multiple related key-value records; application level integrity checks; client driven optimistic locking.

Original title and link: Netflix: Run Consistency Checkers All The Time To Fixup Transactions (NoSQL databases © myNoSQL)


NoSQL & Cloud at Netflix

Today Netflix can be seen as a leader in what can be achieved by combining cloud computing and polyglot persistence. Not only that, but Netflix has chosen to share their experience with everyone else so we can all learn from their experience.

Netflix’s experience of migrating from an on-premise architecture using relational databases has been documented over time. Here are a couple of important points in the history of migrating from the classical architecture to the mostly in the cloud solution they are currently using and continuing to experiment and build:

And it doesn’t stop here. In the video below, Siddharth “Sid” Anand covers the answers to some questions that are in the mind of everyone considering NoSQL databases in the cloud:

  • What sort of data can you move to NoSQL?
  • Which NoSQL technologies are we working with?
  • How did we translate RDBMS concepts to NoSQL?

Original title and link: NoSQL & Cloud at Netflix (NoSQL databases © myNoSQL)


Why Netflix Picked Amazon SimpleDB, Hadoop/HBase, and Cassandra

Yury Izrailevsky[1]:

The reason why we use multiple NoSQL solutions is because each one is best suited for a specific set of use cases. For example, HBase is naturally integrated with the Hadoop platform, whereas Cassandra is best for cross-regional deployments and scaling with no single points of failure. Adopting the non-relational model in general is not easy, and Netflix has been paying a steep pioneer tax while integrating these rapidly evolving and still maturing NoSQL products. There is a learning curve and an operational overhead. Still, the scalability, availability and performance advantages of the NoSQL persistence model are evident and are paying for themselves already, and will be central to our long-term cloud strategy.

Summarizing the pros for each of the 3 solutions:

  • Amazon SimpleDB Pros

    • highly durable, writes spanning multiple availability zones
    • handy query and data formats
    • batch operations
    • consistent reads
    • hosted solution
  • HBase Pros

    • dynamic partitioning model
    • built-in support for compression
    • range queries
    • support for distributed counters
    • strong consistency
    • interoperability with Hadoop
  • Cassandra Pros

    • no dedicated name nodes
    • no practical architectural limitations on data sizes, row/column counts, etc.
    • flexible data model
    • no underlying storage format requirements like HDFS
    • uniquely flexible consistency and replication models
    • cross-datacenter and cross-regional replication

I hope the next post will be about the “small” issues Netflix ran into when adopting each of these systems. In the past they’ve shared some of the challenges of an Oracle - Amazon SimpleDB hybrid solution.

  1. Yury Izrailevsky: Netflix Director of Cloud and Systems Infrastructure  

Original title and link: Why Netflix Picked Amazon SimpleDB, Hadoop/HBase, and Cassandra (NoSQL databases © myNoSQL)