


Reddit: All content tagged as Reddit in NoSQL databases and polyglot persistence

Reddit’s Database Has Two Tables

Considering the fast evolution of NoSQL databases, the topic is now very old (from 2010). But read the comments on the original post, Hacker News, and Reddit to see what people think today about extreme denormalization, schemas, relational and NoSQL databases.

Original title and link: Reddit’s Database Has Two Tables (NoSQL database © myNoSQL)


Reddit's Story of Running Cassandra & PostgreSQL on Amazon EBS

I’m still distilling what happened at Reddit the other day, when EBS failures in a single availability zone took Reddit down for many hours:

Unfortunately, EBS also has reliability issues. Even before the serious outage last night, we suffered random disks degrading multiple times a week. While we do have protections in place to mitigate latency on a small set of disks by using raid-0 stripes, the frequency of degradation has become highly unpalatable.

[…] we have been working to completely move Cassandra off of EBS and onto the local storage which is directly attached to the EC2 instances. […] While the local storage has much less functionality than EBS, the reliability of local storage outweighs the benefits of EBS.

After the outage today, we are going to be investigating doing the same for our Postgres clusters.

One mistake we made was using a single EBS disk to back some of our older master databases

These may sound like truisms to those working on highly available systems, but not to everybody else:

  • when it comes to high availability, running your application in a single Amazon availability zone is not enough

  • even if EBS promises “highly available, highly reliable storage volumes”, a solution relying on it will have to account for: 1) failures; 2) unreliable performance.

    An ex-Reddit engineer posted details about the serious issues Reddit noticed while using Amazon EBS.

  • Dynamo-style NoSQL databases, where all nodes in a cluster are equal, tolerate failures more easily than a traditional RDBMS.

    Reddit is working on moving Cassandra off EBS and onto the local ephemeral EC2 storage.

  • A master/slave replication model combined with the out-of-order commits issue makes me think that the cloud and the RDBMS are not yet a perfect match.

    Data which had been committed to the slaves was not committed to the masters. In a normal replication scenario, this should never, ever happen. The master commits the data, then tells the slave it is safe to commit the same data.

  • don’t back a master database with a single EBS disk. In Reddit’s words: “One mistake we made was using a single EBS disk to back some of our older master databases”

  • remember the Amazon EBS vs SSD: Price, Performance, QoS?
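To make the replication point above concrete, here is a minimal sketch of the ordering described in the quote (the master commits first, then tells the slave to commit) and of a check for the anomaly Reddit hit, where a slave holds commits the master does not. Everything in it is a toy assumption for illustration: `Node`, `replicate`, and `orphaned_commits` are hypothetical names, and this is not how PostgreSQL replication is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy database node (hypothetical): remembers committed transaction ids only."""
    name: str
    committed: List[int] = field(default_factory=list)

    def commit(self, txid: int) -> None:
        self.committed.append(txid)


def replicate(master: Node, slaves: List[Node], txid: int) -> None:
    """Commit on the master first, then tell each slave to commit the same data."""
    master.commit(txid)
    for slave in slaves:
        slave.commit(txid)


def orphaned_commits(master: Node, slave: Node) -> List[int]:
    """Return txids committed on the slave but missing on the master.

    With master-first ordering this list should always be empty.
    """
    on_master = set(master.committed)
    return [txid for txid in slave.committed if txid not in on_master]


master = Node("master")
slave = Node("slave")
for txid in (1, 2, 3):
    replicate(master, [slave], txid)

# Normal operation: nothing on the slave is unknown to the master.
healthy = orphaned_commits(master, slave)

# Simulate the anomaly: the slave ends up with a commit the master never kept.
slave.commit(4)
broken = orphaned_commits(master, slave)
```

A check like `orphaned_commits` only detects the problem after the fact; it says nothing about how to repair the diverged data.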

What else can we learn from Reddit’s experience?

Original title and link: Reddit’s Story of Running Cassandra & PostgreSQL on Amazon EBS (NoSQL databases © myNoSQL)


MemcacheDB History at Reddit

Steve Huffman (co-founder and programmer of Reddit) speaking at ☞ FOWA Miami 2010 (around minute 18:30)[1]:

And then there is another software that is really handy MemcacheDB, which is like memcached but is persistent. […] It’s very very fast, super-handy, we store far more data in MemcacheDB than we do in Postgres

Then bam! MemcacheDB started blocking on bursts of writes, leading Reddit to switch to Cassandra, as friends from Digg and Twitter did.

Lesson learned: take such pieces of advice with a grain of salt and always test your scenario.

  1. It looks like Steve was no longer working at Reddit at the time the presentation was made, so he might not have been aware of the problems related to MemcacheDB.