reddit: All content tagged as reddit in NoSQL databases and polyglot persistence
Monday, 3 September 2012
Reddit’s Database Has Two Tables
Considering the fast evolution of NoSQL databases, the topic is now very old (from 2010). But read the comments on the original post, Hacker News, and Reddit to see what people think today about extreme denormalization, schemas, relational and NoSQL databases.
Original title and link: Reddit’s Database Has Two Tables (©myNoSQL)
via: http://kev.inburke.com/kevin/reddits-database-has-two-tables/
Monday, 21 March 2011
Reddit's Story of Running Cassandra & PostgreSQL on Amazon EBS
I’m still distilling what happened at Reddit the other day, when EBS failures in a single availability zone took Reddit down for many hours:
Unfortunately, EBS also has reliability issues. Even before the serious outage last night, we suffered random disks degrading multiple times a week. While we do have protections in place to mitigate latency on a small set of disks by using raid-0 stripes, the frequency of degradation has become highly unpalatable.
[…] we have been working to completely move Cassandra off of EBS and onto the local storage which is directly attached to the EC2 instances. […] While the local storage has much less functionality than EBS, the reliability of local storage outweighs the benefits of EBS.
After the outage today, we are going to be investigating doing the same for our Postgres clusters.
One mistake we made was using a single EBS disk to back some of our older master databases
These may sound like truisms to those working on highly available systems, but not to everybody else:
- When talking about high availability, running your application from a single Amazon availability zone is not enough.
- Even if EBS promises “highly available, highly reliable storage volumes”, a solution relying on it will have to account for: 1) failures; 2) unreliable performance.
  An ex-Reddit engineer posted details about the serious issues Reddit noticed while using Amazon EBS.
- Dynamo-style NoSQL databases, where all nodes in a cluster are equal, are able to tolerate failures more easily than a traditional RDBMS.
  Reddit is working on moving Cassandra off EBS and onto the local ephemeral EC2 storage.
- A master/slave replication model combined with the out-of-order commits issue makes me think that the cloud and RDBMS are not yet perfect together.
  Data which had been committed to the slaves was not committed to the masters. In a normal replication scenario, this should never, ever happen. The master commits the data, then tells the slave it is safe to commit the same data.
- One mistake we made was using a single EBS disk to back some of our older master databases
  - Remember the Amazon EBS vs SSD: Price, Performance, QoS?
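To make the replication point above concrete, here is a minimal Python sketch of the invariant the quote describes: whatever a slave has committed must already be committed on the master. The `Node` class and the `replicate` and `ahead_of_master` helpers are hypothetical names made up for illustration; this is a toy model, not how PostgreSQL or Reddit actually implements replication.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical database node that records committed transaction ids."""
    name: str
    committed: list[int] = field(default_factory=list)

    def commit(self, txid: int) -> None:
        self.committed.append(txid)


def replicate(master: Node, slave: Node, txid: int) -> None:
    """Normal ordering: the master commits first, then the slave commits the same data."""
    master.commit(txid)
    slave.commit(txid)


def ahead_of_master(master: Node, slave: Node) -> set[int]:
    """Return transaction ids the slave has committed but the master has not."""
    return set(slave.committed) - set(master.committed)


master, slave = Node("master"), Node("slave")
for txid in (1, 2, 3):
    replicate(master, slave, txid)
healthy = ahead_of_master(master, slave)   # nothing on the slave is missing from the master

# Simulate the anomaly from the quote: a commit that exists only on the slave.
slave.commit(4)
diverged = ahead_of_master(master, slave)  # now contains transaction 4
```

A check like `ahead_of_master` returning a non-empty set is the "should never, ever happen" case: the slave holds data the master never committed, so promoting either node loses or contradicts something.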
What else can we learn from Reddit’s experience?
Original title and link: Reddit’s Story of Running Cassandra & PostgreSQL on Amazon EBS (NoSQL databases © myNoSQL)
via: http://blog.reddit.com/2011/03/why-reddit-was-down-for-6-of-last-24.html
Tuesday, 22 June 2010
MemcacheDB History at Reddit
Steve Huffman (co-founder and programmer of Reddit) speaking at ☞ FOWA Miami 2010 (around min.18:30)[1]:
And then there is another software that is really handy MemcacheDB, which is like memcached but is persistent. […] It’s very very fast, super-handy, we store far more data in MemcacheDB than we do in Postgres
Then bam! MemcacheDB's bursts of blocking writes led Reddit to switch to Cassandra, as friends from Digg or Twitter did.
Lesson learned: take such pieces of advice with a grain of salt and always test your scenario.
[1] It looks like Steve was no longer working at Reddit when the presentation was given, so he might not have been aware of the problems related to MemcacheDB.