AWS: All content tagged as AWS in NoSQL databases and polyglot persistence
Wednesday, 1 June 2011
HBase on EC2 using EBS volumes : Lessons Learned
There lies the answer! We have a requirement of recreating the cluster in case we accidentally delete all our data or lose our master. In such a case, a reliable backup can only be taken if your HDFS data does not reside on the root devices. A reliable backup of the root device cannot be taken without rebooting the device. Furthermore, it’s stored as an AMI, which means you have to create a new AMI every day and delete the old one. To solve all of our problems, we therefore need both the HBase installation and its data stored on attached EBS volumes that are not the root devices.
Update: after reading the post both Bradford Stephens[1] and Andrew Purtell[2] recommended using instance store instead of EBS:
EBS adds complexity, failure risk, and cost
[1] CEO of Drawn to Scale
[2] Systems architect and HBase committer, @akpurtell
Original title and link: HBase on EC2 using EBS volumes : Lessons Learned (NoSQL databases © myNoSQL)
via: http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/
Wednesday, 18 May 2011
Membase on Amazon EC2 with EBS
We decided to go with a two-server solution; each server has 16 GB of memory and a 100 GB EBS volume attached.
Both will have the latest stable version of Membase installed and run as a cluster, as a failsafe in case one server goes down.
In this post, I will walk you through what was done and exactly how it was done on the Amazon cloud.
Wouldn’t it be easier if there would be an always up-to-date official Membase AMI and the corresponding guide (making sure important details about EBS are not left out)?
Monday, 9 May 2011
Neo4j REST Server Image in Amazon EC2
OpenCredo created it, Jussi Heinonen shares the details:
via: http://jussiheinonen.blogspot.com/2011/05/neo4j-graph-database-server-image-in.html
Thursday, 28 April 2011
MongoDB on EC2
The basic setup:
The advanced guide can be found in the MongoDB in the Amazon cloud post.
via: http://blog.mongodb.org/post/4982676520/mongodb-on-ec2-best-practices
Tuesday, 19 April 2011
A Rake Task for Backing Up a MongoDB Database
Daniel Doubrovkine:
I tried mongodump and mongorestore. Those are straightforward tools that let you export and import Mongo data (Mongo people did their job very well there, much less hassle than with a traditional RDBMS where you have to backup the database, deal with the transaction log, bla bla bla). All is well when working with local machines. Remotely, you need to go the extra step of figuring out the database address, username and password. This gets messier with Heroku and eventually starts smelling bad.
I want to do this the “Rails Way” by invoking a single rake command that imports and exports Mongo data in any of my environments.
So he wrote a Rake task for backing up MongoDB to Amazon S3.
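The "figuring out the database address, username and password" step can be sketched roughly as follows. This is not Daniel's Rake task (which is Ruby and also uploads to S3); it is a minimal Python illustration that only turns a `mongodb://` connection URL into a `mongodump` command line. The function name `mongodump_args`, the example URL, and the output directory are all hypothetical.

```python
from urllib.parse import urlparse


def mongodump_args(mongo_url, out_dir):
    """Build a mongodump command line from a mongodb:// connection URL."""
    parsed = urlparse(mongo_url)
    args = ["mongodump", "--host", parsed.hostname or "localhost"]
    if parsed.port:
        args += ["--port", str(parsed.port)]
    if parsed.username:
        args += ["--username", parsed.username]
    if parsed.password:
        args += ["--password", parsed.password]
    # The database name is the URL path without its leading slash.
    db_name = parsed.path.lstrip("/")
    if db_name:
        args += ["--db", db_name]
    args += ["--out", out_dir]
    return args


if __name__ == "__main__":
    # Hypothetical URL of the kind a hosted MongoDB add-on exposes.
    url = "mongodb://user:secret@db.example.com:27017/myapp_production"
    print(" ".join(mongodump_args(url, "/tmp/backup")))
```

The resulting list could be handed to `subprocess.run`, after which the dump directory would still need to be archived and shipped to S3.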
Monday, 18 April 2011
Amazon EC2 Cassandra Cluster with DataStax AMI
This AMI does the following:
- installs Cassandra 0.7.4 on an Ubuntu 10.10 image
- configures ephemeral disks in RAID0, if applicable (EBS is a bad fit for Cassandra)
- configures Cassandra to use the root volume for the commitlog and the ephemeral disks for data files
- configures Cassandra to use the local interface for intra-cluster communication
- configures all Cassandra nodes with the same seed for gossip discovery
Note the “EBS is a bad fit for Cassandra”. That’s what Adrian Cockcroft explains in Multi-tenancy and Cloud Storage Performance.
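To make the list above more concrete, here is a rough Python sketch of the per-node settings such an AMI might write. It is not the DataStax AMI's actual script. The key names (`commitlog_directory`, `data_file_directories`, `listen_address`, `seeds`) are what I recall from the 0.7-series `cassandra.yaml` and should be checked against your version; the `/raid0` mount point, the other paths, and both function names are assumptions made for illustration.

```python
def node_settings(private_ip, seed_ip, has_ephemeral_raid):
    """Return the settings described above for one node (paths are assumed)."""
    # Data files go on the RAID0 ephemeral array when there is one.
    data_root = "/raid0/cassandra" if has_ephemeral_raid else "/var/lib/cassandra"
    return {
        # The commit log stays on the root volume.
        "commitlog_directory": "/var/lib/cassandra/commitlog",
        "data_file_directories": [data_root + "/data"],
        # Intra-cluster traffic uses the instance's local interface.
        "listen_address": private_ip,
        # Every node is given the same seed for gossip discovery.
        "seeds": [seed_ip],
    }


def render_yaml(settings):
    """Render the flat settings dict as simple YAML text."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            lines.append(key + ":")
            lines.extend("    - " + str(item) for item in value)
        else:
            lines.append(key + ": " + str(value))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_yaml(node_settings("10.0.0.5", "10.0.0.1", True)))
```

Keeping the commit log and the data files on separate devices is the point of the split: sequential commit log writes do not compete with data file I/O.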
via: http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
Multi-tenancy and Cloud Storage Performance
Adrian Cockcroft[1] has a great explanation of the impact of multi-tenancy on cloud storage performance. The connection with NoSQL databases is not necessarily in the Amazon EBS and SSD Price, Performance, QoS comparison, but:
- Reddit’s story of running Cassandra & PostgreSQL on Amazon EBS (nb: their setup led to a prolonged downtime)
- MongoDB in the Amazon Cloud
and
If you ever see public benchmarks of AWS that only use m1.small, they are useless, it shows that the people running the benchmark either didn’t know what they were doing or are deliberately trying to make some other system look better. You cannot expect to get consistent measurements of a system that has a very high probability of multi-tenant interference.
via: http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
Wednesday, 13 April 2011
Netflix: Run Consistency Checkers All The Time To Fixup Transactions
Todd Hoff about NoSQL and Cloud at Netflix:
You might have consistency problems if you have: multiple datastores in multiple datacenters, without distributed transactions, and with the ability to alternately execute out of each datacenter; syncing protocols that can fail or sync stale data; distributed clients that cache data and then write old data back to the central store; a NoSQL database that doesn’t have transactions between updates of multiple related key-value records; application level integrity checks; client driven optimistic locking.
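As a toy illustration of what a consistency checker does, the Python sketch below compares two key-value snapshots and repairs the second from the first. It says nothing about how Netflix implements theirs. The function names are hypothetical, and treating one store as the source of truth is an assumption that a real multi-datacenter setup may not be able to make.

```python
def find_inconsistencies(primary, replica):
    """Compare two key-value snapshots and group the differing keys by kind."""
    missing = sorted(k for k in primary if k not in replica)
    stale = sorted(k for k in primary if k in replica and replica[k] != primary[k])
    orphaned = sorted(k for k in replica if k not in primary)
    return {"missing": missing, "stale": stale, "orphaned": orphaned}


def repair(primary, replica):
    """Fix up the replica in place, assuming the primary is authoritative."""
    report = find_inconsistencies(primary, replica)
    for key in report["missing"] + report["stale"]:
        replica[key] = primary[key]
    for key in report["orphaned"]:
        del replica[key]
    return report


if __name__ == "__main__":
    primary = {"a": 1, "b": 2, "c": 3}
    replica = {"a": 1, "b": 9, "d": 4}
    print(repair(primary, replica))
```

Run on a schedule, a checker of this shape turns silent divergence into a report that can be monitored.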
Monday, 4 April 2011
Apixio Using Hadoop, Pig and Cassandra for Advanced Analytics on Medical Records
Apixio uses Hadoop and Pig for analysing medical records and Cassandra for serving search queries. All production machines are Amazon EC2 instances.
Bob Rogers, Apixio’s chief scientist, explained the importance of machine learning and unstructured-data analysis in the medical field. He said because of the proliferation of ontologies — area-specific terminology for everything from billing to scan results — any sort of search engine must be able to create degrees of association between the various ontologies, as well as common language.
It sounds like the perfect setup for Brisk.
Script for Launching a 100-node Riak Cluster
Remember last week’s discussion about administering and scaling up a Riak cluster on Amazon EC2? Reid Draper created a Python script to launch a 100-node Riak cluster:
The script launches a master node, and notes its IP address. The other 99 nodes are launched and told to join the master. Riak doesn’t currently have provisions to deal with many nodes trying to join the cluster at once. To avoid the thundering-herd problem I simply have each node sleep for a random time, such that nodes are joining, on average, one every 15 seconds.
In his test he got a 95-node (97 after re-adding 2 nodes) cluster up in about 35 minutes.
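The random-sleep trick is easy to sketch. This is not Reid's script, only a minimal Python illustration of the idea under one assumption: each node picks a delay uniformly from a window of `node_count * mean_gap_seconds`, so that joins arrive roughly one every 15 seconds on average. The function name `join_delays` and its parameters are hypothetical.

```python
import random


def join_delays(node_count, mean_gap_seconds, rng=None):
    """Pick one random start delay per node to spread out cluster joins."""
    if rng is None:
        rng = random.Random()
    # Spreading N nodes uniformly over N * gap seconds gives an average
    # spacing of about one join per gap.
    window = node_count * mean_gap_seconds
    return [rng.uniform(0, window) for _ in range(node_count)]


if __name__ == "__main__":
    delays = sorted(join_delays(99, 15, random.Random(42)))
    print(f"first join after {delays[0]:.0f}s, last after {delays[-1] / 60:.1f} minutes")
```

On a real node the chosen delay would simply be passed to `time.sleep` before issuing the join. Random delays do not guarantee that two nodes never join at the same moment; they only make a pile-up unlikely.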
Thursday, 31 March 2011
MongoDB in the Amazon Cloud
A discussion on the MongoDB group about EBS snapshot backups of journaled MongoDB reminded me of Jared Rosoff’s slides “MongoDB on EC2 and EBS”, which cover many important aspects of running MongoDB on the Amazon cloud:
- MongoDB components and their requirements
- deployment options and corresponding Amazon EC2 instance types
- operating systems, specific configurations, and operational advice:
  - deployment automation
  - backups and restoration
  - security
- deployment scenarios:
  - 3-node replica set
  - 2 nodes + arbiter
  - multi-datacenter (availability zone) 3-node replica set
  - sharded MongoDB
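For readers who have not set one up, the first two deployment scenarios differ only in the replica set configuration document. The Python sketch below builds that document as a plain dict; it is not taken from the slides. The fields `_id`, `members`, `host`, and `arbiterOnly` are the ones MongoDB's replica set configuration uses, while the function name, the set name `rs0`, and the host names are made up for the example.

```python
def replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set configuration document as a plain dict."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # An arbiter votes in elections but holds no data.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    return {"_id": name, "members": members}


if __name__ == "__main__":
    hosts = ["db1.example.com:27017", "db2.example.com:27017"]
    # 3-node replica set
    print(replica_set_config("rs0", hosts + ["db3.example.com:27017"]))
    # 2 nodes + arbiter
    print(replica_set_config("rs0", hosts, "arb.example.com:27017"))
```

A document of this shape is what gets passed to `rs.initiate()` in the mongo shell. For the multi-availability-zone variant, the structure is the same and only the placement of the hosts changes.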
While tempting, running databases in the cloud is not as simple as Amazon makes it sound. Reddit felt that with their Cassandra and PostgreSQL deployment.
Wednesday, 30 March 2011
NoSQL & Cloud at Netflix
Today Netflix can be seen as a leader in what can be achieved by combining cloud computing and polyglot persistence. Not only that, but Netflix has chosen to share what it has learned so that everyone else can benefit from it.
Netflix’s experience of migrating from an on-premises architecture built on relational databases has been documented over time. Here are a few important points in the history of that migration, from the classical architecture to the mostly-in-the-cloud solution they are currently running and continuing to experiment with and build on:
- Challenges of a Hybrid solution: Oracle - Amazon SimpleDB
- Practical tips for optimizing Amazon SimpleDB access
- Netflix’s transition to High-Availability storage systems
- Why Netflix picked Amazon SimpleDB, Hadoop/HBase, and Cassandra
- The key technical challenge of cloud computing
And it doesn’t stop here. In the video below, Siddharth “Sid” Anand answers some questions that are on the mind of everyone considering NoSQL databases in the cloud:
- What sort of data can you move to NoSQL?
- Which NoSQL technologies are we working with?
- How did we translate RDBMS concepts to NoSQL?
via: http://techblog.netflix.com/2011/03/nosql-netflix-talk-part-1.html

