EC2: All content tagged as EC2 in NoSQL databases and polyglot persistence
Tuesday, 7 June 2011
Setting Up a 3 Node Riak Cluster With EC2 Cluster Compute Instances
How fast can you set up a demo cluster:
Once your two instances are up and running, I literally followed the Riak documentation at Basic Cluster Setup. It was really that easy. There was one gotcha, though. Make sure that when you choose the IP address to bind (and name) for the nodes in your cluster, you use the EC2 Private IP Address (the DNS name should be fine too).
If you’re looking for something bigger, check this script for launching a 100-node Riak cluster.
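The private-IP gotcha can be sketched in a few lines of Python. This is only an illustration, not part of the original post: the helper names are hypothetical, and it assumes the node name is set with a `-name riak@<ip>` line in Riak's `vm.args`. The metadata URL is the standard EC2 instance metadata endpoint for the private IPv4 address, and it only answers from inside an EC2 instance.

```python
import urllib.request

# EC2 instance metadata endpoint for the instance's private IPv4 address.
METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def fetch_private_ip(url: str = METADATA_URL, timeout: float = 2.0) -> str:
    """Ask the EC2 metadata service for this instance's private IP.

    Only works when run on an EC2 instance.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def riak_node_name(private_ip: str, prefix: str = "riak") -> str:
    """Build an Erlang-style node name such as riak@10.0.0.5."""
    return f"{prefix}@{private_ip}"


def vm_args_name_line(private_ip: str) -> str:
    """Return the line assumed to go into vm.args (hypothetical helper)."""
    return f"-name {riak_node_name(private_ip)}"
```

On an instance, `vm_args_name_line(fetch_private_ip())` would give the line to paste into the config; the public IP should not be used here.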
Original title and link: Setting Up a 3 Node Riak Cluster With EC2 Cluster Compute Instances (NoSQL databases © myNoSQL)
Wednesday, 1 June 2011
HBase on EC2 using EBS volumes : Lessons Learned
There lies the answer! We have a requirement of recreating the cluster in case we accidentally delete the entire data or if we lose our master. In such a case a reliable backup can only be taken if your HDFS data does not reside on the root devices. A reliable backup of the root device cannot be taken without rebooting the device. Furthermore, it’s stored as an AMI, which means you have to create a new AMI every day and delete the old one. This means that to solve all of our problems we need both the HBase installation and the data stored on attached EBS volumes that are not the root devices.
Update: after reading the post both Bradford Stephens[1] and Andrew Purtell[2] recommended using instance store instead of EBS:
EBS adds complexity, failure risk, and cost
[1] CEO of Drawn to Scale
[2] Systems architect and HBase committer, @akpurtell
Original title and link: HBase on EC2 using EBS volumes : Lessons Learned (NoSQL databases © myNoSQL)
via: http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/
Wednesday, 18 May 2011
Membase on Amazon EC2 with EBS
The decision was made and we decided to go with a 2 server solution, each server has 16G of memory and 100G of EBS volume attached to it.
Both will have the latest stable Membase version installed and will run as a cluster in case one falls or anything happens, a fail-safe if you will.
In this post, I will walk you through what was done to achieve this and how exactly it was done on the Amazon cloud.
Wouldn’t it be easier if there were an always up-to-date official Membase AMI and a corresponding guide (making sure important details about EBS are not left out)?
Original title and link: Membase on Amazon EC2 with EBS (NoSQL databases © myNoSQL)
Monday, 9 May 2011
Neo4j REST Server Image in Amazon EC2
OpenCredo created it, Jussi Heinonen shares the details:
Original title and link: Neo4j REST Server Image in Amazon EC2 (NoSQL databases © myNoSQL)
via: http://jussiheinonen.blogspot.com/2011/05/neo4j-graph-database-server-image-in.html
Thursday, 28 April 2011
MongoDB on EC2
The basic setup:
The advanced guide can be found in the MongoDB in the Amazon cloud post.
Original title and link: MongoDB on EC2 (NoSQL databases © myNoSQL)
via: http://blog.mongodb.org/post/4982676520/mongodb-on-ec2-best-practices
Monday, 18 April 2011
Amazon EC2 Cassandra Cluster with DataStax AMI
This AMI does the following:
- installs Cassandra 0.7.4 on an Ubuntu 10.10 image
- configures ephemeral disks in RAID0, if applicable (EBS is a bad fit for Cassandra)
- configures Cassandra to use the root volume for the commitlog and the ephemeral disks for data files
- configures Cassandra to use the local interface for intra-cluster communication
- configures all Cassandra nodes with the same seed for gossip discovery
Note the “EBS is a bad fit for Cassandra”. That’s what Adrian Cockcroft explains in Multi-tenancy and Cloud Storage Performance.
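The layout the AMI sets up can be pictured as a handful of settings. The following is a minimal sketch, not the AMI's actual code: the function name and directory paths are hypothetical, and the key names follow the Cassandra 0.7-era `cassandra.yaml` as I understand it, so check them against your version.

```python
from typing import Dict, List, Union

Setting = Union[str, List[str]]


def cassandra_settings(local_ip: str, seed_ip: str,
                       ephemeral_mount: str = "/mnt/raid0") -> Dict[str, Setting]:
    """Return settings mirroring the layout described in the list above.

    All paths are illustrative assumptions, not the AMI's real paths.
    """
    return {
        # Commit log stays on the root volume.
        "commitlog_directory": "/var/lib/cassandra/commitlog",
        # Data files go to the RAID0 array built from the ephemeral disks.
        "data_file_directories": [f"{ephemeral_mount}/cassandra/data"],
        # Intra-cluster traffic uses the local (private) interface.
        "listen_address": local_ip,
        # Every node is given the same seed for gossip discovery.
        "seeds": [seed_ip],
    }
```

Calling it once per node with that node's private IP and a shared seed IP gives every node the same seed, which is what the last bullet describes.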
Original title and link: Amazon EC2 Cassandra Cluster with DataStax AMI (NoSQL databases © myNoSQL)
via: http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
Multi-tenancy and Cloud Storage Performance
Adrian Cockcroft[1] has a great explanation of the impact of multi-tenancy on cloud storage performance. The connection with NoSQL databases is not necessarily in the Amazon EBS and SSD Price, Performance, QoS comparison, but:
- Reddit’s story of running Cassandra & PostgreSQL on Amazon EBS (nb: their setup led to a prolonged downtime)
- MongoDB in the Amazon Cloud
and
If you ever see public benchmarks of AWS that only use m1.small, they are useless, it shows that the people running the benchmark either didn’t know what they were doing or are deliberately trying to make some other system look better. You cannot expect to get consistent measurements of a system that has a very high probability of multi-tenant interference.
Original title and link: Multi-tenancy and Cloud Storage Performance (NoSQL databases © myNoSQL)
via: http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
Monday, 4 April 2011
Apixio Using Hadoop, Pig and Cassandra for Advanced Analytics on Medical Records
Apixio uses Hadoop and Pig for analysing medical records and Cassandra for serving search queries. All production machines are Amazon EC2 instances.
Bob Rogers, Apixio’s chief scientist, explained the importance of machine learning and unstructured-data analysis in the medical field. He said because of the proliferation of ontologies — area-specific terminology for everything from billing to scan results — any sort of search engine must be able to create degrees of association between the various ontologies, as well as common language.
It sounds like the perfect setup for Brisk.
Original title and link: Apixio Using Hadoop, Pig and Cassandra for Advanced Analytics on Medical Records (NoSQL databases © myNoSQL)
Script for Launching a 100-node Riak Cluster
Remember last week’s discussion about administering and scaling up a Riak cluster on Amazon EC2? Reid Draper created a Python script to launch a 100-node Riak cluster:
The script launches a master node, and notes its IP address. The other 99 nodes are launched and told to join the master. Riak doesn’t currently have provisions to deal with many nodes trying to join the cluster at once. To avoid the thundering-herd problem I simply have each node sleep for a random time, such that nodes are joining, on average, one every 15 seconds.
In his test he got a 95-node (97 after re-adding 2 nodes) cluster up in about 35 minutes.
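The random-sleep trick is easy to sketch. This is not Reid Draper's script, only an illustration under one assumption: each node draws its delay uniformly from a window of `node_count * avg_gap_seconds`, which spaces joins about one per `avg_gap_seconds` on average. The function name and parameters are hypothetical.

```python
import random
from typing import List, Optional


def join_delays(node_count: int, avg_gap_seconds: float = 15.0,
                seed: Optional[int] = None) -> List[float]:
    """Return one random delay per node, sorted in join order.

    Each delay is drawn uniformly from [0, node_count * avg_gap_seconds],
    so joins are spread over the window instead of arriving all at once.
    """
    rng = random.Random(seed)
    window = node_count * avg_gap_seconds
    return sorted(rng.uniform(0, window) for _ in range(node_count))
```

In a real launch each node would draw a single delay for itself, sleep for that long, and only then ask to join the master; the function above computes all delays in one place just to make the spacing visible.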
Original title and link: Script for Launching a 100-node Riak Cluster (NoSQL databases © myNoSQL)
Thursday, 31 March 2011
MongoDB in the Amazon Cloud
A discussion on the MongoDB group about EBS snapshot backups of journaled MongoDB reminded me of Jared Rosoff’s slides “MongoDB on EC2 and EBS”, which cover many important aspects of running MongoDB on the Amazon cloud:
- MongoDB components and their requirements
- deployment options and corresponding Amazon EC2 instance types
- operating systems, specific configurations, and operational advice:
  - deployment automation
  - backups and restoration
  - security
- deployment scenarios:
  - 3-node replica set
  - 2 nodes + arbiter
  - multi-datacenter (availability zone) 3-node replica set
  - sharded MongoDB
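To make the first two deployment scenarios concrete, here is a small sketch that builds the configuration document a replica set is initiated with. It is not taken from the slides: the helper name, set name, and host names are hypothetical, while the `_id`, `members`, `host`, and `arbiterOnly` fields are the standard replica set configuration fields.

```python
from typing import Dict, List, Optional


def replica_set_config(name: str, data_hosts: List[str],
                       arbiter_host: Optional[str] = None) -> Dict[str, object]:
    """Build a replica set config document from host:port strings."""
    members: List[Dict[str, object]] = [
        {"_id": i, "host": host} for i, host in enumerate(data_hosts)
    ]
    if arbiter_host is not None:
        # An arbiter votes in elections but stores no data.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    return {"_id": name, "members": members}
```

A 3-node replica set passes three data hosts and no arbiter; the "2 nodes + arbiter" scenario passes two data hosts plus an `arbiter_host`, which keeps an odd number of voters without a third copy of the data.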
While tempting, running databases in the cloud is not as simple as Amazon makes it sound. Reddit felt that with their Cassandra and PostgreSQL deployment.
Original title and link: MongoDB in the Amazon Cloud (NoSQL databases © myNoSQL)
Tuesday, 29 March 2011
Scaling up a Riak Cluster on Amazon EC2
A conversation about administering and scaling up a Riak cluster on Amazon EC2, captured in Mark Phillips’s Riak recap.
The following resources are mentioned in the conversation:
- Creating a Local Riak Cluster with Vagrant and Chef
- Chef cookbook for Riak autoconf.rb and cluster.rb
- Chef LWRP for Riak
Original title and link: Scaling up a Riak Cluster on Amazon EC2 (NoSQL databases © myNoSQL)
Thursday, 7 October 2010
Hadoop: Cluster Deploy on EC2/UEC Using Puppet and Ubuntu
Once the initial setup of the Puppet master is done and the Hadoop Namenode and Jobtracker are up and running, adding new Hadoop Workers is just one command:
./start_instance.py worker
Puppet automatically configures them to join the Hadoop Cluster.
But explaining how to set up the Puppet master, Hadoop Namenode, and Jobtracker resulted in a very long post. It also looks like there are two versions of the Puppet recipe: Adobe’s for Hadoop/HBase deployments and some code on Launchpad.
Original title and link: Hadoop: Cluster Deploy on EC2/UEC Using Puppet and Ubuntu (NoSQL databases © myNoSQL)