EC2: All content tagged as EC2 in NoSQL databases and polyglot persistence

Setting Up a 3 Node Riak Cluster With EC2 Cluster Compute Instances

How fast can you set up a demo cluster:

Once your two instances are up and running, I literally followed the Riak documentation at Basic Cluster Setup. It was really that easy. There was one gotcha, though: make sure that when you choose the IP address to bind (and name) the nodes in your cluster, you use the EC2 private IP address (the private DNS name should be fine too).
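The binding/join step boils down to something like this (my own sketch, not from the original post; the private IPs are made up and the exact riak-admin syntax depends on the Riak version):

    # On each node, name the node after its EC2 private IP in vm.args:
    #   -name riak@10.0.1.11
    # and point the listeners in app.config at the same private address.

    # Then join the remaining nodes to the first one:
    riak-admin join riak@10.0.1.10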

If you’re looking for something bigger, check out this script for launching a 100-node Riak cluster.

Original title and link: Setting Up a 3 Node Riak Cluster With EC2 Cluster Compute Instances (NoSQL databases © myNoSQL)

via: http://adamschepis.com/blog/2011/06/06/setting-up-a-3-node-riak-cluster-with-ec2-cluster-compute-instances/


HBase on EC2 using EBS volumes : Lessons Learned

There lies the answer! We have a requirement of recreating the cluster in case we accidentally delete the entire data set or if we lose our master. In such a case a reliable backup can only be taken if your HDFS data does not reside on the root devices. A reliable backup of the root device cannot be taken without rebooting the device. Furthermore, it’s stored as an AMI, which means you have to create a new AMI every day and delete the old one. This means that to solve all of our problems we need the HBase installation and data both stored on attached EBS volumes that are not root devices.
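In practice that means attaching a dedicated EBS volume and pointing HDFS at it; a rough sketch (device names, mount point, and volume id are placeholders, not taken from the post):

    ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf
    mkfs.ext3 /dev/sdf            # may show up as /dev/xvdf on newer kernels
    mkdir -p /mnt/ebs/hdfs
    mount /dev/sdf /mnt/ebs/hdfs
    # hdfs-site.xml then points dfs.data.dir (and dfs.name.dir on the master)
    # at the EBS mount, so EBS snapshots capture the actual data.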

Update: after reading the post, both Bradford Stephens[1] and Andrew Purtell[2] recommended using the instance store instead of EBS:

EBS adds complexity, failure risk, and cost


  1. CEO of Drawn to Scale  

  2. Systems architect and HBase committer, @akpurtell  

Original title and link: HBase on EC2 using EBS volumes : Lessons Learned (NoSQL databases © myNoSQL)

via: http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/


Membase on Amazon EC2 with EBS

The decision was made: we went with a two-server solution, each server with 16GB of memory and a 100GB EBS volume attached to it.

Both will have the latest stable version of Membase installed and will run as a cluster, so if one node fails or anything happens the other keeps serving; a failsafe, if you will.

In this post, I will walk you through what was done and how exactly it was done on the Amazon cloud.

Wouldn’t it be easier if there were an always up-to-date official Membase AMI and a corresponding guide (one making sure important details about EBS are not left out)?

Original title and link: Membase on Amazon EC2 with EBS (NoSQL databases © myNoSQL)

via: http://www.kensodev.com/2011/05/15/install-membase-1-6-5-3-on-amazon-ec2-and-configure-it-on-ebs/


Neo4j REST Server Image in Amazon EC2

OpenCredo created it, and Jussi Heinonen shares the details:

[Image: Neo4j EC2 components]

Original title and link: Neo4j REST Server Image in Amazon EC2 (NoSQL databases © myNoSQL)

via: http://jussiheinonen.blogspot.com/2011/05/neo4j-graph-database-server-image-in.html


MongoDB on EC2

The basic setup:

[Image: MongoDB on EC2 basic setup]

The advanced guide can be found in the MongoDB in the Amazon cloud post.

Original title and link: MongoDB on EC2 (NoSQL databases © myNoSQL)

via: http://blog.mongodb.org/post/4982676520/mongodb-on-ec2-best-practices


Amazon EC2 Cassandra Cluster with DataStax AMI

This AMI does the following:

  • installs Cassandra 0.7.4 on an Ubuntu 10.10 image
  • configures ephemeral disks in raid0, if applicable (EBS is a bad fit for Cassandra)
  • configures Cassandra to use the root volume for the commitlog and the ephemeral disks for data files
  • configures Cassandra to use the local interface for intra-cluster communication
  • configures all Cassandra nodes with the same seed for gossip discovery

Note the “EBS is a bad fit for Cassandra”. That’s what Adrian Cockcroft explains in Multi-tenancy and Cloud Storage Performance.
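For reference, the raid0-plus-directories layout amounts to roughly this (my own approximation; device names and paths are assumptions, not taken from the DataStax scripts):

    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.xfs /dev/md0
    mkdir -p /raid0/cassandra
    mount /dev/md0 /raid0/cassandra
    # cassandra.yaml (0.7.x):
    #   data_file_directories: [/raid0/cassandra/data]     # ephemeral raid0
    #   commitlog_directory: /var/lib/cassandra/commitlog  # root volume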

Original title and link: Amazon EC2 Cassandra Cluster with DataStax AMI (NoSQL databases © myNoSQL)

via: http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami


Multi-tenancy and Cloud Storage Performance

Adrian Cockcroft[1] has a great explanation of the impact of multi-tenancy on cloud storage performance. The connection with NoSQL databases is not necessarily in the Amazon EBS and SSD Price, Performance, QoS comparison, but in observations like this one:

If you ever see public benchmarks of AWS that only use m1.small, they are useless, it shows that the people running the benchmark either didn’t know what they were doing or are deliberately trying to make some other system look better. You cannot expect to get consistent measurements of a system that has a very high probability of multi-tenant interference.


  1. Adrian Cockcroft: Netflix, @adrianco  

Original title and link: Multi-tenancy and Cloud Storage Performance (NoSQL databases © myNoSQL)

via: http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html


Apixio Using Hadoop, Pig and Cassandra for Advanced Analytics on Medical Records

Apixio uses Hadoop and Pig for analysing medical records and Cassandra for serving search queries. All production machines are Amazon EC2 instances.

Bob Rogers, Apixio’s chief scientist, explained the importance of machine learning and unstructured-data analysis in the medical field. He said because of the proliferation of ontologies — area-specific terminology for everything from billing to scan results — any sort of search engine must be able to create degrees of association between the various ontologies, as well as common language.

It sounds like the perfect setup for Brisk.

Original title and link: Apixio Using Hadoop, Pig and Cassandra for Advanced Analytics on Medical Records (NoSQL databases © myNoSQL)

via: http://www.nytimes.com/external/gigaom/2011/04/01/01gigaom-apixio-is-bringing-big-data-to-medical-records-in-95148.html


Script for Launching a 100-node Riak Cluster

Remember last week’s discussion about administering and scaling up a Riak cluster on Amazon EC2? Reid Draper created a Python script to launch a 100-node Riak cluster:

The script launches a master node, and notes its IP address. The other 99 nodes are launched and told to join the master. Riak doesn’t currently have provisions to deal with many nodes trying to join the cluster at once. To avoid the thundering-herd problem I simply have each node sleep for a random time, such that nodes are joining, on average, one every 15 seconds.
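The node-side logic is simple enough to sketch in a few lines (this is my own approximation, not Reid’s script, which is Python and also drives the EC2 launches; the master address and sleep bound are made up):

    MASTER=riak@10.0.0.10
    sleep $(( RANDOM % 1500 ))    # spread 99 joins out to roughly one every ~15 seconds
    riak-admin join $MASTER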

In his test he got a 95-node (97 after re-adding 2 nodes) cluster up in about 35 minutes.

Original title and link: Script for Launching a 100-node Riak Cluster (NoSQL databases © myNoSQL)

via: http://reiddraper.com/100-node-riak-cluster/


MongoDB in the Amazon Cloud

A discussion on the MongoDB group about EBS snapshot backups of journaled MongoDB reminded me of Jared Rosoff’s slides “MongoDB on EC2 and EBS” covering many important aspects of running MongoDB on the Amazon cloud:

  • MongoDB components and their requirements

    [Image: MongoDB components]

  • deployment options and corresponding Amazon EC2 instance types

    [Image: MongoDB and Amazon EC2 instance types]

  • operating systems, specific configurations, and operational advice:

    • deployment automation
    • backups and restoration
    • security
  • deployment scenarios:

    • 3-node replica set
    • 2 nodes + arbiter
    • multi-datacenter (availability zone) 3-node replica set
    • sharded MongoDB
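As for the EBS snapshot backups mentioned in that group discussion, the classic pattern looks roughly like this (my sketch; the volume id is a placeholder, the shell helpers vary by MongoDB version, and with journaling on a single volume the lock step may not be needed):

    mongo admin --eval 'db.runCommand({fsync: 1, lock: 1})'    # flush and block writes
    ec2-create-snapshot vol-xxxxxxxx -d "mongodb data volume"
    mongo admin --eval 'db.fsyncUnlock()'                      # release the lock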

While tempting, running databases in the cloud is not as simple as Amazon makes it sound. Reddit felt that with their Cassandra and PostgreSQL deployment.

Original title and link: MongoDB in the Amazon Cloud (NoSQL databases © myNoSQL)


Scaling up a Riak Cluster on Amazon EC2

A conversation about administering and scaling up a Riak cluster on Amazon EC2, captured in Mark Phillips’ Riak Recap.

The following resources are mentioned in the conversation:

Original title and link: Scaling up a Riak Cluster on Amazon EC2 (NoSQL databases © myNoSQL)


Hadoop: Cluster Deploy on EC2/UEC Using Puppet and Ubuntu

Once the initial setup of the Puppet master is done and the Hadoop Namenode and Jobtracker are up and running, adding new Hadoop workers is just one command:

./start_instance.py worker

Puppet automatically configures them to join the Hadoop Cluster.

[Image: Hadoop Puppet cluster]
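Stripped of the Puppet recipes, that one command boils down to launching an instance whose first boot installs the Puppet agent and points it at the master (my own sketch; the AMI id, instance type, and hostnames are placeholders, and the real logic lives in start_instance.py and the recipes below):

    ec2-run-instances ami-xxxxxxxx -t m1.large -f bootstrap-worker.sh
    # bootstrap-worker.sh (run at first boot via user-data):
    #   apt-get install -y puppet
    #   puppet agent --server puppet-master.internal --waitforcert 60
    # The master's manifests then configure the datanode/tasktracker and
    # join the new worker to the cluster.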

But explaining how to set up the Puppet master, Hadoop Namenode, and Jobtracker resulted in a very long post. It also looks like there are two versions of the Puppet recipes: Adobe’s for Hadoop/HBase deployments and ☞ some code on Launchpad.

Original title and link: Hadoop: Cluster Deploy on EC2/UEC Using Puppet and Ubuntu (NoSQL databases © myNoSQL)

via: http://ubuntumathiaz.wordpress.com/2010/09/27/deploying-a-hadoop-cluster-on-ec2uec-with-puppet-and-ubuntu-maverick/