AWS: All content tagged as AWS in NoSQL databases and polyglot persistence

HBase on EC2 using EBS volumes : Lessons Learned

There lies the answer! We have a requirement of recreating the cluster in case we accidentally delete all the data or lose our master. In such a case a reliable backup can only be taken if your HDFS data does not reside on the root devices. A reliable backup of the root device cannot be taken without rebooting the device. Furthermore, it’s stored as an AMI, which means you have to create a new AMI every day and delete the old one. This means that to solve all of our problems we need both the HBase installation and the data stored on attached EBS volumes that are not the root devices.

Update: after reading the post both Bradford Stephens[1] and Andrew Purtell[2] recommended using instance store instead of EBS:

EBS adds complexity, failure risk, and cost


  1. CEO of Drawn to Scale  

  2. Systems architect and HBase committer, @akpurtell  

Original title and link: HBase on EC2 using EBS volumes : Lessons Learned (NoSQL databases © myNoSQL)

via: http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/


Membase on Amazon EC2 with EBS

We decided to go with a two-server solution: each server has 16 GB of memory and a 100 GB EBS volume attached to it.

Both will have the latest stable version of Membase installed and run as a cluster, a fail-safe in case one of them goes down.

In this post, I will walk you through what was done and exactly how it was done on the Amazon cloud.

Wouldn’t it be easier if there were an always up-to-date official Membase AMI and a corresponding guide (making sure important details about EBS are not left out)?

Original title and link: Membase on Amazon EC2 with EBS (NoSQL databases © myNoSQL)

via: http://www.kensodev.com/2011/05/15/install-membase-1-6-5-3-on-amazon-ec2-and-configure-it-on-ebs/


Neo4j REST Server Image in Amazon EC2

OpenCredo created it, Jussi Heinonen shares the details:

[Image: Neo4j EC2 components]

Original title and link: Neo4j REST Server Image in Amazon EC2 (NoSQL databases © myNoSQL)

via: http://jussiheinonen.blogspot.com/2011/05/neo4j-graph-database-server-image-in.html


MongoDB on EC2

The basic setup:

[Image: MongoDB on EC2]

The advanced guide can be found in the MongoDB in the Amazon cloud post.

Original title and link: MongoDB on EC2 (NoSQL databases © myNoSQL)

via: http://blog.mongodb.org/post/4982676520/mongodb-on-ec2-best-practices


A Rake Task for Backing Up a MongoDB Database

Daniel Doubrovkine:

I tried mongodump and mongorestore. Those are straightforward tools that let you export and import Mongo data (Mongo people did their job very well there, much less hassle than with a traditional RDBMS where you have to backup the database, deal with the transaction log, bla bla bla). All is well when working with local machines. Remotely, you need to go the extra step of figuring out the database address, username and password. This gets messier with Heroku and eventually starts smelling bad.

I want to do this the “Rails Way” by invoking a single rake command that imports and exports Mongo data in any of my environments

So he wrote a Rake task for backing up MongoDB to Amazon S3.
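
The original is a Rake task and is not reproduced here. As a rough Python sketch of the same idea, the snippet below builds the `mongodump` invocation for a named environment and packs the resulting dump directory into a single archive ready for upload. The environment table, hostnames, credentials, and function names are all hypothetical, and the upload to Amazon S3 is deliberately left out.

```python
import tarfile
from pathlib import Path

# Hypothetical per-environment settings; real values would come from app config.
ENVIRONMENTS = {
    "development": {"host": "localhost", "port": 27017, "db": "app_development"},
    "production": {"host": "db.example.com", "port": 27017, "db": "app_production",
                   "username": "backup", "password": "secret"},
}


def mongodump_command(env, out_dir):
    """Return the mongodump argument list for one environment (does not run it)."""
    cfg = ENVIRONMENTS[env]
    cmd = ["mongodump", "--host", cfg["host"], "--port", str(cfg["port"]),
           "--db", cfg["db"], "--out", str(out_dir)]
    # Credentials are only added for environments that define them.
    if "username" in cfg:
        cmd += ["--username", cfg["username"], "--password", cfg["password"]]
    return cmd


def archive_dump(dump_dir, archive_path):
    """Pack a dump directory into a single .tar.gz file suitable for upload."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(dump_dir, arcname=Path(dump_dir).name)
    return Path(archive_path)
```

A caller would run the command with something like `subprocess.run(mongodump_command("production", "dump"), check=True)` and then hand the file returned by `archive_dump` to whatever S3 client is in use.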

Original title and link: A Rake Task for Backing Up a MongoDB Database (NoSQL databases © myNoSQL)

via: http://code.dblock.org/ShowPost.aspx?Id=192


Amazon EC2 Cassandra Cluster with DataStax AMI

This AMI does the following:

  • installs Cassandra 0.7.4 on an Ubuntu 10.10 image
  • configures ephemeral disks in raid0, if applicable (EBS is a bad fit for Cassandra)
  • configures Cassandra to use the root volume for the commitlog and the ephemeral disks for data files
  • configures Cassandra to use the local interface for intra-cluster communication
  • configures all Cassandra nodes with the same seed for gossip discovery

Note the “EBS is a bad fit for Cassandra”. That’s what Adrian Cockcroft explains in Multi-tenancy and Cloud Storage Performance.

Original title and link: Amazon EC2 Cassandra Cluster with DataStax AMI (NoSQL databases © myNoSQL)

via: http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami


Multi-tenancy and Cloud Storage Performance

Adrian Cockcroft[1] has a great explanation of the impact of multi-tenancy on cloud storage performance. The connection with NoSQL databases is not necessarily in the Amazon EBS and SSD Price, Performance, QoS comparison, but:

and

If you ever see public benchmarks of AWS that only use m1.small, they are useless, it shows that the people running the benchmark either didn’t know what they were doing or are deliberately trying to make some other system look better. You cannot expect to get consistent measurements of a system that has a very high probability of multi-tenant interference.


  1. Adrian Cockcroft: Netflix, @adrianco  

Original title and link: Multi-tenancy and Cloud Storage Performance (NoSQL databases © myNoSQL)

via: http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html


Netflix: Run Consistency Checkers All The Time To Fixup Transactions

Todd Hoff about NoSQL and Cloud at Netflix:

You might have consistency problems if you have: multiple datastores in multiple datacenters, without distributed transactions, and with the ability to alternately execute out of each datacenter; syncing protocols that can fail or sync stale data; distributed clients that cache data and then write old data back to the central store; a NoSQL database that doesn’t have transactions between updates of multiple related key-value records; application level integrity checks; client driven optimistic locking.
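
A toy Python sketch of what such a checker could look like. It assumes each record is a `(version, value)` pair and that the higher version wins; both assumptions, and all names, are illustrative only, since the post does not describe Netflix's checkers at this level of detail.

```python
def find_inconsistencies(primary, replica):
    """Return the keys whose (version, value) records differ between two stores."""
    keys = set(primary) | set(replica)
    return sorted(k for k in keys if primary.get(k) != replica.get(k))


def repair(primary, replica):
    """Copy the record with the higher version over the stale or missing one."""
    fixed = []
    for key in find_inconsistencies(primary, replica):
        candidates = [r for r in (primary.get(key), replica.get(key)) if r is not None]
        # On equal versions max() keeps the first candidate, so the primary wins ties.
        newest = max(candidates, key=lambda record: record[0])
        primary[key] = newest
        replica[key] = newest
        fixed.append(key)
    return fixed
```

Run periodically over two dict-like views of the stores, `repair` converges them without distributed transactions, at the price of a window during which readers can still see stale data.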

Original title and link: Netflix: Run Consistency Checkers All The Time To Fixup Transactions (NoSQL databases © myNoSQL)

via: http://highscalability.com/blog/2011/4/6/netflix-run-consistency-checkers-all-the-time-to-fixup-trans.html


Apixio Using Hadoop, Pig and Cassandra for Advanced Analytics on Medical Records

Apixio uses Hadoop and Pig for analysing medical records and Cassandra for serving search queries. All production machines are Amazon EC2 instances.

Bob Rogers, Apixio’s chief scientist, explained the importance of machine learning and unstructured-data analysis in the medical field. He said because of the proliferation of ontologies — area-specific terminology for everything from billing to scan results — any sort of search engine must be able to create degrees of association between the various ontologies, as well as common language.

It sounds like the perfect setup for Brisk.

Original title and link: Apixio Using Hadoop, Pig and Cassandra for Advanced Analytics on Medical Records (NoSQL databases © myNoSQL)

via: http://www.nytimes.com/external/gigaom/2011/04/01/01gigaom-apixio-is-bringing-big-data-to-medical-records-in-95148.html


Script for Launching a 100-node Riak Cluster

Remember last week’s discussion about administering and scaling up a Riak cluster on Amazon EC2? Reid Draper created a Python script to launch a 100-node Riak cluster:

The script launches a master node, and notes its IP address. The other 99 nodes are launched and told to join the master. Riak doesn’t currently have provisions to deal with many nodes trying to join the cluster at once. To avoid the thundering-herd problem I simply have each node sleep for a random time, such that nodes are joining, on average, one every 15 seconds.

In his test he got a 95-node (97 after re-adding 2 nodes) cluster up in about 35 minutes.
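
The staggered join can be sketched in a few lines of Python. This is not Reid Draper's script: the function name and the choice of a uniform distribution are assumptions. The idea is that if each of n nodes sleeps for a uniformly random time between 0 and n × 15 seconds, joins arrive on average one every 15 seconds.

```python
import random


def join_delay(n_nodes, mean_gap_s=15.0, rng=random):
    """Seconds one node should sleep before asking to join the cluster.

    Drawing uniformly between 0 and n_nodes * mean_gap_s spreads n_nodes joins
    so that, on average, one happens every mean_gap_s seconds.
    """
    return rng.uniform(0.0, n_nodes * mean_gap_s)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    delays = sorted(join_delay(99, rng=rng) for _ in range(99))
    print(f"first join after {delays[0]:.0f}s, last after {delays[-1]:.0f}s")
```

Each node would call `time.sleep(join_delay(99))` before joining. Random delays need no coordination between nodes, but they only spread joins out on average, so two nodes can still try to join at nearly the same moment.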

Original title and link: Script for Launching a 100-node Riak Cluster (NoSQL databases © myNoSQL)

via: http://reiddraper.com/100-node-riak-cluster/


MongoDB in the Amazon Cloud

A discussion on the MongoDB group about EBS snapshot backups of journaled MongoDB reminded me of Jared Rosoff’s slides “MongoDB on EC2 and EBS”, which cover many important aspects of running MongoDB on the Amazon cloud:

  • MongoDB components and their requirements

    MongoDB components

  • deployment options and corresponding Amazon EC2 instance types

    MongoDB and Amazon EC2 instance types

  • operating systems, specific configurations, and operational advice:

    • deployment automation
    • backups and restoration
    • security
  • deployment scenarios:

    • 3-node replica set
    • 2-nodes + arbiter
    • multi-datacenter (availability zone) 3-node replica set
    • sharded MongoDB
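
As an illustration of the "2 nodes + arbiter" scenario, this Python sketch builds a replica set configuration document with the `_id`, `members`, `host`, and `arbiterOnly` fields that MongoDB uses for replica set initiation. The set name and hostnames are placeholders, and the helper function is hypothetical, not taken from the slides.

```python
def replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set config: data-bearing members plus an optional arbiter."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # An arbiter votes in elections but stores no data.
        members.append({"_id": len(members), "host": arbiter_host, "arbiterOnly": True})
    return {"_id": name, "members": members}


config = replica_set_config(
    "rs0",
    ["node-a.example.com:27017", "node-b.example.com:27017"],
    arbiter_host="arbiter.example.com:27017",
)
```

A document of this shape can be passed to `rs.initiate()` in the mongo shell or sent through a driver as the `replSetInitiate` admin command. For the 3-node scenarios, pass three data hosts and no arbiter.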

While tempting, running databases in the cloud is not as simple as Amazon makes it sound. Reddit felt that with their Cassandra and PostgreSQL deployment.

Original title and link: MongoDB in the Amazon Cloud (NoSQL databases © myNoSQL)


NoSQL & Cloud at Netflix

Today Netflix can be seen as a leader in what can be achieved by combining cloud computing and polyglot persistence. Not only that, but Netflix has chosen to share what it has learned so that everyone else can benefit.

Netflix’s migration from an on-premise architecture built on relational databases has been documented over time. Here are a couple of important points in the history of that migration, from the classical architecture to the mostly-in-the-cloud solution they currently run and continue to experiment with and build on:

And it doesn’t stop here. In the video below, Siddharth “Sid” Anand covers the answers to some questions that are in the mind of everyone considering NoSQL databases in the cloud:

  • What sort of data can you move to NoSQL?
  • Which NoSQL technologies are we working with?
  • How did we translate RDBMS concepts to NoSQL?

Original title and link: NoSQL & Cloud at Netflix (NoSQL databases © myNoSQL)

via: http://techblog.netflix.com/2011/03/nosql-netflix-talk-part-1.html