


Puppet: All content tagged as Puppet in NoSQL databases and polyglot persistence

PuppetDB: Configuration Management Database for Puppet

PuppetDB is replacing CouchDB for managing Puppet configurations. It is a service layer written in Clojure with a PostgreSQL back-end, not a graph database:

PuppetDB is a key component of the Puppet Data Library, and brings that to bear in its query API. Resources, facts, nodes, and metrics can all be queried over HTTP. For resources and nodes, there is a simple query language which can be used to form arbitrarily complex requests. The public API is the same one that Puppet uses to make storeconfigs queries (using the "<<| |>>" operator) of PuppetDB, but provides a superset of the functionality provided by storeconfigs.

PuppetDB is faster, smarter, and has more complete data than ever before. […] PuppetDB offers great power over and insight into your infrastructure, and it’s only going to get bigger and better.
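As a rough illustration of the HTTP query API described above, the sketch below builds a resource query URL. The host, the port, and the `/pdb/query/v4/resources` path are assumptions (the path has changed between PuppetDB releases, so check the API documentation for your version), and `build_resource_query_url` is a hypothetical helper, not part of any Puppet library.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; adjust the host, port, and path for your PuppetDB version.
PUPPETDB_BASE = "http://localhost:8080/pdb/query/v4"


def build_resource_query_url(resource_type, title=None, base=PUPPETDB_BASE):
    """Build a URL that asks PuppetDB for resources of a given type.

    The query is a JSON array in prefix form: [operator, field, value].
    """
    query = ["=", "type", resource_type]
    if title is not None:
        # Combine conditions with "and" to narrow the request.
        query = ["and", query, ["=", "title", title]]
    return f"{base}/resources?{urlencode({'query': json.dumps(query)})}"


if __name__ == "__main__":
    # Prints the URL only; no request is sent.
    print(build_resource_query_url("Package", title="openssh-server"))
```

Nesting conditions under `and` in this way is how the "arbitrarily complex requests" mentioned in the quote are composed: each condition is itself a small array, so queries can be built up programmatically.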

Original title and link: PuppetDB: Configuration Management Database for Puppet (NoSQL database©myNoSQL)


NoSQL Databases Configuration Management

After reading about the MarkLogic Packaging feature, I was wondering whether managing configurations would be better done with tools like Puppet or Chef than with a custom-built solution, even one that comes packaged with your NoSQL database.

  • You’ve been working on an application on your development machine. Now it’s time to move your application to the staging or testing servers. What follows is a tedious process of reviewing the settings on your development machine and applying them to the staging machine. How sure are you that you got all the indexes just right?
  • You’ve got a certified configuration that you want to deploy onto a new cluster. Getting the hardware setup and installing the server itself isn’t too hard, but now you have to make sure that all the application servers and databases are setup. Can you see another tedious process coming?

If you’ve been involved in or responsible for managing the configuration of a NoSQL database deployment, I’d really love to learn what solutions and tools you used.

Original title and link: NoSQL Databases Configuration Management (NoSQL database©myNoSQL)

Hadoop and Cassandra in the Top 10 Most Important Open Source Projects of 2011

From the Big Data and NoSQL space: Hadoop and Cassandra. And related to this space: OpenStack and Puppet.

So to judge importance, I looked at projects that are influential, gaining in popularity, and/or technical standouts in new areas. In other words, projects that are even more noteworthy than the other noteworthy projects. This means that many projects that are crucial didn’t make the list.

On my list of projects left out: Redis, Riak, HBase, and MongoDB. Given the explanation of why Android didn’t make the list, I can understand why Redis, Riak, and MongoDB were not included. But I’d have kept HBase.

Original title and link: Hadoop and Cassandra in the Top 10 Most Important Open Source Projects of 2011 (NoSQL database©myNoSQL)


Deploying Cassandra With Puppet

Cassandra is a peer-to-peer architecture which is typically deployed on a large number of servers. Deploying, managing, and upgrading these systems by hand takes more administrative time, especially as your cluster grows. Puppet provides a simple way to install Cassandra.

Puppet is one of the favorite tools for automating deployment on large clusters. Here is an example for HBase/Hadoop deployments and one for deploying a Hadoop cluster on EC2. There is a Puppet recipe for Riak on GitHub, but it doesn’t show a lot of activity.

Getting back to Cassandra, you know what would be cool (if possible)? A Puppet recipe for a rolling upgrade of a Cassandra cluster.
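The control flow of such a rolling upgrade can be sketched without tying it to Puppet. The version below is an assumption about how one might structure it, not an existing recipe: `upgrade_node` and `is_healthy` are hypothetical callbacks that you would implement yourself (for example by draining the node, upgrading the package, restarting the service, and then checking that the node has rejoined the ring).

```python
import time


def rolling_upgrade(nodes, upgrade_node, is_healthy, retries=30, delay=10.0):
    """Upgrade nodes one at a time, stopping at the first node that fails.

    upgrade_node(node) performs the upgrade on a single node.
    is_healthy(node) returns True once the node is serving again.
    Returns the list of nodes that were upgraded successfully.
    """
    upgraded = []
    for node in nodes:
        upgrade_node(node)
        # Wait for the node to come back before touching the next one,
        # so at most one node is out of the cluster at any time.
        for _ in range(retries):
            if is_healthy(node):
                break
            time.sleep(delay)
        else:
            raise RuntimeError(f"{node} did not recover; upgraded so far: {upgraded}")
        upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    # Dry run with stand-in callbacks instead of real hosts.
    done = rolling_upgrade(
        ["cass1", "cass2", "cass3"],
        upgrade_node=lambda n: print(f"upgrading {n}"),
        is_healthy=lambda n: True,
    )
    print(done)
```

Stopping at the first unhealthy node is a deliberate choice here: it keeps a bad upgrade from spreading across the cluster while the remaining nodes continue to serve on the old version.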

Original title and link: Deploying Cassandra With Puppet (NoSQL database©myNoSQL)


Puppet and CouchDB

Starting in Puppet 2.6, it’s possible to store all facts in a CouchDB database. […] Advantages:

  • Facts can be aggregated to a separate service to be queried.
  • We can now access a client’s fact information using Couch’s RESTful interface.
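To make the second point concrete, here is a minimal sketch of reading one node's facts over CouchDB's HTTP interface. CouchDB serves a document at `/{database}/{document_id}`; the database name `puppet`, the use of the node's hostname as the document ID, and the `facts` key inside the document are all assumptions for illustration, so inspect your own database to see how Puppet actually lays out the data.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

COUCH_URL = "http://localhost:5984"  # CouchDB's default port
FACTS_DB = "puppet"                  # assumed database name


def fact_document_url(node_name, couch_url=COUCH_URL, db=FACTS_DB):
    """Return the URL of the document assumed to hold one node's facts."""
    # Document IDs must be URL-encoded; safe="" also encodes "/".
    return f"{couch_url}/{quote(db, safe='')}/{quote(node_name, safe='')}"


def extract_facts(document):
    """Pull the facts out of a decoded document (the "facts" key is an assumption)."""
    return document.get("facts", {})


def fetch_facts(node_name):
    """GET the node's document from CouchDB and return its facts."""
    with urlopen(fact_document_url(node_name)) as response:
        return extract_facts(json.load(response))


if __name__ == "__main__":
    # Offline demonstration with a made-up document instead of a live server.
    sample = {"_id": "web01.example.com", "facts": {"operatingsystem": "Ubuntu"}}
    print(fact_document_url("web01.example.com"))
    print(extract_facts(sample))
```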

Was it possible before to store this data in another storage engine, or is CouchDB the first option?

Original title and link: Puppet and CouchDB (NoSQL databases © myNoSQL)


Hadoop: Cluster Deploy on EC2/UEC Using Puppet and Ubuntu

Once the initial setup of the Puppet master is done and the Hadoop Namenode and Jobtracker are up and running, adding new Hadoop Workers is just one command:

./ worker

Puppet automatically configures them to join the Hadoop Cluster.


But explaining how to set up the Puppet master, Hadoop Namenode, and Jobtracker resulted in a very long post. It also looks like there are two versions of the Puppet recipe: Adobe’s for Hadoop/HBase deployments and ☞ some code on Launchpad.

Original title and link: Hadoop: Cluster Deploy on EC2/UEC Using Puppet and Ubuntu (NoSQL databases © myNoSQL)


Canonical, Ubuntu and NoSQL

Separately, sources close to Canonical have told The Reg that the company is in talks with Cassandra and CouchDB on NoSQL, and start-up PuppetLabs for data-center automation and provisioning.


Canonical is targeting Hadoop and NoSQL – used by hyperscale providers like Yahoo! and Facebook – believing ordinary businesses are now ready to start using them for data processing and analytics.

Keeping in mind that both Hadoop and Cassandra are meant to be used in distributed systems, I’m wondering what exactly Canonical will offer by including these in Ubuntu (note that the secret sauce may be Puppet).


Automating Hadoop/HBase deployments with Puppet

The guys from the Adobe SaaS team — same guys that shared with us their experience and reasons for using HBase — have ☞ open sourced their Puppet[1] recipes for automating Hadoop/HBase deployments.

Right now we are open-sourcing on GitHub, Puppet recipes for:

  • creating the user under which the entire hstack runs.
  • changing system settings, like the ssh keys, authorizing machines to talk to each other, aliases for hadoop and hbase executables, /tmp rules.
  • standalone puppet module to deploy Hadoop
  • standalone puppet module to configure the Hadoop NameNode in High-Availability mode via DRBD, heartbeat and mon. For more details on this recipe check out the cloudera blog post on this topic.
  • standalone puppet module to deploy HBase
  • standalone puppet module to deploy Zookeeper.

Their ☞ announcement gives a lot of details of why they created these recipes and how to use them (nb it would be excellent if the ☞ GitHub project would point back to the article as part of the documentation).

To get an idea of how complex this process can be, check the HBase/Hadoop MacOS Installation Guide. I’d say that these recipes will definitely make things a lot easier!


  • [1] ☞ Puppet: the leading open source tool for data center automation. Puppet helps you save time, gain visibility into your server environment, and ensure consistency across your IT infrastructure.