


azure: All content tagged as azure in NoSQL databases and polyglot persistence

Optimizing Joins running on HDInsight Hive on Azure

Two notable things in Denny Lee’s post about optimizing some of the Hive joins used by Microsoft’s Online Services Division:

  1. Microsoft is drinking its own HDInsight on Azure champagne. This should take the HDInsight product far, as the team will always have first-hand feedback about the parts of the system that need improvement.
  2. Know the different types of JOINs supported by Hive and don’t be afraid of experimenting.
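
To make the second point a bit more concrete, here is a minimal Python sketch of the idea behind two join strategies Hive offers: a common (reduce-side) join, which groups both inputs by key, and a map join, which loads the small table into memory and streams the big one past it. This only illustrates the concept and is not Hive's implementation; the function names and the sample tables are made up for the example.

```python
from collections import defaultdict

def shuffle_join(left, right):
    """Reduce-side style join: group both inputs by key, then pair rows per key."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return sorted(
        (key, l, r)
        for key, (ls, rs) in groups.items()
        for l in ls
        for r in rs
    )

def map_join(big, small):
    """Map-side style join: hash the small input in memory, stream the big one."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    return sorted(
        (key, b, s)
        for key, b in big
        for s in lookup.get(key, [])
    )

# Hypothetical sample data: (user_id, page) and (user_id, name).
page_views = [(1, "home"), (2, "search"), (1, "cart"), (3, "home")]
users = [(1, "ann"), (2, "bob")]

# Both strategies produce the same inner-join result; they differ in how
# much data has to be moved around to get there.
print(shuffle_join(page_views, users))
print(map_join(page_views, users))
```

The map join avoids the grouping step entirely, which is why it tends to pay off when one side of the join is small enough to fit in memory.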

✚ An extra point for the link to Liyin Tang and Namit Jain’s Join strategies in Hive (PDF)

Original title and link: Optimizing Joins running on HDInsight Hive on Azure (NoSQL database©myNoSQL)


Microsoft Azure Sales Top $1 Billion Challenging Amazon

Last week I saw some guesstimates of Amazon Web Services’ revenue. Now Bloomberg has posted an article putting the revenue of Microsoft Azure and related programs (?) at $1 billion.

Interesting numbers:

  • market share: Amazon Web Services 71%, Microsoft Azure 20%
  • Azure grew 48% in the last 6 months
  • Gartner estimates the infrastructure segment of the cloud market at $6.17 billion in 2012, growing to $30.6 billion in 2017
  • Gartner estimates the total cloud market at $108.9 billion in 2012, growing to $237.2 billion in 2017. (nb: I find this one weird as it includes online advertising and other less-cloudy-services-imo).
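
For a sense of scale, the two Gartner estimates can be turned into implied annual growth rates. The sketch below assumes steady compound growth over the five years from 2012 to 2017, which is my simplification and not something the Gartner figures state; under that assumption the infrastructure segment grows by roughly 38% per year and the total market by roughly 17% per year.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by a start value, end value, and span."""
    return (end / start) ** (1 / years) - 1

# Gartner estimates quoted above, in billions of dollars (2012 -> 2017).
infrastructure = cagr(6.17, 30.6, 5)
total_cloud = cagr(108.9, 237.2, 5)

print(f"infrastructure segment: {infrastructure:.1%} per year")
print(f"total cloud market:     {total_cloud:.1%} per year")
```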

Amazon hasn’t given many details about the AWS platform, except 3 numbers:

  1. number of objects stored in S3. This has been doubling every year for the last 4 years
    1. Q4 2012: 1.3 trillion
    2. Q3 2011: 566b
    3. Q4 2010: 262b
    4. Q4 2009: 102b
    5. Q4 2008: 40b
    6. Q4 2007: 14b
    7. Q4 2006: 2.9b
  2. number of requests per second handled by AWS
  3. number of EMR clusters (?) spun up
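
The S3 object counts listed under the first number make it easy to check the "doubling" claim. The sketch below computes the growth factor between consecutive data points, taking the figures exactly as listed; note that the 2011 entry is labelled Q3, so the last step covers five quarters and the one before it three, not a full year each.

```python
# S3 object counts from the list above, in billions, oldest first.
s3_objects_billions = [
    ("Q4 2006", 2.9),
    ("Q4 2007", 14.0),
    ("Q4 2008", 40.0),
    ("Q4 2009", 102.0),
    ("Q4 2010", 262.0),
    ("Q3 2011", 566.0),
    ("Q4 2012", 1300.0),
]

# Growth factor from each data point to the next one.
growth = [
    (later_label, later / earlier)
    for (_, earlier), (later_label, later)
    in zip(s3_objects_billions, s3_objects_billions[1:])
]

for label, factor in growth:
    print(f"{label}: x{factor:.2f} over the previous data point")
```

Every step comes out above 2x, so "doubling" is, if anything, an understatement for these data points.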

According to some slides from last October/November:

  1. S3 stored over 1.3 trillion objects
  2. AWS handles over 830k requests/s
  3. 3.7 million EMR clusters spun up since 2010

While I don’t have any data about RDS and Dynamo, it would be great if Microsoft released some details about Azure.

✚ If AWS has a market share of 71% and Azure 20%, that leaves Google plus others with 9%. Makes me wonder how accurate this data is.



Halo 4: A Success Case Study of HDInsight, Microsoft's Hadoop on Azure

Apart from a few too many businessy words, this is a nice story about using HDInsight, the Hadoop solution for Windows developed by Microsoft and Hortonworks:

Behind the scenes, a powerful new Microsoft technology platform called HDInsight was capturing data from the cloud and feeding daily game statistics to the tournament’s operator, Virgin Gaming. Virgin not only used the data to update online leaderboards each day; it also relied on the data to detect cheaters, removing them from the boards to ensure that the right gamers got the chance to win.

But this new technology didn’t just support the Infinity Challenge. From day one, the Xbox 360 game has been using the Hadoop open source framework to gain deep insights into players. The Halo 4 development team at 343 Industries is taking these insights and updating the game almost weekly, using direct player feedback to tweak the game. In the process, the game’s multiplayer ecosystem continues to evolve with the community as the title matures in the marketplace.



Microsoft Hadoop Grand Vision: Apache Hadoop for Windows Server and Windows Azure

I’m still not sure how many people are planning to run a Hadoop cluster on top of Windows Server (I initially had doubts about Hadoop on Azure too, but looking at the bigger picture it starts to make sense), but Microsoft’s vision of integrating Hadoop into its toolchain is quite sound. The slide deck embedded below offers a glimpse of Microsoft’s perspective on Big Data, data integration, and BI:

Microsoft Hadoop Grand Vision

Beginners' Guide to MongoDB With Node.js on Windows Azure

A very detailed guide to getting started with MongoDB and Node.js on Windows Azure:

  • Add MongoDB support to an existing Windows Azure service that was created using the Windows Azure SDK for Node.js.
  • Use npm to install the MongoDB driver for Node.js.
  • Use MongoDB within a Node.js application.
  • Run your MongoDB Node.js application locally using the Windows Azure compute emulator.
  • Publish your MongoDB Node.js application to Windows Azure.

Don’t you sometimes get the feeling that these Microsoft tutorials are way too detailed? They make me feel like the intended reader is some kid seeing code for the first time. Or is this how things are in the MS world?



Using MongoDB Replica Sets With Node.js on Microsoft Azure: NoSQL Tutorials

Mariano Vazquez explains how to configure MongoDB replica sets on Microsoft Azure and how that works:

  • MongoDB will run the native binaries on a worker role and will store the data in Windows Azure storage using Windows Azure Drive (basically a hard disk mounted on Azure Page blobs)
  • The good thing about using Azure Storage is that the data is georeplicated. It will also make backup easier because of the snapshot feature of blob storage (which is not a copy but a diff).
  • It will use the local hard disk in the VM (local resources in the Azure jargon) to store the log files and a local cache.
  • You can scale out to multiple Mongo Replica Sets by increasing the instance count of the MongoDB role



Hadoop on Windows Azure: Visualizing Data

The setup includes a web-based interactive JavaScript console, which lets you put data into HDFS, launch MapReduce jobs, and also visualize results with HTML5 charts - and it’s very easy to use.

The JavaScript console and the visualization support are very nice additions on top of the managed Hadoop on Azure.

Feature checklists are still important, but technology adoption depends more and more on the user experience. Think of getting up to speed as being the first impression someone gets of a new technology.

I have a couple of ideas of what would be next in terms of facilitating the adoption of NoSQL technologies. But I’d really like to hear your opinions first.



SQL Azure Federation... Aka Sharding

One of the exciting new features in the just-released SQL Azure Q4 2011 Service Release is SQL Azure Federation. In a sentence, SQL Azure Federation enables building elastic and scalable database tiers.

We all know the benefits of sharding, so why call it something different? NIH?
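
For readers who want the one-screen version of what sharding means here, below is a minimal Python sketch of range-based shard routing: keys are mapped to shards by comparing them against a sorted list of split points, and a shard can be split to scale out. It is a sketch of the general technique under my own assumptions, not SQL Azure Federation's API; the class, method, and shard names are hypothetical.

```python
import bisect

class RangeShardMap:
    """Routes keys to shards using sorted split points (illustrative only)."""

    def __init__(self, split_points, shard_names):
        # N split points divide the key space into N + 1 ranges.
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # A key equal to a split point belongs to the range on its right.
        return self.shard_names[bisect.bisect_right(self.split_points, key)]

    def split(self, at, new_shard):
        """Split the range containing `at`; keys >= `at` in it move to new_shard."""
        if at in self.split_points:
            raise ValueError("already split at this point")
        index = bisect.bisect_right(self.split_points, at)
        self.split_points.insert(index, at)
        self.shard_names.insert(index + 1, new_shard)

# Hypothetical customer-id ranges: [.., 1000), [1000, 2000), [2000, ..)
shards = RangeShardMap([1000, 2000], ["shard_a", "shard_b", "shard_c"])
print(shards.shard_for(42), shards.shard_for(1000), shards.shard_for(5000))

# Scale out by splitting the last range at 3000.
shards.split(3000, "shard_d")
print(shards.shard_for(2500), shards.shard_for(5000))
```

The routing logic stays the same whatever the name on the box says; what a managed offering adds is doing the split and the data movement for you.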



CloudSpokes: From Microsoft Azure to

CloudSpokes, an Appirio-led community, rearchitected their solution from Windows Azure to Salesforce’s

Initially, Messinger said, his team was really happy with Windows Azure’s table storage and blob storage features, but trouble arose when it came to deploying computing resources called “Web Roles.” […]

Additionally, said Messinger, Windows Azure required some level of database-administration know-how, which is something the CloudSpokes didn’t really want to deal with. It wanted to focus on the front end and other business-critical aspects rather than on DBA work. So it looked to, and Messinger and Singh haven’t looked back since beginning the transition in mid-July.

This is the first time I’ve read about a scenario where DaaS (database as a service) is explicitly mentioned as the main reason for migrating the architecture of an application.



Hadoop in Microsoft Azure

I don’t know how many are going to deploy Hadoop on Microsoft Azure, but at least we know it is possible:

Is it possible to deploy a Hadoop cluster in Azure? It sure is and setting one up is not difficult, here’s how you do it.


The Azure deployment is set to use 1 large VM for the Name Node, 1 large VM for the Job Tracker and 4 Extra Large nodes as Slaves. If you are ok with that configuration skip to the next step.



Paper: NoSQL and the Windows Azure Platform

A paper by Andrew J. Brust. Abstract:

An introduction to NoSQL database technology, and its major subcategories, for those new to the subject; an examination of NoSQL technologies available in the cloud using Windows Azure and SQL Azure; and a critical discussion of the NoSQL and relational database approaches, including the suitability of each to line-of-business application development.

When analyzing the NoSQL options available on the Azure platform, Andrew lists:

  • Azure Table Storage
  • SQL Azure XML Columns
  • SQL Azure Federation — check also The NoSQL gene in SQL Azure Federations
  • OData (?)
  • running NoSQL databases as Azure Worker Roles, VM roles, and Azure Drive

The paper concludes:

We saw how NoSQL databases are suitable for data management that is light-duty but large-scale, and how they work well for content management requirements of many stripes. We also saw, again and again, that relational databases are best for line-of-business applications. The database consistency, query optimization and set-based declarative query capability that relational databases have provided for decades is still required by most LOB applications; this has not changed.


via: Azure No SQL White Paper.pdf

Neo4j on Windows Azure

It started as an embedded database. Then it became a server. Now it is available on Microsoft Azure:

Neo4j has a ‘j’ appended to the name. And now it is available on Windows Azure? This proves that in the most unlikely of circumstances sometimes beautiful things can emerge.

Until now it was only MongoDB, sones GraphDB and RavenDB that could run in the Microsoft cloud.
