


Microsoft: All content tagged as Microsoft in NoSQL databases and polyglot persistence

Oracle and IBM May Not Know Big Data, but Neither Does Ballmer

The echo chamber is reacting:

Specifically, for a data processing and analytics project to qualify as Big Data, it must encompass not just internal corporate data, but also third-party data that resides outside the firewall, according to Ballmer. He said IBM and Oracle limit their Big Data approaches to internal data, thus they are not in fact Big Data by his definition.


IBM, Oracle and now Microsoft are jockeying to position each of their approaches to Big Data as the industry standard, and Ballmer is clearly trying to steer the Big Data conversation towards Microsoft’s strengths and away from its weaknesses. That means talking up Microsoft’s ability to integrate third-party data with relatively large volumes of corporate data inside Microsoft’s SQL Server R2 Parallel Data Warehouse and away from its lack of petabyte-scale data processing power.

I guess there will be no end to the Oracle-IBM-Microsoft love triangle, so I’ll stop here until real facts are added to the story.

Original title and link: Oracle and IBM May Not Know Big Data, but Neither Does Ballmer (NoSQL database©myNoSQL)


Steve Ballmer on Microsoft and Big Data

InformationWeek quoting Steve Ballmer:

Nobody plays in big data, really, except Microsoft and Google


I’ll use the word ‘data’ rather than ‘BI’ because that says I want to use all the world’s information… not just the information that we figured out how to capture inside our corporate system.


The explosion in the use of data is not always in a traditional BI-ish way, and that’s a big thing for us

But what is behind these words? The InformationWeek article builds around Steve Ballmer’s vision that, based on the knowledge gained creating and operating Bing and AdCenter, Microsoft is better prepared to handle Big Data than EMC, HP, IBM, Oracle, and Teradata. While this may be true, there’s an important difference to draw here: creating and operating internal processes and tools is a different business than developing, selling, and supporting tools for customers. Microsoft has experience in both these fields, but its current products do not yet combine the know-how from the two. Meanwhile the companies mentioned above, plus quite a few startups and open source projects, are betting their future on this market alone by staying focused.

Original title and link: Steve Ballmer on Microsoft and Big Data (NoSQL database©myNoSQL)


Hadoop in Microsoft Azure

I don’t know how many are going to deploy Hadoop on Microsoft Azure, but at least we know it is possible:

Is it possible to deploy a Hadoop cluster in Azure? It sure is and setting one up is not difficult, here’s how you do it.


The Azure deployment is set to use 1 large VM for the Name Node, 1 large VM for the Job Tracker and 4 Extra Large nodes as Slaves. If you are ok with that configuration skip to the next step.

Original title and link: Hadoop in Microsoft Azure (NoSQL databases © myNoSQL)


LAMP, NoSQL databases, Open Source, and Microsoft

Relative to the LAMP stack, NoSQL databases and other open source technologies, Microsoft technology is sometimes viewed as stodgy, non-innovative and expensive. For some, the Microsoft .NET Framework, SQL Server, SharePoint and certainly Windows and Office are impressive and reliable, but not the things that Web breakthroughs are made of.

Keywords: impressive, reliable, Microsoft.

Original title and link: LAMP, NoSQL databases, Open Source, and Microsoft (NoSQL databases © myNoSQL)


Trinity, Dryad, Probase and Bing

Klint Finley (RWW) connecting the dots between Microsoft Research projects Trinity, Dryad, Probase, Bing and competition (Google, Facebook):

It’s not hard to connect the dots between Bing, Dryad, Probase and Trinity. Microsoft is building a set of tools to rival those used internally at Google and the open source tools used by companies like Facebook and Twitter. The interesting thing will be what Microsoft does with its data.

Original title and link: Trinity, Dryad, Probase and Bing (NoSQL databases © myNoSQL)


SQL Server and SQL Azure Comparison

SQL Azure provides relational database functionality as a utility service. Cloud-based database solutions such as SQL Azure can provide many benefits, including rapid provisioning, cost-effective scalability, high availability, and reduced management overhead.

Whether you are ready for the cloud is not an easy question, as Netflix’s cloud migration and Reddit’s experience have shown. But if you are, going from on-premises SQL Server to SQL Azure doesn’t seem to involve drawbacks.

But what I’m really curious about is how SQL Azure compares to Amazon RDS.

Original title and link: SQL Server and SQL Azure Comparison (NoSQL databases © myNoSQL)


Trinity: A Graph Database from Microsoft Research

Trinity is a graph database and computation platform over distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transaction, consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large scale graphs. Trinity can be deployed on one machine or hundreds of machines.

The project page describing Trinity’s goals and features looks very interesting. But there’s no sign of the project’s status.

[Figure: Trinity architecture]

Original title and link: Trinity: A Graph Database from Microsoft Research (NoSQL databases © myNoSQL)


Comparing Dryad and Hadoop

Madhu Reddy[1] comparing the commercial and not yet released Dryad with the open source, widely used Hadoop:

  • While Hadoop has chosen to build these capabilities from scratch [management and administration of large clusters], Dryad has chosen to leverage the proven and tested cluster management capabilities already present in Windows HPC Server.
  • Hadoop […] has focused on performance and scale. Dryad, building on the performance and scale of Windows HPC Server, has in addition focused on making big data easier to use for mainstream application developers.
  • Dryad and DSC are based on the widely used and mature NTFS (New Technology File System), the file system that comes standard with Windows Server.
  • Hadoop uses the MapReduce computational model, which provides support for expressing the application logic in two simple steps — map and reduce. However, to develop more complex applications, developers will have to manually string together a sequence of MapReduce steps. DryadLINQ offers a higher-level computational model where complex sequence of MapReduce steps can be easily expressed in a query language similar to SQL.
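To make the "manually string together a sequence of MapReduce steps" point concrete, here is a minimal in-memory sketch in Python. It is not Hadoop or DryadLINQ code: `run_mapreduce` and the mapper/reducer names are hypothetical helpers made up for illustration, and the input lines are invented. Step 1 counts words, step 2 groups words by how often they occur, and the output of the first step has to be fed by hand into the second.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical helper: run one in-memory MapReduce step (map, group by key, reduce)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Step 1: count the occurrences of each word.
def count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, ones):
    yield word, sum(ones)


# Step 2: group words by how often they occur.
def frequency_mapper(pair):
    word, count = pair
    yield count, word


def frequency_reducer(count, words):
    yield count, sorted(words)


# Invented sample input; the second step consumes the first step's output.
lines = ["big data big claims", "big data small facts"]
word_counts = run_mapreduce(lines, count_mapper, count_reducer)
words_by_count = run_mapreduce(word_counts, frequency_mapper, frequency_reducer)
print(words_by_count)
```

The wiring between the two `run_mapreduce` calls is the part a higher-level query language would hide; with plain MapReduce the developer writes and maintains it.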

A couple of aspects that were left out:

  1. licensing costs for Windows HPC Server, Microsoft Visual Studio, and the future Dryad
  2. Dryad’s commercial, closed source model versus Hadoop’s open source model (nb: example question: how soon could you get a bug fix or improvement?)
  3. Hadoop tools ecosystem
  4. Other Hadoop tools like Karmasphere studio — a graphical environment to develop, debug, deploy and monitor MapReduce jobs.

That’s not to say that Dryad and DryadLINQ are not interesting projects.

[1] Madhu Reddy is senior product manager for Technical Computing marketing at Microsoft.

Original title and link: Comparing Dryad and Hadoop (NoSQL databases © myNoSQL)


Is NoSQL known in the Microsoft world?

Kevin Kline (strategy manager for SQL Server at Quest Software) and Brent Ozar (SQL Server DBA expert at Quest Software):

Ozar: There are two common scenarios for why you would consider using something other than your typical relational database. One is data that is not worth very much money.


Kline: We’re going to look at other ways to look up data: key value stores, what’s the other one called?

Ozar: XML columnar storage, XML property bags.

Sounds pretty uninformed and makes me wonder what is known about NoSQL in the Microsoft world.