microsoft: All content tagged as microsoft in NoSQL databases and polyglot persistence

The Forrester Wave for Hadoop market

Update: I’d like to thank the people that pointed out in the comment thread that I’ve messed up quite a few aspects in my comments about the report. I don’t believe in taking down posts that have been out for a while, so please be warned that basically this article can be ignored.

Thank you and my apologies for those comments that were a misinterpretation of the report.


This is the Q1 2014 Forrester Wave for Hadoop:

Forrester wave for Hadoop

A couple of thoughts:

  1. Cloudera, Hortonworks, MapR are positioned very (very) close.

    1. Hortonworks is positioned closer to the top right, meaning they report more customers/a larger install base.
    2. MapR is higher on the vertical axis meaning that MapR’s strategy is slightly better.

      For me, MapR’s strategy can be briefly summarized as:

      1. address some of the limitations in the Hadoop ecosystem
      2. provide API-compatible products for major components of the Hadoop ecosystem
      3. use the (trademarked) Apache product names to advertise their products

      I think the 1st point above explains the better positioning of MapR’s current offering.

    3. Even though Cloudera was the first pure-play Hadoop distribution, it’s positioned behind both Hortonworks and MapR.

  2. IBM has the largest market presence. That’s a big surprise, as I very rarely hear clear messages from IBM.

  3. IBM and Pivotal Software are considered to have the strongest strategy. That’s another interesting point in Forrester’s report. Apart from the fact that IBM has a ton of data products and that Pivotal Software is offering more than Hadoop, I don’t know what exactly explains this position.

    The Strategy positioning in the Forrester report is based on scoring the following categories: Licensing and pricing, Ability to execute, Product road map, Customer support. IBM and Pivotal are ranked first in all these categories (with maximum marks for the last 3). By comparison, Hortonworks has 3/5 for Ability to execute (this must be related only to budget), while Cloudera has 3/5 for both Ability to execute and Customer support.

    Pivotal is third from last in terms of current offering. I guess my hypothesis for ranking Pivotal 1st in terms of strategy is wrong.

  4. Microsoft, which through its collaboration with Hortonworks came up with HDInsight (basically enabling Hadoop for Excel and its data warehouse offering), is positioned second to last on all 3 axes.

    No one seems to love Microsoft anymore.

  5. While not a pure Hadoop player, DataStax has been offering the DataStax Enterprise platform that includes support for analytics through Hadoop and search through Solr for at least 2 years. That’s actually way before anyone else from the group of companies in Forrester’s report had anything similar [1].

    This report focuses only on “general-purpose Hadoop solutions based on a differentiated, commercial Hadoop distribution”.

You can download the report after registering on Hortonworks’ site: here.


  1. DataStax is my employer. But what I wrote is a pure fact. 

Original title and link: The Forrester Wave for Hadoop market (NoSQL database©myNoSQL)


Optimizing Joins running on HDInsight Hive on Azure

Two notable things in Denny Lee’s post about optimizing some of the Hive joins used by Microsoft’s Online Services Division:

  1. Microsoft is drinking their own HDInsight on Azure champagne. This will take the HDInsight product far, as they’ll always have first-hand feedback about the parts of the system that need improvement.
  2. Know the different types of JOINs supported by Hive and don’t be afraid of experimenting.
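
To make the second point a bit more concrete: one of the join types Hive supports is the map-side join, where the smaller table is loaded into memory as a hash table and the larger table is streamed past it, avoiding the shuffle and sort of a common (reduce-side) join. The following is only a minimal in-memory sketch of that idea in Python, not Hive's implementation and not the queries from Denny Lee's post; the function name `map_side_join` and the sample tables are made up for illustration.

```python
from collections import defaultdict


def map_side_join(small_rows, large_rows, key):
    """Inner-join two lists of dicts by hashing the small side (illustrative only)."""
    # Build phase: index the small table in memory by the join key.
    index = defaultdict(list)
    for row in small_rows:
        index[row[key]].append(row)

    # Probe phase: stream the large table and look up matches.
    # No sorting or shuffling of the large side is needed.
    joined = []
    for row in large_rows:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data: a small dimension table and a larger fact table.
countries = [
    {"country_id": 1, "country": "US"},
    {"country_id": 2, "country": "UK"},
]
clicks = [
    {"click_id": 10, "country_id": 1},
    {"click_id": 11, "country_id": 2},
    {"click_id": 12, "country_id": 3},  # no matching country, dropped by the inner join
]

result = map_side_join(countries, clicks, "country_id")
```

The trade-off this sketch hints at is the usual one: the approach only works while the small side fits in memory, which is why experimenting with the different join types on real data is worthwhile.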

✚ An extra point for the link to Liyin Tang and Namit Jain’s Join strategies in Hive (PDF)

Original title and link: Optimizing Joins running on HDInsight Hive on Azure (NoSQL database©myNoSQL)

via: http://dennyglee.com/2013/04/26/optimizing-joins-running-on-hdinsight-hive-on-azure-at-gfs/


Microsoft Azure Sales Top $1 Billion Challenging Amazon

Last week I saw some guesstimates of Amazon Web Services’ revenue. Now Bloomberg has posted an article about the revenue of Microsoft Azure and related programs (?): $1 billion.

Interesting numbers:

  • market share: Amazon Web Services 71%, Microsoft Azure 20%
  • Azure grew 48% in the last 6 months
  • Gartner estimates the infrastructure segment of the cloud market at $6.17 billion in 2012, growing to $30.6 billion in 2017
  • Gartner estimates the total cloud market at $108.9 billion in 2012, growing to $237.2 billion in 2017. (nb: I find this one weird as it includes online advertising and other less-cloudy-services-imo).

Amazon hasn’t given many details about the AWS platform, except 3 numbers:

  1. number of objects stored in S3. This has been doubling every year for the last 4 years
    1. Q4 2012: 1.3 trillion
    2. Q3 2011: 566b
    3. Q4 2010: 262b
    4. Q4 2009: 102b
    5. Q4 2008: 40b
    6. Q4 2007: 14b
    7. Q4 2006: 2.9b
  2. number of requests per second handled by AWS
  3. number of EMR clusters (?) spun up
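
The "doubling every year" claim can be checked against the Q4 figures listed above. The sketch below is just arithmetic on those numbers; the Q3 2011 value is left out because it is not a year-end figure, so the step from Q4 2010 to Q4 2012 is annualized over two years. The names `s3_objects_q4` and `yearly_growth` are made up for this example.

```python
# Q4 object counts in billions, taken from the list above (1.3 trillion = 1300 billion).
# Q3 2011 is omitted because it is not a year-end figure.
s3_objects_q4 = {2006: 2.9, 2007: 14, 2008: 40, 2009: 102, 2010: 262, 2012: 1300}


def yearly_growth(counts):
    """Return {year: annualized growth factor since the previous listed year}."""
    years = sorted(counts)
    growth = {}
    for prev, cur in zip(years, years[1:]):
        span = cur - prev  # number of years between the two data points
        growth[cur] = (counts[cur] / counts[prev]) ** (1 / span)
    return growth


growth = yearly_growth(s3_objects_q4)
for year, factor in growth.items():
    print(f"{year}: x{factor:.2f} per year")
```

Every annualized factor computed this way comes out above 2, so "doubling" is, if anything, a conservative description of these numbers.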

According to some slides from last October/November:

  1. S3 stored over 1.3 trillion objects
  2. AWS handles over 830k requests/s
  3. 3.7 million EMR clusters spun up since 2010

I don’t have any data about RDS and Dynamo, but it would be great if Microsoft released some details about Azure.

✚ If AWS has a market share of 71% and Azure 20%, that leaves Google plus others with 9%. Makes me wonder how accurate this data is.

Original title and link: Microsoft Azure Sales Top $1 Billion Challenging Amazon (NoSQL database©myNoSQL)

via: http://www.bloomberg.com/news/2013-04-29/microsoft-azure-sales-top-1-billion-challenging-amazon.html


SQL Server's Future

Brent Ozar about the state and future of the things in the SQL Server space:

In SQL Server 2012 and beyond, we’ve got:

  • AlwaysOn Availability Groups – high availability, disaster recovery, and scale-out reads
  • Hekaton - in-memory storage with optimized stored procedures and new data formats on disk
  • Column store indexes – faster data retrieval for certain kinds of queries

Call me maybe crazy, but I don’t see really widespread adoption for any of these.

Leaving craziness aside, I’m wondering: if these features are not of interest to SQL Server users, then what would SQL Server users want to see?

Hekaton is something new for me to read about.

✚ Here’s something interesting about Hekaton:

By late fall 2009, Larson and his colleagues had come up with a design and a simple prototype for an in-memory database engine that showed huge performance improvements. They had moved away from a partitioned approach, which essentially treated a multicore processor as a distributed system, to a latch-free, also called lock-free, design that focused on removing the barriers to scalability present in current systems.

✚ There’s a paper about the MVCC implementation in Hekaton: High-Performance Concurrency Control Mechanisms for Main-Memory Databases.

Original title and link: SQL Server’s Future (NoSQL database©myNoSQL)

via: http://www.brentozar.com/archive/2013/03/databases-five-years-from-today/


Halo 4: A Success Case Study of HDInsight, Microsoft's Hadoop on Azure

Apart from a few too many businessy words, this is a nice story of using HDInsight, the Hadoop solution for Windows developed by Microsoft and Hortonworks:

Behind the scenes, a powerful new Microsoft technology platform called HDInsight was capturing data from the cloud and feeding daily game statistics to the tournament’s operator, Virgin Gaming. Virgin not only used the data to update online leaderboards each day; it also relied on the data to detect cheaters, removing them from the boards to ensure that the right gamers got the chance to win.

But this new technology didn’t just support the Infinity Challenge. From day one, the Xbox 360 game has been using the Hadoop open source framework to gain deep insights into players. The Halo 4 development team at 343 Industries is taking these insights and updating the game almost weekly, using direct player feedback to tweak the game. In the process, the game’s multiplayer ecosystem continues to evolve with the community as the title matures in the marketplace.

Original title and link: Halo 4: A Success Case Study of HDInsight, Microsoft’s Hadoop on Azure (NoSQL database©myNoSQL)

via: http://www.microsoft.com/enterprise/it-trends/big-data/articles/Changing-the-Game-Halo-4-Team-Gets-New-User-Insights-from-Big-Data-in-the-Cloud.aspx


What Is Microsoft HDInsight?

Karan Gulati:

HDInsight is Microsoft’s Hadoop-based distribution.

There’s a version for on-premise Microsoft stacks and one available on Azure Service.

Original title and link: What Is Microsoft HDInsight? (NoSQL database©myNoSQL)

via: http://blogs.msdn.com/b/karang/archive/2013/01/04/hdinsight_2d00_what_2d00_is_2d00_it.aspx


Microsoft SQL Server 2012 High Availability Solutions

The recent announcement of the Microsoft SQL Server 2012 release emphasized the high availability features added to this version. Here is what I could find after some digging through the documentation:

  • AlwaysOn Failover Cluster Instances: As part of the SQL Server AlwaysOn offering, AlwaysOn Failover Cluster Instances leverages Windows Server Failover Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level—a failover cluster instance (FCI). An FCI is a single instance of SQL Server that is installed across Windows Server Failover Clustering (WSFC) nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.

    This is explained in more detail on AlwaysOn Failover Cluster Instances (SQL Server).

  • AlwaysOn Availability Groups: The AlwaysOn Availability Groups feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of read-write primary databases and one to four sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.

    More documentation about AlwaysOn Availability groups can be found here.

  • Database mirroring: This feature will be removed in a future version of Microsoft SQL Server.

  • Log shipping: SQL Server Log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances.

    This is the well-known master-slave setup. More details can be found here.

It’s also worth checking the availability of these features across the SQL Server 2012 editions:

SQL Server 2012 High Availability

Original title and link: Microsoft SQL Server 2012 High Availability Solutions (NoSQL database©myNoSQL)


Microsoft Hadoop Grand Vision: Apache Hadoop for Windows Server and Windows Azure

I’m still not sure how many are planning to run a Hadoop cluster on top of Windows Server (I initially had doubts about Hadoop on Azure too, but looking at the bigger picture it starts to make sense), but Microsoft’s vision of integrating Hadoop into its toolchain is quite sound. The slide deck embedded below offers a glimpse of Microsoft’s perspective on Big Data, data integration, and BI:

Microsoft Hadoop Grand Vision


NoSQL Paper: The Trinity Graph Engine

Even though my first post about Trinity, the Microsoft Research graph database, dates back to March last year, I haven’t heard much about it since. Based on my tip, Klint Finley published an interesting speculation about Trinity, Dryad, Probase, and Bing. Since then though, Microsoft has moved away from Dryad to Hadoop, and I’m still not sure about the status of the Trinity project. But I have found a paper about the Trinity graph engine authored by Bin Shao, Haixun Wang, Yatao Li. You can read it or download it after the break.

We introduce Trinity, a memory-based distributed database and computation platform that supports online query processing and offline analytics on graphs. Trinity leverages graph access patterns in online and offline computation to optimize the use of main memory and communication in order to deliver the best performance. With Trinity, we can perform efficient graph analytics on web-scale, billion-node graphs using dozens of commodity machines, while existing platforms such as MapReduce and Pregel require hundreds of machines. In this paper, we analyze several typical and important graph applications, including search in a social network, calculating Pagerank on a web graph, and sub-graph matching on web-scale graphs without using index, to demonstrate the strength of Trinity.


Beginners' Guide to MongoDB With Node.js on Windows Azure

A very detailed guide to getting started with MongoDB and Node.js on Windows Azure:

  • Add MongoDB support to an existing Windows Azure service that was created using the Windows Azure SDK for Node.js.
  • Use npm to install the MongoDB driver for Node.js.
  • Use MongoDB within a Node.js application.
  • Run your MongoDB Node.js application locally using the Windows Azure compute emulator.
  • Publish your MongoDB Node.js application to Windows Azure.

Don’t you sometimes get the feeling that these Microsoft tutorials are way too detailed? They make me feel like the intended reader is some kid seeing code for the first time. Or is this how things are in the MS world?

Original title and link: Beginners’ Guide to MongoDB With Node.js on Windows Azure (NoSQL database©myNoSQL)

via: http://www.windowsazure.com/en-us/develop/nodejs/tutorials/web-app-with-mongodb/


JavaScript Console and Excel Coming to Hadoop

Eric Baldeschwieler about the Hortonworks and Microsoft partnership for bringing Apache Hadoop to Windows:

What makes this announcement significant is that Microsoft is opening up Apache Hadoop to literally millions of new users. There are millions of JavaScript developers that can now leverage the power of Apache Hadoop. There are many more millions of Excel and PowerPivot users that can also now derive value from Apache Hadoop using software is that already very familiar to them. Simply put, these contributions by Microsoft will extend Apache Hadoop to the most prolific data analysis tools in the world.

Me, back in January, after taking a look at Hadoop on Windows Azure:

The JavaScript console and the visualization support are very nice additions on top of the managed Hadoop on Azure.

Feature checklists are still important, but technology adoption depends more and more on the user experience. Think of getting up to speed as being the first impression someone gets of a new technology.

Think of integration with familiar tools and frameworks as a huge adoption accelerator.

Original title and link: JavaScript Console and Excel Coming to Hadoop (NoSQL database©myNoSQL)

via: http://hortonworks.com/blog/extending-apache-hadoop-to-millions-of-new-microsoft-users/


Cache Warm-Up: Redis vs Memcached vs Microsoft AppFabric

The traffic of our football news syndicating website (Kick News) has been steadily growing a lot since it launched. When we redeveloped it a couple of years ago, we used an in-process cache, by creating an IQueryable extension method that uses an md5 hash of the underlying SQL query as the key. This worked reasonably well, but has its obvious problems, such as the caches needing to be refilled when the app pool recycles or when the server is restarted. On our busy site, this means we had to wait until the caches are full before we serve any requests or it would overload our database server, which is unacceptable. Before the site gets any busier we’re going to move to an out-of-process cache and the 3 main options we’ve considered are Redis, Memcached and Windows Server AppFabric.

From these 3 solutions, only Redis will help address the cache warm-up issue.
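
To illustrate the in-process cache described in the quote, here is a minimal sketch of a cache keyed by the MD5 hash of the SQL text. It is written in Python rather than as the IQueryable extension method the original site used, and `QueryCache`, `get_or_load`, and `fake_db` are hypothetical names invented for this example. The whitespace normalization is my own addition, not something the quote mentions.

```python
import hashlib


class QueryCache:
    """In-process cache keyed by the MD5 hash of the SQL text (hypothetical helper)."""

    def __init__(self):
        # Lives in process memory, so it is lost whenever the process restarts.
        self._store = {}

    @staticmethod
    def key_for(sql):
        # Assumption: collapse whitespace so trivially different spellings share a key.
        normalized = " ".join(sql.split())
        return hashlib.md5(normalized.encode("utf-8")).hexdigest()

    def get_or_load(self, sql, loader):
        key = self.key_for(sql)
        if key not in self._store:
            # Cache miss: run the query against the database and remember the result.
            self._store[key] = loader(sql)
        return self._store[key]


# Stand-in for a real database call; records every query it actually executes.
calls = []


def fake_db(sql):
    calls.append(sql)
    return [("Arsenal", 3)]


cache = QueryCache()
first = cache.get_or_load("SELECT team, goals FROM scores", fake_db)
second = cache.get_or_load("SELECT team,  goals FROM scores", fake_db)  # served from cache
```

Because `_store` is an ordinary in-memory dictionary, a restart empties it and every query goes back to the database, which is the warm-up problem the quote describes. Moving the same keys into an out-of-process store that persists its data is what would let the cache survive an app pool recycle.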

Original title and link: Cache Warm-Up: Redis vs Memcached vs Microsoft AppFabric (NoSQL database©myNoSQL)

via: http://www.ichi.co.uk/post/18280190946/microsoft-appfabric-vs-redis-windows-port