microsoft: All content tagged as microsoft in NoSQL databases and polyglot persistence
Update: I’d like to thank the people that pointed out in the comment thread that I’ve messed up quite a few aspects in my comments about the report. I don’t believe in taking down posts that have been out for a while, so please be warned that basically this article can be ignored.
Thank you and my apologies for those comments that were a misinterpretation of the report.
This is the Q1 2014 Forrester Wave for Hadoop:
A couple of thoughts:
Cloudera, Hortonworks, MapR are positioned very (very) close.
- Hortonworks is positioned closer to the top right, meaning they report more customers and a larger install base
- MapR is higher on the vertical axis, meaning that MapR’s strategy is slightly better.
For me, MapR’s strategy can be briefly summarized as:
- address some of the limitations in the Hadoop ecosystem
- provide API-compatible products for major components of the Hadoop ecosystem
- use the (trademarked) Apache product names to advertise their products
I think the 1st point above explains the better positioning of MapR’s current offering.
Even though Cloudera was the first pure-play Hadoop distribution, it’s positioned behind both Hortonworks and MapR.
IBM has the largest market presence. That’s a big surprise as I very rarely hear clear messages from IBM.
IBM and Pivotal Software are considered to have the strongest strategy. That’s another interesting point in Forrester’s report. Apart from the fact that IBM has a ton of data products and that Pivotal Software is offering more than Hadoop, I don’t know what exactly explains this position.
The Forrester report’s Strategy positioning is based on scoring the following categories: Licensing and pricing, Ability to execute, Product road map, Customer support. IBM and Pivotal are ranked first in all these categories (with maximum marks for the last 3). As a comparison, Hortonworks has 3/5 for Ability to execute (this must be related only to budget); Cloudera has 3/5 for both Ability to execute and Customer support.
Pivotal is third from last in terms of current offering. I guess my hypothesis for ranking Pivotal as 1st in terms of strategy is wrong.
Microsoft, which through its collaboration with Hortonworks came up with HDInsight (basically enabling Hadoop for Excel and its data warehouse offering), is positioned second from last on all 3 axes.
No one seems to love Microsoft anymore.
While not a pure Hadoop player, DataStax has been offering the DataStax Enterprise platform that includes support for analytics through Hadoop and search through Solr for at least 2 years. That’s actually way before anyone else from the group of companies in Forrester’s report had anything similar [1].
This report focuses only on “general-purpose Hadoop solutions based on a differentiated, commercial Hadoop distribution”.
You can download the report after registering on Hortonworks’ site.
[1] DataStax is my employer. But what I wrote is pure fact.
Original title and link: The Forrester Wave for Hadoop market ( ©myNoSQL)
The recent announcement of the Microsoft SQL Server 2012 release emphasized the high availability features added to this version. Here is what I could find after some digging through the documentation:
AlwaysOn Failover Cluster Instances: As part of the SQL Server AlwaysOn offering, AlwaysOn Failover Cluster Instances leverages Windows Server Failover Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level—a failover cluster instance (FCI). An FCI is a single instance of SQL Server that is installed across Windows Server Failover Clustering (WSFC) nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.
This is explained in more detail on AlwaysOn Failover Cluster Instances (SQL Server).
AlwaysOn Availability Groups: The AlwaysOn Availability Groups feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of read-write primary databases and one to four sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.
More documentation about AlwaysOn Availability groups can be found here.
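To make the quoted description a bit more concrete, here is a minimal Python sketch of the topology as I read it: one primary, one to four secondaries, and a set of databases that always fail over together. It is only a toy model. The class names, server names, and methods (`Replica`, `AvailabilityGroup`, `fail_over`, `node-a`, and so on) are my own hypothetical choices and have nothing to do with the actual SQL Server API.

```python
from dataclasses import dataclass, field

# Limit taken from the quoted documentation: one to four secondaries.
MAX_SECONDARIES = 4


@dataclass
class Replica:
    server: str                      # hypothetical server name
    role: str                        # "primary" or "secondary"
    read_only_access: bool = False   # optional read-only access on a secondary


@dataclass
class AvailabilityGroup:
    databases: list                  # availability databases that move together
    primary: Replica
    secondaries: list = field(default_factory=list)

    def add_secondary(self, server, read_only_access=False):
        if len(self.secondaries) >= MAX_SECONDARIES:
            raise ValueError("at most four secondaries per availability group")
        self.secondaries.append(Replica(server, "secondary", read_only_access))

    def fail_over(self, target_server):
        """Promote one secondary; the old primary becomes a secondary."""
        for candidate in self.secondaries:
            if candidate.server == target_server:
                break
        else:
            raise ValueError(f"{target_server} is not a secondary of this group")
        self.secondaries.remove(candidate)
        old_primary = self.primary
        old_primary.role = "secondary"
        candidate.role, candidate.read_only_access = "primary", False
        self.secondaries.append(old_primary)
        self.primary = candidate

    def primary_for(self, database):
        """Every database in the group is served by the same primary."""
        if database not in self.databases:
            raise KeyError(database)
        return self.primary.server


group = AvailabilityGroup(["sales", "inventory"], Replica("node-a", "primary"))
group.add_secondary("node-b", read_only_access=True)
group.add_secondary("node-c")
group.fail_over("node-b")
```

After the failover, both `sales` and `inventory` report `node-b` as their primary, which is the "fail over together" part of the description.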
Database mirroring: This feature will be removed in a future version of Microsoft SQL Server.
Log shipping: SQL Server Log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances.
This is the well-known master-slave setup. More details can be found here.
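A rough way to picture one log shipping cycle (back up the transaction log on the primary, copy it, restore it on each secondary) is the toy Python model below. It is a sketch under my own assumptions, not how SQL Server implements it: the `Database` and `Primary` classes, the `ship_and_restore` function, and the record strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy database: its state is the ordered list of applied log records."""
    name: str
    applied: list = field(default_factory=list)


@dataclass
class Primary:
    db: Database
    pending_log: list = field(default_factory=list)  # records not yet backed up

    def write(self, record):
        self.db.applied.append(record)
        self.pending_log.append(record)

    def backup_log(self):
        """Cut a transaction log backup and start a fresh pending log."""
        backup, self.pending_log = self.pending_log, []
        return backup


def ship_and_restore(primary, secondaries):
    """One cycle: back up on the primary, copy to and restore on each secondary."""
    backup = primary.backup_log()
    for secondary in secondaries:
        secondary.applied.extend(list(backup))  # copy, then restore in order
    return len(backup)


primary = Primary(Database("orders"))
secondaries = [Database("orders_copy_1"), Database("orders_copy_2")]

primary.write("INSERT 1")
primary.write("INSERT 2")
shipped = ship_and_restore(primary, secondaries)

primary.write("INSERT 3")  # not shipped yet, so the secondaries lag behind
```

The last write shows the main trade-off of this setup: a secondary is only as fresh as the last log backup that was shipped and restored.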
It is also worth checking the availability of these features per SQL Server 2012 edition:
Original title and link: Microsoft SQL Server 2012 High Availability Solutions ( ©myNoSQL)
I’m still not sure how many are planning to run a Hadoop cluster on top of Windows Server (I initially had doubts about Hadoop on Azure too, but looking at the bigger picture it starts to make sense), but Microsoft’s vision of integrating Hadoop into its toolchain is quite sound. The slide deck embedded below offers a glimpse at Microsoft’s perspective on Big Data, data integration, and BI:
“Big data is here and Hadoop is center stage”
I know I’ve already said it, but I’m still very impressed Microsoft gets this right.
The Grand vision:
Project Isotope offerings:
- Bi-directional connectors between Hadoop and SQL Server and PDW (see Hadoop Interoperability in Microsoft SQL Server and Parallel Data Warehouse)
- ODBC driver for Hadoop
- Hosted elastic Hadoop service on Azure (nb: think Amazon Elastic MapReduce by Microsoft)
- Hive plug-in for Excel
Even though my first post about the Microsoft Research graph database Trinity dates back to March last year, I haven’t heard much about it since. Based on my tip, Klint Finley published an interesting speculation about Trinity, Dryad, Probase, and Bing. Since then though, Microsoft has moved away from Dryad to Hadoop and I’m still not sure about the status of the Trinity project. But I have found a paper about the Trinity graph engine authored by Bin Shao, Haixun Wang, and Yatao Li. You can read it or download it after the break.
We introduce Trinity, a memory-based distributed database and computation platform that supports online query processing and offline analytics on graphs. Trinity leverages graph access patterns in online and offline computation to optimize the use of main memory and communication in order to deliver the best performance. With Trinity, we can perform efficient graph analytics on web-scale, billion-node graphs using dozens of commodity machines, while existing platforms such as MapReduce and Pregel require hundreds of machines. In this paper, we analyze several typical and important graph applications, including search in a social network, calculating Pagerank on a web graph, and sub-graph matching on web-scale graphs without using index, to demonstrate the strength of Trinity.
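For readers who haven't met one of the workloads named in the abstract, here is a minimal single-machine PageRank in Python using plain power iteration. To be clear, this is not Trinity's implementation and says nothing about how Trinity distributes the work. The tiny `web` graph, the function name, and the damping factor of 0.85 are my own assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [outgoing neighbours]}.

    Assumes every neighbour also appears as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            out_links = graph[node]
            for neighbour in out_links:
                new_rank[neighbour] += damping * rank[node] / len(out_links)
        rank = new_rank
    return rank


# Hypothetical four-page web graph: "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
top_page = max(ranks, key=ranks.get)
```

Each iteration touches every edge once, which is why doing this on a billion-node graph is mostly a memory and communication problem, the part the paper claims Trinity optimizes.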