Microsoft: All content tagged as Microsoft in NoSQL databases and polyglot persistence
Friday, 24 May 2013
Optimizing Joins running on HDInsight Hive on Azure
Two notable things in Denny Lee’s post about optimizing some of the Hive joins used by Microsoft’s Online Services Division:
- Microsoft is drinking its own HDInsight on Azure champagne. This will take the HDInsight product far, as they’ll always have first-hand feedback about the parts of the system that need improvement.
- Know the different types of JOINs supported by Hive and don’t be afraid of experimenting.
✚ An extra point for the link to Liyin Tang and Namit Jain’s Join strategies in Hive (PDF)
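To make the difference between two of those join strategies concrete, here is a minimal Python sketch contrasting a shuffle-style common join with a map-side join that hashes the small table in memory. It is not Hive code and not taken from Denny Lee’s post; the tables, column names, and function names are made up for illustration.

```python
from collections import defaultdict


def reduce_side_join(left_rows, right_rows, left_key, right_key):
    """Common (shuffle) join: group both inputs by key, then pair rows per key."""
    groups = defaultdict(lambda: ([], []))
    for row in left_rows:
        groups[row[left_key]][0].append(row)
    for row in right_rows:
        groups[row[right_key]][1].append(row)
    for lefts, rights in groups.values():
        for left in lefts:
            for right in rights:
                yield {**left, **right}


def map_side_join(small_rows, big_rows, small_key, big_key):
    """Map join: hash the small table in memory, then stream the big table."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[small_key]].append(row)
    for row in big_rows:
        # .get() avoids creating empty entries for keys with no match
        for match in lookup.get(row[big_key], ()):
            yield {**match, **row}


# Hypothetical sample data: a small dimension table and a larger fact table.
countries = [
    {"country_id": 1, "country": "US"},
    {"country_id": 2, "country": "RO"},
]
clicks = [
    {"click_id": 10, "country_id": 1},
    {"click_id": 11, "country_id": 2},
    {"click_id": 12, "country_id": 1},
    {"click_id": 13, "country_id": 3},  # no matching country, dropped by the inner join
]

for joined in map_side_join(countries, clicks, "country_id", "country_id"):
    print(joined)
```

Both functions return the same inner-join rows. The map-side version skips the grouping of the big table, which is only an option when one side fits in memory.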
Original title and link: Optimizing Joins running on HDInsight Hive on Azure (©myNoSQL)
via: http://dennyglee.com/2013/04/26/optimizing-joins-running-on-hdinsight-hive-on-azure-at-gfs/
Thursday, 2 May 2013
Microsoft Azure Sales Top $1 Billion Challenging Amazon
Last week I saw some revenue guesstimates for Amazon Web Services. Bloomberg has posted an article putting the revenue of Microsoft Azure and related programs (?) at $1 billion.
Interesting numbers:
- market share: Amazon Web Services 71%, Microsoft Azure 20%
- Azure grew 48% in the last 6 months
- Gartner estimates the infrastructure segment of the cloud market at $6.17 billion in 2012, growing to $30.6 billion in 2017
- Gartner estimates the total cloud market at $108.9 billion in 2012, growing to $237.2 billion in 2017. (nb: I find this one weird as it includes online advertising and other less-cloudy-services-imo).
Amazon hasn’t given many details about the AWS platform, except 3 numbers:
- number of objects stored in S3. This has been doubling every year for the last 4 years:
  - Q4 2012: 1.3 trillion
  - Q3 2011: 566 billion
  - Q4 2010: 262 billion
  - Q4 2009: 102 billion
  - Q4 2008: 40 billion
  - Q4 2007: 14 billion
  - Q4 2006: 2.9 billion
- number of requests per second handled by AWS
- number of EMR clusters (?) spun up
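A quick way to sanity-check the "doubling every year" claim is to compute growth factors from the S3 object counts listed above. The sketch below uses only those numbers. The annualization step is my own assumption, added because two of the intervals are not exactly one year long.

```python
# S3 object counts quoted in the post, in billions, keyed by (year, quarter).
s3_objects_billions = [
    ((2006, 4), 2.9),
    ((2007, 4), 14),
    ((2008, 4), 40),
    ((2009, 4), 102),
    ((2010, 4), 262),
    ((2011, 3), 566),
    ((2012, 4), 1300),
]


def growth_factors(points):
    """Return (label, elapsed quarters, growth factor) for consecutive data points."""
    rows = []
    for ((y0, q0), n0), ((y1, q1), n1) in zip(points, points[1:]):
        quarters = (y1 - y0) * 4 + (q1 - q0)
        rows.append((f"Q{q0} {y0} -> Q{q1} {y1}", quarters, n1 / n0))
    return rows


def annualized(factor, quarters):
    """Convert a growth factor over N quarters into an equivalent yearly factor."""
    return factor ** (4 / quarters)


for label, quarters, factor in growth_factors(s3_objects_billions):
    yearly = annualized(factor, quarters)
    print(f"{label}: x{factor:.2f} over {quarters} quarters (x{yearly:.2f} per year)")
```

Every interval from Q4 2008 onward shows a raw growth factor above 2, so the listed counts are consistent with "at least doubling" over each period.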
According to some slides from last October/November:
- S3 stored over 1.3 trillion objects
- AWS handles over 830k requests/s
- 3.7 million EMR clusters spun up since 2010
While I don’t have any data about RDS and Dynamo, it would be great if Microsoft released some details about Azure too.
✚ If AWS has a market share of 71% and Azure 20%, that leaves Google plus others with 9%. Makes me wonder how accurate this data is.
Sunday, 14 April 2013
SQL Server's Future
Brent Ozar on the state and future of SQL Server:
In SQL Server 2012 and beyond, we’ve got:
- AlwaysOn Availability Groups – high availability, disaster recovery, and scale-out reads
- Hekaton - in-memory storage with optimized stored procedures and new data formats on disk
- Column store indexes – faster data retrieval for certain kinds of queries
Call me maybe crazy, but I don’t see really widespread adoption for any of these.
Leaving craziness aside, I’m wondering: if these features are not of interest to SQL Server users, then what would they want to see?
✚ Hekaton is something new for me to read about.
✚ Here’s something interesting about Hekaton:
By late fall 2009, Larson and his colleagues had come up with a design and a simple prototype for an in-memory database engine that showed huge performance improvements. They had moved away from a partitioned approach, which essentially treated a multicore processor as a distributed system, to a latch-free, also called lock-free, design that focused on removing the barriers to scalability present in current systems.
✚ There’s a paper about the MVCC implementation in Hekaton: High-Performance Concurrency Control Mechanisms for Main-Memory Databases.
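As a rough illustration of what multiversion concurrency control means, the Python sketch below keeps several timestamped versions per key and lets a reader pick the one that was valid at its read time. It is a single-threaded toy under my own assumptions, not Hekaton’s latch-free implementation; the `VersionedStore` class and its methods are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Version:
    value: object
    begin_ts: int               # timestamp at which this version became valid
    end_ts: float = math.inf    # timestamp at which it was superseded


class VersionedStore:
    """Toy multiversion store: writers add versions, readers never block."""

    def __init__(self):
        self._versions = {}  # key -> list of Version, oldest first
        self._clock = 0      # hypothetical global logical clock

    def write(self, key, value):
        """Add a new version and close the validity range of the previous one."""
        self._clock += 1
        chain = self._versions.setdefault(key, [])
        if chain:
            chain[-1].end_ts = self._clock
        chain.append(Version(value, self._clock))
        return self._clock

    def read(self, key, as_of):
        """Return the value visible at logical time `as_of`, or None."""
        for version in self._versions.get(key, []):
            if version.begin_ts <= as_of < version.end_ts:
                return version.value
        return None


store = VersionedStore()
t1 = store.write("balance", 100)
t2 = store.write("balance", 80)
print(store.read("balance", t1), store.read("balance", t2))
```

A reader holding the older timestamp still sees the older value after the second write, which is the property that lets readers proceed without waiting on writers.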
via: http://www.brentozar.com/archive/2013/03/databases-five-years-from-today/
Thursday, 4 April 2013
Halo 4: A Success Case Study of HDInsight, Microsoft's Hadoop on Azure
Apart from a few too many businessy words, this is a nice story of using HDInsight, the Hadoop solution for Windows developed by Microsoft and Hortonworks:
Behind the scenes, a powerful new Microsoft technology platform called HDInsight was capturing data from the cloud and feeding daily game statistics to the tournament’s operator, Virgin Gaming. Virgin not only used the data to update online leaderboards each day; it also relied on the data to detect cheaters, removing them from the boards to ensure that the right gamers got the chance to win.
But this new technology didn’t just support the Infinity Challenge. From day one, the Xbox 360 game has been using the Hadoop open source framework to gain deep insights into players. The Halo 4 development team at 343 Industries is taking these insights and updating the game almost weekly, using direct player feedback to tweak the game. In the process, the game’s multiplayer ecosystem continues to evolve with the community as the title matures in the marketplace.
Wednesday, 9 January 2013
What Is Microsoft HDInsight?
Karan Gulati:
HDInsight is Microsoft’s Hadoop-based distribution.
There’s a version for on-premises Microsoft stacks and one available as a service on Azure.
via: http://blogs.msdn.com/b/karang/archive/2013/01/04/hdinsight_2d00_what_2d00_is_2d00_it.aspx
Tuesday, 24 April 2012
Microsoft SQL Server 2012 High Availability Solutions
The recent announcement of the Microsoft SQL Server 2012 release emphasized the high availability features added to this version. Here is what I could find after some digging through the documentation:
- AlwaysOn Failover Cluster Instances: As part of the SQL Server AlwaysOn offering, AlwaysOn Failover Cluster Instances leverages Windows Server Failover Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level—a failover cluster instance (FCI). An FCI is a single instance of SQL Server that is installed across Windows Server Failover Clustering (WSFC) nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.
  This is explained in more detail on AlwaysOn Failover Cluster Instances (SQL Server).
- AlwaysOn Availability Groups: The AlwaysOn Availability Groups feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of read-write primary databases and one to four sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.
  More documentation about AlwaysOn Availability groups can be found here.
- Database mirroring: This feature will be removed in a future version of Microsoft SQL Server.
- Log shipping: SQL Server Log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances.
  This is the well-known master-slave setup. More details can be found here.
It is also worth checking which SQL Server 2012 editions include each of these features.
Wednesday, 28 March 2012
Microsoft Hadoop Grand Vision: Apache Hadoop for Windows Server and Windows Azure
I’m still not sure how many people are planning to run a Hadoop cluster on top of Windows Server (I initially had doubts about Hadoop on Azure too, but looking at the bigger picture it starts to make sense), but Microsoft’s vision of integrating Hadoop into its toolchain is quite sound. The accompanying slide deck offers a glimpse at Microsoft’s perspective on Big Data, data integration, and BI:
- “Big data is here and Hadoop is center stage”
  I know I’ve already said it, but I’m still very impressed Microsoft gets this right.
- The Grand vision: [slide image]
- Project Isotope offerings:
  - Bi-directional connectors between Hadoop and SQL Server and PDW (see Hadoop Interoperability in Microsoft SQL Server and Parallel Data Warehouse)
  - ODBC driver for Hadoop
  - Hosted elastic Hadoop service on Azure (nb: think Amazon Elastic MapReduce by Microsoft)
  - Hive plug-in for Excel
  - JavaScript support for Hadoop, with web-based interactive environment
Wednesday, 14 March 2012
NoSQL Paper: The Trinity Graph Engine
Even though my first post about Trinity, the Microsoft Research graph database, dates back to March last year, I haven’t heard much about it since. Based on my tip, Klint Finley published an interesting speculation about Trinity, Dryad, Probase, and Bing. Since then though, Microsoft has moved away from Dryad to Hadoop, and I’m still not sure about the status of the Trinity project. But I have found a paper about the Trinity graph engine authored by Bin Shao, Haixun Wang, and Yatao Li. You can read it or download it after the break.
We introduce Trinity, a memory-based distributed database and computation platform that supports online query processing and offline analytics on graphs. Trinity leverages graph access patterns in online and offline computation to optimize the use of main memory and communication in order to deliver the best performance. With Trinity, we can perform efficient graph analytics on web-scale, billion-node graphs using dozens of commodity machines, while existing platforms such as MapReduce and Pregel require hundreds of machines. In this paper, we analyze several typical and important graph applications, including search in a social network, calculating Pagerank on a web graph, and sub-graph matching on web-scale graphs without using index, to demonstrate the strength of Trinity.
Friday, 9 March 2012
Beginners' Guide to MongoDB With Node.js on Windows Azure
A very detailed guide to getting started with MongoDB and Node.js on Windows Azure:
- Add MongoDB support to an existing Windows Azure service that was created using the Windows Azure SDK for Node.js.
- Use npm to install the MongoDB driver for Node.js.
- Use MongoDB within a Node.js application.
- Run your MongoDB Node.js application locally using the Windows Azure compute emulator.
- Publish your MongoDB Node.js application to Windows Azure.
Don’t you sometimes get the feeling that these Microsoft tutorials are way too detailed? They make me feel like the intended reader is some kid seeing code for the first time. Or is this how things are in the MS world?
via: http://www.windowsazure.com/en-us/develop/nodejs/tutorials/web-app-with-mongodb/
Wednesday, 7 March 2012
JavaScript Console and Excel Coming to Hadoop
Eric Baldeschwieler about the Hortonworks and Microsoft partnership for bringing Apache Hadoop to Windows:
What makes this announcement significant is that Microsoft is opening up Apache Hadoop to literally millions of new users. There are millions of JavaScript developers that can now leverage the power of Apache Hadoop. There are many more millions of Excel and PowerPivot users that can also now derive value from Apache Hadoop using software that is already very familiar to them. Simply put, these contributions by Microsoft will extend Apache Hadoop to the most prolific data analysis tools in the world.
Me, back in January, after taking a look at Hadoop on Windows Azure:
The JavaScript console and the visualization support are very nice additions on top of the managed Hadoop on Azure.
Feature checklists are still important, but technology adoption depends more and more on the user experience. Think of getting up to speed as being the first impression someone gets of a new technology.
Think of integration with familiar tools and frameworks as a huge adoption accelerator.
via: http://hortonworks.com/blog/extending-apache-hadoop-to-millions-of-new-microsoft-users/
Monday, 27 February 2012
Cache Warm-Up: Redis vs Memcached vs Microsoft AppFabric
The traffic of our football news syndicating website (Kick News) has been steadily growing a lot since it launched. When we redeveloped it a couple of years ago, we used an in-process cache, by creating an IQueryable extension method that uses an md5 hash of the underlying SQL query as the key. This worked reasonably well, but has its obvious problems, such as the caches needing to be refilled when the app pool recycles or when the server is restarted. On our busy site, this means we had to wait until the caches are full before we serve any requests or it would overload our database server, which is unacceptable. Before the site gets any busier we’re going to move to an out-of-process cache and the 3 main options we’ve considered are Redis, Memcached and Windows Server AppFabric.
Of these 3 solutions, only Redis will help address the cache warm-up issue.
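The in-process caching scheme described in the quote can be sketched roughly as follows. This is Python rather than the C# `IQueryable` extension the author mentions, and `QueryCache`, `run_query`, the whitespace normalization, and the inclusion of query parameters in the key are all assumptions made for illustration.

```python
import hashlib


class QueryCache:
    """In-process cache keyed by an md5 hash of the SQL text (plus parameters)."""

    def __init__(self, run_query):
        self._run_query = run_query  # hypothetical callable: (sql, params) -> rows
        self._store = {}             # lives in process memory, lost on restart

    @staticmethod
    def key_for(sql, params=()):
        # Naive normalization: collapse whitespace so trivially different
        # spellings of the same query share one cache entry.
        normalized = " ".join(sql.split())
        payload = normalized + "|" + repr(tuple(params))
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    def get(self, sql, params=()):
        """Return cached rows, hitting the database only on a cache miss."""
        key = self.key_for(sql, params)
        if key not in self._store:
            self._store[key] = self._run_query(sql, params)
        return self._store[key]
```

Because the dictionary lives inside the application process, it is empty after every app pool recycle or server restart, which is exactly the warm-up problem the quote describes.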
via: http://www.ichi.co.uk/post/18280190946/microsoft-appfabric-vs-redis-windows-port
Friday, 17 February 2012
Hadoop Interoperability in Microsoft SQL Server and Parallel Data Warehouse
In the data deluge faced by businesses, there is also an increasing need to store and analyze vast amounts of unstructured data including data from sensors, devices, bots and crawlers. By many accounts, almost 80% of what businesses store is unstructured data — and this volume is predicted to grow exponentially over the next decade. We have entered the age of Big Data. Our customers have been asking us to help store, manage, and analyze both structured and unstructured data — in particular, data stored in Hadoop environments. As a first step, we will soon release a Community Technology Preview (CTP) of two new Hadoop connectors — one for SQL Server and one for PDW. The connectors provide interoperability between SQL Server/PDW and Hadoop environments, enabling customers to transfer data between Hadoop and SQL Server/PDW. With these connectors, customers can more easily integrate Hadoop with their Microsoft Enterprise Data Warehouses and Business Intelligence solutions to gain deeper business insights from both structured and unstructured data.
The time of data silos is long gone and the little giant is making the right moves.