


microsoft: All content tagged as microsoft in NoSQL databases and polyglot persistence

Hadoop Interoperability in Microsoft SQL Server and Parallel Data Warehouse

In the data deluge faced by businesses, there is also an increasing need to store and analyze vast amounts of unstructured data including data from sensors, devices, bots and crawlers. By many accounts, almost 80% of what businesses store is unstructured data — and this volume is predicted to grow exponentially over the next decade.  We have entered the age of Big Data. Our customers have been asking us to help store, manage, and analyze both structured and unstructured data — in particular, data stored in Hadoop environments.  As a first step, we will soon release a Community Technology Preview (CTP) of two new Hadoop connectors — one for SQL Server and one for PDW.  The connectors provide interoperability between SQL Server/PDW and Hadoop environments, enabling customers to transfer data between Hadoop and SQL Server/PDW.  With these connectors, customers can more easily integrate Hadoop with their Microsoft Enterprise Data Warehouses and Business Intelligence solutions to gain deeper business insights from both structured and unstructured data.

The time of data silos is long gone and the little giant is making the right moves.

Patrick Durusau

Original title and link: Hadoop Interoperability in Microsoft SQL Server and Parallel Data Warehouse (NoSQL database©myNoSQL)


How Web giants store big data

A not very technical Ars Technica overview of the storage engines developed and used by Google (Google File System, BigTable), Amazon (Dynamo), and Microsoft (Azure DFS), plus the Hadoop Distributed File System (HDFS).



Research in the MapReduce Space

Over the weekend I read two papers presenting products or research that improve or add new capabilities to the MapReduce data processing approach. The first comes from a team at Microsoft and describes TiMR, a time-oriented data processing system in MapReduce. The second, from a team at Google, presents Tenzing, a SQL implementation on the MapReduce framework. It’s great to learn that while the Hadoop community is eliminating some of the initial limitations and hardening the technical details of the platform, there are already ideas and systems out there that augment the capabilities of the MapReduce data processing model.


Microsoft, Hadoop, and Open Source Contributions

Edd Dumbill:

Microsoft’s goals go beyond integrating Hadoop into Windows. It intends to contribute the adaptions it makes back to the Apache Hadoop project, so that anybody can run a purely open source Hadoop on Windows.

In the open source world contributions are measured in code or documentation or donations. Less so in interviews or PR announcements.

So far Microsoft doesn’t seem to know this game. But if its intentions are true, the community will help.



12 Hadoop Vendors to Watch in 2012

My list of the 8 most interesting companies for the future of Hadoop didn’t try to include everyone with a product that has the word Hadoop in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:

  • Amazon Elastic MapReduce
  • Cloudera
  • Datameer
  • EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
  • Hadapt
  • Hortonworks
  • IBM (InfoSphere BigInsights)
  • Informatica (for HParser)
  • Karmasphere
  • MapR
  • Microsoft
  • Oracle


Partnerships in the Hadoop Market

Just a quick recap:

Amazon doesn’t partner with anyone for its Amazon Elastic MapReduce, and IBM is walking alone with the software-only InfoSphere BigInsights.


Claim Chowder: Microsoft’s Dryad Technology to Take on Google’s MapReduce

In Dec. 2010, Joab Jackson writes for IDG News Service: Microsoft’s Dryad technology to take on Google’s MapReduce. Just 11 months later, in Nov. 2011, Doug Henschen writes for the same IDG News Service: Microsoft Ditches Dryad, Focuses On Hadoop.

Nothing wrong with Microsoft’s decision. The same cannot be said, though, about the titles and articles published by the IDG News Service network.


Hadoop: Amazon Elastic MapReduce and Microsoft Project Isotope

This is how things are rolling these days. Microsoft talks about offering Hadoop integration with Project Isotope in 2012, while Amazon announces the immediate availability of new beefed-up instances (Cluster Compute Eight Extra Large (cc2.8xlarge)) and reduced prices for some of the existing instances.


Project Isotope Will Bring Together Hadoop Toolchain With Microsoft’s Data Products

A series of recent events makes me think Microsoft is nowhere near accepting defeat in the cloud services area. As regards Microsoft’s Project Isotope, things are much simpler than the ZDNet article makes them sound[1]: Microsoft is working on integrating Hadoop and its toolchain with its own products (SQL Server Analysis Services, PowerPivot).

Microsoft Project Isotope

A picture worth more than the 626 words.

  1. I bet the details of the integration are fascinating and far from simple, but the article does not focus on those.


SQL Azure Federation... Aka Sharding

One of the exciting new features in the just-released SQL Azure Q4 2011 Service Release is SQL Azure Federation. In a sentence, SQL Azure Federation enables building elastic and scalable database tiers.

We all know the benefits of sharding, so why call it something different? NIH?
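
For anyone who hasn't met the idea under either name, here is a minimal sketch of range-based shard routing. It is only an illustration of the general technique, not SQL Azure Federation's actual API: the boundary values, the shard names, and the shard_for helper are all made up for the example.

```python
from bisect import bisect_right

# Hypothetical range boundaries: shard i holds keys in [BOUNDS[i], BOUNDS[i + 1]),
# and the last shard holds everything from its boundary upward.
BOUNDS = [0, 1000, 5000]
# Hypothetical shard names, one per boundary.
SHARDS = ["shard_a", "shard_b", "shard_c"]


def shard_for(key: int) -> str:
    """Return the name of the shard whose key range contains `key`."""
    if key < BOUNDS[0]:
        raise ValueError(f"key {key} is below the lowest range boundary")
    # bisect_right returns the index of the first boundary greater than key,
    # so the shard that owns the key sits one position before it.
    return SHARDS[bisect_right(BOUNDS, key) - 1]


if __name__ == "__main__":
    for key in (42, 1000, 123456):
        print(key, "->", shard_for(key))
```

Scaling out in this sketch means adding one more boundary and one more shard name; the routing function itself stays untouched, which is pretty much the whole appeal.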



Data Is the New Currency. But Who’s Leading the Way?

In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:

And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.

The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.

Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to Jeopardy Goes to Hadoop and About Watson, Watson technology is based on Apache Hadoop, uses an IBM language technology built on the Apache UIMA platform[1], and runs Linux on IBM boxes.

To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?

  1. Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.  

  2. Or they are very secretive about their internal initiatives and research.  
