Google: All content tagged as Google in NoSQL databases and polyglot persistence
Alex Lloyd’s talk from Berlin Buzzwords 2012 about Google’s Spanner:
Google’s paper about Dapper, their large-scale distributed systems tracing solution, which inspired Twitter’s Zipkin:
Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie and X-Trace, but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.
Download or read the paper after the break.
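To make the sampling idea from the abstract a bit more concrete, here is a minimal sketch of trace-level sampling: the keep-or-drop decision is made once at the root of a request and inherited by every child span, so a trace is recorded either whole or not at all. The `Tracer` and `Span` classes, their method names, and the sample rate are all hypothetical and chosen for illustration; this is not Dapper's or Zipkin's actual API.

```python
import random


class Span:
    """One timed unit of work inside a trace (hypothetical structure)."""

    def __init__(self, name, trace_id, sampled, parent=None):
        self.name = name
        self.trace_id = trace_id
        self.sampled = sampled
        self.parent = parent


class Tracer:
    """Illustrative tracer that samples whole traces, not single spans."""

    def __init__(self, sample_rate, rng=None):
        self.sample_rate = sample_rate      # fraction of traces to keep
        self.rng = rng or random.Random()
        self.recorded = []                  # stand-in for a trace store

    def start_trace(self, name):
        # The sampling decision is taken once, at the root of the request.
        trace_id = self.rng.getrandbits(64)
        sampled = self.rng.random() < self.sample_rate
        return Span(name, trace_id, sampled)

    def start_child(self, parent, name):
        # Children inherit the trace id and the root's sampling decision.
        return Span(name, parent.trace_id, parent.sampled, parent)

    def finish(self, span):
        # Unsampled spans cost almost nothing: they are simply dropped.
        if span.sampled:
            self.recorded.append((span.trace_id, span.name))
```

In a setup like this the instrumentation would live in a shared RPC or threading library rather than in application code, which is roughly what "restricting the instrumentation to a rather small number of common libraries" suggests.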
At the GigaOm Structure Data event, Google announced a new Big Data service named BigQuery:
BigQuery enables businesses and developers to gain real-time business insights from massive amounts of data without any upfront hardware or software investments.
A quick bullet point list of BigQuery features and limitations:
- BigQuery is ideal for running queries over vast amounts of data—up to billions of rows—with great speed.
- BigQuery is good for analyzing vast quantities of data quickly, but not for modifying it. In data analysis terms, BigQuery is an OLAP (online analytical processing) system.
- You can import data into BigQuery as CSV data, where it is stored in the cloud in a relatively small number of tables with no explicit relationship to each other.
- BigQuery isn’t a database system:
  - It doesn’t support table indexes or other database management features.
  - BigQuery supports a specialized subset of SQL; it doesn’t support update or delete requests.
- BigQuery supports joins only when one side of the join is much smaller than the other.
- BigQuery can be used by any client able to send REST commands over the Internet.
After the break you can watch the 15-minute video recorded at the GigaOm event.
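The join restriction in the list above (one side much smaller than the other) is typical of systems that use a broadcast hash join: the small table is loaded into memory and the large table is streamed past it. Whether BigQuery works this way internally is an assumption on my part; the sketch below only illustrates the general technique, and the function name and row format are made up for the example.

```python
def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two iterables of dict rows on `key`.

    The small side must fit in memory; the large side is only streamed.
    """
    # Build an in-memory index over the small side.
    index = {}
    for row in small_rows:
        index.setdefault(row[key], []).append(row)

    # Stream the large side once, probing the index for matches.
    for row in large_rows:
        for match in index.get(row[key], []):
            yield {**match, **row}
```

Because the large side is never indexed or sorted, the cost is a single scan, which fits the "no table indexes" point in the same list.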
Over the weekend I read two papers presenting products or research that improve or add new capabilities to the MapReduce data processing approach. The first comes from a team at Microsoft and describes TiMR, a time-oriented data processing system built on MapReduce. The second, from a team at Google, presents Tenzing, a SQL implementation on the MapReduce framework. It’s great to learn that while the Hadoop community is eliminating some of the initial limitations and hardening the technical details of the platform, there are already ideas and systems out there that augment the capabilities of the MapReduce data processing model.
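As a rough illustration of what running SQL on top of MapReduce means, here is a minimal single-process sketch of how a query like `SELECT dept, SUM(salary) FROM emp GROUP BY dept` could be split into map, shuffle, and reduce steps. The table, column names, and function names are hypothetical, and this is not how Tenzing itself is implemented; a real engine would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(rows):
    # Emit (group key, value) pairs: the GROUP BY column and the summed column.
    for row in rows:
        yield row["dept"], row["salary"]


def shuffle(pairs):
    # Bring all values for the same key together, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Apply the aggregate (SUM) to each group.
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    return reduce_phase(shuffle(map_phase(rows)))
```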
Original title and link: Research in the MapReduce Space (©myNoSQL)