
Google: All content tagged as Google in NoSQL databases and polyglot persistence

Introducing Google Cloud Datastore

Urs Hölzle in a post summarizing some of the announcements at Google I/O:

Google Cloud Datastore is a fully managed and schemaless solution for storing non-relational data. Based on the popular App Engine High Replication Datastore, Cloud Datastore is a standalone service that features automatic scalability and high availability while still providing powerful capabilities such as ACID transactions, SQL-like queries, indexes and more.

I’m heading over to the project’s site to read more.

Original title and link: Introducing Google Cloud Datastore (NoSQL database©myNoSQL)

via: http://googlecloudplatform.blogspot.com/2013/05/ushering-in-next-generation-of.html


Amazon Preparing 'Disruptive' Big Data AWS Service?

Interesting speculation by The Register:

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services, AWS Redshift for data warehousing which lets people store large quantities of data in the cloud and run queries on it, its NoSQL SSD-backed DynamoDB, and its Relational Database Service (RDS). So where does MADS fit?

The Reg’s take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

I almost never bet, but I’d say this could be Amazon’s Spanner.

Original title and link: Amazon Preparing ‘Disruptive’ Big Data AWS Service? (NoSQL database©myNoSQL)

via: http://www.theregister.co.uk/2013/02/19/amazon_new_big_data_aws_service/


Overview of Dremel-Like Solutions: Moving Beyond Hadoop for Big Data Needs

Until I learn more about the recently announced Cloudera Impala and Druid from Metamarkets, this article by Jaikumar Vijayan should give a good overview, albeit with some inherent mistakes¹, of the solutions that aim to provide alternatives to Hadoop’s batch-processing nature:

  • Google Dremel (BigQuery)
  • Cloudera Impala
  • Metamarkets Druid
  • Nodeable StreamReduce
  • SAP HANA integrated with Hadoop, etc.

  1. Just an example: “If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies”. Then “The technology [n.b. Google Dremel] can run queries over trillion-row data tables in seconds…”

    Maybe just one more: consider the title “Moving beyond Hadoop” and then the quote from Google’s Ju-kay Kwek: “Google uses Dremel in conjunction with MapReduce. […] Hadoop and Dremel are distributed computing technologies, but each was built to address very different problems.”

Original title and link: Overview of Dremel-Like Solutions: Moving Beyond Hadoop for Big Data Needs (NoSQL database©myNoSQL)

via: http://www.infoworld.com/print/205879


Google BigQuery Adds Support for JSON Import and Hierarchical Data

Besides performance and quota changes, Google BigQuery adds support for importing JSON data and nested/repeated fields:

If you’re using App Engine Datastore or other NoSQL databases, it’s likely you’re taking advantage of nested and repeated data in your data model. For example, a customer data entity might have multiple accounts, each storing a list of invoices. Now, instead of having to flatten that data, you can keep your data in a hierarchical format when you import to BigQuery.
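To make the difference concrete, here is a minimal Python sketch of the customer/accounts/invoices example written as newline-delimited JSON (one object per line, the format BigQuery’s JSON import reads), next to the flattened rows a CSV-only import would have required. The field names and values are hypothetical, and the sketch only prepares the data; it does not call BigQuery.

```python
import json

# Hypothetical customer entity: several accounts, each with a list of invoices
# (nested and repeated data, as in the quoted example).
customer = {
    "customer_id": "c-001",
    "name": "Example Corp",
    "accounts": [
        {"account_id": "a-1", "invoices": [
            {"invoice_id": "i-100", "amount": 120.0},
            {"invoice_id": "i-101", "amount": 80.5},
        ]},
        {"account_id": "a-2", "invoices": [
            {"invoice_id": "i-200", "amount": 42.0},
        ]},
    ],
}


def to_ndjson(records):
    """Newline-delimited JSON: one complete JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"


def flatten(record):
    """What a flat CSV import forced: one row per invoice, with the customer
    and account columns repeated on every row."""
    rows = []
    for account in record["accounts"]:
        for invoice in account["invoices"]:
            rows.append({
                "customer_id": record["customer_id"],
                "name": record["name"],
                "account_id": account["account_id"],
                "invoice_id": invoice["invoice_id"],
                "amount": invoice["amount"],
            })
    return rows


nested = to_ndjson([customer])  # one line, hierarchy preserved
flat = flatten(customer)        # three rows, customer fields duplicated
```

In the nested form the table schema would declare `accounts` (and `invoices` inside it) as repeated records, so each customer stays a single row.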

Original title and link: Google BigQuery Adds Support for JSON Import and Hierarchical Data (NoSQL database©myNoSQL)

via: http://googleenterprise.blogspot.com/2012/10/google-bigquery-updates-faster-easier.html


Todd Hoff on Google Spanner's

Todd Hoff of Highscalability.com:

What struck me most in the paper was a deeply buried section essentially describing Google’s motivation for shifting away from NoSQL and to NewSQL. The money quote:

We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

That’s one piece of the Spanner paper that’s catching everyone’s attention. I’m wondering how much of this reference to transactions refers to:

  1. multi-operation transactions
  2. synchronous replication
  3. strong data consistency
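To make the first reading concrete, here is a minimal single-node Python sketch (a hypothetical `TinyStore`, not Spanner’s or any real database’s API) of what coding around the lack of transactions risks: a two-key transfer done as separate writes can be left half-applied by a failure, while the same transfer inside a multi-operation transaction is all-or-nothing. It deliberately says nothing about replication or consistency across machines, which is what the second and third readings are about.

```python
import threading
from contextlib import contextmanager


class TinyStore:
    """Hypothetical in-memory key-value store, for illustration only."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=0):
        return self._data.get(key, default)

    def put(self, key, value):
        # Each single-key write is atomic on its own, but tied to no other write.
        with self._lock:
            self._data[key] = value

    @contextmanager
    def transaction(self):
        # Hold the lock for the whole transaction and buffer writes; they are
        # applied only if the body finishes without raising (all-or-nothing).
        with self._lock:
            writes = {}

            def read(key, default=0):
                # Reads see the transaction's own pending writes first.
                return writes.get(key, self._data.get(key, default))

            yield read, writes
            self._data.update(writes)


def transfer_without_tx(store, src, dst, amount, fail_midway=False):
    store.put(src, store.get(src) - amount)
    if fail_midway:
        raise RuntimeError("crash between the two writes")
    store.put(dst, store.get(dst) + amount)


def transfer_with_tx(store, src, dst, amount, fail_midway=False):
    with store.transaction() as (read, writes):
        writes[src] = read(src) - amount
        if fail_midway:
            raise RuntimeError("crash between the two writes")
        writes[dst] = read(dst) + amount
```

With `fail_midway=True`, the first version leaves `src` debited and `dst` untouched, while the second leaves both keys unchanged. The single coarse lock is the simplest way to get isolation here, and also exactly the kind of bottleneck the Spanner authors would rather have programmers deal with when it shows up.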

Original title and link: Todd Hoff on Google Spanner’s (NoSQL database©myNoSQL)

via: http://highscalability.com/blog/2012/9/24/google-spanners-most-surprising-revelation-nosql-is-out-and.html


Building Spanner Presentation

Alex Lloyd’s talk from Berlin Buzzwords 2012 about Google’s Spanner:


Cloudant's Mike Miller on Google Spanner

Cloudant’s Mike Miller sharing his thoughts about Google’s Spanner paper:

Spanner’s key innovation is around time. It includes a novel system using GPS and Atomic Clocks to distribute a globally synchronized “proper time.” The previous dogma in distributed systems was that synchronizing time within and between datacenters is insurmountably hard and uncertain. Ergo, serialization of requests is impossible at global scale. Google’s key innovation is to accept uncertainty, keep it small (via atomic clocks and GPS), quantify the uncertainty and operate around it. In retrospect this is obvious, but it doesn’t make it any less brilliant.
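The “accept, bound, and work around the uncertainty” idea fits in a few lines of Python. The Spanner paper describes a TrueTime API whose `now()` returns an interval rather than a single instant, plus a “commit wait” in which a commit timestamp is not made visible until it is guaranteed to be in the past. The toy below simulates this with the local clock and an assumed, fixed error bound `epsilon`; the class and function names are hypothetical, and a real deployment derives the bound from its GPS and atomic-clock time masters.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy uncertainty-bounded clock; epsilon (seconds) is an assumed constant."""

    def __init__(self, epsilon=0.005):
        self.epsilon = epsilon

    def now(self):
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, ts):
        # True only once ts has definitely passed, whatever the exact real time is.
        return self.now().earliest > ts


def commit(clock, apply_writes):
    # Choose the commit timestamp at the top of the uncertainty interval...
    ts = clock.now().latest
    apply_writes(ts)
    # ...then commit wait: don't acknowledge until ts is certainly in the past.
    # The wait lasts roughly 2 * epsilon, which is why keeping epsilon small matters.
    while not clock.after(ts):
        time.sleep(clock.epsilon / 10)
    return ts


clock = SimulatedTrueTime(epsilon=0.002)
log = []
t1 = commit(clock, log.append)
t2 = commit(clock, log.append)  # starts after t1 was acknowledged
```

Because each commit waits out its own uncertainty, a transaction that starts after another one has been acknowledged always receives a larger timestamp, which is the global ordering the quote refers to.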

Original title and link: Cloudant’s Mike Miller on Google Spanner (NoSQL database©myNoSQL)

via: https://cloudant.com/blog/cloudant-labs-on-google-spanner/


Google Cloud Platform Is the Biggest Deal in IT Since Amazon Launched EC2

Remember what I wrote in my post on the state of the Hadoop market about having a second option for on-demand, cloud-based Hadoop services? Benjamin Black compares Google Cloud Platform with Amazon’s services:

  • Compute Engine is a lot like EC2 & EBS
  • Cloud Storage is a lot like S3
  • Cloud SQL is a lot like RDS
  • Analytics can be used like CloudWatch (and I know of people putting billions of their own data points in Analytics)
  • BigQuery has no AWS equivalent, but maybe you could build it with EMR?
  • PageSpeed has no AWS equivalent

Hadoop and MapR are already listed as possible use cases for Google Cloud Platform.

I don’t think I could write a better conclusion than Black did in his post:

This is big, planetary scale infrastructure. This is cloud legitimized and super-sized. In the words of the prophet: Shit just got real.

Original title and link: Google Cloud Platform Is the Biggest Deal in IT Since Amazon Launched EC2 (NoSQL database©myNoSQL)

via: http://blog.b3k.us/2012/07/04/cloud-independence-day.html


Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

Google’s paper about Dapper, its large-scale distributed systems tracing solution, which inspired Twitter’s Zipkin:

Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie [3] and X-Trace [12], but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.
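The core bookkeeping behind Dapper-style tracing can be sketched in Python: every span in a request’s tree carries the trace id, its own span id and its parent’s span id, and the sampling decision is made once at the root and inherited by every child, so a trace is recorded whole or not at all. In Dapper this propagation lives in a small number of shared libraries (RPC, threading, control flow) rather than in application code. The `Tracer` class and the sampling rates below are hypothetical illustrations, not Dapper’s or Zipkin’s API.

```python
import random
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per span
    parent_id: Optional[str]  # None for the root span
    name: str
    sampled: bool             # decided once at the root, then inherited


class Tracer:
    """Hypothetical minimal tracer in the spirit of Dapper/Zipkin."""

    def __init__(self, sample_rate: float, rng: Optional[random.Random] = None):
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.collected: List[Span] = []  # spans that would be sent to a collector

    def start_trace(self, name: str) -> Span:
        # The only sampling decision for the whole trace happens here.
        sampled = self.rng.random() < self.sample_rate
        return self._record(Span(uuid.uuid4().hex, uuid.uuid4().hex, None, name, sampled))

    def child(self, parent: Span, name: str) -> Span:
        # Context passed along an RPC: same trace id and sampling bit,
        # plus a link back to the caller's span.
        span = Span(parent.trace_id, uuid.uuid4().hex, parent.span_id, name, parent.sampled)
        return self._record(span)

    def _record(self, span: Span) -> Span:
        # Unsampled spans still exist (their ids must propagate) but are not stored.
        if span.sampled:
            self.collected.append(span)
        return span


# Usage: a frontend request fanning out to a backend that calls storage.
tracer = Tracer(sample_rate=1 / 1024)  # assumed low rate to keep overhead small
root = tracer.start_trace("frontend")
backend = tracer.child(root, "backend")
storage = tracer.child(backend, "storage")
```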

Download or read the paper after the break.


Google BigQuery: Running SQL-like Queries Against Very Large Datasets

Google launched a new big data service, BigQuery, at the GigaOm Structure Data event:

BigQuery enables businesses and developers to gain real-time business insights from massive amounts of data without any upfront hardware or software investments.

A quick bullet point list of BigQuery features and limitations:

  • BigQuery is ideal for running queries over vast amounts of data—up to billions of rows—with great speed.
  • BigQuery is good for analyzing vast quantities of data quickly, but not for modifying it. In data analysis terms, BigQuery is an OLAP (online analytical processing) system.
  • You can import data into BigQuery as CSV data, where it is stored in the cloud in a relatively small number of tables with no explicit relationship to each other.
  • BigQuery isn’t a database system:
    • It doesn’t support table indexes or other database management features.
    • BigQuery supports a specialized subset of SQL; it doesn’t support update or delete requests.
    • BigQuery supports joins only when one side of the join is much smaller than the other.
  • BigQuery can be used by any client able to send REST commands over the Internet.
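One plausible reason for the join restriction is the standard broadcast hash join: the smaller table is loaded into an in-memory hash table (copied to every worker), and the large table is streamed past it, so only the small side has to fit in memory. The Python sketch below shows that generic technique; it is an assumption about why such a limit exists, not a description of BigQuery’s internals, and the table and column names are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(big_rows, small_rows, key):
    """Inner join: build a hash table from the small side, stream the big side."""
    # Build phase: only the small table is held in memory.
    index = defaultdict(list)
    for row in small_rows:
        index[row[key]].append(row)
    # Probe phase: the big table is read once, row by row; in a distributed
    # engine each worker would probe its own copy of `index`.
    for row in big_rows:
        for match in index.get(row[key], ()):
            yield {**match, **row}


# Usage: a (potentially huge) pageview log joined with a tiny country lookup.
countries = [{"cc": "DE", "country": "Germany"}, {"cc": "FR", "country": "France"}]
pageviews = ({"cc": cc, "views": n} for cc, n in [("DE", 3), ("US", 5), ("FR", 2), ("DE", 1)])
joined = list(broadcast_hash_join(pageviews, countries, "cc"))
```

The `US` row has no match in the small table and is dropped, as in any inner join.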

After the break you can watch the 15-minute video recorded at the GigaOm event.


How Web giants store big data

A not very technical Ars Technica overview of the storage systems developed and used by Google (Google File System, BigTable), Amazon (Dynamo) and Microsoft (Azure DFS), plus the Hadoop Distributed File System (HDFS).

Original title and link: How Web giants store big data (NoSQL database©myNoSQL)

via: http://arstechnica.com/business/news/2012/01/the-big-disk-drive-in-the-sky-how-the-giants-of-the-web-store-big-data.ars/1


Research in the MapReduce Space

Over the weekend I read two papers presenting products or research aimed at improving the MapReduce data processing approach or adding new capabilities to it. The first, from a team at Microsoft, describes TiMR, a time-oriented data processing system built on MapReduce. The second, from a team at Google, presents Tenzing, a SQL implementation on the MapReduce framework. It’s great to see that while the Hadoop community is removing some of the platform’s initial limitations and hardening its technical details, there are already ideas and systems out there that extend the capabilities of the MapReduce data processing model.
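As a rough illustration of what running SQL on MapReduce means (this is not Tenzing’s actual query planner), here is how a query such as `SELECT country, SUM(views) FROM pageviews GROUP BY country` maps onto the map, shuffle and reduce phases. The table and column names are hypothetical, and the shuffle that a real framework performs across machines is done here with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(row):
    # Emit (GROUP BY key, value to aggregate) for each input row.
    yield row["country"], row["views"]


def shuffle(pairs):
    # The framework's job between map and reduce: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # SUM(views) for one group.
    return key, sum(values)


def run_group_by(rows):
    pairs = chain.from_iterable(map_phase(r) for r in rows)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


rows = [
    {"country": "DE", "views": 3},
    {"country": "FR", "views": 2},
    {"country": "DE", "views": 1},
]
totals = run_group_by(rows)
```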

Original title and link: Research in the MapReduce Space (NoSQL database©myNoSQL)