


BigQuery: All content tagged as BigQuery in NoSQL databases and polyglot persistence

How Safari Books Online uses Google BigQuery for BI

Looking for alternative solutions to build our dashboards and enable interactive ad-hoc querying, we played with several technologies, including Hadoop. In the end, we decided to use Google BigQuery.

Compare the original processing flow:

BigQuery processing flow

with these 2 possible alternatives and tell me if you notice any significant differences.

Alternatives to BigQuery

Original title and link: How Safari Books Online uses Google BigQuery for BI (NoSQL database©myNoSQL)


Google BigQuery: JOIN and GROUP BY EACH. And Something Is Wrong With SQL

New features added to Google BigQuery:

  • Big JOIN: use SQL-like queries to join very large datasets at interactive speeds
  • Big Group Aggregations: perform groupings on large numbers of distinct values
  • Timestamp: native support for importing and querying Timestamp data

I read with interest both the announcement and the technical (?) details post about EACH, the new SQL keyword BigQuery introduced to perform JOINs and GROUP BYs on “large tables”. Unfortunately, I couldn’t find what actually happens behind this new keyword.

This made me think again about what’s wrong with SQL: almost every engine implementation detail bubbles up to the user, creating a new flavor of SQL. Just think about it: EACH has no meaning for either of these operations; is there a NOTEACH JOIN? Yet it was needed to instruct the engine to perform the operation differently.
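Since the post couldn’t find what EACH does internally, here is only a hedged illustration of a general technique distributed engines commonly use for groupings over many distinct values: hash-partition (shuffle) the rows by the grouping key so that each worker aggregates a disjoint subset of keys, instead of funneling every distinct key to a single node. This is an assumption about the kind of strategy such a keyword might select, not BigQuery’s documented implementation; the function names and the `num_workers` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (grouping key, value to sum)


def shuffle_by_key(rows: Iterable[Row], num_workers: int) -> List[List[Row]]:
    """Route each row to a worker chosen by hashing its key (hypothetical shuffle step)."""
    partitions: List[List[Row]] = [[] for _ in range(num_workers)]
    for key, value in rows:
        partitions[hash(key) % num_workers].append((key, value))
    return partitions


def local_sum(partition: List[Row]) -> Dict[str, int]:
    """Aggregate one partition; no other partition ever holds these keys."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


def partitioned_group_by_sum(rows: Iterable[Row], num_workers: int = 4) -> Dict[str, int]:
    """SUM(value) ... GROUP BY key, computed as a shuffle plus per-partition aggregation."""
    result: Dict[str, int] = {}
    for partition in shuffle_by_key(rows, num_workers):
        # Partitions hold disjoint key sets, so merging is a plain union, no re-aggregation.
        result.update(local_sum(partition))
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 5)]
print(sorted(partitioned_group_by_sum(rows).items()))  # [('a', 4), ('b', 2), ('c', 5)]
```

The point of the design is memory: no single worker has to hold every distinct key, which is exactly the situation a “large number of distinct values” grouping runs into.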

Original title and link: Google BigQuery: JOIN and GROUP BY EACH. And Something Is Wrong With SQL (NoSQL database©myNoSQL)

Overview of Dremel-Like Solutions: Moving Beyond Hadoop for Big Data Needs

Until I learn more about the recently announced Cloudera Impala and Druid from Metamarkets, this article by Jaikumar Vijayan should serve as a good overview, despite some inherent mistakes[1], of the solutions that aim to provide alternatives to Hadoop’s batch-processing nature:

  • Google Dremel (BigQuery)
  • Cloudera Impala
  • Metamarkets Druid
  • Nodeable StreamReduce
  • SAP HANA integrated with Hadoop, etc.

  1. Just an example: “If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies”. Then: “The technology [n.b. Google Dremel] can run queries over trillion-row data tables in seconds…”

    Maybe just one more: consider the title “Moving beyond Hadoop” and then the quote from Google’s Ju-kay Kwek: “Google uses Dremel in conjunction with MapReduce. […] Hadoop and Dremel are distributed computing technologies, but each was built to address very different problems.”

Original title and link: Overview of Dremel-Like Solutions: Moving Beyond Hadoop for Big Data Needs (NoSQL database©myNoSQL)


Google BigQuery Adds Support for JSON Import and Hierarchical Data

Besides performance and quota changes, Google BigQuery adds support for importing JSON data and nested/repeated fields:

If you’re using App Engine Datastore or other NoSQL databases, it’s likely you’re taking advantage of nested and repeated data in your data model. For example, a customer data entity might have multiple accounts, each storing a list of invoices. Now, instead of having to flatten that data, you can keep your data in a hierarchical format when you import to BigQuery.
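To make the customer/accounts/invoices example concrete, here is a minimal sketch in Python. BigQuery’s JSON import expects newline-delimited JSON (one complete object per line); the entity and its field names below are hypothetical, and no BigQuery schema or client library is involved. The `flatten` helper shows the kind of denormalization you would otherwise have to do before importing.

```python
import json
from typing import Any, Dict, List

# Hypothetical customer entity; field names are illustrative only.
customer: Dict[str, Any] = {
    "customer_id": "c-001",
    "name": "Example Corp",
    "accounts": [  # repeated record
        {"account_id": "a-1", "invoices": [  # repeated record nested in a repeated record
            {"invoice_id": "i-10", "amount": 120.0},
            {"invoice_id": "i-11", "amount": 80.0},
        ]},
        {"account_id": "a-2", "invoices": [
            {"invoice_id": "i-20", "amount": 45.5},
        ]},
    ],
}


def to_ndjson(records: List[Dict[str, Any]]) -> str:
    """Serialize records as newline-delimited JSON, keeping the hierarchy intact."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


def flatten(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """The flat alternative: one row per invoice, repeating the parent fields on each row."""
    rows = []
    for account in record["accounts"]:
        for invoice in account["invoices"]:
            rows.append({
                "customer_id": record["customer_id"],
                "name": record["name"],
                "account_id": account["account_id"],
                "invoice_id": invoice["invoice_id"],
                "amount": invoice["amount"],
            })
    return rows


print(to_ndjson([customer]), end="")  # one line holding the whole nested customer
print(len(flatten(customer)))         # 3 flat rows, one per invoice
```

Note a drawback of flattening visible in this sketch: an account with no invoices would produce no rows at all and silently disappear, while the nested form keeps it.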

Original title and link: Google BigQuery Adds Support for JSON Import and Hierarchical Data (NoSQL database©myNoSQL)


Google BigQuery: Running SQL-like Queries Against Very Large Datasets

Announced at the GigaOm Structure Data event, Google has launched a new big data service named BigQuery:

BigQuery enables businesses and developers to gain real-time business insights from massive amounts of data without any upfront hardware or software investments.

A quick bullet point list of BigQuery features and limitations:

  • BigQuery is ideal for running queries over vast amounts of data—up to billions of rows—with great speed.
  • BigQuery is good for analyzing vast quantities of data quickly, but not for modifying it. In data analysis terms, BigQuery is an OLAP (online analytical processing) system.
  • You can import data into BigQuery as CSV data, where it is stored in the cloud in a relatively small number of tables with no explicit relationship to each other.
  • BigQuery isn’t a database system:
    • It doesn’t support table indexes or other database management features.
    • BigQuery supports a specialized subset of SQL; it doesn’t support update or delete requests.
    • BigQuery supports joins only when one side of the join is much smaller than the other.
  • BigQuery can be used by any client able to send REST commands over the Internet.
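The restriction that joins work only “when one side of the join is much smaller than the other” matches a common engine technique, the broadcast (replicated) hash join: the small table is turned into an in-memory hash table copied to every worker, and each worker streams its shard of the large table through it. The list above does not say this is how BigQuery implements joins; the following is only a minimal sketch of the general technique, with hypothetical names and toy data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (join key, payload)


def build_hash_table(small: List[Row]) -> Dict[str, List[str]]:
    """Index the small side by join key; it must fit in every worker's memory."""
    table: Dict[str, List[str]] = defaultdict(list)
    for key, payload in small:
        table[key].append(payload)
    return table


def probe_shard(shard: List[Row], table: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Inner-join one shard of the large side against the broadcast hash table."""
    out = []
    for key, payload in shard:
        for match in table.get(key, []):  # .get avoids inserting empty entries
            out.append((key, payload, match))
    return out


def broadcast_join(large_shards: List[List[Row]], small: List[Row]) -> List[Tuple[str, str, str]]:
    table = build_hash_table(small)  # built once, conceptually shipped to every worker
    results = []
    for shard in large_shards:  # in a real engine each shard is probed on its own worker
        results.extend(probe_shard(shard, table))
    return results


orders = [[("us", "order-1"), ("fr", "order-2")], [("us", "order-3"), ("de", "order-4")]]
countries = [("us", "United States"), ("fr", "France")]
print(broadcast_join(orders, countries))
```

The large side is never moved, which is why this works well when one side is small, and why it breaks down when both sides are too large to replicate.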

After the break you can watch the 15-minute video recorded at the GigaOm event.