28msec - query data from any source in real time

Derrick Harris, writing about 28msec and its still-in-stealth-mode generic query language:

Their solution was to create a platform able to extract data from any of these sources, transform it into a standard format, and then let users analyze it using a single query language that looks a lot like the SQL they already know. 28msec is based on the open source JSONiq and Zorba query languages and will be available as a cloud service.

This sounds like a variant of an ETL process: Extract-Transform-Query. But it got me thinking about what Daniel Abadi wrote on the difference between Hadapt and PolyBase or HAWQ. Just replace Hadoop with another source of data and SQL with JSONiq:

[…] they all can access data in Hadoop, but there needs to be some sort of structured schema defined in order for the database to understand how to access it via SQL. So, bottom line, Polybase/SQL-H/Hawq let you dynamically get at data in Hadoop/HDFS that could theoretically have been stored in the DBMS all along, but for some reason is being stored in Hadoop instead of the DBMS.

The question is not whether this process will work (ETL processes have been around for quite a while), but what you can do to optimize this extract-transform-query process.
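To make the extract-transform-query idea concrete, here is a minimal sketch in Python. It is not 28msec's implementation and does not use JSONiq or Zorba; the sample data and the helper names (`extract_csv`, `extract_json`, `transform`, `query`) are hypothetical and exist only to illustrate the three steps.

```python
import csv
import io
import json

# Hypothetical sample sources, invented for illustration only.
CSV_SOURCE = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
JSON_SOURCE = '[{"id": 3, "name": "Grace", "city": "New York"}]'


def extract_csv(text):
    """Extract: read rows from a CSV source as dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: read records from a JSON array source."""
    return json.loads(text)


def transform(record):
    """Transform: normalize a record into one common shape with fixed types."""
    return {
        "id": int(record["id"]),
        "name": str(record["name"]),
        "city": str(record["city"]),
    }


def query(records, predicate, fields):
    """Query: filter the normalized records and project the requested fields."""
    return [{f: r[f] for f in fields} for r in records if predicate(r)]


# Pull everything from both sources, normalize it, then run a single query.
records = [transform(r) for r in extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)]
result = query(records, lambda r: r["id"] >= 2, ["name", "city"])
print(result)
```

Note that this sketch extracts and transforms every record before the query filters anything. That up-front cost is probably the first place an optimization would look, for example by applying the filter while reading each source.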

Original title and link: 28msec - query data from any source in real time (NoSQL database©myNoSQL)