SQL: All content tagged as SQL in NoSQL databases and polyglot persistence
Another weekend read, this time from Facebook and The Ohio State University, and closer to the hot topic of the last two weeks (SQL, MapReduce, Hadoop):
MapReduce has become an effective approach to big data analytics in large cluster systems, where SQL-like queries play important roles to interface between users and systems. However, based on our Facebook daily operation results, certain types of queries are executed at an unacceptable low speed by Hive (a production SQL-to-MapReduce translator). In this paper, we demonstrate that existing SQL-to-MapReduce translators that operate in a one-operation-to-one-job mode and do not consider query correlations cannot generate high-performance MapReduce programs for certain queries, due to the mismatch between complex SQL structures and simple MapReduce framework. We propose and develop a system called YSmart, a correlation aware SQL-to-MapReduce translator. YSmart applies a set of rules to use the minimal number of MapReduce jobs to execute multiple correlated operations in a complex query. YSmart can significantly reduce redundant computations, I/O operations and network transfers compared to existing translators. We have implemented YSmart with intensive evaluation for complex queries on two Amazon EC2 clusters and one Facebook production cluster. The results show that YSmart can outperform Hive and Pig, two widely used SQL-to-MapReduce translators, by more than four times for query execution.
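To make the "one-operation-to-one-job" problem concrete, here is a toy, in-memory sketch. It is not YSmart's rule set or Hive's planner; the `run_job` helper, the click-log data, and both function names are hypothetical. It only illustrates why two aggregations that share an input and a grouping key can be answered by a single job instead of two.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one toy MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Hypothetical click log: (user_id, page, seconds_on_page)
clicks = [
    ("u1", "home", 10),
    ("u1", "cart", 30),
    ("u2", "home", 5),
]


def one_op_per_job(records):
    """Naive translation: one job per aggregation, so the input is scanned twice."""
    counts = run_job(records, lambda r: [(r[0], 1)], sum)
    totals = run_job(records, lambda r: [(r[0], r[2])], sum)
    merged = {user: (counts[user], totals[user]) for user in counts}
    return merged, 2  # result, number of jobs used


def merged_job(records):
    """Correlation-aware translation: both aggregations share one scan and one shuffle."""
    def mapper(r):
        yield r[0], (1, r[2])

    def reducer(values):
        return (sum(v[0] for v in values), sum(v[1] for v in values))

    return run_job(records, mapper, reducer), 1  # result, number of jobs used
```

Both functions return the same per-user `(count, total_seconds)` pairs, but the merged version reads and shuffles the input once. On a real cluster each saved job also avoids a round of intermediate I/O, which is the kind of redundancy the abstract describes.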
Released by the Salesforce team, Phoenix adds a SQL layer on top of HBase and an almost complete JDBC driver.
Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
Original title and link: SQL Over HBase With Phoenix (©myNoSQL)
Felix Lin sent me a link to the slides he presented at the NoSQL Taiwan meetup. There are 105 of them!
The deck covers:
- how to build a simple social site using SQL
- what are the performance issues with SQL
- how to use the data structures in Redis for getting the same features
- how to solve the performance issues in SQL by using Redis
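As a rough illustration of the idea in the list above (not taken from the slides), here is a minimal sketch of how social-site features map onto Redis-style data structures. It uses plain Python sets and deques as stand-ins so that it runs without a server; the `TinySocialStore` class and its method names are hypothetical, and the comments name the Redis commands each operation mirrors.

```python
from collections import defaultdict, deque


class TinySocialStore:
    """In-memory stand-in for per-user Redis sets and lists."""

    def __init__(self, timeline_size=100):
        self.following = defaultdict(set)  # one set per user
        self.followers = defaultdict(set)  # reverse index, one set per user
        # Capped list per user, newest entry first.
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_size))

    def follow(self, user, target):
        self.following[user].add(target)   # mirrors SADD on the following set
        self.followers[target].add(user)   # mirrors SADD on the followers set

    def mutual_follows(self, a, b):
        # Mirrors SINTER on two following sets; in SQL this would be a self-join.
        return self.following[a] & self.following[b]

    def post(self, author, text):
        # Fan out on write: push the post onto each follower's timeline
        # (mirrors LPUSH followed by LTRIM to keep the list capped).
        for follower in self.followers[author]:
            self.timelines[follower].appendleft((author, text))

    def timeline(self, user, count=10):
        # Mirrors LRANGE 0 count-1; no join or sort is needed at read time.
        return list(self.timelines[user])[:count]


store = TinySocialStore()
store.follow("alice", "carol")
store.follow("bob", "carol")
store.follow("alice", "bob")
store.post("carol", "hi")
store.post("bob", "yo")
```

The design choice sketched here is to pay the cost when a post is written, so that reading a timeline is a cheap list slice rather than a join over follow and post tables.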
Check them out after the break: