


SQL: All content tagged as SQL in NoSQL databases and polyglot persistence

Playing With Hadoop Pig

Anything missing from Pig?

[…] the following SQL operations can be translated to Pig as follows; the order in which the operations have to be run is given in parentheses.

  • SELECT id, name (5): resultData = FOREACH limitData GENERATE id, name
  • FROM Table (1): data = LOAD 'person.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int)
  • WHERE a=1 (2): filteredData = FILTER data BY a == 1
  • ORDER BY age DESC (3): orderedData = ORDER filteredData BY age DESC
  • LIMIT 10 (4): limitData = LIMIT orderedData 10

One can also use left join and join as follows:

  • JOIN: join_data = JOIN data1 BY id1, data2 BY id2
  • LEFT JOIN: left_join_data = JOIN data1 BY id1 LEFT OUTER, data2 BY id2
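
Chained in execution order, the whole translation fits in one short script. A sketch, using the file name and schema from the snippets above (since the loaded schema has no column a, the filter below uses age instead; note that Pig’s equality operator is ==):

```pig
-- (1) FROM: load the data with an explicit schema
data = LOAD 'person.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);

-- (2) WHERE: keep only matching rows
filteredData = FILTER data BY age == 21;

-- (3) ORDER BY age DESC
orderedData = ORDER filteredData BY age DESC;

-- (4) LIMIT 10
limitData = LIMIT orderedData 10;

-- (5) SELECT id, name
resultData = FOREACH limitData GENERATE id, name;

DUMP resultData;
```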

Original title and link: Playing With Hadoop Pig (NoSQL database©myNoSQL)


SQL Over HBase With Phoenix

Released by the Salesforce team, Phoenix adds a SQL layer on top of HBase and an almost complete JDBC driver.

Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.

The project already has a page about the performance and the results are looking great. For a bullet list summary, check out James Taylor’s post.
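
To give a sense of what that SQL layer looks like, here is a hypothetical Phoenix session over JDBC (table and column names are invented for illustration). One notable Phoenix idiom is UPSERT VALUES in place of INSERT:

```sql
-- An HBase-backed table; the PRIMARY KEY becomes the HBase row key
CREATE TABLE metrics (
    host VARCHAR NOT NULL,
    ts   VARCHAR NOT NULL,
    cpu  DECIMAL,
    CONSTRAINT pk PRIMARY KEY (host, ts)
);

-- Phoenix uses UPSERT in place of INSERT/UPDATE
UPSERT INTO metrics VALUES ('web01', '2013-02-01 00:00:00', 0.42);

-- Queries are plain SQL, compiled into HBase scans
-- (aggregates run server-side in coprocessors)
SELECT host, AVG(cpu) FROM metrics GROUP BY host;
```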

Original title and link: SQL Over HBase With Phoenix (NoSQL database©myNoSQL)

What Is Thredis?

Gregory Trubetskoy about his Thredis project:

Thredis is Redis + SQL + Threads. Or perhaps it’s pure lunacy resulting from some mad winter hacking mixed with eggnog. Or perhaps it’s the first hybrid SQL/NoSQL server. You be the judge.

I only hope the community doesn’t take Redis in this direction.

Original title and link: What Is Thredis? (NoSQL database©myNoSQL)


Modeling a Simple Social App Using SQL and Redis

Felix Lin sent me a link to the slides he presented at NoSQL Taiwan meetup. There are 105 of them!

The deck covers:

  • how to build a simple social site using SQL
  • the performance issues that come with the SQL approach
  • how to use Redis data structures to implement the same features
  • how to solve the SQL performance issues by using Redis

Check them out after the break:
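
For a taste of the Redis side, the classic modeling trick is to map social relations to sets and timelines to lists. A minimal sketch (all key names are made up, not taken from the slides):

```
# Followers and following as sets
SADD followers:felix user:42 user:99
SADD following:felix user:42

# Friends (mutual follows) = set intersection
SINTERSTORE friends:felix followers:felix following:felix

# A user's timeline as a capped list of post ids
LPUSH timeline:42 post:1001
LTRIM timeline:42 0 999
```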

Implementing SQL With Unix Utilities

  1. SELECT col1, col2 (i.e. projections) can be implemented with several variants of Unix utilities: cut and awk are the two most obvious.
  2. JOIN can be implemented with the… wait for it… join utility. You’ll need to sort its input first, though.
  3. Many GROUP BY operations can be performed with combinations of grep -c, sort with or without the -urnk options (look at the man page — you can apply options to individual sort keys), and uniq with or without the -c option. Many more can be done with 20 or 30 characters of awk.
  4. Output formatting is easy with column, especially with the -t option.
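
The projection and GROUP BY tricks above can be sketched in a few lines of shell (the CSV file and its columns are invented for the example):

```shell
# A tiny people.csv to play with: id,name,city
printf 'id,name,city\n1,ann,nyc\n2,bob,sf\n3,cat,nyc\n' > people.csv

# SELECT name, city (projection): cut picks columns by position
tail -n +2 people.csv | cut -d, -f2,3

# SELECT city, COUNT(*) ... GROUP BY city: sort the column, then uniq -c
tail -n +2 people.csv | cut -d, -f3 | sort | uniq -c

# ORDER BY name DESC: sort on the name key, reversed
tail -n +2 people.csv | sort -t, -rk2,2
```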

It reminded me of Ted Dziuba’s Taco Bell Programming, now vanished from the Internet.

Original title and link: Implementing SQL With Unix Utilities (NoSQL database©myNoSQL)


6 Ideal Features for Big Data Transactional Database

Dan Kusnetzky proposes the following 6 features as part of an ideal transactional database:

  1. SQL
  2. ACID transactions
  3. Data and application independence
  4. Elasticity
  5. Multi-tenancy
  6. Geographic distribution

The real question is whether a database including all these features would even be possible. We already know that ACID transactions and support for geographic distribution don’t mix well. SQL was created to work with the relational model, so it’ll be quite limited when applied to other data models—think graphs. There are also some (good) arguments why a declarative language like SQL might not be the best fit for large scale databases. Last, but not least, designing a common API to support the different data models is not that realistic either.

Original title and link: 6 Ideal Features for Big Data Transactional Database (NoSQL database©myNoSQL)


Which Is Better for Programmers: SQL vs. NoSQL?

Jeff Cogswell compares some short code samples in an attempt to answer the much bigger question:

But what about the programmers, who write the client code that access the databases? Where do the disagreements leave them? From a programming perspective, is SQL really that horrible and outdated? Or is the new NoSQL really that awful to work with? Perhaps they both have strengths and good points.

I confess that reading the above made me curious about what the article would conclude. Unfortunately, by the time I’d read the first comparison (JavaScript in Node.js using SQL vs MongoDB) I realized my expectations were too high. For a few reasons:

  1. it would have been impossible to compare the APIs of all relevant NoSQL databases with a relational database;
  2. it would have been very difficult to choose a generic, representative enough use case;
  3. the results would have always been heavily influenced by the quality of drivers and libraries used.

Last but not least, many of the merits of the NoSQL databases are related to operational complexity and not programming complexity. As someone who has done a fair amount of coding and close to zero operations, I would probably be OK accepting a bit of programming complexity in exchange for simplified operations. But that might be just a biased opinion.

Original title and link: Which Is Better for Programmers: SQL vs. NoSQL? (NoSQL database©myNoSQL)


MapReduce and Massively Parallel Processing (MPP): Two Sides of the Big Data

Andrew Brust for ZDNet:

But, for a variety of reasons, MPP and MapReduce are used in rather different scenarios. You will find MPP employed in high-end data warehousing appliances. […] MPP gets used on expensive, specialized hardware tuned for CPU, storage and network performance. MapReduce and Hadoop find themselves deployed to clusters of commodity servers that in turn use commodity disks. The commodity nature of typical Hadoop hardware (and the free nature of Hadoop software) means that clusters can grow as data volumes do, whereas MPP products are bound by the cost of, and finite hardware in, the appliance and the relative high cost of the software. […] MPP and MapReduce are separated by more than just hardware. MapReduce’s native control mechanism is Java code (to implement the Map and Reduce logic), whereas MPP products are queried with SQL (Structured Query Language). […] Nonetheless, Hadoop is natively controlled through imperative code while MPP appliances are queried though declarative query. In a great many cases, SQL is easier and more productive than is writing MapReduce jobs, and database professionals with the SQL skill set are more plentiful and less costly than Hadoop specialists.

I totally agree with Andrew Brust that none of these are good reasons for these platforms to remain separate. Actually when analyzing the importance of the Teradata (MPP) and Hortonworks (Hadoop) partnership, I wrote:

Depending on the level of integration the two teams will pull together, this partnership might result in one of the most complete and powerful structured and unstructured data warehouse and analytics platforms.

This very same thing could be said about any platform offering a viable, fully integrated, cost effective, distributed, structured and unstructured data warehouse or analytics platform. MPP and MapReduce are not two sides of Big Data, but rather complementary approaches to it.

Original title and link: MapReduce and Massively Parallel Processing (MPP): Two Sides of the Big Data (NoSQL database©myNoSQL)


Taking a Step Back From ORMs and a Parallel to the Database World

Jeff Davis:

So, my proposal is this: take a step back from ORMs, and consider working more closely with SQL and a good database driver. Try to work with the database, and find out what it has to offer; don’t use layers of indirection to avoid knowing about the database. See what you like and don’t like about the process after an honest assessment, and whether ORMs are a real improvement or a distracting complication.

I know a lot of applications that use ORMs and work perfectly fine. And I know applications that had to work around their ORM or even got rid of it completely.

Here is a parallel to think about: ORM vs SQL is similar to always using a relational database versus using the storage solution that better fits the problem—as in using a NoSQL database or going polyglot persistence. An ORM comes with the advantage of keeping you inside a single paradigm (object oriented) at the cost of not being able to (easily) use the full power of the underlying storage.
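
Jeff’s suggestion boils down to something like this: talk to the database directly through a driver and let SQL do the relational work. A minimal sketch with Python’s built-in sqlite3 driver (schema and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person (name, age) VALUES (?, ?)",
                 [("ann", 34), ("bob", 19), ("cat", 27)])

# The database does the filtering, ordering and limiting -- no ORM
# layer re-implementing (or hiding) any of it in application code.
rows = conn.execute(
    "SELECT name FROM person WHERE age >= ? ORDER BY age DESC LIMIT 2",
    (21,),
).fetchall()
print([name for (name,) in rows])  # → ['ann', 'cat']
```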

Original title and link: Taking a Step Back From ORMs and a Parallel to the Database World (NoSQL database©myNoSQL)


SQL or Hadoop: What Tools Should I Use to Process My Data?

Great decision flowchart created by Aaron Cordova to help answer the question: what tools should I use to process my data:

SQL or Hadoop

Credit: Aaron Cordova

Original title and link: SQL or Hadoop: What Tools Should I Use to Process My Data? (NoSQL database©myNoSQL)

MarkLogic Querying for SQL People

Inspired by the MongoDB MapReduce translated to SQL and Neo4j Cypher Querying for SQL People, MarkLogic’s Jason Hunter and Eric Bloch put together a page mapping SQL terms and queries to MarkLogic terms and XQuery queries respectively.

Here is how SQL statements translate to MarkLogic XQuery expressions:
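
To give the flavor of such a mapping, a SQL SELECT generally corresponds to an XQuery FLWOR expression. An illustrative, generic (not MarkLogic-specific) translation of SELECT name FROM person WHERE age > 30 ORDER BY age DESC:

```xquery
(: SQL: SELECT name FROM person WHERE age > 30 ORDER BY age DESC :)
for $p in doc("people.xml")//person
where xs:int($p/age) > 30
order by xs:int($p/age) descending
return $p/name
```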