


JSON: All content tagged as JSON in NoSQL databases and polyglot persistence

PostgreSQL and the NoSQL world

As I linked earlier today to the MemSQL and JSON story, I've been thinking again about PostgreSQL and its community's approach to bringing in new features. It's hard to miss what they are doing, and I think they are doing it right.

The PostgreSQL community is looking outside the box and listening. What features are users of NoSQL databases most excited about? Can we offer native support for them? Can we integrate with these other tools? These are the right questions to ask when expanding outside your space to help your users.

PostgreSQL 9.2 introduced a JSON data type along with JSON functions and operators. Here's what I wrote when linking to a post about the expanded support for the JSON data type.

PostgreSQL 9.3 brings even more power to the JSON data type with new JSON operators and functions, and it can be paired with a JavaScript engine (V8) through the PLV8 extension. Also in PostgreSQL 9.3 there's improved support for foreign data wrappers: a feature that allows querying external data sources from PostgreSQL.
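To make this concrete, here is a minimal sketch of what the JSON support looks like in practice (the table and column names are mine; the `->` and `->>` extraction operators arrived in 9.3, while 9.2 validates the JSON on input):

```sql
-- Sketch only: table and column names are illustrative.
CREATE TABLE docs (
    id   serial PRIMARY KEY,
    body json NOT NULL            -- input is validated as JSON on insert (9.2)
);

INSERT INTO docs (body)
VALUES ('{"title": "PostgreSQL and the NoSQL world", "tags": ["json", "nosql"]}');

-- 9.3 extraction operators: -> returns JSON, ->> returns text
SELECT body ->> 'title' AS title
FROM docs
WHERE body -> 'tags' ->> 0 = 'json';
```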

While it might sound easy to watch what others are doing and then do it yourself (an approach often attributed to Microsoft), in reality there's a lot of complexity in following this strategy. Besides asking the right questions when picking which features to bring in, there are always the technical and design decisions:

  1. can we actually support this?
  2. can we support it in a way that won't break or negatively impact existing features?
  3. how should we expose these “imported” features so we make them appealing to existing users (with their vision of the product), while keeping them attractive and familiar to new users?

The last question is the most difficult one to answer well.

✚ Here’s also a post I’ve linked to showing how to use PostgreSQL as a schemaless database.

Original title and link: PostgreSQL and the NoSQL world (NoSQL database©myNoSQL)

MemSQL: Use SQL to Query JSON

According to this, the next version of MemSQL will support JSON as a data type, similar to the PostgreSQL JSON type, but with slightly (really, just slightly) better syntax and function names:

By supporting the JSON datatype within a high performance database, MemSQL enables real-time analytics on data feeds of variable structure.

  • Use standard SQL over JSON data, including built-ins, GROUP BY, JOINs, and more.
  • Create JSON indexes online.
  • Client drivers require no changes to support JSON.
  • JSON properties are updatable.
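I haven't seen MemSQL's actual syntax yet, so the following is a purely hypothetical sketch of what the bullet points above might translate to; the table, schema, and `JSON_EXTRACT_STRING` built-in name are my guesses, not confirmed MemSQL syntax:

```sql
-- Hypothetical sketch only: schema and function names are illustrative.
CREATE TABLE events (
    id  BIGINT PRIMARY KEY,
    doc JSON NOT NULL               -- variable-structure feed data
);

-- "Standard SQL over JSON data": aggregate on a JSON property
SELECT JSON_EXTRACT_STRING(doc, 'url') AS url,
       COUNT(*)                        AS hits
FROM events
GROUP BY 1;

-- "Create JSON indexes online" might look something like:
CREATE INDEX idx_events_url ON events (JSON_EXTRACT_STRING(doc, 'url'));
```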

Just in case you are wondering how this sits with me yelling about a premature return to SQL, keep in mind that:

  1. MemSQL is SQL-based
  2. They’re trying to extend SQL on top of JSON

So, I guess that's OK. It won't be easy though, as they'll have to address some interesting problems: null vs. missing attributes, applying aggregation functions to heterogeneous types, and so on. Or they might decide to go with "f### it, just normalize your JSON data first".
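The null-vs-missing problem is easy to demonstrate with PostgreSQL's own JSON operators (9.3 syntax):

```sql
-- A JSON null and an absent key are different things, but both can
-- surface as SQL NULL depending on which operator you use.
SELECT '{"a": null}'::json ->  'a';   -- JSON null: the key exists
SELECT '{"a": null}'::json ->  'b';   -- SQL NULL: the key is missing
SELECT '{"a": null}'::json ->> 'a';   -- SQL NULL: ->> flattens JSON null too

-- And the heterogeneous-types problem: what should an aggregate
-- do over values like "1", 2, null, and "x" in the same property?
```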

Original title and link: MemSQL: Use SQL to Query JSON (NoSQL database©myNoSQL)


IBM and 10gen are collaborating on a standard that would make it easier to write applications that can access data from both MongoDB and relational systems such as IBM DB2

The details are pretty confusing¹:

[…] the new standard — which encompasses the MongoDB API, data representation (BSON), query language and wire protocol — appears to be all about establishing a way for mobile and other next-generation applications to connect with enterprise database systems such as IBM’s popular DB2 database and its WebSphere eXtreme Scale data grid.

But the juicy part is in the comments; if you can ignore the pitches.

  1. If this is a new standard, and it is all based on the already existing MongoDB API, BSON, and wire protocol, then 1) what's new about it, and 2) what exactly will make it a standard?

Original title and link: IBM and 10gen are collaborating on a standard that would make it easier to write applications that can access data from both MongoDB and relational systems such as IBM DB2 (NoSQL database©myNoSQL)


PostgreSQL as NoSQL with Data Validation

Szymon Guz writes about JSON support in PostgreSQL:

So, I’ve shown you how you can use PostgreSQL as a simple NoSQL database storing JSON blobs of text. The great advantage over the simple NoSQL databases storing blobs is that you can constrain the blobs, so they are always correct and you shouldn’t have any problems with parsing and getting them from the database.

You can also query the database very easily, with huge speed. The ad-hoc queries are really simple, much simpler than the map-reduce queries which are needed in many NoSQL databases.
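The constrained-blob idea from the post boils down to something like the following sketch (the table is mine; the `->>` operator requires 9.3, and note that a CHECK treats NULL as passing, hence the explicit `IS NOT NULL`):

```sql
-- Schemaless storage, but with a CHECK constraint guarding the blob:
-- every document must carry a non-empty "name" property.
CREATE TABLE people (
    id   serial PRIMARY KEY,
    data json NOT NULL,
    CHECK (data ->> 'name' IS NOT NULL AND data ->> 'name' <> '')
);

INSERT INTO people (data) VALUES ('{"name": "Szymon", "age": 30}');  -- OK
INSERT INTO people (data) VALUES ('{"age": 30}');                    -- rejected by the CHECK

-- Ad-hoc querying stays plain SQL; no map-reduce needed:
SELECT data ->> 'name' FROM people WHERE (data ->> 'age')::int > 21;
```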

Since before NoSQL was called NoSQL, I've always thought that there's a market, and more importantly there are use cases, for using single, unitary platforms for handling data. But there's also a market, and corresponding use cases, for using different platforms for handling data. And then there are federated database systems and logical data warehouses.

✚ I have this dream about how the databases will look in the future, but I never get around to putting together all the pieces, crossing the t’s and dotting the i’s.

Original title and link: PostgreSQL as NoSQL with Data Validation (NoSQL database©myNoSQL)


PostgreSQL as a Schemaless Database

A very interesting set of slides from Christophe Pettus looking at the features in PostgreSQL that would allow one to use it as a document database:

  1. XML
    1. built-in type
    2. can handle very large documents (2GB)
    3. XPath support
    4. export functions
    5. no indexing, except defining custom ones using expression index
  2. hstore
    1. key/value storage type
    2. in contrib (not part of the core)
    3. custom functions (nb: very ugly syntax imo)
    4. GiST and GIN indexes (nb: I’ve posted in the past about PostgreSQL GiST and GIN Index Types)
    5. also supports expression indexes
  3. JSON
    1. built-in type starting with PostgreSQL 9.2
    2. validates JSON
    3. supports expression indexing
    4. nothing else besides a lot of features scheduled for upcoming releases
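To make the hstore bullet points concrete, here's a small sketch (the table is mine; this also shows the operator syntax the slides find ugly):

```sql
CREATE EXTENSION hstore;   -- lives in contrib, not in core

CREATE TABLE profiles (
    id    serial PRIMARY KEY,
    attrs hstore
);

INSERT INTO profiles (attrs) VALUES ('color => "red", size => "L"');

-- GiST index to speed up containment lookups
CREATE INDEX profiles_attrs_idx ON profiles USING gist (attrs);

SELECT id          FROM profiles WHERE attrs @> 'color => "red"';  -- containment
SELECT attrs -> 'size' FROM profiles WHERE attrs ? 'color';        -- key exists
```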

Christophe Pettus’s slides also include the results and some thoughts about a locally-run pseudo-benchmark against these engines and MongoDB.

You can see all the slides and download them after the break.

Original title and link: PostgreSQL as a Schemaless Database (NoSQL database©myNoSQL)

Google BigQuery Adds Support for JSON Import and Hierarchical Data

Besides performance and quota changes, Google BigQuery adds support for importing JSON data and nested/repeated fields:

If you’re using App Engine Datastore or other NoSQL databases, it’s likely you’re taking advantage of nested and repeated data in your data model. For example, a customer data entity might have multiple accounts, each storing a list of invoices. Now, instead of having to flatten that data, you can keep your data in a hierarchical format when you import to BigQuery.
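Following the quoted customer/accounts/invoices example, an imported record with nested and repeated fields might look like this (the shape and field names are mine; in the actual import file each record sits on a single line of newline-delimited JSON):

```json
{"customer": "Acme",
 "accounts": [
   {"id": 1, "invoices": [{"no": "A-17", "total": 420.5}]},
   {"id": 2, "invoices": []}
 ]}
```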

Original title and link: Google BigQuery Adds Support for JSON Import and Hierarchical Data (NoSQL database©myNoSQL)


JSONiq: The JSON Query Language

Long-time reader William Candillon of 28msec sent me a link to JSONiq, the JSON Query Language, a group initiative to bring XQuery-like queryability to JSON:

Our goal in the JSONiq group, is to put the maturity of XQuery to work with JSON data. JSONiq is an open extension of the XQuery data model and syntax to support JSON.

After reading and experimenting a bit with JSONiq, my initial thought is that while it looks interesting, it feels like a complicated, XML-ish query language that doesn't really reflect the simplicity and philosophy of JSON. A sample query:

let $stats := db:find("stats")
for $access in $stats
group by $url := $access("url")
return {
  "url": $url,
  "avg": avg($access("response_time")),
  "hits": count($access)
}
What do you think?

Original title and link: JSONiq: The JSON Query Language (NoSQL database©myNoSQL)

Jaql: Query Language for JSON in IBM InfoSphere BigInsights

Jaql was created and is used by IBM InfoSphere BigInsights, the IBM Apache Hadoop distribution:

Jaql’s query language was inspired by many programming and query languages that include: Lisp, SQL, XQuery, and PigLatin. Jaql is a functional, declarative query language that is designed to process large data sets. For parallelism, Jaql rewrites high-level queries when appropriate into a “low-level” query consisting of Map-Reduce jobs that are evaluated using the Apache Hadoop project. Interestingly, the query rewriter produces valid Jaql queries which illustrates a departure from the rigid, declarative-only approach (but with hints!) of most relational databases. Instead, developers can interact with the “low-level” queries if needed and can add in their own low-level functionality such as indexed access or hash-based joins that are missing from Map-Reduce platforms.

Original title and link: Jaql: Query Language for JSON in IBM InfoSphere BigInsights (NoSQL database©myNoSQL)

Pig Latin and JSON on Amazon Elastic Map Reduce

In order to not have to learn everything about setting up Hadoop, and still have the ability to leverage the power of Hadoop's distributed data processing framework, and not have to learn how to write map-reduce jobs, and… (this could go on for a while, so I'll just stop here). For all these reasons, I chose to use Amazon's Elastic MapReduce infrastructure and Pig.

I will talk you through how I was able to do all this [take my log data stored on S3 (which is in compressed JSON format) and run queries against it] with a little help from the Pig community and a lot of late nights. I will also provide an example Pig script detailing a little about how I deal with my logs (which are admittedly slightly abnormal).

Sadly, such a useful tool in the Hadoop ecosystem doesn't make the headlines.

Original title and link: Pig Latin and JSON on Amazon Elastic Map Reduce (NoSQL databases © myNoSQL)


MongoDB Impressions

I’ve spent the past three days diving into MongoDB and jQuery. […] But the real attraction is the ability to work with syntax very similar to JavaScript and JSON from client to server to database. Just better for my relatively weak, pan-fried brain.

Let’s not forget, though, that MongoDB speaks BSON, so you’ll need to go through a native driver. But there are other NoSQL databases that natively speak JSON (e.g. CouchDB) and may not even need a server-side component.

Original title and link: MongoDB Impressions (NoSQL databases © myNoSQL)


ColdFusion's Flawed JSON and NoSQL databases

It’s kind of difficult to understand why they cannot get JSON right:

Now the surprise: ColdFusion 9.0.1 says goodbye to the number format in JSON. All numbers in JSON will become strings once ColdFusion gets its hand on the data. There are no more numbers. 10 becomes “10”, 10.2 becomes “10.2” etc. Even though the patch notes stated that differently. And i still believe it was not Adobe’s intention to remove numbers but somehow this flawed implementation slipped in.

As the author puts it:

ColdFusion 9.0.1 is essentially disqualified to be used with NoSQL datastores where JSON is involved.