


NoSQL Database: All content tagged as NoSQL Database in NoSQL databases and polyglot persistence

What happened to the buzz around NoSQL?

Christopher Taylor:

There you have it. It isn’t that NoSQL isn’t an important advance in information technology…it just isn’t the revolution that the past buzz would have indicated.

Name just one technology that has completely replaced the incumbent technologies of the last 30+ years (which is actually what analysts and investors are expecting).

Original title and link: What happened to the buzz around NoSQL? (NoSQL database©myNoSQL)


Popularizing the NoSQL Movement

Indeed, “big data” is mostly an outgrowth of the huge amounts of information that were collected and needed to be used by the big Internet services. Most of the big services quickly found out that the amount of data they needed to store and the frequency in which they need to update and access it could not be met by traditional relational database management systems.

Even though I’ve been writing about NoSQL databases and adjacent technologies for almost 3 years, I don’t think I’ve authored a single generic post meant to popularize the NoSQL movement. By now, I understand the recipe: get all the buzzwords, gather all your previous posts about companies that might be somehow related, and glue everything together with things that don’t necessarily make sense, but sound good. I still don’t think I could write one though.

NB: After reading this article, could you please teach me how I should spell NoSQL: noSQL, !SQL, NoSQL, no-SQL, or not-only-SQL?

Original title and link: Popularizing the NoSQL Movement (NoSQL database©myNoSQL)


The Total Cost of (Non) Ownership of a NoSQL Database Service

The Amazon team released a whitepaper comparing the total cost of ownership for 3 scenarios:

  1. on-premise NoSQL database
  2. NoSQL database deployed on Amazon EC2 and Amazon EBS
  3. Amazon DynamoDB

The Total Cost of Ownership of a NoSQL Database service

As you can imagine, DynamoDB comes out as the most cost-effective solution (79% more cost-effective than an on-premise NoSQL database and 61% more cost-effective than an AWS-hosted NoSQL database). Read or download the paper after the break.

5 Requirements for Enterprise NoSQL databases

Emil Eifrem enumerates 5 requirements for adopting NoSQL databases in the enterprise environment:

  1. Ability to Handle Today’s Complex and Connected Data
  2. Simplify the Development of Applications Using Complex and Connected Data
  3. Support for End-to-End Transactions
  4. Enterprise-grade Durability so that Data is Never Lost
  5. Java Still Reigns for Enterprise Development

I think Emil Eifrem has left out a couple of other critical aspects, but I agree with 4 and 1/2 of those on his list.

Original title and link: 5 Requirements for Enterprise NoSQL databases (NoSQL database©myNoSQL)


Attacking NoSQL and Node.js: Server-Side JavaScript Injection (SSJS)

Jeff Darcy wrote a while back about the (lack of) security in NoSQL databases. Unfortunately, things haven’t changed much, and if you check the NoSQL + Node.js applications I’ve posted lately, you’ll notice that some of them completely ignore security.

Some people are realizing the risks and starting to express their concerns:

Playing with MongoDB lately, I’m getting scared. Because I’m seeing some really bad practices out there. Seeing it in live code. In tutorials.

Bryan Sullivan (Senior Security Researcher, Adobe Secure Software Engineering Team) has published a paper (PDF) explaining some of the possible server-side JavaScript injection attacks and the risks the apps and the data are exposed to. Teaser: he can do pretty much everything.

It should be noted that exploitation of server-side JavaScript injection vulnerabilities is more like that of SQL injection than of cross-site scripting. SSJS injection does not require any social engineering of an intermediate victim user the way that reflected XSS or DOM-based XSS do; instead, the attacker can attack the application directly with arbitrarily created HTTP requests.

Because of this, defenses against SSJS injection are also similar to SQL injection defenses:

  • Avoid creating “ad-hoc” JavaScript commands by concatenating script with user input.
  • Validate user input used in SSJS commands with regular expressions.
  • Avoid use of the JavaScript eval command. In particular, when parsing JSON input, use a safer alternative such as JSON.parse.
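The three defenses above can be sketched in a few lines of TypeScript. This is only an illustration, not code from Sullivan’s paper: the helper names (`parseUserJson`, `validateYear`, `buildYearQuery`) and the four-digit-year rule are hypothetical, picked just to show the pattern.

```typescript
// Hypothetical helpers illustrating the defenses listed above.

// Defense: parse JSON with JSON.parse instead of eval.
// JSON.parse only accepts data; it never executes code found in the input.
function parseUserJson(input: string): unknown {
  try {
    return JSON.parse(input);
  } catch {
    return undefined; // malformed JSON or an injection attempt: reject it
  }
}

// Defense: validate input against a strict pattern before it gets anywhere
// near a server-side command. Assumption: the field is a four-digit year.
const YEAR_PATTERN = /^\d{4}$/;

function validateYear(input: string): number | undefined {
  return YEAR_PATTERN.test(input) ? Number(input) : undefined;
}

// Defense: never concatenate user input into script. The validated value
// is passed along as data (an object field), not as part of a code string.
function buildYearQuery(input: string): { year: number } | undefined {
  const year = validateYear(input);
  return year === undefined ? undefined : { year };
}
```

With these helpers, an input such as `2011; process.exit()` is rejected by the pattern and never reaches the database layer, and a script fragment passed where JSON is expected fails to parse instead of being executed.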

Remember there’s no such thing as security through obscurity.

Original title and link: Attacking NoSQL and Node.js: Server-Side JavaScript Injection (SSJS) (NoSQL database©myNoSQL)

The Wonderful Wizard of Oz Through a Polyglot Persistence Glass

Adrian Giordani:

Relational databases are the Yellow Brick Road of managing large structured data globally. […] In the 1939 film of The Wizard of Oz, a red brick road is intertwined with the yellow one. Similarly, a new type of database might soon offer a different path: NoSQL, or Not-Only-SQL, first coined in 2008, is promising a faster and more scalable database architecture, at least for some cases.

But who’s the Tin Woodman?

Original title and link: The Wonderful Wizard of Oz Through a Polyglot Persistence Glass (NoSQL database©myNoSQL)


Enterprise Caches Versus Data Grids Versus NoSQL Databases

Red Hat/JBoss’s Manik Surtani:

[…] If you want to compare distributed systems, both data grids and NoSQL have kind of come from different starting points, if you will. They solve different problems, but where they stand today they’ve kind of converged. Data grids have been primarily in-memory but now they spill off onto disk and so on and so forth and they’ve added in-query and mapreduce onto it while NoSQL have primarily been on disk, but now cache stuff in-memory anyway for performance. They are starting to look the same now, or are very similar.

One big difference though that I see between data grids and NoSQL, something that still exists today, is how you actually interact with these systems. Data grids tend to be in VM, they tend to be embedded, you tend to launch a Java or JVM program, you tend to connect to a data grid API and you work with it whereas NoSQL tends to be a little bit more client server, a bit more like old-fashion databases where you open a socket to your NoSQL database or your NoSQL grid, if you will, and start talking to it. That’s the biggest difference I see today, but even that will eventually go away.

They seem to converge, but:

  • spilling off to disk is not equivalent to optimized disk access
  • distributed, sometimes even transactional caches are not equivalent to single node caches

Original title and link: Enterprise Caches Versus Data Grids Versus NoSQL Databases (NoSQL database©myNoSQL)


NoSQL Databases Best Practices and Emerging Trends

Jans Aasman (CEO AllegroGraph) interviewed by Srini Penchikala:

InfoQ: What best practices and architecture patterns should the developers and architects consider when using a solution like this one in their software applications?

Jans: If your application requires simple straight joins and your schema hardly changes then any RDBMS will do.

If your application is mostly document based, where a document can be looked at as a pre-joined nested tree (think a Facebook page, think a nested JSON object) and where you don’t want to be limited by an RDB schema then key-value stores and document stores like MongoDB are a good alternative.

If you want what is described in the previous paragraph but you have to perform complex joins or apply graph algorithms then the MongoGraph approach might be a viable solution.
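The “pre-joined nested tree” idea from the answer above can be made concrete with a small TypeScript sketch. All type and function names here (`UserRow`, `PostRow`, `UserDocument`, `toDocument`) are hypothetical and not tied to MongoDB, MongoGraph, or any other product; the point is only the difference in shape between flat rows and a nested document.

```typescript
// Relational shape: two flat "tables" linked by a foreign key.
interface UserRow { id: number; name: string }
interface PostRow { userId: number; text: string }

// Document shape: the same data stored as one pre-joined nested tree.
interface UserDocument { id: number; name: string; posts: { text: string }[] }

// Performs the join once, producing the nested object that a document
// store would hold directly and return in a single read.
function toDocument(user: UserRow, posts: PostRow[]): UserDocument {
  return {
    id: user.id,
    name: user.name,
    posts: posts
      .filter((p) => p.userId === user.id)
      .map((p) => ({ text: p.text })),
  };
}
```

A relational database repeats the filtering step at query time as a join, while a document store keeps the result of `toDocument` as the stored unit. That is convenient until the application needs joins across documents, which is the case the MongoGraph remark addresses.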

Thinking about the products and projects I’ve been working on, most of them have had to deal with all these aspects in different areas of the applications and with different importance to the final solution. Mistakenly though, in most of the cases they ended up using a relational database only. With polyglot persistence here, this shouldn’t happen anymore. That’s not to say though that every project must use all of these technologies just because they are available. But it could use any of them or all combined.

InfoQ: What are the emerging trends in combining the NoSQL data stores?

Jans: From the perspective of a Semantic Web - Graph database vendor what we see is that nearly all graph databases now perform their text indexing with Lucene based indexing (Solr or Elastic Search) and I wouldn’t be surprised that most vendors soon will allow JSON objects as first class objects for graph databases. It was surprisingly straightforward to mix the JSON and triple/graph paradigm. We are also experimenting with key-value stores to see how that mixes with the triple/graph paradigm.

This topic was also discussed during my NoSQL Applications panel, but due to the panel’s time constraints we couldn’t reach a conclusion. It’s definitely an interesting perspective.

Original title and link: NoSQL Databases Best Practices and Emerging Trends (NoSQL database©myNoSQL)


Distributed Caches, NoSQL Databases, and RDBMS

Greg Luck[1], following up on his article Ehcache: Distributed Cache or NoSQL Store?, talks about the architectural differences between distributed caches, NoSQL databases, and RDBMS, and where distributed caches fit:

NoSQL and RDBMS are generally on disk. Disks are mechanical devices and exhibit large latencies due to seek time as the head moves to the right track and read or write times dependent on the RPM of the disk platter. NoSQL tends to optimise disk use, for example, by only appending to logs with the disk head in place and occasionally flushing to disk. By contrast, caches are principally in memory. […] With RDBMS a cache is added to avoid these scale out difficulties. For NoSQL, scale out is built-in, so the cache will get used when lower latencies are required.
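To make the role of the cache concrete, here is a minimal read-through cache sketched in TypeScript. It is not Ehcache’s API: the class name `ReadThroughCache`, the `loadFromStore` callback, and the `storeReads` counter are assumptions made for this illustration, and the slow disk-backed store is simulated by whatever function is passed in.

```typescript
// Hypothetical read-through cache sitting in front of a slower store.
class ReadThroughCache<V> {
  // Null-prototype object used as a plain in-memory key/value map.
  private memory: { [key: string]: V } = Object.create(null);

  // Counts how often a read had to fall through to the backing store.
  storeReads = 0;

  constructor(private loadFromStore: (key: string) => V | undefined) {}

  get(key: string): V | undefined {
    if (key in this.memory) {
      return this.memory[key]; // served from memory, no store access
    }
    this.storeReads += 1;
    const value = this.loadFromStore(key); // slow path: go to the store
    if (value !== undefined) {
      this.memory[key] = value; // remember it for subsequent reads
    }
    return value;
  }
}
```

Reading the same key twice hits the backing store only once; the second read is answered from memory, which is where the latency gain described above comes from. This sketch leaves out eviction, expiry, and distribution across nodes, which is most of what separates a real distributed cache from a map.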

  1. Greg Luck: Founder and CTO, Ehcache  

Original title and link: Distributed Caches, NoSQL Databases, and RDBMS (NoSQL database©myNoSQL)


How to Implement an IMAP Server on Top of a CouchDB/NoSQL Data Store?

Interesting question on SO:

To summarize my objective here, I am really just looking for a simple, opensource method which allows me to create and maintain a (preferably noSQL db) backup/archive of one/more remote IMAP email accounts on a per user basis and sync each individual users email accounts using a simple, low cost solution which easily scales out, consumes server resources in an efficient manner with the ADDED ABILITY that each user needs to be able to connect to his central email archive by simply adding a new imap account to his existing email client using an imap server, username and password provided through this archive server/setup.

This reminded me of a GSOC project to design and implement a distributed mailbox on top of Hadoop HDFS as part of the Apache James project. The project description can be found on this JIRA ticket and more details here:

We need to implement mailbox storage as a distributed system on top of Hadoop HDFS. The James mailbox API will be used. A first step is to design how to interact with Hadoop (native api, gora incubator at apache,…) and deal with specific performance questions related to mail loading/parsing in a distributed system (use map/reduce or not, use existing local lucene indexes for search,…). The second step is to implement the HDFS mailbox (maildir mailbox is similar because it stores mails as a file and can be an inspiration). A single James server will still be deployed because we don’t have any distributed UID generation.

According to the last comments on the ticket, this project was completed by Ioan Eugen Stan under Eric Charles’ mentorship.

Original title and link: How to Implement an IMAP Server on Top of a CouchDB/NoSQL Data Store? (NoSQL database©myNoSQL)


IBM DB2 to Include NoSQL Features

It didn’t take long for IBM to follow Oracle’s foray into the NoSQL space by announcing that IBM DB2 and Informix will include NoSQL features.

Mark Brunelli quoting Curt Cotner, IBM VP and CTO for database servers:

So, we actually took one of these NoSQL triplestores from the open source [community and] we modified it to sit on top of DB2 so that it can use DB2’s indexing, DB2’s logging, DB2’s solution for high availability [and] all the things you would expect.

Reports are not very clear yet, but it seems that DB2’s NoSQLish features are based on IBM’s Rational Jazz triplestore solution, an approach similar to Oracle’s NoSQL Database 11g, which is based on Oracle’s Berkeley DB Java Edition.

When speculating about Oracle’s future in the NoSQL market, I wrote that I expected Oracle to extend the support for NoSQLish interfaces to its core database products. And it looks like IBM is taking exactly this route:

Curt Cotner: “All of the DB2 and IBM Informix customers will have access to that and it will be part of your existing stack and you won’t have to pay extra for it. We’ll put that into our database products because we think that this is [something] that people want from their application programming experience, and it makes sense to put it natively inside of DB2.”

Looking back at these events (Oracle’s NoSQL database, Oracle Big Data appliance, IBM DB2 and Informix supporting NoSQL features) makes me wonder whether and how they are related to the new Enterprise NoSQL trend I’ve mentioned earlier.

Original title and link: IBM DB2 to Include NoSQL Features (NoSQL database©myNoSQL)

The Oracle NoSQL Database and Big Data Appliance

There’s been a lot of speculation about the announcements coming from Oracle’s OpenWorld event. A first part was revealed during the keynote in the form of an in-memory analytics appliance called Exalytics [2]. There’s also talk about a Big Data Appliance and an Oracle NoSQL database.

Here are my predictions[1]:

  1. Oracle has become very aggressive in selling products that bundle hardware, software, and services. So they’ll announce a Hadoop appliance integrated with an existing Oracle product. It could be either the Oracle Exadata or even the newly announced Exalytics.

    This appliance will place Oracle in competition with all other Hadoop appliance sellers: EMC, NetApp, IBM. Also these days most of the analytics databases try to integrate with Hadoop.

  2. Oracle already has a couple of non-relational solutions in its portfolio: Berkeley DB, TimesTen, Coherence. And they’ve already started to test the NoSQL market by announcing the MySQL and MySQL Cluster NoSQL hybrid systems.

    I don’t expect the Oracle NoSQL database to be a new product, just a rebranding or repackaging of one of the above-mentioned ones. Probably TimesTen.

  3. Oracle will invest more into integrating its line of products with Hadoop. Having both a Hadoop and an in-memory analytics appliance will make them very competitive in this space.

  4. Oracle will extend the support for NoSQLish interfaces (memcached) to its other database products.

What are your predictions?

  1. or speculations  

  2. I’m currently gathering more details about Exalytics.  

Original title and link: The Oracle NoSQL Database and Big Data Appliance (NoSQL database©myNoSQL)