


nosql future: All content tagged as nosql future in NoSQL databases and polyglot persistence

Is UnQL Dead?

UNQL started with quite some hype last year. However, after some burst of activity the project came to a hold. So it seems, that – at least as a project – UNQL has been a failure. IMHO one of the major issues with the current UNQL is, that it tries to cover everything in NoSQL, from key-value stores to document-stores to graph-database. Basically you end up with greatest common divisor – namely key-value access.

I’ve never posted about UnQL because I really couldn’t see its future.

Original title and link: Is UnQL Dead? (NoSQL database©myNoSQL)


The Three Pillars of Data-Based Computing: SQL, Hadoop And

IBM’s Arvind Krishna in an interview for The Register:

Krishna said he sees the potential for three pillars of data-based computing: SQL – to give a language and syntax for programming; Hadoop – to provide a MapReduce semantic; and a third pillar which is yet to be decided upon. That could be a MongoDB or HBase, but the market will pick a winner. “There’s a whole set: one will survive,” Krishna said.

I’m pretty sure that last part (i.e. “that could be a MongoDB or HBase”) is a misquote, as the rest of what Krishna says makes a lot of sense:

“Wherever open source is mature I will leverage it; I won’t compete with it. To believe one can be monolithic, proprietary and closed and … succeed is a foolish proposition. One has to embrace open source and work with an ecosystem. Clients are looking to you to add value.”



My Humble Request to the NoSQL Techies

C. Mohan in his 4th post about the NoSQL space:

So, here is my humble request to the NoSQL techies: For each of your systems, please send me or point me to detailed technical information on each of the important aspects of your system. This should be documentation in the form of papers or presentations, and not pointers to source code comments and such! If some significant aspects of a system aren’t documented reasonably, I am urging the appropriate people to produce such documentation. Of course, for legal reasons, you should NOT send me any confidential or proprietary information.

Here is my offer in return for the above: Once I get hold of such documentation, I am willing to maintain a page for each significant NoSQL system where I will consolidate all the information on that system. Once I get hold of all that information, I will be able to do the comparisons between systems and make suggestions for improvements, etc. for each of the systems. I am planning a tutorial on NoSQL systems and it would be in the best interest of the techies of the different systems to get their systems featured in such a tutorial by providing accurate and complete information on their systems.

In the more than two and a half years I’ve been writing this NoSQL blog, I’ve seen numerous similar attempts. So far the closest to what one would call success are Stefan Edlich’s unstructured but very wide attempt to catalogue NoSQL databases and this blog, which continuously covers various aspects of NoSQL databases. My attempt to create a 5-dimensional characterization of NoSQL databases remains incomplete a year and a half after its debut. But I really hope Mohan will pull this off, as everyone would benefit from having better information organized in an accessible public format.

That aside, I think his post raises a few points that I’d like to comment on:

  1. Most NoSQL databases originated not in research labs or academia, but out in the field. Most of them were created by people who ran into problems and, in attempting to solve them, ended up trying different approaches.
  2. Most of the NoSQL databases are either open source and community driven or backed by small startups. Some of these startups do benefit from funding, but it is often a fraction of what other trendy sectors are getting. As an example, Cloudera has raised $76mil in its 3 1/2 years of existence. Compare that with Color’s $40mil.
  3. Most of these systems are created with, and follow, a roadmap rooted in pragmatism and practicality. They are need-based systems. If you’ve worked on an open source project or in a startup you know exactly what I mean. Features are prioritized and implemented based on the current interests of the main stakeholders, which basically means the product’s current users.

That being said, one should note that:

  1. Most of the open source NoSQL databases have excellent documentation (at least by the standards of open source projects). Just take a look at the Apache HBase Reference Guide or Redis’s documentation.
  2. There are many books covering NoSQL databases. While I don’t have all of the NoSQL books (nor have I read cover to cover all those that I have), many of them discuss these solutions in great detail [1].
  3. If you’ve been following this blog, you’ll have noticed that developers involved with NoSQL databases spend a lot of their time documenting them in great detail.

    Let me give you just a couple of examples: Lars George’s rare but heavily technical posts (HBase and Data Locality, Hadoop and HBase: Configuring the Number of Server Side Threads (Xceivers), HBase and Bloom Filters) or Salvatore Sanfilippo’s posts about Redis (Redis Persistence Demystified, Redis Cluster Explained, Redis Guide: What Each Redis Data Type Should Be Used For, Redis diskstore and B-trees).

    Admittedly, these are not academic papers, but they definitely provide an in-depth look at the nuts and bolts of NoSQL databases. And such materials come not only from the people developing NoSQL databases, but also from those running them in production.

    To date, I’ve published almost 3000 posts on this blog and, besides my own contributions, a large number of these posts link to articles diving into the details of the various kinds of NoSQL solutions.

  4. Even though most of the developers working on NoSQL solutions are busy implementing and running them in production, they sometimes find the time to publish academic papers and participate in related events.

    I wish I had, but I don’t think I’ve captured even a small fraction of what these people have published: LinkedIn NoSQL Paper: Serving Large-Scale Batch Computed Data With Project Voldemort, Paper: Apache Hadoop Goes Realtime at Facebook, Riak Bitcask Explained.

  5. Many companies backing NoSQL solutions spend a tremendous amount of time and effort to continuously improve the documentation available. Take a look at DataStax’s documentation for Cassandra, Basho’s documentation for Riak, 10gen’s MongoDB documentation, and I could go on and on for a while.

  6. Last, but not least, check the job boards of these companies: almost every one of them is looking for technical writers and evangelists. Obviously that’s because they want to bring more clarity to their products and make things easier for their users.

Bottom line, I think that the NoSQL space is doing quite well in documenting its technical decisions, trade-offs, and recommended use cases. I’d actually say that most of the time it’s easier for me to get details about almost any NoSQL database than to figure out some details of a traditional database vendor’s solution: try to learn how IBM DB2 implements compression, or how Teradata does hybrid row and column storage. But maybe all this is because I’ve spent so much time in this space.

Anyway, I applaud C. Mohan’s initiative and hope it will be successful. And because it is always my intention to help the NoSQL community, I’m ready to offer him both my help and support.

  [1] Sometimes I wish I’d get a copy of every NoSQL book published.


5 Business Analytics Tech Trends

In interviews, CIOs consistently identified five IT trends that are having an impact on how they deliver analytics: the rise of Big Data, technologies for faster processing, declining costs for IT commodities, proliferating mobile devices and social media.

Same old, same old.



The HBase Roadmap: Where Do We Want HBase to Be in Two Years?

The HBase project management committee:

After further banter, we arrived at list: reliability, operability (insight into the running application, dynamic config. changes, usability improvements that make it easier on a clueful ops), and performance (in this order). It was offered that we are not too bad on performance — especially in 0.94 — and that use cases will drive the performance improvements so focus should be on the first two items in the list. […] To improve reliability, testing has to be better. This has been said repeatedly in the past.

EMC has announced a 1000+ node cluster for Apache Hadoop testing, so maybe a similar initiative is needed for HBase too. Considering how many large organizations are using HBase, it shouldn’t be difficult to get these resources, as long as someone assumes ownership and leadership of the effort.



The NoSQL Hoopla … What Is NonsenSQL About It?

Dr. C. Mohan’s first post about NoSQL databases:

Having worked in the database field for more than 3 decades with a fair amount of impact on the research and commercial sides of this field (see, it pains me to see the casual way in which some designs have been done and some supposedly new ideas get proposed/implemented. Not enough efforts are being made to relate these proposals to what has been done in the past and benefit from the lessons learnt in the context of RDBMSs. Not everything needs to be done differently just because it is supposedly a very different world now! 

There are evolutionary and revolutionary products. And sometimes changing the perspective and starting from scratch is needed to validate or invalidate hypotheses, new or old. In the world of polyglot persistence there’s space for every solution that solves real problems. However perfect a product may be, it will not be able to address every need. The data storage space is not a zero-sum game. Winners don’t take it all.



No Unified Stack Soon for Big Data... Is That a Surprise?

The panelists agreed that a standardized stack of big data analysis software would make it easier to develop large scale data analysis systems, in much the same way the open source LAMP stack engendered a whole generation of Web 2.0 services over the past decade. But the ways software such as Hadoop can be used vary so much that it may be difficult to settle on one core package of technologies, the panelists said.

We’ll never be able to stop people dreaming of cheap, off-the-shelf software that solves all the world’s problems, including hunger, political debates, football bets, etc. Myself? I still dream of a teleportation kit.

What the panelists seem not to acknowledge (except Mark Baker, Canonical Ubuntu) is that LAMP served the most common problems well. But everything that went beyond that needed heavily customized solutions. Yahoo? Check. Google? Check. Facebook? Check check check. And I could go on and on.

Big Data is not the most common problem.



Big Data: The Only Business Model Left

Shomit Ghose in a guest post on Forbes:

So if maximum entropy has made hardware, software and networks as relevant and commoditized as steel or cement, what’s a budding entrepreneur to do? The answer is to focus on ventures in one of two areas: either in the monetization of data, or in providing the infrastructure to enable the monetization of data. Period.

These are exciting times for tech and data people. But I don’t think data is the only area that will show innovation and growth in the future (think education, healthcare, energy, etc.).



The Generalization of "NoSQL"

Based on this information (nb: the post is a short version of not all NoSQL databases are the same) I think the term “NoSQL” is doing all of the non-relational database options a disservice. The term “NoSQL” does help to argue with management that maybe a relational database is not the best option but that’s about where it’s usefulness ends.

I haven’t kept count of how many times I’ve heard this argument and its alternative, “NoSQL is a (very) bad term”. What both seem to forget is that, united under the NoSQL moniker, the non-relational databases coped more easily with all the attacks from detractors and got the attention they deserved. Maybe it is too wide a term or even a meaningless one, but it served well in bringing awareness to polyglot persistence.



The time for NoSQL is now

Andrew C. Oliver:

The transition to NoSQL databases will take time. We still don’t have TOAD, Crystal Reports, query language standardization and other essential tools needed for mass adoption. There will be missteps (i.e. I may need a different type of database for reporting than for my operational system), but I truly think this is one technology that isn’t just marketing.

This comes from someone who was happy to discover, back in 1998, all the knobs in Oracle.



Doug Cutting About Hadoop's Adoption

Doug Cutting expressing his surprise at Hadoop’s growth in an interview with Audrey Watters on O’Reilly Radar:

Yes. I didn’t expect Hadoop to become such a central component of data processing. I recognized that Google’s techniques would be useful to other search engines and that open source was the best way to spread these techniques. But I did not realize how many other folks had big data problems nor how many of these Hadoop applied to.

Hadoop is not Doug Cutting’s first widely successful open source project, so I’m tempted to think this is just pure modesty.
