nosql databases: All content tagged as nosql databases in NoSQL databases and polyglot persistence

NoSQL Everywhere? Not So Fast

So how can big companies get in on the action? Let’s contrast the nature of data suited for NoSQL with the properties of enterprise data that requires the single-source-of-truth systems that we talked about. We’ll use three V’s: volume, velocity, and variety.

Just in case you want to read an InformationWeek post with no start, no end, and no logic, but (ab)using all the necessary buzzwords.

Original title and link: NoSQL Everywhere? Not So Fast (NoSQL database©myNoSQL)

via: http://www.informationweek.com/news/software/info_management/232901328?printer_friendly=this-page


Cloud Computing Lets Us Rethink How We Use Data

But not everything we do in a database needs guaranteed transactional consistency.

Imagine you are charged with designing a system to collect data on temperature, air flow and electricity use in a building every few minutes from hundreds of locations. The system will be used to make the building more energy efficient. Now imagine you lose a few data points every day. The cause isn’t important but it could be a glitch with a sensor, a dropped packet, or an incomplete write operation in the database.

Do you care?

It depends on the angle from which I look at this question. If I’m the producer of the sensor, I do care if it has a glitch. If I’m a network administrator, I do care that there are dropped packets. And if I’m a database system, I do care if I’m dropping write operations. I also have to tell whoever is using me whether I’m able to receive operations: am I available when I’m needed?
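Look at the quoted scenario from the angle of the energy-efficiency analysis. A few lost readings barely move the aggregate that analysis relies on, as long as the system reports how complete the data is. The sketch below makes some assumptions that are not in the quote: one reading every five minutes (the quote only says "every few minutes"), synthetic temperature values, and a hypothetical helper named `daily_summary`.

```python
from statistics import mean

# Assumption: one reading every 5 minutes, i.e. 288 expected samples per sensor per day.
SAMPLES_PER_DAY = 24 * 60 // 5


def daily_summary(readings: dict[int, float]) -> tuple[float, float]:
    """Return (average reading, fraction of expected samples actually received).

    `readings` maps a sample slot (0..SAMPLES_PER_DAY - 1) to a value. Readings
    lost to a sensor glitch, a dropped packet or a failed write are simply absent.
    """
    if not readings:
        raise ValueError("no readings received for this sensor")
    return mean(readings.values()), len(readings) / SAMPLES_PER_DAY


# Synthetic temperatures (hypothetical data) for one sensor over one day.
full = {slot: 20.0 + (slot % 10) * 0.1 for slot in range(SAMPLES_PER_DAY)}

# Simulate losing a handful of data points during the day.
lost_slots = {3, 50, 101, 177, 250}
lossy = {slot: value for slot, value in full.items() if slot not in lost_slots}

full_avg, full_coverage = daily_summary(full)
lossy_avg, lossy_coverage = daily_summary(lossy)
print(f"complete: avg={full_avg:.3f} coverage={full_coverage:.1%}")
print(f"lossy:    avg={lossy_avg:.3f} coverage={lossy_coverage:.1%}")
```

With this data, the two daily averages differ by less than a hundredth of a degree. The coverage figure is what tells the consumer how much data actually arrived.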

Original title and link: Cloud Computing Lets Us Rethink How We Use Data (NoSQL database©myNoSQL)

via: http://www.tomsitpro.com/articles/mapreduce-hadoop-cloud_computing-acid-relational_database,1-165.html


My Humble Request to the NoSQL Techies

C. Mohan in his 4th post about the NoSQL space:

So, here is my humble request to the NoSQL techies: For each of your systems, please send me or point me to detailed technical information on each of the important aspects of your system. This should be documentation in the form of papers or presentations, and not pointers to source code comments and such! If some significant aspects of a system aren’t documented reasonably, I am urging the appropriate people to produce such documentation. Of course, for legal reasons, you should NOT send me any confidential or proprietary information.

Here is my offer in return for the above: Once I get hold of such documentation, I am willing to maintain a page for each significant NoSQL system where I will consolidate all the information on that system. Once I get hold of all that information, I will be able to do the comparisons between systems and make suggestions for improvements, etc. for each of the systems. I am planning a tutorial on NoSQL systems and it would be in the best interest of the techies of the different systems to get their systems featured in such a tutorial by providing accurate and complete information on their systems.

In the more than two and a half years I’ve been writing this NoSQL blog, I’ve seen numerous similar attempts. So far, the closest to what one would call success are Stefan Edlich’s nosql-databases.org, a very broad but unstructured catalogue of NoSQL databases, and this blog, which continuously covers various aspects of NoSQL databases. My attempt to create a 5-dimensional characterization of NoSQL databases remains incomplete a year and a half after its debut. But I really hope Mohan will pull this off, as everyone would benefit from having better information organized in an accessible public format.

That aside, I think his post raises a couple of interesting points that I’d like to comment on:

  1. Most NoSQL databases did not originate in research labs or the academic world, but out in the field. Most of them were created by people who ran into problems, and their attempts to solve those problems led them to try different approaches.
  2. Most NoSQL databases are either open source and community driven or backed by small startups. Some of these startups do benefit from funding, but oftentimes it represents a fraction of what other trendy sectors are getting. As an example, Cloudera has raised $76 million in its 3 1/2 years of existence. Compare that with Color’s $40 million.
  3. Most of these systems are created with, and follow, a roadmap rooted in pragmatism and practicality. They are need-based systems. If you’ve worked on an open source project or in a startup, you know exactly what I mean. Features are prioritized and implemented based on the current interests of the main stakeholders, which basically means the product’s current users.

That being said, one should note that:

  1. Most open source NoSQL databases have excellent documentation (at least by open source standards). Just take a look at the Apache HBase Reference Guide or Redis’s documentation.
  2. There are many books covering NoSQL databases. I don’t have all of the NoSQL books, nor have I read cover to cover all those I do have, but many of them discuss these solutions in depth¹.
  3. If you’ve been following this blog, you’ll have noticed that developers involved with NoSQL databases spend a lot of their time documenting them in great detail.

    Let me give you just a couple of examples: Lars George’s infrequent but deeply technical posts (HBase and Data Locality, Hadoop and HBase: Configuring the Number of Server Side Threads (Xceivers), HBase and Bloom Filters) or Salvatore Sanfilippo’s posts about Redis (Redis Persistence Demystified, Redis Cluster Explained, Redis Guide: What Each Redis Data Type Should Be Used For, Redis diskstore and B-trees).

    Granted, these are not academic papers, but they definitely provide an in-depth perspective on the nuts and bolts of NoSQL databases. And such material comes not only from the people developing NoSQL databases, but also from those running them in production.

    To date, I’ve published almost 3000 posts on this blog. Besides my own contributions, a large number of them link to articles diving into the details of the various NoSQL solutions.

  4. Even though most of the developers working on NoSQL solutions are busy implementing them and running them in production, they sometimes find the time to publish academic papers and participate in related events.

    I wish I could list them all, but I don’t think I’ve captured even a small fraction of what these guys have published: LinkedIn NoSQL Paper: Serving Large-Scale Batch Computed Data With Project Voldemort, Paper: Apache Hadoop Goes Realtime at Facebook, Riak Bitcask Explained.

  5. Many companies backing NoSQL solutions spend a tremendous amount of time and effort continuously improving the documentation they offer. Take a look at DataStax’s documentation for Cassandra, Basho’s documentation for Riak, 10gen’s MongoDB documentation, and I could go on for a while.

  6. Last, but not least, check these companies’ job boards: almost every one of them is looking for technical writers and evangelists. Obviously that’s because they want to bring more clarity to their products and make things easier for their users.

Bottom line, I think the NoSQL space is doing quite well at documenting its technical decisions, trade-offs, and recommended use cases. I’d actually say that most of the time it’s easier for me to get details about almost any NoSQL database than to figure out some details of a traditional database vendor’s solution. Try to learn how IBM DB2 implements compression, or how Teradata does hybrid row and column storage. But maybe that’s just because I’ve spent so much time in this space.

Anyway, I applaud C. Mohan’s initiative and hope it will be successful. And because I always intend to help the NoSQL community, I’m ready to offer him both my help and support.


  1. Sometimes I wish I’d get a copy of every NoSQL book published. 

Original title and link: My Humble Request to the NoSQL Techies (NoSQL database©myNoSQL)


The Hidden Cost of Scaling With NoSQL... Spreading FUD

All false:

While NoSQL databases are not all alike, there are certain tradeoffs common to them all:

  • Data integrity
  • Flexible indexing
  • Interactive updating of data
  • Concurrency guarantees

If I were at DatabaseJournal, I’d click unpublish as fast as possible.

Original title and link: The Hidden Cost of Scaling With NoSQL… Spreading FUD (NoSQL database©myNoSQL)

via: http://www.databasejournal.com/sqletc/hidden-cost-of-scaling-with-nosql.html


The NoSQL Hoopla … What Is NonsenSQL About It?

Dr. C. Mohan’s first post about NoSQL databases:

Having worked in the database field for more than 3 decades with a fair amount of impact on the research and commercial sides of this field (see bit.ly/cmohan), it pains me to see the casual way in which some designs have been done and some supposedly new ideas get proposed/implemented. Not enough efforts are being made to relate these proposals to what has been done in the past and benefit from the lessons learnt in the context of RDBMSs. Not everything needs to be done differently just because it is supposedly a very different world now! 

There are evolutionary and revolutionary products. And sometimes changing perspective and starting from scratch is needed to validate or invalidate new or long-held hypotheses. In the world of polyglot persistence there’s room for every solution that solves real problems. However perfect one product might be, it will not be able to address all needs. The data storage space is not a zero-sum game. Winners don’t take it all.

Original title and link: The NoSQL Hoopla … What Is NonsenSQL About It? (NoSQL database©myNoSQL)

via: http://cmohan.tumblr.com/post/20141910210/the-nosql-hoopla-what-is-nonsensql-about-it


NoSQL Databases Adoption in Numbers

The data source is downloads of Jaspersoft’s NoSQL connectors. RedMonk published a graphic and an analysis, and Klint Finley followed up with job trends:

NoSQL databases adoption

A couple of things I don’t see mentioned in the RedMonk post:

  1. whether and how the data has been normalized for each connector’s availability

    According to the post, data was collected between January 2011 and March 2012, and I don’t think all the connectors were available from the beginning of that period.

  2. whether and how marketing pushes for each connector have been factored in

    Announcing the Hadoop connector at an event with 2000 attendees, or the MongoDB connector at an event with 800 attendees, could definitely influence the results (nb: keep in mind that the largest download count is under 7000, so 200-500 downloads triggered by such an event have a significant impact)

  3. Redis and VoltDB are mostly OLTP-only databases, so download counts for a BI reporting connector say little about their adoption
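The first two points come down to simple arithmetic. The sketch below uses hypothetical connector names and download counts, not the published figures. It shows that dividing downloads by months of availability can flip a ranking. It also shows that the same event-driven bump weighs far more on a small total than on the largest one. Only the 15-month window (January 2011 to March 2012) comes from the post.

```python
from dataclasses import dataclass

# Measurement window quoted in the post: January 2011 to March 2012, i.e. 15 months.
WINDOW_MONTHS = 15


@dataclass
class ConnectorStats:
    name: str
    downloads: int
    months_available: int  # months of the window during which the connector existed

    @property
    def downloads_per_month(self) -> float:
        return self.downloads / self.months_available


def event_share(event_downloads: int, total_downloads: int) -> float:
    """Fraction of a connector's total downloads attributable to one launch event."""
    return event_downloads / total_downloads


# Hypothetical numbers for illustration only; these are NOT the Jaspersoft figures.
stats = [
    ConnectorStats("connector_a", downloads=6000, months_available=WINDOW_MONTHS),
    ConnectorStats("connector_b", downloads=3000, months_available=5),
]

by_raw = [s.name for s in sorted(stats, key=lambda s: s.downloads, reverse=True)]
by_rate = [s.name for s in sorted(stats, key=lambda s: s.downloads_per_month, reverse=True)]
print("raw downloads:  ", by_raw)
print("downloads/month:", by_rate)

# A 500-download bump is about 7% of a 7000-download total but half of a 1000-download total.
print(f"{event_share(500, 7000):.1%} vs {event_share(500, 1000):.1%}")
```

In the raw counts, connector_a leads. Per month of availability, connector_b does, at 600 versus 400.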

Original title and link: NoSQL Databases Adoption in Numbers (NoSQL database©myNoSQL)


Data Encryption for Hadoop and NoSQL Databases From Gazzang

The Gazzang Encryption Platform for Big Data works as a last line of defense for protecting data within Hadoop, Cassandra and MongoDB, non-relational, distributed and horizontally scalable data stores that have become common management tools for big data initiatives.

Sounds good so far. But then:

Gazzang today launched a cloud-based encryption […] The Encryption Platform transparently encrypts and secures data “on the fly,” whether in the cloud or on premises, ensuring there is minimal performance lag in the encryption or decryption process.

Does anyone have any idea how a cloud-based solution could encrypt/decrypt on-premises data on the fly? I don’t.

Original title and link: Data Encryption for Hadoop and NoSQL Databases From Gazzang (NoSQL database©myNoSQL)

via: http://blog.gazzang.com/company-0/news--amp-press/Gazzang-Launches-Big-Data-Encryption-and-Key-Management-Platform/


A 771 Words Description of Map Reduce

Can a skyscraper completed in 1931 be used to explain a parallel processing algorithm introduced in 2004? In this post, I use the analogy of counting smartphones in the Empire State Building to explain MapReduce…without using code.

Andrew Brust’s metaphor is nice, but I wonder whether, these days, there’s a single person working anywhere near data who still needs a 771-word description of how MapReduce works.
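For readers who do want the mechanics, here is a toy, single-process sketch of the map, shuffle and reduce phases. It is applied to a smartphone count in the spirit of Brust's analogy. The per-floor data and function names are hypothetical. A real framework such as Hadoop would run many mappers and reducers on different machines, so this only shows how the data flows.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input: one record per floor, listing the smartphone brands seen there.
floors = {
    1: ["iPhone", "Android", "iPhone"],
    2: ["BlackBerry", "Android"],
    3: ["iPhone", "Android", "Android"],
}


def map_phase(floor: int, phones: list[str]) -> Iterator[tuple[str, int]]:
    # Each mapper handles one floor independently and emits (brand, 1) pairs.
    # The input key (the floor number) is not needed for this particular count.
    for brand in phones:
        yield brand, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value by its key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(brand: str, counts: list[int]) -> tuple[str, int]:
    # Each reducer sees all the values for one key and combines them.
    return brand, sum(counts)


def map_reduce(data: dict[int, list[str]]) -> dict[str, int]:
    pairs = (pair for floor, phones in data.items() for pair in map_phase(floor, phones))
    return dict(reduce_phase(brand, counts) for brand, counts in shuffle(pairs).items())


print(map_reduce(floors))
```

Because each `map_phase` call depends only on its own floor, and each `reduce_phase` call only on one brand's values, both phases can be parallelized. That independence is the property the building analogy is meant to convey.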

Original title and link: A 771 Words Description of Map Reduce (NoSQL database©myNoSQL)

via: http://www.zdnet.com/blog/big-data/the-mapreduce-101-story-in-102-stories/190


6 Reasons Why We Need NoSQL

  1. We’re dealing with much more data.
  2. We require sub-second responses to queries.
  3. We want applications to be up 24/7.
  4. We’re seeing many applications in which the database has to soak up data as fast as (or even much faster than) it processes queries.
  5. We’re frequently dealing with changing data or with unstructured data
  6. We’re willing to sacrifice our sacred cows.

Not bad. But it reads more like the definition of Big Data.

Original title and link: 6 Reasons Why We Need NoSQL (NoSQL database©myNoSQL)

via: http://bigdatadiary.com/6-reasons-why-we-need-nosql/


The Generalization of "NoSQL"

Based on this information (nb: the post is a short version of “Not all NoSQL databases are the same”) I think the term “NoSQL” is doing all of the non-relational database options a disservice. The term “NoSQL” does help to argue with management that maybe a relational database is not the best option, but that’s about where its usefulness ends.

I haven’t kept count of how many times I’ve heard this argument and its variant, “NoSQL is a (very) bad term”. Those making it seem to forget that the NoSQL moniker helped non-relational databases. United under it, they coped more easily with attacks from detractors and got the attention they deserved. Maybe it is too broad a term, or even a meaningless one, but it served well in raising awareness of polyglot persistence.

Original title and link: The Generalization of “NoSQL” (NoSQL database©myNoSQL)

via: http://kellabyte.com/2012/02/12/the-generalization-of-nosql/


Examples of Using MySQL in Interesting Ways

Maggie Nelson:

Here are a couple of examples of using MySQL in interesting (and it’s up to you whether unwise) ways:

  • MySQL as a graph database, like Twitter’s FlockDB.
  • MySQL as document store, like FriendFeed’s extremely custom schema design.
  • MySQL as a key/value store. This lets you play with NoSQL concepts using MySQL.

Such examples abound. In fact, most of the companies known for contributing to or using NoSQL databases run some sort of interesting relational database deployment. Most of the time these examples are interpreted as clear proof that relational databases can solve any problem. Reality is different though. The engineers knew relational databases well after many years of using them. That familiarity let them work around the limitations ingeniously when better alternatives were lacking. But with NoSQL databases getting more mature every day, fewer and fewer problems require acrobatic uses of relational databases.
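The key/value and document-store bullets boil down to a few table shapes. The sketch below is a minimal version of them, with assumptions stated up front. It uses an in-memory SQLite database as a stand-in for MySQL, so it runs without a server. All table and function names are hypothetical. The document part only loosely follows the FriendFeed idea: whole documents are stored as opaque serialized blobs, and secondary indexes live in separate tables maintained by application code. It is not their actual schema.

```python
import json
import sqlite3
from typing import Optional

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
# All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB NOT NULL);
    CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE index_user_id (
        user_id   TEXT NOT NULL,
        entity_id TEXT NOT NULL,
        PRIMARY KEY (user_id, entity_id)
    );
    """
)


def kv_put(key: str, value: bytes) -> None:
    # Upsert by primary key (MySQL would typically use INSERT ... ON DUPLICATE KEY UPDATE).
    with conn:
        conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def kv_get(key: str) -> Optional[bytes]:
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


def put_entity(entity_id: str, doc: dict) -> None:
    # The document is an opaque JSON blob, so adding new fields needs no ALTER TABLE.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(doc)),
        )
        # The secondary index lives in its own table and is maintained by application
        # code; stale index rows left behind by updates are not cleaned up here.
        if "user_id" in doc:
            conn.execute(
                "INSERT OR IGNORE INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (doc["user_id"], entity_id),
            )


def find_by_user(user_id: str) -> list[dict]:
    rows = conn.execute(
        "SELECT e.body FROM index_user_id AS i JOIN entities AS e ON e.id = i.entity_id "
        "WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]


kv_put("session:alice", b"opaque session state")
put_entity("post-1", {"user_id": "alice", "title": "hello"})
put_entity("post-2", {"user_id": "alice", "title": "world", "tags": ["nosql"]})
print(kv_get("session:alice"), find_by_user("alice"))
```

The trade-off is visible in the code. The database no longer knows the document structure, so schema changes are cheap. In exchange, index consistency becomes the application's job.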

Original title and link: Examples of Using MySQL in Interesting Ways (NoSQL database©myNoSQL)

via: http://phpadvent.org/2011/out-with-the-old-by-maggie-nelson