nosql theory: All content tagged as nosql theory in NoSQL databases and polyglot persistence
Thursday, 5 April 2012
Cloud Computing Lets Us Rethink How We Use Data
But not everything we do in a database needs guaranteed transactional consistency.
Imagine you are charged with designing a system to collect data on temperature, air flow and electricity use in a building every few minutes from hundreds of locations. The system will be used to make the building more energy efficient. Now imagine you lose a few data points every day. The cause isn’t important but it could be a glitch with a sensor, a dropped packet, or an incomplete write operation in the database.
Do you care?
It depends on the angle from which I’m looking at this question. If I’m the producer of the sensor, I do care if it has a glitch. If I’m a network administrator, I do care that packets are being dropped. And if I am a database system, I do care if I’m dropping write operations. I also have to tell whoever is using me whether I am able to receive operations: am I available when I’m needed?
Original title and link: Cloud Computing Lets Us Rethink How We Use Data (©myNoSQL)
Friday, 30 March 2012
Design Your Database Schema
Three patterns for making a relational database behave like a document database. Useful back when there were no document databases around.
If we were to use a relational database we might end up with a single table with an ungodly number of columns so that each event has all its specific columns available. But we will never use all columns for one event of course. Maybe try to re-use columns, and call them names like column1, column2 etc. Hmm… sounds like fun to maintain and develop against.
The other pattern would be to start creating a normalized schema with multiple tables, probably one per game, one per event type, etc. So then we end up with a complex schema that needs to be maintained and versioned. Inserts and selects will be spread across tables, and for sure we will need to change the schema when new games or events are introduced.
There is also a third pattern out there, which is to store a binary blob in the database… let’s not even talk about that one.
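The three patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and the sample "level_up" event are all hypothetical, since the quoted post does not give a schema.

```python
import json
import sqlite3


def build_demo():
    """Create an in-memory database holding one event stored three ways."""
    conn = sqlite3.connect(":memory:")

    # Pattern 1: one wide table. Columns that belong to other event
    # types (item, price) simply stay NULL for this event.
    conn.execute(
        "CREATE TABLE events_wide ("
        "id INTEGER PRIMARY KEY, game TEXT, event_type TEXT, "
        "level INTEGER, score INTEGER, item TEXT, price REAL)"
    )
    conn.execute(
        "INSERT INTO events_wide (game, event_type, level, score) "
        "VALUES (?, ?, ?, ?)",
        ("space_race", "level_up", 3, 1200),
    )

    # Pattern 2: normalized schema. Every new event type needs a new table.
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, game TEXT, event_type TEXT)"
    )
    conn.execute(
        "CREATE TABLE level_up_events ("
        "event_id INTEGER REFERENCES events(id), level INTEGER, score INTEGER)"
    )
    cur = conn.execute(
        "INSERT INTO events (game, event_type) VALUES (?, ?)",
        ("space_race", "level_up"),
    )
    conn.execute(
        "INSERT INTO level_up_events VALUES (?, ?, ?)", (cur.lastrowid, 3, 1200)
    )

    # Pattern 3: an opaque blob. The database cannot see inside it.
    conn.execute("CREATE TABLE events_blob (id INTEGER PRIMARY KEY, body BLOB)")
    payload = json.dumps(
        {"game": "space_race", "event_type": "level_up", "level": 3, "score": 1200}
    ).encode("utf-8")
    conn.execute("INSERT INTO events_blob (body) VALUES (?)", (payload,))
    return conn


def wide_row(conn):
    return conn.execute(
        "SELECT level, score, item, price FROM events_wide"
    ).fetchone()


def normalized_row(conn):
    # Reading one event back already requires a join.
    return conn.execute(
        "SELECT e.game, l.level, l.score FROM events e "
        "JOIN level_up_events l ON l.event_id = e.id"
    ).fetchone()


def blob_row(conn):
    # The application has to decode the blob itself.
    (body,) = conn.execute("SELECT body FROM events_blob").fetchone()
    return json.loads(body.decode("utf-8"))
```

Even in this small sketch the trade-offs show up: the wide table carries unused NULL columns, the normalized version needs a join per read, and the blob can only be interpreted by application code.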
Original title and link: Design Your Database Schema (©myNoSQL)
Wednesday, 21 March 2012
IBM: Behind the Buzz About NoSQL
Mature database management systems like DB2 also offer advantages like high availability and data compression that the newer NoSQL systems have not had time to develop.
Misinform your customers to save them the trouble of discovering alternative solutions.
Original title and link: IBM: Behind the Buzz About NoSQL (©myNoSQL)
via: http://ibmdatamag.com/2012/03/behind-the-buzz-about-nosql/
Monday, 19 March 2012
Visualizing System Latency
Jack Clark’s interview with Adrian Cockcroft on ZDNet offers many practical lessons (luckily, I’ve had the chance to see some of Cockcroft’s presentations about the Netflix architecture and to talk to him directly), but one thing that stuck with me was the closing paragraph:
The thing I’ve been publicly asking for has been better IO in the cloud. Obviously I want SSDs in there. We’ve been asking cloud vendors to do that for a while. With Cassandra, we’ve had to go onto horizontal scale and use the internal disks and triple replicate across availability zones, so you end up with a triple-redundant data store that is careful not to overload the disks.
That reminded me of this old ACM article authored by Brendan Gregg:
When I/O latency is presented as a visual heat map, some intriguing and beautiful patterns can emerge. These patterns provide insight into how a system is actually performing and what kinds of latency end-user applications experience. Many characteristics seen in these patterns are still not understood, but so far their analysis is revealing systemic behaviors that were previously unknown.
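To make the heat map idea concrete, here is a minimal sketch of the bucketing behind such a picture: each I/O sample lands in a cell defined by a time interval and a latency range, and the cell's count decides how dark it is drawn. The function names, the sample format of (timestamp, latency) pairs, and the ASCII shading are assumptions made for illustration, not the tooling described in the article.

```python
from collections import Counter


def latency_heatmap(samples, time_step, latency_step):
    """Count samples per (time bucket, latency bucket) cell.

    samples: iterable of (timestamp, latency) pairs in any consistent units.
    """
    cells = Counter()
    for timestamp, latency in samples:
        column = int(timestamp // time_step)
        row = int(latency // latency_step)
        cells[(column, row)] += 1
    return cells


def render(cells, shades=" .:*#"):
    """Draw the grid as text: time runs left to right, latency bottom to top."""
    if not cells:
        return ""
    max_column = max(column for column, _ in cells)
    max_row = max(row for _, row in cells)
    peak = max(cells.values())
    lines = []
    for row in range(max_row, -1, -1):  # highest latency bucket on top
        line = ""
        for column in range(max_column + 1):
            count = cells.get((column, row), 0)
            if count == 0:
                index = 0
            else:
                # Scale non-empty cells onto shades[1:], darkest for the peak.
                index = 1 + (count - 1) * (len(shades) - 2) // max(peak - 1, 1)
            line += shades[index]
        lines.append(line)
    return "\n".join(lines)
```

Unlike an average or a percentile line, the grid keeps the whole latency distribution for every time slice, which is what lets patterns such as bands and outliers show up.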

I was wondering whether, in the NoSQL database space (and the data storage space in general), any monitoring tools provide such advanced visualization of latency data. Do you know of any?
Original title and link: Visualizing System Latency (©myNoSQL)
Thursday, 15 March 2012
Hadoop Terms and Components Index Card
A quick description of the most important terms and components of Hadoop—HDFS, NameNode, DataNode, MapReduce, JobTracker, TaskTracker—and its high level design principles:
- The system must properly distribute data across a system evenly and safely.
- The system must support partial failure of a node in the system. This means, if a node goes down, the operations within the cluster continue without change in the final outcome.
- If there is a failure, the system should be able to recover the data through the existence of backup (later referred to as replicated blocks).
- When a node is brought back online, it should be able to rejoin the system immediately
- The system shall maintain linear scalability, meaning addition of resources will increase performance linearly, just as removal of resources would decrease performance linearly.
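The replication and recovery principles in the list above can be illustrated with a toy model. This is a deliberately simplified sketch, not HDFS's actual placement policy: the round-robin placement, the least-loaded repair rule, and all function names are assumptions chosen to keep the example short.

```python
from collections import Counter


def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(replication)]
        for i, block in enumerate(blocks)
    }


def surviving_replicas(placement, dead_node):
    """Remove a failed node from every block's holder list."""
    return {
        block: [node for node in holders if node != dead_node]
        for block, holders in placement.items()
    }


def re_replicate(placement, live_nodes, replication=3):
    """Copy under-replicated blocks to the least-loaded live nodes."""
    load = Counter(node for holders in placement.values() for node in holders)
    repaired = {}
    for block, holders in placement.items():
        holders = list(holders)
        while len(holders) < replication:
            candidates = [node for node in live_nodes if node not in holders]
            if not candidates:
                break  # not enough live nodes to restore full replication
            target = min(candidates, key=lambda node: load[node])
            holders.append(target)
            load[target] += 1
        repaired[block] = holders
    return repaired
```

With four nodes and a replication factor of three, losing any single node leaves at least two copies of every block, so reads continue and the missing copies can be rebuilt from the survivors.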
Original title and link: Hadoop Terms and Components Index Card (©myNoSQL)
via: http://mycloudresearch.wordpress.com/2012/03/14/simple-hadoop-overview/
Tuesday, 13 March 2012
Horizontal Scalability vs Elasticity
Abel Perez in a post about Cassandra:
Horizontal scalability boils down to the ability to add new hardware to a system without any interruption or downtime. An ideal horizontally scalable system does not require reconfiguration and supports incremental addition of hardware.
Nope. This is the definition of elasticity.
Horizontal scalability is the capability of a system to accept the addition or removal of multiple nodes (independent units of resources) and to make them work as a single system. The scalability of a system can be further categorized as negative, sub-linear, linear, or supra-linear, depending on the shape of the performance[1]/nodes curve.
[1] This is the part where things can get more complicated, as there are multiple ways to characterize the performance of a system (e.g. throughput, latency, etc.).
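As a rough sketch of the categorization, the function below compares two measurements of the same system taken before and after adding nodes. The function name, the choice of a single "higher is better" performance number (throughput, say), and the 5% tolerance band around linear are all assumptions made for illustration.

```python
def classify_scalability(nodes_before, perf_before, nodes_after, perf_after,
                         tolerance=0.05):
    """Classify scaling from two points on the performance/nodes curve.

    Performance is any "higher is better" metric, e.g. requests per second.
    """
    if nodes_after <= nodes_before:
        raise ValueError("expected nodes to be added between measurements")
    node_ratio = nodes_after / nodes_before
    perf_ratio = perf_after / perf_before
    if perf_ratio < 1:
        return "negative"      # adding nodes made the system slower
    if perf_ratio < node_ratio * (1 - tolerance):
        return "sub-linear"    # some gain, but less than the nodes added
    if perf_ratio <= node_ratio * (1 + tolerance):
        return "linear"        # gain proportional to the nodes added
    return "supra-linear"      # gain larger than the nodes added
```

For example, going from 4 to 8 nodes while throughput moves from 1000 to 1500 requests per second would be classified as sub-linear. A real assessment would look at more than two points and at more than one metric.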
Original title and link: Horizontal Scalability vs Elasticity (©myNoSQL)
A 771 Words Description of Map Reduce
Can a skyscraper completed in 1931 be used to explain a parallel processing algorithm introduced in 2004? In this post, I use the analogy of counting smartphones in the Empire State Building to explain MapReduce…without using code.
Andrew Brust’s metaphor is nice, but I wonder if these days there’s a single person working anywhere near data who still needs a 771-word description of how MapReduce works.
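For readers who prefer code to metaphors, the smartphone count fits in a few lines. This is one possible reading of the analogy, not code from the post: each floor is modeled as a list of phone types, and the function names and sample data are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_floor(phones_on_floor):
    """Map step: one worker counts the phone types on a single floor."""
    return Counter(phones_on_floor)


def reduce_counts(left, right):
    """Reduce step: merge two partial tallies into one."""
    return left + right


def count_phones(building):
    """Tally phone types across all floors of the building."""
    partial_tallies = map(map_floor, building)  # independent per floor
    return reduce(reduce_counts, partial_tallies, Counter())
```

Because every floor is counted independently and merging tallies is associative, the map step can be handed to as many workers as there are floors.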
Original title and link: A 771 Words Description of Map Reduce (©myNoSQL)
via: http://www.zdnet.com/blog/big-data/the-mapreduce-101-story-in-102-stories/190
6 Reasons Why We Need NoSQL
- We’re dealing with much more data.
- We require sub-second responses to queries.
- We want applications to be up 24/7.
- We’re seeing many applications in which the database has to soak up data as fast as (or even much faster than) it processes queries.
- We’re frequently dealing with changing data or with unstructured data
- We’re willing to sacrifice our sacred cows.
Not bad. But it reads more like the definition of Big Data.
Original title and link: 6 Reasons Why We Need NoSQL (©myNoSQL)
With Concatenative Programming, a Parallel Compiler Is a Plain Old Map-Reduce
I’m still digesting Jon Purdy’s post:
A compiler for a statically typed concatenative language could literally:
- Divide the program into arbitrary segments
- Compile every segment in parallel
- Compose all the segments at the end
This is impossible to do with any other type of language. With concatenative programming, a parallel compiler is a plain old map-reduce!
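The three steps can be sketched for a toy postfix language. In this illustration every word "compiles" to a function from stack to stack, so compiling a segment is just composing its words, and composing segments is the same operation again. The tiny language (integers, `+`, `*`, `dup`), the fixed-size splitting, and the use of a thread pool are all assumptions for the sake of the example. The sketch ignores static typing, which the quoted claim relies on.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Each built-in word is a function from stack to stack.
OPS = {
    "+": lambda stack: stack[:-2] + [stack[-2] + stack[-1]],
    "*": lambda stack: stack[:-2] + [stack[-2] * stack[-1]],
    "dup": lambda stack: stack + [stack[-1]],
}


def identity(stack):
    return stack


def compose(first, second):
    """Concatenating two programs means running one after the other."""
    return lambda stack: second(first(stack))


def compile_word(token):
    if token in OPS:
        return OPS[token]
    value = int(token)  # any other token is an integer literal
    return lambda stack: stack + [value]


def compile_segment(tokens):
    return reduce(compose, (compile_word(token) for token in tokens), identity)


def split(tokens, size):
    """Divide the program into arbitrary (here: fixed-size) segments."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def parallel_compile(source, segment_size=2):
    segments = split(source.split(), segment_size)
    with ThreadPoolExecutor() as pool:
        compiled = list(pool.map(compile_segment, segments))  # the "map"
    return reduce(compose, compiled, identity)                # the "reduce"
```

Calling `parallel_compile("2 3 + dup *")([])` evaluates the program on an empty stack. The result does not depend on where the program is cut, because function composition is associative.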
Original title and link: With Concatenative Programming, a Parallel Compiler Is a Plain Old Map-Reduce (©myNoSQL)
via: http://evincarofautumn.blogspot.com.au/2012/02/why-concatenative-programming-matters.html
Monday, 12 March 2012
Threat to NoSQL Database?
This question was posted on LinkedIn:
During my research I came across a new database technology called ‘NuoDB’. It seems to share many attributes of NoSQL databases and still maintain a SQL query interface. It uses a key value store as its data storage engine. It also promises the performance and scale of NoSQL databases.
Enterprises that have not yet embraced NoSQL, will be inclined to try this option before going NoSQL way in my opinion. Mainly because it does not require them to change their database interface layer drastically and also because NoSQL databases have not moved towards a standards based query interface yet.
For an outsider this comment might look extremely valid. I mean who in his right mind would give away all the expertise and tools and history of SQL for something like NoSQL?
But the real answer is in the details. “It seems to share many attributes of NoSQL databases.” Ask yourself what these shared attributes are:
- what is the supported data model? The relational model advantages have been discussed over and over for the last 30 years. But there are alternative data models that bring different
- what is the persistence model? Is it disk based, memory based, cluster based? Is it durable?
- what is the distribution model? Is it master-slave or master-master or peer-to-peer or masterless?
- what are the scalability characteristics of the system?
- what are the elasticity characteristics of the system?
In the only comment worth reading, Stefan Edlich correctly points to the tons of NewSQL solutions. Before asking whether these systems “pose a threat” to NoSQL databases, I’d first ask whether they are even a threat to the existing relational databases. And the answer is no.
Sid Anand wrote in the State of NoSQL 2012 post:
Many of the NoSQL vendors view the “battle of NoSQL” to be akin to the RDBMS battle of the 80s, a winner-take-all battle. In the NoSQL world, it is by no means a winner-take-all battle. Distributed Systems are about compromises.
I’d go even further and say that data storage is no longer a winner-takes-all battle. Actually, it’s not even a zero-sum game. We are living in the polyglot persistence age.
Original title and link: Threat to NoSQL Database? (©myNoSQL)
Monday, 27 February 2012
Taking a Step Back From ORMs and a Parallel to the Database World
Jeff Davis:
So, my proposal is this: take a step back from ORMs, and consider working more closely with SQL and a good database driver. Try to work with the database, and find out what it has to offer; don’t use layers of indirection to avoid knowing about the database. See what you like and don’t like about the process after an honest assessment, and whether ORMs are a real improvement or a distracting complication.
I know a lot of applications using ORMs that work perfectly fine. And I know applications that had to work around their ORMs or even got rid of them completely.
Here is a parallel to think about: ORM vs SQL is similar to always using a relational database versus using the storage solution that better fits the problem—as in using a NoSQL database or going polyglot persistence. An ORM comes with the advantage of keeping you inside a single paradigm (object oriented) at the cost of not being able to (easily) use the full power of the underlying storage.
Original title and link: Taking a Step Back From ORMs and a Parallel to the Database World (©myNoSQL)
via: http://thoughts.davisjeff.com/2012/02/26/taking-a-step-back-from-orms/
Wednesday, 1 February 2012
5 Requirements for Enterprise NoSQL databases
Emil Eifrem enumerates 5 requirements for adopting NoSQL databases in the enterprise environment:
- Ability to Handle Today’s Complex and Connected Data
- Simplify the Development of Applications Using Complex and Connected Data
- Support for End-to-End Transactions
- Enterprise-grade Durability so that Data is Never Lost
- Java Still Reigns for Enterprise Development
I think Emil Eifrem has left out a couple of other critical aspects, but I agree with 4 and 1/2 of those on his list.
Original title and link: 5 Requirements for Enterprise NoSQL databases (©myNoSQL)