Paige Roberts writes in a post about integrating predictive analytics with Hadoop:
Unstructured is really a misnomer. I think it was Curt Monash who coined the term polystructured. That makes a lot more sense, since if data was truly without structure, even humans wouldn’t be able to make sense of it. In every seemingly unstructured dataset, there is some form of structure. An email has structure. A web page has structure. A Twitter stream has structure. Facebook interactions have structure. Machine generated log files have structure. But none of those structures are remotely alike. Nor are they remotely similar to the structure of a standard transactional record.
I don’t think many people think of unstructured data as data with completely random structure. My understanding is that the term unstructured refers to three dimensions:
- variability: data representing the same entities can take different forms and contain different details. The simplest example I can think of is the information about a video shared on two different platforms (see the sketch after this list).
- multi-purpose: data is not representing a single entity, but rather a set of related entities in an aggregated or composite form
- data closer to natural language than to mathematical structure: take for example some normal English text: according to grammar rules it has structure, but it’s not easily understandable by machines (nb: maybe machine descriptiveness would be a better name for this dimension)
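To make the variability dimension concrete, here is a minimal sketch, with invented platform and field names, of how the same video might be described by two different services. Neither record is wrong, and each has its own structure; they are just not the same structure:

```python
# A hypothetical illustration of the "variability" dimension: the same video
# described by two different (invented) platforms. Field names, nesting, and
# level of detail differ, yet both records describe the same entity.

video_on_platform_a = {
    "id": "abc123",
    "title": "Intro to Polyglot Persistence",
    "duration_seconds": 1847,
    "uploader": {"name": "jdoe", "verified": True},
    "stats": {"views": 10342, "likes": 512},
}

video_on_platform_b = {
    "video_id": "xyz-789",
    "name": "Intro to Polyglot Persistence",
    "length": "30:47",               # duration as a string, not seconds
    "channel": "jdoe",               # flat field instead of a nested object
    "view_count": 10342,
    "tags": ["nosql", "databases"],  # a detail the other platform lacks
}
```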
Original title and link: Unstructured Data: What Is It? (©myNoSQL)
In an interview for the DataStax blog, Philippe Modard, engineer and CTO at V2i, says:
The big difference over relational databases is the data model. Once we understood how things needed to be modeled and defined, everything else was a piece of cake.
Indeed, the new NoSQL data models are the first obstacle developers encounter when considering a NoSQL database. Some might think the difficulty lies in learning new APIs, in lacking a query language like SQL, or in having to use a different one. But I don’t think these are the real causes.
The first time I experienced the unfamiliarity of a new data model was back in 2005, when I started using the Jackrabbit JCR implementation (a hierarchical model). A couple of years later I had the same feeling when first using the Google App Engine data store.
It wasn’t about the new APIs though, and it wasn’t about the query languages either. For me it was about rethinking how I store and access data. It was striking to realize how used I was to thinking in terms of the relational model, even though not everything I had implemented before was purely relational.
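To illustrate that shift with a toy example of my own choosing (a blog post with comments, not anything from the interview): the relational habit is to normalize into separate tables and join at read time, while the aggregate habit is to store the data the way it will be accessed.

```python
# A minimal sketch (not any specific database's API) of the mental shift
# from relational to aggregate-oriented modeling.

# Relational habit: normalize into separate "tables" and join at read time.
posts = [{"post_id": 1, "title": "NoSQL Data Models", "author_id": 7}]
comments = [
    {"comment_id": 10, "post_id": 1, "body": "Great point!"},
    {"comment_id": 11, "post_id": 1, "body": "What about JCR?"},
]

def read_post_relational(post_id):
    """Reassemble the post from its normalized pieces, as a SQL join would."""
    post = next(p for p in posts if p["post_id"] == post_id)
    post_comments = [c for c in comments if c["post_id"] == post_id]
    return post, post_comments

# Aggregate/document habit: store one denormalized unit shaped like the read
# path, and accept the duplication that implies.
post_document = {
    "post_id": 1,
    "title": "NoSQL Data Models",
    "author": {"author_id": 7, "name": "jdoe"},
    "comments": [
        {"comment_id": 10, "body": "Great point!"},
        {"comment_id": 11, "body": "What about JCR?"},
    ],
}
```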
Looking at the various NoSQL databases around, you can see that those that started with a data model closer to the relational one have seen faster adoption. And I don’t think the main reason is better data models per se, but simply familiarity.
Original title and link: NoSQL Data Models and Adoption (©myNoSQL)
In a recent interview for AllThingsD, Mike Rhodin, the senior vice president of IBM’s Software Solutions Group, gave a very realistic description of what the future of data looks like:
[…] it comes out of the digitization of the physical world, the instrumentation of physical processes that’s going to generate huge amounts of new data, which is going to drive issues around storage, and what to do with all the data, how to analyze it. That pushes you toward real-time analytics and streaming technologies, because with real time, you don’t have to save the data — you want to look for anomalies as they occur.
This is indeed the grand picture of Big Data.
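A minimal sketch of what Rhodin’s "look for anomalies as they occur" could mean in code, using Welford’s online algorithm to keep only running statistics (count, mean, variance) rather than the raw stream; the 3-sigma threshold and the sample readings are invented for illustration:

```python
# Streaming anomaly detection with constant memory: the raw data is never
# stored, only running statistics, which is exactly what makes real-time
# analysis over huge instrumented-world data streams feasible.

class StreamingAnomalyDetector:
    def __init__(self, threshold=3.0):
        self.threshold = threshold  # flag values this many std devs from the mean
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford)

    def observe(self, value):
        """Return True if `value` looks anomalous, then fold it into the stats."""
        is_anomaly = False
        if self.count >= 2:
            variance = self.m2 / (self.count - 1)
            if variance > 0:
                is_anomaly = abs(value - self.mean) > self.threshold * variance ** 0.5
        # Welford's online update: no history is kept, only three numbers.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly

detector = StreamingAnomalyDetector()
for reading in [10.1, 9.9, 10.0, 10.2, 9.8, 42.0, 10.1]:
    if detector.observe(reading):
        print("anomaly:", reading)  # prints: anomaly: 42.0
```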
Now think for a second how many companies have such systems in place. Not many. Think now how many companies can offer as-complete-as-possible integrated systems to address these challenges. Very few.
These two answers reveal an interesting perspective on the future of the Big Data market.
On one side we have vendors building top-notch solutions: consider the new features in relational databases, NoSQL databases, Hadoop, and so on. Looking at this space, you’ll have to agree that these are all excellent solutions for tackling a sub-space of the overall problem. They are getting closer and closer to offering locally optimal solutions.
On the other side there are the system integrators and platform vendors. Their systems may not be the best at solving every aspect of a problem, but their focus is on addressing and solving the complete problem. Their sales pitch is integration and/or specialization.
As someone writing about polyglot persistence, the 1001 NoSQL and NewSQL databases, and the evolution of relational databases, I could be tempted to think that every company has the budget, the know-how, and the time to take top-notch sub-systems and create solutions crafted for their problem. But looking back in time and applying the lessons of other markets, I think it is safe to say that integrated solutions are preferred.
The lesson for NoSQL and relational database vendors alike, indeed for all (sub)system vendors playing in the Big Data market, is to design products with openness and integration in mind. Very few, if any, sub-systems will be part of the grand solution if they are architected as silos. They can continue to provide the ultimate locally optimal solutions, but as long as they are not architected to be part of a collaborative, integrated platform they will lose important segments of the market. Many of the products I write about already follow this principle, many are making steps toward friendlier integration, and many still take the silver-bullet approach.
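As a hypothetical sketch of what "designed with integration in mind" can look like at the code level: a specialized engine exposing a small, stable contract so a polyglot persistence layer can route data to whichever engine fits best. All interface and class names here are invented for illustration:

```python
# A silo exposes only its own proprietary API; an integration-friendly
# sub-system also implements a minimal shared contract that platforms
# and other sub-systems can program against.

from abc import ABC, abstractmethod

class Store(ABC):
    """The minimal contract a sub-system agrees to so others can integrate."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> dict | None: ...

class InMemoryKeyValueStore(Store):
    """Stand-in for a specialized engine (a KV store, a document DB, etc.)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

class PolyglotRouter:
    """Routes records to whichever registered store handles their kind."""

    def __init__(self):
        self._stores = {}

    def register(self, kind: str, store: Store):
        self._stores[kind] = store

    def put(self, kind, key, value):
        self._stores[kind].put(key, value)

router = PolyglotRouter()
router.register("session", InMemoryKeyValueStore())  # fast, ephemeral data
router.register("profile", InMemoryKeyValueStore())  # swap in another engine here
router.put("session", "user:42", {"logged_in": True})
```

The point of the sketch is that the router never needs to know which engine sits behind the interface; any store that honors the contract can join the platform, which is precisely what a silo cannot do.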
Original title and link: The Grand Picture of Big Data and the Impact on the Architecture of Systems (©myNoSQL)