


Big Data: All content tagged as Big Data in NoSQL databases and polyglot persistence

The path of disruption: The dirty truth about big data and NoSQL

Andrew C. Oliver:

The dirty secret is that big data and NoSQL vendors aren’t just targeting gigantic, consumer-facing companies like Facebook or Google. The technology applies much more broadly, and as the supply of high-concurrency, low-cost, flexible data storage increases, so will demand. If you can hoard all that data cheaply, why not mine it cheaply as well and compete with the big names?

It’s called the path of disruption.

The best and shortest explanation can be found in Ben Thompson’s “Chromebooks and the Cost of Complexity”:

The key thing to notice is that products improve more rapidly than consumer needs expand. This means that while the incumbent product may have once been subpar, over time it becomes “too good” for most customers, offering features they don’t need yet charging for them anyways. Meanwhile, the new entrant has an inferior product, but at a much lower price, and as its product improves — again, more rapidly than consumer needs — it begins to peel away customers from the incumbent by virtue of its lower price. Eventually it becomes good enough for nearly all of the consumers, leaving the incumbent high and dry.

the path of disruption

Original title and link: The path of disruption: The dirty truth about big data and NoSQL (NoSQL database©myNoSQL)


Why More Data and Simple Algorithms Beat Complex Analytics Models

Garrett Wu:

In a nutshell, having more data allows the “data to speak for itself,” instead of relying on unproven assumptions and weak correlations.

How many reasons can you come up with that prove the opposite? Are there use cases where complex analytics models beat “more data and simple algorithms”?



Typical Big Data Architecture

A somewhat old post by Venu Anuganti that puts the different components of a Big Data architecture into a single diagram.

Any data architecture loosely consists of four major logical components:

Typical Big Data Architecture

I don’t think there’s a blueprint for big data architectures. But such a diagram can give you a good idea of the possible components involved. Then, to make things simple for engineers, you start adding requirements, constraints, and SLAs at each level. Once you have some idea of how things will look, you start building it and discover that some of the components you were planning to use don’t work well together or that there’s no way to achieve those SLAs. All in all, it’s a fun job.



Quick and Dirty (Incomplete) List of Interesting, Mostly Recent Data Warehousing and Big Data Papers by Peter Bailis

Peter Bailis:

A friend asked me for a few pointers to interesting, mostly recent papers on data warehousing and “big data” database systems, with an eye towards real-world deployments. I figured I’d share the list. It’s biased and rather incomplete, but maybe of interest to someone. While many are obvious choices (I’ve omitted several, like MapReduce), I think there are a few underappreciated gems.


NoSQL and Big Data Money News


  1. Cloudant has received an undisclosed investment from Samsung Ventures

  2. Think Big Analytics, a Big Data consulting company, raised $3 million from former Cisco executive Dan Scheinman and WI Harper Group

Hortonworks announces Certification Program for Apache Hadoop

Hortonworks’ New Certification Program Enables the Next Generation Data Architecture with Apache Hadoop:

[…]today announced the launch of the Hortonworks Certified Technology Program, designed to help customers choose leading enterprise software that has been tested to integrate with Hortonworks Data Platform (HDP), the only 100-percent open source Apache Hadoop distribution. By certifying technologies, Hortonworks is taking the risk out of the technology selection, thereby accelerating and simplifying customers’ big data projects. The Program strengthens and expands the Apache Hadoop ecosystem, while helping to increase the enterprise capabilities of Apache Hadoop.

I assume the model here is that vendors pay Hortonworks for this certification and they can use the Hortonworks stamp when talking to customers.

DataStax’s Next Great Data Developer Contest

Two scholarships of up to $10,000 each for computer science students from North America enrolled in a bachelor’s or master’s program. Announcement here and blog post here.

Last, but not necessarily money-related:

MySQL 5.6 Released

I’m still reading about what’s new in MySQL 5.6, but what caught my eye while skimming the docs is support for online DDL.


Big Data and the 3 Vs (Volume, Variety, Velocity) From a Duality Perspective

Erik Meijer and Sadek Drobi discuss applying the theory of duality to Big Data’s 3Vs: volume, variety, velocity:

Now we have these three Vs, we know that they are all dual, but the three together give you a design space like a cube of data, so there is a cube of big data where there are eight points that you can look at, and at each of these eight points there are interesting databases, and time is now too short to go into the details, I have a CACM paper and an ACM Queue paper about this, but at each of these points there are existing databases that fit there.

I watched the video and also read the transcript. Twice.
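
For what it’s worth, the cube itself is easy to enumerate: three axes, each with two opposite ends, give 2^3 = 8 corners. Here is a minimal Python sketch of that design space. The `low`/`high` labels are placeholders of mine, not the terms Meijer uses, and no database is assigned to any corner because the quote doesn’t name them.

```python
from itertools import product

# The three Vs from the quote; each axis has two opposite ("dual") ends.
# The "low"/"high" labels are hypothetical placeholders, not Meijer's terms.
AXES = {
    "volume": ("low", "high"),
    "variety": ("low", "high"),
    "velocity": ("low", "high"),
}


def cube_corners(axes):
    """Return every corner of the design space as a dict of axis -> end."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


corners = cube_corners(AXES)
for corner in corners:
    print(corner)
```

Each printed dict is one point of the cube; which databases sit at which point is left to the papers mentioned in the quote.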



Issue #1: Quo Vadis, Big Data?

A lot of people like to make predictions. I don’t. But I love filing them for later reference.

Here’s a roundup of predictions for 2013. Most of them are about the Big Data market, very few mentioning NoSQL databases. Why?

  1. It’s all so… pink
  2. Back to Planet Earth
  3. Existing solutions. Do you mean old solutions?
  4. No Hadoop?
  5. We’re going up… I mean vertical
  6. Too much Hadoop
  7. What about NoSQL databases?
  8. Show me the money

To frame the context of these predictions, let’s start with the forecast of the Big Data market from Gartner Research. According to their reports, Big Data accounted for $96 billion of global IT spending in 2012. This will rise to $120 billion in 2013 and up to $232 billion by 2016.
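
To put those figures in perspective, here is a small Python sketch that turns them into yearly growth rates. The only inputs are the three numbers quoted above; the `growth_rate` helper is my own, and the 2013 to 2016 figure assumes smooth compound growth, which the forecast itself doesn’t claim.

```python
def growth_rate(start, end, years=1):
    """Compound annual growth rate between two spending figures."""
    return (end / start) ** (1 / years) - 1


# Figures quoted above, in billions of US dollars.
spending = {2012: 96, 2013: 120, 2016: 232}

growth_2013 = growth_rate(spending[2012], spending[2013])
cagr_2013_2016 = growth_rate(spending[2013], spending[2016], years=3)

print(f"2012 -> 2013: {growth_2013:.1%}")
print(f"2013 -> 2016 (per year): {cagr_2013_2016:.1%}")
```

Both rates come out at roughly a quarter per year, so the forecast amounts to saying the 2013 pace holds for three more years.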

It’s all so… pink

Stefan Groschupf from Datameer: Big Data – Crossing the Chasm in 2013!:

We think 2013 is the year that Big Data will cross the chasm.

Datameer - Big Data will cross the chasm

Mike Gualtieri for Forrester: Big Data Predictions For 2013:

My prediction: Time magazine will name big data its 2013 person of the year.

Derrick Harris for GigaOM: What we’ll see in 2013 in data:

  • Get ready for Hadoop as you’ve never seen it before
  • The Google-Ray-Kurzweil singularity: If Google and Kurzweil can find a way to work symbiotically as employer and employee, who knows what they’ll be able to pull off. Maybe it will be an even crazier batch of ideas with which to dazzle the public, but it might also be some legitimate progress on Google’s current batch of ideas (including those hidden away inside Google X) that have promise today but need some old-school engineering know-how.
  • Data for the people: What I’d like to see in 2013 is a combination of applications, data and devices that makes it easy for average consumers to learn about themselves in some meaningful ways.

If it’s about what I’d like for 2013, one of the top positions would be the “Freedom of Data Act”. The non-legalese text could simply read: “If you have permission to collect and process my data, I do have permission to get it back and use it however I like”.

Going back to 2013, to prepare for the new year, Derrick Harris writes A programmer’s guide to big data: 12 tools to know—none of these were on my list though.

[…] if your job revolves around writing code rather than data flows, you might need a little help. Here are 12 tools (listed alphabetically) that aim to help. As usual with this type of list, it’s very possible I left out some good options, so please note any omissions in the comments.

Back to Planet Earth

Reading more like Planet Food, The Red Hat Storage Team writes in Red Hat Predicts Significant Trends in Scale-out Open Hybrid Cloud Storage in 2013:

  • Prediction #2 — Storage Software will Eat Storage Hardware for Lunch!
  • Prediction #3 — Open Source Storage Software will Eat Proprietary Storage Software for Dinner!
  • Prediction #5 — Big Data and Small Storage is the Perfect Recipe for Success!

Richard McDougall (VMware Application Infrastructure CTO): 2013 Predictions for Big Data:

  • Prediction #4: “Delete” will become a forbidden word
  • Prediction #3: There will be a mad dash for software-defined storage
  • Prediction #2: The default infrastructure for Big Data will change
  • Prediction #1: The focus on big data use cases will shift heavily towards real-time

That’s 2 for software-defined storage.

Nick Kolakowski for Slashdot: Hadoop, Mobile, and Other Big Data Trends in 2013:

Build Your Own Massive Data System: While other organizations don’t have Facebook’s resources, they do have a need to wrangle increasing amounts of data. That could drive many of them, over the next year or so, to opt for custom-built solutions over “off the shelf” platforms.

The emphasis should be on: even if you have the talent and budget, do not create yet another clone of an existing solution.

Elliot Bentley and Chris Mayer for JAXenter: Reasons to be excited about Big Data in 2013:

  • Hadoop’s next real-time move: Hadoop has reached maturity but its main hindrance has been the inability of gleaning analysis at the speed which enterprises demand. 2013 could be the year where we see this change and a new direction for data-centric products.
  • Jumping in is easier than ever: As the Hadoop platform solidifies, it is forming the foundation for clever startups like Precog and Continuuity which are abstracting away existing barriers to entry, and we’re likely to see even more of this within the coming year.

Indeed, engineers have always been known for jumping in head first.

Existing solutions. Do you mean old solutions?

Jeff Bertolucci for InformationWeek: 5 Big Data Predictions For 2013:

Data warehouses will go the way of the dinosaur. Pervasive Software, a data management and analytics company, foresees gloom and doom for existing data warehouses.

“The ‘Big Data Revolution’ is exposing how technically obsolete the existing data warehousing infrastructure really is. Relational technology is not well suited for large-scale analytical workloads. Big data analytics demand a completely modern technology infrastructure, such as Hadoop and its ecosystem,” […]


If throwing out old solutions is not your thing, Maarten Ectors writes on his blog: Big Data 2013 Predictions:

If you just invested a lot of money in a Big Data solution from any of the traditional BI vendors (Teradata, IBM, Oracle, SAS, EMC, HP, etc.) then you are likely to see sub-optimal ROI in 2013.

No Hadoop?

Yves for the Talend blog: Predicts 2013: Hadoop Becomes Enterprise-Acceptable, Transitions from Experimental to Mainstream:

In 2013, no longer an experimental platform, Hadoop will become a major player in the overall IT environment.

Herb Cunitz for the Hortonworks blog Apache Hadoop: Seven Predictions for 2013:

Prediction #2: Emergence of vertically aligned Apache Hadoop “solutions”: […] As more and more companies gain success we will see patterns and solutions arise that are custom-fit for a challenge found in a particular industry. As the system integrators and consultants become more and more expert on Apache Hadoop, they will wrap solutions in packages and we will see the emergence of these vertical solutions

Prediction #6: The big data ecosystem expands. Related to prediction number four, existing application vendors will all clamor to make their products Hadoop-compatible. Led by Teradata and Microsoft and many others, application vendors are waking up to the reality that their applications must run on Hadoop. Already, it seems everyone is building reference architectures which incorporate Hadoop and HDP to leverage all the goodness they already provide around data lifecycle management, data governance, security, etc. Meanwhile the Hadoop community is doing everything it can to foster adoption by the ISVs. In 2013, nearly everyone will be speaking big data.

We’re going up… I mean vertical

Christophe from Wibidata: Welcome to 2013!:

We believe that the cutting edge trend in 2013 will be about building Big Data Applications, which means a greater focus on real-time serving technologies such as HBase and Kiji as well as emerging real-time query engines like Impala and Apache Drill.

If you’re asking yourself what Big Data Applications are, Christophe has an answer:

The differentiating factor between established applications and those that use Big Data is the ability of an application to dynamically adapt based on new data. This includes the ability to rescore models as sensor data fluctuates, incorporate external factors – such as weather and social media – that become relevant and modify the next best action each time end user behavior changes. Most applications make decisions using a bevy of rules and relying on select fractions of data. Products that claim real-time decisions or contextualized results largely operate in silos, using just the data that someone thought to include when the application was first deployed, not the most relevant and important data.

Staying with the application space, Jim Kaskade for Infochimps: Intelligent Applications: The Big Data Theme for 2013:

My prediction for 2013 is that competitive advantage will translate into enterprises using sophisticated Big Data analytics to create a new breed of applications - Intelligent Applications.


Too much Hadoop

Andrew Brust for ZDNet: Big Data 2013: Industry Players’ Forecasts:

My take on where Big Data technology is going comes down to two themes: a lessening dependency on MapReduce and a pushing down of Hadoop deeper into the enterprise software stack.

By the lessening dependency on MapReduce, I mean to say that products like Cloudera’s Impala, and Microsoft’s PolyBase, which bypass MapReduce and work directly against data stored in Hadoop’s Distributed File System (HDFS) will gain momentum. MapR’s prediction about the continued rise of SQL-based tools aligns with this, as does another prediction from Pervasive that “YARN changes the Hadoop game”.

And what do I mean by my prediction that Hadoop will be pushed deeper into the software stack? Simply that (a) Hadoop has gained such significant adoption that it has in effect become an industry standard and that (b) standards tend to become the foundation of higher-valued software tools, rather than tools in their own right.

James Kobielus (IBM Big Data Evangelist) for The Big Data Hub: Koby’s Big Data Predictions for 2013:

  • Hybrid big-data deployments will become the standard
  • Cross-scale data architectures will predominate
  • Governance will become a prime focus of maturing big-data deployments
  • Data science centers of excellence will spring up everywhere
  • Next-best-action deployments will become more cross-application

No word about Hadoop. No word about IBM products. But reading between the lines makes me feel there’s an IBM product for every bullet point.

What about NoSQL databases?

Gazzang’s predictions for 2013 contain one of the few references to NoSQL databases in their 2013 The Year Big Data Goes Big-Time:

  • A damaging big data breach will cause the market to question holes and vulnerabilities in NoSQL infrastructure.
  • Vertical line of business applications on top of big data will start to explode, with some early examples already starting to emerge in retail, financial services and oil and gas.
  • The first significant big data company acquisitions will happen, signaling a shift in focus from proof-of-concept projects to high-business-value implementations/rollouts.

Not really the best mention of NoSQL databases. Somewhat in the same vein, Armel Nene writes in his post Big Data, Bigger Myths:

NoSQL is the way forward and Hadoop is the Holy Grail: This is a funny one. NoSQL started as death to the traditional RDBMS. Startup companies started to jump on the buzz wagon. There were NoSQL evangelists at every street corner, ok maybe not but you get the point. And the early adopters started to see problems in the movement. Experienced data admins from the SQL world started converting, then they stopped. Why?

Show me the money

John Bantleman (CEO of RainStor) for Wired: Big Data: Business or Technology Challenge?:

  • Prediction 1: Enterprise Big Data Initiatives Move out of the Sandbox and Define a Clear Set of Business and Technology Requirements
  • Prediction 2: Companies will Look to New Technology Combinations, other than Hadoop, when Managing Big Data
  • Prediction 3: Budget Limitations will Pose one of the Biggest Hurdles to Solving Big Data Challenges
  • Prediction 4: Big Data Tools Must Satisfy both Business and Technical Users
  • Prediction 5: Heavyweights, such as Oracle and IBM, will Make Acquisitions in the Big Data Market

Coming from the CEO of a company active in the Big Data market, some of these predictions could be interpreted in different ways.

No prediction list is complete without looking at IPOs and from the Big Data market, only one company made David Zielenziger’s list for International Business Times, 5 Tech IPOs For 2013 From Cloud Events To Ultrafast Chips: Cloudera. Why? The IPO of 2013.

The list of predictions could go on and on for a while. So I’ll finish here with a conversation I had on Twitter:

Kontra: If the future of ‘big data’ is Hadoop, we’re royally screwed. We’re in dark ages with regards to data, multi-DC transactions/reliability/etc.

Alex: It very much depends on what we define as “future”. IMO it’s a building block, but there’s a lot to be built on top.

Kontra: Hadoop, currently, is unusable for majority of use cases often used by the majority of big(ish) data users without huge resources.

Alex: True. But other tools in the space are unusable to the majority of companies that cannot afford multi-million single tool investments

Kontra: That’s the point: we are in the dark ages when it comes to data, with or without Hadoop. It’s painful.

Alex: well, I think and hope that we are in the early renaissance days.


  1. Big Data — Crossing the Chasm in 2013!
  2. Big Data Predictions For 2013
  3. What we’ll see in 2013 in data
  4. A programmer’s guide to big data: 12 tools to know
  5. Red Hat Predicts Significant Trends in Scale-out Open Hybrid Cloud Storage in 2013
  6. 2013 Predictions for Big Data
  7. Hadoop, Mobile, and Other Big Data Trends in 2013
  8. Reasons to be excited about Big Data in 2013
  9. 5 Big Data Predictions For 2013
  10. Big Data 2013 Predictions
  11. Predicts 2013: Hadoop Becomes Enterprise-Acceptable, Transitions from Experimental to Mainstream
  12. Apache Hadoop: Seven Predictions for 2013
  13. Welcome to 2013!
  14. Intelligent Applications: The Big Data Theme for 2013
  15. Big Data 2013: Industry Players’ Forecasts
  16. Koby’s Big Data Predictions for 2013
  17. 2013 The Year Big Data Goes Big-Time
  18. Big Data, Bigger Myths
  19. Big Data: Business or Technology Challenge?
  20. 5 Tech IPOs For 2013 From Cloud Events To Ultrafast Chips


Reports Indicate That Part of Your Business Algorithm Is Executed by Humans

Jay Kreps[1] had a very interesting follow-up to GigaOM’s article Why big data might be more about automation than insights:

That article reminded me how immature people’s thinking about the use of data is. They are still thinking about “reports”. Reports indicate the part of your business algorithm that is executed by a human. When you understand it well enough, whatever you are doing looking at a report a computer can do better and faster. But the real advantage is that computers can disaggregate decisions humans make into many, many individual cases and be far more accurate.

The algorithm is:

  1. add instrumentation
  2. visualize data
  3. turn visualization into a report
  4. automate reaction to report
  5. Wash, rinse, repeat.
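
The jump from a report to an automated reaction can be sketched in a few lines of Python. Everything below is made up for illustration: the event data, the service names, and the latency threshold are hypothetical, and nothing here comes from Jay Kreps or LinkedIn. It only shows the disaggregation point from the quote, where a single number in a report can look fine while a per-case rule catches the problem.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical instrumentation output: (service, latency in milliseconds).
events = [
    ("search", 120), ("search", 140),
    ("checkout", 900), ("checkout", 1100),
    ("profile", 80),
]

LATENCY_LIMIT_MS = 500  # made-up threshold a human would apply to the report


def report(events):
    """The aggregate number a human would read in a report."""
    return mean(latency for _, latency in events)


def automated_reaction(events, limit=LATENCY_LIMIT_MS):
    """The same rule, applied to each service separately."""
    by_service = defaultdict(list)
    for service, latency in events:
        by_service[service].append(latency)
    return sorted(s for s, values in by_service.items() if mean(values) > limit)


print(report(events))              # one number for the whole system
print(automated_reaction(events))  # services that individually break the limit
```

With this sample data the overall average stays under the limit, while the per-service check still flags the one service that is clearly in trouble.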

  1. Jay Kreps is working at LinkedIn in the SNA team. 


5 Business Analytics Tech Trends

In interviews, CIOs consistently identified five IT trends that are having an impact on how they deliver analytics: the rise of Big Data, technologies for faster processing, declining costs for IT commodities, proliferating mobile devices and social media.

Same old, same old.



Data Scientist’s Anthem

Shamir Karkal:

Data Scientist’s anthem - We R Who We R

Andrei Savu


Objectivity CEO: We Have Been Solving the Big Data Problem

Jay Jarrell, the President and CEO of Objectivity, in a PR announcement:

We have been solving the Big Data problem for decades.

For decades!


Oracle Database or Hadoop? And What Led to NoSQL Databases

In a follow-up post to SQL or Hadoop: What Tools Should I Use to Process My Data?, Gwen Shapira presents some reasons why, even if many things that fit better into Hadoop could be done with Oracle, doing so is not necessarily a good idea:

But, do you really want to use Oracle to store millions of emails and scanned documents?[1] I have a few customers who do it, and I think it causes more problems than it solves. After you stored them, do you really want to use your network and storage bandwidth so the application servers will keep reading the data from the database? Big data is… big. It is best not to move it around too much and run the processing on the servers that store the data. After all, the code takes fewer packets than the data. But, Oracle makes cores very expensive. Are you sure you want to use them to run processing-intensive data mining algorithms?

Then there’s the issue of actually programming the processing code. If your big data is in Oracle and you want to process it efficiently, PL/SQL is pretty much the only option. […]

All these are very solid arguments.

Generalizing Gwen’s point a bit, I would say that this is exactly the history of relational databases and what made them successful. Providing decent solutions, up to a point, to a wide range of problems, and covering more scenarios than the alternative storage solutions of the time, made relational databases the de facto storage for the last 30 years[2]. But in recent years, more and more problems have crossed the boundaries of what could be considered decent solutions, leading to the need for specialized, better-than-good-enough alternatives. And thus NoSQL databases.

  1. Interestingly, when presented with a Hadoop and Solr solution for archiving emails, I’ve also wondered if that is the best solution.  

  2. This is a bit of an oversimplification to make the point, as there were other obvious technical advantages of relational databases over some of the alternative solutions.  
