Big Data Causes Concern and Big Confusion. A Big Data Definition to Help Clarify the Confusion
Thor Olavsrud (CIO) reports for ITWorld on the results of a survey about Big Data:
A new survey conducted by LogLogic in conjunction with IT security research consultancy Echelon One finds that 49% of organizations are somewhat or very concerned about managing big data, but 38% don’t understand what big data is and a further 27% say they have a partial understanding. Additionally, the survey found that 59% of organizations lack the tools required to manage data from their IT systems, instead turning to separate and disparate systems or even spreadsheets.
Let’s see how LogLogic and Thor Olavsrud help us understand what Big Data is and clear up the confusion. Mandeep Khera (CMO, LogLogic):
“Big data is about many terabytes of unstructured data. Information is power, and big data, if managed properly, can provide a ton of insight to help deal with security, operational and compliance issues. Organizations of every size are collecting more data from a variety of sources within the enterprise and cloud infrastructures,”
Thor Olavsrud:
“The data is coming from sensors, transaction records, images and videos, social media posts, logs and all sorts of other sources. That’s big data”
That’s less than 38% correct. So let me try to give a couple of hints:
- Big Data is characterized by the 3 V’s: volume, variety, velocity
- Forrester added a 4th V to these: variability
- My definitions of the 4 V’s are as follows:
  - Volume: data exceeds the limits of vertically scalable tools, requiring distributed storage solutions and parallel processing tools
  - Variety: data comes in different formats, which makes integration complex and expensive
  - Velocity: the time windows available for data ingestion and data analysis are small relative to the speed at which data is acquired
  - Variability: data can have different meanings and formats over different time periods
- Even if some disagree with these 4 dimensions, I think volume, velocity, variety, and variability come closest to defining Big Data.
To conclude, Big Data is a multi-phase problem that includes the acquisition/ingestion, organization, storage, and analysis of large amounts of data originating from various sources and arriving in multiple formats.
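To make the phases a bit more concrete, here is a minimal Python sketch of the four of them on a toy scale. Everything in it is an assumption made for illustration: the function names, the record fields (`source`, `ts`, `event`), and the sample data are hypothetical, and an in-memory list stands in for what would really be a distributed store. The mixed JSON and CSV inputs hint at variety, and the timestamp field that switches from epoch seconds to ISO 8601 hints at variability.

```python
import csv
import io
import json
from collections import Counter
from datetime import datetime, timezone


def ingest(raw_lines):
    """Acquisition/ingestion: yield non-empty raw records as they arrive."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield line


def parse_timestamp(value):
    """Variability: the same field changes format over time
    (epoch seconds in older records, ISO 8601 in newer ones)."""
    text = str(value)
    if text.isdigit():
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    parsed = datetime.fromisoformat(text)
    # Assumption: timestamps without a zone are UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def organize(raw_records):
    """Organization: map JSON and CSV inputs (variety) onto one schema."""
    for raw in raw_records:
        if raw.startswith("{"):
            data = json.loads(raw)
            source, ts, event = data["source"], data["ts"], data["event"]
        else:
            source, ts, event = next(csv.reader(io.StringIO(raw)))
        yield {"source": source, "time": parse_timestamp(ts), "event": event}


def store(records):
    """Storage: an in-memory list stands in for a distributed store."""
    return list(records)


def analyze(stored):
    """Analysis: count events per source."""
    return Counter(record["source"] for record in stored)


# Hypothetical sample input mixing formats and timestamp styles.
SAMPLE = [
    '{"source": "sensor", "ts": 1325376000, "event": "reading"}',
    'weblog,2012-01-01T00:00:05,login',
    '{"source": "sensor", "ts": "2012-01-01T00:00:10", "event": "reading"}',
    '',
]

stored = store(organize(ingest(SAMPLE)))
counts = analyze(stored)
print(dict(counts))  # {'sensor': 2, 'weblog': 1}
```

The sketch deliberately ignores volume and velocity: once a single machine and a single pass over a list are no longer enough, each of these functions turns into a distributed system of its own, which is where the problem starts to deserve the name Big Data.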
Original title and link: Big Data Causes Concern and Big Confusion. A Big Data Definition to Help Clarify the Confusion (©myNoSQL)