BigData: All content tagged as BigData in NoSQL databases and polyglot persistence
Wednesday, 15 February 2012
Everything is Big Data Now… But Don’t let yourself be fooled by buzzwords
Peter Collingridge, talking to Jenn Webb in "Book marketing is broken. Big data can fix it" on O’Reilly Radar:
But when you’re in a much faster-paced world, with the industry moving toward being consumer- rather than trade-facing, and with a fragmented retail and media landscape, you need to make decisions based on fact: What is the ROI on a £50,000 marketing campaign? Where do my banner ads have the best CTR? Who are the key influencers here — are they bloggers, mainstream media, or somewhere else? How many of our Twitter followers actually engage? When should we publish, in what format, and at what price?
Big Data is not equivalent to data analytics or BI. And neither of these is equivalent to automatic decision making or business success.
While it’s understandable why vendors would encourage this misbelief, do not fall for it. Not every data flow is Big Data, nor will Big Data automatically solve all the world’s problems.
Original title and link: Everything is Big Data Now… But Don’t let yourself be fooled by buzzwords (©myNoSQL)
Tuesday, 14 February 2012
Trivialization of the Big Data term
Most teams in small businesses are required to manage a never-ending stream of changing schedules, shifting priorities, and adjustments to resource allocation, which result in a massive amount of updates to their project portfolio. How can smaller organizations leverage big data to gain visibility, control and predictability over their work?
Original title and link: Trivialization of the Big Data term (©myNoSQL)
Monday, 13 February 2012
It's a revolution - The Impact of Big Data in the World
Gary King, director of Harvard’s Institute for Quantitative Social Science for The New York Times:
“It’s a revolution. We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”
Original title and link: It’s a revolution - The Impact of Big Data in the World (©myNoSQL)
How Web giants store big data
A not very technical Ars Technica overview of the storage engines developed and used by Google (Google File System, BigTable), Amazon (Dynamo), and Microsoft (Azure DFS), plus the Hadoop Distributed File System (HDFS).
Original title and link: How Web giants store big data (©myNoSQL)
Wednesday, 8 February 2012
The Outer Limits of Data Warehouse Technology
The story of adopting Hadoop (through Zettaset) at Zions Bancorporation:
The quest for a solution began in 2009 with an investigation of Zion’s existing Microsoft and Oracle technologies, as well as other technologies within the firm and new solutions on the market, Wood relates. After developing a list of six potential vendors, he says, he and his team quickly focused on two Hadoop-based solutions. The team, Wood explains, recognized the potential in Hadoop for “making security decisions proactively rather than reactively, based on mining business intelligence and combining it with event data from security devices.”
Original title and link: The Outer Limits of Data Warehouse Technology (©myNoSQL)
via: http://www.banktech.com/business-intelligence/232600226?printer_friendly=this-page
5 Top Misconceptions about Big Data and Hadoop
The MapR team analyzes the top 5 misconceptions in the Big Data/Hadoop market:
- Big Data is not simply about massive amounts of data — petabytes and beyond. Big Data represents a paradigm shift.
- Since Hadoop is a funny name and somewhat new to people they assume it must be risky.
- Another misconception about Hadoop, is that it is a batch process.
- Perhaps the biggest misconception is that Hadoop is a single, monolithic, component.
- With respect to open source, the question about a distribution is not a simple binary “open” or “closed”.
The first 4 points are indeed how things are seen from the outside.
While I do understand the nuance introduced by the last point (it leaves room for plugging in MapR), things are black and white: a component is either open source or it is not. But that’s just one dimension of the various components of the Hadoop stack. What really matters is how well a component integrates with the rest of the stack. The questions to ask are: Does it maintain the same interfaces? What’s the cost of replacing it? Does it allow the use of third-party components? Does it force me to get special components or hardware?
Original title and link: 5 Top Misconceptions about Big Data and Hadoop (©myNoSQL)
via: http://www.mapr.com/blog/top-misconceptions-about-big-data-and-hadoop
Visualizing Hadoop data with Tableau Software and Cloudera Connector for Tableau
Put together one of the most impressive visualization tools, Tableau Software, with one of the best solutions for big data, Hadoop, and you’ll probably get some astonishing results.

Credit Cloudera.
While Tableau Software works with structured data only, with this connector it gets access to Hive through HiveQL.
Original title and link: Visualizing Hadoop data with Tableau Software and Cloudera Connector for Tableau (©myNoSQL)
via: http://www.cloudera.com/blog/2012/02/cloudera-connector-for-tableau-has-been-released/
Sunday, 5 February 2012
Hadoop and NoSQL in a Big Data Environment with Ron Bodkin
Ron Bodkin, interviewed by Michael Floyd on InfoQ, describes the growing Hadoop addiction:
People are using Hadoop for a variety of analytics. Many of the first uses of Hadoop are complementing traditional data warehouses I just mentioned, where the goal is to take some of the pressure of the data warehouse, start to be able to process less structured data more effectively and to be able to do transformations and build summaries and aggregates, but not have to have all that data loaded to the data warehouse. But then the next thing that happens is once people have started doing that level of processing they realize there is a power of being able to ask questions they never thought of before the data, they can store all the data in small samples and they can go back and have a powerful query engine, a cluster of commodity machines that lets them dig into that raw data and analyze it new ways ultimately leading to data science being able to do machine learning and being able to discover patterns in data and keep them improving and refining the data.
The interview is only 16 minutes long and you have the full transcript.
Original title and link: Hadoop and NoSQL in a Big Data Environment with Ron Bodkin (©myNoSQL)
Friday, 3 February 2012
What's the big deal about Big Data?
Roger Ehrenberg (Founder and Managing Partner of IA Ventures):
Every so often a term becomes so beloved by media that it moves from “instructive” to “hackneyed” to “worthless,” and Big Data is one of those terms. […] But since this time the term Big Data has become diluted. Very diluted. So much so that it is almost totally meaningless. Does Big Data mean new kinds of databases? Sure. Does it mean innovative ways to visualize data to create actionable intelligence? Absolutely. Can it be applied to the health care sector? Without question. Has it contributed to the rise of the Data Scientist? Mos def.
And I thought I was off.
Original title and link: What’s the big deal about Big Data? (©myNoSQL)
via: http://informationarbitrage.com/post/16121669634/whats-the-big-deal-about-big-data
Wednesday, 1 February 2012
How to Hadoop: Maximizing the value of big data
Brian Christian [1] (Zettaset) suggests two roads for adopting Hadoop:
The first, building the capability internally, seems to hold out the promise of flexibility and control for organizations that employ it. While this has sometimes been the case for some large companies, a variety of studies indicate that even among Fortune 500 companies, less than 20 percent that began Hadoop development succeeded in deploying a solution.
The second approach entails working with a big-data, Hadoop-focused third party to develop a bespoke solution. In addition to eliminating the requirement of enormous equipment and human capital investment, this approach also enables organizations, their executives, and IT staff to focus on their core value propositions rather than being forced to become Hadoop specialists.
It would be easy if the decision were just about CAPEX vs OPEX, or on-premise vs managed deployments. But there are tons of variables that must be considered when going the Big Data way. Eventually pretty much everyone will do something around Big Data, but those at the forefront still have to figure out many important aspects.
[1] Brian Christian is CEO of Zettaset, which delivers a fault-tolerant and highly available solution for big data aggregation.
Original title and link: How to Hadoop: Maximizing the value of big data (©myNoSQL)
via: http://venturebeat.com/2012/01/24/big-data-server-efficiency/
Tuesday, 31 January 2012
Vertica and Hadoop for Big Data
Here is what I jotted down during Vertica’s webinar "Hadoop vs. RDBMS for Big Data Analytics: Why Choose?":
- the webinar focused on clarifying where and how Vertica and Hadoop fit in the Big Data space
- Vertica’s strengths:
  - support for SQL, extended SQL, and analytics, enabling interactive investigation of data
  - storage space efficiency (I don’t think it’s correct to interpret Hadoop data redundancy as storage space inefficiency)
  - analytics SDK (allows customizing in-database analytic functions)
  - ease of operation and maintenance (auto-tuning features)
- the following slide is pretty eloquent about Hadoop and Vertica being complementary solutions:

- when covering a scenario for using both Hadoop and Vertica, they chose the easy one: Hadoop as ETL. It’s not that it’s not a good one, but it’s the only one database vendors are using these days when speaking about integration with Hadoop.

- other possible Hadoop + Vertica use cases:
  - filter, join, and aggregation in Vertica with intermediate results fed into MR jobs
  - parallel import and export to HDFS
  - Hadoop MapReduce for data transformation and Vertica for optimized storage and retrieval
- there will be a community edition of Vertica. It was announced in October for the end of 2011, but I don’t think it’s out yet
- there’s a GitHub repo for user defined extensions for Vertica
- the following categorization of Big Data tools is interesting but feels biased in favor of Vertica, which would be placed somewhere close to the center of the triangle

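To make the "Hadoop as ETL" scenario from these notes a bit more concrete, here is a minimal sketch in plain Python of the map, shuffle, and reduce steps that turn raw log lines into aggregate rows ready for bulk loading into an analytic database. It is not Hadoop or Vertica code: the log format, the sample data, and every function name below are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw input: "user_id<TAB>page<TAB>bytes" log lines.
RAW_LINES = [
    "u1\t/home\t120",
    "u2\t/home\t80",
    "u1\t/about\t40",
    "bad line",
    "u2\t/home\t100",
]


def map_phase(line):
    """Parse one raw line into (page, bytes) pairs; malformed lines yield nothing."""
    parts = line.split("\t")
    if len(parts) != 3 or not parts[2].isdigit():
        return []
    return [(parts[1], int(parts[2]))]


def shuffle(pairs):
    """Group the mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(page, sizes):
    """Collapse one group into an aggregate row: (page, hits, total_bytes)."""
    return (page, len(sizes), sum(sizes))


def run_etl(lines):
    """Run map, shuffle, and reduce in-process and return sorted aggregate rows."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return sorted(reduce_phase(key, values) for key, values in shuffle(pairs).items())


ROWS = run_etl(RAW_LINES)
```

In a real deployment the map and reduce functions would run as distributed jobs and the resulting rows would be written out for the database's bulk loader; the sketch only shows the shape of the transformation.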
Original title and link: Vertica and Hadoop for Big Data (©myNoSQL)
Monday, 30 January 2012
5 Key Elements for a Firehose Data System
The 5 key elements for a firehose data system, as per a presentation by Josh Berkus, CEO of PostgreSQL Experts Inc., summarized by Brian Proffitt on ITworld:
- Queuing software to manage out-of-sequence data
- Buffering techniques to deal with component outages
- Materialized views that update data into aggregate tables
- Configuration management for all the systems in the solution
- Comprehensive monitoring to look for failures
Basically, firehose data systems are the perfect showcase of the 4 V’s in Big Data. To get an idea of the complexity involved in such systems, check the DataSift architecture, which relies on MySQL, HBase, Memcached, Redis, and Kafka to deal just [1] with the Twitter firehose.
[1] While Twitter is a high volume service, the Internet of Things or sensor networks are producing much, much more data.
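Two of the five elements listed above, handling out-of-sequence data and keeping aggregate tables up to date, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: events are assumed to carry a gap-free integer sequence number, an in-memory dictionary stands in for the aggregate table, and the names `ReorderBuffer` and `ingest` are hypothetical. Buffering for outages, configuration management, and monitoring are not covered.

```python
from collections import defaultdict


class ReorderBuffer:
    """Holds out-of-sequence events and releases them in sequence order."""

    def __init__(self, first_seq=0):
        self._pending = {}          # seq -> event, waiting for earlier events
        self._next_seq = first_seq  # next sequence number to deliver

    def push(self, seq, event):
        """Buffer one event and return the events that are now deliverable."""
        # Events older than the delivery cursor are duplicates or too late: drop them.
        if seq >= self._next_seq:
            self._pending.setdefault(seq, event)
        ready = []
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready


def ingest(events, first_seq=0):
    """Feed (seq, hashtag) events through the buffer and keep running counts."""
    buffer = ReorderBuffer(first_seq)
    counts = defaultdict(int)  # stands in for an aggregate table
    delivered = []
    for seq, hashtag in events:
        for tag in buffer.push(seq, hashtag):
            delivered.append(tag)
            counts[tag] += 1
    return delivered, dict(counts)


# Hypothetical stream: events 0 and 1 arrive swapped, event 2 arrives twice.
STREAM = [(1, "b"), (0, "a"), (3, "a"), (2, "c"), (2, "c")]
DELIVERED, COUNTS = ingest(STREAM)
```

A production system would also need a bound on how long the buffer waits for a missing sequence number; otherwise a single lost event stalls delivery indefinitely.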
Original title and link: 5 Key Elements for a Firehose Data System (©myNoSQL)