Greenplum: All content tagged as Greenplum in NoSQL databases and polyglot persistence
Monday, 11 March 2013
What It Means to Be “all In” on Hadoop
Another post about Pivotal HD and the accompanying statements, this time from Matthew Aslett:
Pivotal HD is not Hadoop
Neither is Cloudera’s Distribution, including Apache Hadoop.
Nor the Hortonworks Data Platform.
Nor the MapR Distribution.
Nor IBM’s InfoSphere BigInsights.
Nor the WANdisco Distro.
Nor Intel’s Distribution for Apache Hadoop.
Original title and link: What It Means to Be “all In” on Hadoop (©myNoSQL)
via: http://blogs.the451group.com/information_management/2013/03/11/all-in-on-hadoop/
Monday, 4 March 2013
How Many Hadoops?
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.
There’s also the version Facebook is running on its clusters, which includes Facebook Corona, a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- NetApp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones[1]. To conclude this list, my question is: who is actually benefiting from all these distributions?
[1] I’ve left Hadoop-as-a-Service aside for now.
Original title and link: How Many Hadoops? (©myNoSQL)
Monday, 19 March 2012
Big Data Market Analysis: Vendors Revenue and Forecasts
I think this is the first extensive Big Data report I’ve read that includes relevant and fairly exhaustive data about most of the players in the Big Data market, plus some captivating forecasts.
As of early 2012, the Big Data market stands at just over $5 billion based on related software, hardware, and services revenue. Increased interest in and awareness of the power of Big Data and related analytic capabilities to gain competitive advantage and to improve operational efficiencies, coupled with developments in the technologies and services that make Big Data a practical reality, will result in a super-charged CAGR of 58% between now and 2017.

While there are many stories behind these numbers and many things to think about, here is what I’ve jotted down while studying the report:
- it’s no surprise that “megavendors” (IBM, HP, etc.) account for the largest part of today’s Big Data market revenue
- still, the revenue split between pure players and megavendors is strikingly unbalanced: pure players account for $311 million out of $5.1 billion
- the pure-player category includes: Vertica, Aster Data, Splunk, Greenplum, 1010data, Cloudera, Think Big Analytics, MapR, Digital Reasoning, Datameer, Hortonworks, DataStax, HPCC Systems, Karmasphere
- a couple of companies that position themselves in the Big Data market do not show up anywhere (e.g. 10gen, Couchbase)
- this could lead to the conclusion that companies that include hardware in their offering benefit from larger revenues
- I’m wondering, though, what the margins are in the hardware segment. I don’t have any data at hand, but I think I’ve read reports about HP and Dell struggling precisely because of lower margins
- see the bullet point further down about revenue by hardware, software, and services
- this could explain why so many companies are trying their hand at appliances
- looking at the various numbers, you can see that those selling appliances usually have a large corporation behind them, covering the hardware production costs and probably the cost of the sales force
- in the Big Data revenue by vendor you can find quite a few well-known names from the consulting segment
- the revenue-by-type pie chart lists services as accounting for 44%, hardware for 31%, and software for 13%, which might give an idea of what makes up the megavendors’ sales packages
- most NoSQL database and Hadoop companies operate in the software and services segments
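To put two of the quoted numbers side by side, here is a minimal Python sketch that computes the pure-player share of 2012 revenue and the market size that the 58% CAGR would imply for 2017. It assumes the $5.1 billion figure is the 2012 base and that growth compounds over five years (2012 to 2017); the projection is plain arithmetic on the quoted figures, not a number taken from the report.

```python
# Figures quoted from the Wikibon report above (early 2012, US$ billions).
MARKET_2012_BN = 5.1        # total Big Data market
PURE_PLAY_2012_BN = 0.311   # revenue of the pure-player vendors
CAGR = 0.58                 # quoted compound annual growth rate
YEARS = 5                   # assumption: compounding from 2012 to 2017


def compound(value: float, rate: float, years: int) -> float:
    """Grow `value` at a constant annual `rate` for `years` years."""
    return value * (1 + rate) ** years


share = PURE_PLAY_2012_BN / MARKET_2012_BN
projected_2017 = compound(MARKET_2012_BN, CAGR, YEARS)

print(f"pure-player share of 2012 revenue: {share:.1%}")               # 6.1%
print(f"market implied by 58% CAGR in 2017: ${projected_2017:.1f}bn")  # $50.2bn
```

In other words, pure players held roughly 6% of the market, and a 58% CAGR sustained for five years multiplies the market almost tenfold.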
Great job by the Wikibon team.
Original title and link: Big Data Market Analysis: Vendors Revenue and Forecasts (©myNoSQL)
via: http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues
Wednesday, 25 January 2012
12 Hadoop Vendors to Watch in 2012
My list of the 8 most interesting companies for the future of Hadoop didn’t try to include every company with “Hadoop” in a product name. The list from InformationWeek does. To save you 15 clicks, here it is:
- Amazon Elastic MapReduce
- Cloudera
- Datameer
- EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
- Hadapt
- Hortonworks
- IBM (InfoSphere BigInsights)
- Informatica (for HParser)
- Karmasphere
- MapR
- Microsoft
- Oracle
Original title and link: 12 Hadoop Vendors to Watch in 2012 (©myNoSQL)
Thursday, 12 January 2012
Comparing Hadoop Appliances: Oracle’s Big Data Appliance, EMC Greenplum DCA, Netapp Hadooplers
Great post from Gwen Shapira over at Pythian, diving into the pros and cons of Hadoop appliances versus building your own Hadoop clusters, plus a comparison of existing Hadoop appliances: Oracle Big Data Appliance, EMC Greenplum DCA, and Netapp Hadooplers.
Another good reason to roll your own is the flexibility: Appliances are called that way because they have a very specific configuration. You get a certain number of nodes, cpus, RAM and storage. Oracle’s offering is an 18 node rack. What if you want 12 nodes? or 23? tough luck. What if you want less RAM and more CPU? you are still stuck.
Original title and link: Comparing Hadoop Appliances: Oracle’s Big Data Appliance, EMC Greenplum DCA, Netapp Hadooplers (©myNoSQL)
via: http://www.pythian.com/news/29955/comparing-hadoop-appliances/
Tuesday, 10 January 2012
Partnerships in the Hadoop Market
Just a quick recap:
- Cloudera: Oracle, Dell, NetApp
- Hortonworks: Microsoft
- MapR: EMC (integration with Greenplum HD)
Amazon doesn’t partner with anyone for Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.
Original title and link: Partnerships in the Hadoop Market (©myNoSQL)
Wednesday, 14 December 2011
EMC Greenplum Database and Hadoop Distribution Puts a Social Spin on Big Data
Huge technological contribution to the Hadoop ecosystem:
Greenplum, the analytics division of EMC, has announced new software that lets data analysts explore all their organization’s data and share interesting findings and data sets Facebook-style among their colleagues.
Original title and link: EMC Greenplum Database and Hadoop Distribution Puts a Social Spin on Big Data (©myNoSQL)
via: http://gigaom.com/cloud/emc-greenplum-puts-a-social-spin-on-big-data/
Wednesday, 30 November 2011
Explaining Hadoop to Your CEO
Dan Woods (Forbes):
The answer is, yes, Hadoop could be helpful, but there are other technologies as well. For example, technologies such as Splunk allow you to explore big data sets in a way that’s more interactive than most Hadoop implementations. Splunk not only lets you play with big data; you can also distill it and visualize it. Pervasive’s DataRush allows you to write parallel programs using a simplified programming model, and then process lots of data at scale. 1010data allows you to look at a spreadsheet that has a trillion rows, as well as handle time series data. EMC Greenplum and Teradata Aster Data and SAP HANA will also want a crack at your business. If you take any of these technologies and combine them with QlikView, Tableau, or TIBCO Spotfire, you can figure out what a big data set means to your business very quickly. So if your job is understanding the business value of the data, Hadoop is one of many things that you should analyze.
Translation:
Blah blah blah Big Data, blah blah blah list of vendors, blah blah blah Big Data
It might even work for a dummy CEO.
Original title and link: Explaining Hadoop to Your CEO (©myNoSQL)
via: http://www.forbes.com/sites/danwoods/2011/11/03/explaining-hadoop-to-your-ceo/
Wednesday, 5 October 2011
Hadoop: It’s Still a Niche Technology
In an otherwise generic but interesting post about Hadoop and its integration with data analytics and data warehouse solutions, Jessica Twentyman writes:
It’s still a niche technology, but Hadoop’s profile received a serious boost over that past year, thanks in part to start-up companies such as Cloudera and MapR that offer commercially licensed and supported distributions of Hadoop. Its growing popularity is also the result of serious interest shown by EDW vendors like EMC, IBM and Teradata. EMC bought Hadoop specialist Greenplum in June 2010; Teradata announced its acquisition of Aster Data in March 2011; and IBM announced its own Hadoop offering, Infosphere, in May 2011.
Unfortunately, she got this all wrong. It is the open source community, developers, data scientists, and Cloudera that helped popularize Hadoop.
These data analytics and data warehouse vendors are just capitalizing on Hadoop delivering results. They haven’t been knocking on doors asking: “Have you heard of Hadoop? Do you want to try it?”. They ran into Hadoop in most of the places they went, and that made them realize it was a business opportunity.
So, I’ll say it again: Hadoop is popular thanks to the open source community, developers, data scientists and Cloudera.
Original title and link: Hadoop: It’s Still a Niche Technology (©myNoSQL)
Tuesday, 4 October 2011
R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today
David Smith (Revolution Analytics):
Of course, this isn’t the first time that R has been embedded into a data warehousing appliance. IBM Netezza’s iClass device integrates with Revolution R, and AsterData, the Teradata Data Warehouse Appliance, and Greenplum all provide connections to R as well. Here at Revolution Analytics, we think that such enterprise-level integrations with R serve to grow the R ecosystem and serve as validation of R as a key platform for advanced analytics. As CEO Norman Nie said to GigaOm this weekend,
“Oracle’s announcement to embed R demonstrates validation for the leading statistics language and offers further evidence that R is a key weapon in advanced analytics today”
And let’s not forget the strategic partnership between Revolution Analytics and Cloudera to include RevoConnectR in CDH.
Original title and link: R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today (©myNoSQL)
via: http://www.r-bloggers.com/oracles-big-data-appliance-to-include-r/
Wednesday, 10 August 2011
BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases
- The ability to orchestrate execution of Hadoop related tasks (i.e., executing a Hive Query, Pig Script, or M/R job) as part of a broader IT workflow.
- The ability to setup dependencies, so if a step fails the job can branch down a recovery path or send a notification, or if it’s a success it goes on to subsequent dependent tasks. Likewise it supports initiating several tasks in parallel.
- New integration for Pig — so that developers have the ability to execute a Pig job from a PDI Job flow, integrate the execution of Pig jobs in broader IT workflows through PDI Jobs, take advantage of our out of the box scheduler, and so on.
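The dependency handling described above (branch to a recovery step or notification on failure, continue to dependent tasks on success, start several tasks in parallel) can be sketched in a few lines of Python. This is a hypothetical illustration of the control flow only: the `Step` class and `run` function are invented for the example and are not Pentaho Data Integration’s job API, and the step names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """Hypothetical job-flow entry; not Pentaho's actual API."""
    name: str
    action: Callable[[], bool]  # returns True on success, False on failure
    on_success: List["Step"] = field(default_factory=list)  # dependent tasks
    on_failure: Optional["Step"] = None  # recovery path or notification


def run(step: Step, log: List[str]) -> None:
    """Run a step, then branch on its outcome."""
    ok = step.action()
    log.append(f"{step.name}: {'ok' if ok else 'failed'}")
    if not ok:
        # Failure: take the recovery/notification branch, skip dependents.
        if step.on_failure is not None:
            run(step.on_failure, log)
        return
    # Success: start all dependent steps in parallel and wait for them.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run, child, log) for child in step.on_success]
        for future in futures:
            future.result()  # re-raise any exception from a child step


# Example flow: load data, then run a Hive query and a Pig script in parallel;
# the Pig script is made to fail so the notification branch runs.
notify = Step("send notification", lambda: True)
hive = Step("run Hive query", lambda: True)
pig = Step("run Pig script", lambda: False, on_failure=notify)
load = Step("load raw data", lambda: True, on_success=[hive, pig])

log: List[str] = []
run(load, log)
print(log)
```

The Hive and Pig entries may appear in either order in the log because they run concurrently; the notification always follows the failed Pig step.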
The list of tools Pentaho 4 integrates with is quite long:
- a long list of traditional RDBMS
- analytics databases (Greenplum, Vertica, Netezza, Teradata, etc.)
- NoSQL databases (MongoDB, HBase, etc.)
- Hadoop variants
- LexisNexis HPCC
This is the world of polyglot persistence and hybrid data storage.
Original title and link: BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases (©myNoSQL)
Friday, 10 June 2011
EMC BigData Acquisition Budget: $3 Billion
Bloomberg reports on EMC’s planned budget for acquisitions in the BigData market:
EMC Corp. may spend about $3 billion on acquisitions this year, keeping pace with last year’s tally, to add businesses that can help corporate customers analyze reams of data, Chief Operating Officer Pat Gelsinger said.
[…]
EMC says it spent $3.2 billion last year on acquisitions including Isilon Systems Inc. and Greenplum Inc. to gain products that let its customers store and analyze a vast and rapid onslaught of data from business applications and the Web. EMC may spend about that much again in 2011 as it races Oracle Corp. (ORCL), International Business Machines Corp., Hewlett-Packard Co. (HPQ) and SAP AG (SAP) to offer more robust data-analysis products.
EMC joins HP, which has also announced, directly[1] and indirectly, its plans for acquisitions in the BigData market.
On the other hand, can you imagine how much could be done for the community-driven NoSQL databases with only 1-2% of this budget?
[1] Earlier this year, HP acquired Vertica.
Original title and link: EMC BigData Acquisition Budget: $3 Billion (NoSQL databases © myNoSQL)