Greenplum: All content tagged as Greenplum in NoSQL databases and polyglot persistence
Tuesday, 17 May 2011
Druid: Distributed In-Memory OLAP Data Store
Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase).
Stepping back from our two failures, let’s examine why these systems failed to scale for our needs:
Relational Database Architectures
- Full table scans were slow, regardless of the storage engine used
- Maintaining proper dimension tables, indexes and aggregate tables was painful
- Parallelization of queries was not always supported, or was non-trivial
Massive NoSQL With Pre-Computation
- Supporting high dimensional OLAP requires pre-computing an exponentially large amount of data
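To make the "exponentially large" point concrete, here is a minimal sketch (not taken from the Druid post) of how the number of pre-computed aggregates grows with the number of dimensions. The function names and the dimension names are hypothetical, chosen only for illustration; the assumption is that a full pre-computation materializes one aggregate table per subset of dimensions.

```python
from itertools import combinations
from math import prod


def count_rollups(num_dimensions: int) -> int:
    """Number of distinct dimension subsets (one GROUP BY each): 2^n."""
    return 2 ** num_dimensions


def enumerate_rollups(dimensions):
    """Yield every subset of dimensions a full pre-computation would aggregate."""
    for size in range(len(dimensions) + 1):
        yield from combinations(dimensions, size)


def max_precomputed_cells(cardinalities):
    """Upper bound on aggregate rows across all rollups.

    Each dimension contributes either one of its values or 'all',
    hence (cardinality + 1) choices per dimension.
    """
    return prod(c + 1 for c in cardinalities)


if __name__ == "__main__":
    dims = ["publisher", "country", "device"]  # hypothetical dimension names
    for rollup in enumerate_rollups(dims):
        print(rollup)
    print(count_rollups(len(dims)))   # 8 subsets for 3 dimensions
    print(count_rollups(20))          # 1048576 subsets for 20 dimensions
```

Adding a single dimension doubles the number of rollups, which is why pre-computation stops being practical well before the dimension count gets large.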
Many of the questions you have in mind have already been asked in this comment thread, though few have been answered so far.
Original title and link: Druid: Distributed In-Memory OLAP Data Store (NoSQL databases © myNoSQL)
via: http://metamarketsgroup.com/blog/druid-part-i-real-time-analytics-at-a-billion-rows-per-second/
Wednesday, 6 April 2011
The Data Processing Platform for Tomorrow
In the blue corner we have IBM with Netezza as analytic database, Cognos for BI, and SPSS for predictive analytics. In the green corner we have EMC with Greenplum and the partnership with SAS[1]. And in the open source corner we have Hadoop and R.
Update: there’s also another corner I don’t know how to color where Teradata and its recently acquired Aster Data partner with SAS.
Who is ready to bet on which of these platforms will be processing more data in the coming years?
Tuesday, 22 March 2011
Types of Big Data Work
Mike Minelli: Working with big data can be classified into three basic categories […] One is information management, a second is business intelligence, and the third is advanced analytics
Information management captures and stores the information, BI analyzes data to see what has happened in the past, and advanced analytics is predictive, looking at what the data indicates for the future.
There’s also a list of tools for big data: Aster Data (acquired by Teradata), Datameer, ParAccel, IBM Netezza, Oracle Exadata, EMC Greenplum.
Friday, 18 March 2011
Cloudera: A Business Intelligence Leader
The Informatica accord is Cloudera’s second partnership this year with a leading DI player. Back in August, Cloudera cemented a deal with open source software (OSS) data integration (DI) specialist Talend. It also has partnerships with Teradata Corp., the former Netezza Inc., the former Greenplum Software Corp., Aster Data Systems Inc., Vertica Inc., and Pentaho.
One thing’s for sure: Cloudera is certainly attracting attention.
The strategy is surprisingly simple: make it easy to put data in and get it out.
via: http://tdwi.org/articles/2011/02/16/cloudera-leader-bi-hadoop.aspx
Tuesday, 1 February 2011
New Tools in the NoSQL and Big Data Market
DataStax OpsCenter for Apache Cassandra
DataStax (ex-Riptano) announced yesterday OpsCenter, their tool for managing, monitoring, and operating enterprise Cassandra applications, including sophisticated visualizations of the cluster and comprehensive management and configuration.
DataStax OpsCenter for Apache Cassandra will require a subscription, but a developer version, not to be used in production, will be made available too.
Call me an idealist, but I would have suggested a different model than Gold/Silver/Bronze or Mission-Critical/Premier:
- 1-5 nodes: free (nb: good karma)
- 6-low tens of nodes: moderately priced package
- premier: everything else
EMC Greenplum Community Edition
After acquiring Greenplum[1], EMC is making available a community edition:
[…] the new EMC Greenplum Community Edition removes the cost barrier to entry for big data power tools empowering large numbers of developers, data scientists, and other data professionals. This free set of tools enables the community to not only better understand their data, gain deeper insights and better visualize insights, but to also contribute and participate in the development of next-generation tools and solutions. With the Community Edition stack, developers can build complex applications to collect, analyze and operationalize big data leveraging best of breed big data tools including the Greenplum Database with its in-database analytic processing capabilities.
I couldn’t find the details of the community edition license, but instead I’ve found this:
The software is only intended for research, development and experiments, with license purchases required for commercial uses.
You can read more about the (marketing) rationale behind this release on the blog of Chuck Hollis, Global Marketing CTO.
Wednesday, 13 October 2010
Hadoop Spreading through Cloudera Partnerships
Cloudera, in its attempt to Hadoopize the world, goes on a partnership spree:
Many of you may have read about some of the recent announcements of partnerships between Cloudera and some of the leading data management software companies like Teradata, Netezza, Greenplum (EMC), Quest and Aster Data. We established these partnerships because Hadoop is increasingly serving as an open platform that many different applications and complementary technologies work with. Our goal is to make this as easy and as standardized as possible.
Checking the press release section turns up the following partnerships:
- Membase
- Talend
- Quest
- Pentaho
- NTT Data
- Aster Data
- EMC Greenplum
- Teradata
- Netezza
Quite a few companies from the non-relational market.
via: http://www.cloudera.com/blog/2010/10/cdh3-beta-3-now-available/