InfoBright: All content tagged as InfoBright in NoSQL databases and polyglot persistence
Tuesday, 4 October 2011
An Infobright Column Store Use Case
Alex Pinkin describes the difference that Infobright, a column store, made in solving his team's problems with implementing dashboards, reports, and alerts:
What is the secret sauce in Infobright? First, its column oriented storage model which leads to smaller disk I/O. Second, its “knowledge grid” which is aggregate data Infobright calculates during data loading. Data is stored in 65K Data Packs. Data Pack nodes in the knowledge grid contain a set of statistics about the data that is stored in each of the Data Packs. For instance, Infobright can pre-calculate min, max, and avg value for each column in the pack during the load, as well as keep track of distinct values for columns with low cardinality. Such metadata can really help when executing a query since it’s possible to ignore data packs which have no data matching filter criteria. If a data pack can be ignored, there is no penalty associated with decompressing the data pack.
Compared to our MySQL implementation, Infobright eliminated the need to create and manage indexes, as well as to partition tables.
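The pack-skipping idea in the quote can be illustrated with a small sketch. This is not Infobright's implementation: the `PackNode` class, the `build_packs` and `count_greater_than` functions, and the choice of statistics are hypothetical names picked for illustration, and the pack size of 65,536 rows is an assumption based on the "65K" figure quoted above.

```python
from dataclasses import dataclass
from typing import List, Tuple

PACK_SIZE = 65536  # assumed rows per data pack ("65K" in the quote)


@dataclass
class PackNode:
    """Per-pack statistics computed once, at load time (hypothetical layout)."""
    lo: int
    hi: int
    total: int
    count: int


def build_packs(values: List[int], pack_size: int = PACK_SIZE):
    """Split one column into packs and compute a statistics node for each pack."""
    packs, nodes = [], []
    for start in range(0, len(values), pack_size):
        chunk = values[start:start + pack_size]
        packs.append(chunk)
        nodes.append(PackNode(min(chunk), max(chunk), sum(chunk), len(chunk)))
    return packs, nodes


def count_greater_than(packs, nodes, threshold: int) -> Tuple[int, int]:
    """Count rows with value > threshold; also report how many packs were opened."""
    matched = 0
    opened = 0
    for chunk, node in zip(packs, nodes):
        if node.hi <= threshold:
            continue                  # no row can match: skip the pack entirely
        if node.lo > threshold:
            matched += node.count     # every row matches: answer from statistics
            continue
        opened += 1                   # only now would the pack be decompressed
        matched += sum(1 for v in chunk if v > threshold)
    return matched, opened


if __name__ == "__main__":
    packs, nodes = build_packs(list(range(100)), pack_size=10)
    print(count_greater_than(packs, nodes, 54))
```

In this toy run only one of the ten packs has a min/max range that straddles the threshold, so only that pack has to be read. The others are either skipped or answered from their statistics alone.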
Original title and link: An Infobright Column Store Use Case (©myNoSQL)
Friday, 24 June 2011
2 Ways to Tackle Really Big Data
So there you have the two approaches to handling machine-generated-data. If you have vast archives, EMC, IBM Netezza, and Teradata all have purpose-built appliances that scale into the petabytes. You also could use Hadoop, which promises much lower cost, but you’ll have to develop separate processes and applications for that environment. You’ll also have to establish or outsource expertise on Hadoop deployment, management, and data processing. For fast-query needs, EMC, IBM Netezza, and Teradata all have fast, standard appliances and faster, high-performance appliances (and companies including Kognitio and Oracle have similar configuration choices). Column-oriented database and appliance vendors including HP Vertica, InfoBright, ParAccel, and Sybase have speed advantages inherent in their database architectures.
I’m wondering why Hadoop is mentioned just in passing considering how many large datasets it is already handling.
Original title and link: 2 Ways to Tackle Really Big Data (NoSQL database©myNoSQL)
Monday, 20 June 2011
Columnar DBMS Vendor Customer Metrics
Curt Monash published very interesting customer base numbers for Sybase IQ, Vertica, SAND Technology, and Infobright: most are in the hundreds, except for Sybase IQ.
This got me wondering what numbers NoSQL companies would have. Is any of them sharing such numbers? I’d speculate that most of them are in the tens, with 10gen (MongoDB) leading the space with probably a couple of hundred at best.
Original title and link: Columnar DBMS Vendor Customer Metrics (NoSQL database©myNoSQL)
Thursday, 16 June 2011
Infobright Rough Query: Approximating Query Results
Very interesting idea in the latest Infobright release:
The most interesting of the group might be Rough Query, which speeds the process of finding the needle in a multi-terabyte haystack by quickly pointing users to a relevant range of data, at which point they can drill down with more-complex queries. So, in theory, a query that might have taken 20 minutes before might now take just a few minutes because Rough Query works in seconds by using only the in-memory data and the subsequent search is against a much smaller data set.
Curt Monash provides more context about Rough Queries in his post:
To understand Infobright Rough Query, recall the essence of Infobright’s architecture:
Infobright’s core technical idea is to chop columns of data into 64K chunks, called data packs, and then store concise information about what’s in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you’re familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values.
I.e., a concise, imprecise representation of the database is always kept in RAM, in something Infobright calls the “Knowledge Grid.” Rough Query estimates query results based solely on the information in the Knowledge Grid — i.e., Rough Query always executes against information that’s already in RAM.
Rough Query is not meant for BI or reporting, but rather for the initial investigations that data scientists perform against Big Data.
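To make the "estimate from metadata only" idea concrete, here is a minimal sketch that bounds the result of a count query using nothing but per-pack minimum, maximum, and row count. It is an illustration under assumptions, not Infobright's Rough Query: the `PackStats` tuple layout and the `rough_count_greater_than` function are hypothetical, and the bounds are deliberately conservative.

```python
from typing import List, Tuple

# Hypothetical in-memory summary of one data pack: (min value, max value, row count)
PackStats = Tuple[int, int, int]


def rough_count_greater_than(stats: List[PackStats], threshold: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds on COUNT(*) WHERE value > threshold.

    Only the per-pack summaries are consulted; no pack is ever read.
    """
    lower = 0
    upper = 0
    for lo, hi, rows in stats:
        if hi <= threshold:
            continue          # pack cannot contain a match
        if lo > threshold:
            lower += rows     # every row in the pack matches
            upper += rows
        else:
            upper += rows     # unknown without opening the pack: anywhere up to all rows
    return lower, upper


if __name__ == "__main__":
    summaries = [(0, 9, 10), (50, 59, 10), (60, 69, 10)]
    print(rough_count_greater_than(summaries, 54))
```

The gap between the two bounds comes entirely from packs whose range straddles the threshold. A follow-up exact query would only need to open those packs.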
Original title and link: Infobright Rough Query: Approximating Query Results (NoSQL database©myNoSQL)
via: http://gigaom.com/cloud/infobright-wants-to-make-big-data-faster-way-faster/
Tuesday, 17 May 2011
Druid: Distributed In-Memory OLAP Data Store
Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase).
Stepping back from our two failures, let’s examine why these systems failed to scale for our needs:
Relational Database Architectures
- Full table scans were slow, regardless of the storage engine used
- Maintaining proper dimension tables, indexes and aggregate tables was painful
- Parallelization of queries was not always supported or non-trivial
Massive NOSQL With Pre-Computation
- Supporting high dimensional OLAP requires pre-computing an exponentially large amount of data
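The "exponentially large" pre-computation can be made concrete with a short sketch. It assumes that pre-computing means materializing one aggregate table per subset of dimensions (a full cube), which the quoted post does not spell out. The dimension names and the `all_groupings` helper are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def all_groupings(dimensions: Sequence[str]) -> List[Tuple[str, ...]]:
    """Enumerate every subset of dimensions a full cube would have to aggregate by."""
    groupings: List[Tuple[str, ...]] = []
    for k in range(len(dimensions) + 1):
        groupings.extend(combinations(dimensions, k))
    return groupings


if __name__ == "__main__":
    dims = ["publisher", "advertiser", "country"]  # hypothetical dimensions
    print(len(all_groupings(dims)))  # 2 ** 3 subsets for 3 dimensions

    # The number of groupings doubles with every added dimension.
    for d in (10, 20, 30):
        print(d, "dimensions ->", 2 ** d, "groupings")
```

This counts only the groupings. Each grouping also holds one row per combination of dimension values that occurs in the data, so the stored volume grows faster still.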
Many of the questions you have in mind have already been asked in this comment thread, but few have been answered so far.
Original title and link: Druid: Distributed In-Memory OLAP Data Store (NoSQL databases © myNoSQL)
via: http://metamarketsgroup.com/blog/druid-part-i-real-time-analytics-at-a-billion-rows-per-second/
Wednesday, 16 March 2011
Oracle and MySQL Future
Curt Monash:
We’ll know they’re even more serious if they buy MySQL enhancements such as Infobright, dbShards, or Schooner MySQL
Why?
Original title and link: Oracle and MySQL Future (NoSQL databases © myNoSQL)