Infobright: All content tagged as Infobright in NoSQL databases and polyglot persistence

An Infobright Column Store Use Case

Alex Pinkin describes the difference that Infobright, a column store, made in solving the problems they had implementing dashboards, reports, and alerts:

What is the secret sauce in Infobright? First, its column oriented storage model which leads to smaller disk I/O. Second, its “knowledge grid” which is aggregate data Infobright calculates during data loading. Data is stored in 65K Data Packs. Data Pack nodes in the knowledge grid contain a set of statistics about the data that is stored in each of the Data Packs. For instance, Infobright can pre-calculate min, max, and avg value for each column in the pack during the load, as well as keep track of distinct values for columns with low cardinality. Such metadata can really help when executing a query since it’s possible to ignore data packs which have no data matching filter criteria. If a data pack can be ignored, there is no penalty associated with decompressing the data pack.

Compared to our MySQL implementation, Infobright eliminated the need to create and manage indexes, as well as to partition tables.
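The pack-skipping idea in the quote can be illustrated with a small sketch. This is not Infobright's code or API: the `PackNode` class, the `load` and `count_greater_than` functions, and the tiny pack size are all hypothetical, chosen only to show how per-pack min/max statistics let a query avoid opening packs that cannot match or that match entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple

PACK_SIZE = 4  # tiny on purpose; the quote describes packs of roughly 65K values


@dataclass
class PackNode:
    """Hypothetical per-pack statistics computed once, at load time."""
    lo: int
    hi: int
    rows: int
    values: List[int]  # stands in for the compressed pack stored on disk


def load(column: List[int], pack_size: int = PACK_SIZE) -> List[PackNode]:
    """Split a column into packs and record min, max, and row count for each."""
    nodes = []
    for start in range(0, len(column), pack_size):
        chunk = column[start:start + pack_size]
        nodes.append(PackNode(min(chunk), max(chunk), len(chunk), chunk))
    return nodes


def count_greater_than(nodes: List[PackNode], threshold: int) -> Tuple[int, int]:
    """Count values > threshold; also report how many packs had to be opened."""
    count = 0
    opened = 0
    for node in nodes:
        if node.hi <= threshold:
            continue  # no value in this pack can match: skip it entirely
        if node.lo > threshold:
            count += node.rows  # every value matches: answer from statistics alone
            continue
        opened += 1  # only now would the pack be decompressed and scanned
        count += sum(1 for v in node.values if v > threshold)
    return count, opened


column = [1, 2, 3, 4, 10, 11, 12, 13, 3, 9, 4, 8]
print(count_greater_than(load(column), 8))
```

In this toy run only one of the three packs is scanned; the other two are resolved from their statistics, which is the saving the quote attributes to the knowledge grid.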

Original title and link: An Infobright Column Store Use Case (NoSQL database©myNoSQL)


2 Ways to Tackle Really Big Data

So there you have the two approaches to handling machine-generated data. If you have vast archives, EMC, IBM Netezza, and Teradata all have purpose-built appliances that scale into the petabytes. You also could use Hadoop, which promises much lower cost, but you’ll have to develop separate processes and applications for that environment. You’ll also have to establish or outsource expertise on Hadoop deployment, management, and data processing. For fast-query needs, EMC, IBM Netezza, and Teradata all have fast, standard appliances and faster, high-performance appliances (and companies including Kognitio and Oracle have similar configuration choices). Column-oriented database and appliance vendors including HP Vertica, InfoBright, ParAccel, and Sybase have speed advantages inherent in their database architectures.

I’m wondering why Hadoop is mentioned just in passing considering how many large datasets it is already handling.



Columnar DBMS Vendor Customer Metrics

Curt Monash has published very interesting customer base numbers for Sybase IQ, Vertica, SAND Technology, and Infobright: most are in the hundreds, except for Sybase IQ.

This got me thinking about what numbers NoSQL companies would have. Are any of them sharing such numbers? I’d speculate that most of them are in the tens, with 10gen (MongoDB) leading the space with probably a couple of hundred at best.


Infobright Rough Query: Approximating Query Results

Very interesting idea in the latest Infobright release:

The most interesting of the group might be Rough Query, which speeds the process of finding the needle in a multi-terabyte haystack by quickly pointing users to a relevant range of data, at which point they can drill down with more-complex queries. So, in theory, a query that might have taken 20 minutes before might now take just a few minutes because Rough Query works in seconds by using only the in-memory data and the subsequent search is against a much smaller data set.

Curt Monash provides more context about Rough Queries in his post:

To understand Infobright Rough Query, recall the essence of Infobright’s architecture:

Infobright’s core technical idea is to chop columns of data into 64K chunks, called data packs, and then store concise information about what’s in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you’re familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values.

I.e., a concise, imprecise representation of the database is always kept in RAM, in something Infobright calls the “Knowledge Grid.” Rough Query estimates query results based solely on the information in the Knowledge Grid — i.e., Rough Query always executes against information that’s already in RAM.
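To make the idea of answering from the Knowledge Grid alone more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not Infobright's implementation: `PackStats` and `rough_count_greater_than` are hypothetical names, and the "rough" answer is modeled simply as a lower and an upper bound on a row count, derived from per-pack min/max metadata without reading any pack.

```python
from typing import List, NamedTuple, Tuple


class PackStats(NamedTuple):
    """Hypothetical in-memory metadata for one data pack."""
    lo: int    # minimum value in the pack
    hi: int    # maximum value in the pack
    rows: int  # number of values in the pack


def rough_count_greater_than(grid: List[PackStats], threshold: int) -> Tuple[int, int]:
    """Bound the number of values > threshold using metadata only."""
    certain = 0   # rows that definitely match
    possible = 0  # rows that might match
    for pack in grid:
        if pack.hi <= threshold:
            continue  # nothing in this pack can match
        if pack.lo > threshold:
            certain += pack.rows  # everything in this pack matches
            possible += pack.rows
        else:
            # Mixed pack: the exact count is unknown without reading the data,
            # so it only widens the upper bound (a deliberately loose bound).
            possible += pack.rows
    return certain, possible


grid = [PackStats(1, 4, 4), PackStats(10, 13, 4), PackStats(3, 9, 4)]
print(rough_count_greater_than(grid, 8))
```

The gap between the two bounds comes entirely from the mixed packs, which also identifies the packs a follow-up exact query would need to read.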

Rough Query is not meant for BI or reporting, but rather for the initial investigations that data scientists perform against big data.



Druid: Distributed In-Memory OLAP Data Store

Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase).

Stepping back from our two failures, let’s examine why these systems failed to scale for our needs:

  1. Relational Database Architectures

    • Full table scans were slow, regardless of the storage engine used
    • Maintaining proper dimension tables, indexes and aggregate tables was painful
    • Parallelization of queries was not always supported or non-trivial
  2. Massive NOSQL With Pre-Computation

    • Supporting high dimensional OLAP requires pre-computing an exponentially large amount of data
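One common way to read the pre-computation point is in terms of a full data cube: if every combination of dimensions may be used for grouping, a table with d dimensions has 2^d possible groupings to materialize. The quoted post does not spell out its exact scheme, so the sketch below is only an assumption-based illustration; the `groupings` function and the dimension names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def groupings(dimensions: List[str]) -> List[Tuple[str, ...]]:
    """Enumerate every subset of dimensions a full cube would pre-aggregate."""
    result: List[Tuple[str, ...]] = []
    for size in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, size))
    return result


dims = ["country", "browser", "device"]  # hypothetical dimension names
for grouping in groupings(dims):
    print(grouping)

# The number of groupings doubles with every added dimension.
for d in (3, 10, 20):
    print(d, len(groupings([f"dim{i}" for i in range(d)])))
```

This counts only the groupings; each grouping additionally holds one aggregate row per distinct combination of values, so the stored data grows faster still.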

Many of the questions you have in mind have already been asked in this comment thread, but few have been answered so far.



Oracle and MySQL Future

Curt Monash:

We’ll know they’re even more serious if they buy MySQL enhancements such as Infobright, dbShards, or Schooner MySQL.

