data mining: All content tagged as data mining in NoSQL databases and polyglot persistence

Pretty cool introduction to data mining

Dr. Saed Sayad’s Introduction to data mining site is not what you’d expect. Its unusual way of introducing the different concepts and the relationships between them kept me busy for a while.

Original title and link: Pretty cool introduction to data mining (NoSQL databases © myNoSQL)


Big Data and Data Mining: Garbage In, Garbage Out

From an article touting the value of data mining in the era of Big Data:

Whatever data mining tool you use for creating sales leads, remember that the tool is only as good as the data in it.

The old principle of garbage in, garbage out applies to both small and big data. But with the advent of Big Data, the tools have not yet caught up with the needs.
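
To make the principle concrete, here is a minimal Python sketch of screening sales leads before they reach any mining tool. The field names, the sample records, and the very rough email check are my own assumptions for illustration, not something taken from the article.

```python
# Hypothetical lead records; the field names are assumptions for illustration.
REQUIRED_FIELDS = ("name", "email", "company")


def is_clean(lead):
    """Return True when every required field is a non-blank string and the
    email has a plausible shape (exactly one '@', a dot in the domain)."""
    for field in REQUIRED_FIELDS:
        value = lead.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    local, sep, domain = lead["email"].strip().partition("@")
    return bool(local) and sep == "@" and "." in domain and "@" not in domain


def split_leads(leads):
    """Separate usable leads from the garbage instead of mining everything."""
    clean = [lead for lead in leads if is_clean(lead)]
    rejected = [lead for lead in leads if not is_clean(lead)]
    return clean, rejected


leads = [
    {"name": "Ada", "email": "ada@example.com", "company": "Analytical"},
    {"name": "", "email": "nobody@example.com", "company": "Unknown"},
    {"name": "Bob", "email": "bob-at-example.com", "company": "Acme"},
]
clean, rejected = split_leads(leads)
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejected records around, rather than silently dropping them, makes it possible to see how much of the input was garbage in the first place.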

Original title and link: Big Data and Data Mining: Garbage In, Garbage Out (NoSQL databases © myNoSQL)

via: http://eba.benefitnews.com/news/davidson--futureofficenetwork-salesrockstar-sales-data-mining-2713512-1.html


Pig and Cheminformatics

Rajarshi Guha about Pig Latin:

While the implementation of such code [SMARTS matching and pharmacophore searching] is pretty straightforward, it’s still pretty heavyweight compared to say, performing SMARTS matching in a database via SQL. On the other hand, being able to perform these tasks in Pig Latin, lets us write much simpler code that can be integrated with other non-cheminformatics code in a flexible manner.

Extensibility over compactness.

Original title and link: Pig and Cheminformatics (NoSQL databases © myNoSQL)

via: http://blog.rguha.net/?p=748


Big Money for Companies That Can Analyze Big Data

[…] three skills necessary for data-driven start-ups: data munging, the corralling and wrestling of data; modeling, the statistical analysis of data through algorithms; and visualization, the presentation of all the data. While all three are necessary for success, Driscoll believes that modeling and analysis through algorithms is what will determine winners and losers in Big Data.

Most of us know these under the names data mining and business intelligence.
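
For a feel of how the three skills chain together, here is a toy Python sketch using only the standard library. The sample rows, the choice of a least-squares line as the "model", and the text bar chart are all assumptions picked for brevity. Real pipelines are far messier.

```python
from statistics import mean


def munge(raw_rows):
    """Data munging: keep only rows that parse as an (x, y) pair of numbers."""
    cleaned = []
    for row in raw_rows:
        try:
            cleaned.append((float(row[0]), float(row[1])))
        except (TypeError, ValueError, IndexError):
            continue  # skip rows with missing or non-numeric values
    return cleaned


def fit_line(points):
    """Modeling: ordinary least-squares fit of y = slope * x + intercept.
    Assumes at least two distinct x values."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def bar_chart(points, width=40):
    """Visualization: one text bar per point, scaled to the largest y.
    Assumes positive y values."""
    top = max(y for _, y in points)
    return "\n".join(
        f"{x:>5.1f} | " + "#" * round(width * y / top) for x, y in points
    )


raw = [("1", "2.0"), ("2", "4.1"), ("three", "6"), ("4", "7.9"), ("5", None)]
points = munge(raw)
slope, intercept = fit_line(points)
print(bar_chart(points))
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Even in a toy like this, the munging step decides what the model gets to see: two of the five raw rows never make it to the fit.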

“The secret sauce is predictive analysis powered by data,” said Driscoll. “It’s less about what you did and more about what you should do, and not even telling you what you should do … it should just do it for you.”

Sure thing. Everyone wants to predict the stock market, the next football score, and so on.

What you actually need big data and data mining for is:

  • tracing, identifying, and understanding/explaining past events
  • modeling and validating future strategies

Original title and link: Big Money for Companies That Can Analyze Big Data (NoSQL databases © myNoSQL)

via: http://cloud.gigaom.com/2010/10/13/big-opportunities-await-companies-that-can-analyze-big-data/


Have you Heard of Kdb+?

Until two days ago, I didn’t know anything about Kdb+, a 16-year-old solution:

[…] a fast database for analyzing massive volumes of data.

Kdb+ is a unified database capturing and analyzing streaming and historical data.

I’ll have to read the papers to better understand what Kdb+ is.

But if you know something, don’t be shy and share it with us!

Update: @lsbardel sent me this link: A first look at kdb+, an article with interesting details about kdb+:

kdb+ has embedded a Kx proprietary language called q [… which] is a proprietary array processing language developed by Arthur Whitney. The language serves as the query language for kdb+. q evolved from APL as explained by its author in an interview.

The backbone of the q language is formed by atoms, lists, dictionaries and tables.

Like any serious proprietary software, kdb+ provides native interfaces in C/C++, Java, C# and Python.
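
Not knowing q myself, I find it easier to picture the four building blocks through a rough Python analogy. To be clear, nothing below is q syntax or a kdb+ API. The names `trades`, `rows`, and `select` are made up, and the sketch only assumes that a table can be seen as a dictionary of equally long column lists.

```python
# Python analogy only; none of this is q syntax or a kdb+ API.
price = 101.5                              # atom: a single scalar value
prices = [101.5, 99.0, 102.25]             # list: an ordered sequence of atoms
quote = {"sym": "abc", "price": 101.5}     # dictionary: keys mapped to values

# table: a dictionary of column names to equally long lists
trades = {
    "sym": ["abc", "xyz", "abc"],
    "price": [101.5, 99.0, 102.25],
    "size": [100, 250, 75],
}


def rows(table):
    """Flip a column-oriented table into a list of row dictionaries."""
    names = list(table)
    return [dict(zip(names, values)) for values in zip(*table.values())]


def select(table, column, predicate):
    """Keep the rows whose value in `column` satisfies `predicate`,
    returning a table that is still column-oriented."""
    keep = [i for i, value in enumerate(table[column]) if predicate(value)]
    return {name: [col[i] for i in keep] for name, col in table.items()}


abc = select(trades, "sym", lambda sym: sym == "abc")
print(rows(abc))
```

Storing each column as its own list is what makes whole-column operations cheap, which is presumably why an array language fits this kind of analytics work.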

via: http://kx.com/