data mining: All content tagged as data mining in NoSQL databases and polyglot persistence
Wednesday, 1 June 2011
Big Data and Data Mining: Garbage In, Garbage Out
From an article touting the value of data mining in the era of BigData:
Whatever data mining tool you use for creating sales leads, remember that the tool is only as good as the data in it.
The old principle of garbage in, garbage out applies to both small and big data. But with the advent of BigData, the tools have not yet caught up with the needs.
Original title and link: Big Data and Data Mining: Garbage In, Garbage Out (NoSQL databases © myNoSQL)
Monday, 10 January 2011
Pig and Cheminformatics
Rajarshi Guha about Pig Latin:
While the implementation of such code [SMARTS matching and pharmacophore searching] is pretty straightforward, it’s still pretty heavyweight compared to say, performing SMARTS matching in a database via SQL. On the other hand, being able to perform these tasks in Pig Latin, lets us write much simpler code that can be integrated with other non-cheminformatics code in a flexible manner.
Extensibility over compactness.
Thursday, 14 October 2010
Big Money for Companies That Can Analyze Big Data
three skills necessary for data-driven start-ups: data munging, the corralling and wrestling of data; modeling, the statistical analysis of data through algorithms; and visualization, the presentation of all the data. While all three are necessary for success, Driscoll believes that modeling and analysis through algorithms is what will determine winners and losers in Big Data.
Most of us know these under the names data mining and business intelligence.
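To make the three skills a bit more concrete, here is a minimal Python sketch of a munge, model, visualize pipeline. Everything in it is hypothetical: the sample rows, the function names, and the choice of a least-squares line as the "model" are assumptions made for illustration, not anything described in the article.

```python
from statistics import mean

# Hypothetical raw records: messy strings standing in for real-world input.
RAW = ["2010-01, 10", "2010-02, 12", "bad row", "2010-03, 14 ", "2010-04,16"]


def munge(rows):
    """Munging: keep only rows that parse as 'label, non-negative integer'."""
    clean = []
    for row in rows:
        parts = [p.strip() for p in row.split(",")]
        if len(parts) == 2 and parts[1].isdigit():
            clean.append((parts[0], int(parts[1])))
    return clean


def fit_line(values):
    """Modeling: ordinary least-squares line through (index, value) points."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    slope = num / den
    return slope, y_bar - slope * x_bar


def render(records):
    """Visualization: a plain-text bar chart, one line per record."""
    return [f"{label} {'#' * value}" for label, value in records]


records = munge(RAW)
slope, intercept = fit_line([value for _, value in records])
# Extrapolate one step past the last observed index.
forecast = slope * len(records) + intercept
```

The forecast step is the part Driscoll's quote singles out: the munging and the chart only describe what happened, while the fitted line is what lets you say something about the next period.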
“The secret sauce is predictive analysis powered by data,” said Driscoll. “It’s less about what you did and more about what you should do, and not even telling you what you should do … it should just do it for you.”
Sure thing. Everyone wants to predict how the stock market will evolve, the score of the next football game, etc.
What you actually need big data and data mining for is:
- tracing, identifying, and understanding/explaining past events
- modeling and validating future strategies
via: http://cloud.gigaom.com/2010/10/13/big-opportunities-await-companies-that-can-analyze-big-data/
Tuesday, 8 December 2009
Have you Heard of Kdb+?
Until two days ago, I didn’t know anything about Kdb+, a 16-year-old solution:
[…] a fast database for analyzing massive volumes of data.
Kdb+ is a unified database capturing and analyzing streaming and historical data.
I’ll have to read the papers to better understand what Kdb+ is.
But if you know something, don’t be shy and share it with us!
Update: @lsbardel pointed me to A first look at kdb+, an article with some interesting details about kdb+:
kdb+ has embedded a Kx proprietary language called q [… which] is a proprietary array processing language developed by Arthur Whitney. The language serves as the query language for kdb+. q evolved from APL as explained by its author in an interview.
The backbone of the q language is formed by atoms, lists, dictionaries and tables.
Like any serious proprietary software, kdb+ provides native interfaces in C/C++, Java, C# and Python.
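For readers who have never seen q, the four building blocks named in the quote can be loosely pictured with plain Python values. This is only an analogy: it is not q syntax, it uses no kdb+ API, and modeling a table as a dictionary of equal-length columns is an assumption of this sketch. The `select_where` helper and the sample data are hypothetical.

```python
# Python analogy only; this is not q syntax and does not call kdb+.
atom = 42                               # atom: a single scalar value
prices = [10.5, 11.0, 9.75]             # list: an ordered collection of atoms
quote = {"sym": "ABC", "price": 10.5}   # dictionary: keys mapped to values

# table: modeled here as a dictionary of equal-length columns (an assumption)
trades = {"sym": ["ABC", "XYZ", "ABC"], "price": prices}


def select_where(table, column, value):
    """Return a new column-oriented table with the rows where column == value."""
    keep = [i for i, v in enumerate(table[column]) if v == value]
    return {name: [col[i] for i in keep] for name, col in table.items()}


abc = select_where(trades, "sym", "ABC")
```

Filtering works on row positions and then rebuilds each column, which is the kind of whole-column operation an array language is designed around.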