Machine learning: All content tagged as Machine learning in NoSQL databases and polyglot persistence
Tuesday, 16 April 2013
List of Machine Learning APIs
Below is a compilation of APIs that have benefited from Machine Learning in one way or another. We truly are living in the future, so strap into your rocketship and prepare for blastoff.
Original title and link: List of Machine Learning APIs (©myNoSQL)
via: http://blog.mashape.com/post/48074869493/list-of-machine-learning-apis
Thursday, 11 April 2013
Machine Learning Cheatsheets
Created by Andreas Mueller:
Then you can head to this Quora thread to read a bit more about the pros and cons of the different classification algorithms.
Wednesday, 25 April 2012
Machine Learning: Interesting Problems Are Never Off the Shelf
Aria Haghighi about the present and future of products based on machine learning:
But I think there’s an even bigger barrier beyond ingenious model design and engineering skills. In the case of machine translation and speech recognition, the problem being solved is straightforward to understand and well-specified. Many of the NLP technologies that I think will revolutionize consumer products over the next decade are much vaguer. How, exactly, can we take the excellent research in structured topic models, discourse processing, or sentiment analysis and make a mass-appeal consumer product?
via: http://radar.oreilly.com/2012/04/great-machine-learning-products.html
Thursday, 23 February 2012
Skytree Launches a Machine Learning Server
Skytree Server connects to any number of existing data stores, including Hadoop, and, says Hack, is tens of thousands of times faster than existing tools, performing in minutes tasks that would have taken hours or days. As of now, it’s tuned to five specific use cases the company says are the most common — recommendation systems, anomaly/outlier identification, predictive analytics, clustering and market segmentation, and similarity search.

There’s a limited but free Skytree version available on demand, so I expect to read some more about it soon.
via: http://gigaom.com/cloud/skytree-intros-machine-learning-for-the-masses/
Monday, 20 February 2012
Characteristics of Machine Learning Models
Ricky Ho published yet another great article giving a high-level summary of the algorithms used by different machine learning models:
- decision trees
- linear regression methods
- neural networks
- bayesian networks
- support vector machines
- nearest neighbor
For classification and regression problems, there are different choices of Machine Learning Models, each of which can be viewed as a black box that solves the same problem. However, each model comes from a different algorithmic approach and will perform differently on different data sets. The best way is to use cross-validation to determine which model performs best on test data.
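To make the cross-validation advice concrete, here is a minimal sketch in plain Python. It is not taken from Ricky Ho's article: the two stand-in models (a majority-class baseline and 1-nearest-neighbor), the synthetic two-blob data set, and every function name are assumptions chosen only for illustration. The idea is to treat each model as a black box, score it on held-out folds, and keep the one with the higher mean accuracy.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_class(train_X, train_y):
    """Baseline model (illustrative): always predict the most common label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor(train_X, train_y):
    """1-nearest-neighbor model using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]
    return predict

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy of the black-box `fit` over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def make_toy_data(n_per_class=30, seed=1):
    """Two well-separated 2-D Gaussian blobs (synthetic, illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, (0.0, 0.0)), (1, (5.0, 5.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            y.append(label)
    return X, y

X, y = make_toy_data()
results = {
    "majority_class": cross_val_accuracy(majority_class, X, y),
    "nearest_neighbor": cross_val_accuracy(nearest_neighbor, X, y),
}
best = max(results, key=results.get)
print(results, "-> best:", best)
```

Because every model is scored on exactly the same folds, the comparison stays fair. On real data you would swap in the model families listed above and, most likely, a library implementation of k-fold splitting.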
via: http://horicky.blogspot.com/2012/02/characteristics-of-machine-learning.html
Wednesday, 21 December 2011
Machine Learning, Hadoop, and Mahout
The Cloudera Data Science team (Josh Wills, Tom Pierce, Jeff Hammerbacher) gave a presentation a couple of days ago on the state of machine learning and Hadoop.

Tuesday, 6 December 2011
Combining Hadoop MapReduce and MPI for Terascale Learning
An attempt to combine MPI and Hadoop MapReduce to eliminate the drawbacks of each:
- MPI: The Allreduce function. The starting state for Allreduce is n nodes each with a number, and the end state is all nodes having the sum of all numbers.
- MapReduce: Conceptual simplicity. One easy-to-understand function is enough.
- MPI: No need to refactor code. You just sprinkle allreduce in a few locations in your single-machine code.
- MapReduce: Data locality. We just hijack the MapReduce infrastructure to execute a map-only job where each process executes on the node with the data.
- MPI: Ability to use local storage (or RAM). Hadoop itself gobbles large amounts of RAM by default because it uses Java. And, in any case, you don’t have an effective large-scale learning algorithm if it dies every time the data on a single node exceeds available RAM. Instead, you want to create a temporary file on the local disk and allow it to be cached in RAM by the OS, if that’s possible.
- MapReduce: Automatic cleanup of local resources. Temporary files are automatically nuked.
- MPI: Fast optimization approaches remain within the conceptual scope. Allreduce, because it’s a function call, does not conceptually limit online learning approaches as discussed below. MapReduce conceptually forces statistical query style algorithms. In practice, this can be worked around, but that’s annoying.
- MapReduce: Robustness. We don’t capture all the robustness of MapReduce, which can succeed even during a gunfight in the datacenter. But we don’t generally need that: it’s easy to use Hadoop’s speculative execution approach to deal with the slow node problem and use delayed initialization to get around all startup failures, giving you something with >99% success rate on a running time reliable to within a factor of 2.
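The Allreduce contract described in the first bullet (n nodes each start with a number, and all nodes end with the sum) can be sketched as a single-process simulation. This is not the code from the original post, nor a real MPI call: the binary-tree layout and the function name `allreduce_sum` are assumptions made for illustration. A real cluster would run the same two phases over network links between nodes.

```python
def allreduce_sum(values):
    """Simulate Allreduce over len(values) nodes arranged as a binary tree.

    Node i's parent is (i - 1) // 2. Phase 1 reduces partial sums up to the
    root (node 0); phase 2 broadcasts the total back down to every node.
    """
    n = len(values)
    if n == 0:
        return []
    partial = list(values)
    # Reduce: walk from the last node toward the root. By the time node i is
    # visited, all of its children (which have larger indices) have already
    # added their partial sums into it.
    for i in range(n - 1, 0, -1):
        partial[(i - 1) // 2] += partial[i]
    # Broadcast: each node copies the total from its parent, root first.
    result = [partial[0]] * 1 + [None] * (n - 1)
    for i in range(1, n):
        result[i] = result[(i - 1) // 2]
    return result

# Four simulated nodes, each holding one local number.
print(allreduce_sum([1, 2, 3, 4]))  # → [10, 10, 10, 10]
```

In a learning setting, each node's number would typically be a locally computed quantity such as a gradient component. After the call every node holds the global sum and can continue its otherwise single-machine code, which is the "sprinkle allreduce in a few locations" point made above.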
