Machine learning: All content tagged as Machine learning in NoSQL databases and polyglot persistence

How to Implement a Machine Learning Algorithm

Jason Brownlee published an excerpt from his “Small Projects Methodology: Learn and Practice Applied Machine Learning”, focusing on the process of implementing machine learning algorithms:

Implementing a machine learning algorithm in code can teach you a lot about the algorithm and how it works.

In this post you will learn how to be effective at implementing machine learning algorithms and how to maximize your learning from these projects.

If you think about it, the process of implementing machine learning algorithms is in many ways similar to how machine learning works.

Original title and link: How to Implement a Machine Learning Algorithm (NoSQL database©myNoSQL)

via: http://machinelearningmastery.com/how-to-implement-a-machine-learning-algorithm/


The Machine learning skills pyramid

Created by Steve Geringer:

ML Skills Pyramid v1.0

Daniel Gutierrez

Original title and link: The Machine learning skills pyramid (NoSQL database©myNoSQL)


Vowpal Wabbit - Open source machine learning

Found by Daniel Gutierrez from Inside BigData:

Vowpal Wabbit (aka VW) is an open source fast out-of-core learning system library and program started and led by John Langford who works at Microsoft Research New York. Vowpal Wabbit is notable as an efficient scalable implementation of online machine learning and support for a number of machine learning reductions, importance weighting, and a selection of different loss functions and optimization algorithms.
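As a rough illustration of what "online" learning with importance weighting means, here is a minimal Python sketch, assuming a linear model, squared loss, and plain stochastic gradient descent. The names `online_sgd`, `predict`, and `squared_loss_grad` are hypothetical and this is not VW's code or API; in particular, scaling each step by the importance weight, as done here, is a naive simplification and not VW's own update rule.

```python
from typing import Callable, Dict, Iterable, Tuple

# One training example: sparse features (name -> value), a label, and an importance weight.
Example = Tuple[Dict[str, float], float, float]


def squared_loss_grad(prediction: float, label: float) -> float:
    # Derivative of 0.5 * (prediction - label) ** 2 with respect to the prediction.
    return prediction - label


def predict(weights: Dict[str, float], features: Dict[str, float]) -> float:
    # Linear model: dot product over the features present in this example.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def online_sgd(
    stream: Iterable[Example],
    learning_rate: float = 0.1,
    loss_grad: Callable[[float, float], float] = squared_loss_grad,
) -> Dict[str, float]:
    weights: Dict[str, float] = {}
    for features, label, importance in stream:
        # Online: predict, update, move on. Examples are never stored, so the
        # data set can be far larger than RAM (the "out-of-core" part).
        g = loss_grad(predict(weights, features), label)
        for name, value in features.items():
            # Naive importance weighting (an assumption for this sketch, not
            # VW's update rule): scale the gradient step by the example's weight.
            weights[name] = weights.get(name, 0.0) - learning_rate * importance * g * value
    return weights


# Usage: a stream drawn from y = 2x + 1, with a constant "bias" feature.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
stream = (({"x": x, "bias": 1.0}, 2 * x + 1, 1.0) for _ in range(2000) for x in xs)
model = online_sgd(stream)
print(model)  # weights should end up close to {"x": 2, "bias": 1}
```

Passing a different `loss_grad` is how this sketch loosely mirrors the idea of selectable loss functions; an example with importance weight 0 leaves the model unchanged.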

The project is on GitHub, and there’s also a short wiki page and a presentation.

via: http://inside-bigdata.com/2013/12/05/vowpal-wabbit/


Bill Gates: Four Areas of Technology I’d look into

Bill Gates in a tweet-based interview:

Q: @fesja: @BillGates if you were 20 years old now, what would you do? which area?

A: Bill Gates: When it comes to technology, there are four areas where I think a lot of exciting things will happen in the coming decades: big data, machine learning, genomics, and ubiquitous computing. So if I were 20 years old today, I’d be looking into one (or maybe more!) of those fields.

To say that Bill Gates always had a great understanding of technology trends would be an understatement.

Original title and link: Bill Gates: Four Areas of Technology I’d look into (NoSQL database©myNoSQL)


List of Machine Learning APIs

Below is a compilation of APIs that have benefited from Machine Learning in one way or another; we truly are living in the future, so strap into your rocketship and prepare for blastoff.

Original title and link: List of Machine Learning APIs (NoSQL database©myNoSQL)

via: http://blog.mashape.com/post/48074869493/list-of-machine-learning-apis


Machine Learning Cheatsheets

Created by Andreas Mueller:

machine_learning_cheatsheet

Then you can head to this Quora thread to read a bit more about the pros and cons of the different classification algorithms.

Original title and link: Machine Learning Cheatsheets (NoSQL database©myNoSQL)


Machine Learning: Interesting Problems Are Never Off the Shelf

Aria Haghighi about the present and future of products based on machine learning:

But I think there’s an even bigger barrier beyond ingenious model design and engineering skills. In the case of machine translation and speech recognition, the problem being solved is straightforward to understand and well-specified. Many of the NLP technologies that I think will revolutionize consumer products over the next decade are much vaguer. How, exactly, can we take the excellent research in structured topic models, discourse processing, or sentiment analysis and make a mass-appeal consumer product?

Original title and link: Machine Learning: Interesting Problems Are Never Off the Shelf (NoSQL database©myNoSQL)

via: http://radar.oreilly.com/2012/04/great-machine-learning-products.html


Skytree Launches a Machine Learning Server

Skytree Server connects to any number of existing data stores, including Hadoop, and, says Hack, is tens of thousands of times faster than existing tools, performing in minutes tasks that would have taken hours or days. As of now, it’s tuned to five specific use cases the company says are the most common — recommendation systems, anomaly/outlier identification, predictive analytics, clustering and market segmentation, and similarity search.

Skytree Server Architecture

There’s a limited but free Skytree version available on demand, so I expect to read some more about it soon.

Original title and link: Skytree Launches a Machine Learning Server (NoSQL database©myNoSQL)

via: http://gigaom.com/cloud/skytree-intros-machine-learning-for-the-masses/


Characteristics of Machine Learning Models

Ricky Ho published yet another great article giving a high-level summary of the algorithms used by different machine learning models:

  • decision trees
  • linear regression methods
  • neural networks
  • Bayesian networks
  • support vector machines
  • nearest neighbor

For classification and regression problems, there are different choices of Machine Learning Models, each of which can be viewed as a blackbox that solves the same problem. However, each model comes from a different algorithmic approach and will perform differently on different data sets. The best way is to use cross-validation to determine which model performs best on test data.
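Choosing among models by cross-validation can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: two toy regressors (a least-squares line and a 1-nearest-neighbour lookup) stand in for the models listed above, mean squared error is the score, and folds are taken by interleaving rather than by random shuffling. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y)
Model = Callable[[Sequence[Point]], Callable[[float], float]]  # training data -> predictor


def fit_linear(train: Sequence[Point]) -> Callable[[float], float]:
    # Ordinary least squares for y = a * x + b on a single feature.
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_nearest_neighbor(train: Sequence[Point]) -> Callable[[float], float]:
    # 1-nearest-neighbour: predict the y of the closest training x.
    points = list(train)
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def cross_validate(model: Model, data: Sequence[Point], k: int = 5) -> float:
    # Mean squared error averaged over k held-out folds (assumes len(data) >= k).
    folds = [data[i::k] for i in range(k)]  # interleaved folds, no shuffling
    errors: List[float] = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        predictor = model(train)
        errors.append(sum((predictor(x) - y) ** 2 for x, y in test) / len(test))
    return sum(errors) / k


def pick_best(models: Dict[str, Model], data: Sequence[Point], k: int = 5) -> str:
    # The model with the lowest cross-validated error wins.
    scores = {name: cross_validate(m, data, k) for name, m in models.items()}
    return min(scores, key=scores.get)


# Usage: exactly linear data, so the linear model should win.
data = [(float(x), 3.0 * x + 2.0) for x in range(20)]
models = {"linear": fit_linear, "nearest_neighbor": fit_nearest_neighbor}
print(pick_best(models, data))
```

Each model is treated as the blackbox described above: `cross_validate` only calls "train" and "predict", so any of the listed model families could be plugged in the same way.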

Original title and link: Characteristics of Machine Learning Models (NoSQL database©myNoSQL)

via: http://horicky.blogspot.com/2012/02/characteristics-of-machine-learning.html


Machine Learning, Hadoop, and Mahout

The Cloudera Data Science team (Josh Wills, Tom Pierce, Jeff Hammerbacher) gave a presentation a couple of days ago on the state of machine learning and Hadoop.

Supervised Learning Workflow


Combining Hadoop MapReduce and MPI for Terascale Learning

An attempt to combine MPI and Hadoop MapReduce to eliminate the drawbacks of each:

  1. MPI: The Allreduce function. The starting state for Allreduce is n nodes each with a number, and the end state is all nodes having the sum of all numbers.
  2. MapReduce: Conceptual simplicity. One easy to understand function is enough.
  3. MPI: No need to refactor code. You just sprinkle allreduce in a few locations in your single machine code.
  4. MapReduce: Data locality. We just hijack the MapReduce infrastructure to execute a map-only job where each process executes on the node with the data.
  5. MPI: Ability to use local storage (or RAM). Hadoop itself gobbles large amounts of RAM by default because it uses Java. And, in any case, you don’t have an effective large scale learning algorithm if it dies every time the data on a single node exceeds available RAM. Instead, you want to create a temporary file on the local disk and allow it to be cached in RAM by the OS, if that’s possible.
  6. MapReduce: Automatic cleanup of local resources. Temporary files are automatically nuked.
  7. MPI: Fast optimization approaches remain within the conceptual scope. Allreduce, because it’s a function call, does not conceptually limit online learning approaches as discussed below. MapReduce conceptually forces statistical query style algorithms. In practice, this can be worked around, but that’s annoying.
  8. MapReduce: Robustness. We don’t capture all the robustness of MapReduce, which can succeed even during a gunfight in the datacenter. But we don’t generally need that: it’s easy to use Hadoop’s speculative execution approach to deal with the slow node problem and use delayed initialization to get around all startup failures, giving you something with >99% success rate on a running time reliable to within a factor of 2.
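The Allreduce contract from point 1, and the idea from point 3 of sprinkling one allreduce call into single-machine code, can be made concrete with an in-process Python simulation. It assumes a binary-tree topology (partial sums flow up to a root, the total is broadcast back down) and a toy one-weight linear model trained by gradient descent; `tree_allreduce` and `distributed_gd` are hypothetical names, and no networking, Hadoop, or MPI is involved.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one node


def tree_allreduce(values: List[float]) -> List[float]:
    # Simulated Allreduce: node i starts with values[i]; afterwards every node holds the sum.
    # Assumed topology: heap-shaped binary tree, node 0 is the root,
    # and node i's parent is (i - 1) // 2.
    n = len(values)
    if n == 0:
        return []
    partial = list(values)
    # Reduce phase: visit nodes from last to first, so each child's subtree
    # total is complete before it is added into its parent.
    for i in range(n - 1, 0, -1):
        partial[(i - 1) // 2] += partial[i]
    # Broadcast phase: copy the root's total back down the tree.
    result = [0.0] * n
    result[0] = partial[0]
    for i in range(1, n):
        result[i] = result[(i - 1) // 2]
    return result


def local_gradient(w: float, shard: Shard) -> float:
    # Gradient of 0.5 * sum((w * x - y) ** 2) over this node's data only.
    return sum((w * x - y) * x for x, y in shard)


def distributed_gd(shards: List[Shard], steps: int = 100, lr: float = 0.05) -> List[float]:
    # Each node keeps its own copy of the single weight w of the model y = w * x.
    weights = [0.0] * len(shards)
    total = sum(len(shard) for shard in shards)
    for _ in range(steps):
        grads = [local_gradient(w, shard) for w, shard in zip(weights, shards)]
        # The one "sprinkled-in" call: after it, every node has the global gradient.
        summed = tree_allreduce(grads)
        weights = [w - lr * g / total for w, g in zip(weights, summed)]
    return weights


# Usage: data from y = 3x, split unevenly across three nodes.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)], [(4.0, 12.0), (5.0, 15.0), (6.0, 18.0)]]
print(tree_allreduce([1.0, 2.0, 3.0, 4.0]))  # [10.0, 10.0, 10.0, 10.0]
print(distributed_gd(shards))  # all three copies agree and approach 3.0
```

Every node runs the same single-machine update loop and only `tree_allreduce` couples them, so the model copies never drift apart. In the setup the post describes, the simulated loops would instead be map tasks running on the nodes that hold the data, exchanging sums over the network.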

Original title and link: Combining Hadoop MapReduce and MPI for Terascale Learning (NoSQL database©myNoSQL)

via: http://hunch.net/?p=2094