


machine learning: All content tagged as machine learning in NoSQL databases and polyglot persistence

7 books for Machine Learning with R

Jason Brownlee put together a list of 7 machine learning books that make use of R:

In this post I want to point out some resources you can use to get started in R for machine learning.

Original title and link: 7 books for Machine Learning with R (NoSQL database©myNoSQL)


A Tour of Machine Learning Algorithms

After we understand the type of machine learning problem we are working with, we can think about the type of data to collect and the types of machine learning algorithms we can try. In this post we take a tour of the most popular machine learning algorithms. It is useful to tour the main algorithms to get a general idea of what methods are available.




How to Implement a Machine Learning Algorithm

Jason Brownlee published an excerpt from his “Small Projects Methodology: Learn and Practice Applied Machine Learning”, focusing on the process of implementing machine learning algorithms:

Implementing a machine learning algorithm in code can teach you a lot about the algorithm and how it works.

In this post you will learn how to be effective at implementing machine learning algorithms and how to maximize your learning from these projects.

If you think about it, the process of implementing machine learning algorithms is in many ways similar to how machine learning works.



The Machine learning skills pyramid

Created by Steve Geringer:

ML Skills Pyramid v1.0

Via Daniel Gutierrez.


Vowpal Wabbit - Open source machine learning

Found by Daniel Gutierrez from Inside BigData:

Vowpal Wabbit (aka VW) is an open source fast out-of-core learning system library and program started and led by John Langford who works at Microsoft Research New York. Vowpal Wabbit is notable as an efficient scalable implementation of online machine learning and support for a number of machine learning reductions, importance weighting, and a selection of different loss functions and optimization algorithms.

The project is on GitHub, there’s a short wiki page, and a presentation.
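
To make "online learning" and "importance weighting" a bit more concrete, here is a minimal sketch in plain Python. It is not Vowpal Wabbit's code or API. The class `OnlineLogisticRegression`, the `example_stream` generator, and the toy data are all hypothetical names made up for illustration. The idea it shows: the model sees one example at a time and updates immediately, so the data never has to fit in memory, and an importance weight simply scales how hard a given example pushes the model.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineLogisticRegression:
    """Hypothetical learner that updates on one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, y, importance=1.0):
        # Gradient step on the logistic loss for a single example,
        # scaled by that example's importance weight.
        error = self.predict_proba(x) - y
        step = self.learning_rate * importance * error
        self.weights = [w - step * xi for w, xi in zip(self.weights, x)]
        self.bias -= step


def example_stream():
    # Assumed toy stream: the label is 1 when the first feature is larger.
    points = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3),
              (0.1, 0.6), (0.8, 0.4), (0.3, 0.9)]
    for _ in range(200):
        for x in points:
            yield x, 1 if x[0] > x[1] else 0


model = OnlineLogisticRegression(n_features=2)
for x, y in example_stream():
    # Only the current example is held in memory at any time.
    model.update(x, y, importance=1.0)
```

Swapping the logistic loss for a squared or hinge loss only changes the `error` term in `update`, which is one way to read the "selection of different loss functions" mentioned in the quote.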


Bill Gates: Four Areas of Technology I’d look into

Bill Gates in a tweet-based interview:

Q: @fesja: @BillGates if you were 20 years old now, what would you do? which area?

A: Bill Gates: When it comes to technology, there are four areas where I think a lot of exciting things will happen in the coming decades: big data, machine learning, genomics, and ubiquitous computing. So if I were 20 years old today, I’d be looking into one (or maybe more!) of those fields.

To say that Bill Gates always had a great understanding of technology trends would be an understatement.


List of Machine Learning APIs

Below is a compilation of APIs that have benefited from Machine Learning in one way or another, we truly are living in the future so strap into your rocketship and prepare for blastoff.



Machine Learning Cheatsheets

Created by Andreas Mueller:


Then you can head to this Quora thread to read a bit more about the pros and cons of the different classification algorithms.


Machine Learning: Interesting Problems Are Never Off the Shelf

Aria Haghighi on the present and future of products based on machine learning:

But I think there’s an even bigger barrier beyond ingenious model design and engineering skills. In the case of machine translation and speech recognition, the problem being solved is straightforward to understand and well-specified. Many of the NLP technologies that I think will revolutionize consumer products over the next decade are much vaguer. How, exactly, can we take the excellent research in structured topic models, discourse processing, or sentiment analysis and make a mass-appeal consumer product?



Skytree Launches a Machine Learning Server

Skytree Server connects to any number of existing data stores, including Hadoop, and, says Hack, is tens of thousands of times faster than existing tools, performing in minutes tasks that would have taken hours or days. As of now, it’s tuned to five specific use cases the company says are the most common — recommendation systems, anomaly/outlier identification, predictive analytics, clustering and market segmentation, and similarity search.

Skytree Server Architecture

There’s a limited but free Skytree version available on demand, so I expect to read some more about it soon.



Characteristics of Machine Learning Models

Ricky Ho published yet another great article giving a high-level summary of the algorithms used by different machine learning models:

  • decision trees
  • linear regression methods
  • neural networks
  • bayesian networks
  • support vector machines
  • nearest neighbor

For classification and regression problems, there are different choices of machine learning models, each of which can be viewed as a black box that solves the same problem. However, each model comes from a different algorithmic approach and will perform differently on different data sets. The best way is to use cross-validation to determine which model performs best on test data.
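
The cross-validation advice can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not code from Ricky Ho's article. The helpers `k_fold_indices` and `cross_val_accuracy`, the two toy models, and the synthetic two-cluster data are all hypothetical. Each model is treated as a black box: a `fit` function that takes training data and returns a `predict` function.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    # Shuffle once, then deal the indices into k roughly equal folds.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy of the model built by `fit` over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def fit_majority_class(X_train, y_train):
    # Baseline model: always predict the most common training label.
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def fit_nearest_neighbor(X_train, y_train):
    # 1-nearest neighbor with squared Euclidean distance.
    def predict(x):
        distances = [sum((a - b) ** 2 for a, b in zip(x, xt))
                     for xt in X_train]
        return y_train[distances.index(min(distances))]
    return predict


# Assumed toy data: two well-separated clusters, 30 points each.
rng = random.Random(42)
X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(30)]
y = [0] * 30 + [1] * 30

results = {
    "majority class": cross_val_accuracy(fit_majority_class, X, y),
    "nearest neighbor": cross_val_accuracy(fit_nearest_neighbor, X, y),
}
best_model = max(results, key=results.get)
```

Because every model is scored on the same folds, the comparison in `results` is like for like. Any of the model families listed above could be plugged in as another `fit` function.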



Machine Learning, Hadoop, and Mahout

The Cloudera Data Science team (Josh Wills, Tom Pierce, Jeff Hammerbacher) gave a presentation a couple of days ago on the state of machine learning and Hadoop.

Supervised Learning Workflow