Credit Matthew Freeman.
Original title and link: Data science: Introducing the (para)normal distribution ( ©myNoSQL)
In Academic torrents: Almost 1.7TB of research data available, I complained about the lack of interesting open data. Dan Goldin’s Visualizing RunKeeper data in R is a good example of what I mean. While learning R, he used his own data about his running results. That made it both interesting and fun.
What better way to celebrate running 1000 miles in 2013 than dumping the data into R and generating some visualizations? It’s also a step in my quest to replace Excel with R.
I hope no one will dispute that this is a more exciting way to learn a new technology than working through the Enron email archive.
Original title and link: Visualizing RunKeeper data in R ( ©myNoSQL)
Two great posts from mongolab cover the structure of MongoDB’s data on disk, how this is reflected in the results returned by the dbStats API, and finally some attempts to recover disk space:
Original title and link: MongoDB data storage structure, dbStats, and managing disk space ( ©myNoSQL)
A long list of links, books, and online courses for learning yourself some “data science” (the official project page is here, but I prefer the GitHub page).
Put together by Clare Corthell:
I didn’t want to wait. I wanted to work on things I care about now. Why sleep through grad school lectures tomorrow when you can hack on interesting questions today?
Original title and link: The Open Source Data Science Masters Curriculum ( ©myNoSQL)