datascience: All content tagged as datascience in NoSQL databases and polyglot persistence
Very often I jump straight to Python for any sort of data processing, and I totally forget about the powerful tools available on pretty much every Linux/Mac box [1].
Jeroen Janssens’s “7 command-line tools for data science” presents 6 command-line tools for fetching, filtering, and transforming data: jq, json2csv, csvkit, scrape, xml2json, sample.
Then Leonardo Trabuco’s “Working with data on the command line” gives a quick roundup of the standard Linux tools:
If you understand the philosophy of Linux tools and get familiar with some of the tools listed above (I’ve never got too deep into […], and sed almost always tricks me), you’ll be able to do some nice data processing experimentation directly from the command line.
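As a rough illustration of the kind of experimentation meant here, the sketch below chains a few standard tools to count how often each value appears in the first column of a CSV. The sample data and the function name `count_first_column` are made up for this example; neither comes from the linked articles.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of each value in the first
# column of CSV data read from stdin, most frequent first.
#   tail -n +2   skip the header row
#   cut -d, -f1  keep only the first comma-separated field
#   sort         group identical values (uniq only merges adjacent lines)
#   uniq -c      collapse each group into "count value"
#   sort -rn     order by count, descending
count_first_column() {
    tail -n +2 | cut -d, -f1 | sort | uniq -c | sort -rn
}

# Made-up sample data: one "runner,distance_km" row per run.
printf '%s\n' \
    'runner,distance_km' \
    'ana,5' \
    'bob,10' \
    'ana,8' \
    'cy,3' \
    'ana,4' \
    'bob,6' \
    | count_first_column
# Lists ana with 3 runs, bob with 2, cy with 1
# (the exact padding before each count varies between uniq implementations).
```

Each stage does one small job and passes plain text to the next, so you can run the pipeline one stage at a time and look at the intermediate output when something looks wrong. This naive split on commas assumes no quoted fields; for real-world CSV a dedicated tool such as csvkit is the safer choice.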
[1] The one excuse I usually find for myself when doing this is that debugging the behavior of command-line tools is not as pleasant as debugging a Python script. _Sort of an OK argument, but still an excuse._
Original title and link: Data processing command line-style (©myNoSQL)
In “Academic torrents: Almost 1.7TB of research data available”, I complained about the lack of interesting open data. Dan Goldin’s “Visualizing RunKeeper data in R” is a good example of what I mean. While learning R, he used his own data about his running results. That made it both interesting and fun.
“What better way to celebrate running 1000 miles in 2013 than dumping the data into R and generating some visualizations? It’s also a step in my quest to replace Excel with R.”
I hope no one will dispute that this is a more exciting way to learn a new technology than working through the Enron email archive.
Original title and link: Visualizing RunKeeper data in R (©myNoSQL)