The Academic Torrents initiative:
The result is a scalable, secure, and fault-tolerant repository for data,
with blazing fast download speeds.
Over the weekend, I played a bit with the Python data-crunching toolkit: pandas, NumPy, and matplotlib. Truth is, I started with A pandas cookbook by Julia Evans, but ended up spending most of the time trying to get the latest version of matplotlib installed on OS X and convincing it to display XKCD-styled plots. That aside, once I got everything working, I got stuck at the “what now” phase: what data can I use to play with? This situation reminded me of past experiences when trying to learn or build demos around data.
We keep talking about Big Data and the shortage of trained people in this space. But if you look around, you’ll realize that: 1) there’s very little data that those interested in learning can use; and 2) most of it is boring.
Plus, I’m sure not everyone is inclined to spend months hacking OkCupid and going on 88 dates to validate their methods and algorithms.
Original title and link: Academic torrents: Almost 1.7TB of research data available