Academic torrents: Almost 1.7TB of research data available

The Academic Torrents initiative:

The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.

Over the weekend, I played a bit with the Python data crunching toolkit: pandas, NumPy, and matplotlib. Truth is, I started with A pandas cookbook by Julia Evans, but ended up spending most of the time trying to get the latest version of matplotlib installed on OS X and convincing it to display XKCD-styled plots. That aside, after getting everything working, I got stuck at the “what now” phase: what data can I use to play with? This situation reminded me of past experiences when trying to learn or build demos around data.

We keep talking about Big Data and the lack of trained people in this space. But if you look around, you’ll realize that: 1) there’s very little data available to those interested in learning; and 2) most of it is boring.

Plus, I’m sure not everyone is inclined to spend months hacking OkCupid and going on 88 dates to validate their methods and algorithms.

Original title and link: Academic torrents: Almost 1.7TB of research data available (NoSQL database © myNoSQL)