PIG: All content tagged as PIG in NoSQL databases and polyglot persistence
In 4 years of writing this blog I haven’t seen such a prolific month:
- Apache Hadoop 2.2.0 (more links here)
- Apache HBase 0.96 (here and here)
- Apache Hive 0.12 (more links here)
- Apache Ambari 1.4.1
- Apache Pig 0.12
- Apache Oozie 4.0.0
- Plus Presto.
Actually I don’t think I’ve ever seen such an ecosystem like the one created around Hadoop.
Original title and link: A prolific season for Hadoop and its ecosystem ( ©myNoSQL)
I’m almost always enjoying the lessons learned-style presentations from Twitter’s people. The slides below, by Jimmy Lin and Dmitriy Ryaboy, have been used at HadoopSummit. Besides the technical and practical details, there are two things that I really like:
DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”
and then the reality check:
- Your boss says something vague
- You think very hard on how to move the needle
- Where’s the data?
- What’s in this dataset?
- What’s all the f#$#$ crap in the data?
- Clean the data
- Run some off-the-shelf data mining algorithm
- Productionize, act on the insight
- Rinse, repeat
The list of releases I wanted to post about has been growing fast these last couple of weeks, so instead of waiting leaving it to Here it is (in no particular order1):
- (Jan.2nd) Cassandra 1.2 — announcement on DataStax’s blog. I’m currently learning and working on a post looking at what’s new in Cassandra 1.2.
- (Jan.10th) Apache Pig 0.10.1 — Hortonworks wrote about it
- (Jan.10th) DataStax Community Edition 1.2 and OpsCenter 2.1.3 — DataStax announcement
- (Jan.10th) CouchDB 1.0.4, 1.1.2, and 1.2.1 — releases fixing some security vulnerabilities
(Jan.11th) MongoDB 2.3.2 unstable — announcement. This dev release includes support for full text indexing. For more details you can check:
- MongoDB Full Text Search Explained and MongoDB Text Search Tutorial
- Full text search in MongoDB: details about supported languages and queries
- Indexing a Markdown blog using MongoDB full text indexing
- Short demo of MongoDB text search and hashed shard keys
- (Jan.12th) Apache HBase 0.94.4 — announcement and release notes
- (Jan.14th) Apache Hive 0.10.0: Hortonworks’s post about it
- (Jan.15th) Hortonworks Data Platform 1.2 featuring Apache Amabari — official PR announcement
- (Jan.16th) Redis 2.6.9 — release notes
- (Jan.16th) HyperDex 1.0RC1 — no docs
- (Jan.16th) Klout’s Brickhouse — announcement:
[…] an open source project extending Hadoop and Hive with a collection of useful user-defined-functions. Its aim is to make the Hive Big Data developer more productive, and to enable scalable and robust dataflows.
I’ve tried to order it chronologically, but most probably I’ve failed. ↩
Original title and link: 11 Interesting Releases From the First Weeks of January ( ©myNoSQL)