Playing With Hadoop Pig
Anything missing from Pig?
[…] the following SQL operations can be translated to Pig Latin as shown. The order in which the operations have to be run is given in parentheses.
- SELECT id, name (5): resultData = FOREACH limitData GENERATE id, name;
- FROM Table (1): data = LOAD 'person.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);
- WHERE a=1 (2): filteredData = FILTER data BY a == 1;
- ORDER BY age DESC (3): orderedData = ORDER filteredData BY age DESC;
- LIMIT 10 (4): limitData = LIMIT orderedData 10;

One can also use left join and join as follows:
- JOIN: join_data = JOIN data1 BY id1, data2 BY id2;
- LEFT JOIN: left_join_data = JOIN data1 BY id1 LEFT OUTER, data2 BY id2;
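
Put together in run order, the statements above might look like the following script. This is only a sketch under stated assumptions: person.csv is assumed to hold id,name,age rows with no header line; the filter on age is a stand-in for the WHERE clause, since the column `a` from the excerpt is not in the declared schema; and city.csv with its person_id and city columns is a hypothetical second input added just to show the joins.

```pig
-- (1) FROM: load the CSV with an explicit schema.
-- Assumption: person.csv contains lines such as 1,Alice,34 and has no header.
data = LOAD 'person.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);

-- (2) WHERE: keep matching rows. Pig compares with ==, not =.
-- Assumption: the age condition replaces the excerpt's "a == 1".
filteredData = FILTER data BY age >= 18;

-- (3) ORDER BY age DESC
orderedData = ORDER filteredData BY age DESC;

-- (4) LIMIT 10
limitData = LIMIT orderedData 10;

-- (5) SELECT id, name: project the columns last.
resultData = FOREACH limitData GENERATE id, name;
DUMP resultData;

-- Joins. Assumption: city.csv is a hypothetical file of person_id,city rows.
cities = LOAD 'city.csv' USING PigStorage(',') AS (person_id:int, city:chararray);

-- Inner join: only people that have a matching city row.
join_data = JOIN data BY id, cities BY person_id;

-- Left outer join: every person, with null city fields when nothing matches.
left_join_data = JOIN data BY id LEFT OUTER, cities BY person_id;

-- After a join, fields are addressed through their source relation.
names_and_cities = FOREACH left_join_data GENERATE data::name, cities::city;
DUMP names_and_cities;
```

Saved as, say, example.pig, the script can be tried without a cluster by running `pig -x local example.pig`. Nothing is executed until a DUMP or STORE statement is reached, which is why the projection can be written last even though SQL lists SELECT first.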
Original title and link: Playing With Hadoop Pig (©myNoSQL)
via: http://chimpler.wordpress.com/2013/02/04/playing-with-hadoop-pig/