Created 12-01-2016 08:11 AM
x = LOAD '/pigdata/source.txt' using PigStorage(',') As (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);
y = GROUP x by symbol;
z2 = foreach y generate x.exchange as exchange1;
dump z2;
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
z4 = distinct z2;
dump z4;
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
Clarification: how does distinct work with bags? For tuples it is clear, but what happens if I use distinct on bags? The output of dump z4 is not clear to me.
Created 12-01-2016 02:24 PM
First you need to convert your bags into tuples, then flatten and apply distinct.
This is done using Pig's built-in function BagToTuple().
See this post for explanation and example:
Created 12-01-2016 02:51 PM
Hi @Greg Keys
Thanks for the input. Maybe my question was not clear. What happens when we use z4 = distinct z2;
How z4 is calculated from z2 is not clear to me.
Created 12-01-2016 04:00 PM
Same answer: since z2 is a bag, you need to flatten it to a tuple to do a distinct on it.
For the data you are showing:
z3 = foreach z2 generate FLATTEN(BagToTuple($0));
z4 = distinct z3;
The link gives the detailed explanation of why this is required.
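A note on what distinct does with the z2 shown in the question, as far as I understand Pig's semantics: DISTINCT compares whole records of a relation, not the elements inside a bag field. z2 has two records, each holding a single bag, and the two bags differ (one contains ICICI entries, the other does not), so both records survive and only their order changes. If the goal is the distinct exchange values, a minimal sketch follows. It assumes the relations x, y and z2 from the question, and the aliases z3_rows, z4_values and z5 are hypothetical names chosen for illustration. It flattens the bag directly rather than going through BagToTuple, which is a different technique from the one suggested above.

```pig
-- Variant 1 (assumption: z2 is defined as in the question).
-- Flattening a bag turns every (exchange) tuple into its own row,
-- so DISTINCT can then remove duplicate rows across the relation.
z3_rows   = FOREACH z2 GENERATE FLATTEN(exchange1);
z4_values = DISTINCT z3_rows;
DUMP z4_values;

-- Variant 2 (assumption: x and y are defined as in the question).
-- A nested DISTINCT de-duplicates the exchange values inside each group.
z5 = FOREACH y {
    ex = DISTINCT x.exchange;
    GENERATE group AS symbol, ex AS exchanges;
};
DUMP z5;
```

Variant 1 should leave one row per exchange value over the whole data set, while variant 2 keeps one record per symbol with a bag of that symbol's distinct exchanges. Which one fits depends on whether the grouping by symbol still matters downstream.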