Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

distinct operation with bags

Expert Contributor
x = LOAD '/pigdata/source.txt' using PigStorage(',') As (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);


y = GROUP x by symbol;

z2 = foreach y generate x.exchange as exchange1;
dump z2;
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})

z4 = distinct z2; 
dump z4; 
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})

Clarification: How does DISTINCT work with bags? For tuples it is clear, but what happens when I use DISTINCT on a relation whose rows contain bags? The output of dump z4 is not clear to me.

1 ACCEPTED SOLUTION

Guru

First you need to convert your bags into tuples, then flatten and apply DISTINCT.

This is done using Pig's built-in function BagToTuple().

See this post for explanation and example:

https://community.hortonworks.com/questions/58271/using-pig-latin-to-replace-multiple-strings-from-s...


3 REPLIES

Expert Contributor

Hi @Greg Keys

Thanks for the input. Maybe my question was not clear: what happens when we run z4 = distinct z2;?

It is not clear to me how z4 is calculated from z2.

Guru

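Same answer: each row of z2 holds a single bag, and DISTINCT compares whole rows. The two bags in z2 are different, so z4 keeps both rows. To do a distinct on the values, you need to flatten each bag to a tuple first.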

For the data you are showing:

z3 = FOREACH z2 GENERATE FLATTEN(BagToTuple($0));

z4 = distinct z3;

The link gives the detailed explanation of why this is required.
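
If the goal is the unique exchange values per symbol, one possible alternative (a different technique from the BagToTuple() approach above, not taken from the linked post) is a nested DISTINCT inside the FOREACH. This is only a sketch: the path and schema are copied from the question, and the aliases z, ex and uniq are made up for illustration.

```pig
-- Load the same file as in the question (path and schema taken from the original post).
x = LOAD '/pigdata/source.txt' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        open:double, high:double, low:double, close:double,
        volume:long, adj_close:double);

-- One row per symbol; the bag x holds every input row for that symbol.
y = GROUP x BY symbol;

-- Nested DISTINCT: remove duplicate exchange values inside each group's bag.
-- ex and uniq are illustrative alias names.
z = FOREACH y {
    ex = x.exchange;      -- project only the exchange field of the bag
    uniq = DISTINCT ex;   -- de-duplicate within this one group
    GENERATE group AS symbol, uniq AS exchanges;
};

DUMP z;
```

Here DISTINCT runs on the contents of each bag, not on the rows of the outer relation, so each symbol should end up with a bag that lists every exchange only once.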