Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

distinct operation with bags



Contributor
x = LOAD '/pigdata/source.txt' using PigStorage(',') As (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);


y = GROUP x by symbol;

z2 = foreach y generate x.exchange as exchange1;
dump z2;
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
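A quick way to see what DISTINCT will compare is to print the schemas of the intermediate relations. This is a minimal sketch that uses only the aliases already defined above. DESCRIBE prints a relation's schema without launching a job.

```pig
-- y: one record per symbol, holding the group key and a bag of the original x tuples
DESCRIBE y;
-- z2: one field per record, a bag of single-field (exchange) tuples
DESCRIBE z2;
```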

z4 = distinct z2; 
dump z4; 
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})

Clarification: how does DISTINCT work with bags? For tuples it is clear, but what happens if I use DISTINCT on bags? The output of dump z4 is not clear to me.

1 ACCEPTED SOLUTION


Re: distinct operation with bags

Guru

First convert your bags into tuples, then flatten them and apply DISTINCT.

This is done with Pig's built-in function BagToTuple().

See this post for an explanation and an example:

https://community.hortonworks.com/questions/58271/using-pig-latin-to-replace-multiple-strings-from-s...
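The following is a minimal sketch of the "flatten, then DISTINCT" idea. It assumes the goal is the set of distinct exchange values across all symbols. It swaps in a simpler variant that flattens the bag directly, without BagToTuple(): FLATTEN applied to a bag emits one output record per tuple in the bag. The aliases flat_ex and exchanges are hypothetical.

```pig
-- FLATTEN on a bag: one output record per bag element, e.g. (NASDAQ), (NASDAQ), (ICICI), ...
flat_ex = FOREACH z2 GENERATE FLATTEN(exchange1) AS exchange;
-- Each record is now a single chararray value, so DISTINCT removes the repeats
exchanges = DISTINCT flat_ex;
DUMP exchanges;
```

Because the records compared by DISTINCT are now single values rather than whole bags, the repeated NASDAQ entries collapse into one record.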



Re: distinct operation with bags

Contributor

Hi @Greg Keys

Thanks for the input. Maybe my question was not clear: what happens when we use z4 = distinct z2;?

It is not clear to me how z4 is calculated from z2.

Re: distinct operation with bags

Guru

Same answer: since each record of z2 holds a bag, you need to convert that bag to a tuple and flatten it before applying DISTINCT.

For the data you are showing:

z3 = FOREACH z2 GENERATE FLATTEN(BagToTuple($0));

z4 = distinct z3;

The link gives the detailed explanation of why this is required.
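Here is a supplementary note on how z4 is computed from z2. DISTINCT compares whole records of a relation, not the elements inside a bag. Each record of z2 has a single field: the bag of exchanges for one symbol. Two records count as duplicates only when their bags hold the same tuples. The two bags shown in the question differ (only one contains ICICI), so DISTINCT removes nothing. The rows come back in a different order because DISTINCT shuffles the data through a reduce step, and Pig does not guarantee row order.

If the goal is to remove duplicates inside each bag, a nested DISTINCT in FOREACH is one option. This is a minimal sketch, assuming the relation y from the question. The aliases exch, uniq_exch and z5 are hypothetical.

```pig
-- For each symbol, keep only the distinct exchange values inside its bag
z5 = FOREACH y {
    exch = x.exchange;          -- bag of (exchange) tuples for this symbol
    uniq_exch = DISTINCT exch;  -- nested DISTINCT deduplicates within the bag
    GENERATE group AS symbol, uniq_exch AS exchanges;
};
DUMP z5;
```

With the data above, each bag in z5 should list each exchange only once. The bag for the symbol traded on both exchanges should hold NASDAQ and ICICI once each.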
