## distinct operation with bags

Contributor
```
x = LOAD '/pigdata/source.txt' using PigStorage(',') As (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);

y = GROUP x by symbol;

z2 = foreach y generate x.exchange as exchange1;
dump z2;
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})

z4 = distinct z2;
dump z4;
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})

```

Clarification: how does distinct work with bags? For tuples it is clear, but what happens when I use distinct on bags? The output of dump z4 is not clear to me.

Guru

First you need to convert your bags into tuples, then flatten them and apply distinct.

This is done using Pig's built-in function BagToTuple().

See this post for explanation and example:


Contributor

Thanks for the input. Maybe my question was not clear. What happens when we run z4 = distinct z2;

How z4 is calculated from z2 is what I do not understand.

Guru

Same answer: since z2 is a bag, you need to flatten it to a tuple to do a distinct on it.

For the data you are showing:

z3 = foreach z2 generate FLATTEN(BagToTuple($0));

z4 = distinct z3;

The link gives the detailed explanation of why this is required.
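
To spell out what happens in the original script: DISTINCT removes duplicate tuples from a relation, comparing each tuple as a whole. Every row of z2 is a tuple with a single field, the bag of exchange values for one symbol. The two bags shown are not equal (one contains ICICI entries, the other does not), so neither row is a duplicate and z4 holds the same two rows as z2. Only the order differs, since DISTINCT does not guarantee output order.

If the goal is the distinct exchange values per symbol, one option is a nested foreach with distinct inside the block. This is only a sketch and not necessarily the approach from the linked post; the aliases ex, uniq_ex and z5 and the field names symbol and exchanges are made up for illustration:

```
-- x and y are the relations from the original script
z5 = foreach y {
    ex = x.exchange;          -- project the exchange column of this group's bag
    uniq_ex = distinct ex;    -- distinct applied to the tuples inside the bag
    generate group as symbol, uniq_ex as exchanges;
};
dump z5;
```

Here distinct runs once per group on the inner bag, so each output row should carry a symbol and a bag with one entry per exchange seen for that symbol.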

