<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: distinct operation with bags in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132475#M47695</link>
    <description>&lt;P&gt;Same answer: each record of z2 holds a bag, so you need to convert that bag to a tuple and flatten it before running DISTINCT on it.&lt;/P&gt;&lt;P&gt;For the data you are showing:&lt;/P&gt;&lt;P&gt;z3 = foreach z2 generate FLATTEN(BagToTuple($0));&lt;/P&gt;&lt;P&gt;z4 = distinct z3;&lt;/P&gt;&lt;P&gt;The linked post explains in detail why this is required.&lt;/P&gt;</description>
    <pubDate>Fri, 02 Dec 2016 00:00:12 GMT</pubDate>
    <dc:creator>gkeys</dc:creator>
    <dc:date>2016-12-02T00:00:12Z</dc:date>
    <item>
      <title>distinct operation with bags</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132472#M47692</link>
      <description>&lt;PRE&gt;x = LOAD '/pigdata/source.txt' using PigStorage(',') As (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);


y = GROUP x by symbol;

z2 = foreach y generate x.exchange as exchange1;
dump z2;
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})

z4 = distinct z2; 
dump z4; 
({(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})
({(NASDAQ),(NASDAQ),(NASDAQ),(ICICI),(ICICI),(ICICI),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ),(NASDAQ)})

&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Clarification&lt;/STRONG&gt;: How does DISTINCT work with bags? For tuples it is clear, but what happens if I use DISTINCT on bags? The output of dump z4 is not clear to me.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 16:11:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132472#M47692</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2016-12-01T16:11:46Z</dc:date>
    </item>
    <item>
      <title>Re: distinct operation with bags</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132473#M47693</link>
      <description>&lt;P&gt;First convert your bags into tuples, then flatten them and apply DISTINCT.&lt;/P&gt;&lt;P&gt;Pig's built-in function &lt;STRONG&gt;BagToTuple()&lt;/STRONG&gt; does the conversion.&lt;/P&gt;&lt;P&gt;See this post for an explanation and an example:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/58271/using-pig-latin-to-replace-multiple-strings-from-s.html" target="_blank"&gt;https://community.hortonworks.com/questions/58271/using-pig-latin-to-replace-multiple-strings-from-s.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 22:24:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132473#M47693</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-12-01T22:24:05Z</dc:date>
    </item>
    <item>
      <title>Re: distinct operation with bags</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132474#M47694</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11288/gkeys.html" nodeid="11288"&gt;@Greg Keys&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for the input. Maybe my question is not clear: what happens when we use z4 = distinct z2;?&lt;/P&gt;&lt;P&gt;It is not clear to me how z4 is calculated from z2.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 22:51:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132474#M47694</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2016-12-01T22:51:06Z</dc:date>
    </item>
    <item>
      <title>Re: distinct operation with bags</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132475#M47695</link>
      <description>&lt;P&gt;Same answer: each record of z2 holds a bag, so you need to convert that bag to a tuple and flatten it before running DISTINCT on it.&lt;/P&gt;&lt;P&gt;For the data you are showing:&lt;/P&gt;&lt;P&gt;z3 = foreach z2 generate FLATTEN(BagToTuple($0));&lt;/P&gt;&lt;P&gt;z4 = distinct z3;&lt;/P&gt;&lt;P&gt;The linked post explains in detail why this is required.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2016 00:00:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distinct-operation-with-bags/m-p/132475#M47695</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-12-02T00:00:12Z</dc:date>
    </item>
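    <item>
      <title>Re: distinct operation with bags</title>
      <description>&lt;P&gt;DISTINCT compares whole records. Each record of z2 holds a single bag, so two records count as duplicates only if their bags hold the same tuples. In the output shown, the two bags differ (only one of them contains ICICI), so distinct z2 removes nothing. DISTINCT also shuffles the data, so the order of the output records is not guaranteed, which is why the two rows come back swapped in dump z4.&lt;/P&gt;&lt;P&gt;If the goal is to drop duplicate exchanges inside each bag, the sketch below shows two options, assuming the same input file and schema as in the question. It has not been run against that data. Option 1 uses a nested DISTINCT inside a FOREACH block, which is a different technique from the BagToTuple approach. Option 2 flattens first and then applies DISTINCT to the whole relation. The aliases exchanges, uniq_exchanges and exchange are illustrative names only.&lt;/P&gt;&lt;PRE&gt;-- Sketch: same input and schema as in the question (assumed, not re-tested).
x = LOAD '/pigdata/source.txt' USING PigStorage(',') AS (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);
y = GROUP x BY symbol;

-- Option 1: nested DISTINCT removes duplicate tuples inside each group's bag.
z2 = FOREACH y {
    exchanges = x.exchange;              -- bag of (exchange) tuples for this symbol
    uniq_exchanges = DISTINCT exchanges; -- same bag with duplicate tuples removed
    GENERATE group AS symbol, uniq_exchanges AS exchange1;
};
DUMP z2;

-- Option 2: flatten the bag into one row per tuple, then run DISTINCT on the
-- relation, which keeps one row per distinct (symbol, exchange) pair.
z3 = FOREACH y GENERATE group AS symbol, FLATTEN(x.exchange) AS exchange;
z4 = DISTINCT z3;
DUMP z4;
&lt;/PRE&gt;&lt;P&gt;Option 1 keeps one record per symbol with a deduplicated bag. Option 2 produces flat rows, which is the shape that relation-level DISTINCT deduplicates directly.&lt;/P&gt;</description>
    </item>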
  </channel>
</rss>

