<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Error executing DISTINCT Function in Pig in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-executing-DISTINCT-Function-in-Pig/m-p/101182#M13912</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/1870/vidyaranyakuppa.html"&gt;Vidya SK&lt;/A&gt;&lt;/P&gt;&lt;P&gt;DISTINCT in Pig is a relational operator, so it applies to relations rather than to individual fields. Consider the following.&lt;/P&gt;&lt;PRE&gt;given_input = load '/given/path' using PigStorage(',') as (col1, col2, col3);&lt;/PRE&gt;&lt;P&gt;Consider the following situations.&lt;/P&gt;&lt;P&gt;1) Suppose I want to keep only the unique values in col1:&lt;/P&gt;&lt;PRE&gt;unique_col1 = foreach given_input generate col1;
unique_values = DISTINCT unique_col1;  -- DISTINCT operates only on relations, here unique_col1&lt;/PRE&gt;&lt;P&gt;Suppose col1 contains data like&lt;/P&gt;&lt;PRE&gt;hortonworks
hortonworks
cloudera&lt;/PRE&gt;&lt;P&gt;Then you get&lt;/P&gt;&lt;PRE&gt;cloudera
hortonworks&lt;/PRE&gt;&lt;P&gt;2) Suppose I want to keep only the unique combinations of col1 and col2:&lt;/P&gt;&lt;PRE&gt;unique_two_fields = foreach given_input generate col1, col2;

unique_values = DISTINCT unique_two_fields;  -- DISTINCT operates only on relations&lt;/PRE&gt;&lt;P&gt;Suppose col1 and col2 contain data like&lt;/P&gt;&lt;PRE&gt;hortonworks,cloudera
hortonworks,cloudera
hortonworks,hortonworks&lt;/PRE&gt;&lt;P&gt;You get&lt;/P&gt;&lt;PRE&gt;hortonworks,cloudera
hortonworks,hortonworks

&lt;/PRE&gt;&lt;P&gt;In short, first project the fields you want to make unique into their own relation, then apply the DISTINCT operator to that relation. If you want to perform aggregations, use GROUP and apply the aggregate functions to the grouped data.&lt;/P&gt;</description>
    <pubDate>Sat, 02 Jan 2016 23:29:44 GMT</pubDate>
    <dc:creator>bsuresh</dc:creator>
    <dc:date>2016-01-02T23:29:44Z</dc:date>
    <item>
      <title>Error executing DISTINCT Function in Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-executing-DISTINCT-Function-in-Pig/m-p/101180#M13910</link>
      <description>&lt;P&gt;I am trying to execute the following Pig script. DISTINCT is not working. Am I missing anything? Please help.&lt;/P&gt;&lt;PRE&gt;A = LOAD '/tmp/admin/data/gpa.txt' using PigStorage(',') AS (name, age, gpa);
B = group A by age;
C = foreach B generate ABS(SUM(A.gpa)), DISTINCT(A.name), MIN(A.gpa)+MAX(A.gpa)/2, group;
dump C;&lt;/PRE&gt;&lt;PRE&gt;2016-01-02 04:03:21,049 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve DISTINCT using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: 
&amp;lt;file script.pig, line 6, column 40&amp;gt; Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve DISTINCT using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]&lt;/PRE&gt;</description>
      <pubDate>Sat, 02 Jan 2016 12:07:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-executing-DISTINCT-Function-in-Pig/m-p/101180#M13910</guid>
      <dc:creator>vidyaranya_kupp</dc:creator>
      <dc:date>2016-01-02T12:07:21Z</dc:date>
    </item>
    <item>
      <title>Re: Error executing DISTINCT Function in Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-executing-DISTINCT-Function-in-Pig/m-p/101181#M13911</link>
      <description>&lt;P&gt;I have included the following REGISTER statement.  Still I get the above error.&lt;/P&gt;&lt;P&gt;register '/usr/hdp/current/pig-client/lib/piggybank.jar';&lt;/P&gt;</description>
      <pubDate>Sat, 02 Jan 2016 12:19:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-executing-DISTINCT-Function-in-Pig/m-p/101181#M13911</guid>
      <dc:creator>vidyaranya_kupp</dc:creator>
      <dc:date>2016-01-02T12:19:00Z</dc:date>
    </item>
    <item>
      <title>Re: Error executing DISTINCT Function in Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-executing-DISTINCT-Function-in-Pig/m-p/101182#M13912</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/1870/vidyaranyakuppa.html"&gt;Vidya SK&lt;/A&gt;&lt;/P&gt;&lt;P&gt;DISTINCT in Pig is a relational operator, so it applies to relations rather than to individual fields. Consider the following.&lt;/P&gt;&lt;PRE&gt;given_input = load '/given/path' using PigStorage(',') as (col1, col2, col3);&lt;/PRE&gt;&lt;P&gt;Consider the following situations.&lt;/P&gt;&lt;P&gt;1) Suppose I want to keep only the unique values in col1:&lt;/P&gt;&lt;PRE&gt;unique_col1 = foreach given_input generate col1;
unique_values = DISTINCT unique_col1;  -- DISTINCT operates only on relations, here unique_col1&lt;/PRE&gt;&lt;P&gt;Suppose col1 contains data like&lt;/P&gt;&lt;PRE&gt;hortonworks
hortonworks
cloudera&lt;/PRE&gt;&lt;P&gt;Then you get&lt;/P&gt;&lt;PRE&gt;cloudera
hortonworks&lt;/PRE&gt;&lt;P&gt;2) Suppose I want to keep only the unique combinations of col1 and col2:&lt;/P&gt;&lt;PRE&gt;unique_two_fields = foreach given_input generate col1, col2;

unique_values = DISTINCT unique_two_fields;  -- DISTINCT operates only on relations&lt;/PRE&gt;&lt;P&gt;Suppose col1 and col2 contain data like&lt;/P&gt;&lt;PRE&gt;hortonworks,cloudera
hortonworks,cloudera
hortonworks,hortonworks&lt;/PRE&gt;&lt;P&gt;You get&lt;/P&gt;&lt;PRE&gt;hortonworks,cloudera
hortonworks,hortonworks

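-- A minimal sketch, assuming the relation and field names (A, name, age, gpa)
-- and the input path from the question's script. It shows one way to get the
-- unique names per age group: apply DISTINCT to a bag inside a nested FOREACH
-- block instead of calling it like a function. The aliases names and
-- unique_names are illustrative.
A = LOAD '/tmp/admin/data/gpa.txt' USING PigStorage(',') AS (name, age, gpa);
B = GROUP A BY age;
C = FOREACH B {
    names = A.name;                 -- project the name field from each group's bag
    unique_names = DISTINCT names;  -- DISTINCT applied to a relation (bag), not to a field
    GENERATE group, unique_names, SUM(A.gpa);
};
DUMP C;
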
&lt;/PRE&gt;&lt;P&gt;In short, first project the fields you want to make unique into their own relation, then apply the DISTINCT operator to that relation. If you want to perform aggregations, use GROUP and apply the aggregate functions to the grouped data.&lt;/P&gt;</description>
      <pubDate>Sat, 02 Jan 2016 23:29:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Error-executing-DISTINCT-Function-in-Pig/m-p/101182#M13912</guid>
      <dc:creator>bsuresh</dc:creator>
      <dc:date>2016-01-02T23:29:44Z</dc:date>
    </item>
  </channel>
</rss>

