Error executing DISTINCT Function in Pig

New Contributor

I am trying to execute the following Pig script, but DISTINCT is not working. Am I missing anything? Please help.

A = LOAD '/tmp/admin/data/gpa.txt' using PigStorage(',') AS (name, age, gpa);
B = group A by age;
C = foreach B generate ABS(SUM(A.gpa)), DISTINCT(A.name), MIN(A.gpa)+MAX(A.gpa)/2, group;
dump C;

2016-01-02 04:03:21,049 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve DISTINCT using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: 
<file script.pig, line 6, column 40> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve DISTINCT using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Accepted Solutions

Re: Error executing DISTINCT Function in Pig

Rising Star

@Vidya SK

DISTINCT in Pig is a relational operator, so it is applied to relations rather than to individual fields. Consider the following.

given_input = load '/given/path' using PigStorage(',') as (col1 ,col2,col3);

Consider the following situations.

1) Suppose I want to keep only the unique values in col1:

unique_col1 = foreach given_input generate col1;
unique_values = DISTINCT unique_col1;  -- DISTINCT operates on a relation, here unique_col1

Suppose col1 contains data like:

hortonworks
hortonworks
cloudera 

Then you get:

cloudera
hortonworks

2) Suppose I want to keep only the unique combinations of col1 and col2:

unique_two_fields = foreach given_input generate col1, col2;
unique_values = DISTINCT unique_two_fields;  -- again, DISTINCT operates on a relation

Suppose col1 and col2 contain data like:

hortonworks,cloudera
hortonworks,cloudera
hortonworks,hortonworks

Then you get:

hortonworks,cloudera
hortonworks,hortonworks

In short, first project the fields you want to de-duplicate into one relation, then apply the DISTINCT operator to that relation. If you also want to perform aggregations, group the data and apply the aggregate functions to the groups.
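
Applied to the script in the question, one possible rewrite is sketched below. It is untested and assumes that what is wanted is the unique names per age group. Pig allows DISTINCT inside a nested FOREACH block, where it operates on the bag belonging to the current group. The aliases names and uniq_names are made up for illustration; the path, field names and expressions are copied from the question.

```pig
-- Path and field names are taken from the question (no types are declared there either).
A = LOAD '/tmp/admin/data/gpa.txt' USING PigStorage(',') AS (name, age, gpa);
B = GROUP A BY age;
C = FOREACH B {
    -- Inside the nested block, A is the bag of rows for the current age.
    names = A.name;               -- hypothetical alias: project the name column
    uniq_names = DISTINCT names;  -- DISTINCT used as an operator on a relation, not called as a function
    GENERATE ABS(SUM(A.gpa)), uniq_names, MIN(A.gpa)+MAX(A.gpa)/2, group;
};
DUMP C;
```

Note that MIN(A.gpa)+MAX(A.gpa)/2 is kept exactly as in the question. Division binds tighter than addition, so if the midpoint of the range is what you are after, it would need to be written as (MIN(A.gpa)+MAX(A.gpa))/2.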

Re: Error executing DISTINCT Function in Pig

New Contributor

I have included the following REGISTER statement, but I still get the above error.

register '/usr/hdp/current/pig-client/lib/piggybank.jar';
