Created 01-02-2016 04:07 AM
I am trying to execute the following Pig script, but DISTINCT is not working. Am I missing anything? Please help.
A = LOAD '/tmp/admin/data/gpa.txt' using PigStorage(',') AS (name, age, gpa);
B = group A by age;
C = foreach B generate ABS(SUM(A.gpa)), DISTINCT(A.name), MIN(A.gpa)+MAX(A.gpa)/2, group;
dump C;
2016-01-02 04:03:21,049 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve DISTINCT using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse:
<file script.pig, line 6, column 40> Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve DISTINCT using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Created 01-02-2016 03:29 PM
DISTINCT in Pig is a relational operator, so it operates on relations rather than on individual fields. Take the following input:
given_input = load '/given/path' using PigStorage(',') as (col1, col2, col3);
Now consider these two situations.
1) Suppose I want to keep only the unique values in col1:
unique_col1 = foreach given_input generate col1;
unique_values = DISTINCT unique_col1;
(DISTINCT is applied to a relation, here unique_col1.)
Suppose col1 contains data like
hortonworks
hortonworks
cloudera
then you get
cloudera
hortonworks
2) Suppose I want to keep only the unique combinations of col1 and col2:
unique_two_fields = foreach given_input generate col1, col2;
unique_values = DISTINCT unique_two_fields;
(Again, DISTINCT is applied to a relation, here unique_two_fields.)
Suppose col1 and col2 contain data like
hortonworks,cloudera
hortonworks,cloudera
hortonworks,hortonworks
then you get
hortonworks,cloudera
hortonworks,hortonworks
In short, first project the fields you want to make unique into their own relation, then apply the DISTINCT operator to that relation. If you also need aggregations, group the data and apply the aggregate functions to the groups.
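For the script in the question, one option is a nested FOREACH block, where DISTINCT can be applied to the bag A.name inside each group. This is a sketch only, not run against your data. It reuses the load path and expressions from the question unchanged; the alias names is a name picked here for illustration.

```pig
A = LOAD '/tmp/admin/data/gpa.txt' USING PigStorage(',') AS (name, age, gpa);
B = GROUP A BY age;
C = FOREACH B {
    -- DISTINCT used as an operator on the bag of names within each age group
    -- ("names" is an illustrative alias, not from the original script)
    names = DISTINCT A.name;
    GENERATE ABS(SUM(A.gpa)), names, MIN(A.gpa)+MAX(A.gpa)/2, group;
};
DUMP C;
```

Because DISTINCT is an operator and not a function, Pig tries to look up DISTINCT(A.name) as a UDF and fails with "Could not resolve DISTINCT". Registering piggybank.jar does not change that, since there is no function of that name to find.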
Created 01-02-2016 04:19 AM
I have included the following REGISTER statement. Still I get the above error.
register '/usr/hdp/current/pig-client/lib/piggybank.jar';