Created 03-16-2017 12:34 PM
I have many files. one of which,say, header.csv, serves as a header file,i.e., it contains primary key(in database analogy) which servers as foreign key in the rest of the files.
Now, I want to do FOREACH and FILTER as follows:
A =LOAD 'header.csv' AS (Id:chararray,f1:chararrat,f2:chararray);
B = LOAD 'file1.csv' AS (Id:chararray,t1:chararray);
C = LOAD 'file2.csv' AS (Id:chararray)
..........
D = foreach A {
file1_filtered = FILTER file1 BY Id == A.Id;
file2_filtered = FILTER file2 BY Id == A.Id;
GENERATE file1_filtered,file2_filtered;
};
Finally I need to access the relations file1_filtered and file2_filtered. When I follow this approach I got the following error:
"ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: <line 2651, column 28> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)"
How can I achieve this in Pig?
Created 03-17-2017 06:17 AM
We can achieve this using JOIN as follows.
1. JOIN A and B BY Id.
B_joined = JOIN A by Id, B by Id;
2. JOIN A and C by Id:
C_joined = JOIN A by Id, C by Id;
Now, we can get the required fields of A and C from their respective joined data sets as follows:
B_filtered = FOREACH B_joined GENERATE B::Id,B::t1;
C_filtered =FOREACH C_joined GENERATE C::Id;
Created 03-17-2017 06:17 AM
We can achieve this using JOIN as follows.
1. JOIN A and B BY Id.
B_joined = JOIN A by Id, B by Id;
2. JOIN A and C by Id:
C_joined = JOIN A by Id, C by Id;
Now, we can get the required fields of A and C from their respective joined data sets as follows:
B_filtered = FOREACH B_joined GENERATE B::Id,B::t1;
C_filtered =FOREACH C_joined GENERATE C::Id;