Support Questions

Find answers, ask questions, and share your expertise

How to FILTER in nested foreach

avatar
Expert Contributor

I have many files. one of which,say, header.csv, serves as a header file,i.e., it contains primary key(in database analogy) which servers as foreign key in the rest of the files.

Now, I want to do FOREACH and FILTER as follows:

A =LOAD 'header.csv' AS (Id:chararray,f1:chararrat,f2:chararray);

B = LOAD 'file1.csv' AS (Id:chararray,t1:chararray);

C = LOAD 'file2.csv' AS (Id:chararray)

..........

D = foreach A {

file1_filtered = FILTER file1 BY Id == A.Id;

file2_filtered = FILTER file2 BY Id == A.Id;

GENERATE file1_filtered,file2_filtered;

};

Finally I need to access the relations file1_filtered and file2_filtered. When I follow this approach I got the following error:

"ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: <line 2651, column 28> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)"

How can I achieve this in Pig?

1 ACCEPTED SOLUTION

avatar
Expert Contributor

We can achieve this using JOIN as follows.

1. JOIN A and B BY Id.

B_joined = JOIN A by Id, B by Id;

2. JOIN A and C by Id:

C_joined = JOIN A by Id, C by Id;

Now, we can get the required fields of A and C from their respective joined data sets as follows:

B_filtered = FOREACH B_joined GENERATE B::Id,B::t1;

C_filtered =FOREACH C_joined GENERATE C::Id;

View solution in original post

1 REPLY 1

avatar
Expert Contributor

We can achieve this using JOIN as follows.

1. JOIN A and B BY Id.

B_joined = JOIN A by Id, B by Id;

2. JOIN A and C by Id:

C_joined = JOIN A by Id, C by Id;

Now, we can get the required fields of A and C from their respective joined data sets as follows:

B_filtered = FOREACH B_joined GENERATE B::Id,B::t1;

C_filtered =FOREACH C_joined GENERATE C::Id;