Created 08-02-2016 08:04 PM
Hi ,
am joining two relations(withoutschema) in pig and want to pick particular columns from both relations.
A = load 'data1' using PigStorage(',');($0,$1...$8)
B = load 'data2' using PigStoarge(',');($0..$4);
C = foreach(join A by($1,$2),B by($1,$2)) generate $0,$1,$4,$5,(how to select B relation columns)
Created 08-03-2016 05:55 PM
here's my solution, considering that result of join is sum of all fields, then if you have 10 columns in A and 4 columns in B, your result row will 14, you can cherry pick columns 1, 2,3 from A and 11, 12, 13 from B.
grunt> fs -cat email_list.csv;
1,Christine,Romero,cromero0@eventbrite.com
2,Sara,Hansen,shansen1@tinypic.com
3,Albert,Rogers,arogers2@marriott.com
4,Kimberly,Morrison,kmorrison3@irs.gov
5,Eugene,Baker,ebaker4@cbslocal.com
6,Ann,Alexander,aalexander5@hhs.gov
7,Kathleen,Reed,kreed6@youtu.be
8,Todd,Scott,tscott7@deliciousdays.com
9,Sharon,Mccoy,smccoy8@nature.com
10,Evelyn,Rice,erice9@narod.ru
grunt> fs -cat gender_list.csv;
1,Christine,Romero,Female
2,Sara,Hansen,Female
3,Albert,Rogers,Male
4,Kimberly,Morrison,Female
5,Eugene,Baker,Male
6,Ann,Alexander,Female
7,Kathleen,Reed,Female
8,Todd,Scott,Male
9,Sharon,Mccoy,Female
10,Evelyn,Rice,Female
grunt> A = load 'email_list.csv' using PigStorage(',');
grunt> B = load 'gender_list.csv' using PigStorage(',');
grunt> C = join A by ($0, $1, $2), B by ($0, $1, $2);
grunt> dump C;
(1,Christine,Romero,cromero0@eventbrite.com,1,Christine,Romero,Female)
(10,Evelyn,Rice,erice9@narod.ru,10,Evelyn,Rice,Female)
(2,Sara,Hansen,shansen1@tinypic.com,2,Sara,Hansen,Female)
(3,Albert,Rogers,arogers2@marriott.com,3,Albert,Rogers,Male)
(4,Kimberly,Morrison,kmorrison3@irs.gov,4,Kimberly,Morrison,Female)
(5,Eugene,Baker,ebaker4@cbslocal.com,5,Eugene,Baker,Male)
(6,Ann,Alexander,aalexander5@hhs.gov,6,Ann,Alexander,Female)
(7,Kathleen,Reed,kreed6@youtu.be,7,Kathleen,Reed,Female)
(8,Todd,Scott,tscott7@deliciousdays.com,8,Todd,Scott,Male)
(9,Sharon,Mccoy,smccoy8@nature.com,9,Sharon,Mccoy,Female)
grunt> D = foreach C generate $0, $1, $2, $3, $7;
grunt> dump D;
(1,Christine,Romero,cromero0@eventbrite.com,Female)
(10,Evelyn,Rice,erice9@narod.ru,Female)
(2,Sara,Hansen,shansen1@tinypic.com,Female)
(3,Albert,Rogers,arogers2@marriott.com,Male)
(4,Kimberly,Morrison,kmorrison3@irs.gov,Female)
(5,Eugene,Baker,ebaker4@cbslocal.com,Male)
(6,Ann,Alexander,aalexander5@hhs.gov,Female)
(7,Kathleen,Reed,kreed6@youtu.be,Female)
(8,Todd,Scott,tscott7@deliciousdays.com,Male)
(9,Sharon,Mccoy,smccoy8@nature.com,Female)
Created 08-02-2016 08:33 PM
@jayaprakash gadi please try this.. I haven't tested yet.
C = foreach(join A by($1,$2),B by($1,$2)) generate B.*
Created 08-03-2016 12:03 AM
am able to pick particular columns from relations A & B where relations have schema but if relations doesn't have any schema then it's not working.
Created 08-03-2016 02:22 AM
After join the elements from A are at positions $0 .. $8, the elements from B are at $9 .. $13. Also, observe Performance enhencers: Use types, and Project early and often which in your case means to remove un-needed elements before the join.
Created 08-04-2016 06:39 PM
it worked ..thanks
Created 08-03-2016 05:55 PM
here's my solution, considering that result of join is sum of all fields, then if you have 10 columns in A and 4 columns in B, your result row will 14, you can cherry pick columns 1, 2,3 from A and 11, 12, 13 from B.
grunt> fs -cat email_list.csv;
1,Christine,Romero,cromero0@eventbrite.com
2,Sara,Hansen,shansen1@tinypic.com
3,Albert,Rogers,arogers2@marriott.com
4,Kimberly,Morrison,kmorrison3@irs.gov
5,Eugene,Baker,ebaker4@cbslocal.com
6,Ann,Alexander,aalexander5@hhs.gov
7,Kathleen,Reed,kreed6@youtu.be
8,Todd,Scott,tscott7@deliciousdays.com
9,Sharon,Mccoy,smccoy8@nature.com
10,Evelyn,Rice,erice9@narod.ru
grunt> fs -cat gender_list.csv;
1,Christine,Romero,Female
2,Sara,Hansen,Female
3,Albert,Rogers,Male
4,Kimberly,Morrison,Female
5,Eugene,Baker,Male
6,Ann,Alexander,Female
7,Kathleen,Reed,Female
8,Todd,Scott,Male
9,Sharon,Mccoy,Female
10,Evelyn,Rice,Female
grunt> A = load 'email_list.csv' using PigStorage(',');
grunt> B = load 'gender_list.csv' using PigStorage(',');
grunt> C = join A by ($0, $1, $2), B by ($0, $1, $2);
grunt> dump C;
(1,Christine,Romero,cromero0@eventbrite.com,1,Christine,Romero,Female)
(10,Evelyn,Rice,erice9@narod.ru,10,Evelyn,Rice,Female)
(2,Sara,Hansen,shansen1@tinypic.com,2,Sara,Hansen,Female)
(3,Albert,Rogers,arogers2@marriott.com,3,Albert,Rogers,Male)
(4,Kimberly,Morrison,kmorrison3@irs.gov,4,Kimberly,Morrison,Female)
(5,Eugene,Baker,ebaker4@cbslocal.com,5,Eugene,Baker,Male)
(6,Ann,Alexander,aalexander5@hhs.gov,6,Ann,Alexander,Female)
(7,Kathleen,Reed,kreed6@youtu.be,7,Kathleen,Reed,Female)
(8,Todd,Scott,tscott7@deliciousdays.com,8,Todd,Scott,Male)
(9,Sharon,Mccoy,smccoy8@nature.com,9,Sharon,Mccoy,Female)
grunt> D = foreach C generate $0, $1, $2, $3, $7;
grunt> dump D;
(1,Christine,Romero,cromero0@eventbrite.com,Female)
(10,Evelyn,Rice,erice9@narod.ru,Female)
(2,Sara,Hansen,shansen1@tinypic.com,Female)
(3,Albert,Rogers,arogers2@marriott.com,Male)
(4,Kimberly,Morrison,kmorrison3@irs.gov,Female)
(5,Eugene,Baker,ebaker4@cbslocal.com,Male)
(6,Ann,Alexander,aalexander5@hhs.gov,Female)
(7,Kathleen,Reed,kreed6@youtu.be,Female)
(8,Todd,Scott,tscott7@deliciousdays.com,Male)
(9,Sharon,Mccoy,smccoy8@nature.com,Female)