
How to merge two relations with no duplicates using a Pig script


I have two relations:

Relation A
1 w
2 x
3 y

Relation B
1 z
2 x
3 y
4 K
5 L

I want to merge these relations into a third relation, with no duplicates, using a Pig script:

Relation C
1 w
2 x
3 y
4 K
5 L


Re: How to merge two relations with no duplicates using a Pig script


You could use a UNION followed by a DISTINCT:

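-- Combine all rows from both relations (UNION keeps duplicates).
C = UNION A, B;
-- Drop rows that are identical in every field.
D = DISTINCT C;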

If a schema is attached to the input relations, you may need something like the following:

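-- Load both inputs with an explicit two-column schema.
A = load 'file1.txt' using PigStorage(',') as (c1:chararray, c2:chararray);
B = load 'file2.txt' using PigStorage(',') as (c1:chararray, c2:chararray);
C = UNION A, B;
-- Grouping on both columns collapses identical rows into a single group.
D = GROUP C BY (c1, c2);
-- Emit one row per distinct (c1, c2) pair.
E = FOREACH D GENERATE group.c1, group.c2;
dump E;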
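Note that both DISTINCT and the GROUP on (c1, c2) only remove rows that are identical in every field, so with the sample data both (1,w) and (1,z) would survive. If the intent is that each key appears only once, with the row from A winning, one possible approach is a full outer join on the key. This is a minimal sketch, not a tested solution: it reuses the file names and schema from the script above and assumes c1 is unique within each input (repeated keys would multiply rows in the join).

```pig
-- Sketch: merge by key, keeping A's row when a key appears in both inputs.
-- Assumes c1 is unique within each file; repeated keys would multiply in the join.
A = load 'file1.txt' using PigStorage(',') as (c1:chararray, c2:chararray);
B = load 'file2.txt' using PigStorage(',') as (c1:chararray, c2:chararray);
-- A full outer join keeps keys found in only one of the inputs.
J = JOIN A BY c1 FULL OUTER, B BY c1;
-- For keys missing from A, the A:: fields are null, so fall back to B's values.
C = FOREACH J GENERATE
      (A::c1 IS NOT NULL ? A::c1 : B::c1) AS c1,
      (A::c1 IS NOT NULL ? A::c2 : B::c2) AS c2;
DUMP C;
```

With the sample data from the question this should give 1 w, 2 x, 3 y, 4 K, 5 L, although Pig does not guarantee output order.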