How to merge two relations with no duplicates using a Pig script

I have two relations.

Relation A:
1 w
2 x
3 y

Relation B:
1 z
2 x
3 y
4 K
5 L

I want to merge these into a third relation with no duplicates, using a Pig script.

Relation C (expected result):
1 w
2 x
3 y
4 K
5 L

Re: How to merge two relations with no duplicates using a Pig script

You could use a UNION followed by DISTINCT, which removes rows that are identical in every field:

C = UNION A, B;
D = DISTINCT C;

If the input relations are loaded with a schema, you can instead group on all fields and project the group key:

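A = LOAD 'file1.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);
B = LOAD 'file2.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);
C = UNION A, B;
-- Grouping on both fields collapses identical rows into a single group.
D = GROUP C BY (c1, c2);
-- Emit one row per group, taken from the group key.
E = FOREACH D GENERATE group.c1, group.c2;
DUMP E;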
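
One caveat: both snippets above only drop rows that match in every field, so with the sample data the result would still contain both 1 w and 1 z. The expected Relation C keeps 1 w and drops 1 z, which suggests the goal is one row per key, preferring A when both relations have the key. A minimal sketch of that, assuming the first column is the key and the same comma-delimited files as above (the src tag column and the A_tagged, B_tagged, by_src and top1 aliases are names introduced here for illustration):

```pig
A = LOAD 'file1.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);
B = LOAD 'file2.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);

-- Tag each row with its source so rows from A sort ahead of rows from B.
A_tagged = FOREACH A GENERATE c1, c2, 0 AS src;
B_tagged = FOREACH B GENERATE c1, c2, 1 AS src;

C = UNION A_tagged, B_tagged;
G = GROUP C BY c1;

-- For each key, keep only the row with the lowest src value (A before B).
E = FOREACH G {
    by_src = ORDER C BY src ASC;
    top1 = LIMIT by_src 1;
    GENERATE FLATTEN(top1.(c1, c2));
};
DUMP E;
```

With the sample relations this should leave one row for each of the keys 1 to 5, with 1 w coming from A. The output order is not guaranteed, and if A itself has several rows for the same key, which one survives is arbitrary.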