How to merge two relations using Pig scripts with no duplicates

I have two relations:

Relation A
1 w
2 x
3 y

Relation B
1 z
2 x
3 y
4 K
5 L

I want to merge these relations into a third relation with no duplicates, using a Pig script:

Relation C
1 w
2 x
3 y
4 K
5 L

Re: How to merge two relations using Pig scripts with no duplicates

You could use a UNION followed by DISTINCT:

C = UNION A, B;
D = DISTINCT C;
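To see what this does with the sample data from the question, here is a minimal self-contained sketch. The file names `a.txt` and `b.txt`, the comma delimiter, and the field names `id` and `val` are assumptions for illustration only; they are not from the original post.

```pig
-- Assumed inputs (hypothetical files, one comma-separated row per line):
--   a.txt: 1,w / 2,x / 3,y
--   b.txt: 1,z / 2,x / 3,y / 4,K / 5,L
A = LOAD 'a.txt' USING PigStorage(',') AS (id:int, val:chararray);
B = LOAD 'b.txt' USING PigStorage(',') AS (id:int, val:chararray);

-- UNION concatenates the two relations and keeps duplicates.
C = UNION A, B;

-- DISTINCT removes tuples that are identical in every field.
D = DISTINCT C;

DUMP D;
```

Note that DISTINCT compares whole tuples, so (2,x) and (3,y) collapse to one row each, but (1,w) and (1,z) both remain because they differ in the second field.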

If there is a schema attached to the input relations, you may need the following instead, which groups on every field and emits each group key once:

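-- Load both inputs with the same schema
A = LOAD 'file1.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);
B = LOAD 'file2.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);
C = UNION A, B;
-- Group on all fields so identical rows fall into the same group
D = GROUP C BY (c1, c2);
-- Emit one row per group, taken from the group key
E = FOREACH D GENERATE group.c1, group.c2;
DUMP E;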
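Both approaches above deduplicate on the full row. The expected Relation C in the question lists 1 w but not 1 z, which suggests one row per key with Relation A's value taking precedence. If that is the intent, a rough sketch could tag each row with its source and keep the preferred row per key. The `src` field and the aliases `A1`, `B1`, `U`, `G`, `P`, `R`, `sorted`, and `best` are hypothetical names introduced for this example.

```pig
A = LOAD 'file1.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);
B = LOAD 'file2.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray);

-- Tag each row with a priority: 0 for A (preferred), 1 for B.
A1 = FOREACH A GENERATE c1, c2, 0 AS src;
B1 = FOREACH B GENERATE c1, c2, 1 AS src;

U = UNION A1, B1;

-- Group on the key only, then keep the lowest-priority-number row per key.
G = GROUP U BY c1;
P = FOREACH G {
    sorted = ORDER U BY src ASC;
    best = LIMIT sorted 1;
    GENERATE FLATTEN(best) AS (c1, c2, src);
};

-- Drop the helper field.
R = FOREACH P GENERATE c1, c2;
DUMP R;
```

With the sample data, key 1 would keep w from A rather than z from B, and keys 4 and 5 would come from B alone. This assumes each key appears at most once within each input; if a key repeats inside one file, which of its rows survives is not determined by this sketch.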