I am trying to generate sequence numbers for the records in a file. I used RANK, but it fails for files larger than 10 GB. Here is the code:

temp = LOAD 'abc.txt' using PigStorage(';','-tagFile');
test = RANK temp;
DUMP test;

1 ACCEPTED SOLUTION

Master Mentor

@Koti P

I don't see a problem with your code. I was able to run it as-is on the HDP 2.4 Sandbox:

temp = LOAD 'abc.txt' using PigStorage(';','-tagFile');
test = RANK temp;
DUMP test;

My abc.txt looks like this:

David,1,N
Tete,2,N
Ranjit,3,M
Ranjit,3,P
David,4,Q
David,4,Q
Jillian,8,Q
JaePak,7,Q
Michael,8,T
Jillian,8,Q
Jose,10,V

And my output looks like this:

(1,abc.txt,David,1,N)
(2,abc.txt,Tete,2,N)
(3,abc.txt,Ranjit,3,M)
(4,abc.txt,Ranjit,3,P)
(5,abc.txt,David,4,Q)
(6,abc.txt,David,4,Q)
(7,abc.txt,Jillian,8,Q)
(8,abc.txt,JaePak,7,Q)
(9,abc.txt,Michael,8,T)
(10,abc.txt,Jillian,8,Q)
(11,abc.txt,Jose,10,V)

I used Tez as the execution engine:

pig -x tez
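
For anyone who wants to check the expected numbering without a cluster, here is a minimal Python sketch of what the script above produces. It is only a local emulation, not Pig: it assumes RANK without a BY clause simply prepends a sequential number to each tuple, and that '-tagFile' prepends the source file name. The names rank_with_tag_file and format_row are made up for this illustration.

```python
def rank_with_tag_file(lines, file_name, delimiter=";"):
    """Emulate RANK (no BY clause) over rows loaded with
    PigStorage(delimiter, '-tagFile'). Hypothetical helper, not a Pig API."""
    rows = []
    for seq, line in enumerate(lines, start=1):
        # PigStorage splits each line on the delimiter; with ';' and
        # comma-separated data, the whole line stays a single field.
        fields = line.rstrip("\n").split(delimiter)
        # -tagFile puts the file name first; RANK then prepends the sequence number.
        rows.append((seq, file_name, *fields))
    return rows


def format_row(row):
    """Render a row the way DUMP prints a tuple: (f1,f2,...)."""
    return "(" + ",".join(str(value) for value in row) + ")"


sample = ["David,1,N", "Tete,2,N", "Ranjit,3,M"]
ranked = rank_with_tag_file(sample, "abc.txt")
for row in ranked:
    print(format_row(row))
```

Run on the first three lines of abc.txt, this prints the first three rows of the output shown above. Because the file is comma-separated while the script splits on ';', each line is loaded as one field, which is why the DUMP output still looks comma-separated.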

2 REPLIES

Thanks for the answer. I was out of town and could not get back sooner. I tested it, and it works when run with the Tez engine. Is there any way to run this from an Oozie workflow? I have not been able to test it there because Oozie runs the script in MapReduce mode.