Error Handling during Pig LOAD Function

Expert Contributor

Does anyone have experience with how Pig handles erroneous tuples during LOAD?

For example, suppose we LOAD 10 comma-delimited lines using PigStorage(','), but the 9th line of the input is pipe-delimited. What controls do we have over how these tuples are parsed and which variable (relation) they are assigned to?

Ideally, I'd like to have one relation loaded with the rows that parsed successfully and another relation holding the rows that were not parsed properly.

4 REPLIES

Master Mentor

@Wes Floyd you can use an IF condition (for example in SPLIT), or, since in your case you use PigStorage(','), you can filter on commas and filter out the pipe-delimited rows. Then load those rows again with PigStorage('|').
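The two-pass idea can be sketched in plain Python (not Pig) to show what each pass ends up holding: lines containing a pipe are set aside and parsed on '|', the rest are parsed on ','. This is only an illustration under assumptions: every field is taken to be an integer, `split_by_delimiter` is a hypothetical name, and a line that is neither valid comma- nor pipe-delimited integers would raise `ValueError` here instead of being routed anywhere.

```python
from typing import List, Tuple

Row = Tuple[int, ...]


def split_by_delimiter(lines: List[str]) -> Tuple[List[Row], List[Row]]:
    """Mimic two loads: one on ',' and one on '|' (hypothetical helper).

    Assumes every field is an integer.
    """
    comma_rows: List[Row] = []
    pipe_rows: List[Row] = []
    for line in lines:
        line = line.rstrip("\n")
        if "|" in line:
            # Filtered out of the comma pass; re-parsed with the pipe delimiter.
            pipe_rows.append(tuple(int(f) for f in line.split("|")))
        else:
            comma_rows.append(tuple(int(f) for f in line.split(",")))
    return comma_rows, pipe_rows


if __name__ == "__main__":
    commas, pipes = split_by_delimiter(["1,2,3", "4,5,6", "7|8|9"])
    print(commas)  # [(1, 2, 3), (4, 5, 6)]
    print(pipes)   # [(7, 8, 9)]
```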

Here's an example with SPLIT:

A = LOAD 'data' AS (f1:int,f2:int,f3:int);

DUMP A;                
(1,2,3)
(4,5,6)
(7,8,9)        

SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);

DUMP X;
(1,2,3)
(4,5,6)

DUMP Y;
(4,5,6)

DUMP Z;
(1,2,3)
(7,8,9)

Master Mentor

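@Wes Floyd @Benjamin Leonhardi I was also thinking of loading with PigStorage() without a delimiter argument (it splits on tabs by default, so a comma- or pipe-delimited line with no tabs arrives as a single field), then using a regex, SPLIT, or FILTER to route the rows to separate output files.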
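A plain-Python sketch of that routing step: each raw line is kept as one string, matched against a strict regular expression, and appended to either a good or a bad list, the analogue of two output relations. The pattern assumes exactly three comma-separated integers, and `route_lines` and `ROW_PATTERN` are hypothetical names. Pig's MATCHES operator should likewise test the whole string, so a similar pattern may carry over, but that is worth checking against your Pig version.

```python
import re
from typing import List, Tuple

# Assumed row format: exactly three comma-separated integers, nothing else.
ROW_PATTERN = re.compile(r"-?\d+,-?\d+,-?\d+")


def route_lines(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Return (good, bad); bad lines are kept verbatim for later inspection."""
    good: List[str] = []
    bad: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        # fullmatch rejects partial matches such as "1,2,3,4" or "1|2|3".
        if ROW_PATTERN.fullmatch(line):
            good.append(line)
        else:
            bad.append(line)
    return good, bad


if __name__ == "__main__":
    good, bad = route_lines(["1,2,3", "4,5,6", "7|8|9", "1,2"])
    print(good)  # ['1,2,3', '4,5,6']
    print(bad)   # ['7|8|9', '1,2']
```

Keeping the rejected lines as raw strings, rather than dropping them, is what gives you the "holding" relation for rows that did not parse.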

Master Guru

It would be really convenient if the PigStorage SerDe logic also existed as Pig functions. Then one could load each line as a string, check whether it is valid with SPLIT, and then parse it into a tuple.

Something like:

A = LOAD 'myfile' USING TextLoader() AS (line:chararray);

SPLIT A INTO GOODDATA IF PigStorage_valid(line), BADDATA OTHERWISE;

C = FOREACH GOODDATA GENERATE FLATTEN(PigStorage_parse(line));

...

But since these functions don't exist, I think the only options are to write them yourself or, as Artem says, use a regex, FILTER, and so on to verify correctness, write the valid rows out, and load them again with PigStorage.
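If you take the "write these functions yourself" route, one option is a Jython UDF. Everything here is a hypothetical sketch: the file name `row_udfs.py`, the functions `is_valid_row` and `parse_row`, the comma delimiter, and the three-integer layout are assumptions, and the Pig statements in the header comment are an untested outline of how it might be registered and called. The `outputSchema` fallback exists only so the module can also be exercised in plain Python outside Pig; the code avoids Python 3-only syntax because Jython targets Python 2.7.

```python
# row_udfs.py: hypothetical Jython UDFs for validating and parsing a raw line.
# Assumed Pig-side usage (untested outline):
#   REGISTER 'row_udfs.py' USING jython AS rowudfs;
#   raw = LOAD 'myfile' USING TextLoader() AS (line:chararray);
#   SPLIT raw INTO good IF rowudfs.is_valid_row(line) == 1, bad OTHERWISE;
#   parsed = FOREACH good GENERATE FLATTEN(rowudfs.parse_row(line));

try:
    outputSchema  # provided by Pig when the file runs as a Jython UDF
except NameError:
    # Outside Pig (e.g. local testing), make the decorator a no-op.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

DELIMITER = ","   # assumed delimiter
FIELD_COUNT = 3   # assumed number of fields per row


@outputSchema("valid:int")
def is_valid_row(line):
    """Return 1 if the line has FIELD_COUNT integer fields, else 0."""
    if line is None:
        return 0
    fields = line.split(DELIMITER)
    if len(fields) != FIELD_COUNT:
        return 0
    for field in fields:
        try:
            int(field)
        except ValueError:
            return 0
    return 1


@outputSchema("t:tuple(f1:int,f2:int,f3:int)")
def parse_row(line):
    """Parse a line already accepted by is_valid_row into a tuple of ints."""
    return tuple(int(field) for field in line.split(DELIMITER))
```

Returning 0/1 instead of a boolean keeps the SPLIT condition an explicit integer comparison, and rows rejected by `is_valid_row` stay in `bad` as raw strings.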
