Field with empty or no data causing error in pig

Expert Contributor

Apache Pig version 0.12.1.2.1.7.0-784

I have data where one of the fields is empty, for example:

2015,,08
2015,,09
2015,,11
2015,,04
2015,,05

Now I run the following Pig commands:

grunt> given_input = load '/pigtest/flightdelays/' using PigStorage(',') as (year,month,day);
grunt> ori = foreach given_input generate month;
grunt> illustrate ori;

This generates an error: Caused by: java.lang.RuntimeException: No (valid) input data found!

When I replace the loader with CSVExcelStorage:

grunt> given_input = load '/pigtest/flightdelays/' using org.apache.pig.piggybank.storage.CSVExcelStorage(',') as (year,month,day);
grunt> ori = foreach given_input generate month;
grunt> illustrate ori;

I get output like this:

-------------------------------------------------------------------------------
| given_input     | year:bytearray    | month:bytearray    | day:bytearray    |
-------------------------------------------------------------------------------
|                 | 2015              |                    | 05               |
-------------------------------------------------------------------------------
--------------------------------
| ori     | month:bytearray    |
--------------------------------
|         |                    |
--------------------------------

So I would like to know:

1) What is the problem with PigStorage?

2) Is it a loader problem or a Pig version problem?

3) If I want to use PigStorage here, how should I do it?

It is not only illustrate; dump behaves the same way.

1 ACCEPTED SOLUTION

Master Guru

Are you sure that dump behaves the same? If I do (using your data):

a = load '/tmp/test' using PigStorage(',') as (year,month,day);

dump a;

(2015,,08)(2015,,09)...

And if I do

b = foreach a generate month;

dump b;

()()()

It looks to me like PigStorage works perfectly fine with dump.

If I use illustrate, everything goes wrong though. After using illustrate, even the dump command fails with a NullPointerException. So not only does it not work correctly, it also breaks the grunt shell until I restart it.

I think the problem is the illustrate command, which is not too surprising since this is the warning on top of it in the Pig docs:

Illustrate:

(Note! This feature is NOT maintained at the moment. We are looking for someone to adopt it.)
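
As a possible workaround while illustrate is unreliable, a small sample can be checked with describe, limit, and dump instead. The sketch below assumes the same /tmp/test input as above. The alias names sample_rows and with_month are made up for illustration, and the is not null filter relies on PigStorage loading an empty field as null.

```pig
-- Load the comma-delimited file; the empty middle field is read as null.
a = load '/tmp/test' using PigStorage(',') as (year, month, day);

-- Show the schema of the relation without running a job.
describe a;

-- Inspect a handful of rows instead of using illustrate.
sample_rows = limit a 5;
dump sample_rows;

-- Optionally keep only rows where month is present (hypothetical alias).
with_month = filter a by month is not null;
dump with_month;
```

With the data from the question, the last dump would be expected to return no rows, since every month field is empty.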

3 REPLIES

Expert Contributor

@Benjamin Leonhardi

I used dump after illustrate, so I got the error. So the problem is with the illustrate command.

Actually, I have a habit of running illustrate after every Pig command I use in the grunt shell to check the output.

Master Guru

It looks like a very useful command for debugging. I had never used it before. Shame it seems to be broken.