piggybank jar file does not exist

Contributor

Hi all,

I am new to Sandbox and am trying to run Pig on Microsoft Azure.

To load one of my tables, I need to use the piggybank jar. I have downloaded it and saved it to HDFS under the path /tmp/stackexchange.

Here is the code I am trying to run:

REGISTER /tmp/stackexchange/piggybank.jar;
RAW_LOGS1 = LOAD 'Query_1-50000.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE') AS (
    Id:long, PostTypeID:chararray, AcceptedAnswerID:chararray, ParentID:chararray,
    CreationDate:chararray, DeletionDate:chararray, Score:long, ViewCount:long, Body:chararray,
    OwnerUserID:chararray, OwnerDisplayName:chararray, LastEditorUserId:chararray,
    LastEditorDisplayName:chararray, LastEditDate:chararray, LastActivityDate:chararray,
    Title:chararray, Tags:chararray, AnswerCount:int, CommentCount:int, FavoriteCount:int,
    ClosedDate:chararray, CommunityOwnedDate:chararray);

However, I am being returned the error message:

2016-03-20 17:22:48,506 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/tmp/stackexchange/piggybank.jar' does not exist.

Does anyone know what could be wrong? Am I missing a step required to register the piggybank file perhaps?

Any help is greatly appreciated - thanks in advance.

1 ACCEPTED SOLUTION

I ran into a similar problem a while back, as described at https://martin.atlassian.net/wiki/x/C4BRAQ. Try replacing

REGISTER /tmp/stackexchange/piggybank.jar

with

REGISTER 'hdfs:///tmp/stackexchange/piggybank.jar'

and let us know if that works.
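
For reference, here is a rough sketch of how the start of the script might look with that change. It assumes the jar really is at /tmp/stackexchange/piggybank.jar in HDFS, as described in the question, and that a bare path in REGISTER is being looked up on the local filesystem rather than in HDFS, which would explain the "does not exist" error. The AS (...) schema from the question is left out here only to keep the example short.

```pig
-- Sketch only: point REGISTER at the jar through an explicit HDFS URI
REGISTER 'hdfs:///tmp/stackexchange/piggybank.jar';

-- Same loader call as in the question; append the AS (...) schema from the original script
RAW_LOGS1 = LOAD 'Query_1-50000.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE');
```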

Contributor

Brilliant - that works. Thanks!

Master Mentor

As long as you use HDP and you have pig client installed on your edgenode, you can find the piggybank jar at /usr/hdp/current/pig-client/lib/piggybank.jar. You don't need to download it separately or upload it to HDFS.

For an example, please see https://community.hortonworks.com/questions/20487/store-output-file-as-3-files-using-pig.html

Contributor

Hi Artem - thanks for the response. Can you please explain "As long as you use HDP and you have pig client installed on your edgenode"? Is this something additional I need to do/install?

I cannot locate the folder "/usr" on HDFS.

Thanks for your help,

Maeve

Master Mentor

@Maeve Ryan HDP installs are placed in /usr/hdp/<version>, so if you are on HDP, look for /usr/hdp on your local filesystem, not in HDFS. Then, in Ambari, make sure the Pig client is installed on the machine you're working on, and look for the jar in the directory I specified earlier.
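
As a minimal sketch of what that looks like in a script: this assumes an HDP install with the Pig client present on the machine where Pig is launched, so that the jar sits at the local path given earlier in this thread. If the file is not there, this REGISTER would fail in the same way as the original one.

```pig
-- Sketch only: register the piggybank jar from the local HDP client install (not HDFS)
REGISTER /usr/hdp/current/pig-client/lib/piggybank.jar;
```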

Contributor

Ah - understood now. This worked! Thank you 🙂