Created 03-21-2016 12:14 AM
Hi all,
I am new to Sandbox and am trying to run Pig on Microsoft Azure.
To load one of my tables, I need to use the piggybank jar. I have downloaded it and saved it to HDFS under /tmp/stackexchange.
Here is the code I am trying to run:
REGISTER /tmp/stackexchange/piggybank.jar;
RAW_LOGS1 = LOAD 'Query_1-50000.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE') AS (Id:long, PostTypeID:chararray, AcceptedAnswerID:chararray, ParentID:chararray, CreationDate:chararray, DeletionDate:chararray, Score:long, ViewCount:long, Body:chararray, OwnerUserID:chararray, OwnerDisplayName:chararray, LastEditorUserId:chararray, LastEditorDisplayName:chararray, LastEditDate:chararray, LastActivityDate:chararray, Title:chararray, Tags:chararray, AnswerCount:int, CommentCount:int, FavoriteCount:int, ClosedDate:chararray, CommunityOwnedDate:chararray);
However, I am being returned the error message:
2016-03-20 17:22:48,506 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/tmp/stackexchange/piggybank.jar' does not exist.
Does anyone know what could be wrong? Am I missing a step required to register the piggybank file perhaps?
Any help is greatly appreciated - thanks in advance.
Created 03-21-2016 01:00 AM
I had some trouble similar to this a while back, as shown at https://martin.atlassian.net/wiki/x/C4BRAQ. Try replacing
REGISTER /tmp/stackexchange/piggybank.jar
with
REGISTER 'hdfs:///tmp/stackexchange/piggybank.jar'
and let us know if that works.
Created 03-21-2016 06:16 PM
Brilliant - that works. Thanks!
Created 03-21-2016 01:09 AM
As long as you use HDP and have the Pig client installed on your edge node, you can find the piggybank jar at /usr/hdp/current/pig-client/lib/piggybank.jar. You don't need to download it separately or upload it to HDFS.
Please see this for an example: https://community.hortonworks.com/questions/20487/store-output-file-as-3-files-using-pig.html
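Roughly, registering the local jar could look like the sketch below. It assumes the jar really is at the path given above on the node where you launch Pig, and it reuses the file name from the question. The AS (...) schema is left out for brevity, so fields would come back untyped until you add it.

```pig
-- Register piggybank from the local HDP client install, not from HDFS.
-- Assumption: the Pig client is installed on this node, so this path exists locally.
REGISTER /usr/hdp/current/pig-client/lib/piggybank.jar;

-- Load the CSV from the question; append the AS (...) schema from the
-- original script to get named, typed fields.
RAW_LOGS1 = LOAD 'Query_1-50000.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE');
```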
Created 03-21-2016 11:14 AM
Hi Artem - thanks for the response. Can you please explain "As long as you use HDP and you have pig client installed on your edgenode"? Is this something additional I need to do/install?
I cannot locate the folder "/usr" on hdfs.
Thanks for your help,
Maeve
Created 03-21-2016 12:24 PM
HDP installs are placed in /usr/hdp/version, so if you are on HDP, look for /usr/hdp on your local filesystem, not in HDFS. Then, in Ambari, make sure the Pig client is installed on the machine you're on. Look for the jar in the directory I specified earlier. @Maeve Ryan
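If it helps, both locations can be checked from the Grunt shell. This is only a sketch using the paths mentioned in this thread: fs runs an HDFS filesystem command, while sh runs a command on the local machine.

```pig
-- Lists the directory in HDFS (where the downloaded jar was uploaded)
fs -ls /tmp/stackexchange

-- Lists the jar on the local filesystem of the node running Pig
sh ls -l /usr/hdp/current/pig-client/lib/piggybank.jar
```

If the second command reports that the file does not exist, the Pig client is probably not installed on that machine.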
Created 03-21-2016 07:04 PM
Ah - understood now. This worked! Thank you 🙂