
Running Pig scripts from HUE editor - Job gets killed

Explorer

Hi,

I am new to HDFS/Pig and need some quick help. I just installed the Cloudera QuickStart VM using VMware.

When I run this script (which converts a text sample to upper case):

 

data = LOAD '/home/cloudera/midsummer.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);
STORE upper_case INTO '/home/cloudera/midsummer2.txt';

 

It runs the map phase for three to four minutes but never gets to the reduce phase. I see the following error on the job workflow page:

Cannot access: /user/hue/oozie/workspaces/hue-oozie-1452553957.19/${wf:appPath()}/pig-b118.pig/. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".

InvalidPathException: Invalid path name Invalid file name: /user/hue/oozie/workspaces/hue-oozie-1452553957.19/${wf:appPath()}/pig-b118.pig (error 400)

 

I logged in to Hue as the cloudera user, and after seeing this error I also added cloudera to the hadoop group. From Oozie I see the error: JA0189 - Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]

 

Your help will be appreciated.

Thanks 

Regards

Santhosh

 

1 ACCEPTED SOLUTION

Explorer

I should have used the following code, which specifies paths in the HDFS directory that Hadoop understands, rather than local filesystem paths.

 

data = LOAD '/user/cloudera/midsummer.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE UPPER(text);
STORE upper_case INTO '/user/cloudera/midsummerOutput2';
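
Note that the input file also has to exist at that HDFS path first. A minimal sketch of the upload step, assuming the file starts out at /home/cloudera/midsummer.txt on the local filesystem (Pig's fs command runs HDFS shell commands from the Grunt shell; the same step works as hdfs dfs -copyFromLocal from a terminal):

-- Copy the local input file into the cloudera user's HDFS home
-- directory so the LOAD above can find it.
fs -copyFromLocal /home/cloudera/midsummer.txt /user/cloudera/midsummer.txt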


6 REPLIES

Explorer

We would like to consider Cloudera for development and eventual commercial use if I can get past these initial hurdles. Just to add: I simply want to run a basic Pig script from the Hue editor on a newly installed Cloudera QuickStart VM.

- Should I log in to Hue as cloudera?

- Is any additional configuration needed, or does any software need to be upgraded?

 

Explorer

Update: after some testing with a small data set, the following works (without the STORE clause):

data = LOAD '/home/cloudera/smallfile.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE UPPER(text);

 

But once we add a STORE or DUMP clause at the end of the script, those errors are thrown. (Pig does not actually execute anything until a STORE or DUMP forces it to, which is why the LOAD-only script appears to succeed.) I guess it is pointing to some permission issue between Hue and HDFS in the "reduce" step.
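
One quick way to confirm what is happening is to check whether the path the script loads actually exists on HDFS. Pig's fs command runs HDFS shell commands from the Grunt shell (the equivalent terminal command is hdfs dfs -ls):

-- If /home/cloudera/smallfile.txt only exists on the LOCAL filesystem,
-- this listing fails. STORE/DUMP is the first point where Pig actually
-- reads the data, which is why the error only shows up then.
fs -ls /home/cloudera/smallfile.txt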

 

I was hoping the Cloudera QuickStart VM would work out of the box. Calling on the Cloudera experts to help!

Please point me to any knowledge base article or workaround for this problem.

Explorer

Looks like these Cloudera community help requests go into a black hole! I was able to find a solution myself. The following script works and was good enough for this trial. It appears that the way directory paths were specified in the Pig script was the problem. I put all the scripts in my local directory and let Pig/Hadoop create the output directory under the current working directory.

 

data = LOAD 'midsummer.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE UPPER(text);
STORE upper_case INTO 'midsummerOutput';
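
For reference, a relative path in a script run on the cluster resolves against the user's HDFS home directory rather than the local working directory, so the output should land in /user/cloudera/midsummerOutput (assuming the QuickStart VM's default cloudera user). A quick check from the Grunt shell:

-- Relative paths resolve against /user/cloudera on the QuickStart VM,
-- so this is where the STORE above actually wrote its output.
fs -ls /user/cloudera/midsummerOutput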

 


Community Manager

Congratulations on solving your issue, @SGeorge. Also, rest assured, your posts on the community do not enter a black hole from which there is no return. The community is a peer-to-peer forum, so there are cases where questions go unanswered or take some time to receive a reply.

 

 


Cy Jervis, Manager, Community Program

Champion

Just some quick info: you can run Pig in local mode as well as in MapReduce mode.

By default, LOAD looks for your data on HDFS in a tab-delimited file, using the default load function PigStorage.

Also, if you start Pig with pig -x local (local mode), it will look for files on the local filesystem instead.
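
A minimal sketch of the difference (upper.pig is a hypothetical script name, and the paths assume the QuickStart VM):

-- Run with: pig -x local upper.pig      (paths resolve on the local filesystem)
-- Run with: pig -x mapreduce upper.pig  (paths resolve on HDFS; the default)
data = LOAD '/home/cloudera/midsummer.txt' as (text:CHARARRAY);
-- PigStorage is the default loader and splits fields on tabs; pass a
-- delimiter to override, e.g. USING PigStorage(',') for comma-separated input.
upper_case = FOREACH data GENERATE UPPER(text);
DUMP upper_case;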

Nice that you found the fix, @SGeorge.