Created on 02-07-2018 08:03 AM - edited 09-16-2022 05:50 AM
Hi,
I am new to HDFS/Pig and need some quick help. I just installed the Cloudera QuickStart VM using VMware.
When I run this script (the upper-case text sample):
data = LOAD '/home/cloudera/midsummer.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);
STORE upper_case INTO '/home/cloudera/midsummer2.txt';
The job takes 3 to 4 minutes, runs the map phase, and never runs the reduce phase. I see the following error on the job workflow page:
Cannot access: /user/hue/oozie/workspaces/hue-oozie-1452553957.19/${wf:appPath()}/pig-b118.pig/. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".
InvalidPathException: Invalid path name Invalid file name: /user/hue/oozie/workspaces/hue-oozie-1452553957.19/${wf:appPath()}/pig-b118.pig (error 400)
I logged in to Hue as the cloudera user and, on seeing this error, also added the cloudera user to the hadoop group. From Oozie I see the error: JA0189 - Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]
Your help will be appreciated.
Thanks
Regards
Santhosh
Created 02-12-2018 08:24 AM
I should have used the following code. It specifies the correct directory, one that Hadoop can resolve in HDFS.
data = LOAD '/user/cloudera/midsummer.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE UPPER(text);
STORE upper_case INTO '/user/cloudera/midsummerOutput2';
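For readers wondering why the path matters, here is a minimal Python sketch of the usual HDFS convention: a relative path resolves against the user's HDFS home directory, /user/<username>, while an absolute path is taken as-is. The function name resolve_hdfs_path is hypothetical and only models that rule; it is not part of Pig or Hadoop.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Illustrative model of HDFS path resolution (not a Hadoop API).

    Absolute paths are kept unchanged; relative paths are joined onto
    the user's HDFS home directory, assumed here to be /user/<user>.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/user", user, path))


# A relative path lands in the HDFS home directory of the user.
print(resolve_hdfs_path("midsummer.txt", "cloudera"))
# An absolute path is looked up in HDFS exactly as written.
print(resolve_hdfs_path("/home/cloudera/midsummer.txt", "cloudera"))
```

Under this model, '/home/cloudera/midsummer.txt' is looked up in HDFS at exactly that path, which normally exists only on the VM's local disk, whereas '/user/cloudera/midsummer.txt' points into the cloudera user's HDFS home directory.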
Created 02-07-2018 09:09 AM
We would like to consider Cloudera for development and eventual commercial use if I can get over these initial hurdles. Just to add: all I want is to run a simple Pig script using the Hue editor on a newly installed Cloudera QuickStart VM.
- Should I log in to Hue as cloudera?
- Does any additional configuration need to be done, or any software upgraded?
Created on 02-07-2018 04:11 PM - edited 02-07-2018 04:13 PM
Update: After some testing with a small data set, the following works (without the STORE clause):
data = LOAD '/home/cloudera/smallfile.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE UPPER(text);
But once we add a STORE or DUMP clause at the end of the script, those errors are thrown. This probably points to a permission issue between Hue and HDFS in the "reduce" phase, I guess.
I was hoping the Cloudera QuickStart VM would work out of the box. Calling on the Cloudera experts to help!
Please point me to any knowledge base or workarounds to avoid this problem.
Created 02-12-2018 07:51 AM
It looks like these Cloudera community help requests go into a black hole! I was able to find a solution myself. The following script works and was good enough for this trial. It appears that specifying the directory paths in the Pig script was the problem. I put all the scripts in my local directory and also told Pig/Hadoop to create the output directory under the current working directory.
data = LOAD 'midsummer.txt' as (text:CHARARRAY);
upper_case = FOREACH data GENERATE UPPER(text);
STORE upper_case INTO 'midsummerOutput';
Created 02-12-2018 01:46 PM
Congratulations on solving your issue, @SGeorge. Also, rest assured, your posts on the community do not enter a black hole from which there is no return. The community is a peer-to-peer forum, so there are cases where questions go unanswered or take some time to receive a reply.
Created on 02-13-2018 09:28 PM - edited 02-13-2018 09:31 PM
Just a quick note: you can run Pig in local mode as well as in MapReduce mode.
By default, LOAD looks for your data on HDFS in a tab-delimited file using the default load function, PigStorage.
Also, if you start Pig with pig -x local, which is local mode, it will look for the data on the local filesystem.
Nice that you found the fix, @SGeorge.
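To make the two points above concrete, here is a small Python sketch that models which filesystem a LOAD path is read from in each execution mode, and how a tab-delimited line is split into fields. Both function names are hypothetical and only illustrate the behaviour described in this post; they are not Pig APIs.

```python
def filesystem_for(exec_type: str = "mapreduce") -> str:
    """Model which filesystem Pig reads from for a given -x value.

    'mapreduce' is treated as the default, matching the post above.
    """
    if exec_type == "local":
        return "local filesystem"
    if exec_type == "mapreduce":
        return "HDFS"
    raise ValueError(f"execution type not covered by this sketch: {exec_type}")


def split_fields(line: str, delimiter: str = "\t") -> list:
    """Split one input line into fields on the delimiter (tab by default)."""
    return line.rstrip("\n").split(delimiter)


print(filesystem_for())          # default mode
print(filesystem_for("local"))   # as with: pig -x local
print(split_fields("first\tsecond\n"))
print(split_fields("Now, fair Hippolyta, our nuptial hour"))
```

A line of plain text with no tab characters comes through as a single field, which is why a one-column schema such as (text:CHARARRAY) is enough for a file like midsummer.txt.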