Pig Load - ERROR 2118: Input path does not exist
Labels: Apache Pig, HDFS, Training
Created on 03-06-2017 08:13 PM - edited 09-16-2022 08:43 AM
I'm a newbie at Pig scripting and am just working through some examples (the Cloudera on-demand training, to be specific). Anyway, I load a file:
hdfs dfs -put $ADIR/data/ad_data1.txt /dualcore/
Then I check that the directory has the proper permissions via hdfs dfs -ls /
I can see it's chmod 777 for /dualcore, and I also checked that /dualcore/ad_data1.txt is set properly in HDFS.
Now when I try to run the first_etl.pig script with pig -x local first_etl.pig, I get the following:
ERROR: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/dualcore/ad_data1.txt
QUESTION: The file is at the root, /dualcore/ad_data1.txt. When I cat the file [hdfs dfs -cat /dualcore/ad_data1.txt] it displays the data. Do I need to specify something other than LOAD '/dualcore/ad_data1.txt'?
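For what it's worth, the scheme in the error message (file:/dualcore/ad_data1.txt) is the useful clue: Pig is resolving the path against the local filesystem, not HDFS. A quick hedged check along those lines (same paths as above, run on the machine executing the script):

```shell
# The path Pig complained about, resolved locally -- this will likely fail,
# because the file was only put into HDFS:
ls /dualcore/ad_data1.txt

# The same path resolved against HDFS -- this should succeed:
hdfs dfs -ls /dualcore/ad_data1.txt
```

If the first command fails while the second succeeds, the file exists only in HDFS and the problem is which filesystem Pig is pointed at, not the LOAD path itself.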
SCRIPT:
data = LOAD '/dualcore/ad_data1.txt' USING PigStorage(':') AS (keyword:chararray,
campaign_id:chararray,
date:chararray,
time:chararray,
display_site:chararray,
was_clicked:int,
cpc:int,
country:chararray,
placement:chararray);
reordered = FOREACH data GENERATE campaign_id,
date,
time,
UPPER(TRIM(keyword)),
display_site,
placement,
was_clicked,
cpc;
STORE reordered INTO '/dualcore/ad_data1/';
Created 03-06-2017 08:34 PM
The issue was that I was running first_etl.pig as
pig -x local first_etl.pig, which runs it in local mode against the local filesystem, when what I wanted was to run it on the Hadoop cluster. Running it as pig first_etl.pig fires it off in MapReduce mode and finds the file.
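In other words, the execution mode decides which filesystem LOAD and STORE paths resolve against. A minimal sketch of the two invocations (mapreduce is Pig's default mode when no -x flag is given):

```shell
# Local mode: '/dualcore/ad_data1.txt' resolves on the local disk (file:///),
# which is why the error showed the file:/ scheme:
pig -x local first_etl.pig

# MapReduce mode (the default): the same path resolves against HDFS:
pig first_etl.pig
pig -x mapreduce first_etl.pig   # equivalent, with the mode spelled out
```

The script itself doesn't change between modes; only the filesystem the relative paths are resolved against does.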
