Archives of Support Questions (Read Only)

This is an archived board kept for historical reference. Information and links may no longer be available or relevant.

Pig Load - ERROR 2118: Input path does not exist

Explorer

I'm a newbie at Pig scripting and am just walking through some examples (the Cloudera on-demand training, to be specific). Anyway, I first load a file into HDFS:

 

hdfs dfs -put $ADIR/data/ad_data1.txt /dualcore/

 

Then I check that the directory has the proper permissions via hdfs dfs -ls /.

I can see /dualcore is chmod 777, and /dualcore/ad_data1.txt is also set properly in HDFS.

 

Now when I run the script with pig -x local first_etl.pig I get the following:

 

ERROR: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/dualcore/ad_data1.txt

 

QUESTION: The file is at the root, /dualcore/ad_data1.txt. When I cat the file [hdfs dfs -cat /dualcore/ad_data1.txt] it displays the data. Do I need to specify something other than LOAD '/dualcore/ad_data1.txt'?

 

SCRIPT:

data = LOAD '/dualcore/ad_data1.txt' USING PigStorage(':') AS (keyword:chararray,
campaign_id:chararray,
date:chararray,
time:chararray,
display_site:chararray,
was_clicked:int,
cpc:int,
country:chararray,
placement:chararray);

reordered = FOREACH data GENERATE campaign_id,
date,
time,
UPPER(TRIM(keyword)),
display_site,
placement,
was_clicked,
cpc;

STORE reordered INTO '/dualcore/ad_data1/';
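For anyone following along: PigStorage(':') simply splits each line on colons and maps the pieces positionally onto the AS (...) schema. A quick shell sketch of that split (the sample line and its values are invented for illustration; they are not from the actual training data):

```shell
# A hypothetical colon-delimited record (field values invented for illustration):
sample='discount:A3:05/12/2013:14.30:news.example.com:0:106:USA:TOP'

# PigStorage(':') splits on the delimiter; the nine fields map positionally
# onto keyword, campaign_id, date, time, display_site, was_clicked, cpc,
# country, placement from the script's AS (...) schema.
IFS=':' read -r keyword campaign_id date time display_site \
  was_clicked cpc country placement <<< "$sample"

echo "keyword=$keyword cpc=$cpc country=$country"
# prints: keyword=discount cpc=106 country=USA
```

Note that this only works cleanly when no field itself contains the delimiter, which is why the training data uses a colon-free time format.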

1 ACCEPTED SOLUTION

Explorer
Argggg. OK, I need to find a wall and pound my head against it.

The issue was that I was running the script as pig -x local first_etl.pig, which runs Pig locally and makes it look for a local file (hence the file:/dualcore/ad_data1.txt path in the error), whereas what I want is to run this on the Hadoop cluster. Running it as pig first_etl.pig fires it off in MapReduce mode and finds the file in HDFS.
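To make the distinction concrete, here is a small sketch of the two invocations (the run_pig helper is hypothetical; it only echoes the command it would run instead of invoking Pig):

```shell
# Pig's execution mode is chosen with -x; it decides which filesystem
# LOAD/STORE paths resolve against.
run_pig() {
  mode="$1"; script="$2"
  echo "pig -x $mode $script"
}

# -x local: paths resolve against the local filesystem (file:/...)
run_pig local first_etl.pig
# -x mapreduce (the default when -x is omitted): paths resolve against HDFS
run_pig mapreduce first_etl.pig
```

So the same LOAD '/dualcore/ad_data1.txt' statement means two different files depending on the mode, which is exactly the mismatch behind ERROR 2118 here.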
