Pig Load - ERROR 2118: Input path does not exist
Labels: Apache Pig, HDFS, Training
Created on 03-06-2017 08:13 PM - edited 09-16-2022 08:43 AM
I'm a newbie at Pig scripting and am just working through some examples (the Cloudera on-demand training, to be specific). Anyway, I load a file:
hdfs dfs -put $ADIR/data/ad_data1.txt /dualcore/
Then I check that the directory has the proper permissions via hdfs dfs -ls /
I can see it's chmod 777 for /dualcore, and I also checked that /dualcore/ad_data1.txt is set properly in HDFS.
Now when I try to run the first_etl.pig script with pig -x local first_etl.pig, I get the following:
ERROR: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/dualcore/ad_data1.txt
QUESTION: The file is at the root, /dualcore/ad_data1.txt. When I cat the file [hdfs dfs -cat /dualcore/ad_data1.txt] it displays the data. Do I need to specify something other than LOAD '/dualcore/ad_data1.txt'?
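For what it's worth, the scheme in the error message (file:/dualcore/ad_data1.txt) is the useful clue: Pig is resolving the path against the local filesystem, not HDFS. A quick hedged check along those lines (same paths as above, run on the machine executing the script):

```shell
# The path Pig complained about, resolved locally -- this will likely fail,
# because the file was only put into HDFS:
ls /dualcore/ad_data1.txt

# The same path resolved against HDFS -- this should succeed:
hdfs dfs -ls /dualcore/ad_data1.txt
```

If the first command fails while the second succeeds, the file exists only in HDFS and the problem is which filesystem Pig is pointed at, not the LOAD path itself.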
SCRIPT:
data = LOAD '/dualcore/ad_data1.txt' USING PigStorage(':') AS (keyword:chararray,
campaign_id:chararray,
date:chararray,
time:chararray,
display_site:chararray,
was_clicked:int,
cpc:int,
country:chararray,
placement:chararray);
reordered = FOREACH data GENERATE campaign_id,
date,
time,
UPPER(TRIM(keyword)),
display_site,
placement,
was_clicked,
cpc;
STORE reordered INTO '/dualcore/ad_data1/';
Created 03-06-2017 08:34 PM
The issue was that I was running first_etl.pig as
pig -x local first_etl.pig, which runs it in local mode against the local filesystem, when what I wanted was to run it on the Hadoop cluster. Running it as pig first_etl.pig fires it off in MapReduce mode and finds the file.
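In other words, the execution mode decides which filesystem LOAD and STORE paths resolve against. A minimal sketch of the two invocations (mapreduce is Pig's default mode when no -x flag is given):

```shell
# Local mode: '/dualcore/ad_data1.txt' resolves on the local disk (file:///),
# which is why the error showed the file:/ scheme:
pig -x local first_etl.pig

# MapReduce mode (the default): the same path resolves against HDFS:
pig first_etl.pig
pig -x mapreduce first_etl.pig   # equivalent, with the mode spelled out
```

The script itself doesn't change between modes; only the filesystem the relative paths are resolved against does.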
