Created on 03-06-2017 08:13 PM - edited 09-16-2022 08:43 AM
I'm a newbie at Pig scripting and am just walking through some examples (Cloudera OnDemand training, to be specific). Anyway, I load a file into HDFS:
hdfs dfs -put $ADIR/data/ad_data1.txt /dualcore/
I check that the directory has proper permissions via hdfs dfs -ls /
I can see that /dualcore is chmod 777, and I also checked that /dualcore/ad_data1.txt has the right permissions in HDFS.
Now when I try to run the script with pig -x local first_etl.pig, I get the following:
ERROR: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/dualcore/ad_data1.txt
QUESTION: The file is at /dualcore/ad_data1.txt, directly under the HDFS root. When I cat the file [hdfs dfs -cat /dualcore/ad_data1.txt] it displays the data. Do I need to specify something other than LOAD '/dualcore/ad_data1.txt'?
SCRIPT:
data = LOAD '/dualcore/ad_data1.txt' USING PigStorage(':') AS (keyword:chararray,
                                                               campaign_id:chararray,
                                                               date:chararray,
                                                               time:chararray,
                                                               display_site:chararray,
                                                               was_clicked:int,
                                                               cpc:int,
                                                               country:chararray,
                                                               placement:chararray);
reordered = FOREACH data GENERATE campaign_id,
                                  date,
                                  time,
                                  UPPER(TRIM(keyword)),
                                  display_site,
                                  placement,
                                  was_clicked,
                                  cpc;
STORE reordered INTO '/dualcore/ad_data1/';
Created 03-06-2017 08:34 PM
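A likely explanation, offered as a sketch rather than a confirmed diagnosis: the file: prefix in the error suggests Pig resolved the path against the local file system, which is what -x local selects, while hdfs dfs -put placed the file in HDFS. The snippet below only pulls the scheme out of the error message quoted in the question to make that visible. The variable names are illustrative, and the pig commands in the comments assume the training cluster is reachable.

```shell
#!/bin/sh
# Error text copied from the question.
err='ERROR 2118: Input path does not exist: file:/dualcore/ad_data1.txt'

# Keep only the path that follows the last ": " in the message.
path=${err##*: }

# The URI scheme is everything before the first ":" in that path.
scheme=${path%%:*}

echo "Pig looked for the input on scheme: $scheme"

# A scheme of "file" means the local disk, which matches local mode:
#   pig -x local first_etl.pig
# MapReduce mode resolves '/dualcore/ad_data1.txt' against HDFS instead:
#   pig -x mapreduce first_etl.pig
```

If the scheme is file, running the script without -x local (MapReduce mode is Pig's default) or with -x mapreduce should make LOAD and STORE use the HDFS paths instead.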