
Tutorial "How to Process Data with Apache Pig HCC Tutorial Tag: tutorial-150 and hdp-2.5.0" resulting in 2118 errors


New Contributor

Hi, I'm logged into the Hortonworks Sandbox on an Azure VM as maria-dev and am working through the "How to Process Data with Apache Pig" tutorial (HCC tutorial tags: tutorial-150 and hdp-2.5.0), but I'm receiving two 2118 errors, for example:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/user/maria_dev/timesheet.csv

This keeps happening even when I copy and paste the code from the tutorial directly into the Pig script. I have verified that drivers.csv and timesheet.csv were uploaded into HDFS via the Ambari interface. Can you help me resolve this problem? Thanks
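One way to confirm where the files actually landed is to list the directory the failing path points to. Pig's `fs` utility command hands its arguments to the Hadoop filesystem shell, so a sketch like the one below can be run from a Pig script or the Grunt shell. It assumes the files were meant to go into maria_dev's HDFS home directory; whether a given script runner (such as the Ambari Pig view) accepts utility commands is an assumption worth checking.

```pig
-- List maria_dev's HDFS home directory recursively.
-- By default, relative LOAD paths such as 'timesheet.csv' are resolved against
-- this directory; -R also shows any subdirectories the files may have landed in.
fs -ls -R /user/maria_dev;
```

If drivers.csv and timesheet.csv show up in a subdirectory (or not at all), the LOAD statements need that location instead of the bare file names.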


Re: Tutorial "How to Process Data with Apache Pig HCC Tutorial Tag: tutorial-150 and hdp-2.5.0" resulting in 2118 errors

Cloudera Employee

Could this be because you are logged in as the user "maria-dev" rather than "maria_dev"?
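The username matters because Pig resolves a relative path like 'timesheet.csv' against the current user's HDFS home directory, which is why the error message shows /user/maria_dev/timesheet.csv. A minimal sketch, assuming the sandbox user maria_dev and a file uploaded directly into that home directory (the aliases timesheet_rel, timesheet_abs and first_rows are illustrative only):

```pig
-- Run as maria_dev, the relative path resolves to
-- hdfs://sandbox.hortonworks.com:8020/user/maria_dev/timesheet.csv,
-- exactly as in the ERROR 2118 message, so both LOADs refer to the same file.
timesheet_rel = LOAD 'timesheet.csv' USING PigStorage(',');
timesheet_abs = LOAD '/user/maria_dev/timesheet.csv' USING PigStorage(',');

-- LOAD is lazy; DUMP forces execution, which is when a missing path is reported.
first_rows = LIMIT timesheet_abs 5;
DUMP first_rows;
```

Logged in under a different name, the same relative path would point at that user's home directory instead.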

Re: Tutorial "How to Process Data with Apache Pig HCC Tutorial Tag: tutorial-150 and hdp-2.5.0" resulting in 2118 errors

New Contributor

Thanks for your reply. Sorry, "maria-dev" was a typo in my first post; that is not a valid login (at least in my environment). I was actually logged in as "maria_dev", which appears to be the standard username for the sandbox.

Re: Tutorial "How to Process Data with Apache Pig HCC Tutorial Tag: tutorial-150 and hdp-2.5.0" resulting in 2118 errors

Could you share the link to the tutorial and the script that gave the error?

Re: Tutorial "How to Process Data with Apache Pig HCC Tutorial Tag: tutorial-150 and hdp-2.5.0" resulting in 2118 errors

New Contributor

Dinesh,

The tutorial is:

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/

Here is the script, which I copied and pasted directly from the tutorial:

drivers = LOAD 'drivers.csv' USING PigStorage(',');
raw_drivers = FILTER drivers BY $0>1;
drivers_details = FOREACH raw_drivers GENERATE $0 AS driverId, $1 AS name;
timesheet = LOAD 'timesheet.csv' USING PigStorage(',');
raw_timesheet = FILTER timesheet by $0>1;
timesheet_logged = FOREACH raw_timesheet GENERATE $0 AS driverId, $2 AS hours_logged, $3 AS miles_logged;
grp_logged = GROUP timesheet_logged by driverId;
sum_logged = FOREACH grp_logged GENERATE group as driverId, SUM(timesheet_logged.hours_logged) as sum_hourslogged, SUM(timesheet_logged.miles_logged) as sum_mileslogged;
join_sum_logged = JOIN sum_logged by driverId, drivers_details by driverId;
join_data = FOREACH join_sum_logged GENERATE $0 as driverId, $4 as name, $1 as hours_logged, $2 as miles_logged;
dump join_data;

Thanks in advance for your help!

Gary

Re: Tutorial "How to Process Data with Apache Pig HCC Tutorial Tag: tutorial-150 and hdp-2.5.0" resulting in 2118 errors

Hi @Gary Spurrier,

Have you considered changing the LOAD path to point to where your CSV file actually resides in HDFS? For example:

drivers = LOAD '/user/maria_dev/drivers _info/drivers.csv' USING PigStorage(',');

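Use the directory where you actually uploaded the file, and make the same change for timesheet.csv. It should work after that.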
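Applied to both input files, the fix amounts to replacing the two relative LOAD paths with absolute ones. In this sketch the directory /user/maria_dev/tutorial_data is purely hypothetical; substitute whatever directory the files actually appear in.

```pig
-- Hypothetical upload directory: replace /user/maria_dev/tutorial_data with the
-- HDFS directory that actually contains drivers.csv and timesheet.csv.
drivers = LOAD '/user/maria_dev/tutorial_data/drivers.csv' USING PigStorage(',');
timesheet = LOAD '/user/maria_dev/tutorial_data/timesheet.csv' USING PigStorage(',');
-- The rest of the tutorial script (FILTER, FOREACH, GROUP, JOIN, dump) stays unchanged.
```

Absolute paths make the script independent of which user runs it, at the cost of hard-coding the location.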

Cheers,

Ankur