Created 03-18-2017 06:26 PM
I'm trying to execute a Pig script from the Hortonworks tutorial "Lab 3 - Pig Risk Factor Analysis Introduction" with the 'Execute on Tez' option checked. The script shows as running for a long time, and I finally had to kill the job after more than 30 minutes of execution.
I did a syntax check before executing and no errors were found. I haven't executed the script from the shell yet.
Can anybody give me an idea of what might be causing this issue?
The log is attached, and the script is below:
a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
c = FOREACH b GENERATE driverid, event, (int) '1' as occurance;
d = GROUP c BY driverid;
e = FOREACH d GENERATE group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'driver_mileage' USING org.apache.hive.hcatalog.pig.HCatLoader();
h = JOIN e BY driverid, g BY driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
STORE final_data INTO 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();
Created 03-20-2017 05:04 AM
@Rajesh Balamohan - The Apache JIRA below discusses the issue where the Pig on Tez AM uses too much memory on a small cluster:
https://issues.apache.org/jira/browse/PIG-4948
To resolve this, try setting the properties below, as mentioned in the JIRA:
Created 03-18-2017 07:37 PM
Can you please post the YARN application logs for application_1489855879213_0002?
Use the command below to get the application logs:
yarn logs -applicationId application_1489855879213_0002
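The full log can be long, so it may help to save it to a file and scan it for error markers before posting. A minimal sketch, assuming a POSIX shell; the helper name `scan_yarn_log` and the file name `app.log` are hypothetical, and the pattern list is only a starting point:

```shell
#!/bin/sh
# Hypothetical helper: print the lines (with line numbers) of a saved
# YARN application log that look like errors.
# Exits non-zero when nothing matches.
scan_yarn_log() {
    grep -n -i -E 'error|exception|fatal' "$1"
}

# Usage on a cluster node (not run here):
#   yarn logs -applicationId application_1489855879213_0002 > app.log
#   scan_yarn_log app.log
```

An empty result does not prove the job is healthy; an application that never gets a container may log no errors at all.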
Created 03-19-2017 02:59 PM
Created 03-19-2017 10:12 PM
There are no explicit errors in the application logs. Can you reproduce the issue?
Created 03-19-2017 06:31 AM
It looks like the Tez AM container cannot be scheduled. Typically this is caused by insufficient resources. Please check your cluster capacity to ensure you have enough resources for the Tez container.
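One way to check this is to compare the memory YARN currently has free with what the Tez AM requests. A rough sketch, assuming the ResourceManager listens on pc-1.thenet.edu:8088 (the host in the Tez tracking URL) and exposes the standard cluster metrics REST endpoint; the helper name `available_mb` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read ResourceManager cluster-metrics JSON on stdin
# and print the value of the availableMB field.
available_mb() {
    grep -o '"availableMB"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | grep -o '[0-9][0-9]*$'
}

# Usage on the cluster (not run here):
#   curl -s http://pc-1.thenet.edu:8088/ws/v1/cluster/metrics | available_mb
#   yarn application -status application_1489855879213_0002
#   yarn node -list -all
```

If the printed value is smaller than the AM request (tez.am.resource.memory.mb), the application would be expected to sit in the ACCEPTED state without ever starting.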
Created 03-19-2017 03:33 PM
I guess there is some problem with the Tez config: when I execute with 'Execute on Tez' checked it hangs, and when it is unchecked (running as MapReduce) it works fine. I'm attaching the log from the MapReduce run: pig-script-success-when-exec-as-mapreduce.txt
Created 03-20-2017 03:52 AM
>>>
2017-03-18 22:43:27,002 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1489855879213_0002
2017-03-18 22:43:27,005 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - The url to track the Tez Session: http://pc-1.thenet.edu:8088/proxy/application_1489855879213_0002/
>>>
I agree with zyang. Looking at the logs, it appears that there isn't enough capacity to launch. Can you verify the queue configs and cluster capacity?
Created 03-20-2017 04:27 PM
@Namit Maheshwari Thank you very much for the guidance and the appropriate answer. With the above settings the issue was resolved.