Pig script execution on Tez hangs

Contributor
pig-script-error.txt

I'm trying to execute a Pig script from the Hortonworks tutorial "Lab 3 - Pig Risk Factor Analysis Introduction" with the 'Execute on Tez' option checked. The script shows as running for a long time, and I finally had to kill the job after more than 30 minutes.

I ran a syntax check before executing, and no errors were found. I haven't executed the script from the shell yet.

Does anybody have an idea what might be causing this issue?

The log is attached, and the script is below:

a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
c = FOREACH b GENERATE driverid, event, (int) '1' as occurance;
d = GROUP c BY driverid;
e = FOREACH d GENERATE group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'driver_mileage' USING org.apache.hive.hcatalog.pig.HCatLoader();
h = JOIN e BY driverid, g BY driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
STORE final_data INTO 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();

1 ACCEPTED SOLUTION

@Rajesh Balamohan - The Apache JIRA below describes an issue where the Pig on Tez AM uses too much memory on a small cluster:

https://issues.apache.org/jira/browse/PIG-4948

To resolve this, as mentioned in the JIRA, try setting the properties below:

  • Set tez.am.resource.memory.mb to the same value as yarn.scheduler.minimum-allocation-mb (the YARN minimum container size).
  • Set pig.tez.configure.am.memory to true.
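
As a minimal sketch (not taken from the JIRA itself), the two properties could be set at the top of the risk factor script with Pig's SET command. This assumes the SET values reach the Tez session that Pig launches for the script; the value 1024 is a hypothetical placeholder for your cluster's yarn.scheduler.minimum-allocation-mb. Pig also accepts properties from pig.properties or as -D options on the pig command line.

```pig
-- Hypothetical value: replace 1024 with your cluster's
-- yarn.scheduler.minimum-allocation-mb (the YARN minimum container size).
SET tez.am.resource.memory.mb 1024;
-- Setting suggested in PIG-4948 for small clusters.
SET pig.tez.configure.am.memory 'true';

-- The LOAD / FILTER / GROUP / JOIN / STORE statements of the
-- risk factor script follow here unchanged.
```

Setting the properties per script keeps the change local to this job, whereas editing tez-site.xml would change the default for every Tez application on the cluster.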

8 REPLIES

Can you please post the YARN application logs for application_1489855879213_0002?

Use the command below to get the application logs:

 yarn logs -applicationId application_1489855879213_0002 

Contributor

Thanks for replying. I've attached the YARN log retrieved from the shell.

There are no explicit errors in the application logs. Can you reproduce the issue?

Contributor

It looks like the Tez AM container cannot be scheduled. Typically this is caused by insufficient resources. Please check your cluster capacity to ensure there are enough resources for the Tez AM container.

Contributor

I guess there is some problem with the Tez configuration: when I run with "Execute on Tez" checked, the script hangs, but when it is unchecked (running as MapReduce), it works fine. I've attached the log from the MapReduce run: pig-script-success-when-exec-as-mapreduce.txt

Rising Star

>>>

2017-03-18 22:43:27,002 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1489855879213_0002
2017-03-18 22:43:27,005 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - The url to track the Tez Session: http://pc-1.thenet.edu:8088/proxy/application_1489855879213_0002/

>>>

I agree with zyang. Looking at the logs, it appears that there isn't enough capacity to launch the Tez AM container. Can you verify the queue configuration and cluster capacity?

Contributor

@Namit Maheshwari Thank you very much for the guidance and the answer. With the above settings, the issue was resolved.