Pig script execution on Tez gets hung
Labels: Apache Pig, Apache Tez
Created ‎03-18-2017 06:26 PM
I'm trying to execute a Pig script from the Hortonworks tutorial "Lab 3 - Pig Risk Factor Analysis Introduction" with the 'Execute on Tez' option checked. The script shows as running for a long time, and I finally had to kill the job after more than 30 minutes of execution.
I ran a syntax check before executing and no errors were found. I haven't executed the script from the shell yet.
Can anybody give me an idea of what might be causing this issue?
The log is attached, and the script is below:
a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = FILTER a BY event != 'normal';
c = FOREACH b GENERATE driverid, event, (int) '1' as occurance;
d = GROUP c BY driverid;
e = FOREACH d GENERATE group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'driver_mileage' USING org.apache.hive.hcatalog.pig.HCatLoader();
h = JOIN e BY driverid, g BY driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
STORE final_data INTO 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();
Created ‎03-20-2017 05:04 AM
@Rajesh Balamohan - The Apache JIRA below describes the issue where the Pig on Tez AM uses too much memory on a small cluster:
https://issues.apache.org/jira/browse/PIG-4948
To resolve this, as mentioned in the JIRA, try setting the properties below:
- Set tez.am.resource.memory.mb to the same value as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.
- Set pig.tez.configure.am.memory to true.
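One way these two properties might be applied is from the top of the Pig script itself, using Pig's set statement. This is only a sketch: it assumes the properties are picked up when set from the script (they can also be set in the Tez and Pig configuration, for example through Ambari), and the value 250 is a placeholder, not a recommendation. Replace it with whatever yarn.scheduler.minimum-allocation-mb is on your cluster.

```pig
-- Placeholder value (assumption): use your cluster's actual
-- yarn.scheduler.minimum-allocation-mb here, not necessarily 250.
set tez.am.resource.memory.mb 250;

-- Second property from the answer above, set as suggested there.
set pig.tez.configure.am.memory true;

-- The rest of the script follows unchanged, starting with:
-- a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
```

If the script still hangs with these lines in place, the properties may need to be set at the cluster configuration level instead.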
Created ‎03-18-2017 07:37 PM
Can you please post the YARN application logs for application_1489855879213_0002?
Use the command below to get the application logs:
yarn logs -applicationId application_1489855879213_0002
Created ‎03-19-2017 02:59 PM
Thanks for replying. Attaching the YARN log retrieved from the shell.
Created ‎03-19-2017 10:12 PM
There are no explicit errors in the application logs. Can you reproduce the issue?
Created ‎03-19-2017 06:31 AM
It looks like the Tez AM container cannot be scheduled. Typically this is caused by insufficient resources. Please check your cluster capacity to ensure you have enough resources for the Tez container.
Created ‎03-19-2017 03:33 PM
I guess there is some problem with the Tez config: when I execute with "Execute on Tez" checked it hangs, and when it is unchecked (running as MapReduce) it works fine. Attaching the log from the MapReduce run: pig-script-success-when-exec-as-mapreduce.txt
Created ‎03-20-2017 03:52 AM
>>>
2017-03-18 22:43:27,002 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1489855879213_0002
2017-03-18 22:43:27,005 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - The url to track the Tez Session: http://pc-1.thenet.edu:8088/proxy/application_1489855879213_0002/
>>>
I agree with zyang. Looking at the logs, it appears that there isn't enough capacity to launch the Tez AM. Can you verify the queue configs and cluster capacity?
Created ‎03-20-2017 04:27 PM
@Namit Maheshwari Thank you very much for the guidance and the answer. With the above settings the issue was resolved.
