Hive on Tez: Progress of TaskAttempt is always 1.0 and never changes to another state
Labels:
- Apache Hive
- Apache Oozie
- Apache Tez
Created on 05-08-2016 01:45 PM - edited 08-19-2019 01:47 AM
Hi guys,
My cluster is running HDP 2.3.2.0, and we analyse data with Hive on Tez.
Jobs usually run fine, but sometimes a job cannot finish. The TaskAttempt log stays at 1.0, repeating lines like "org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1462372008131_4318_m_000000_0 is : 1.0", and it never changes to another state if I do nothing.
The job progress on the "All Applications" UI shows 95%.
So I have to kill the job that Tez never finishes to avoid holding up the next jobs. How can I fix this? I have no idea what causes it.
For context, the Hive on Tez jobs are executed by Oozie coordinators, with 8 workflow actions running concurrently at a time.
Please help. Thank you very much.
A note about our cluster: the NodeManager, DataNode, RegionServer, and ResourceManager are started on the same node.
The attachment has more detailed information about the issue and the container log. Thanks again.
Created 06-12-2016 07:07 AM
Finally I found the way to solve this issue.
It is caused by the way the YARN scheduler assigns cluster resources when working with Oozie.
When Oozie executes a workflow, the Oozie launcher occupies one container, and that launcher then submits the Tez application for my Oozie hive2 action.
But my cluster has fewer than 40 nodes; it is a small cluster, and there are a lot of Oozie workflow tasks.
As a result there are not enough containers available to complete the Tez applications, and the Oozie launcher will not release its container until the Tez application completes, so the cluster runs out of resources.
My solution is to set up 2 YARN queues: one is the `default` queue, and the other is an `oozie` queue used only for the Oozie launchers, and then set the `default` queue name on my Oozie hive2 action.
Thanks again for all of your replies, they inspired me.
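A minimal sketch of what such a two-queue setup might look like with the YARN Capacity Scheduler, assuming `capacity-scheduler.xml` is the scheduler in use; the queue names match the post, but the capacity percentages are illustrative assumptions, not values from the thread:

```xml
<!-- capacity-scheduler.xml (sketch): split the root queue into "default"
     for the actual Hive/Tez work and "oozie" for the Oozie launchers.
     The capacity percentages below are illustrative assumptions. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,oozie</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>80</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.oozie.capacity</name>
  <value>20</value>
</property>
<property>
  <!-- optionally let the launcher queue borrow idle capacity -->
  <name>yarn.scheduler.capacity.root.oozie.maximum-capacity</name>
  <value>40</value>
</property>
```

The point of the split is that launcher containers and Tez containers no longer compete for the same capacity, so the launchers can never starve the DAGs they are waiting on.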
Created 05-09-2016 11:40 PM
It's not the ultimate solution, but you can reduce tez.session.am.dag.submit.timeout.secs to keep the job from being held up for a long time.
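For illustration, a hedged sketch of lowering that timeout in `tez-site.xml`; the 60-second value is only an example, not a recommendation from this reply:

```xml
<!-- tez-site.xml (sketch): how long an idle Tez session AM waits for a
     DAG to be submitted before shutting itself down. -->
<property>
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <value>60</value>
</property>
```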
Created 05-10-2016 03:50 AM
Hi, thanks for the reply, I appreciate it.
Our settings are tez.session.am.dag.submit.timeout.secs=600 and tez.session.client.timeout.secs=-1.
Could that be causing the problem?
Our jobs usually finish very quickly and the data is not too large, but the job's progress stays at 1.0 once in a while.
Thanks again.
Created 05-10-2016 09:15 PM
You may want to set tez.session.client.timeout.secs to something like 180 instead of -1, because -1 means it waits forever.
Also, you may want to look into the application log to investigate why that job is getting stuck.
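As a sketch, assuming the value is applied cluster-wide in `tez-site.xml` (it can equally be set per session); the 180-second figure is the one suggested above:

```xml
<!-- tez-site.xml (sketch): how long the client waits for the Tez session
     AM to come up; -1 means wait indefinitely. -->
<property>
  <name>tez.session.client.timeout.secs</name>
  <value>180</value>
</property>
```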
Created 05-10-2016 09:29 PM
I think you are running out of disk space. Can you check that?
Created 06-12-2016 07:05 AM
Thanks for the reply. I got to the final solution, so I will post it here.
It is about how the YARN scheduler works with Oozie.
When Oozie executes a workflow, the Oozie launcher occupies one container, and that launcher then submits the Tez application for my Oozie hive2 action.
But my cluster has fewer than 40 nodes; it is a small cluster, and there are a lot of Oozie workflow tasks.
As a result there are not enough containers available to complete the Tez applications, and the Oozie launcher will not release its container until the Tez application completes, so the cluster runs out of resources.
My solution is to set up 2 YARN queues: one is the `default` queue, and the other is an `oozie` queue used only for the Oozie launchers, and then set the `default` queue name on my Oozie hive2 action.
Thanks again for all of your replies, they inspired me.
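One way this queue assignment is sometimes wired into a hive2 action; the post does not spell out the exact mechanism, so the workflow fragment below is a hypothetical sketch, and the action name, JDBC URL, and script path are placeholders:

```xml
<!-- Hypothetical workflow fragment: send the Oozie launcher container to
     the "oozie" queue and the Tez DAGs run by the query to "default".
     Host names, the JDBC URL, and the script path are placeholders. -->
<action name="hive2-example">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- queue used only by the launcher container -->
            <property>
                <name>oozie.launcher.mapred.job.queue.name</name>
                <value>oozie</value>
            </property>
        </configuration>
        <jdbc-url>jdbc:hive2://hs2-host:10000/default</jdbc-url>
        <script>my_query.hql</script>
        <!-- queue used by the Tez application launched for the query -->
        <argument>--hiveconf</argument>
        <argument>tez.queue.name=default</argument>
    </hive2>
    <ok to="end"/>
    <error to="fail"/>
</action>
```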
Created 07-22-2016 03:43 AM
No, it's not about disk space; the YARN scheduler was the key.
