
Hive on Tez: Progress of TaskAttempt is always 1.0 and it never changes to another state

Explorer

Hi guys,

My cluster runs HDP 2.3.2.0, and we analyse data with Hive on Tez.

Jobs usually go well, but sometimes a job cannot finish: the TaskAttempt log keeps printing a progress of 1.0, like this: "org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1462372008131_4318_m_000000_0 is : 1.0", and it never changes to another state if I do nothing.

And the job's progress on the "All Applications" UI shows 95%.

[Attachment: 4101-image.png]

So I have to kill the jobs that Tez never finishes to avoid holding up the next jobs. How can I fix this? I have no idea about this problem.

My scenario is that the Hive on Tez jobs are executed by Oozie coordinators, with 8 workflow actions running concurrently at one time.
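For context, the concurrency is declared in the coordinator definition. A minimal sketch, assuming a coordinator app; the name, dates and paths here are made-up placeholders:

```xml
<coordinator-app name="my-coord" frequency="${coord:hours(1)}"
                 start="2016-05-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <controls>
    <!-- up to 8 materialized workflow actions may run at the same time -->
    <concurrency>8</concurrency>
  </controls>
  <action>
    <workflow>
      <app-path>${nameNode}/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```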

Please help. Thank you very much.

A tip about our cluster: the NodeManager, DataNode, RegionServer and ResourceManager are all started on the same node.

The attachments below have more detailed info about my issue and the container log. Thanks again.

[Attachment: 4100-rm-memory.png]

[Attachment: container.txt]

1 ACCEPTED SOLUTION

Explorer

Finally I found the way to solve this issue.

It's because of how the YARN scheduler assigns cluster resources while working with Oozie.

When Oozie executes a workflow, the Oozie launcher occupies one container, and that launcher then submits the Tez application for my Oozie hive2 action.

But my cluster has fewer than 40 nodes; it's a small cluster, and there are a lot of Oozie workflow tasks.

This means there are not enough containers left to complete the Tez applications, and the Oozie launcher will not release its container until the Tez application completes, so the cluster runs short of resources.

My solution is to set up 2 YARN queues: one is the `default` queue, and the other is an `oozie` queue used only for the Oozie launchers; then I set the `default` queue name on my Oozie hive2 action.
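In case it helps anyone, this is roughly what the change looks like. A minimal sketch of the queue split in capacity-scheduler.xml, assuming the CapacityScheduler; the 80/20 capacities are made-up values to illustrate:

```xml
<!-- capacity-scheduler.xml: split root into a default queue and an oozie-launcher queue -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,oozie</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <!-- capacities of root's child queues must sum to 100 -->
  <value>80</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.oozie.capacity</name>
  <value>20</value>
</property>
```

And in the workflow, the launcher container is routed to the `oozie` queue while the query itself runs in `default`. A sketch of a hive2 action; the JDBC URL and script name are placeholders:

```xml
<hive2 xmlns="uri:oozie:hive2-action:0.1">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <configuration>
    <!-- the oozie launcher's own container goes to the dedicated queue -->
    <property>
      <name>oozie.launcher.mapred.job.queue.name</name>
      <value>oozie</value>
    </property>
  </configuration>
  <jdbc-url>${jdbcURL}</jdbc-url>
  <script>my_query.hql</script>
  <!-- the Tez session for the query itself runs in the default queue -->
  <argument>--hiveconf</argument>
  <argument>tez.queue.name=default</argument>
</hive2>
```

This way the launchers can never eat all of the cluster's containers, so the Tez applications they are waiting on always have room to run.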

Thanks again for all of your replies; they inspired me.


7 REPLIES

Expert Contributor

It's not the ultimate solution, but you can reduce tez.session.am.dag.submit.timeout.secs to avoid the job being held up for a long time.
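For example, something like this in tez-site.xml; the 60 is just an illustrative value:

```xml
<!-- tez-site.xml: how long an idle Tez session AM waits for a DAG
     to be submitted before shutting itself down -->
<property>
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <value>60</value>
</property>
```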

Explorer

Hi, thanks for the reply, I appreciate it.

Our settings are tez.session.am.dag.submit.timeout.secs=600 and tez.session.client.timeout.secs=-1.

Does that make something wrong?

Usually our jobs finish very quickly and the data is not too large, but a job's progress stays at 1.0 once in a while.

Thanks again.

Expert Contributor

You may want to set tez.session.client.timeout.secs to something like 180 rather than -1, because -1 means the client waits forever.

Also, you may want to look into the application log to investigate why that job gets stuck.
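For reference, that would be something like this in tez-site.xml, using the 180 suggested above:

```xml
<!-- tez-site.xml: how long the Tez client waits on the session AM
     before timing out; -1 means wait forever -->
<property>
  <name>tez.session.client.timeout.secs</name>
  <value>180</value>
</property>
```

Once you have the application ID of a stuck job, its logs can be pulled with `yarn logs -applicationId <application id>`.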

@苏 斌

I think you are running out of disk space. Can you check that?

Explorer

Thanks for the reply. I found the final solution and have posted it above as the accepted answer: it's about how the YARN scheduler assigns cluster resources when working with Oozie.


Explorer

No, it's not about the disk space; the YARN scheduler is the key.
