Hive on Tez: Progress of TaskAttempt is always 1.0 and never changes to another state
Labels:
- Apache Hive
- Apache Oozie
- Apache Tez
Created on 05-08-2016 01:45 PM - edited 08-19-2019 01:47 AM
Hi guys,
My cluster is running HDP 2.3.2.0, and we analyse data with Hive on Tez.
Jobs usually run fine, but sometimes a job cannot finish. The TaskAttempt log stays at 1.0, repeating lines like "org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1462372008131_4318_m_000000_0 is : 1.0", and it never changes to another state if I do nothing.
The job progress on the "All Applications" UI shows 95%.
So I have to kill the job that Tez never finishes to avoid holding up the next jobs. How can I fix this? I have no idea what causes it.
For context, the Hive on Tez jobs are executed by Oozie coordinators, with 8 workflow actions running concurrently at a time.
Please help. Thank you very much.
A note about our cluster: the NodeManager, DataNode, RegionServer, and ResourceManager are started on the same node.
The attachment has more detailed information about the issue and the container log. Thanks again.
Created 06-12-2016 07:07 AM
Finally I found the way to solve this issue.
It is caused by the way the YARN scheduler assigns cluster resources when working with Oozie.
When Oozie executes a workflow, the Oozie launcher occupies one container, and that launcher then submits the Tez application for my Oozie hive2 action.
But my cluster has fewer than 40 nodes; it is a small cluster, and there are a lot of Oozie workflow tasks.
As a result there are not enough containers available to complete the Tez applications, and the Oozie launcher will not release its container until the Tez application completes, so the cluster runs out of resources.
My solution is to set up 2 YARN queues: one is the `default` queue, and the other is an `oozie` queue used only for the Oozie launchers, and then set the `default` queue name on my Oozie hive2 action.
Thanks again for all of your replies, they inspired me.
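A minimal sketch of what such a two-queue setup might look like with the YARN Capacity Scheduler, assuming `capacity-scheduler.xml` is the scheduler in use; the queue names match the post, but the capacity percentages are illustrative assumptions, not values from the thread:

```xml
<!-- capacity-scheduler.xml (sketch): split the root queue into "default"
     for the actual Hive/Tez work and "oozie" for the Oozie launchers.
     The capacity percentages below are illustrative assumptions. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,oozie</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>80</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.oozie.capacity</name>
  <value>20</value>
</property>
<property>
  <!-- optionally let the launcher queue borrow idle capacity -->
  <name>yarn.scheduler.capacity.root.oozie.maximum-capacity</name>
  <value>40</value>
</property>
```

The point of the split is that launcher containers and Tez containers no longer compete for the same capacity, so the launchers can never starve the DAGs they are waiting on.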
Created 05-09-2016 11:40 PM
It's not the ultimate solution, but you can reduce tez.session.am.dag.submit.timeout.secs to keep the job from being held up for a long time.
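For illustration, a hedged sketch of lowering that timeout in `tez-site.xml`; the 60-second value is only an example, not a recommendation from this reply:

```xml
<!-- tez-site.xml (sketch): how long an idle Tez session AM waits for a
     DAG to be submitted before shutting itself down. -->
<property>
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <value>60</value>
</property>
```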
Created 05-10-2016 03:50 AM
Hi, thanks for the reply, I appreciate it.
Our settings are tez.session.am.dag.submit.timeout.secs=600 and tez.session.client.timeout.secs=-1.
Could that be causing the problem?
Our jobs usually finish very quickly and the data is not too large, but the job's progress stays at 1.0 once in a while.
Thanks again.
Created 05-10-2016 09:15 PM
You may want to set tez.session.client.timeout.secs to something like 180 instead of -1, because -1 means it waits forever.
Also, you may want to look into the application log to investigate why that job is getting stuck.
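As a sketch, assuming the value is applied cluster-wide in `tez-site.xml` (it can equally be set per session); the 180-second figure is the one suggested above:

```xml
<!-- tez-site.xml (sketch): how long the client waits for the Tez session
     AM to come up; -1 means wait indefinitely. -->
<property>
  <name>tez.session.client.timeout.secs</name>
  <value>180</value>
</property>
```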
Created 05-10-2016 09:29 PM
I think you are running out of disk space. Can you check that?
Created 06-12-2016 07:05 AM
Thanks for the reply. I got to the final solution, so I will post it here.
It is about how the YARN scheduler works with Oozie.
When Oozie executes a workflow, the Oozie launcher occupies one container, and that launcher then submits the Tez application for my Oozie hive2 action.
But my cluster has fewer than 40 nodes; it is a small cluster, and there are a lot of Oozie workflow tasks.
As a result there are not enough containers available to complete the Tez applications, and the Oozie launcher will not release its container until the Tez application completes, so the cluster runs out of resources.
My solution is to set up 2 YARN queues: one is the `default` queue, and the other is an `oozie` queue used only for the Oozie launchers, and then set the `default` queue name on my Oozie hive2 action.
Thanks again for all of your replies, they inspired me.
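One way this queue assignment is sometimes wired into a hive2 action; the post does not spell out the exact mechanism, so the workflow fragment below is a hypothetical sketch, and the action name, JDBC URL, and script path are placeholders:

```xml
<!-- Hypothetical workflow fragment: send the Oozie launcher container to
     the "oozie" queue and the Tez DAGs run by the query to "default".
     Host names, the JDBC URL, and the script path are placeholders. -->
<action name="hive2-example">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- queue used only by the launcher container -->
            <property>
                <name>oozie.launcher.mapred.job.queue.name</name>
                <value>oozie</value>
            </property>
        </configuration>
        <jdbc-url>jdbc:hive2://hs2-host:10000/default</jdbc-url>
        <script>my_query.hql</script>
        <!-- queue used by the Tez application launched for the query -->
        <argument>--hiveconf</argument>
        <argument>tez.queue.name=default</argument>
    </hive2>
    <ok to="end"/>
    <error to="fail"/>
</action>
```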
Created 07-22-2016 03:43 AM
No, it's not about disk space; the YARN scheduler was the key.
