I have a Hive query (submitted from NiFi) that performs an insert, and the Tez view in Ambari has reported it as running for more than 48 hours. This was preventing other operations from occurring, so I wanted to kill the query so the backlog could be processed.
I killed the associated YARN application. The Tez view in Ambari still shows the status as Running. Clicking through to the Application ID shows the status as Killed and the Final Status as Undefined. Clicking through to the DAG shows the status as Killed, and the only listed Vertex is also Killed. Total Tasks for the DAG is 1.
However, despite all this, the query is still marked as running in Tez and other queries are blocked behind it. Even non-modifying queries like a simple SELECT COUNT(*) are unable to run. This has persisted across restarts of YARN, Hive, Tez, and even the entire machine.
Is there some way to kill off the operations that Tez is showing? Killing the YARN application doesn't seem to be sufficient.
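For anyone who wants to double-check what YARN itself thinks of the application (as opposed to what the Tez view displays), here is a minimal Python sketch against the ResourceManager REST API. It assumes an unsecured cluster with the ResourceManager web endpoint reachable at a URL you supply (8088 is the usual default port); the host name and application ID in the usage note are placeholders, and the helper names are made up for illustration. This only queries or kills the YARN application. It does not clear the stale entry in the Tez view.

```python
import json
import urllib.request


def app_state_url(rm_url, app_id):
    """Build the ResourceManager REST URL for an application's state."""
    return f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}/state"


def build_kill_request(rm_url, app_id):
    """Prepare (but do not send) a PUT request asking YARN to kill the app."""
    body = json.dumps({"state": "KILLED"}).encode("utf-8")
    return urllib.request.Request(
        app_state_url(rm_url, app_id),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_app_state(rm_url, app_id, timeout=10):
    """Ask the ResourceManager for the current state, e.g. RUNNING or KILLED."""
    with urllib.request.urlopen(app_state_url(rm_url, app_id), timeout=timeout) as resp:
        return json.load(resp)["state"]


def kill_app(rm_url, app_id, timeout=10):
    """Send the kill request and return the HTTP status code."""
    with urllib.request.urlopen(build_kill_request(rm_url, app_id), timeout=timeout) as resp:
        return resp.status
```

Usage would look something like `get_app_state("http://rm-host.example.com:8088", "application_...")`, with your own ResourceManager host and application ID filled in. If that already returns KILLED while the Tez view still says Running, the problem is likely on the display or bookkeeping side rather than in YARN.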
I'm experiencing this same issue, by the way. I'm wondering if we're running out of resources for Tez. I have a support case open with Hortonworks and will let you know what comes of it. We usually have to restart Hive in order for more Tez/Hive jobs to run.
In our case, restarting Hive (or the entire stack for that matter) didn't seem to do anything. After playing with some resource allocation, we were able to get some jobs to go through, but the ones that were killed still remain.
I'm starting to wonder if it's an issue with Tez not recognizing that jobs are being killed.
Came here to find out how to kill queries that show as running in Tez. I am facing the same issue as documented above. Tez shows 4 queries running without Application IDs or DAG IDs.
I have restarted the whole suite, but they still show as running.
Tez-client - 22.214.171.124-235
I'm having the same problem. I killed some jobs, but the Tez UI says some of them are still running, with no App ID or DAG ID. The query was launched from Hive.
We are facing the same issue. Restarting the cluster doesn't seem to help either. Has anyone been able to figure out a way to clear those stale running queries?
In our case it was a resource allocation issue. Our queues were not set up optimally, and we also changed some memory configurations to allow the initial query process to run properly. Basically, the default queue didn't have any resources, and we noticed that when a query is first submitted, a process runs in the default queue before the job is submitted to the queue specified.
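If you want to check for the same situation on your cluster, here is a rough Python sketch that walks the Capacity Scheduler output of the ResourceManager REST API (`/ws/v1/cluster/scheduler`) and lists queues whose configured capacity is zero. It assumes the Capacity Scheduler is in use and that the JSON has the `scheduler` / `schedulerInfo` / `queues` / `queue` nesting with `queueName` and `capacity` fields; verify that against your own cluster's response. The function name and the sample data are made up for illustration.

```python
def find_starved_queues(scheduler_json, threshold=0.0):
    """Return dotted paths of queues whose 'capacity' is <= threshold.

    Expects the parsed JSON of GET /ws/v1/cluster/scheduler on a cluster
    running the Capacity Scheduler (field names are an assumption to verify).
    """
    info = scheduler_json["scheduler"]["schedulerInfo"]
    starved = []

    def walk(node, path):
        # Leaf queues may have no 'queues' key, or it may be null.
        children = (node.get("queues") or {}).get("queue", [])
        for child in children:
            child_path = f"{path}.{child['queueName']}"
            if child.get("capacity", 0.0) <= threshold:
                starved.append(child_path)
            walk(child, child_path)

    walk(info, info.get("queueName", "root"))
    return starved


# Hand-written sample in the assumed shape, not real cluster output.
sample = {
    "scheduler": {
        "schedulerInfo": {
            "type": "capacityScheduler",
            "queueName": "root",
            "capacity": 100.0,
            "queues": {
                "queue": [
                    {"queueName": "default", "capacity": 0.0},
                    {"queueName": "etl", "capacity": 100.0},
                ]
            },
        }
    }
}

print(find_starved_queues(sample))
```

On a real cluster you would fetch the JSON from the ResourceManager and pass the parsed result in. A `root.default` entry in the output would match what we saw: the default queue had nothing allocated, so the initial step of each query could not get started.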