Every time I execute an INSERT statement in Hive, I see the following: the statement finishes successfully and the new rows are inserted into the target table. However, when I look at the YARN ResourceManager UI, the job that executed this Hive task is still running. It keeps running for a very long time, around 20 minutes, even though the insert itself finished in 45 seconds.
I see this both in the Sandbox and on a real cluster.
We will need the application logs for this job to determine the actual cause. Can you post the output of:
yarn logs -applicationId <application_id>
I tried to execute your command, but it always fails with a "Permission denied" exception, even though I granted all permissions on the folder where the log is stored. So I am not sure how it is supposed to work.
However I pulled some logs from Ambari and attached them below.
These files are the LogType Launch_Container and LogType stdout sections from YARN.
I also found a full log in HDFS, but it is in an encoding I do not recognize and I do not know how to read it. I can send it to you by e-mail, as the web site did not allow me to attach this file to my message.
Hope this helps.
Can you run this as the "yarn" user and see if you can pull up the logs? The attached logs do not give any information about the time taken by the tasks.
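For example, something along these lines should work. It is only a sketch: it assumes you have sudo rights on a cluster node, that log aggregation is enabled, and that the service account is really named "yarn" on your cluster. The function name fetch_app_logs and the output file name are made up for illustration. Going through yarn logs should also take care of the file you found in HDFS, since aggregated logs are stored in a binary container format rather than plain text.

```shell
# Fetch the aggregated logs of one application as the "yarn" service user.
# Assumption: the current user may run commands as "yarn" via sudo.
fetch_app_logs() {
    # $1 = application id as shown in the ResourceManager UI
    # $2 = output file (hypothetical default: <application id>.log)
    sudo -u yarn yarn logs -applicationId "$1" > "${2:-$1.log}"
}

# Usage (placeholder id, substitute your own):
# fetch_app_logs "<application_id>"
```

Writing the output to a file makes it easier to attach or e-mail, because the full log can be large.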
The behavior you describe could be the Tez AM container, which can continue to be held after the job completes. Can you confirm that you see this behavior only in Tez mode and not in MR mode?
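One way to check this is to run the same statement once with the MapReduce engine and then look at what is still running in YARN. This is a rough sketch: the function names are hypothetical, the statement is whatever INSERT you normally run, and it assumes the hive and yarn command-line clients are on your PATH.

```shell
# Run a single statement with the MapReduce engine instead of Tez.
# $1 = the HiveQL statement to execute (supplied by you).
run_insert_as_mr() {
    hive --hiveconf hive.execution.engine=mr -e "$1"
}

# List the applications YARN still reports as RUNNING.
list_running_apps() {
    yarn application -list -appStates RUNNING
}

# Usage (placeholder statement, substitute your own):
# run_insert_as_mr "<your INSERT statement>"
# list_running_apps
```

If the application disappears from the RUNNING list right after the statement finishes in MR mode but lingers in Tez mode, that would point to the Tez AM being held rather than to the insert itself.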