Created 08-11-2016 02:40 PM
Hi,
When I run an insert into command through beeline, Hive/Tez requests 2 containers. Once beeline reports that the row was successfully inserted into the table, the job it created (as seen in the YARN Manager UI) is still running and holds on to one of the two containers. When I terminate beeline, the job is then listed as completed in the Manager UI.
Why is this happening, and how can I change my Hadoop configuration to stop it?
Thanks,
Mike
Created 08-11-2016 02:40 PM
@mike harding Check whether Tez container reuse is turned on:
tez.am.container.reuse.enabled=true
Configuration that specifies whether a container should be reused.
This allows Tez to reuse containers for later tasks to improve performance. Turn it off if you are not interested in that functionality.
Created 08-11-2016 04:57 PM
What is tez.am.session.min.held-containers set to?
Created 08-12-2016 12:21 PM
I found that this was not specified in my Ambari settings. Setting it to 0 and setting tez.session.am.dag.submit.timeout.secs to a smaller value gave me the behaviour I was looking for.
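For readers who manage tez-site.xml directly rather than through Ambari, the two settings mentioned above might look roughly like the fragment below. This is only a sketch: the held-containers value of 0 comes from this thread, but the timeout of 60 seconds is an illustrative assumption, since the actual value used was not stated.

```xml
<!-- Sketch of the relevant tez-site.xml entries (in Ambari these would be custom Tez properties). -->
<property>
  <!-- Minimum number of containers the Tez session AM holds on to; 0 as described in this thread. -->
  <name>tez.am.session.min.held-containers</name>
  <value>0</value>
</property>
<property>
  <!-- How long the session AM waits for a new DAG before shutting down.
       60 is a hypothetical example value, not the one used by the poster. -->
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <value>60</value>
</property>
```

A shorter timeout means the idle AM releases its container sooner, at the cost of a fresh AM start-up for the next query once it has expired.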
Created 08-11-2016 04:58 PM
@Sunile Manjee I checked this and it is false. The remaining container seems to be the application master. When I run the Hive jobs via MapReduce2 they complete fine; it's just when they are run in Tez that I see this behaviour.
Created 09-12-2016 01:01 PM
@mike harding To add to this: by default Tez initializes an AM up front, whereas MapReduce does so only at job submission. That is why you see the behavior you describe. As you stated, Tez has a timeout setting, and that determines how long the initial AM lives.