Created 05-07-2016 05:04 PM
In the documentation page "Configure Hive and HiveServer2 for Tez" there are two properties that look similar to me: "hive.server2.tez.default.queues" and "tez.queue.name".
The only difference I can see is that "hive.server2.tez.default.queues" lets us specify several queues, so I guess jobs will be distributed over these queues. Hence, if we need all Hive jobs to run in one queue, we should use "tez.queue.name".
Am I missing something here?
Created 05-07-2016 06:38 PM
Essentially, hive.server2.tez.default.queues exists for pre-initialized Tez sessions. Starting an Application Master (AM) normally takes around 10 seconds, so the first query will be noticeably slower. To avoid that, you can set hive.server2.tez.initialize.default.sessions=true.
This will pre-start hive.server2.tez.sessions.per.default.queue AMs in each of the queues listed in hive.server2.tez.default.queues, and these AMs will then be used for query execution.
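To make the arithmetic concrete, here is a small Python sketch of how I read those three properties together. It only models the counting, it does not talk to Hive. The property names are the ones discussed above; the queue names "interactive" and "batch" and the value 2 are made-up examples.

```python
# Property names are from this thread; the values are made-up examples.
hive_site = {
    "hive.server2.tez.default.queues": "interactive,batch",
    "hive.server2.tez.sessions.per.default.queue": "2",
    "hive.server2.tez.initialize.default.sessions": "true",
}


def prestarted_ams(conf):
    """Return {queue: AM count} that would be pre-started under this reading."""
    if conf["hive.server2.tez.initialize.default.sessions"].lower() != "true":
        # Nothing is pre-started; the first query pays the AM startup cost.
        return {}
    queues = [
        q.strip()
        for q in conf["hive.server2.tez.default.queues"].split(",")
        if q.strip()
    ]
    per_queue = int(conf["hive.server2.tez.sessions.per.default.queue"])
    return {queue: per_queue for queue in queues}


ams = prestarted_ams(hive_site)
print(ams, "total:", sum(ams.values()))
```

With these example values that is two queues with two AMs each, so four AMs sitting idle and holding containers until a query arrives. That cost is worth keeping in mind when sizing the queues.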
For most situations I would not bother with it too much, since subsequent queries will reuse existing AMs (which stay around for an idle wait time). However, if you have strict SLAs you may want to use it.
tez.queue.name is then the actual queue you want to execute in. If you hit one of the default queues, the AM is already there and everything is faster. You might have distinct queues for big, heavy queries and for small, interactive ones; however, you still need to set the queue yourself.
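As a rough illustration of "set the queue yourself", the sketch below builds the statements a client could send in order: first a SET for tez.queue.name, then the query. The queue names, the table name and the helper statements_for are hypothetical; only tez.queue.name comes from the discussion above.

```python
# Hypothetical queue names; the thread does not name any.
HEAVY_QUEUE = "batch"
INTERACTIVE_QUEUE = "interactive"


def statements_for(query, heavy):
    """Return the statements to send in order: choose the queue, then run the query."""
    queue = HEAVY_QUEUE if heavy else INTERACTIVE_QUEUE
    return [f"SET tez.queue.name={queue}", query]


# "sales" is a made-up table name used only for this example.
for statement in statements_for("SELECT count(*) FROM sales", heavy=False):
    print(statement)
```

The same SET line can simply be typed in a session before the query; the helper only shows that the choice between the heavy and the interactive queue is made by the client, not by Hive.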