What is the difference between "Start Tez session at Initialization" and the hive.execution.engine=tez property?

Expert Contributor

If the two are not the same, how do they differ? Can you please let me know?

1 ACCEPTED SOLUTION

Master Guru

"Let me put the question this way. If I have hive.execution.engine=tez, why do I need to set the property hive.server2.tez.initialize.default.sessions to "True"? What's the use case for this property? I ran multiple tests, but my hive.execution.engine property drives how the query works, not this default sessions property."

The default session parameter has nothing to do with the way the query is executed. It is for pre-creating Tez sessions. If it is false, the first query on an empty system will take at least 20 seconds to create a session.

Time for a Tez query:

HiveServer2 prepare, compilation, ...: ~1 second

There is not much you can do here; however, it continuously gets faster.

Initialize Tez Application Master (Session): ~10 seconds

To reduce that, Hive can reuse idle sessions: AMs are normally kept for 120 seconds after a query is run. Or you can instantiate default sessions if you cannot live with that delay.

Initialize Containers: 3-10 seconds

The next step is to allocate the work containers to the session. Again, Tez can reuse containers, or you can preheat (pre-allocate) the containers.

The actual query

That depends on your data.
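
To make the breakdown above concrete, here is a minimal Python sketch that adds up the quoted phases for a cold system, a reused session, and a fully warm one. The figures are the rough numbers from this answer, not measurements, and the function and constant names are made up for illustration.

```python
from typing import Tuple

# Rough per-phase latencies quoted in the answer, as (low, high) seconds.
# These are ballpark figures from the thread, not measurements.
COMPILE = (1, 1)            # HiveServer2 prepare and compilation: ~1 s
START_AM = (10, 10)         # initialize the Tez Application Master: ~10 s
START_CONTAINERS = (3, 10)  # initialize the work containers: 3-10 s


def startup_overhead(session_ready: bool, containers_ready: bool) -> Tuple[int, int]:
    """Return (low, high) seconds spent before the query itself starts running."""
    phases = [COMPILE]
    if not session_ready:
        phases.append(START_AM)          # no idle or pre-created session to reuse
    if not containers_ready:
        phases.append(START_CONTAINERS)  # no reused or pre-warmed containers
    low = sum(phase[0] for phase in phases)
    high = sum(phase[1] for phase in phases)
    return low, high


if __name__ == "__main__":
    scenarios = {
        "cold system": (False, False),
        "session reused or pre-created": (True, False),
        "session and containers warm": (True, True),
    }
    for label, (session_ready, containers_ready) in scenarios.items():
        low, high = startup_overhead(session_ready, containers_ready)
        print(f"{label}: {low}-{high} s before the actual query runs")
```

With these numbers, a pre-created or reused session removes the ~10 second AM startup, and warm containers remove most of what is left. The actual query time comes on top in every case.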


5 REPLIES

Super Collaborator
@Srikaran Jangidi

Start Tez session at Initialization - Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users might potentially want to run queries with Tez without a pool of sessions.

Default value is False

hive.execution.engine=tez - This setting determines whether Hive queries will be executed using Tez or MapReduce.

Possible values - If this value is set to "mr", Hive queries will be executed using MapReduce. If this value is set to "tez", Hive queries will be executed using Tez. All queries executed through HiveServer2 will use the specified hive.execution.engine setting.
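
Since the two settings are separate properties, it may help to see them side by side. The Python sketch below only renders them as hive-site.xml style property entries; the property names come from this thread, while the helper `render_properties` and the chosen values are illustrative assumptions, not a recommended configuration.

```python
from xml.sax.saxutils import escape


def render_properties(props: dict) -> str:
    """Render name/value pairs as hive-site.xml style <property> elements."""
    blocks = []
    for name, value in props.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(str(value))}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


# Illustrative values only: the first property picks the execution engine,
# the second decides whether HiveServer2 pre-creates Tez sessions.
SETTINGS = {
    "hive.execution.engine": "tez",
    "hive.server2.tez.initialize.default.sessions": "false",
}

if __name__ == "__main__":
    print(render_properties(SETTINGS))
```

Either property can be changed without touching the other: the first controls how queries run, the second only controls whether sessions exist before the first query arrives.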

Expert Contributor

Let me put the question this way. If I have hive.execution.engine=tez, why do I need to set the property hive.server2.tez.initialize.default.sessions to "True"? What's the use case for this property? I ran multiple tests, but my hive.execution.engine property drives how the query works, not this default sessions property.

Expert Contributor

@Benjamin Leonhardi

Thanks, this makes sense. So it's always better to set the value to "True", right?

Master Guru

Personally, I like it off. It binds extra resources in the cluster, and the second query will be fast anyway. You also need to know in advance how many sessions you want, since it will redistribute queries to the pre-created sessions. If you don't care about the first query on a cold system being slow, keeping it off is the safer choice, IMO.
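
As a rough illustration of the "binds extra resources" point, the Python sketch below estimates how much memory a pre-created session pool holds while idle. It assumes the pool is sized through hive.server2.tez.default.queues and hive.server2.tez.sessions.per.default.queue, and that each session keeps one Tez AM of tez.am.resource.memory.mb running. Those properties are not mentioned in this thread, and the queue names and sizes are made-up example values.

```python
from typing import Sequence, Tuple


def idle_pool_cost(queues: Sequence[str], sessions_per_queue: int,
                   am_memory_mb: int) -> Tuple[int, int]:
    """Return (session_count, reserved_mb) for a pre-created Tez session pool."""
    if sessions_per_queue < 0 or am_memory_mb < 0:
        raise ValueError("sessions_per_queue and am_memory_mb must be non-negative")
    # One session is pre-created per queue per configured slot,
    # and each session keeps one Application Master container alive.
    session_count = len(queues) * sessions_per_queue
    return session_count, session_count * am_memory_mb


if __name__ == "__main__":
    # Hypothetical example: two queues, three sessions each, 2 GB per AM.
    count, reserved = idle_pool_cost(["etl", "adhoc"], 3, 2048)
    print(f"{count} idle sessions holding about {reserved} MB")
```

This only counts the Application Masters. Pre-warmed work containers, if enabled, would add to the total.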