
Hi, what's the difference between the "Start Tez session at Initialization" and hive.execution.engine=tez properties?

Contributor

If they are not the same, how are they different? Can you please let me know?


Expert Contributor
@Srikaran Jangidi

Start Tez session at Initialization (hive.server2.tez.initialize.default.sessions) - Enables a user to use HiveServer2 without enabling Tez for HiveServer2. Users may want to run queries with Tez without a pool of sessions.

Default value is False

hive.execution.engine=tez - This setting determines whether Hive queries will be executed using Tez or MapReduce.

If this value is set to "mr", Hive queries are executed using MapReduce; if it is set to "tez", they are executed using Tez. All queries executed through HiveServer2 use the specified hive.execution.engine setting.
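To make the distinction concrete, here is a minimal Python sketch that writes both settings into a hive-site.xml style `<configuration>` fragment and picks the engine from hive.execution.engine alone. The helper names (`build_hive_site`, `engine_for`) are hypothetical, the values are examples rather than recommendations, and the sketch only models the "mr" and "tez" values described above.

```python
import xml.etree.ElementTree as ET

# Example values only. Hypothetical helper names; not part of any Hive API.
SETTINGS = {
    # Chooses the engine that runs queries: "mr" (MapReduce) or "tez".
    "hive.execution.engine": "tez",
    # "Start Tez session at Initialization": pre-create Tez sessions at HiveServer2 startup.
    "hive.server2.tez.initialize.default.sessions": "false",
}


def build_hive_site(settings):
    """Render the settings as a hive-site.xml style <configuration> string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def engine_for(settings):
    """Return the engine; only hive.execution.engine is consulted.

    This sketch only models the "mr" and "tez" values discussed above.
    """
    engine = settings["hive.execution.engine"]
    if engine not in ("mr", "tez"):
        raise ValueError(f"value not modeled in this sketch: {engine!r}")
    return engine


if __name__ == "__main__":
    print(build_hive_site(SETTINGS))
    # Flipping the session flag does not change the engine.
    flipped = {**SETTINGS, "hive.server2.tez.initialize.default.sessions": "true"}
    print(engine_for(SETTINGS), engine_for(flipped))
```

The session flag appears in the same file but never feeds into `engine_for`, which mirrors the point that the two properties answer different questions.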

Contributor

Let me put the question this way: if I have hive.execution.engine=tez, why do I need to set the property hive.server2.tez.initialize.default.sessions to "True"? What's the use case for this property? I ran multiple tests, and my hive.execution.engine property drives how the query works, not this default sessions property.

"Let me put the question this way: if I have hive.execution.engine=tez, why do I need to set the property hive.server2.tez.initialize.default.sessions to "True"? What's the use case for this property? I ran multiple tests, and my hive.execution.engine property drives how the query works, not this default sessions property."

The default session parameter has nothing to do with how the query is executed; it is for pre-creating Tez sessions. If it is false, the first query on an empty system will take at least 20 seconds because a session has to be created first.
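A toy model of that first-query cost, assuming the rough figures quoted in this answer (about 10 seconds to start a Tez AM, idle AMs normally kept for about 120 seconds). `ToySessionCache` is a hypothetical illustration, not HiveServer2's session pool: it ignores query run time and treats a pre-created session as always available.

```python
from typing import Optional

# Rough figures quoted in this answer.
AM_STARTUP_S = 10.0     # starting a Tez Application Master (session)
IDLE_TIMEOUT_S = 120.0  # how long an idle AM is normally kept


class ToySessionCache:
    """Hypothetical single-session model, not HiveServer2's actual pool.

    Query run time is ignored: a session counts as "last used" when a query is submitted.
    """

    def __init__(self, prewarmed: bool = False):
        # prewarmed=True models a pre-created default session that stays open.
        self.prewarmed = prewarmed
        self.last_used: Optional[float] = None

    def am_cost(self, now: float) -> float:
        """Seconds spent starting an AM for a query submitted at time `now`."""
        reusable = self.prewarmed or (
            self.last_used is not None and now - self.last_used <= IDLE_TIMEOUT_S
        )
        self.last_used = now
        return 0.0 if reusable else AM_STARTUP_S


if __name__ == "__main__":
    cold = ToySessionCache()
    print([cold.am_cost(t) for t in (0, 60, 400)])  # [10.0, 0.0, 10.0]
    warm = ToySessionCache(prewarmed=True)
    print([warm.am_cost(t) for t in (0, 60, 400)])  # [0.0, 0.0, 0.0]
```

Without pre-created sessions, only the first query and any query arriving after the idle window pay the AM start-up cost; the second query within 120 seconds reuses the session.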

Time for a Tez query:

HiveServer2 preparation, compilation, ...: ~1 second

There is not much you can do here, but it keeps getting faster.

Initialize Tez Application Master (session): ~10 seconds

To reduce this, Hive can reuse idle sessions: an AM is normally kept alive for about 120 seconds after a query runs. If you cannot live with that delay, you can instead instantiate default sessions.

Initialize containers: 3-10 seconds

The next step is to allocate the worker containers to the session. Again, Tez can reuse containers, or you can preheat (pre-allocate) containers.

The actual query

That depends on your data.
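Adding up the rough per-phase figures above gives a feel for the overhead before the actual query work starts. This is a back-of-the-envelope sketch using only the numbers quoted in this answer; `overhead` is a hypothetical helper, and real timings depend on your cluster.

```python
# (low, high) seconds per phase, using the rough figures quoted above.
COMPILE = (1, 1)      # HiveServer2 preparation, compilation, ...
AM_START = (10, 10)   # Tez Application Master (session)
CONTAINERS = (3, 10)  # container allocation


def overhead(session_ready: bool, containers_ready: bool) -> tuple:
    """Estimated (low, high) seconds before the actual query work starts."""
    phases = [COMPILE]
    if not session_ready:
        phases.append(AM_START)
    if not containers_ready:
        phases.append(CONTAINERS)
    return (sum(lo for lo, _ in phases), sum(hi for _, hi in phases))


if __name__ == "__main__":
    print(overhead(False, False))  # cold system: (14, 21)
    print(overhead(True, False))   # reused or pre-created session: (4, 11)
    print(overhead(True, True))    # session and containers already available: (1, 1)
```

The AM start-up is the single largest fixed cost in this breakdown, which is why session reuse and pre-created default sessions target it first.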

Contributor

@Benjamin Leonhardi

Thanks, this makes sense. So it's always better to set the value to "True", right?

Personally, I like it off. It binds extra resources in the cluster, and the second query will be fast anyway. You also need to know in advance how many sessions you want, since queries will be distributed to the pre-created sessions. If you don't care about the first query on a cold system being slow, keeping it off is the safer choice, IMO.
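The "extra resources" and "how many sessions in advance" points can be sketched as simple arithmetic. This assumes that the companion Hive properties hive.server2.tez.default.queues (a comma-separated queue list) and hive.server2.tez.sessions.per.default.queue decide how many sessions are pre-created, one group per queue, and that the per-queue count is 1 when unset; check the names and defaults for your Hive version. `am_memory_mb` is a hypothetical per-AM memory size, not a measured value.

```python
def precreated_sessions(settings: dict) -> int:
    """Number of Tez sessions HiveServer2 would open at startup (sketch).

    Assumes one group of sessions per queue in hive.server2.tez.default.queues,
    each group of size hive.server2.tez.sessions.per.default.queue (1 assumed if unset).
    """
    flag = settings.get("hive.server2.tez.initialize.default.sessions", "false")
    if flag.strip().lower() != "true":
        return 0
    queues = [q for q in settings.get("hive.server2.tez.default.queues", "").split(",") if q.strip()]
    per_queue = int(settings.get("hive.server2.tez.sessions.per.default.queue", "1"))
    return len(queues) * per_queue


def idle_memory_mb(settings: dict, am_memory_mb: int) -> int:
    """Memory held by pre-created AMs even while no query is running.

    am_memory_mb is a hypothetical per-AM size supplied by the caller.
    """
    return precreated_sessions(settings) * am_memory_mb


if __name__ == "__main__":
    on = {
        "hive.server2.tez.initialize.default.sessions": "true",
        "hive.server2.tez.default.queues": "default,etl",
        "hive.server2.tez.sessions.per.default.queue": "2",
    }
    off = {**on, "hive.server2.tez.initialize.default.sessions": "false"}
    print(precreated_sessions(on), idle_memory_mb(on, 1024))    # 4 4096
    print(precreated_sessions(off), idle_memory_mb(off, 1024))  # 0 0
```

With the flag off, nothing is held while the cluster is idle; with it on, the reserved capacity grows with queues times sessions per queue, which is why the count has to be chosen up front.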