Varying vcores/ram for hive queries running Tez engine
Labels: Apache Hive, Apache Tez
Created ‎04-25-2016 08:37 AM
I am benchmarking some Hive queries on the Tez execution engine. I varied the values of the following properties:
1. hive.tez.container.size
2. tez.task.resource.memory.mb
3. tez.task.resource.cpu.vcores
Changes to property 1 are reflected properly. However, Hive does not seem to respect changes to property 3; it always allocates one vcore per requested container (the RM is configured to use the DominantResourceCalculator). This got me thinking about the precedence of property values in Hive and Tez.
I have the following questions about these configurations:
1. Does Hive respect the values set for properties 2 and 3 at all?
2. If I set property 1 to, say, 2048 MB and property 2 to, say, 1024 MB, does this mean that I am wasting about a GB of memory for each spawned container?
3. Is there a property in Hive, similar to property 1, that lets me use the 'set' command in the .hql file to specify the number of vcores per container?
4. Changes to the property tez.am.resource.cpu.vcores are reflected at runtime. However, I do not observe the same behaviour with property 3. Are there other configurations that take precedence over it?
Your inputs and suggestions would be highly appreciated.
Thanks!
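To make the precedence question concrete, here is a minimal sketch of the fallback order that Hive appears to apply when it sizes Tez task containers. It is an assumption based on the documented descriptions of hive.tez.container.size and hive.tez.cpu.vcores (both default to -1 and fall back to the MapReduce map settings), not a statement about every Hive version. The function name and the dictionary are hypothetical; only the property names are real.

```python
# Sketch only: mirrors the fallback order Hive is assumed to use for
# Tez task containers. The function and the conf dict are hypothetical.

MR_DEFAULT_MAP_MEMORY_MB = 1024   # default of mapreduce.map.memory.mb
MR_DEFAULT_MAP_CPU_VCORES = 1     # default of mapreduce.map.cpu.vcores


def task_container_resource(conf):
    """Return (memory_mb, vcores) requested for each task container."""
    # Memory: hive.tez.container.size wins when it is positive.
    container_size = int(conf.get("hive.tez.container.size", -1))
    if container_size > 0:
        memory_mb = container_size
    else:
        memory_mb = int(conf.get("mapreduce.map.memory.mb",
                                 MR_DEFAULT_MAP_MEMORY_MB))

    # Vcores: hive.tez.cpu.vcores wins when it is positive.
    cpu_vcores = int(conf.get("hive.tez.cpu.vcores", -1))
    if cpu_vcores > 0:
        vcores = cpu_vcores
    else:
        vcores = int(conf.get("mapreduce.map.cpu.vcores",
                              MR_DEFAULT_MAP_CPU_VCORES))
    return memory_mb, vcores


# Settings from the question; the tez.task.* keys are never read
# by this sketch, which is the point being illustrated.
session_conf = {
    "hive.tez.container.size": 2048,
    "tez.task.resource.memory.mb": 1024,
    "tez.task.resource.cpu.vcores": 4,
}
print(task_container_resource(session_conf))
```

Under these assumptions the tez.task.resource.* values are simply not consulted for Hive-launched task containers, which would match the observation that property 3 has no effect. If the sketch holds for your version, adding hive.tez.cpu.vcores to the session configuration is the setting to try for question 3.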
Created ‎04-25-2016 08:41 AM
Created ‎04-25-2016 09:29 AM
Thanks Kuldeep!
To be clear, this means that there is no way to control the number of vcores for each task container requested by the application master. The only valid configuration for task containers would be hive.tez.container.size, which should be set so that every container spawned on a given machine has access to exactly one vcore on that machine.
Please correct me if I'm wrong in understanding this.
Thanks!
Created ‎04-25-2016 09:34 AM
Tez tasks are mostly single-threaded. Parallelism is achieved by running many tasks at once, so increasing the cores per task will not help you.
Created ‎04-25-2016 09:45 AM
Thanks @Benjamin Leonhardi! A follow-up question: why can the Tez application master have multiple vcores assigned to it? Does it spawn multiple threads for monitoring tasks?
Created ‎04-25-2016 09:59 AM
I would suppose so. It also works with the timeline server, communicates with the ResourceManager and NodeManagers, does logging, etc.
Created ‎08-29-2020 08:45 AM
I have the same question. How can I make use of the free cores for Tez tasks and speed up the process?
Created ‎08-30-2020 12:39 AM
Pic 1 - Running containers maxed out at 50
Pic 2 - Free resources
Pic 3 - Tez application in the default queue
I was able to get 4 vcores per container. However, the number of containers for the Tez application doesn't go beyond 50, even though I have free vcores and memory (Pic 2).
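One possible explanation, which the thread does not confirm, is a per-queue or per-user resource limit. With the DominantResourceCalculator, the scarcer of memory and vcores caps the number of containers. The sketch below shows that arithmetic. The limit values are hypothetical placeholders, not figures from the screenshots, and the function name is made up for illustration.

```python
# Sketch only: how a resource limit caps the container count when both
# memory and vcores are accounted for (DominantResourceCalculator).
# All numbers below are hypothetical, not taken from the cluster above.

def max_containers(limit_mb, limit_vcores, container_mb, container_vcores):
    """Return the container ceiling and which resource imposes it."""
    by_memory = limit_mb // container_mb
    by_vcores = limit_vcores // container_vcores
    if by_vcores < by_memory:
        return by_vcores, "vcores"
    return by_memory, "memory"


# Hypothetical limit of 400 GB and 200 vcores for the queue or user,
# with containers of 4096 MB and 4 vcores each.
ceiling, limiting = max_containers(
    limit_mb=409600, limit_vcores=200,
    container_mb=4096, container_vcores=4,
)
print(ceiling, limiting)
```

If the limit that applies to the application works out like this, raising vcores per container lowers the container ceiling even while the cluster as a whole still shows free resources. Other causes are possible too, such as the query simply having no more tasks to run.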
