Hive increase map join local task memory
Labels: Apache Hive
Created 11-24-2015 10:27 PM
Is there a way in HDP >= v2.2.4 to increase the local task memory? I'm aware of disabling/limiting map-only join sizes, but we want to increase the memory, not limit it.
Depending on the environment, the memory allocation shifts, but it appears to be entirely at YARN's and Hive's discretion.
"Starting to launch local task to process map join;maximum memory = 255328256 => ~ 0.25 GB"
I've looked at/tried:
- hive.mapred.local.mem
- hive.mapjoin.localtask.max.memory.usage - this is simply a percentage of the local heap. I want to increase the memory, not limit it.
- mapreduce.map.memory.mb - only effective for non-local tasks
I found documentation suggesting 'export HADOOP_HEAPSIZE="2048"' to change the default, but this applied to the NodeManager.
Any way to configure this on a per-job basis?
EDIT
To avoid duplication, the info I'm referencing comes from here: https://support.pivotal.io/hc/en-us/articles/207750748-Unable-to-increase-hive-child-process-max-hea...
Sounds like a per-job solution is not currently available due to this bug.
Created 02-03-2016 08:26 PM
It's a bug in Hive. You can disable hive.auto.convert.join or set the memory at a global level via HADOOP_HEAPSIZE, but neither solves the problem of setting the local task memory on a per-job basis.
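Concretely, the two workarounds look like this (a sketch; the second setting belongs in the cluster-wide hadoop-env template, e.g. via Ambari, so it is not per-job and takes effect only after a service restart):
-- per-session, in Hive: disable the map-side join optimization
set hive.auto.convert.join=false;
# global, in hadoop-env.sh: raise the client heap (value from the docs referenced above)
export HADOOP_HEAPSIZE="2048"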
Created 11-25-2015 01:18 AM
What client are you using to run the query? If it's the Hive CLI, then you can run export HADOOP_OPTS="-Xmx2048m" in the shell and then invoke the Hive CLI.
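For example (a sketch, reusing the test.hql script name the original poster mentions below):
$ export HADOOP_OPTS="-Xmx2048m"
$ hive -f test.hql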
Created 11-25-2015 04:28 PM
@Michael Miklavcic you have to increase the Tez container size (hive.tez.container.size) and hive.tez.java.opts (should be ~80% of the container size) to have more memory available.
Then you can increase hive.auto.convert.join.noconditionaltask.size to automatically convert map joins, or set
hive.ignore.mapjoin.hint=false and use the MAPJOIN hint (select /*+ MAPJOIN(dimension_table_name) */ ...)
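A minimal sketch of those settings in a Hive session (the 4096 MB container and -Xmx3276m values are illustrative, keeping java.opts at ~80% of the container size; noconditionaltask.size is a byte threshold, 1 GB here):
-- bigger Tez containers, with the JVM heap at ~80% of the container size
set hive.tez.container.size=4096;
set hive.tez.java.opts=-Xmx3276m;
-- let Hive auto-convert larger joins to map joins (threshold in bytes)
set hive.auto.convert.join.noconditionaltask.size=1073741824;
-- or force the conversion explicitly with a hint
set hive.ignore.mapjoin.hint=false;
select /*+ MAPJOIN(dimension_table_name) */ ... ;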
Created 11-26-2015 01:09 AM
For those upvoting this answer: this is the correct answer for increasing memory for mapper YARN containers, but it will not work in cases where Hive is optimizing by creating a local task. What happens is that Hive first generates a hash table of values for the map-side join on a local node, then uploads it to HDFS for distribution to all mappers that need the fast lookup table. It's that local task that is the problem here, and the only ways around it are to bail on the map-side join optimization or to change HADOOP_HEAPSIZE at a global level through Ambari. Not elegant, but it is a workaround.
Created 08-02-2016 08:07 AM
Thanks @Guilherme Braccialli! Increasing hive.auto.convert.join.noconditionaltask.size fixed our problem. Upvoted!
Created 04-02-2017 05:45 AM
Hi @Alind Billore, how much memory did you set for this property? I face this issue too, with the setting below.
set hive.auto.convert.join.noconditionaltask.size=3300000000;
Created 04-02-2017 03:58 PM
For now, I've solved this issue by setting the property below. With it, mapper/reducer output will not be held in memory. But I need to revisit my table data and predicates (WHERE clauses) once again to check whether any unnecessary data is being fetched.
set hive.auto.convert.join=false;
Created 11-25-2015 06:23 PM
Doesn't seem to work. Did the following:
$ export HADOOP_OPTS="-Xmx1024m"
$ hive -f test.hql > results.txt
...
Starting to launch local task to process map join;maximum memory = 511180800 = 0.5111808GB
...
Created 11-25-2015 06:30 PM
@Michael Miklavcic check hive.mapjoin.localtask.max.memory.usage, it's the percentage of memory dedicated to the local mapjoin task.
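For reference, the property takes a fraction, e.g. (0.90 is the default, per the reply below):
set hive.mapjoin.localtask.max.memory.usage=0.90;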
Created 11-26-2015 01:03 AM
@Guilherme Braccialli, that doesn't increase memory allocation for the local task. It's a percentage threshold before the job is automatically killed. It's already at 90% by default, so at this point the only option is to increase the local memory allocation. I tested the "HADOOP_HEAPSIZE" option from Ambari, and it works, but it's global.
