<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive increase map join local task memory in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97618#M11046</link>
    <description>&lt;P&gt;I hit the same issue while running query64.sql from the sample-queries-tpcds in the hive-testbench tool. Setting HADOOP_HEAPSIZE="2048" in either hadoop-env.sh or hive-env.sh did not resolve it: even with that environment variable set, the Hive local task's maximum memory stayed at &lt;STRONG&gt;532742144&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;But appending "-Xmx1536m" to HADOOP_CLIENT_OPTS in hadoop-env.sh increased the Hive local task's heap, and the query completed successfully. For example:&lt;/P&gt;&lt;P&gt;export HADOOP_CLIENT_OPTS="-Xmx1536m $HADOOP_CLIENT_OPTS"&lt;/P&gt;</description>
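The fix quoted above can be sketched as a small shell snippet (a sketch only; the 1536m figure is this post's example value, and hadoop-env.sh is assumed to be sourced before the Hive CLI's JVM starts):

```shell
# Prepend the heap flag to any existing client options (assumption: no
# conflicting -Xmx flag is already present in HADOOP_CLIENT_OPTS).
export HADOOP_CLIENT_OPTS="-Xmx1536m ${HADOOP_CLIENT_OPTS:-}"
echo "$HADOOP_CLIENT_OPTS"
```

With this exported, a subsequent Hive CLI run should report a larger "maximum memory" for the local task.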
    <pubDate>Sun, 02 Apr 2017 20:53:45 GMT</pubDate>
    <dc:creator>adevaraj</dc:creator>
    <dc:date>2017-04-02T20:53:45Z</dc:date>
    <item>
      <title>Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97603#M11031</link>
      <description>&lt;P&gt;Is there a way in HDP &amp;gt;= v2.2.4 to increase the local task memory? I'm aware of disabling/limiting map-only join sizes, but we want to increase it, not limit it.&lt;/P&gt;&lt;P&gt;Depending on the environment, the memory allocation shifts, but it appears to be entirely at YARN and Hive's discretion.&lt;/P&gt;&lt;P&gt;"Starting to launch local task to process map join;maximum memory = &lt;STRONG&gt;255328256 =&amp;gt; ~ 0.25 GB"&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I've looked at/tried:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;hive.mapred.local.mem&lt;/LI&gt;&lt;LI&gt;hive.mapjoin.localtask.max.memory.usage - this is simply a percentage of the local heap; I want to increase the memory, not limit it.
&lt;/LI&gt;&lt;LI&gt;mapreduce.map.memory.mb - only effective for non-local tasks
&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I found documentation suggesting 'export HADOOP_HEAPSIZE="2048"' to change the default, but this applied to the NodeManager.&lt;/P&gt;&lt;P&gt;Is there any way to configure this on a per-job basis?&lt;/P&gt;&lt;P&gt;EDIT&lt;/P&gt;&lt;P&gt;To avoid duplication, the info I'm referencing comes from here: &lt;A href="https://support.pivotal.io/hc/en-us/articles/207750748-Unable-to-increase-hive-child-process-max-heap-when-attempting-hash-join" target="_blank"&gt;https://support.pivotal.io/hc/en-us/articles/207750748-Unable-to-increase-hive-child-process-max-heap-when-attempting-hash-join&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Sounds like a per-job solution is not currently available because of this bug.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Nov 2015 06:27:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97603#M11031</guid>
      <dc:creator>mmiklavcic</dc:creator>
      <dc:date>2015-11-25T06:27:12Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97604#M11032</link>
      <description>&lt;P&gt;What client are you using to run the query? If it's the Hive CLI, you can run export HADOOP_OPTS="-Xmx2048m" in the shell and then invoke the Hive CLI.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Nov 2015 09:18:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97604#M11032</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2015-11-25T09:18:48Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97605#M11033</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/193/mmiklavcic.html" nodeid="193"&gt;@Michael Miklavcic&lt;/A&gt; you have to increate tez container size: hive.tez.container.size and hive.tez.java.opts (should be 80% of container size) to have more memory available.&lt;/P&gt;&lt;P&gt;Then, you can increase hive.auto.convert.join.noconditionaltask.size to automatically convert mapjoins or set &lt;/P&gt;&lt;P&gt;hive.ignore.mapjoin.hint=false and use mapjoin hine (select  /*+ MAPJOIN(dimension_table_name) */  ...)&lt;/P&gt;</description>
      <pubDate>Thu, 26 Nov 2015 00:28:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97605#M11033</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-11-26T00:28:31Z</dc:date>
    </item>
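The "80% of container size" rule of thumb from the reply above can be sketched numerically (the 4096 MB container size is a hypothetical example, not a recommendation):

```shell
# Derive an -Xmx for hive.tez.java.opts as ~80% of hive.tez.container.size,
# leaving headroom for non-heap JVM memory.
container_mb=4096                       # hypothetical container size in MB
xmx_mb=$((container_mb * 80 / 100))     # 80% rule of thumb
echo "set hive.tez.container.size=${container_mb};"
echo "set hive.tez.java.opts=-Xmx${xmx_mb}m;"
```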
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97606#M11034</link>
      <description>&lt;P&gt;Doesn't seem to work. Did the following:&lt;/P&gt;&lt;P&gt;$ export HADOOP_OPTS="-&lt;STRONG&gt;Xmx1024m&lt;/STRONG&gt;"&lt;/P&gt;&lt;P&gt;$ hive -f test.hql &amp;gt; results.txt&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;Starting to launch local task to process map join;maximum memory = &lt;STRONG&gt;511180800 = &lt;/STRONG&gt;&lt;STRONG&gt;0.5111808GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;</description>
      <pubDate>Thu, 26 Nov 2015 02:23:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97606#M11034</guid>
      <dc:creator>mmiklavcic</dc:creator>
      <dc:date>2015-11-26T02:23:18Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97607#M11035</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/193/mmiklavcic.html" nodeid="193"&gt;@Michael Miklavcic&lt;/A&gt; check hive.mapjoin.localtask.max.memory.usage, it's the percentage of memory dedicated to local mapjoin task.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Nov 2015 02:30:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97607#M11035</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-11-26T02:30:34Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97608#M11036</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/238/gbraccialli.html" nodeid="238"&gt;@Guilherme Braccialli&lt;/A&gt;, that doesn't increase memory allocation for the local task. It's a percentage threshold before the job is automatically killed. It's already at 90% by default, so at this point the only option is to increase the local mem allocation. I tested the "HADOOP_HEAPSIZE" option from Ambari, and it works, but it's global.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Nov 2015 09:03:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97608#M11036</guid>
      <dc:creator>mmiklavcic</dc:creator>
      <dc:date>2015-11-26T09:03:55Z</dc:date>
    </item>
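The 90% threshold described above can be made concrete with the numbers from the original question (a sketch; 255328256 is the "maximum memory" value from the question's log):

```shell
# hive.mapjoin.localtask.max.memory.usage (default 0.90) is a kill threshold,
# not an allocation: the local task aborts once hash-table usage crosses it.
max_heap=255328256      # local task heap from the question's log
threshold_pct=90        # 0.90 expressed as a percentage
kill_at=$((max_heap / 100 * threshold_pct))
echo "local task aborts near ${kill_at} bytes"
```

Raising the percentage only delays the abort; only a larger heap raises the ceiling itself.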
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97609#M11037</link>
      <description>&lt;P&gt;For those upvoting this answer: it is the correct answer for increasing memory for mapper YARN containers, but it will not work in cases where Hive optimizes by creating a local task. What happens is that Hive first generates a hash table of values for the map-side join on a local node, then uploads it to HDFS for distribution to all mappers that need the fast lookup table. It's the local task that is the problem here, and the only ways to fix it are to bail on the map-side join optimization or to change HADOOP_HEAPSIZE globally through Ambari. Not elegant, but it is a workaround.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Nov 2015 09:09:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97609#M11037</guid>
      <dc:creator>mmiklavcic</dc:creator>
      <dc:date>2015-11-26T09:09:13Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97610#M11038</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/193/mmiklavcic.html" nodeid="193"&gt;@Michael Miklavcic&lt;/A&gt; looks like hive/hadoop scripts always defines max heap size with ambari setting.&lt;/P&gt;&lt;P&gt;I debugged &lt;STRONG&gt;/usr/hdp/2.3.2.0-2950/hadoop/bin/hadoop.distro&lt;/STRONG&gt; and got commands below, where you can change -Xmx and define your desired amount of memory.&lt;/P&gt;&lt;PRE&gt;export CLASSPATH=/usr/hdp/2.3.2.0-2950/hadoop/conf:/usr/hdp/2.3.2.0-2950/hadoop/lib/*:/usr/hdp/2.3.2.0-2950/hadoop/.//*:/usr/hdp/2.3.2.0-2950/hadoop-hdfs/./:/usr/hdp/2.3.2.0-2950/hadoop-hdfs/lib/*:/usr/hdp/2.3.2.0-2950/hadoop-hdfs/.//*:/usr/hdp/2.3.2.0-2950/hadoop-yarn/lib/*:/usr/hdp/2.3.2.0-2950/hadoop-yarn/.//*:/usr/hdp/2.3.2.0-2950/hadoop-mapreduce/lib/*:/usr/hdp/2.3.2.0-2950/hadoop-mapreduce/.//*:/usr/hdp/2.3.2.0-2950/atlas/hook/hive/*:/usr/hdp/2.3.2.0-2950/hive-hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive-hcatalog/share/hcatalog/hive-hcatalog-server-extensions-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive-hcatalog/share/webhcat/java-client/hive-webhcat-java-client-1.2.1.2.3.2.0-2950.jar:/usr/hdp/current/hive-client/conf:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-core-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-fate-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-start-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-trace-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/activation-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ant-1.9.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ant-launcher-1.9.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/antlr-2.7.7.jar:/usr/hdp/2.3.2.0-2950/hive/lib/antlr-runtime-3.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/apache-log4j-extras-1.2.17.jar:/usr/hdp/2.3.2.0-2950/hive/lib/asm-commons-3.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/asm-tree-3.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/avro-1.7.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/
bonecp-0.8.0.RELEASE.jar:/usr/hdp/2.3.2.0-2950/hive/lib/calcite-avatica-1.2.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/calcite-core-1.2.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/calcite-linq4j-1.2.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-cli-1.2.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-codec-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-collections-3.2.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-compiler-2.7.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-compress-1.4.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-dbcp-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-httpclient-3.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-io-2.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-lang-2.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-logging-1.1.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-math-2.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-pool-1.5.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-vfs2-2.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/curator-client-2.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/curator-framework-2.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/curator-recipes-2.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/datanucleus-core-3.2.10.jar:/usr/hdp/2.3.2.0-2950/hive/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.3.2.0-2950/hive/lib/derby-10.10.2.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/eclipselink-2.5.2-M1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/eigenbase-properties-1.1.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/groovy-all-2.1.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/gson-2.2.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/guava-14.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hamcrest-core-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-accumulo-handler-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-accumulo-handler.jar:/usr/hdp/2
.3.2.0-2950/hive/lib/hive-ant-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-ant.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-beeline-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-beeline.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-cli-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-cli.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-common-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-common.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-contrib-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-contrib.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hbase-handler-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hbase-handler.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hwi-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hwi.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-jdbc-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-jdbc-1.2.1.2.3.2.0-2950-standalone.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-jdbc.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-metastore-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-metastore.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-serde-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-serde.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-service-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-service.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-0.20S-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-0.23-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-common-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-common.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-scheduler-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-scheduler.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-testutils-1.2.1.2.3.2.0-2950.jar:/usr
/hdp/2.3.2.0-2950/hive/lib/hive-testutils.jar:/usr/hdp/2.3.2.0-2950/hive/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.3.2.0-2950/hive/lib/httpclient-4.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/httpcore-4.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/httpmime-4.2.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ivy-2.4.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/janino-2.7.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/javax.persistence-2.1.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jcommander-1.32.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jdo-api-3.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jetty-all-7.6.0.v20120127.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jetty-all-server-7.6.0.v20120127.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jline-2.12.jar:/usr/hdp/2.3.2.0-2950/hive/lib/joda-time-2.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jpam-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/json-20090211.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jsr305-3.0.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jta-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/junit-4.11.jar:/usr/hdp/2.3.2.0-2950/hive/lib/libfb303-0.9.2.jar:/usr/hdp/2.3.2.0-2950/hive/lib/libthrift-0.9.2.jar:/usr/hdp/2.3.2.0-2950/hive/lib/log4j-1.2.16.jar:/usr/hdp/2.3.2.0-2950/hive/lib/mail-1.4.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/maven-scm-api-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/maven-scm-provider-svn-commons-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/maven-scm-provider-svnexe-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/mysql-connector-java.jar:/usr/hdp/2.3.2.0-2950/hive/lib/netty-3.7.0.Final.jar:/usr/hdp/2.3.2.0-2950/hive/lib/noggit-0.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ojdbc6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/opencsv-2.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/oro-2.0.8.jar:/usr/hdp/2.3.2.0-2950/hive/lib/paranamer-2.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/parquet-hadoop-bundle-1.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar:/usr/hdp/2.3.2.0-2950/hive/lib/plexus-utils-1.5.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-hive-plugin-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-plugins-au
dit-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-plugins-common-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-plugins-cred-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger_solrj-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/regexp-1.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/servlet-api-2.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/snappy-java-1.0.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ST4-4.0.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/stax-api-1.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/stringtemplate-3.2.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/super-csv-2.2.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/tempus-fugit-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/velocity-1.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/xz-1.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/zookeeper-3.4.6.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar::/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar:/etc/hbase/conf:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-common-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-server-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/netty-all-4.0.23.Final.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/metrics-core-2.2.0.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-protocol-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-client-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java-5.1.31-bin.jar:/usr/share/java/mysql-connector-java.jar:/usr/hdp/2.3.2.0-2950/tez/*:/usr/hdp/2.3.2.0-2950/tez/lib/*:/usr/hdp/2.3.2.0-2950/tez/conf




/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x86_64/bin/java -Xmx3000m -Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/root -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -XX:MaxPermSize=512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.3.2.0-2950/hive/lib/hive-cli-1.2.1.2.3.2.0-2950.jar org.apache.hadoop.hive.cli.CliDriver --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar

&lt;/PRE&gt;&lt;P&gt;You can also check the allocated heap space using:&lt;/P&gt;&lt;PRE&gt;jmap -heap YOUR-HIVE-CLIENT-PID&lt;/PRE&gt;</description>
      <pubDate>Thu, 26 Nov 2015 10:04:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97610#M11038</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-11-26T10:04:21Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97611#M11039</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/193/mmiklavcic.html" nodeid="193"&gt;@Michael Miklavcic&lt;/A&gt;&lt;P&gt;check last answer, sorry I tried to post as comment, but it wasnt possible due to max character limit in comments.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Nov 2015 10:05:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97611#M11039</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-11-26T10:05:23Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97612#M11040</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/193/mmiklavcic.html" nodeid="193"&gt;@Michael Miklavcic&lt;/A&gt; are you still having issues with this? Can you accept best answer or provide your own solution?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 23:53:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97612#M11040</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-03T23:53:06Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97613#M11041</link>
      <description>&lt;P&gt;It's a bug in Hive: you can disable hive.auto.convert.join or set the memory at a global level via HADOOP_HEAPSIZE, but neither solves the question of setting the local task memory on a per-job basis.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 04:26:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97613#M11041</guid>
      <dc:creator>mmiklavcic</dc:creator>
      <dc:date>2016-02-04T04:26:31Z</dc:date>
    </item>
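Given that conclusion, the only truly per-job knob the thread leaves is turning the conversion off for the offending script. A hedged sketch (the file path and name are hypothetical):

```shell
# Per-job workaround: disable automatic map-join conversion for one script
# instead of raising HADOOP_HEAPSIZE globally.
cat > /tmp/no_mapjoin.hql <<'EOF'
set hive.auto.convert.join=false;
-- the original query would follow here
EOF
grep -c 'hive.auto.convert.join=false' /tmp/no_mapjoin.hql
```

This trades the map-side join speedup for a shuffle join that no longer depends on the local task's heap.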
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97614#M11042</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/193/mmiklavcic.html" nodeid="193"&gt;@Michael Miklavcic&lt;/A&gt; I have accepted your answer. Thanks for posting the final answer!!&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 04:29:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97614#M11042</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-04T04:29:51Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97615#M11043</link>
      <description>&lt;P&gt;Thanks &lt;A href="https://community.hortonworks.com/users/238/gbraccialli.html"&gt;@Guilherme Braccialli&lt;/A&gt;! Increasing hive.auto.convert.join.noconditionaltask.size fixed our problem. Upvoted!&lt;/P&gt;</description>
      <pubDate>Tue, 02 Aug 2016 15:07:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97615#M11043</guid>
      <dc:creator>alindbillore</dc:creator>
      <dc:date>2016-08-02T15:07:37Z</dc:date>
    </item>
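For context on the setting that resolved the post above: hive.auto.convert.join.noconditionaltask.size is a byte threshold, and joins whose small-table inputs together fit under it are auto-converted to map joins. A numeric sketch (both sizes are made-up example values):

```shell
# If the summed size of the small tables is below the threshold, Hive
# converts the join to a map join without a conditional task.
threshold_bytes=1561644237      # example threshold (a value seen later in this thread)
small_tables_bytes=900000000    # hypothetical summed small-table size
if [ "$small_tables_bytes" -lt "$threshold_bytes" ]; then
  echo "auto-converted to map join"
else
  echo "stays a shuffle join"
fi
```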
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97616#M11044</link>
      <description>&lt;P&gt;Hi All, I too face this issue in production; here are my error and my production Hive settings.&lt;/P&gt;&lt;P&gt;Execution log at: /tmp/crhscrvs/crhscrvs_20170401171447_7fa9db9e-7265-4844-a325-0e11b8e2e2c5.log&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:50  Starting to launch local task to process map join;  maximum memory = 2130968576&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:53  Processing rows: 200000  Hashtable size: 199999  Memory usage: 180192576  percentage: 0.085&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:53  Processing rows: 300000  Hashtable size: 299999  Memory usage: 203985896  percentage: 0.096&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:54  Processing rows: 400000  Hashtable size: 399999  Memory usage: 247108088  percentage: 0.116&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:55  Processing rows: 500000  Hashtable size: 499999  Memory usage: 329110392  percentage: 0.154&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:55  Processing rows: 600000  Hashtable size: 599999  Memory usage: 347313416  percentage: 0.163&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:55  Processing rows: 700000  Hashtable size: 699999  Memory usage: 410839712  percentage: 0.193&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:55  Processing rows: 800000  Hashtable size: 799999  Memory usage: 453803856  percentage: 0.213&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:56  Processing rows: 900000  Hashtable size: 899999  Memory usage: 528026968  percentage: 0.248&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:56  Processing rows: 1000000  Hashtable size: 999999  Memory usage: 564196224  percentage: 0.265&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:56  Processing rows: 1100000  Hashtable size: 1099999  Memory usage: 592163176  percentage: 0.278&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:57  Processing rows: 1200000  Hashtable size: 1199999  Memory usage: 658466272  percentage: 0.309&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:57  Processing rows: 1300000  Hashtable size: 1299999  Memory usage: 699296984  percentage: 0.328&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:57  Processing rows: 1400000  Hashtable size: 1399999  Memory usage: 759936160  percentage: 0.357&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:58  Processing rows: 1500000  Hashtable size: 1499999  Memory usage: 846875144  percentage: 0.397&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:59  Processing rows: 1600000  Hashtable size: 1599999  Memory usage: 863823240  percentage: 0.405&lt;/P&gt;
&lt;P&gt;2017-04-01 17:18:59  Processing rows: 1700000  Hashtable size: 1699999  Memory usage: 923698304  percentage: 0.433&lt;/P&gt;
&lt;P&gt;2017-04-01 17:19:00  Processing rows: 1800000  Hashtable size: 1799999  Memory usage: 998273304  percentage: 0.468&lt;/P&gt;
&lt;P&gt;2017-04-01 17:19:00  Processing rows: 1900000  Hashtable size: 1899999  Memory usage: 1009902104  percentage: 0.474&lt;/P&gt;
&lt;P&gt;2017-04-01 17:19:00  Processing rows: 2000000  Hashtable size: 1999999  Memory usage: 1080755328  percentage: 0.507&lt;/P&gt;
&lt;P&gt;2017-04-01 17:19:01  Processing rows: 2100000  Hashtable size: 2099999  Memory usage: 1118238920  percentage: 0.525&lt;/P&gt;
&lt;P&gt;2017-04-01 17:19:01  Processing rows: 2200000  Hashtable size: 2199999  Memory usage: 1147275760  percentage: 0.538&lt;/P&gt;
&lt;P&gt;2017-04-01 17:19:01  Processing rows: 2300000  Hashtable size: 2299999  Memory usage: 1214495864  percentage: 0.57&lt;/P&gt;
&lt;P&gt;Execution failed with exit status: 3&lt;/P&gt;&lt;P&gt;Obtaining error information&lt;/P&gt;&lt;P&gt;Task failed!&lt;/P&gt;&lt;P&gt;Task ID:&lt;/P&gt;&lt;P&gt;Stage-55&lt;/P&gt;&lt;P&gt;Logs:&lt;/P&gt;&lt;P&gt;/tmp/crhscrvs/hive.log&lt;/P&gt;
&lt;P&gt;Production Hive Settings:&lt;/P&gt;&lt;BLOCKQUOTE&gt;
&lt;P&gt;hive&amp;gt; set hive.cbo.enable;&lt;/P&gt;&lt;P&gt;hive.cbo.enable=true&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.stats.autogather;&lt;/P&gt;&lt;P&gt;hive.stats.autogather=true&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.stats.fetch.column.stats;&lt;/P&gt;&lt;P&gt;hive.stats.fetch.column.stats=false&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.stats.fetch.partition.stats;&lt;/P&gt;&lt;P&gt;hive.stats.fetch.partition.stats=true&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.tez.java.opts;&lt;/P&gt;&lt;P&gt;hive.tez.java.opts=-server -Xmx3072m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.auto.convert.join.noconditionaltask;&lt;/P&gt;&lt;P&gt;hive.auto.convert.join.noconditionaltask=true&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.auto.convert.join.noconditionaltask.size;&lt;/P&gt;&lt;P&gt;hive.auto.convert.join.noconditionaltask.size=1561644237&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.exec.reducers.bytes.per.reducer;&lt;/P&gt;&lt;P&gt;hive.exec.reducers.bytes.per.reducer=269798605&lt;/P&gt;
&lt;P&gt;hive&amp;gt; set hive.cli.print.header=true;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I'm running Hive on Tez in an HDP 2.3.2.0 cluster.&lt;/P&gt;&lt;P&gt;It worked well for two months; due to sudden data growth, I'm now facing this memory issue.&lt;/P&gt;&lt;P&gt;Exactly, I am getting this error stacktrace:&lt;/P&gt;&lt;PRE&gt;hive&amp;gt; set tez.task.resource.memory.mb=16384;


hive&amp;gt; set tez.am.resource.memory.mb=16384;
hive&amp;gt; set hive.tez.container.size=16384;
hive&amp;gt; insert overwrite table crhs_fmtrade_break_latest_user_commentary partition(source_system)
    &amp;gt; select break_id, reporting_date, original_reporting_date, investigationstatus, investigationcloseddate, userdefinedclassification,
    &amp;gt; freeformatcomments, systeminvestigationstatus, commentaryuploaddatetime, comment_id, commentarysourcesystem from v_fmtrade_unique_latest_commentary_view;
Query ID = crhscrvs_20170401164631_46f52aa6-7548-4625-9bcd-6c27f97ac207
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1490695811857_8269)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 KILLED     -1          0        0       -1       0       0
Map 11                KILLED     -1          0        0       -1       0       0
Map 13                KILLED     -1          0        0       -1       0       0
Map 16 .........   SUCCEEDED      2          2        0        0       0       0
Map 17                KILLED     -1          0        0       -1       0       0
Map 18                KILLED     -1          0        0       -1       0       0
Map 20                FAILED     -1          0        0       -1       0       0
Map 25 .........   SUCCEEDED      2          2        0        0       0       0
Map 26                KILLED     -1          0        0       -1       0       0
Map 27                FAILED     -1          0        0       -1       0       0
Map 4 ..........   SUCCEEDED      2          2        0        0       0       1
Map 5                 KILLED     -1          0        0       -1       0       0
Reducer 10            KILLED      1          0        0        1       0       0
Reducer 12            KILLED      1          0        0        1       0       0
Reducer 14            KILLED      2          0        0        2       0       0
Reducer 15            KILLED      1          0        0        1       0       0
Reducer 19            KILLED      1          0        0        1       0       0
Reducer 2             KILLED      1          0        0        1       0       0
Reducer 21            KILLED   1009          0        0     1009       0       0
Reducer 22            KILLED    185          0        0      185       0       0
Reducer 23            KILLED      1          0        0        1       0       0
Reducer 24            KILLED   1009          0        0     1009       0       0
Reducer 3             KILLED      1          0        0        1       0       0
Reducer 7             KILLED   1009          0        0     1009       0       0
Reducer 9             KILLED   1009          0        0     1009       0       0
--------------------------------------------------------------------------------
VERTICES: 03/25  [&amp;gt;&amp;gt;--------------------------] 0%    ELAPSED TIME: 395.80 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 20, vertexId=vertex_1490695811857_8269_1_11, diagnostics=[Vertex vertex_1490695811857_8269_1_11 [Map 20] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: f1 initializer failed, vertex=vertex_1490695811857_8269_1_11 [Map 20], java.lang.RuntimeException: serious problem
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1025)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1052)


        at
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)


        at
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)


        at
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)


        at
java.security.AccessController.doPrivileged(Native Method)


        at
javax.security.auth.Subject.doAs(Subject.java:422)


        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)


        at
java.util.concurrent.FutureTask.run(FutureTask.java:266)


        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)


        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)


        at
java.lang.Thread.run(Thread.java:745)


Caused by: java.util.concurrent.ExecutionException:
java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider


        at
java.util.concurrent.FutureTask.report(FutureTask.java:122)


        at
java.util.concurrent.FutureTask.get(FutureTask.java:192)


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1002)


        ... 15 more


Caused by: java.io.IOException: Couldn't create proxy
provider class
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider


       at
org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)


        at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)


        at
org.apache.hadoop.hdfs.DFSClient.&amp;lt;init&amp;gt;(DFSClient.java:678)


        at
org.apache.hadoop.hdfs.DFSClient.&amp;lt;init&amp;gt;(DFSClient.java:619)


        at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)


        at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)


        at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)


        at
org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)


        at
org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:354)


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:638)


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:624)


        ... 4 more


Caused by: java.lang.reflect.InvocationTargetException


        at
sun.reflect.GeneratedConstructorAccessor23.newInstance(Unknown Source)


        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)


        at
java.lang.reflect.Constructor.newInstance(Constructor.java:422)


        at
org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498)


        ... 14 more


Caused by: java.lang.OutOfMemoryError: GC overhead
limit exceeded


        at
java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1019)


        at
java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:1084)


        at
java.util.concurrent.ConcurrentHashMap.&amp;lt;init&amp;gt;(ConcurrentHashMap.java:852)


        at
org.apache.hadoop.conf.Configuration.&amp;lt;init&amp;gt;(Configuration.java:713)


        at
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.&amp;lt;init&amp;gt;(ConfiguredFailoverProxyProvider.java:70)


        ... 18 more


]


Vertex failed, vertexName=Map 27,
vertexId=vertex_1490695811857_8269_1_17, diagnostics=[Vertex vertex_1490695811857_8269_1_17
[Map 27] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: r31
initializer failed, vertex=vertex_1490695811857_8269_1_17 [Map 27],
java.lang.RuntimeException: serious problem


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1025)


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1052)


        at
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)


        at
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)


        at
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)


        at
java.security.AccessController.doPrivileged(Native Method)


        at
javax.security.auth.Subject.doAs(Subject.java:422)


        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)


        at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)


        at
java.util.concurrent.FutureTask.run(FutureTask.java:266)


        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)


        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)


        at
java.lang.Thread.run(Thread.java:745)


Caused by: java.util.concurrent.ExecutionException:
java.io.IOException: Couldn't create proxy provider class
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider


        at
java.util.concurrent.FutureTask.report(FutureTask.java:122)


        at
java.util.concurrent.FutureTask.get(FutureTask.java:192)


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1002)


        ... 15 more


Caused by: java.io.IOException: Couldn't create proxy
provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider


        at
org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)


        at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)


        at
org.apache.hadoop.hdfs.DFSClient.&amp;lt;init&amp;gt;(DFSClient.java:678)


        at
org.apache.hadoop.hdfs.DFSClient.&amp;lt;init&amp;gt;(DFSClient.java:619)


        at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)


        at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)


        at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)


        at
org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)


        at
org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:354)


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:638)


        at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:624)


        ... 4 more


Caused by: java.lang.reflect.InvocationTargetException


        at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)


        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)


        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)


        at
java.lang.reflect.Constructor.newInstance(Constructor.java:422)


        at
org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498)


        ... 14 more


Caused by: java.lang.OutOfMemoryError: GC overhead
limit exceeded


        at
java.util.Hashtable$Entry.clone(Hashtable.java:1250)


        at
java.util.Hashtable.clone(Hashtable.java:550)


        at
org.apache.hadoop.conf.Configuration.&amp;lt;init&amp;gt;(Configuration.java:706)


        at
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.&amp;lt;init&amp;gt;(ConfiguredFailoverProxyProvider.java:70)


        ... 19 more


]
&lt;/PRE&gt;</description>
      <pubDate>Sat, 01 Apr 2017 20:14:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97616#M11044</guid>
      <dc:creator>gnanasekaran_g</dc:creator>
      <dc:date>2017-04-01T20:14:16Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97617#M11045</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12240/alindbillore.html" nodeid="12240"&gt;@Alind Billore&lt;/A&gt; How much memory did you set for this property? I still hit this issue with the setting below.&lt;/P&gt;&lt;PRE&gt;set hive.auto.convert.join.noconditionaltask.size=3300000000;
&lt;/PRE&gt;</description>
      <pubDate>Sun, 02 Apr 2017 12:45:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97617#M11045</guid>
      <dc:creator>gnanasekaran_g</dc:creator>
      <dc:date>2017-04-02T12:45:46Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97618#M11046</link>
      <description>&lt;P&gt;I hit the same issue while running query64.sql from the sample-queries-tpcds in the hive-testbench tool. Setting HADOOP_HEAPSIZE="2048" in either hadoop-env.sh or hive-env.sh didn't resolve it: even with that environment variable set, the maximum memory of the Hive local task didn't increase from &lt;STRONG&gt;532742144.
&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;But appending "-Xmx1536m" to HADOOP_CLIENT_OPTS in hadoop-env.sh did increase the Hive local task's memory, and the query completed successfully. For example:&lt;/P&gt;&lt;P&gt;export HADOOP_CLIENT_OPTS="-Xmx1536m $HADOOP_CLIENT_OPTS"&lt;/P&gt;</description>
      <pubDate>Sun, 02 Apr 2017 20:53:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97618#M11046</guid>
      <dc:creator>adevaraj</dc:creator>
      <dc:date>2017-04-02T20:53:45Z</dc:date>
    </item>
    <item>
      <title>Re: Hive increase map join local task memory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97619#M11047</link>
      <description>&lt;P&gt;I've now worked around this issue by setting the property below, so the map join is disabled and the mappers'/reducers' output is no longer held in memory. But I need to revisit my table data and predicates (WHERE clauses) to check whether any unnecessary data is being fetched.&lt;/P&gt;&lt;PRE&gt;set hive.auto.convert.join=false;&lt;/PRE&gt;</description>
      <pubDate>Sun, 02 Apr 2017 22:55:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-increase-map-join-local-task-memory/m-p/97619#M11047</guid>
      <dc:creator>gnanasekaran_g</dc:creator>
      <dc:date>2017-04-02T22:55:49Z</dc:date>
    </item>
  </channel>
</rss>

