Created 11-24-2015 10:27 PM
Is there a way in HDP >= v2.2.4 to increase the local task memory? I'm aware of disabling/limiting map-only join sizes, but we want to increase it, not limit it.
Depending on the environment, the memory allocation shifts, but it appears to be entirely at YARN's and Hive's discretion.
"Starting to launch local task to process map join;maximum memory = 255328256 => ~ 0.25 GB"
I've looked at/tried:
I found documentation suggesting 'export HADOOP_HEAPSIZE="2048"' to change from the default, but this applied to the nodemanager.
Any way to configure this on a per-job basis?
EDIT
To avoid duplication, the info I'm referencing comes from here: https://support.pivotal.io/hc/en-us/articles/207750748-Unable-to-increase-hive-child-process-max-hea...
Sounds like a per-job solution is not currently available because of this bug.
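For reference, a minimal sketch of the global approach I found (assuming the standard /etc/hadoop/conf/hadoop-env.sh location in HDP); as noted, this applies to every client launch on the node, not to an individual job:
# /etc/hadoop/conf/hadoop-env.sh (assumed path) -- node-wide client heap, not per job
export HADOOP_HEAPSIZE="2048"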
Created 11-26-2015 02:04 AM
@Michael Miklavcic It looks like the hive/hadoop scripts always define the max heap size from the Ambari setting.
I debugged /usr/hdp/2.3.2.0-2950/hadoop/bin/hadoop.distro and captured the command below; you can change -Xmx there to define the desired amount of memory.
export CLASSPATH=/usr/hdp/2.3.2.0-2950/hadoop/conf:/usr/hdp/2.3.2.0-2950/hadoop/lib/*:/usr/hdp/2.3.2.0-2950/hadoop/.//*:/usr/hdp/2.3.2.0-2950/hadoop-hdfs/./:/usr/hdp/2.3.2.0-2950/hadoop-hdfs/lib/*:/usr/hdp/2.3.2.0-2950/hadoop-hdfs/.//*:/usr/hdp/2.3.2.0-2950/hadoop-yarn/lib/*:/usr/hdp/2.3.2.0-2950/hadoop-yarn/.//*:/usr/hdp/2.3.2.0-2950/hadoop-mapreduce/lib/*:/usr/hdp/2.3.2.0-2950/hadoop-mapreduce/.//*:/usr/hdp/2.3.2.0-2950/atlas/hook/hive/*:/usr/hdp/2.3.2.0-2950/hive-hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive-hcatalog/share/hcatalog/hive-hcatalog-server-extensions-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive-hcatalog/share/webhcat/java-client/hive-webhcat-java-client-1.2.1.2.3.2.0-2950.jar:/usr/hdp/current/hive-client/conf:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-core-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-fate-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-start-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/accumulo-trace-1.7.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/activation-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ant-1.9.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ant-launcher-1.9.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/antlr-2.7.7.jar:/usr/hdp/2.3.2.0-2950/hive/lib/antlr-runtime-3.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/apache-log4j-extras-1.2.17.jar:/usr/hdp/2.3.2.0-2950/hive/lib/asm-commons-3.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/asm-tree-3.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/avro-1.7.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/bonecp-0.8.0.RELEASE.jar:/usr/hdp/2.3.2.0-2950/hive/lib/calcite-avatica-1.2.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/calcite-core-1.2.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/calcite-linq4j-1.2.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-cli-1.2.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-codec-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-collections-3.2.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-compiler-2.7.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-compress-1.4.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-dbcp-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-httpclient-3.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-io-2.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-lang-2.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-logging-1.1.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-math-2.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-pool-1.5.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/commons-vfs2-2.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/curator-client-2.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/curator-framework-2.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/curator-recipes-2.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/datanucleus-core-3.2.10.jar:/usr/hdp/2.3.2.0-2950/hive/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/2.3.2.0-2950/hive/lib/derby-10.10.2.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/eclipselink-2.5.2-M1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/eigenbase-properties-1.1.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/groovy-all-2.1.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/gson-2.2.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/guava-14.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hamcrest-core-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-accumulo-handler-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-accumulo-handler.jar:/usr/hdp/2.3.2.0-
2950/hive/lib/hive-ant-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-ant.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-beeline-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-beeline.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-cli-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-cli.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-common-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-common.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-contrib-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-contrib.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hbase-handler-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hbase-handler.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hwi-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-hwi.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-jdbc-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-jdbc-1.2.1.2.3.2.0-2950-standalone.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-jdbc.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-metastore-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-metastore.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-serde-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-serde.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-service-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-service.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-0.20S-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-0.23-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-common-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-common.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-scheduler-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-shims-scheduler.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-testutils-1.2.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/hive-testutils.jar:/usr/hdp/2.3.2.0-2950/hive/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.3.2.0-2950/hive/lib/httpclient-4.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/httpcore-4.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/httpmime-4.2.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ivy-2.4.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/janino-2.7.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/javax.persistence-2.1.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jcommander-1.32.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jdo-api-3.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jetty-all-7.6.0.v20120127.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jetty-all-server-7.6.0.v20120127.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jline-2.12.jar:/usr/hdp/2.3.2.0-2950/hive/lib/joda-time-2.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jpam-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/json-20090211.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jsr305-3.0.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/jta-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/junit-4.11.jar:/usr/hdp/2.3.2.0-2950/hive/lib/libfb303-0.9.2.jar:/usr/hdp/2.3.2.0-2950/hive/lib/libthrift-0.9.2.jar:/usr/hdp/2.3.2.0-2950/hive/lib/log4j-1.2.16.jar:/usr/hdp/2.3.2.0-2950/hive/lib/mail-1.4.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/maven-scm-api-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/maven-scm-provider-svn-commons-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/maven-scm-provider-svnexe-1.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/mysql-connector-java.jar:/usr/hdp/2.3.2.0-2950/hive/lib/netty-3.7.0.Final.jar:/usr/hdp/2.3.2.0-2950/hive/lib/noggit-0.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ojdbc6.jar:/usr/hdp/2.3.2.0-2
950/hive/lib/opencsv-2.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/oro-2.0.8.jar:/usr/hdp/2.3.2.0-2950/hive/lib/paranamer-2.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/parquet-hadoop-bundle-1.6.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar:/usr/hdp/2.3.2.0-2950/hive/lib/plexus-utils-1.5.6.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-hive-plugin-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-plugins-audit-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-plugins-common-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger-plugins-cred-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ranger_solrj-0.5.0.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hive/lib/regexp-1.3.jar:/usr/hdp/2.3.2.0-2950/hive/lib/servlet-api-2.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/snappy-java-1.0.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/ST4-4.0.4.jar:/usr/hdp/2.3.2.0-2950/hive/lib/stax-api-1.0.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/stringtemplate-3.2.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/super-csv-2.2.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/tempus-fugit-1.1.jar:/usr/hdp/2.3.2.0-2950/hive/lib/velocity-1.5.jar:/usr/hdp/2.3.2.0-2950/hive/lib/xz-1.0.jar:/usr/hdp/2.3.2.0-2950/hive/lib/zookeeper-3.4.6.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar::/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar:/etc/hbase/conf:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-common-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-server-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/netty-all-4.0.23.Final.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/metrics-core-2.2.0.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-protocol-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-client-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java-5.1.31-bin.jar:/usr/share/java/mysql-connector-java.jar:/usr/hdp/2.3.2.0-2950/tez/*:/usr/hdp/2.3.2.0-2950/tez/lib/*:/usr/hdp/2.3.2.0-2950/tez/conf /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x86_64/bin/java -Xmx3000m -Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/root -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.3.2.0-2950/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -XX:MaxPermSize=512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.3.2.0-2950/hive/lib/hive-cli-1.2.1.2.3.2.0-2950.jar org.apache.hadoop.hive.cli.CliDriver --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
You can also check the allocated heap space using:
jmap -heap YOUR-HIVE-CLIENT-PID
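For example, a quick way to find the Hive client PID first (assuming standard JDK tools are on the PATH; the class name matches the CliDriver entry point in the command above):
jps -lm | grep CliDriver      # find the Hive CLI JVM's pid
jmap -heap <PID>              # prints MaxHeapSize and current heap usage for that JVM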
Created 11-26-2015 02:05 AM
Check the last answer; sorry, I tried to post this as a comment, but it wasn't possible due to the max character limit on comments.
Created 02-03-2016 03:53 PM
@Michael Miklavcic are you still having issues with this? Can you accept the best answer or provide your own solution?
Created 02-03-2016 08:26 PM
It's a bug in Hive. You can disable hive.auto.convert.join or set the memory at a global level via HADOOP_HEAPSIZE, but neither solves the problem of setting the local task memory on a per-job basis.
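Roughly, the two workarounds look like this (the heap value is only an example):
set hive.auto.convert.join=false;      -- per session: fall back to a shuffle join instead of a map-side join
export HADOOP_HEAPSIZE="2048"          # global: larger client heap in hadoop-env.sh, affects every launch on the node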
Created 02-03-2016 08:29 PM
@Michael Miklavcic I have accepted your answer. Thanks for posting the final answer!!
Created 04-01-2017 01:14 PM
Hi All, I too face this issue in production; here are my error output and production Hive settings.
Execution log at: /tmp/crhscrvs/crhscrvs_20170401171447_7fa9db9e-7265-4844-a325-0e11b8e2e2c5.log
2017-04-01 17:18:50 Starting to launch local task to process map join; maximum memory = 2130968576
2017-04-01 17:18:53 Processing rows: 200000 Hashtable size: 199999 Memory usage: 180192576 percentage: 0.085
2017-04-01 17:18:53 Processing rows: 300000 Hashtable size: 299999 Memory usage: 203985896 percentage: 0.096
2017-04-01 17:18:54 Processing rows: 400000 Hashtable size: 399999 Memory usage: 247108088 percentage: 0.116
2017-04-01 17:18:55 Processing rows: 500000 Hashtable size: 499999 Memory usage: 329110392 percentage: 0.154
2017-04-01 17:18:55 Processing rows: 600000 Hashtable size: 599999 Memory usage: 347313416 percentage: 0.163
2017-04-01 17:18:55 Processing rows: 700000 Hashtable size: 699999 Memory usage: 410839712 percentage: 0.193
2017-04-01 17:18:55 Processing rows: 800000 Hashtable size: 799999 Memory usage: 453803856 percentage: 0.213
2017-04-01 17:18:56 Processing rows: 900000 Hashtable size: 899999 Memory usage: 528026968 percentage: 0.248
2017-04-01 17:18:56 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 564196224 percentage: 0.265
2017-04-01 17:18:56 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 592163176 percentage: 0.278
2017-04-01 17:18:57 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 658466272 percentage: 0.309
2017-04-01 17:18:57 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 699296984 percentage: 0.328
2017-04-01 17:18:57 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 759936160 percentage: 0.357
2017-04-01 17:18:58 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 846875144 percentage: 0.397
2017-04-01 17:18:59 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 863823240 percentage: 0.405
2017-04-01 17:18:59 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 923698304 percentage: 0.433
2017-04-01 17:19:00 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 998273304 percentage: 0.468
2017-04-01 17:19:00 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 1009902104 percentage: 0.474
2017-04-01 17:19:00 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 1080755328 percentage: 0.507
2017-04-01 17:19:01 Processing rows: 2100000 Hashtable size: 2099999 Memory usage: 1118238920 percentage: 0.525
2017-04-01 17:19:01 Processing rows: 2200000 Hashtable size: 2199999 Memory usage: 1147275760 percentage: 0.538
2017-04-01 17:19:01 Processing rows: 2300000 Hashtable size: 2299999 Memory usage: 1214495864 percentage: 0.57
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
Stage-55
Logs:
/tmp/crhscrvs/hive.log
Production Hive Settings:
hive> set hive.cbo.enable;
hive.cbo.enable=true
hive> set hive.stats.autogather;
hive.stats.autogather=true
hive> set hive.stats.fetch.column.stats;
hive.stats.fetch.column.stats=false
hive> set hive.stats.fetch.partition.stats;
hive.stats.fetch.partition.stats=true
hive> set hive.tez.java.opts;
hive.tez.java.opts=-server -Xmx3072m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC
hive> set hive.auto.convert.join.noconditionaltask;
hive.auto.convert.join.noconditionaltask=true
hive> set hive.auto.convert.join.noconditionaltask.size;
hive.auto.convert.join.noconditionaltask.size=1561644237
hive> set hive.exec.reducers.bytes.per.reducer;
hive.exec.reducers.bytes.per.reducer=269798605
hive> set hive.cli.print.header=true;
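(Side note on the settings above vs. the log: hive.auto.convert.join.noconditionaltask.size is a large fraction of the local task's maximum heap, 1561644237 / 2130968576 ≈ 0.73, so a hash table built from tables near that threshold can plausibly exhaust the ~2 GB local task heap.)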
I'm running Hive on Tez on an HDP 2.3.2.0 cluster.
For two months it worked fine, but due to sudden data growth I'm now facing this memory issue.
This is the exact error stack trace I'm getting:
hive> set tez.task.resource.memory.mb=16384; hive> set tez.am.resource.memory.mb=16384; hive> set hive.tez.container.size=16384; hive> insert overwrite table crhs_fmtrade_break_latest_user_commentary partition(source_system) > select break_id, reporting_date, original_reporting_date, investigationstatus, investigationcloseddate, userdefinedclassification, > freeformatcomments, systeminvestigationstatus, commentaryuploaddatetime, comment_id, commentarysourcesystem from v_fmtrade_unique_latest_commentary_view; Query ID = crhscrvs_20170401164631_46f52aa6-7548-4625-9bcd-6c27f97ac207 Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1490695811857_8269) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 KILLED -1 0 0 -1 0 0 Map 11 KILLED -1 0 0 -1 0 0 Map 13 KILLED -1 0 0 -1 0 0 Map 16 ......... SUCCEEDED 2 2 0 0 0 0 Map 17 KILLED -1 0 0 -1 0 0 Map 18 KILLED -1 0 0 -1 0 0 Map 20 FAILED -1 0 0 -1 0 0 Map 25 ......... SUCCEEDED 2 2 0 0 0 0 Map 26 KILLED -1 0 0 -1 0 0 Map 27 FAILED -1 0 0 -1 0 0 Map 4 .......... SUCCEEDED 2 2 0 0 0 1 Map 5 KILLED -1 0 0 -1 0 0 Reducer 10 KILLED 1 0 0 1 0 0 Reducer 12 KILLED 1 0 0 1 0 0 Reducer 14 KILLED 2 0 0 2 0 0 Reducer 15 KILLED 1 0 0 1 0 0 Reducer 19 KILLED 1 0 0 1 0 0 Reducer 2 KILLED 1 0 0 1 0 0 Reducer 21 KILLED 1009 0 0 1009 0 0 Reducer 22 KILLED 185 0 0 185 0 0 Reducer 23 KILLED 1 0 0 1 0 0 Reducer 24 KILLED 1009 0 0 1009 0 0 Reducer 3 KILLED 1 0 0 1 0 0 Reducer 7 KILLED 1009 0 0 1009 0 0 Reducer 9 KILLED 1009 0 0 1009 0 0 -------------------------------------------------------------------------------- VERTICES: 03/25 [>>--------------------------] 0% ELAPSED TIME: 395.80 s -------------------------------------------------------------------------------- Status: Failed Vertex failed, vertexName=Map 20, vertexId=vertex_1490695811857_8269_1_11, diagnostics=[Vertex vertex_1490695811857_8269_1_11 [Map 20] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: f1 initializer failed, vertex=vertex_1490695811857_8269_1_11 [Map 20], java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1025) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1052) at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407) at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1002) ... 15 more Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:354) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:638) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:624) ... 4 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedConstructorAccessor23.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498) ... 14 more Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1019) at java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:1084) at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:852) at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:713) at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:70) ... 
18 more ] Vertex failed, vertexName=Map 27, vertexId=vertex_1490695811857_8269_1_17, diagnostics=[Vertex vertex_1490695811857_8269_1_17 [Map 27] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: r31 initializer failed, vertex=vertex_1490695811857_8269_1_17 [Map 27], java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1025) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1052) at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407) at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1002) ... 15 more Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:354) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:638) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:624) ... 
4 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498) ... 14 more Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Hashtable$Entry.clone(Hashtable.java:1250) at java.util.Hashtable.clone(Hashtable.java:550) at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:706) at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:70) ... 19 more ]
Created 04-02-2017 01:53 PM
I too hit the same issue while running query64.sql from the sample-queries-tpcds in the hive-testbench tool. Setting HADOOP_HEAPSIZE="2048" in either hadoop-env.sh or hive-env.sh didn't resolve the issue; even after setting the HADOOP_HEAPSIZE environment variable, the maximum memory of the Hive local task didn't increase from 532742144.
But adding/appending "-Xmx1536m" to HADOOP_CLIENT_OPTS in hadoop-env.sh increased the Hive local task's heap, and the query also completed successfully. For example:
export HADOOP_CLIENT_OPTS="-Xmx1536m $HADOOP_CLIENT_OPTS"
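Since HADOOP_CLIENT_OPTS is picked up by the client-side launcher scripts, it can presumably also be exported in the shell right before starting the CLI, which makes the larger heap effectively per-session (the query file name below is hypothetical):
export HADOOP_CLIENT_OPTS="-Xmx1536m $HADOOP_CLIENT_OPTS"
hive -f big_mapjoin_query.hql    # hypothetical script; only this session gets the larger client heap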
Created 04-02-2017 03:55 PM
I've now worked around this issue by setting the property below, so the join falls back to a regular shuffle join instead of loading the small table into memory as a hash table. But I need to revisit my table data and predicates (WHERE clauses) once again to check whether any unnecessary data is being fetched.
set hive.auto.convert.join=false;
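An alternative sketch, instead of disabling map joins completely: keep auto-conversion on but lower the threshold (it is ~1.5 GB in the settings above) so that only tables which comfortably fit in the local task heap get converted; the value below is only illustrative:
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=268435456;   -- ~256 MB, illustrative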