Created 03-22-2016 01:13 AM
I am trying to run the Hive benchmark (https://github.com/hortonworks/hive-testbench),
but when I run the setup script it loads data into some tables and then, after a while, fails with the following error:
OK
Time taken: 0.264 seconds
+ '[' X = X ']'
+ FORMAT=orc
+ i=1
+ total=24
+ DATABASE=tpcds_bin_partitioned_orc_2
+ for t in '${FACTS}'
+ echo 'Optimizing table store_sales (1/24).'
Optimizing table store_sales (1/24).
+ COMMAND='hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc'
+ runcommand 'hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc'
+ '[' XON '!=' X ']'
+ hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
...
OK
Time taken: 0.948 seconds
OK
Time taken: 0.238 seconds
OK
Time taken: 0.629 seconds
OK
Time taken: 0.248 seconds
Query ID = hdfs_20160322014240_60c3f689-816d-409e-b8c7-c6ea636fa12a
Total jobs = 1
Launching Job 1 out of 1
Dag submit failed due to Invalid TaskLaunchCmdOpts defined for Vertex Map 1 : Invalid/conflicting GC options found, cmdOpts="-server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.0.0-169 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dtez.root.logger=INFO,CLA "
stack trace: [org.apache.tez.dag.api.DAG.createDag(DAG.java:859), org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:694), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:487), org.apache.tez.client.TezClient.submitDAG(TezClient.java:434), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:439), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89), org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)] retrying...
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
+ '[' 1 -ne 0 ']'
+ echo 'Command failed, try '\''export DEBUG_SCRIPT=ON'\'' and re-running'
Command failed, try 'export DEBUG_SCRIPT=ON' and re-running
+ exit 1
I'm not sure what is wrong.
Any help is appreciated.
Created 04-29-2016 04:55 PM
So, I found out that the problem was caused by hive-testbench/settings/load-partitioned.sql. This file is used as the init file for Hive when generating the TPC-DS data. It sets several Hive configs, including hive.tez.java.opts:
set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
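This setting conflicts with the default HDP Hive config. The cmdOpts in the error message end up containing both -XX:+UseParallelGC and -XX:+UseG1GC, so two garbage collectors are selected at once and Tez rejects the DAG with "Invalid/conflicting GC options found".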
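To see the clash for yourself, you can pull the collector-selecting flags out of the cmdOpts string from the error. This is only a rough sketch: the function name list_gc_flags is made up for illustration, and it is not how Tez performs its own check. It simply lists the distinct -XX:+Use...GC flags it finds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hive-testbench or Tez):
# print each distinct "-XX:+Use<Something>GC" flag found in a JVM option string.
list_gc_flags() {
  printf '%s\n' "$1" \
    | tr ' ' '\n' \
    | grep -E '^-XX:\+Use[A-Za-z0-9]+GC$' \
    | sort -u
}

# Example: paste the cmdOpts value from the error message as the argument.
# More than one line of output means more than one collector is being requested.
# list_gc_flags "-server -XX:+UseNUMA -XX:+UseParallelGC -XX:+UseNUMA -XX:+UseG1GC"
```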
Two ways to solve it:
1. Change hive.tez.java.opts in hive-testbench/settings/load-partitioned.sql to use UseParallelGC (recommended).
set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
or
2. Set the Hive config in Ambari to use the G1 garbage collector (UseG1GC):
tez.am.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC
tez.task.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC
hive.tez.java.opts: -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
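If you go with option 1, the edit can be scripted instead of done by hand. The following is a minimal sketch, assuming the settings file still contains the hive.tez.java.opts line quoted above; the function name use_parallel_gc is hypothetical. It changes only that line and keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: on the "set hive.tez.java.opts=" line of a Hive init
# file, replace -XX:+UseG1GC with -XX:+UseParallelGC. Other lines are left
# alone, and the original file is saved next to it with a .bak suffix.
use_parallel_gc() {
  local settings_file="$1"
  sed -i.bak \
    '/^set hive\.tez\.java\.opts=/ s/-XX:+UseG1GC/-XX:+UseParallelGC/' \
    "$settings_file"
}

# Example, run from the hive-testbench checkout:
# use_parallel_gc settings/load-partitioned.sql
```

After running it, re-run the setup script; if something looks wrong, restore the .bak file.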
Created 04-08-2016 04:46 AM
Same here! I deployed Hortonworks Data Platform on Google Cloud using bdutil. Ambari is OK, but when I try to generate data using the TPC-DS setup script, it fails with the same error as yours.
Did you find any solution?
Created 04-30-2016 09:45 AM
+1 Another solution is to comment out hive.tez.java.opts in that sql file and manage the GC from Ambari.
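Commenting the line out can be scripted too. A minimal sketch, assuming the setting sits on a single line starting with "set hive.tez.java.opts=" and using the standard "--" Hive script comment prefix; the function name comment_out_tez_opts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix the "set hive.tez.java.opts=" line of a Hive
# init file with "-- " so Hive treats it as a comment. A .bak copy of the
# original is kept. Running it twice does not add a second prefix, because
# the line no longer starts with "set".
comment_out_tez_opts() {
  local settings_file="$1"
  sed -i.bak 's/^set hive\.tez\.java\.opts=/-- &/' "$settings_file"
}

# Example, run from the hive-testbench checkout:
# comment_out_tez_opts settings/load-partitioned.sql
```

With the line disabled, the GC flags come only from whatever is configured in Ambari.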