Support Questions

Find answers, ask questions, and share your expertise

Not able to run hive benchmark test

avatar
Contributor

I am trying to do hive bench marking(https://github.com/hortonworks/hive-testbench)

but when I run setup script it loads data is some table but fails after sometime fails with the following error:

OK Time taken: 0.264 seconds + '[' X = X ']' + FORMAT=orc + i=1 + total=24 + DATABASE=tpcds_bin_partitioned_orc_2 + for t in '${FACTS}' + echo 'Optimizing table store_sales (1/24).' Optimizing table store_sales (1/24). + COMMAND='hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc' + runcommand 'hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc' + '[' XON '!=' X ']' + hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc WARNING: Use "yarn jar" to launch YARN applications. Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties ...

OK Time taken: 0.948 seconds OK Time taken: 0.238 seconds OK Time taken: 0.629 seconds OK Time taken: 0.248 seconds Query ID = hdfs_20160322014240_60c3f689-816d-409e-b8c7-c6ea636fa12a Total jobs = 1 Launching Job 1 out of 1 Dag submit failed due to Invalid TaskLaunchCmdOpts defined for Vertex Map 1 : Invalid/conflicting GC options found, cmdOpts="-server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.0.0-169 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dtez.root.logger=INFO,CLA " stack trace: [org.apache.tez.dag.api.DAG.createDag(DAG.java:859), org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:694), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:487), org.apache.tez.client.TezClient.submitDAG(TezClient.java:434), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:439), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89), org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)] retrying... FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask + '[' 1 -ne 0 ']' + echo 'Command failed, try '\''export DEBUG_SCRIPT=ON'\'' and re-running' Command failed, try 'export DEBUG_SCRIPT=ON' and re-running + exit 1

Not sure what is wrong.

Anyhelp is appreciated.

1 ACCEPTED SOLUTION

avatar
New Contributor

So, I found out that the problem was caused by hive-testbench/settings/load-partitioned.sql. This file is used as init file for hive on generating TPC-DS data. It has some configs for hive, including hive.tez.java.opts.

set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;

This config conflicts with default HDP Hive config.

Two ways to solve it:

1. Change hive.tez.java.opts in hive-testbench/settings/load-partitioned.sql to use UseParallelGC (recommended).

set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;

or

2. Set hive config on Ambari to use UseG1GC java garbage collecto

tez.am.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA --XX:+UseG1GC

tez.task.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC

hive.tez.java.opt: -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps

View solution in original post

3 REPLIES 3

avatar
New Contributor

Same here! Deployed Hortonworks Data Platform on Google Cloud using bdutil. Ambari is ok. When I try to gen data using tpcds setup it fails with same error as you though.

Did you find any solution?

avatar
New Contributor

So, I found out that the problem was caused by hive-testbench/settings/load-partitioned.sql. This file is used as init file for hive on generating TPC-DS data. It has some configs for hive, including hive.tez.java.opts.

set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;

This config conflicts with default HDP Hive config.

Two ways to solve it:

1. Change hive.tez.java.opts in hive-testbench/settings/load-partitioned.sql to use UseParallelGC (recommended).

set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;

or

2. Set hive config on Ambari to use UseG1GC java garbage collecto

tez.am.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA --XX:+UseG1GC

tez.task.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC

hive.tez.java.opt: -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps

avatar
Master Guru

+1 Another solution is to comment out hive.tez.java.opts in that sql file and manage the GC from Ambari.