Support Questions
Find answers, ask questions, and share your expertise

Not able to run hive benchmark test

Solved


Explorer

I am trying to run the Hive benchmark suite (https://github.com/hortonworks/hive-testbench),

but when I run the setup script it loads data into some tables, then fails after a while with the following error:

OK
Time taken: 0.264 seconds
+ '[' X = X ']'
+ FORMAT=orc
+ i=1
+ total=24
+ DATABASE=tpcds_bin_partitioned_orc_2
+ for t in '${FACTS}'
+ echo 'Optimizing table store_sales (1/24).'
Optimizing table store_sales (1/24).
+ COMMAND='hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc'
+ runcommand 'hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc'
+ '[' XON '!=' X ']'
+ hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
...

OK
Time taken: 0.948 seconds
OK
Time taken: 0.238 seconds
OK
Time taken: 0.629 seconds
OK
Time taken: 0.248 seconds
Query ID = hdfs_20160322014240_60c3f689-816d-409e-b8c7-c6ea636fa12a
Total jobs = 1
Launching Job 1 out of 1
Dag submit failed due to Invalid TaskLaunchCmdOpts defined for Vertex Map 1 : Invalid/conflicting GC options found, cmdOpts="-server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.0.0-169 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dtez.root.logger=INFO,CLA "
stack trace: [
  org.apache.tez.dag.api.DAG.createDag(DAG.java:859),
  org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:694),
  org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:487),
  org.apache.tez.client.TezClient.submitDAG(TezClient.java:434),
  org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:439),
  org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180),
  org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160),
  org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89),
  org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)]
retrying...
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
+ '[' 1 -ne 0 ']'
+ echo 'Command failed, try '\''export DEBUG_SCRIPT=ON'\'' and re-running'
Command failed, try 'export DEBUG_SCRIPT=ON' and re-running
+ exit 1

Not sure what is wrong.

Any help is appreciated.

1 ACCEPTED SOLUTION


Re: Not able to run hive benchmark test

New Contributor

So, I found out that the problem was caused by hive-testbench/settings/load-partitioned.sql. This file is used as the init file for Hive when generating the TPC-DS data, and it sets several Hive configs, including hive.tez.java.opts:

set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;

This config conflicts with the default HDP Hive config.
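The conflict is visible in the cmdOpts string from the error: it carries both -XX:+UseParallelGC (from the cluster defaults) and -XX:+UseG1GC (from the init file), and Tez refuses to launch with two collectors. A minimal sketch of how to surface that, using a trimmed stand-in for the real cmdOpts:

```shell
# $cmdopts is a shortened stand-in for the cmdOpts string in the error above.
# Grepping out the collector flags makes the duplicate selection visible.
cmdopts='-server -Djava.net.preferIPv4Stack=true -XX:+UseParallelGC -XX:+PrintGCDetails -XX:+UseG1GC'
echo "$cmdopts" | grep -o -- '-XX:+Use[A-Za-z0-9]*GC'
# prints:
# -XX:+UseParallelGC
# -XX:+UseG1GC
```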

Two ways to solve it:

1. Change hive.tez.java.opts in hive-testbench/settings/load-partitioned.sql to use UseParallelGC (recommended).

set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
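If you prefer to script that edit, a sed one-liner would do it. The sketch below runs against a temporary copy so it is safe to try anywhere; the real target is hive-testbench/settings/load-partitioned.sql.

```shell
# Swap the G1 flag for ParallelGC (demonstrated on a temp copy of the
# relevant line; point sed at settings/load-partitioned.sql for real).
f=$(mktemp)
printf 'set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+UseG1GC;\n' > "$f"
sed -i 's/-XX:+UseG1GC/-XX:+UseParallelGC/' "$f"
grep -c -- '-XX:+UseParallelGC' "$f"
# prints: 1
```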

or

2. Set the Hive config in Ambari to use the G1 Java garbage collector everywhere:

tez.am.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC

tez.task.launch.cmd-opts: -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC

hive.tez.java.opts: -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
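Whichever collector you pick, the point is that all three values must agree. A quick consistency check on the option strings (the variables below stand in for the values you would paste into Ambari):

```shell
# Each variable stands in for one of the three Ambari properties above.
am='-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC'
task='-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC'
hive='-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps'
# Extract every collector flag; a consistent setup yields exactly one line.
printf '%s\n' "$am" "$task" "$hive" |
  grep -o -- '-XX:+Use[A-Za-z0-9]*GC' | sort -u
# prints: -XX:+UseG1GC
```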


3 REPLIES

Re: Not able to run hive benchmark test

New Contributor

Same here! I deployed Hortonworks Data Platform on Google Cloud using bdutil. Ambari is OK, but when I try to generate data using the TPC-DS setup it fails with the same error as yours.

Did you find any solution?



Re: Not able to run hive benchmark test

+1. Another solution is to comment out hive.tez.java.opts in that SQL file and manage the GC settings from Ambari.
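That comment-out can be scripted too; a sketch, shown on a temporary copy of the relevant line (the real file is hive-testbench/settings/load-partitioned.sql), using Hive's `--` comment syntax:

```shell
# Comment out the hive.tez.java.opts line so the Ambari-managed value wins.
f=$(mktemp)
printf 'set hive.tez.java.opts=-XX:+UseG1GC;\n' > "$f"
sed -i 's/^set hive\.tez\.java\.opts/-- &/' "$f"
head -1 "$f"
# prints: -- set hive.tez.java.opts=-XX:+UseG1GC;
```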
