<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question hive testbench error when generating data in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174172#M136435</link>
    <description>Support Questions thread: hive testbench error when generating data</description>
    <pubDate>Sat, 13 Aug 2016 04:39:41 GMT</pubDate>
    <dc:creator>Carolyn</dc:creator>
    <dc:date>2016-08-13T04:39:41Z</dc:date>
    <item>
      <title>hive testbench error when generating data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174172#M136435</link>
      <description>&lt;P&gt;I am evaluating the new LLAP feature of Hive.  I provisioned a new cluster in AWS using cloudbreak with the HDP 2.5 techpreview version and the EDW-ANALYTICS: APACHE HIVE 2 LLAP, APACHE ZEPPELIN configuration.&lt;/P&gt;&lt;P&gt;I logged into the master node and did a sudo to the hdfs user:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;sudo -u hdfs -s&lt;/P&gt;&lt;P&gt;wget &lt;A href="https://github.com/hortonworks/hive-testbench/archive/hive14.zip"&gt;https://github.com/hortonworks/hive-testbench/archive/hive14.zip&lt;/A&gt;&lt;/P&gt;&lt;P&gt;unzip hive14.zip&lt;/P&gt;&lt;P&gt;cd hive-testbench-hive14&lt;/P&gt;&lt;P&gt;./tpcds-build.sh&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;The build succeeds but when I try to generate data, I get an error loading the text into external tables:&lt;/P&gt;&lt;PRE&gt;
[hdfs@ip-10-0-3-85 hive-testbench-hive14]$ ./tpcds-setup.sh 10
ls: `/tmp/tpcds-generate/10': No such file or directory
Generating data at scale factor 10.
16/08/12 21:09:42 INFO impl.TimelineClientImpl: Timeline service address: http://ip-10-0-3-85.us-west-2.compute.internal:8188/ws/v1/timeline/
16/08/12 21:09:42 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-3-85.us-west-2.compute.internal/10.0.3.85:8050
16/08/12 21:09:42 INFO input.FileInputFormat: Total input paths to process : 1
16/08/12 21:09:43 INFO mapreduce.JobSubmitter: number of splits:10
16/08/12 21:09:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471027682172_0026
16/08/12 21:09:43 INFO impl.YarnClientImpl: Submitted application application_1471027682172_0026
16/08/12 21:09:43 INFO mapreduce.Job: The url to track the job: http://ip-10-0-3-85.us-west-2.compute.internal:8088/proxy/application_1471027682172_0026/
16/08/12 21:09:43 INFO mapreduce.Job: Running job: job_1471027682172_0026
16/08/12 21:09:49 INFO mapreduce.Job: Job job_1471027682172_0026 running in uber mode : false
16/08/12 21:09:49 INFO mapreduce.Job:  map 0% reduce 0%
16/08/12 21:10:01 INFO mapreduce.Job:  map 10% reduce 0%
16/08/12 21:10:02 INFO mapreduce.Job:  map 30% reduce 0%
16/08/12 21:10:03 INFO mapreduce.Job:  map 40% reduce 0%
16/08/12 21:10:04 INFO mapreduce.Job:  map 50% reduce 0%
16/08/12 21:13:20 INFO mapreduce.Job:  map 60% reduce 0%
16/08/12 21:13:23 INFO mapreduce.Job:  map 70% reduce 0%
16/08/12 21:14:23 INFO mapreduce.Job:  map 80% reduce 0%
16/08/12 21:14:27 INFO mapreduce.Job:  map 90% reduce 0%
16/08/12 21:14:40 INFO mapreduce.Job:  map 100% reduce 0%
16/08/12 21:24:06 INFO mapreduce.Job: Job job_1471027682172_0026 completed successfully
16/08/12 21:24:06 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=1441630
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=4699
                HDFS: Number of bytes written=3718681220
                HDFS: Number of read operations=50
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=89
        Job Counters
                Launched map tasks=10
                Other local map tasks=10
                Total time spent by all maps in occupied slots (ms)=2721848
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=2721848
                Total vcore-milliseconds taken by all map tasks=2721848
                Total megabyte-milliseconds taken by all map tasks=4180758528
        Map-Reduce Framework
                Map input records=10
                Map output records=0
                Input split bytes=1380
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=12044
                CPU time spent (ms)=1468280
                Physical memory (bytes) snapshot=2743345152
                Virtual memory (bytes) snapshot=21381529600
                Total committed heap usage (bytes)=2911895552
        File Input Format Counters
                Bytes Read=3319
        File Output Format Counters
                Bytes Written=0
TPC-DS text data generation complete.
Loading text data into external tables.
make: *** [date_dim] Error 1
make: *** Waiting for unfinished jobs....
make: *** [time_dim] Error 1
Data loaded into database tpcds_bin_partitioned_orc_10.
&lt;/PRE&gt;</description>
      <pubDate>Sat, 13 Aug 2016 04:39:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174172#M136435</guid>
      <dc:creator>Carolyn</dc:creator>
      <dc:date>2016-08-13T04:39:41Z</dc:date>
    </item>
    <item>
      <title>Re: hive testbench error when generating data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174173#M136436</link>
      <description>&lt;P&gt;I figured out the answer by looking at tpcds-setup.sh.  It checks the DEBUG_SCRIPT environment variable, so I set it to get debug output:&lt;/P&gt;&lt;P&gt;export DEBUG_SCRIPT=X&lt;/P&gt;&lt;P&gt;When I ran the script again, I saw the following error:&lt;/P&gt;&lt;PRE&gt;Dag submit failed due to Invalid TaskLaunchCmdOpts defined for Vertex Map 1 : Invalid/conflicting GC options found, cmdOpts="-server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.0.0-1061 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=&amp;lt;LOG_DIR&amp;gt; -Dtez.root.logger=INFO,CLA " stack trace: [org.apache.tez.dag.api.DAG.createDag(DAG.java:866), org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:694), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:520), org.apache.tez.client.TezClient.submitDAG(TezClient.java:466), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:439), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89), org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)] retrying...
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
make: *** [date_dim] Error 1
make: *** Waiting for unfinished jobs....
Dag submit failed due to Invalid TaskLaunchCmdOpts defined for Vertex Map 1 : Invalid/conflicting GC options found, cmdOpts="-server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.0.0-1061 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=&amp;lt;LOG_DIR&amp;gt; -Dtez.root.logger=INFO,CLA " stack trace: [org.apache.tez.dag.api.DAG.createDag(DAG.java:866), org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:694), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:520), org.apache.tez.client.TezClient.submitDAG(TezClient.java:466), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:439), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89), org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)] retrying...
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
make: *** [time_dim] Error 1
+ echo 'Data loaded into database tpcds_bin_partitioned_orc_10.'
&lt;/PRE&gt;&lt;P&gt;This led me to the solution:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/23988/not-able-to-run-hive-benchmark-test.html"&gt;https://community.hortonworks.com/questions/23988/not-able-to-run-hive-benchmark-test.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I used the first suggested solution below, reran the script, and it now works:&lt;/P&gt;&lt;PRE&gt;1. Change hive.tez.java.opts in hive-testbench/settings/load-partitioned.sql to use UseParallelGC (recommended).
set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
&lt;/PRE&gt;</description>
      <pubDate>Sat, 13 Aug 2016 07:47:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174173#M136436</guid>
      <dc:creator>Carolyn</dc:creator>
      <dc:date>2016-08-13T07:47:38Z</dc:date>
    </item>
    <item>
      <title>Re: hive testbench error when generating data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174174#M136437</link>
      <description>&lt;P&gt;That article is no longer available; I'm not sure why.  Which GC options were invalid or conflicting?&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2016 11:18:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174174#M136437</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-10-04T11:18:18Z</dc:date>
    </item>
    <item>
      <title>Re: hive testbench error when generating data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174175#M136438</link>
      <description>&lt;P&gt;In my case, "export DEBUG_SCRIPT=X" showed that I had permission issues: the hive user didn't have write permission on the /tmp/hive folder in HDFS. Fixing that resolved the issue.&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 23:27:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/174175#M136438</guid>
      <dc:creator>fabidi89</dc:creator>
      <dc:date>2018-05-10T23:27:08Z</dc:date>
    </item>
    <item>
      <title>Re: hive testbench error when generating data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/294397#M217231</link>
      <description>&lt;P&gt;How do you debug the scripts? I used bash -x tpcds-setup.sh but could not find the error, and when I tried your method it also reported errors.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2020 06:25:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/hive-testbench-error-when-generating-data/m-p/294397#M217231</guid>
      <dc:creator>xiaobbai</dc:creator>
      <dc:date>2020-04-21T06:25:32Z</dc:date>
    </item>
  </channel>
</rss>

