<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Trying to generate data from hive_testbench throws &quot;Malformed ORC file hdfs&quot; Exception in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135516#M98174</link>
    <description>&lt;P&gt;&lt;BR /&gt;You can change the default file format with "set hive.default.fileformat=TextFile".&lt;/P&gt;</description>
    <pubDate>Wed, 03 Oct 2018 21:21:11 GMT</pubDate>
    <dc:creator>itxx00</dc:creator>
    <dc:date>2018-10-03T21:21:11Z</dc:date>
    <item>
      <title>Trying to generate data from hive_testbench throws "Malformed ORC file hdfs" Exception</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135511#M98169</link>
      <description>&lt;P&gt;I am setting up and configuring hive_testbench. I followed all the required configuration steps, but whenever I try to generate the data, I get the following exception:&lt;/P&gt;&lt;PRE&gt;Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://mycluster/tmp/tpcds-generate/100/date_dim/data-m-00099. Invalid postscript.
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.&amp;lt;init&amp;gt;(TezGroupedSplitsInputFormat.java:135)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
        at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
        at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
        at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
        at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:408)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
        ... 14 more
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://mycluster/tmp/tpcds-generate/100/date_dim/data-m-00099. Invalid postscript.
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:253)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
        ... 25 more
Caused by: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://mycluster/tmp/tpcds-generate/100/date_dim/data-m-00099. Invalid postscript.
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.ensureOrcFooter(ReaderImpl.java:251)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:376)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.&amp;lt;init&amp;gt;(ReaderImpl.java:317)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
        at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat.getRecordReader(VectorizedOrcInputFormat.java:175)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createVectorizedReader(OrcInputFormat.java:1239)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1252)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
        ... 26 more


&lt;/PRE&gt;&lt;P&gt;Also, the tpcds_bin_partitioned_orc_100 DB is created but remains empty (i.e. it has no tables) because of these errors. I tried generating the data by calling the script on its own, and I also tried running it with the FORMAT=textfile and format=orc options, but I still get the same error.&lt;/P&gt;&lt;P&gt;Any idea how I can resolve this and generate the data in the tpcds_bin_partitioned_orc_100 DB?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Sep 2016 13:23:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135511#M98169</guid>
      <dc:creator>sarah_maadawy</dc:creator>
      <dc:date>2016-09-05T13:23:09Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to generate data from hive_testbench throws "Malformed ORC file hdfs" Exception</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135512#M98170</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11856/sarahmaadawy.html" nodeid="11856"&gt;@Sarah Maadawy&lt;/A&gt;. Did you run &lt;EM&gt;&lt;STRONG&gt;./tpcds-setup.sh 100&lt;/STRONG&gt;&lt;/EM&gt;? That's 100 GB of data. Are you sure you wanted that much data? You might be running out of space.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 03:08:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135512#M98170</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2016-09-06T03:08:27Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to generate data from hive_testbench throws "Malformed ORC file hdfs" Exception</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135513#M98171</link>
      <description>&lt;P&gt;I tried with 10 GB. I have enough space, but I am still getting the same error.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 07:06:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135513#M98171</guid>
      <dc:creator>sarah_maadawy</dc:creator>
      <dc:date>2016-09-06T07:06:33Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to generate data from hive_testbench throws "Malformed ORC file hdfs" Exception</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135514#M98172</link>
      <description>&lt;P&gt;I also tried to query the tables in tpcds_text_10 before generating the tables in tpcds_bin_partitioned_orc_10, and they throw the same error. That could make sense, though, because as I understand the scripts, the tables are originally created in text format and then converted to ORC.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 11:57:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135514#M98172</guid>
      <dc:creator>sarah_maadawy</dc:creator>
      <dc:date>2016-09-06T11:57:33Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to generate data from hive_testbench throws "Malformed ORC file hdfs" Exception</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135515#M98173</link>
      <description>&lt;P&gt;It turned out that this happens because Hive creates tables in ORC format by default, while hive-testbench assumes that the default table format is text. I had to change the script in hive-testbench/ddl-tpcds/text/alltable.sql so that the tables are STORED AS TEXTFILE.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 13:10:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135515#M98173</guid>
      <dc:creator>sarah_maadawy</dc:creator>
      <dc:date>2016-09-06T13:10:11Z</dc:date>
    </item>
    <item>
      <title>Re: Trying to generate data from hive_testbench throws "Malformed ORC file hdfs" Exception</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135516#M98174</link>
      <description>&lt;P&gt;&lt;BR /&gt;You can change the default file format with "set hive.default.fileformat=TextFile".&lt;/P&gt;</description>
      <pubDate>Wed, 03 Oct 2018 21:21:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Trying-to-generate-data-from-hive-testbench-throws-quot/m-p/135516#M98174</guid>
      <dc:creator>itxx00</dc:creator>
      <dc:date>2018-10-03T21:21:11Z</dc:date>
    </item>
  </channel>
</rss>

