<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Unable to run CTAS query using external table with gzipped data. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95081#M58451</link>
    <description>&lt;P&gt;One way to convert your CSV data to ORC is the following:&lt;/P&gt;&lt;P&gt;1. Register your gzipped CSV data as a text table, something like:&lt;/P&gt;&lt;P&gt;create table &amp;lt;tablename&amp;gt;_txt (...) location '...';&lt;/P&gt;&lt;P&gt;2. Create an equivalent ORC table:&lt;/P&gt;&lt;P&gt;create table &amp;lt;tablename&amp;gt;_orc (...) stored as orc;&lt;/P&gt;&lt;P&gt;3. Populate the ORC table from the text table:&lt;/P&gt;&lt;P&gt;insert overwrite table &amp;lt;tablename&amp;gt;_orc select * from &amp;lt;tablename&amp;gt;_txt;&lt;/P&gt;&lt;P&gt;I have used this in the past and it worked for me.&lt;/P&gt;</description>
    <pubDate>Wed, 07 Oct 2015 22:24:29 GMT</pubDate>
    <dc:creator>deepesh1</dc:creator>
    <dc:date>2015-10-07T22:24:29Z</dc:date>
    <item>
      <title>Unable to run CTAS query using external table with gzipped data.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95078#M58448</link>
      <description>&lt;P&gt;I think either this function is not supported or I am missing something very basic, but here is the issue:&lt;/P&gt;&lt;P&gt;1) Uploaded a gzipped CSV file to HDFS: no issues.&lt;/P&gt;&lt;P&gt;2) Created an external table using the CSV SerDe with LOCATION pointing to the file from step 1. Once the table is created, I can run queries against it without any problems.&lt;/P&gt;&lt;P&gt;3) Running a CTAS query with the exact same table layout but in ORC format causes the error below.&lt;/P&gt;&lt;P&gt;Please help!&lt;/P&gt;&lt;P&gt;------- Error -------&lt;/P&gt;&lt;P&gt;Caused by: java.io.IOException: incorrect header check&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)&lt;/P&gt;&lt;P&gt;        at java.io.InputStream.read(InputStream.java:101)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)&lt;/P&gt;&lt;P&gt;        ... 22 more&lt;/P&gt;&lt;P&gt;]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1443886863664_0003_1_00 [Map 1] killed/failed due to:null]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:170)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)&lt;/P&gt;&lt;P&gt;        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)&lt;/P&gt;&lt;P&gt;        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)&lt;/P&gt;&lt;P&gt;        ... 11 more&lt;/P&gt;</description>
      <pubDate>Wed, 07 Oct 2015 09:11:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95078#M58448</guid>
      <dc:creator>bsaini</dc:creator>
      <dc:date>2015-10-07T09:11:35Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to run CTAS query using external table with gzipped data.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95079#M58449</link>
      <description>&lt;P&gt;This could be a real bug. Which HDP/Hive version are you using?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Oct 2015 09:37:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95079#M58449</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2015-10-07T09:37:17Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to run CTAS query using external table with gzipped data.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95080#M58450</link>
      <description>&lt;P&gt;Could you copy-paste the queries from 2) and 3)?&lt;/P&gt;&lt;P&gt;You cannot create an external table with CTAS (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS) ), but I am not sure that is the error you are hitting, so pasting the queries would help.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Oct 2015 13:15:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95080#M58450</guid>
      <dc:creator>sluangsay</dc:creator>
      <dc:date>2015-10-07T13:15:47Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to run CTAS query using external table with gzipped data.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95081#M58451</link>
      <description>&lt;P&gt;One way to convert your CSV data to ORC is the following:&lt;/P&gt;&lt;P&gt;1. Register your gzipped CSV data as a text table, something like:&lt;/P&gt;&lt;P&gt;create table &amp;lt;tablename&amp;gt;_txt (...) location '...';&lt;/P&gt;&lt;P&gt;2. Create an equivalent ORC table:&lt;/P&gt;&lt;P&gt;create table &amp;lt;tablename&amp;gt;_orc (...) stored as orc;&lt;/P&gt;&lt;P&gt;3. Populate the ORC table from the text table:&lt;/P&gt;&lt;P&gt;insert overwrite table &amp;lt;tablename&amp;gt;_orc select * from &amp;lt;tablename&amp;gt;_txt;&lt;/P&gt;&lt;P&gt;I have used this in the past and it worked for me.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Oct 2015 22:24:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-run-CTAS-query-using-external-table-with-gzipped/m-p/95081#M58451</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2015-10-07T22:24:29Z</dc:date>
    </item>
  </channel>
</rss>

