
Unable to run CTAS query using external table with gzipped data.


I think either this function is not supported or I am missing something very basic, but here is the issue:

1) Uploaded a gzipped CSV file to HDFS - no issues.

2) Created an external table using the CSV SerDe, with LOCATION pointing to the file from step 1. Once the table is created, I can run queries against it without any problems.

3) Running a CTAS query with the exact same table layout, but stored as ORC, causes the error below; the general shape of these statements is sketched right after this list.
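
For reference, a minimal sketch of the general shape of steps 2 and 3. The table, column, and path names are placeholders, and OpenCSVSerde is assumed for "CSV Serde"; these are not the actual queries from the cluster:

-- step 2: external text table over the gzipped CSV data
create external table mytable_csv (col1 string, col2 string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
location '/path/to/csv_gz_dir';

-- step 3: CTAS into an ORC table, which triggers the error
create table mytable_orc stored as orc
as select * from mytable_csv;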

Please help!

------- Error -------

Caused by: java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
    ... 22 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1443886863664_0003_1_00 [Map 1] killed/failed due to:null]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:170)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    ... 11 more

1 ACCEPTED SOLUTION


One way to achieve the transformation of your CSV data to ORC is to do the following:

1. Register your gzipped CSV data as a text table, something like:

create table <tablename>_txt (...) location '...';

2. Create an equivalent ORC table:

create table <tablename>_orc (...) stored as orc;

3. Populate the data into the equivalent ORC table:

insert overwrite table <tablename>_orc select * from <tablename>_txt;

I have used this in the past and it worked for me.
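
A fully spelled-out version of these three steps might look like the following. The table name, columns, SerDe class, and HDFS path are illustrative assumptions, not details from this thread:

-- 1. text table over the gzipped CSV files
create external table events_txt (
  event_id string,
  event_ts string,
  payload string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
location '/data/events_csv_gz';

-- 2. equivalent ORC table
create table events_orc (
  event_id string,
  event_ts string,
  payload string
)
stored as orc;

-- 3. copy the rows across
insert overwrite table events_orc select * from events_txt;

-- quick sanity check: row counts should match
select count(*) from events_txt;
select count(*) from events_orc;

Note that OpenCSVSerde reads every column as a string, which is why the text table declares string columns; you can cast to other types in the insert ... select if needed.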


3 REPLIES


Could be a real bug. Which HDP/Hive version are you using?


Could you copy-paste the queries in 2) and 3)?

You cannot create an external table with CTAS (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS) ), but I am not sure that is the error you are hitting, so pasting the queries would help.
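
For example, in Hive releases of this era a statement of this shape is rejected with a SemanticException before any data is read (table names here are hypothetical):

-- fails: CTAS cannot create an EXTERNAL table
create external table mytable_orc stored as orc
as select * from mytable_csv;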

Thanks.
