Created 10-07-2015 02:11 AM
I think either this function is not supported or I am missing something very basic, but here is the issue:
1) Uploaded a GZipped CSV format file to HDFS - No issues
2) Created an external table using the CSV SerDe, pointing LOCATION to the file from step 1. Once the table is created, I am able to run queries without any problems.
3) Running a CTAS query with the exact same table layout, but stored as ORC, causes the error below.
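For reference, the external table and the CTAS looked roughly like this (table and column names are placeholders, not my actual schema):

```sql
-- Step 2: external text table over the gzipped CSV, using the CSV SerDe
CREATE EXTERNAL TABLE my_table_csv (
  id INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION '/user/me/csv_data';

-- Step 3: CTAS into ORC -- this is the statement that fails
CREATE TABLE my_table_orc
STORED AS ORC
AS SELECT * FROM my_table_csv;
```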
Please help!
------- Error -------
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1443886863664_0003_1_00 [Map 1] killed/failed due to:null]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:170)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
... 11 more
Created 10-07-2015 03:24 PM
One way you can achieve the transformation of your CSV data to ORC would be to do the following:
1. Register your gzipped CSV data as a text table, something like:
create table <tablename>_txt (...) location '...';
2. Create equivalent ORC table
create table <tablename>_orc (...) stored as orc;
3. Populate the data into equivalent ORC table
insert overwrite table <tablename>_orc select * from <tablename>_txt;
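Filled in, the three steps might look like this (the schema, delimiter, and path are placeholder assumptions; adapt them to your data):

```sql
-- 1. Text table over the gzipped CSV; Hive reads .gz text files transparently
CREATE EXTERNAL TABLE mytable_txt (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/me/csv_data';

-- 2. Equivalent ORC table with the same columns
CREATE TABLE mytable_orc (
  id INT,
  name STRING
)
STORED AS ORC;

-- 3. Rewrite the data into ORC
INSERT OVERWRITE TABLE mytable_orc SELECT * FROM mytable_txt;
```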
I have used this approach in the past and it worked for me.
Created 10-07-2015 02:37 AM
Could be a real bug. What HDP/Hive version are you using?
Created 10-07-2015 06:15 AM
Could you copy-paste the queries in 2) and 3) ?
You cannot create an external table with CTAS (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS) ), but I am not sure that is the error you are hitting, so pasting the queries would help.
Thanks.