
Unable to run CTAS query using external table with gzipped data.


I think either this function is not supported or I am missing something very basic, but here is the issue:

1) Uploaded a gzipped CSV file to HDFS - no issues.

2) Created an external table using the CSV SerDe, with LOCATION pointing to the file from step 1. Once the table is created, I can run queries against it without any problems.

3) Running a CTAS query with the exact same table layout, but stored as ORC, causes the error below; the general shape of these statements is sketched right after this list.
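
For reference, a minimal sketch of the general shape of steps 2 and 3. The table, column, and path names are placeholders, and OpenCSVSerde is assumed for "CSV Serde"; these are not the actual queries from the cluster:

-- step 2: external text table over the gzipped CSV data
create external table mytable_csv (col1 string, col2 string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
location '/path/to/csv_gz_dir';

-- step 3: CTAS into an ORC table, which triggers the error
create table mytable_orc stored as orc
as select * from mytable_csv;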

Please help!

------- Error -------

Caused by: java.io.IOException: incorrect header check
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
    ... 22 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1443886863664_0003_1_00 [Map 1] killed/failed due to:null]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:170)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    ... 11 more

1 ACCEPTED SOLUTION


One way to achieve the transformation of your CSV data to ORC is to do the following:

1. Register your gzipped CSV data as a text table, something like:

create table <tablename>_txt (...) location '...';

2. Create an equivalent ORC table:

create table <tablename>_orc (...) stored as orc;

3. Populate the data into the equivalent ORC table:

insert overwrite table <tablename>_orc select * from <tablename>_txt;

I have used this in the past and it worked for me.
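
A fully spelled-out version of these three steps might look like the following. The table name, columns, SerDe class, and HDFS path are illustrative assumptions, not details from this thread:

-- 1. text table over the gzipped CSV files
create external table events_txt (
  event_id string,
  event_ts string,
  payload string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
location '/data/events_csv_gz';

-- 2. equivalent ORC table
create table events_orc (
  event_id string,
  event_ts string,
  payload string
)
stored as orc;

-- 3. copy the rows across
insert overwrite table events_orc select * from events_txt;

-- quick sanity check: row counts should match
select count(*) from events_txt;
select count(*) from events_orc;

Note that OpenCSVSerde reads every column as a string, which is why the text table declares string columns; you can cast to other types in the insert ... select if needed.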


3 REPLIES


Could be a real bug. Which HDP/Hive version are you using?


Could you copy-paste the queries in 2) and 3)?

You cannot create an external table with CTAS (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS) ), but I am not sure that is the error you are hitting, so pasting the queries would help.
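
For example, in Hive releases of this era a statement of this shape is rejected with a SemanticException before any data is read (table names here are hypothetical):

-- fails: CTAS cannot create an EXTERNAL table
create external table mytable_orc stored as orc
as select * from mytable_csv;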

Thanks.
