malformed ORC file format

Here is my Sqoop command:

sqoop job -Dmapreduce.job.user.classpath.first=true --create incjob2 -- import \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" \
  --username PATRON \
  --incremental append --check-column INSERT_TIME \
  --table PATRON.UFM -split-by UFM.UFMID \
  --hcatalog-storage-stanza "stored as orcfile" \
  --compression-codec snappy \
  --target-dir /user/sami

Here is my CREATE EXTERNAL TABLE command:

CREATE EXTERNAL TABLE IF NOT EXISTS ufm_orc (
..
..
 )
STORED AS ORC location '/user/sami'

Here is the error. As you can see from the table description, both the input and output formats are ORC:

SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.495 seconds, Fetched: 217 row(s)

    > select ufmid,insert_time from ufm_orc limit 10;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/sami/part-m-00000.snappy. Invalid postscript.
Time taken: 0.328 seconds
1 ACCEPTED SOLUTION

@Sami Ahmad

The Sqoop job is generating a Snappy-compressed ORC file, but the Hive table you created is an ORC table without any compression.

Create the table with the compression type set to Snappy:

CREATE TABLE mytable (...) STORED AS orc tblproperties ("orc.compress"="SNAPPY");
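
Before recreating the table, it may be worth confirming that the files under /user/sami really are ORC. Every ORC file begins with the three-byte magic string "ORC", so inspecting the first bytes is a quick sanity check. The following is only a minimal sketch and not part of the original answer: the helper name is_orc is hypothetical, and it assumes the part file has first been copied to the local filesystem.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given local file starts with the
# ORC magic bytes "ORC" (hex 4f 52 43), fails otherwise.
is_orc() {
    # Dump the first 3 bytes as hex so binary content is compared safely.
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4f5243" ]
}

# Example usage (assumes the HDFS client is available on this machine):
#   hdfs dfs -get /user/sami/part-m-00000.snappy .
#   if is_orc part-m-00000.snappy; then echo "ORC"; else echo "not ORC"; fi
```

If the check fails, the file is not an ORC file at all, and the table's compression property would not be what Hive is complaining about.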
