Created 09-22-2018 01:11 AM
Here is my Sqoop command:
sqoop job -Dmapreduce.job.user.classpath.first=true --create incjob2 -- import \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" \
  --username PATRON \
  --incremental append --check-column INSERT_TIME \
  --table PATRON.UFM --split-by UFM.UFMID \
  --hcatalog-storage-stanza "stored as orcfile" \
  --compression-codec snappy \
  --target-dir /user/sami
Here is my CREATE EXTERNAL TABLE command:
CREATE EXTERNAL TABLE IF NOT EXISTS ufm_orc ( .. .. )
STORED AS ORC
LOCATION '/user/sami';
Here is the error. As you can see, both the table's input and output formats are ORC:
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:    org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:     No
Num Buckets:    -1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
  serialization.format  1
Time taken: 0.495 seconds, Fetched: 217 row(s)

> select ufmid,insert_time from ufm_orc limit 10;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/sami/part-m-00000.snappy. Invalid postscript.
Time taken: 0.328 seconds
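One way to narrow this down (a suggestion, not something from the thread) is to dump the ORC metadata of the file Sqoop wrote. If the file is not valid ORC, the dump fails with a similar "Malformed ORC file" error; if it is valid ORC, the dump reports its compression kind. Assuming a Hive CLI recent enough to support the orcfiledump service, and using the path from the error message above:

# Sketch: inspect the file that the query failed on.
hive --orcfiledump /user/sami/part-m-00000.snappy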
Created 09-22-2018 05:28 AM
The Sqoop output is an ORC file compressed with Snappy, but the Hive table you created is an ORC table without any compression.
Create the table with compression type Snappy:
CREATE TABLE mytable (...) STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");
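Applied to your table, a sketch might look like the following. It combines your original DDL with the TBLPROPERTIES clause above; the column list stays elided as in your statement, and the LOCATION assumes the same HDFS path:

CREATE EXTERNAL TABLE IF NOT EXISTS ufm_orc ( .. .. )
STORED AS ORC
LOCATION '/user/sami'
TBLPROPERTIES ("orc.compress"="SNAPPY");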