
Load .zip files into Hive


Hi Team,

I want to load compressed .zip files (each containing many CSV files, e.g. abc.csv, def.csv, ..., xyz.csv) into a Hive table. When I tried the query below, all the column values were loaded as NULL. But if I use gzip instead and compress the CSV files into .gz files, the data loads fine. So can't we load ZIP files directly into Hive?

Query:

CREATE EXTERNAL TABLE `table_zip` (
  `v1` string,
  `v2` string,
  `v3` string,
  `v4` string,
  `v5` string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/test/';
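Since the same CSV data loads correctly once it is gzipped, one workaround is to repack each archive so that every CSV entry becomes its own .gz file before it goes into the table's LOCATION. The sketch below assumes the archives are available on a local disk and that only entries ending in .csv matter; the helper name `zip_to_gzip` and its parameters are hypothetical, not part of any Hive or Hadoop tool.

```python
import gzip
import os
import shutil
import zipfile
from typing import List


def zip_to_gzip(zip_path: str, out_dir: str) -> List[str]:
    """Hypothetical helper: repack each .csv entry of a ZIP archive into its
    own .gz file in out_dir and return the paths of the files written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # Folders inside the archive are flattened, so entries that share
            # a base name would overwrite each other in out_dir.
            target = os.path.join(out_dir, os.path.basename(info.filename) + ".gz")
            # Stream the entry into a gzip file instead of reading it into memory.
            with zf.open(info) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

For example, `zip_to_gzip("data.zip", "gz_out")` followed by `hdfs dfs -put gz_out/*.gz /data/test/` would stage the files; the original .zip files should be removed from /data/test/, otherwise the table keeps reading them as raw text.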



ZIP files are not splittable, and none of Hadoop's default input formats can read them.
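Hadoop chooses a decompression codec from the file extension (.gz maps to GzipCodec), and no codec is registered for .zip by default, so TextInputFormat hands the archive's raw bytes to Hive as if they were text. The standard-library sketch below illustrates the difference with made-up CSV data: the first "line" of a ZIP file is its binary local-file header (starting with the `PK\x03\x04` signature), whereas decompressing a gzip stream, which is what GzipCodec does, gives the CSV rows back.

```python
import gzip
import io
import zipfile

CSV = b"1,2,3,4,5\n6,7,8,9,10\n"  # made-up sample data


def first_line_as_text(raw: bytes) -> bytes:
    """What a plain line reader with no codec sees: bytes up to the first newline."""
    return raw.split(b"\n", 1)[0]


# Build a .zip archive and a .gz file holding the same CSV data.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("abc.csv", CSV)
zip_bytes = zip_buf.getvalue()
gz_bytes = gzip.compress(CSV)

# The ZIP archive starts with a binary header, not with a CSV row.
print(zip_bytes[:4])                                  # b'PK\x03\x04'
print(first_line_as_text(zip_bytes) == b"1,2,3,4,5")  # False
# Decompressing the gzip stream restores the original rows.
print(first_line_as_text(gzip.decompress(gz_bytes)))  # b'1,2,3,4,5'
```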

You need an appropriate input format; see http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/

I used this approach to load ZIP files with Spark (https://github.com/bernhard-42/spark-unzip).
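A sketch in the same spirit (not the repository's actual code): PySpark's `SparkContext.binaryFiles` returns each file whole as a `(path, bytes)` pair, which matches the fact that an archive cannot be split, and a plain Python function can then unpack the archive and parse its CSV entries. The function name `zip_to_rows` is hypothetical, and decoding entries as UTF-8 is an assumption about the data.

```python
import csv
import io
import zipfile
from typing import Iterator, List, Tuple


def zip_to_rows(record: Tuple[str, bytes]) -> Iterator[List[str]]:
    """Hypothetical helper: turn one (path, content) pair, shaped like the
    records from SparkContext.binaryFiles, into the CSV rows of every .csv
    entry inside that archive."""
    _path, content = record
    with zipfile.ZipFile(io.BytesIO(content)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue
            # Assumption: entries are UTF-8 encoded text.
            text = zf.read(name).decode("utf-8")
            yield from csv.reader(io.StringIO(text))
```

With a SparkContext `sc`, something like `sc.binaryFiles("/data/test/*.zip").flatMap(zip_to_rows)` would yield one list of strings per CSV row, which can then be turned into a DataFrame with columns v1 to v5 and written to a Hive table.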


... and be careful: since a ZIP file is not splittable, each ZIP file is read by exactly one mapper (low parallelism).
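A rough worked example of what "one mapper per file" costs, using two hypothetical files of 1 GiB and 512 MiB and a 128 MiB split size. It deliberately ignores details such as Hadoop's split slop and Hive combining small files, so treat the numbers as estimates rather than exact task counts.

```python
from typing import Iterable


def estimated_map_tasks(file_sizes: Iterable[int], split_size: int, splittable: bool) -> int:
    """Rough task count: ceil(size / split_size) per splittable file, one per non-splittable file."""
    total = 0
    for size in file_sizes:
        if splittable:
            # Integer ceiling division; an empty file still gets one task.
            total += max(1, -(-size // split_size))
        else:
            total += 1
    return total


MB = 1024 * 1024
sizes = [1024 * MB, 512 * MB]  # hypothetical file sizes
print(estimated_map_tasks(sizes, 128 * MB, splittable=True))   # 12
print(estimated_map_tasks(sizes, 128 * MB, splittable=False))  # 2
```

The same limit applies to the .gz workaround, because gzip files are not splittable either; spreading the data over many moderately sized files is what restores parallelism.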