Load .Zip files to hive

Hi Team,

I want to load compressed .zip files (each containing many CSV files, e.g. abc.csv, def.csv ... xyz.csv) into a Hive table. When I tried the query below, all the column values were loaded as NULL. But if I use gzip compression, compress all the CSV files into a .gz file, and load that, the data loads fine. So can't we load .zip files directly into Hive?

Query:

CREATE EXTERNAL TABLE `table_zip`(
  `v1` string,
  `v2` string,
  `v3` string,
  `v4` string,
  `v5` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/data/test/'

2 REPLIES

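ZIP files are not splittable, and ZIP is not one of the default Hadoop input formats. Hadoop ships a compression codec for gzip (selected by the .gz extension) but none for .zip, so TextInputFormat most likely reads the raw compressed bytes as if they were text, which would explain the NULL columns.

To read ZIP files directly you need an appropriate input format; see http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/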
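Since the question notes that gzip-compressed files load fine, one workaround that avoids a custom input format is to repack each CSV inside the ZIP as its own .gz file before placing it under the table location. A minimal Python sketch using only the standard library; the function name zip_to_gzip is hypothetical, and copying the results into HDFS is left out:

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_to_gzip(zip_path, out_dir):
    """Repack every .csv member of a ZIP archive as its own .gz file.

    Hypothetical helper: returns the list of .gz paths it wrote.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # Drop any folder structure inside the archive; members with the
            # same base name in different folders would overwrite each other.
            target = out_dir / (Path(info.filename).name + ".gz")
            with archive.open(info) as src, gzip.open(target, "wb") as dst:
                # Stream the copy so large members are not held in memory.
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

The resulting abc.csv.gz, def.csv.gz, ... files can then be uploaded to the table's LOCATION. Gzip is not splittable either, but one .gz file per CSV at least gives one mapper per CSV rather than one per archive.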

I used it to load ZIP files with Spark (https://github.com/bernhard-42/spark-unzip).

Also be careful: because ZIP is not splittable, every ZIP file will be read by exactly one mapper (low parallelism).
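To illustrate the Spark route, here is a rough sketch of the general idea, not the code from the linked repository. PySpark's SparkContext.binaryFiles() yields (path, bytes) pairs, and a plain Python function can unpack each archive in memory. The name zip_lines is hypothetical, and UTF-8 CSV data is assumed:

```python
import io
import zipfile


def zip_lines(path_and_bytes):
    """Yield the text lines of every .csv member of one ZIP archive.

    Takes a (path, content) pair, the shape binaryFiles() produces.
    The whole archive is held in memory, so this only suits archives
    that fit comfortably in a single task.
    """
    _path, content = path_and_bytes
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as member:
                # Assumption: the CSV data is UTF-8 encoded.
                for line in io.TextIOWrapper(member, encoding="utf-8"):
                    yield line.rstrip("\r\n")


# Hypothetical usage with an existing SparkContext `sc` (not run here):
# rows = (sc.binaryFiles("/data/test/*.zip")
#           .flatMap(zip_lines)
#           .map(lambda line: line.split(",")))
```

Each archive is still handled by a single task, which is the low-parallelism caveat mentioned above. The naive split(",") in the usage comment also does not handle quoted fields.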
