Load .Zip files to hive
Labels: Apache Hive
Created ‎11-21-2016 11:32 AM
Hi Team,
I want to load a compressed .zip file (containing many CSV files, e.g. abc.csv, def.csv ... xyz.csv) into a Hive table. When I tried the query below, all column values were loaded as NULL. But if I compress the CSV files with gzip into a .gz file and load that, the data loads fine. So can't we load .zip files directly into Hive?
Query:
CREATE EXTERNAL TABLE `table_zip`(
  `v1` string,
  `v2` string,
  `v3` string,
  `v4` string,
  `v5` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/test/'
Created ‎11-21-2016 11:58 AM
ZIP files are not splittable, and ZIP is not a default Hadoop input format.
You need an appropriate input format; see http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/
I used it to load ZIP files with Spark (https://github.com/bernhard-42/spark-unzip).
Created ‎11-21-2016 11:59 AM
... and be careful: since ZIP is not splittable, every ZIP file will be read by exactly one mapper (low parallelism).
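If a custom input format is not an option, one possible workaround builds on the observation in the question that gzip-compressed files load correctly: repack each CSV inside the archive as its own .gz file before placing it in the table location. The following is only a minimal sketch using the Python standard library; the function name `repack_zip_as_gzip` and the local output directory are hypothetical, and it assumes the archive is available on a local filesystem and contains plain CSV members.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def repack_zip_as_gzip(zip_path, out_dir):
    """Write each CSV member of zip_path to out_dir as <name>.csv.gz.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # Flatten nested paths so every file lands directly in out_dir.
            # Note: members with the same base name would overwrite each other.
            target = out_dir / (Path(info.filename).name + ".gz")
            # Stream the member so large files are not held in memory.
            with archive.open(info) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

The resulting files could then be copied to the table location, for example with `hdfs dfs -put out/*.gz /data/test/` (assuming `out` is the output directory used above). Gzip is not splittable either, so each .gz file is still read by a single mapper, but writing one .gz per CSV gives one mapper per file rather than one per archive.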
