Created on 03-27-2017 07:40 AM - edited 09-16-2022 04:20 AM
My source arrives as *.tgz files. I would like to extract the CSV files inside them and store them in HDFS row by row as plain text. Could you please advise?
Created 03-27-2017 07:09 PM
You can use Flume's spooldir source to pick up the tgz files, untar them, and then process them with the Hive sink. The resulting tables are stored in HDFS.
Have a look at these docs:
http://henning.kropponline.de/2015/05/19/hivesink-for-flume/
https://flume.apache.org/FlumeUserGuide.html#hive-sink
Created 03-29-2017 01:47 AM
Hi,
Let me be more precise. The input files are *.csv.gz (I was wrong about tgz),
so I have to somehow decompress each file from *.csv.gz to *.csv and store it in HDFS.
I am sorry, I am not an expert, so I didn't get the point of the links you sent me.
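For reference, the decompress-and-store step described here can be sketched outside Flume in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `decompress_gz` and `gz_to_hdfs` are made up for illustration, the `hdfs` command-line client is assumed to be installed and on the PATH, and the paths in the usage note are hypothetical. It relies on `hdfs dfs -put - <path>`, which reads the file content from standard input.

```python
import gzip
import subprocess


def decompress_gz(src_path, dst):
    """Copy a .gz file row by row into an open binary file object.

    Returns the number of rows written. `dst` can be a regular file,
    a pipe, or any object with a write() method that accepts bytes.
    """
    rows = 0
    with gzip.open(src_path, "rb") as src:
        for line in src:  # iterating yields one decompressed row at a time
            dst.write(line)
            rows += 1
    return rows


def gz_to_hdfs(src_path, hdfs_path):
    """Decompress src_path and stream the plain text into HDFS.

    Assumption: the `hdfs` CLI is available on PATH. The "-" argument
    tells `hdfs dfs -put` to read the data from standard input.
    """
    proc = subprocess.Popen(
        ["hdfs", "dfs", "-put", "-", hdfs_path],
        stdin=subprocess.PIPE,
    )
    try:
        rows = decompress_gz(src_path, proc.stdin)
    finally:
        proc.stdin.close()  # signal end of input to the hdfs client
    if proc.wait() != 0:
        raise RuntimeError("hdfs dfs -put failed for " + hdfs_path)
    return rows
```

A call such as `gz_to_hdfs("/data/in/sample.csv.gz", "/user/me/sample.csv")` (both paths hypothetical) would then write the uncompressed CSV to HDFS without creating a temporary local copy. The file is streamed, so it does not need to fit in memory.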