03-27-2017 07:40 AM
My source files arrive as *.tgz. I would like to extract the CSV files inside them and store them in HDFS row by row as plain text. Could you please advise?
03-27-2017 07:09 PM
You can use the Flume spooldir (spooling directory) source to pick up the tgz files, untar them, and then process the contents with the Hive sink; the resulting tables can be stored in HDFS.
Have a look at these docs:
http://henning.kropponline.de/2015/05/19/hivesink-for-flume/
https://flume.apache.org/FlumeUserGuide.html#hive-sink
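If it helps to see the untar step on its own, here is a minimal sketch in plain Python, outside Flume. It assumes the archive is a gzip-compressed tar whose CSV members end in ".csv" and are UTF-8 encoded; the function name iter_csv_lines_from_tgz is made up for illustration and is not part of Flume or Hadoop.

```python
import io
import tarfile
from typing import Iterator


def iter_csv_lines_from_tgz(tgz_path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield every row of every *.csv member of a .tgz archive as plain text.

    Hypothetical helper for illustration only. Assumes the CSV members are
    text files in the given encoding.
    """
    with tarfile.open(tgz_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories and anything that is not a CSV file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decode the member as text and emit one row at a time,
            # without the trailing line terminator.
            with io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                for line in text:
                    yield line.rstrip("\r\n")
```

One way to use it would be to print each yielded row to standard output and pipe that into "hdfs dfs -put - <target path>", which reads from stdin when the source is "-". Treat this as a starting point under the assumptions above, not a replacement for the Flume pipeline.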
03-29-2017 01:47 AM
Hi,
let me be more precise. The input files are *.csv.gz (I was wrong about tgz),
so I have to somehow decompress each file from *.csv.gz to *.csv and store it in HDFS.
I am sorry, I am not an expert, so I didn't get the point of the links you sent me.