My source data arrives as *.tgz files. I would like to extract the CSV files inside them and store them in HDFS row by row as plain text. Could you please advise?
You can use the Flume spooldir (spooling directory) source to pick up the tgz files, untar them, and then process them with the Hive sink; the resulting tables can be stored in HDFS.
Have a look at these docs:
Let me be more precise. The input files are *.csv.gz (I was wrong about tgz),
so I have to somehow decompress each file from *.csv.gz to *.csv and store it in HDFS.
I am sorry, I am not an expert, so I didn't get the point of the links you sent me.
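
For the decompression step on its own, here is a minimal sketch in plain Python (standard library only), not a Flume configuration. It assumes the files are UTF-8 text and small enough to handle one at a time on a local disk; the function name gunzip_csv and the file names are made up for illustration.

```python
import gzip


def gunzip_csv(src_path, dst_path):
    """Decompress a .csv.gz file into a plain-text .csv file, row by row.

    Returns the number of rows (lines) written.
    Assumption: the content is UTF-8 text.
    """
    rows = 0
    # newline="" keeps the original line endings untouched.
    with gzip.open(src_path, "rt", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # streams one row at a time, never the whole file
            dst.write(line)
            rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical file names, only for illustration.
    print(gunzip_csv("input.csv.gz", "input.csv"))
```

The resulting *.csv file could then be copied into HDFS with the standard shell command, for example `hdfs dfs -put input.csv /some/hdfs/dir/` (the target directory here is just a placeholder).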