
How to extract an incoming stream that arrives as .tgz


My source arrives as *.tgz. I would like to extract the CSV files inside it and store them in HDFS row by row as plain text. Could you please advise?



You can use the Flume spooldir (spooling directory) source to pick up the tgz files, untar them, and then process them with the Hive sink; the resulting tables can be stored in HDFS.
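
The untar step could also be done outside Flume, before the files reach the spool directory. Below is a minimal Python sketch, assuming the archive is a gzip-compressed tar, the CSV members are UTF-8 text, and their names end in `.csv`. The function name `iter_csv_lines_from_tgz` is hypothetical and only for illustration.

```python
import io
import tarfile


def iter_csv_lines_from_tgz(tgz_path):
    """Yield (member_name, line) for every CSV file inside a .tgz archive."""
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(tgz_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories, links, and anything that is not a CSV file.
            if not member.isfile() or not member.name.endswith(".csv"):
                continue
            raw = archive.extractfile(member)
            # Assumption: the CSV files are UTF-8 encoded.
            with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                for line in text:
                    # Strip the line ending so each row is plain text.
                    yield member.name, line.rstrip("\r\n")
```

Each yielded row can then be written wherever it is needed, for example to a plain-text file that is later uploaded to HDFS.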


Have a look at these docs:



Let me be more precise. The input files are *.csv.gz (I was wrong about tgz),

so I have to somehow unzip each file from *.csv.gz to *.csv and store it in HDFS.


I am sorry, I am not an expert, so I didn't get the point of the links you sent me.
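
For the *.csv.gz case, one possible approach that does not involve Flume is to decompress each file and stream the result straight into HDFS. The sketch below is only an illustration under some assumptions: the `hdfs` command-line client is installed and on the PATH, and the function names `gunzip_to_stream` and `gunzip_to_hdfs` as well as the example paths are hypothetical. It relies on `hdfs dfs -put - <dst>`, which reads the file content from standard input.

```python
import gzip
import shutil
import subprocess


def gunzip_to_stream(gz_path, out_stream, chunk_size=1024 * 1024):
    """Decompress a .gz file and copy the plain bytes into out_stream."""
    with gzip.open(gz_path, "rb") as src:
        # Copy in chunks so large files are never fully held in memory.
        shutil.copyfileobj(src, out_stream, chunk_size)


def gunzip_to_hdfs(gz_path, hdfs_path):
    """Decompress gz_path and write it to hdfs_path as a plain .csv file."""
    # Assumption: the `hdfs` CLI is available; "-" makes -put read from stdin.
    proc = subprocess.Popen(
        ["hdfs", "dfs", "-put", "-", hdfs_path],
        stdin=subprocess.PIPE,
    )
    try:
        gunzip_to_stream(gz_path, proc.stdin)
    finally:
        # Closing stdin signals end of input to the hdfs client.
        proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("hdfs dfs -put failed for " + hdfs_path)
```

A call such as `gunzip_to_hdfs("/data/in/file1.csv.gz", "/user/me/csv/file1.csv")` (both paths are made up) would then store the uncompressed CSV in HDFS without writing a temporary local copy.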
