How to extract an incoming stream coming as .TGZ

Explorer

My source data arrives as *.tgz archives. I would like to extract the CSV files inside them and store them in HDFS row by row as plain text. Could you please advise?

Contributor

You can use the Flume spooldir source to pick up the tgz files, untar them, and then process them with the Hive sink; the resulting tables can be stored in HDFS.

 

Have a look at these docs:

http://henning.kropponline.de/2015/05/19/hivesink-for-flume/
https://flume.apache.org/FlumeUserGuide.html#hive-sink

Explorer

Hi,

Let me be more precise. The input files are *.csv.gz (I was wrong about tgz), so I need to decompress each file from *.csv.gz to *.csv and store it in HDFS.

I am sorry, I am not an expert, so I didn't get the point of the links you sent me.
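
For the clarified case (a *.csv.gz file that should end up as plain *.csv in HDFS), one possible approach outside Flume is a small script. The sketch below is only an illustration, not the method suggested in the reply above: it assumes Python 3 is available on a machine with the `hdfs` command-line client, that the files are UTF-8 text, and that the helper names `gunzip_csv` and `put_to_hdfs` are hypothetical. It streams the compressed file line by line, so the whole file never has to fit in memory, and then copies the result with `hdfs dfs -put`.

```python
import gzip
import subprocess


def gunzip_csv(src_path, dst_path):
    """Decompress src_path (*.csv.gz) into dst_path as plain text, line by line.

    Returns the number of lines written.
    """
    lines = 0
    # "rt" opens the gzip stream in text mode, so iterating yields one line at a time.
    # newline="" keeps the original line endings unchanged.
    with gzip.open(src_path, "rt", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
            lines += 1
    return lines


def put_to_hdfs(local_path, hdfs_dir):
    """Copy a local file into HDFS using the `hdfs dfs -put` shell command.

    Assumes the `hdfs` client is installed and on PATH; -f overwrites an existing file.
    """
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)
```

A typical call would be `gunzip_csv("input.csv.gz", "input.csv")` followed by `put_to_hdfs("input.csv", "/data/incoming/")`; both paths here are placeholders, not values from this thread.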
