How to extract an incoming stream coming as .TGZ

Explorer

My source arrives as *.tgz. I would like to extract the CSV files inside it and store them in HDFS row by row as plain text. Could you please advise?

2 REPLIES

Re: How to extract an incoming stream coming as .TGZ

Contributor

You can use a Flume spooldir source to pick up the .tgz files, untar them, and then process the contents with the Hive sink; the resulting tables can be stored in HDFS.

Have a look at these docs:

http://henning.kropponline.de/2015/05/19/hivesink-for-flume/

https://flume.apache.org/FlumeUserGuide.html#hive-sink

Re: How to extract an incoming stream coming as .TGZ

Explorer

Hi,

Let me be more precise. The input files are *.csv.gz (I was wrong about tgz), so I have to somehow unzip each file from *.csv.gz to *.csv and store it in HDFS.

I am sorry, I am not an expert, so I didn't get the point of the links you sent me.
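
For the *.csv.gz case described above, one possible approach outside Flume is a small script that decompresses each file and streams the plain CSV text into HDFS. The sketch below is only an illustration under assumptions: it relies on the `hdfs` command-line client being on the PATH (its `-put -` form reads from standard input), and the helper names `gunzip_stream` and `put_csv_gz` are hypothetical, not part of any library.

```python
import gzip
import shutil
import subprocess


def gunzip_stream(src, dst, chunk_size=1024 * 1024):
    """Decompress the gzip byte stream `src` into the binary writable `dst`."""
    # GzipFile wraps an already-open file object, so this works for local
    # files, sockets, or in-memory buffers alike.
    with gzip.GzipFile(fileobj=src, mode="rb") as plain:
        shutil.copyfileobj(plain, dst, chunk_size)


def put_csv_gz(local_path, hdfs_path):
    """Hypothetical helper: write the decompressed CSV to `hdfs_path`.

    Assumes the `hdfs` CLI is installed; `hdfs dfs -put - <dst>` reads
    the data to store from standard input.
    """
    proc = subprocess.Popen(
        ["hdfs", "dfs", "-put", "-", hdfs_path],
        stdin=subprocess.PIPE,
    )
    with open(local_path, "rb") as src:
        gunzip_stream(src, proc.stdin)
    proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("hdfs dfs -put failed for " + hdfs_path)
```

A call such as `put_csv_gz("data.csv.gz", "/landing/data.csv")` (both paths are placeholders) would leave an uncompressed text file in HDFS, one CSV row per line. The data is streamed in chunks, so the whole file never has to fit in memory.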