Explorer
Posts: 8
Registered: ‎02-21-2017

How to extract an incoming stream coming as .TGZ

My source arrives as *.tgz. I would like to extract the CSV files inside it and store them in HDFS row by row as plain text. Could you please advise?

Cloudera Employee
Posts: 20
Registered: ‎08-22-2014

Re: How to extract an incoming stream coming as .TGZ

You can use the Flume spooldir source to store the tgz files, untar them, and then process them with the Hive sink; the resulting tables can be stored in HDFS.

Have a look at these docs:

http://henning.kropponline.de/2015/05/19/hivesink-for-flume/

https://flume.apache.org/FlumeUserGuide.html#hive-sink
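
The untar step mentioned above is not something the spooldir source does by itself, so it has to happen before the files reach the spooling directory. Below is a minimal sketch of one way to do that in Python, not a tested production setup. The function name `extract_csv_members`, the `.tmp` suffix, and the directory layout are assumptions made for illustration. Files are written under a temporary name and renamed afterwards, because the spooling source expects files to be complete and unchanged once they appear; the source would have to be told to skip the temporary names (for example through its `ignorePattern` setting).

```python
import shutil
import tarfile
from pathlib import Path


def extract_csv_members(archive_path, spool_dir):
    """Copy every *.csv member of a .tgz archive into spool_dir.

    Returns the list of paths that were written. The function name and
    the ".tmp" staging suffix are illustrative assumptions.
    """
    spool = Path(spool_dir)
    spool.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            # Skip directories, links, and anything that is not a CSV file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            # Keep only the base name so a member path cannot escape spool_dir.
            target = spool / Path(member.name).name
            tmp = target.with_name(target.name + ".tmp")
            source = archive.extractfile(member)
            with source, open(tmp, "wb") as out:
                shutil.copyfileobj(source, out)
            # Rename only after the file is fully written.
            tmp.rename(target)
            written.append(target)
    return written
```

A call such as `extract_csv_members("incoming/batch.tgz", "/var/spool/flume")` (both paths hypothetical) would leave the plain CSV files in the spooling directory for Flume to pick up.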

Explorer
Posts: 8
Registered: ‎02-21-2017

Re: How to extract an incoming stream coming as .TGZ

Hi,

Let me be more precise. The input files are *.csv.gz (I was wrong about tgz), so I have to somehow unzip each file from *.csv.gz to *.csv and store it in HDFS.

I am sorry, I am not an expert, so I didn't get the point of the links you sent me.
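
For the *.csv.gz case described in this post, the decompression step on its own can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `gunzip_csv` and the directories are hypothetical, and the sketch covers the unzip step only, not the transfer to HDFS.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_csv(gz_path, out_dir):
    """Decompress one *.csv.gz file into out_dir and return the new path.

    The function name is an illustrative assumption.
    """
    gz_path = Path(gz_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the last suffix: "data.csv.gz" -> "data.csv".
    target = out / gz_path.stem
    # Stream the data in chunks so large files do not need to fit in memory.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The resulting *.csv file could then be copied to HDFS with `hdfs dfs -put`, or placed in a directory watched by a Flume spooldir source.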
