Flume : Data ingestion and load in to HDFS




Hi all,

I want to pull data from an SFTP server and load it into HDFS.

Scenario 1:

I configured the agent with the source as `SFTP server`, the sink as `HDFS`, and the channel as `File-Channel`. But in this scenario, the HDFS sink creates too many small files. I searched for a fix but didn't find any specific solution, so I thought of a second scenario.

Scenario 2:

In this scenario, I have configured two Flume agents.

Agent1: Source >>> SFTP Server, Sink >>> file_roll , Channel >>> file-channel

Agent2: Source >>> SpoolDir, Sink >>> HDFS, Channel >>> file-channel

First, I configured one Flume agent, Agent1, which pulls data from the SFTP server onto the local filesystem of the Hadoop node. Second, I configured another agent, Agent2, which reads that data from the local filesystem and stores it in HDFS. I am running the two agents simultaneously. But whenever I start Agent2, the main file gets renamed with a .COMPLETE extension. Agent1 writes all data into a single file (here rollcountInterval = 0), but Agent2 cannot load data from that file.

Can anybody help me get past this blocker?



Re: Flume : Data ingestion and load in to HDFS


@JAy PaTel

Can you share your Flume config?
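
In the meantime, for Scenario 1: the small files usually come from the HDFS sink's default roll settings (`hdfs.rollInterval` = 30 seconds, `hdfs.rollSize` = 1024 bytes, `hdfs.rollCount` = 10 events), where a new file is started as soon as any one of them is reached. Below is a minimal sketch of the sink section, not a tested config. The agent name `agent1`, sink name `hdfs-sink`, channel name `file-ch`, and the HDFS path are all placeholders I made up, so adjust them to match yours.

```properties
# Hypothetical names: agent1, hdfs-sink, file-ch; the path is a placeholder.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = file-ch
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/sftp-data
# Write plain event bodies instead of the default SequenceFile.
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
# 0 disables rolling by time and by event count.
agent1.sinks.hdfs-sink.hdfs.rollInterval = 0
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
# Roll only when the file reaches roughly 128 MB (value is in bytes).
agent1.sinks.hdfs-sink.hdfs.rollSize = 134217728
# Close a file that has received no events for 60 seconds.
agent1.sinks.hdfs-sink.hdfs.idleTimeout = 60
```

For Scenario 2, note that the Spooling Directory source expects files to be complete and unchanged once they appear in the spool directory. Pointing it at a file that Agent1's `file_roll` sink is still writing to is likely what stops Agent2, but I can only confirm that once I see your config.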