I want to pull data from an SFTP server and load it into HDFS.
I configured an agent with the source as `SFTP server`, the sink as `HDFS`, and the channel as `File-Channel`. In this scenario, however, the HDFS sink creates too many small files. I searched for a fix but could not find a specific solution, so I tried a second scenario.
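For reference, the first setup looks roughly like the sketch below. The agent and component names, the HDFS path, and the roll values are placeholders rather than my exact settings, and the SFTP source type is left as a placeholder because it comes from a plugin, not from Flume itself. As far as I understand, `hdfs.rollInterval`, `hdfs.rollSize`, and `hdfs.rollCount` are the HDFS sink properties that decide when a file is closed, and their defaults (30 seconds, 1024 bytes, 10 events) produce small files.

```properties
# Sketch only: names, paths, and values are illustrative placeholders.
agent.sources = sftpSrc
agent.channels = fileCh
agent.sinks = hdfsSink

# SFTP is not a built-in Flume source; the type must be the plugin's class name.
agent.sources.sftpSrc.type = <fully.qualified.SftpSourceClass>
agent.sources.sftpSrc.channels = fileCh

agent.channels.fileCh.type = file

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = fileCh
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/sftp
agent.sinks.hdfsSink.hdfs.fileType = DataStream
# A value of 0 disables that roll trigger.
# Here only the size trigger is left on (128 MB, an assumed value).
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 134217728
agent.sinks.hdfsSink.hdfs.rollCount = 0
```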
In this scenario, I configured two Flume agents:
Agent1: Source >>> SFTP Server, Sink >>> file_roll , Channel >>> file-channel
Agent2: Source >>> SpoolDir, Sink >>> HDFS, Channel >>> file-channel
First, I configured a Flume agent named Agent1, which loads data from the SFTP server onto the local filesystem of the Hadoop machine. Second, I configured an agent named Agent2, which reads that data from the local filesystem and stores it in HDFS. I am running the two agents simultaneously. Agent1 writes all the data into a single file (here `rollcountInterval = 0`), but as soon as I start Agent2, that file is renamed with the `.COMPLETE` extension, and Agent2 cannot load the data from it.
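To make the two-agent setup concrete, here is a rough sketch of the configuration. All names and directories are placeholders, the SFTP source type is again a plugin class left unspecified, and I am assuming the `rollcountInterval = 0` mentioned above corresponds to the file_roll sink's `sink.rollInterval` property.

```properties
# Sketch only: names and directories are illustrative placeholders.

# Agent1: SFTP source -> file channel -> file_roll sink (local directory)
agent1.sources = sftpSrc
agent1.channels = fileCh1
agent1.sinks = localSink

agent1.sources.sftpSrc.type = <fully.qualified.SftpSourceClass>
agent1.sources.sftpSrc.channels = fileCh1

agent1.channels.fileCh1.type = file
agent1.channels.fileCh1.checkpointDir = /data/flume/agent1/checkpoint
agent1.channels.fileCh1.dataDirs = /data/flume/agent1/data

agent1.sinks.localSink.type = file_roll
agent1.sinks.localSink.channel = fileCh1
agent1.sinks.localSink.sink.directory = /data/flume/spool
# 0 disables rolling, so every event goes into one file that stays open.
agent1.sinks.localSink.sink.rollInterval = 0

# Agent2: spooling directory source -> file channel -> HDFS sink
agent2.sources = spoolSrc
agent2.channels = fileCh2
agent2.sinks = hdfsSink

agent2.sources.spoolSrc.type = spooldir
agent2.sources.spoolSrc.spoolDir = /data/flume/spool
agent2.sources.spoolSrc.channels = fileCh2

# Each file channel on the same host needs its own checkpoint and data dirs.
agent2.channels.fileCh2.type = file
agent2.channels.fileCh2.checkpointDir = /data/flume/agent2/checkpoint
agent2.channels.fileCh2.dataDirs = /data/flume/agent2/data

agent2.sinks.hdfsSink.type = hdfs
agent2.sinks.hdfsSink.channel = fileCh2
agent2.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/sftp
```

As far as I understand from the Flume user guide, the spooling directory source expects a file to be complete and unchanged once it appears in `spoolDir`. That may be relevant here, since Agent1 keeps writing to the same file while Agent2 is reading the directory.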
Can anybody help me get past this blocker?