Support Questions
Find answers, ask questions, and share your expertise

How to append the streaming log data into an hdfs file in Flume? Does anyone have the MR source code to append the data to a file in hdfs

New Contributor

Hi All,

I am trying to load data from the local file system into HDFS using Flume. Streaming data is copied into a local folder every millisecond and appended to an existing file there, and that append should be reflected in HDFS. However, the data is getting duplicated in the sink. I need the streaming data appended to the existing file in HDFS using Flume, without overwriting that file.

Please do the needful.

# Flume configuration for loading text files

# Define a source, a channel, and a sink
a1.sources = src1
a1.channels = chan1
a1.sinks = sink1

# Set the source type to Spooling Directory and set the directory
# location to /home/flume/ingestion/
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /home/flume/ingestion/
a1.sources.src1.basenameHeader = true

# Configure the channel as a simple in-memory queue
a1.channels.chan1.type = memory
a1.channels.chan1.capacity = 1000

# Define the HDFS sink and set its path to your target HDFS directory
a1.sinks.sink1.type = hdfs
a1.sinks.sink1.hdfs.path = hdfs://big.example.com:8020/user/flume/stage
a1.sinks.sink1.hdfs.fileType = DataStream

# Disable rollover functionality, as we want to keep the original files
# (note: the roll settings must carry the "hdfs." prefix, or they are ignored)
a1.sinks.sink1.hdfs.rollCount = 0
a1.sinks.sink1.hdfs.rollInterval = 0
a1.sinks.sink1.hdfs.rollSize = 0
a1.sinks.sink1.hdfs.idleTimeout = 0

# Name the HDFS files after their original file name
a1.sinks.sink1.hdfs.filePrefix = %{basename}

# Connect source and sink to the channel
a1.sources.src1.channels = chan1
a1.sinks.sink1.channel = chan1
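A note on the duplication: the Spooling Directory source expects files dropped into the spool directory to be immutable, so appending to a file it has already processed can cause events to be re-read. A minimal sketch of an alternative using the TAILDIR source (available in Flume 1.7+), which records a byte offset per file in a position file and therefore ships only newly appended lines; the file paths below are assumptions for illustration:

```properties
# TAILDIR tracks its read offset in a JSON position file, so lines appended
# to the watched file are picked up incrementally instead of re-read
a1.sources.src1.type = TAILDIR
a1.sources.src1.filegroups = fg1
a1.sources.src1.filegroups.fg1 = /home/flume/ingestion/stream.log
a1.sources.src1.positionFile = /home/flume/taildir_position.json
a1.sources.src1.channels = chan1
```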


4 REPLIES 4

Re: How to append the streaming log data into an hdfs file in Flume? Does anyone have the MR source code to append the data to a file in hdfs

Rising Star

I don't see you configuring Flume transaction sizes; is there a reason for this?

Ideally, I would approach this problem by using transactions and keeping them big enough to avoid writing micro-files on HDFS, then rolling a file per transaction.

The transaction size could be, e.g., 1000 events (depending on the size of the data you are trying to write).
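That sizing advice can be expressed through the channel and sink settings; a hedged sketch, where the numbers are illustrative assumptions rather than recommendations:

```properties
# Channel holds up to 10000 events; each put/take transaction moves up to 1000
a1.channels.chan1.type = memory
a1.channels.chan1.capacity = 10000
a1.channels.chan1.transactionCapacity = 1000

# HDFS sink flushes 1000 events per batch and rolls a new file every 1000
# events, so each transaction-sized batch lands in one reasonably sized file
a1.sinks.sink1.hdfs.batchSize = 1000
a1.sinks.sink1.hdfs.rollCount = 1000
a1.sinks.sink1.hdfs.rollSize = 0
a1.sinks.sink1.hdfs.rollInterval = 0
```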

Re: How to append the streaming log data into an hdfs file in Flume? Does anyone have the MR source code to append the data to a file in hdfs

New Contributor

Hi Sharma,

@ambud.sharma

I am trying to append the streaming log data to an existing file in HDFS using Flume, with a tail-based source that should append data to an existing file in HDFS. Below is the configuration I have used. The file contains a timestamp inside each record; can we pull the data based on that timestamp?

Data is loaded into the local folder continuously, i.e. incoming data is appended to one file continuously (every millisecond).

Flume Configuration for Text File

agent.sources = tailSrc
agent.channels = memoryChannel
agent.sinks = hdfsSink

# Exec source tailing the NameNode log
# (note: the "type" line was missing from the original configuration)
agent.sources.tailSrc.type = exec
agent.sources.tailSrc.command = tail -f /home/testing/hdp/hadoop-2.7.3/logs/hadoop-testing-namenode-hadoop.log
agent.sources.tailSrc.channels = memoryChannel

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = /flume/livedata/%y-%m-%d/%H/%M/%S
# (note: useLocalTimeStamp must carry the "hdfs." prefix)
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfsSink.channel = memoryChannel

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100
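One thing to check in the configuration above: the `%S` escape in `hdfs.path` creates a new HDFS directory every second, which produces a very large number of tiny files. A sketch that buckets by hour instead and rolls on file size; the sizes are assumptions, not recommendations:

```properties
# Bucket by day/hour rather than by second to avoid thousands of tiny files
agent.sinks.hdfsSink.hdfs.path = /flume/livedata/%y-%m-%d/%H
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true

# Roll roughly every 64 MB instead of on event count or interval
agent.sinks.hdfsSink.hdfs.rollSize = 67108864
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.rollInterval = 0
```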

If there are any mistakes, please help me with the configuration changes that need to be made.

Re: How to append the streaming log data into an hdfs file in Flume? Does anyone have the MR source code to append the data to a file in hdfs

Expert Contributor

Hi Magesh,

By default, Flume doesn't overwrite data that already exists in the HDFS directory. You will find the updated data in new files named with a timestamp.
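To illustrate: the HDFS sink names each rolled file with a configurable prefix plus a timestamp-based counter, so appended data arrives as new timestamped files rather than overwriting earlier ones. The prefix and suffix below are assumptions for illustration:

```properties
# Each rolled file is named <prefix>.<counter based on epoch millis><suffix>,
# e.g. namenode-log.1625097600000.log, so earlier files are never overwritten
agent.sinks.hdfsSink.hdfs.filePrefix = namenode-log
agent.sinks.hdfsSink.hdfs.fileSuffix = .log
```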

Re: How to append the streaming log data into an hdfs file in Flume? Does anyone have the MR source code to append the data to a file in hdfs

New Contributor

Hi Mahesh,

@Mahesh Mallikarjunappa

I have the timestamp inside the file; how could I use it to append the new records (new lines) to the existing file in HDFS using Flume? Please do the needful.