
While using Exec source getting duplicates




Hi guys,

I'm trying to stream my log files from different web folders using Flume, to process them and write to HDFS. The Flume agent runs fine, and as the user guide says, the exec source tails the file. When I start the agent it reads whatever records are already in the log, but after that it goes idle: if new records are written to the log, it doesn't read them. Then, after I save the log file, the exec source reads the log from the beginning again, so I get duplicates. Is there any way to avoid this and read only the new log entries?
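This behavior usually comes from how editors save files: a plain append keeps the same inode, so `tail -F` keeps its offset, but an editor-style save typically writes a new file and renames it over the old one, so `tail -F` reopens the file by name and re-reads it from the start. A minimal sketch (using a hypothetical temp file, not the real Behaviourlog) showing the inode difference:

```shell
#!/bin/sh
# Sketch: append vs. editor-style save, and what each does to the inode.
LOG=$(mktemp)                                   # hypothetical stand-in for the log file
echo "line1" > "$LOG"
BEFORE=$(ls -i "$LOG" | awk '{print $1}')

echo "line2" >> "$LOG"                          # plain append: same file, same inode
AFTER_APPEND=$(ls -i "$LOG" | awk '{print $1}')

printf 'line1\nline2\n' > "$LOG.tmp"            # editor-style save: write a new file...
mv "$LOG.tmp" "$LOG"                            # ...and rename it over the original
AFTER_SAVE=$(ls -i "$LOG" | awk '{print $1}')

# Append keeps the inode, so tail -F keeps its position; the save replaces
# the inode, so tail -F reopens the file and re-emits it from the top.
echo "append kept inode: $([ "$BEFORE" = "$AFTER_APPEND" ] && echo yes || echo no)"
echo "save changed inode: $([ "$BEFORE" != "$AFTER_SAVE" ] && echo yes || echo no)"
rm -f "$LOG"
```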


Here is my config file:


agent.sources = localsource
agent.channels = memoryChannel
agent.sinks = avro_Sink

agent.sources.localsource.type = exec
agent.sources.localsource.shell = /bin/bash -c
agent.sources.localsource.command = tail -F /home/admin1/teja/Flumedata/Behaviourlog
#agent.sources.localsource.fileHeader = true

# The channel can be defined as follows.
agent.sources.localsource.channels = memoryChannel

# Specify the channel the sink should use
agent.sinks.avro_Sink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000

# Each sink's type must be defined
agent.sinks.avro_Sink.type = avro
agent.sinks.avro_Sink.port = 8021
agent.sinks.avro_Sink.avro.batchSize = 100
agent.sinks.avro_Sink.avro.rollCount = 0
agent.sinks.avro_Sink.avro.rollSize = 143060835
agent.sinks.avro_Sink.avro.rollInterval = 0


agent.sources.localsource.interceptors = search-replace regex-filter
agent.sources.localsource.interceptors.search-replace.type = search_replace
# Remove leading alphanumeric characters in an event body. ###|##|# |

agent.sources.localsource.interceptors.regex-filter.type = regex_filter
###Remove full event body.
agent.sources.localsource.interceptors.regex-filter.regex = .*(pagenotfound.php).*
agent.sources.localsource.interceptors.regex-filter.excludeEvents = true


Please help.


I also tried the TailDir source in the newer Flume version, and with that source I'm getting the same duplicates problem. Won't Flume read new data as soon as it is written to the log?


Here is my TailDir source config:

agent.sources.localsource.type = TAILDIR
agent.sources.localsource.positionFile = /home/admin1/teja/flume/taildir_position.json
agent.sources.localsource.filegroups = f1
agent.sources.localsource.filegroups.f1 = /home/admin1/teja/Flumedata/Behaviourlog
agent.sources.localsource.batchSize = 20
agent.sources.localsource.fileHeader = true




Re: While using Exec source getting duplicates


You may want to decrease the milliseconds in restartThrottle to see if it picks up the new log entries.

Also add this parameter to your flume config:

restart (default: false) - whether the executed command should be restarted if it dies.

Set it to true, because if this parameter is left unset or false, restartThrottle has no effect.
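Put together, the relevant exec source settings would look something like this (the throttle value here is illustrative):

```properties
# Restart the tail command if it dies, waiting 1 second between attempts
agent.sources.localsource.restart = true
agent.sources.localsource.restartThrottle = 1000
```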

The exec source expects the command to continuously produce data, and it ingests that output.

Because the exec source is asynchronous in nature, there is a possibility of data loss if the agent dies.

If data loss is something you want to avoid, you can use the Spooling Directory source instead.
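A minimal Spooling Directory source sketch, reusing the channel from your config (the spoolDir path and source name here are hypothetical; note that files placed in the spool directory must be complete and immutable, not actively appended to):

```properties
agent.sources.spoolsource.type = spooldir
agent.sources.spoolsource.spoolDir = /home/admin1/teja/Flumedata/spool
agent.sources.spoolsource.channels = memoryChannel
agent.sources.spoolsource.fileHeader = true
```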