
Flume exec tail -F


I am seeing something odd with a simple Flume agent. The agent launches without error and creates the first file to begin collecting the events.

 

  • -rw-r--r-- 3 choa choa 10226 2014-06-02 13:11 /user/choa/dev/philips_flume_exec/FlumeData.1401729107940.txt.tmp

 

but then the size of the file in HDFS never changes even though I can see the source file growing.

 

The second I stop the agent with [Ctrl + C], the file size in HDFS reflects the file growth.

 

  • -rw-r--r-- 3 choa choa 2406667 2014-06-02 13:40 /user/choa/dev/philips_flume_exec/FlumeData.1401729107940.txt

 

In earlier testing, I could see the file in HDFS grow in size, or the idleTimeout parameter would roll a new file after 10 minutes of no activity.

 

There are no errors and the events are being collected, but they do not appear to be written to HDFS. Additionally, the idleTimeout is being ignored.
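One thing worth checking here: `hdfs dfs -ls` only refreshes the reported length of a file that is still open for write at block boundaries, so the listing can lag far behind the data that has actually been flushed. Byte-counting the stream directly is one way to see what has really landed in the open .tmp file (a quick check, using the path from the listing above):

```shell
# Count the bytes actually readable from the open .tmp file; this can be
# much larger than the length shown by `hdfs dfs -ls` while the file is open.
hdfs dfs -cat /user/choa/dev/philips_flume_exec/FlumeData.1401729107940.txt.tmp | wc -c
```

If this count grows while the `-ls` size stays flat, the sink is writing and only the reported length is stale.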

 

Any thoughts on what might be happening?

Do you see any errors in this conf file?

 

Thanks for any help.

 

CentOS - 6.5

CDH 5

 

 

 

agent-3.channels = ch-3
agent-3.sources = src-3
agent-3.sinks = sink-3

agent-3.channels.ch-3.type = memory
# must be > sink batch size
agent-3.channels.ch-3.capacity = 1000
# must match sink batchSize
agent-3.channels.ch-3.transactionCapacity = 200

agent-3.sources.src-3.type = exec
agent-3.sources.src-3.command = tail -F /mnt/windows/ebiz/devinerb_tohost.txt
agent-3.sources.src-3.channels = ch-3
agent-3.sources.src-3.inputCharset = ISO-8859-1
agent-3.sources.src-3.outputCharset = UTF-8
agent-3.sources.src-3.decodeErrorPolicy = FAIL

agent-3.sinks.sink-3.channel = ch-3
agent-3.sinks.sink-3.type = hdfs
agent-3.sinks.sink-3.hdfs.writeFormat = Text
agent-3.sinks.sink-3.hdfs.fileType = DataStream
agent-3.sinks.sink-3.hdfs.fileSuffix = .txt
agent-3.sinks.sink-3.hdfs.rollCount = 0
agent-3.sinks.sink-3.hdfs.rollInterval = 0
agent-3.sinks.sink-3.hdfs.idleTimeout = 600
# create new file at ~10 MB (not sure this is necessary)
agent-3.sinks.sink-3.hdfs.rollSize = 10000000
# must match channel transactionCapacity
agent-3.sinks.sink-3.hdfs.batchSize = 200
agent-3.sinks.sink-3.hdfs.path = /user/choa/dev/philips_flume_exec
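For completeness, an agent defined this way is typically launched with the stock `flume-ng` wrapper; the config filename and conf directory below are assumptions, but `--name` must match the `agent-3` prefix used in the file:

```shell
# Launch the agent in the foreground (stop with Ctrl+C).
# --name must match the property prefix in the config file (agent-3).
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file agent-3.conf \
  --name agent-3
```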

 

 


Re: Flume exec tail -F


After some testing with Cloudera Help, the appropriate conf file looks like this. It appears that the initial channel capacity was too low and HDFS could not keep up with the restarted event load. I also added the restart and restartThrottle parameters.

This has been running for several days now with no errors in testing.

 

agent-3.channels = ch-3
agent-3.sources = src-3
agent-3.sinks = sink-3

agent-3.channels.ch-3.type = memory
# must be > sink batch size
agent-3.channels.ch-3.capacity = 50000
# must match sink batchSize
agent-3.channels.ch-3.transactionCapacity = 200

agent-3.sources.src-3.type = exec
agent-3.sources.src-3.command = tail -F /path/to/file.txt
agent-3.sources.src-3.channels = ch-3
agent-3.sources.src-3.inputCharset = ISO-8859-1
agent-3.sources.src-3.outputCharset = UTF-8
agent-3.sources.src-3.decodeErrorPolicy = FAIL
agent-3.sources.src-3.logStdErr = true
agent-3.sources.src-3.restart = true
agent-3.sources.src-3.restartThrottle = 1000

agent-3.sinks.sink-3.channel = ch-3
agent-3.sinks.sink-3.type = hdfs
agent-3.sinks.sink-3.hdfs.writeFormat = Text
agent-3.sinks.sink-3.hdfs.fileType = DataStream
agent-3.sinks.sink-3.hdfs.fileSuffix = .txt
agent-3.sinks.sink-3.hdfs.rollCount = 0
agent-3.sinks.sink-3.hdfs.rollInterval = 0
agent-3.sinks.sink-3.hdfs.idleTimeout = 600
agent-3.sinks.sink-3.hdfs.rollSize = 10000000
# must match channel transactionCapacity
agent-3.sinks.sink-3.hdfs.batchSize = 200
agent-3.sinks.sink-3.hdfs.path = /path/in/hdfs
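For anyone debugging a similar stall: Flume's built-in HTTP metrics reporter makes it easy to see whether events are piling up in ch-3 or being drained by sink-3. A sketch of enabling it (the port is arbitrary, and the config filename is an assumption):

```shell
# Start the agent with the JSON metrics reporter enabled.
flume-ng agent --conf /etc/flume-ng/conf --conf-file agent-3.conf --name agent-3 \
  -Dflume.monitoring.type=http \
  -Dflume.monitoring.port=34545

# From another shell, inspect counters such as ChannelFillPercentage on the
# channel and EventDrainSuccessCount on the sink:
curl http://localhost:34545/metrics
```

A rising ChannelFillPercentage with a flat EventDrainSuccessCount would point at the sink rather than the source.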

 
