Flume leaving .tmp files in place

New Contributor

I have read several other threads on Flume .tmp files, but our configuration is different. Here is a snapshot of a few files; you'll see that both the .tmp file and the final file are present, but in different sizes. I am unclear why the .tmp files remain and why they differ in size from the final files. Sometimes the .tmp file is larger, sometimes smaller.

Permission   Owner  Group  Size      Repl  Block Size  Name
-rw-r--r--   hdfs   hdfs   79.86 MB  3     128 MB      FlumeData.1486544400005
-rw-r--r--   hdfs   hdfs   81.45 MB  3     128 MB      FlumeData.1486544400005.tmp
-rw-r--r--   hdfs   hdfs   81.38 MB  3     128 MB      FlumeData.1486544400006
-rw-r--r--   hdfs   hdfs   80.73 MB  3     128 MB      FlumeData.1486544400006.tmp
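
For context, the .tmp extension here is the HDFS sink's in-use marker: while a file is open for writing, the sink appends hdfs.inUseSuffix (default .tmp) to the name, and only renames the file to its final name once it is closed by a roll or a clean shutdown. A .tmp file that never disappears is a file the sink never managed to close. A minimal sketch of the relevant defaults, shown for illustration only and not taken from the configuration below:

# HDFS sink parameters that control temporary-file naming (defaults shown)
TwitterAgent.sinks.HDFS.hdfs.inUsePrefix =
TwitterAgent.sinks.HDFS.hdfs.inUseSuffix = .tmp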

Configuration snippet: we tried a few different combinations and settled on this as a way to avoid timeouts and network constraints. I see some of my notes didn't remain consistent with the actual set values...

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://[name node server]:8020/data/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
# Set rollSize to 126 MB (132120576 bytes) to be slightly less than the block size
TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576
# Set rollCount to 0 so that it does not roll on number of events, but on size of the sink file
TwitterAgent.sinks.HDFS.hdfs.rollCount = 20000
# Added rollInterval of 3600 seconds (60 minutes) to cap the time the data is held in memory
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.appendTimeout = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000
TwitterAgent.sinks.HDFS.hdfs.threadsPoolSize = 100
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
# Increased transactionCapacity from 100 to see if this solves the memory problem
TwitterAgent.channels.MemChannel.transactionCapacity = 10000
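
For comparison, a sketch of just the roll-related sink settings with each note placed next to the value it describes. The hdfs.idleTimeout line is an addition not present in the configuration above: it defaults to 0 (disabled) and, when set, closes files that stop receiving events so an idle file is renamed from .tmp to its final name even when no roll condition fires. The sink rolls on whichever of the three conditions triggers first; the values here are illustrative, not a recommendation:

# The sink closes and renames the .tmp file when the FIRST of these conditions fires
TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576    # 126 MB, just under the 128 MB block size
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0            # 0 disables rolling on event count
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600      # seconds a file may stay open before rolling
# Hypothetical addition: close files that receive no events for 10 minutes
TwitterAgent.sinks.HDFS.hdfs.idleTimeout = 600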

2 REPLIES

Super Collaborator

@Cord thomas

Turn on debug logging and check the log file first.
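
A minimal sketch of what that looks like in the agent's conf/log4j.properties, assuming the stock Log4j 1.x properties file that ships with Flume 1.x (the appender name and log paths are the shipped defaults and may differ in your install); the same effect can be had per run with -Dflume.root.logger=DEBUG,console on the flume-ng command line:

# conf/log4j.properties - raise the root logger from INFO to DEBUG so the
# HDFS sink logs each open, close and rename of its .tmp files
flume.root.logger=DEBUG,LOGFILE
flume.log.dir=./logs
flume.log.file=flume.log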

New Contributor (accepted solution)

Thank you. It turns out that, in our case, I had mistakenly started up a second Flume instance, which was somehow colliding with the first. User error.