Flume is leaving .tmp files in place

New Contributor

I have read several other threads on Flume .tmp files, but our configuration is different. Here is a snapshot of a few files: each .tmp file sits alongside its final file, but the two differ in size. I am unclear why the .tmp files remain and why they differ in size from the final files. Sometimes the .tmp file is larger, sometimes smaller.

Permission  Owner  Group  Size      Replication  Block Size  Name
-rw-r--r--  hdfs   hdfs   79.86 MB  3            128 MB      FlumeData.1486544400005
-rw-r--r--  hdfs   hdfs   81.45 MB  3            128 MB      FlumeData.1486544400005.tmp
-rw-r--r--  hdfs   hdfs   81.38 MB  3            128 MB      FlumeData.1486544400006
-rw-r--r--  hdfs   hdfs   80.73 MB  3            128 MB      FlumeData.1486544400006.tmp

Configuration snippet: we tried a few different combinations and settled on this one as a way to avoid timeouts and network constraints. I see that some of my notes are no longer consistent with the values actually set.

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://[name node server]:8020/data/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
# Set rollSize to 126 MB (132120576 bytes) to be slightly less than the block size
TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576
# Set rollCount to 0 so that it does not roll on number of events, but on size of sink file
TwitterAgent.sinks.HDFS.hdfs.rollCount = 20000
# Added rollInterval of 60 minutes (3600 seconds) to cap the time the data is in memory
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.appendTimeout = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000
TwitterAgent.sinks.HDFS.hdfs.threadsPoolSize = 100
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
# Increased transactionCapacity to 1000 from 100 to see if this solves the memory problem
TwitterAgent.channels.MemChannel.transactionCapacity = 10000
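Since the notes and the set values disagree, it may help to work out which roll setting would actually close a file first. The HDFS sink rolls on whichever of rollInterval, rollSize and rollCount is reached first, and a value of 0 disables that trigger. The sketch below is only a rough estimate: the function name, the event rate (50 events/s) and the average event size (4096 bytes) are hypothetical values picked for illustration, not measurements from this cluster.

```python
def first_roll_trigger(roll_interval_s, roll_size_bytes, roll_count,
                       events_per_s, avg_event_bytes):
    """Estimate which HDFS sink roll setting is reached first.

    Returns (setting_name, seconds_until_roll). A setting of 0 is treated
    as disabled. Assumes a steady event rate and a fixed event size, which
    is a simplification of real traffic.
    """
    candidates = {}
    if roll_interval_s > 0:
        candidates["rollInterval"] = float(roll_interval_s)
    if roll_size_bytes > 0:
        # Seconds needed to write roll_size_bytes at the assumed throughput.
        candidates["rollSize"] = roll_size_bytes / (events_per_s * avg_event_bytes)
    if roll_count > 0:
        # Seconds needed to write roll_count events at the assumed rate.
        candidates["rollCount"] = roll_count / events_per_s
    if not candidates:
        return None, float("inf")  # nothing would ever roll the file
    name = min(candidates, key=candidates.get)
    return name, candidates[name]


# Values from the configuration above; rate and event size are assumptions.
print(first_roll_trigger(3600, 132120576, 20000,
                         events_per_s=50, avg_event_bytes=4096))
# Same settings with rollCount disabled, as the note intended.
print(first_roll_trigger(3600, 132120576, 0,
                         events_per_s=50, avg_event_bytes=4096))
```

Under those assumed numbers, rollCount = 20000 is reached before rollSize, which would be one possible reason for final files well below 126 MB. With rollCount set to 0 as the note describes, rollSize would become the first trigger.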


1 ACCEPTED SOLUTION

New Contributor

Thank you. It turns out that I had mistakenly started a second Flume instance, which was somehow colliding with the first. User error.
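For anyone who wants to rule out the same mistake, here is a minimal sketch of a check for more than one Flume agent JVM on a host. It assumes a Linux machine where `ps -eo pid,args` is available and that the agents were started through the standard launcher, whose JVM command line contains the main class `org.apache.flume.node.Application`. The helper names are hypothetical.

```python
import subprocess

# Main class of a Flume agent JVM, as it appears on the process command line.
FLUME_MAIN_CLASS = "org.apache.flume.node.Application"


def find_flume_agents(ps_lines):
    """Return (pid, command) pairs for lines that look like Flume agent JVMs.

    Each input line is expected in "PID COMMAND..." form, as printed by
    `ps -eo pid,args`. Header and malformed lines are skipped.
    """
    agents = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, command = parts
        if FLUME_MAIN_CLASS in command:
            agents.append((int(pid), command))
    return agents


def running_flume_agents():
    """List Flume agent processes on this host (assumes a Linux-style ps)."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    return find_flume_agents(result.stdout.splitlines())
```

Calling `running_flume_agents()` on the Flume host and finding two or more entries with the same agent name and configuration file would point to the kind of collision described above.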


2 REPLIES

Super Collaborator

@Cord thomas

Turn on debug logging and check the log file first.
