Created 02-08-2017 11:31 PM
I have read several other threads on Flume .tmp files, but our configuration is different. Here is a snapshot of a few files: you'll see that both the .tmp file and the final file are there, but with different sizes. I am unclear why the .tmp files remain and why they differ in size from the final file. Sometimes the .tmp file is larger, sometimes smaller.
-rw-r--r-- | hdfs | hdfs | 79.86 MB | 3 | 128 MB | FlumeData.1486544400005 |
-rw-r--r-- | hdfs | hdfs | 81.45 MB | 3 | 128 MB | FlumeData.1486544400005.tmp |
-rw-r--r-- | hdfs | hdfs | 81.38 MB | 3 | 128 MB | FlumeData.1486544400006 |
-rw-r--r-- | hdfs | hdfs | 80.73 MB | 3 | 128 MB | FlumeData.1486544400006.tmp |
Configuration snippet: we tried a few different combinations and settled on this one as a way to avoid timeouts and network constraints. I see that some of my notes didn't stay consistent with the values actually set...
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://[name node server]:8020/data/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
# Set rollSize to 126 MB to be slightly less than the block size, 132120576
TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576
# Set rollCount to 0 so that it does not roll on number of events, but on size of sink file
TwitterAgent.sinks.HDFS.hdfs.rollCount = 20000
# Added rollInterval of 60 minutes (3600 seconds) to cap the time interval the data is in memory
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.appendTimeout = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000
TwitterAgent.sinks.HDFS.hdfs.threadsPoolSize = 100
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
# Increased transactionCapacity to 1000 from 100 to see if this solves the memory problem
TwitterAgent.channels.MemChannel.transactionCapacity = 10000
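A note on file sizes: the Flume HDFS sink closes (rolls) a file as soon as any one of rollInterval, rollSize or rollCount is reached, and a value of 0 disables that trigger. With rollCount = 20000 still set, files would likely roll after 20000 events, well before 126 MB. The roughly 80 MB files in the listing would fit that if events average about 4 KB each (81.45 MB / 20000 is about 4.2 KB), but that average tweet size is an assumption, not something measured here. A minimal sketch of the settings that match what the comments say was intended (roll on size, with the hourly interval as a cap):

```properties
# Sketch only: roll on size or on the hourly interval, never on event count.
# 0 disables count-based rolling in the HDFS sink.
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
# rollSize is in bytes: 126 * 1024 * 1024 = 132120576, just under the 128 MB block size
TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576
# rollInterval is in seconds: 3600 s = 60 minutes
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
```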
Created 02-15-2017 04:00 PM
Thank you. It turns out that, in our case, I had mistakenly started a second Flume instance, which somehow was colliding with the first. User error.
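For anyone who genuinely needs two agents writing into the same HDFS directory: while a file is open, the HDFS sink writes it under a name ending in hdfs.inUseSuffix (default .tmp) and renames it when the file is closed. Both instances here presumably ran the same configuration, which does not set hdfs.filePrefix, so both used the default FlumeData prefix in the same hourly directory. One way to keep them apart is a distinct prefix per agent. A sketch, where the prefix values and the second agent's configuration file are hypothetical:

```properties
# In the first agent's configuration file: replace the default "FlumeData" prefix
TwitterAgent.sinks.HDFS.hdfs.filePrefix = FlumeData-agent1

# In the second (hypothetical) agent's own configuration file: a different prefix,
# so the two agents cannot open files with the same name in the same directory
TwitterAgent.sinks.HDFS.hdfs.filePrefix = FlumeData-agent2
```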
Created 02-15-2017 03:52 PM