Member since 02-08-2017
3 Posts
0 Kudos Received
1 Solution
10-17-2017 03:57 PM
Hi @Raf Mohammed, did you come up with a solution to your question? I have a similar situation and wondered where you landed. Thank you.
02-15-2017 04:00 PM
Thank you. It turns out that, in our case, I had mistakenly started a second Flume instance, which was somehow colliding with the first. User error.
02-08-2017 11:31 PM
I have read several other threads on Flume .tmp files, but our configuration is different. Here is a snapshot of a few files. You'll see that both the .tmp file and the final file are there, but with different sizes. I am unclear why the .tmp files remain and why they differ in size from the final file. Sometimes the .tmp file is larger, sometimes smaller.

-rw-r--r-- hdfs hdfs 79.86 MB 3 128 MB FlumeData.1486544400005
-rw-r--r-- hdfs hdfs 81.45 MB 3 128 MB FlumeData.1486544400005.tmp
-rw-r--r-- hdfs hdfs 81.38 MB 3 128 MB FlumeData.1486544400006
-rw-r--r-- hdfs hdfs 80.73 MB 3 128 MB FlumeData.1486544400006.tmp

Configuration snippet - we tried a few different combinations and hit on this as a way to avoid timeouts and network constraints. I see that some of my notes are no longer consistent with the actual set values...

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://[name node serverr]:8020/data/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
# Set rollSize to 126 MB to be slightly less than the block size, 132120576
TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576
# Set rollCount to 0 so that it does not roll on number of events, but size of sink file
TwitterAgent.sinks.HDFS.hdfs.rollCount = 20000
# Added rollInterval of 60 minutes (3600 seconds) to cap out the time interval the data is in memory
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.appendTimeout = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000
TwitterAgent.sinks.HDFS.hdfs.threadsPoolSize = 100
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
# Increased transactionCapacity to 1000 from 100 to see if this solves the memory problem
TwitterAgent.channels.MemChannel.transactionCapacity = 10000
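
Since the notes and the set values in the snippet above have drifted apart, it may help to print what the roll settings actually amount to. The following is only a rough sketch in Python, not part of our Flume setup: the helper names `parse_properties` and `roll_summary` are made up for illustration, and it only looks at the three roll settings copied from the snippet. It relies on the documented HDFS sink behavior that a value of 0 disables the corresponding roll trigger.

```python
# Sketch: summarize which Flume HDFS sink roll triggers are active.
# The three settings below are copied from the posted configuration;
# the helper functions are hypothetical and exist only for this example.

CONFIG = """
# roll settings as actually set (not as described in the notes)
TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576
TwitterAgent.sinks.HDFS.hdfs.rollCount = 20000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def roll_summary(props, prefix="TwitterAgent.sinks.HDFS.hdfs."):
    """Convert the roll settings to readable units and list active triggers."""
    size = int(props[prefix + "rollSize"])          # bytes
    count = int(props[prefix + "rollCount"])        # events
    interval = int(props[prefix + "rollInterval"])  # seconds
    triggers = (("rollSize", size), ("rollCount", count), ("rollInterval", interval))
    return {
        "size_mib": size / (1024 * 1024),
        "count_events": count,
        "interval_minutes": interval / 60,
        # In the HDFS sink, 0 means "never roll on this criterion".
        "active_triggers": [name for name, value in triggers if value > 0],
    }


summary = roll_summary(parse_properties(CONFIG))
print(summary)
```

With the values as posted, all three triggers come out as active, so count-based rolling at 20000 events is enabled even though the note says it was meant to be 0, and the interval works out to 60 minutes.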
Labels: Apache Flume