<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Flume is leaving .tmp files in place in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-is-leaving-tmp-files-Flume-leaving-tmp-files-in-place/m-p/167161#M53846</link>
    <description>&lt;P&gt;Thank you. It turns out that I had mistakenly started a second Flume instance, which was somehow colliding with the first. User error.&lt;/P&gt;</description>
    <pubDate>Thu, 16 Feb 2017 00:00:37 GMT</pubDate>
    <dc:creator>cthomas1</dc:creator>
    <dc:date>2017-02-16T00:00:37Z</dc:date>
    <item>
      <title>Flume is leaving .tmp files in place</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-is-leaving-tmp-files-Flume-leaving-tmp-files-in-place/m-p/167159#M53844</link>
      <description>&lt;P&gt;I have read several other threads on Flume .tmp files, but our configuration is different. Here is a snapshot of a few files: both the .tmp file and the final file are present, but with different sizes. I am unclear why the .tmp files remain and why they differ in size from the final files. Sometimes the .tmp file is larger, sometimes smaller.&lt;/P&gt;
&lt;TABLE&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TH&gt;Permission&lt;/TH&gt;&lt;TH&gt;Owner&lt;/TH&gt;&lt;TH&gt;Group&lt;/TH&gt;&lt;TH&gt;Size&lt;/TH&gt;&lt;TH&gt;Replication&lt;/TH&gt;&lt;TH&gt;Block Size&lt;/TH&gt;&lt;TH&gt;Name&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;-rw-r--r--&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;79.86 MB&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;128 MB&lt;/TD&gt;&lt;TD&gt;FlumeData.1486544400005&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;-rw-r--r--&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;81.45 MB&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;128 MB&lt;/TD&gt;&lt;TD&gt;FlumeData.1486544400005.tmp&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;-rw-r--r--&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;81.38 MB&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;128 MB&lt;/TD&gt;&lt;TD&gt;FlumeData.1486544400006&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;-rw-r--r--&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;hdfs&lt;/TD&gt;&lt;TD&gt;80.73 MB&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;128 MB&lt;/TD&gt;&lt;TD&gt;FlumeData.1486544400006.tmp&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P&gt;Configuration snippet: we tried a few different combinations and settled on this as a way to avoid timeouts and network constraints. I see that some of my notes (the comments below) are no longer consistent with the values actually set.&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.channel = MemChannel&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.type = hdfs&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.path = hdfs://[name node server]:8020/data/tweets/%Y/%m/%d/%H/&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000&lt;/P&gt;
&lt;P&gt;# Set rollsize to 126 MB to be slightly less than the block size, 132120576&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.rollSize = 132120576&lt;/P&gt;
&lt;P&gt;# Set rollcount to 0 so that it does not roll on number of events, but size of sink file&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.rollCount = 20000&lt;/P&gt;
&lt;P&gt;# Added rollInterval of 60 minutes (3600 seconds) to cap the time interval the data is in memory&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.appendTimeout = 10000&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000&lt;/P&gt;
&lt;P&gt;TwitterAgent.sinks.HDFS.hdfs.threadsPoolSize = 100&lt;/P&gt;
&lt;P&gt;TwitterAgent.channels.MemChannel.type = memory&lt;/P&gt;
&lt;P&gt;TwitterAgent.channels.MemChannel.capacity = 10000&lt;/P&gt;
&lt;P&gt;# Increased transactionCapacity to 1000 from 100 to see if this solves the memory problem&lt;/P&gt;
&lt;P&gt;TwitterAgent.channels.MemChannel.transactionCapacity = 10000&lt;/P&gt;</description>
      <pubDate>Thu, 09 Feb 2017 07:31:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-is-leaving-tmp-files-Flume-leaving-tmp-files-in-place/m-p/167159#M53844</guid>
      <dc:creator>cthomas1</dc:creator>
      <dc:date>2017-02-09T07:31:41Z</dc:date>
    </item>
    <item>
      <title>Re: Flume is leaving .tmp files in place</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-is-leaving-tmp-files-Flume-leaving-tmp-files-in-place/m-p/167160#M53845</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15924/cthomas1.html" nodeid="15924"&gt;@Cord thomas
&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Turn on debug logging and check the log file first
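&lt;/P&gt;&lt;P&gt;As a minimal sketch of what turning on debug logging could look like, assuming a stock Apache Flume 1.x install of that era that reads conf/log4j.properties (an assumption; managed distributions may expose the same setting through their management console instead), the root logger level can be raised from INFO to DEBUG:&lt;/P&gt;

```properties
# conf/log4j.properties (assumed location for a stock Flume 1.x install)
# The shipped default is typically INFO,LOGFILE; DEBUG makes the agent log more verbosely.
flume.root.logger=DEBUG,LOGFILE
```

&lt;P&gt;After restarting the agent, the log file should carry more detail about what the HDFS sink is doing, which is a reasonable place to start looking for the cause of leftover .tmp files.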
&lt;A rel="user" href="https://community.cloudera.com/users/15924/cthomas1.html" nodeid="15924"&gt;&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 23:52:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-is-leaving-tmp-files-Flume-leaving-tmp-files-in-place/m-p/167160#M53845</guid>
      <dc:creator>bluesmix</dc:creator>
      <dc:date>2017-02-15T23:52:01Z</dc:date>
    </item>
    <item>
      <title>Re: Flume is leaving .tmp files in place</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-is-leaving-tmp-files-Flume-leaving-tmp-files-in-place/m-p/167161#M53846</link>
      <description>&lt;P&gt;Thank you. It turns out that I had mistakenly started a second Flume instance, which was somehow colliding with the first. User error.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Feb 2017 00:00:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-is-leaving-tmp-files-Flume-leaving-tmp-files-in-place/m-p/167161#M53846</guid>
      <dc:creator>cthomas1</dc:creator>
      <dc:date>2017-02-16T00:00:37Z</dc:date>
    </item>
  </channel>
</rss>

