<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Flume HDFS Sink - File Roll Settings not Working in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-HDFS-Sink-File-Roll-Settings-not-Working/m-p/44217#M38411</link>
    <description>&lt;P&gt;Yes, thanks for the reply!&amp;nbsp; I figured out the same thing earlier today when I went back to the Flume User Guide and started copying and pasting the properties in again...&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I reviewed my config initially, I didn't look before the attribute name to even see that I was missing "hdfs".&amp;nbsp; Definitely an ID10T and PEBKAC error. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for keeping me honest!&lt;/P&gt;</description>
    <pubDate>Tue, 23 Aug 2016 00:43:02 GMT</pubDate>
    <dc:creator>tseader</dc:creator>
    <dc:date>2016-08-23T00:43:02Z</dc:date>
    <item>
      <title>Flume HDFS Sink - File Roll Settings not Working</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-HDFS-Sink-File-Roll-Settings-not-Working/m-p/44149#M38409</link>
      <description>&lt;P&gt;Problem: When ingesting Avro event data from Kafka, the HDFS sink keeps rolling files while they are still very small (hundreds of bytes), despite my Flume configuration. I believe I have set the roll properties correctly, and I'm at a bit of a loss.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Flume Config:&lt;/P&gt;&lt;PRE&gt;a1.channels = ch-1
a1.sources = src-1
a1.sinks = snk-1

a1.sources.src-1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.src-1.channels = ch-1
a1.sources.src-1.zookeeperConnect = &amp;lt;OMITTED&amp;gt;
a1.sources.src-1.topic = aTopic
a1.sources.src-1.groupID = aTopic

#Inject the Schema into the header so the AvroEventSerializer can pick it up
a1.sources.src-1.interceptors=i1
a1.sources.src-1.interceptors.i1.type = static
a1.sources.src-1.interceptors.i1.key=flume.avro.schema.url
a1.sources.src-1.interceptors.i1.value=hdfs://aNameService/data/schema/simpleSchema.avsc


a1.channels.ch-1.type = memory


a1.sinks.snk-1.type = hdfs
a1.sinks.snk-1.channel = ch-1
a1.sinks.snk-1.hdfs.path = /data/table
a1.sinks.snk-1.hdfs.filePrefix = events
a1.sinks.snk-1.hdfs.fileSuffix = .avro
a1.sinks.snk-1.hdfs.rollInterval = 0
#Expecting 100MB files before rolling
a1.sinks.snk-1.hdfs.rollSize = 100000000
a1.sinks.snk-1.rollCount = 0
a1.sinks.snk-1.hdfs.batchSize = 1000
a1.sinks.snk-1.hdfs.fileType = DataStream
a1.sinks.snk-1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder

&lt;/PRE&gt;&lt;P&gt;I'll also note that I tried other configuration settings that didn't help, and I omitted them from this config for clarity. I also saw that the resolution for some people was to check the replication factor, since that is a determining factor in the BucketWriter. I am receiving no errors in the logs relating to under-replication.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Lastly, I am executing this from the command line and not through Cloudera Manager.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for any help.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:35:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-HDFS-Sink-File-Roll-Settings-not-Working/m-p/44149#M38409</guid>
      <dc:creator>tseader</dc:creator>
      <dc:date>2022-09-16T10:35:46Z</dc:date>
    </item>
    <item>
      <title>Re: Flume HDFS Sink - File Roll Settings not Working</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-HDFS-Sink-File-Roll-Settings-not-Working/m-p/44205#M38410</link>
      <description>&lt;P&gt;This line is missing the hdfs prefix:&lt;/P&gt;&lt;PRE&gt;a1.sinks.snk-1.rollCount = 0&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It should be:&lt;/P&gt;&lt;PRE&gt;a1.sinks.snk-1.hdfs.rollCount = 0&lt;/PRE&gt;&lt;P&gt;Without the prefix, the sink ignores the setting and falls back to the default hdfs.rollCount of 10, so every file rolls after 10 events.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-pd&lt;/P&gt;</description>
      <pubDate>Mon, 22 Aug 2016 16:28:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-HDFS-Sink-File-Roll-Settings-not-Working/m-p/44205#M38410</guid>
      <dc:creator>pdvorak</dc:creator>
      <dc:date>2016-08-22T16:28:52Z</dc:date>
    </item>
    <item>
      <title>Re: Flume HDFS Sink - File Roll Settings not Working</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-HDFS-Sink-File-Roll-Settings-not-Working/m-p/44217#M38411</link>
      <description>&lt;P&gt;Yes, thanks for the reply!&amp;nbsp; I figured out the same thing earlier today when I went back to the Flume User Guide and started copying and pasting the properties in again...&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I reviewed my config initially, I didn't look before the attribute name to even see that I was missing "hdfs".&amp;nbsp; Definitely an ID10T and PEBKAC error. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for keeping me honest!&lt;/P&gt;</description>
      <pubDate>Tue, 23 Aug 2016 00:43:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Flume-HDFS-Sink-File-Roll-Settings-not-Working/m-p/44217#M38411</guid>
      <dc:creator>tseader</dc:creator>
      <dc:date>2016-08-23T00:43:02Z</dc:date>
    </item>
  </channel>
</rss>

