<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Isolation between Flume Channels? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Isolation-between-Flume-Channels/m-p/32657#M8118</link>
    <description>&lt;P&gt;&lt;SPAN&gt;CDH5.2 installed with Cloudera Manager and Parcels&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Are Flume channels isolated from each other? It seems that when one channel has a problem, the other channel is affected.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to record and process Syslog data with Flume using two channel+sink pairs (the channels are replicating), as follows:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN&gt;Memory Channel + HDFS Sink (hdfschannel+hdfssink) to write raw Syslog records to HDFS&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Optional File Channel + Avro Sink (avrochannel+avrosink) to send the Syslog records to Spark Streaming for further processing. Since the processing can be reproduced from the raw data, the Avro channel is optional.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;When Spark Streaming is running, the above works well. Data is handled by both sinks correctly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, when the Spark Streaming job hung or stopped, the avrochannel had network-related exceptions and ChannelFullException. This is understandable because the events could not be sent. The problem was that the amount of raw data logged by hdfschannel+hdfssink dropped to around 1-2% of the normal volume.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is this expected? I don't understand why an error in an optional channel affects the others.&lt;BR /&gt;(Note: the use of File Channel is historical, but it does not seem to be the cause of this behaviour anyway?)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 09:42:44 GMT</pubDate>
    <dc:creator>athtsang</dc:creator>
    <dc:date>2022-09-16T09:42:44Z</dc:date>
    <item>
      <title>Isolation between Flume Channels?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Isolation-between-Flume-Channels/m-p/32657#M8118</link>
      <description>&lt;P&gt;&lt;SPAN&gt;CDH5.2 installed with Cloudera Manager and Parcels&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Are Flume channels isolated from each other? It seems that when one channel has a problem, the other channel is affected.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to record and process Syslog data with Flume using two channel+sink pairs (the channels are replicating), as follows:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN&gt;Memory Channel + HDFS Sink (hdfschannel+hdfssink) to write raw Syslog records to HDFS&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Optional File Channel + Avro Sink (avrochannel+avrosink) to send the Syslog records to Spark Streaming for further processing. Since the processing can be reproduced from the raw data, the Avro channel is optional.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;When Spark Streaming is running, the above works well. Data is handled by both sinks correctly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, when the Spark Streaming job hung or stopped, the avrochannel had network-related exceptions and ChannelFullException. This is understandable because the events could not be sent. The problem was that the amount of raw data logged by hdfschannel+hdfssink dropped to around 1-2% of the normal volume.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is this expected? I don't understand why an error in an optional channel affects the others.&lt;BR /&gt;(Note: the use of File Channel is historical, but it does not seem to be the cause of this behaviour anyway?)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:42:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Isolation-between-Flume-Channels/m-p/32657#M8118</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2022-09-16T09:42:44Z</dc:date>
    </item>
    <item>
      <title>Re: Isolation between Flume Channels?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Isolation-between-Flume-Channels/m-p/35867#M8119</link>
      <description>&lt;P&gt;Replying to myself: I worked around this with a Sink Group and a Null Sink.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Relevant settings in flume.conf:&lt;/P&gt;&lt;PRE&gt;a1.sinks = hdfssink avrosink nullsink
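
# For context, a hedged sketch of the replicating fan-out described in the question.
# The source name "syslogsrc" is hypothetical; the original post does not show the
# source section, so adapt the name to the actual source in your flume.conf.
# selector.optional marks avrochannel as optional, so a failed write to it is not
# treated as a failure of the whole put.
a1.sources.syslogsrc.channels = hdfschannel avrochannel
a1.sources.syslogsrc.selector.type = replicating
a1.sources.syslogsrc.selector.optional = avrochannel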

a1.sinkgroups = avrosinkgroup
a1.sinkgroups.avrosinkgroup.sinks = avrosink nullsink
a1.sinkgroups.avrosinkgroup.processor.type = failover
a1.sinkgroups.avrosinkgroup.processor.priority.avrosink = 100
a1.sinkgroups.avrosinkgroup.processor.priority.nullsink = 10
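# Optional tuning, not part of the original workaround (an assumption added here):
# the failover processor backs off from a failed sink before retrying it, and
# processor.maxpenalty caps that back-off in milliseconds (Flume's documented
# default is 30000). A lower cap, as sketched below, should make the group try
# avrosink again sooner once Spark Streaming is back; the value is illustrative only.
a1.sinkgroups.avrosinkgroup.processor.maxpenalty = 10000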

a1.sinks.nullsink.type = null
a1.sinks.nullsink.channel = avrochannel
a1.sinks.nullsink.batchSize = 10000&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;BR /&gt;The end result is that avrochannel normally uses the high-priority avrosink (priority=100). If this sink fails, the sink group fails over to the low-priority nullsink (priority=10), which simply discards the events.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PS:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN&gt;I have upgraded to CDH5.5.1, which bundles Flume 1.6.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;This works with the Spark Streaming "Flume-style Push-based Approach" (sink type=avro), but not with the "Pull-based Approach using a Custom Sink" (sink type=org.apache.spark.streaming.flume.sink.SparkSink). My guess is that the custom sink refuses to report a failure because of its fault-tolerance guarantees. Reference:&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://spark.apache.org/docs/latest/streaming-flume-integration.html" target="_blank"&gt;http://spark.apache.org/docs/latest/streaming-flume-integration.html&lt;/A&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Wed, 06 Jan 2016 03:27:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Isolation-between-Flume-Channels/m-p/35867#M8119</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2016-01-06T03:27:32Z</dc:date>
    </item>
  </channel>
</rss>

