<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Best Practice for Flume placement - data nodes vs dedicated nodes in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-Practice-for-Flume-placement-data-nodes-vs-dedicated/m-p/166699#M25022</link>
    <description>&lt;P&gt;We have 4 apps running Flume, and we are experiencing performance issues and running out of file descriptors. The 4 apps run 4 instances each across 16 data nodes. Their approximate volumes are:&lt;/P&gt;&lt;P&gt;App A - 60 GB per month&lt;/P&gt;&lt;P&gt;App B - 150 KB per month&lt;/P&gt;&lt;P&gt;App C - 54 GB per day&lt;/P&gt;&lt;P&gt;App D - 330 GB per day&lt;/P&gt;&lt;P&gt;We have been advised to move these onto dedicated hosts (4 hosts, each running 1 agent per app, i.e. 4 agents per host). My questions are:&lt;/P&gt;&lt;P&gt;1. Is this a best practice for placement of Flume agents?&lt;/P&gt;&lt;P&gt;2. Will this cause downsides for data locality of the HDFS files that are written out?&lt;/P&gt;</description>
    <pubDate>Wed, 13 Apr 2016 04:06:44 GMT</pubDate>
    <dc:creator>Jim_B</dc:creator>
    <dc:date>2016-04-13T04:06:44Z</dc:date>
    <item>
      <title>Best Practice for Flume placement - data nodes vs dedicated nodes</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-Practice-for-Flume-placement-data-nodes-vs-dedicated/m-p/166699#M25022</link>
      <description>&lt;P&gt;We have 4 apps running Flume, and we are experiencing performance issues and running out of file descriptors. The 4 apps run 4 instances each across 16 data nodes. Their approximate volumes are:&lt;/P&gt;&lt;P&gt;App A - 60 GB per month&lt;/P&gt;&lt;P&gt;App B - 150 KB per month&lt;/P&gt;&lt;P&gt;App C - 54 GB per day&lt;/P&gt;&lt;P&gt;App D - 330 GB per day&lt;/P&gt;&lt;P&gt;We have been advised to move these onto dedicated hosts (4 hosts, each running 1 agent per app, i.e. 4 agents per host). My questions are:&lt;/P&gt;&lt;P&gt;1. Is this a best practice for placement of Flume agents?&lt;/P&gt;&lt;P&gt;2. Will this cause downsides for data locality of the HDFS files that are written out?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Apr 2016 04:06:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-Practice-for-Flume-placement-data-nodes-vs-dedicated/m-p/166699#M25022</guid>
      <dc:creator>Jim_B</dc:creator>
      <dc:date>2016-04-13T04:06:44Z</dc:date>
    </item>
    <item>
      <title>Re: Best Practice for Flume placement - data nodes vs dedicated nodes</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-Practice-for-Flume-placement-data-nodes-vs-dedicated/m-p/166700#M25023</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2834/jbarnett.html" nodeid="2834"&gt;@jbarnett&lt;/A&gt;, (1) Yes, putting Flume on dedicated nodes is definitely the way to go. Both your Flume apps and your data nodes will benefit from it, and you can scale Flume independently of the rest of the cluster. (2) Again, yes, there is a downside regarding HDFS locality, but it is small compared to the gains from (1), and it only concerns HDFS sinks. Once you start using, for example, Kafka, you will have Kafka sinks and no concerns of that kind.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Apr 2016 05:24:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Best-Practice-for-Flume-placement-data-nodes-vs-dedicated/m-p/166700#M25023</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-04-13T05:24:11Z</dc:date>
    </item>
  </channel>
</rss>

