<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Are there any recommendations or best practices for using Anti-virus with Hadoop servers? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94498#M7815</link>
    <description>&lt;P&gt;The best practice is to avoid active Anti-Virus (AV) systems that monitor access to the underlying disk systems used for metadata storage by the following processes:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Apache Hadoop&lt;UL&gt;&lt;LI&gt;HDFS NameNode&lt;/LI&gt;&lt;LI&gt;HDFS DataNode&lt;/LI&gt;&lt;LI&gt;YARN ResourceManager&lt;/LI&gt;&lt;LI&gt;YARN NodeManager&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Apache Accumulo&lt;/LI&gt;&lt;LI&gt;Apache Flume&lt;/LI&gt;&lt;LI&gt;Apache HBase&lt;/LI&gt;&lt;LI&gt;Apache Kafka&lt;/LI&gt;&lt;LI&gt;Apache ZooKeeper&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;These processes store data structures only; nothing they store is executable by the underlying OS. Because these processes can be very active, potentially performing continuous writes against large files, the best performance requires direct, unimpeded access to the underlying filesystem, and any AV system that traps filesystem calls will have a negative impact on Hadoop system performance.&lt;/P&gt;&lt;P&gt;Some sites choose to implement AV "scans" that run periodically (such as a weekly scan) on client, gateway, and "edge node" systems where users &amp;amp; developers connect and run local processes. These scans do not interfere with cluster performance, but they are important for safeguarding the edge-connected systems that are the main clients of the cluster.&lt;/P&gt;</description>
    <pubDate>Wed, 30 Sep 2015 01:31:36 GMT</pubDate>
    <dc:creator>dkaiser</dc:creator>
    <dc:date>2015-09-30T01:31:36Z</dc:date>
    <item>
      <title>Are there any recommendations or best practices for using Anti-virus with Hadoop servers?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94497#M7814</link>
      <description />
      <pubDate>Tue, 29 Sep 2015 22:55:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94497#M7814</guid>
      <dc:creator>cspencer</dc:creator>
      <dc:date>2015-09-29T22:55:41Z</dc:date>
    </item>
    <item>
      <title>Re: Are there any recommendations or best practices for using Anti-virus with Hadoop servers?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94498#M7815</link>
      <description>&lt;P&gt;The best practice is to avoid active Anti-Virus (AV) systems that monitor access to the underlying disk systems used for metadata storage by the following processes:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Apache Hadoop&lt;UL&gt;&lt;LI&gt;HDFS NameNode&lt;/LI&gt;&lt;LI&gt;HDFS DataNode&lt;/LI&gt;&lt;LI&gt;YARN ResourceManager&lt;/LI&gt;&lt;LI&gt;YARN NodeManager&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Apache Accumulo&lt;/LI&gt;&lt;LI&gt;Apache Flume&lt;/LI&gt;&lt;LI&gt;Apache HBase&lt;/LI&gt;&lt;LI&gt;Apache Kafka&lt;/LI&gt;&lt;LI&gt;Apache ZooKeeper&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;These processes store data structures only; nothing they store is executable by the underlying OS. Because these processes can be very active, potentially performing continuous writes against large files, the best performance requires direct, unimpeded access to the underlying filesystem, and any AV system that traps filesystem calls will have a negative impact on Hadoop system performance.&lt;/P&gt;&lt;P&gt;Some sites choose to implement AV "scans" that run periodically (such as a weekly scan) on client, gateway, and "edge node" systems where users &amp;amp; developers connect and run local processes. These scans do not interfere with cluster performance, but they are important for safeguarding the edge-connected systems that are the main clients of the cluster.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2015 01:31:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94498#M7815</guid>
      <dc:creator>dkaiser</dc:creator>
      <dc:date>2015-09-30T01:31:36Z</dc:date>
    </item>
    <item>
      <title>Re: Are there any recommendations or best practices for using Anti-virus with Hadoop servers?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94499#M7816</link>
      <description>&lt;P&gt;Just a note that YARN may need to execute things that are placed into its local cache on the NodeManagers (NMs); it's not purely data storage. This is why you can't have YARN-related directories mounted as noexec in /etc/fstab...&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2015 22:56:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94499#M7816</guid>
      <dc:creator>jniemiec</dc:creator>
      <dc:date>2015-10-02T22:56:59Z</dc:date>
    </item>
    <item>
      <title>Re: Are there any recommendations or best practices for using Anti-virus with Hadoop servers?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94500#M7817</link>
      <description>&lt;P&gt;Sometimes the requirement to have AV on the servers is unavoidable because of security policies that cannot be challenged. In that event, prepare to add significantly more nodes, memory, and CPUs to get the same level of performance.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Oct 2015 23:12:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Are-there-any-recommendations-or-best-practices-for-using/m-p/94500#M7817</guid>
      <dc:creator>drussell</dc:creator>
      <dc:date>2015-10-02T23:12:28Z</dc:date>
    </item>
  </channel>
</rss>

