Are there any recommendations or best practices for using Anti-virus with Hadoop servers?
Labels: Apache Hadoop
Created ‎09-29-2015 03:55 PM
Created ‎09-29-2015 06:31 PM
The best practice is to avoid active anti-virus (AV) systems that monitor access to the underlying disk systems used for data and metadata storage by the following processes:
- Apache Hadoop
  - HDFS NameNode
  - HDFS DataNode
  - YARN ResourceManager
  - YARN NodeManager
- Apache Accumulo
- Apache Flume
- Apache HBase
- Apache Kafka
- Apache ZooKeeper
These processes store data structures only; nothing they store is executable by the underlying OS. Because they can be very active, potentially performing continuous writes against large files, best performance requires direct, unimpeded access to the underlying filesystem, and any AV system that traps filesystem calls will degrade Hadoop performance.
Some sites choose to run periodic AV scans (such as a weekly scan) on client, gateway, and "edge node" systems where users and developers connect and run local processes. These scans do not interfere with cluster performance, but they are important for safeguarding the edge-connected systems that are the main clients of the cluster.
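If a site must keep AV installed but can configure exclusions, the on-access exclusion list can be derived from the cluster's own storage settings. A minimal sketch: the property names below are standard Hadoop/YARN configuration keys, but the directory values are illustrative assumptions for one hypothetical site, not a definitive layout.

```python
# Build an AV on-access-scan exclusion list from Hadoop storage settings.
# Property names are real Hadoop/YARN configuration keys; the directory
# values are illustrative assumptions, not a recommended layout.

def av_exclusions(config):
    """Return a sorted, de-duplicated list of directories to exclude
    from on-access AV scanning, given comma-separated dir properties."""
    dirs = set()
    for value in config.values():
        for d in value.split(","):
            d = d.strip()
            if d:
                dirs.add(d)
    return sorted(dirs)

# Example values as they might appear in hdfs-site.xml / yarn-site.xml:
hadoop_dirs = {
    "dfs.namenode.name.dir": "/data/1/dfs/nn",
    "dfs.datanode.data.dir": "/data/1/dfs/dn,/data/2/dfs/dn",
    "yarn.nodemanager.local-dirs": "/data/1/yarn/local",
    "yarn.nodemanager.log-dirs": "/data/1/yarn/logs",
}

for path in av_exclusions(hadoop_dirs):
    print(path)
```

The resulting list would then be fed to whatever exclusion mechanism the site's AV product provides; the exact syntax varies by vendor.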
Created ‎10-02-2015 03:56 PM
Just a note that YARN may need to execute artifacts placed into its local cache on the NodeManagers; it is not purely data storage. This is why YARN-related directories cannot be mounted with the noexec option in /etc/fstab.
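A quick way to check for this on a NodeManager is to inspect the mount options of each YARN local directory's filesystem; if noexec is set, container localization will fail. A minimal sketch that parses /proc/mounts-format data (the device and mount point shown are hypothetical examples):

```python
# Check whether a mount point carries the noexec option, using the
# /proc/mounts line format: "device mountpoint fstype options dump pass".
# The sample text is hypothetical; on a real NodeManager you would read
# open("/proc/mounts").read() and check the mount point backing each
# yarn.nodemanager.local-dirs / log-dirs entry.

def has_noexec(mounts_text, mount_point):
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return "noexec" in fields[3].split(",")
    return False  # mount point not found in the table

sample = (
    "/dev/sdb1 /hadoop/yarn/local ext4 rw,noexec,relatime 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)

print(has_noexec(sample, "/hadoop/yarn/local"))  # noexec set: YARN localization would break here
print(has_noexec(sample, "/"))
```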
Created ‎10-02-2015 04:12 PM
Sometimes the requirement to run AV on the servers is unavoidable due to security policies that cannot be challenged. In that case, plan to add significantly more nodes, memory, and CPUs to achieve the same level of performance.
