
Note: Cloudera does not support antivirus software of any kind.

 

This article contains general recommendations for excluding HDFS components and directories from antivirus scans and monitoring.

 

The three primary locations to exclude from antivirus scanning are:

  1. Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
  2. Log directories: These are write-heavy.
  3. Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.

Note: HDFS directories are user-configurable, so the actual paths on your cluster may differ from the defaults. I recommend excluding the configured locations, especially the data directory for the DataNode and the metadata directories for the NameNode and JournalNode. The configured paths can be found in the “hdfs-site.xml” file:

# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
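The grep above prints the raw XML lines. As a rough sketch, the same lookup can be turned into a plain list of local paths. The function name list_hdfs_dirs is hypothetical, and the parsing assumes each <name> and <value> element sits on its own line, as in a typical generated hdfs-site.xml; it is not a full XML parser.

```shell
#!/bin/sh
# list_hdfs_dirs (hypothetical helper): print "property<TAB>path" for every
# property whose name contains "dir" in an hdfs-site.xml style file.
# Assumption: <name> and <value> are each on a single line of their own.
list_hdfs_dirs() {
    conf_file="${1:-/etc/hadoop/conf/hdfs-site.xml}"
    awk '
        # Remember the property name when it contains "dir".
        /<name>[^<]*dir[^<]*<\/name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
            want = 1
            next
        }
        # Any other property name cancels a pending match.
        /<name>/ { want = 0 }
        # The value may hold several comma-separated locations.
        want && /<value>/ {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            n = split(value, parts, ",")
            for (i = 1; i <= n; i++) {
                p = parts[i]
                gsub(/^[ \t]+|[ \t]+$/, "", p)  # trim surrounding whitespace
                sub(/^\[[A-Z_]+\]/, "", p)      # drop a storage-type tag such as [DISK]
                sub(/^file:\/\//, "", p)        # file:///path becomes /path
                if (p ~ /^\//) print name "\t" p  # keep local absolute paths only
            }
            want = 0
        }
    ' "$conf_file"
}
```

Calling list_hdfs_dirs /etc/hadoop/conf/hdfs-site.xml prints one property and path per line; non-local values such as qjournal:// URIs are skipped. Where the hdfs client is installed, hdfs getconf -confKey dfs.datanode.data.dir reports the effective value of a single property.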

 

 

 

Consider excluding the following directories and all of their subdirectories:

 

Installation, Configuration, and Libraries

/usr/hdp
/etc/hadoop
/var/lib/hadoop-hdfs

Runtime and Logging

/var/run/hadoop
/var/log/hadoop

Scratch and Temp

/tmp/hadoop-hdfs
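
Not every host carries every directory listed above, so it can help to check which ones exist before building an exclusion list. The following is a minimal sketch; the variable HDFS_EXCLUDE_CANDIDATES and the function existing_dirs are hypothetical names, and the output is a plain list of paths rather than the configuration format of any particular antivirus product.

```shell
#!/bin/sh
# Candidate exclusion paths, copied from the lists in this article.
# Add the paths configured in hdfs-site.xml if they differ on your cluster.
HDFS_EXCLUDE_CANDIDATES="
/usr/hdp
/etc/hadoop
/var/lib/hadoop-hdfs
/var/run/hadoop
/var/log/hadoop
/tmp/hadoop-hdfs
"

# existing_dirs (hypothetical helper): read one path per line on stdin and
# print only the paths that exist as directories on this host.
existing_dirs() {
    while IFS= read -r dir; do
        [ -n "$dir" ] || continue   # skip empty lines
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

Running printf '%s\n' "$HDFS_EXCLUDE_CANDIDATES" | existing_dirs on a node prints the subset of paths present there, which can then be entered into the antivirus exclusion settings together with their subdirectories.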

 

 

Note: HDFS, YARN, MapReduce, and ZooKeeper depend on one another, so excluding only the HDFS directories is unlikely to be enough. Apply equivalent exclusions for the other components as well, or you are likely to see the same performance impacts and failures.
