Created on 06-01-2021 08:02 PM - edited 06-02-2021 05:33 AM
Note: Cloudera does not support Antivirus software of any kind.
This article contains recommendations for excluding CDP components and directories from AV scans and monitoring. Note that not every recommendation applies to every service, and some services have additional, service-specific items to exclude. Those details are covered in individual articles dedicated to each service.
The three primary locations you will want to exclude from Antivirus are:
In general, consider excluding the following directories and all of their subdirectories:
/opt/cloudera
/etc/<component>
/var/lib/<component>
/var/run/<component>
/var/log/<component>
/var/tmp/<component>
/tmp/<component>
Note: <component> refers not only to the service name; a given service may have multiple daemons, each with its own directories.
Example: cloudera-scm-agent and cloudera-scm-server
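The general pattern above can be expanded per daemon with a small script. This is only a sketch: candidate_dirs and existing_dirs are hypothetical helper names, not Cloudera tooling, and the six prefixes are simply the <component> entries from the list above (/opt/cloudera is not component-specific, so it is left out).

```shell
#!/bin/sh
# Hypothetical helper: print the candidate exclusion paths for each
# daemon name passed as an argument, using the prefixes from the list above.
candidate_dirs() {
    for component in "$@"; do
        for prefix in /etc /var/lib /var/run /var/log /var/tmp /tmp; do
            printf '%s/%s\n' "$prefix" "$component"
        done
    done
}

# Hypothetical helper: read paths on stdin and keep only those that
# exist as directories on this host.
existing_dirs() {
    while IFS= read -r path; do
        if [ -d "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}
```

For example, `candidate_dirs cloudera-scm-agent cloudera-scm-server | existing_dirs` lists only the directories actually present on the host, which can then be reviewed before being added to the AV exclusion list.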
Across CDP services, there are also many user-configurable locations. Most of these can be found in Cloudera Manager properties with names like "service.scratch.dir" and "service.data.dir"; go to Cloudera Manager > Service > Configuration and search for any property containing "dir", all of which may be considered for exclusion. Instructions for specific services follow:
Note: Cloudera Manager has a special requirement in the form of a user-configurable database. I recommend that you exclude this database. However, its details are set at installation time; the database may be co-located with cloudera-scm-server or hosted remotely. Consult your database administrators for the path where the database files are stored.
Consider excluding the following directories and all of their subdirectories:
/opt/cloudera/cm
/opt/cloudera/cm-agent
/etc/cloudera-scm-agent
/etc/cloudera-scm-server
/var/lib/cloudera-host-monitor
/var/lib/cloudera-scm-agent
/var/lib/cloudera-scm-eventserver
/var/lib/cloudera-scm-server
/var/lib/cloudera-scm-server-db
/var/lib/cloudera-service-monitor
/var/run/cloudera-scm-agent
/var/run/cloudera-scm-server
/var/run/cloudera-scm-server-db
/var/log/cloudera-scm-agent
/var/log/cloudera-scm-alertpublisher
/var/log/cloudera-scm-eventserver
/var/log/cloudera-scm-firehose
/var/log/cloudera-scm-server
Note: The directories in HDFS are user-configurable. I recommend you exclude these, especially the data directory for the DataNode and the meta directories for the NameNode and JournalNode. These details can be found in the hdfs-site.xml file:
# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
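The grep above prints each matching line together with the line that follows it. As a rough alternative, the sketch below pairs each property name containing "dir" with its value. list_dir_properties is a hypothetical helper, and it assumes the usual Hadoop layout in which each <name> and <value> element sits on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print "name<TAB>value" for every property whose
# name contains "dir" in a Hadoop-style XML configuration file.
# Assumption: each <name>...</name> and <value>...</value> fits on one line.
list_dir_properties() {
    awk '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name ~ /dir/ {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            printf "%s\t%s\n", name, value
            name = ""
        }
    ' "$1"
}
```

Running `list_dir_properties /etc/hadoop/conf/hdfs-site.xml` gives one line per directory property. A value may itself be a comma-separated list of paths, so review each entry before adding it to the exclusions.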
Consider excluding the following directories and all of their subdirectories:
/opt/cloudera
/var/lib/hadoop-hdfs
/var/log/hadoop-hdfs
/tmp/hadoop-hdfs
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > YARN > Configuration:
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir
yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path
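Properties such as yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs can hold a comma-separated list of directories, and each entry needs its own exclusion. A minimal sketch for splitting a value copied out of Cloudera Manager follows; split_dirs is a hypothetical helper and the paths in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: split a comma-separated property value into one
# path per line, trimming surrounding whitespace and dropping empty entries.
split_dirs() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

For example, `split_dirs '/data/1/yarn/nm, /data/2/yarn/nm'` prints the two placeholder paths on separate lines, ready to be added to an exclusion list.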
Consider excluding the following directories and all of their subdirectories:
/opt/cloudera
/var/lib/hadoop-yarn
/var/log/hadoop-yarn
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > MapReduce > Configs, and this one, in particular, should be excluded:
mapreduce.jobhistory.recovery.store.leveldb.path
Consider excluding the following directories and all of their subdirectories:
/opt/cloudera
/var/lib/hadoop-mapreduce
/var/log/hadoop-mapreduce
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:
# grep dataDir /etc/zookeeper/conf/zoo.cfg
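The grep above prints the whole matching line. If only the path is needed, for instance to feed it into an exclusion list, something like the following could be used. zk_data_dir is a hypothetical helper; it assumes the key=value form used in zoo.cfg and ignores commented-out lines.

```shell
#!/bin/sh
# Hypothetical helper: print only the value of dataDir from a
# zoo.cfg-style file. Lines beginning with "#" do not match the pattern.
zk_data_dir() {
    sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*//p' "$1"
}
```

Usage: `zk_data_dir /etc/zookeeper/conf/zoo.cfg`.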
Consider excluding the following directories and all of their subdirectories:
/opt/cloudera
/var/log/zookeeper
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.