Note: Cloudera does not support Antivirus software of any kind.

This article contains recommendations for excluding CDH components and directories from AV scans and monitoring. Note that not every recommendation applies to every service, and some services have additional, service-specific items to exclude. Those details will be addressed in individual articles dedicated to the service in question.

The three primary kinds of location you will want to exclude from antivirus scanning are:

  • Data directories: These can be very large, and therefore, take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
  • Log directories: These are write-heavy.
  • Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.

In general, consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/opt/cloudera
/etc/<component>
/var/lib/<component>

Runtime and Logging

/var/run/<component>
/var/log/<component>

Scratch and Temp

/var/tmp/<component>
/tmp/<component>

Note: The <component> placeholder refers not only to the service name; a given service may have multiple daemons, each with its own directories.

Example: cloudera-scm-server and cloudera-scm-agent.
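
The generic layout above can be expanded into a concrete candidate list with a short shell sketch. The function names candidate_exclusions and existing_only are hypothetical helpers, not part of CDH or Cloudera Manager, and the component names you pass in are your own choice.

```shell
#!/bin/sh
# Hypothetical helpers; not part of CDH or Cloudera Manager.

# Print /opt/cloudera plus the generic per-component paths listed above
# for every component name given as an argument.
candidate_exclusions() {
    printf '%s\n' /opt/cloudera
    for component in "$@"; do
        for base in /etc /var/lib /var/run /var/log /var/tmp /tmp; do
            printf '%s/%s\n' "$base" "$component"
        done
    done
}

# Read paths on stdin and print only those that exist as directories
# on this host, since not every component uses every location.
existing_only() {
    while IFS= read -r path; do
        if [ -d "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, running candidate_exclusions cloudera-scm-server cloudera-scm-agent | existing_only on a host would list only the candidate directories actually present there.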

Across CDH services, there are also many user-configurable locations. Most of these can be found in Cloudera Manager properties with names like service.scratch.dir and service.data.dir; go to CM > Service > Configuration and search for any property containing "dir". Every match is a candidate for exclusion. Instructions for specific services follow:

Cloudera Manager:

Note: Cloudera Manager has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of this database are set at installation; the database may be colocated with cloudera-scm-server or hosted remotely. Consult your database administrators for the path where the database files are stored.

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/opt/cloudera/cm
/opt/cloudera/cm-agent
/etc/cloudera-scm-agent
/etc/cloudera-scm-server
/var/lib/cloudera-host-monitor
/var/lib/cloudera-scm-agent
/var/lib/cloudera-scm-eventserver
/var/lib/cloudera-scm-server
/var/lib/cloudera-scm-server-db
/var/lib/cloudera-service-monitor

Runtime and Logging

/var/run/cloudera-scm-agent
/var/run/cloudera-scm-server

/var/log/cloudera-scm-agent
/var/log/cloudera-scm-alertpublisher
/var/log/cloudera-scm-eventserver
/var/log/cloudera-scm-firehose
/var/log/cloudera-scm-server

HDFS:

Note: The directories HDFS uses are user-configurable. I recommend you exclude these, especially the data directory for the DataNode and the metadata directories for the NameNode and JournalNode. These details can be found in the hdfs-site.xml file:

# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
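
The grep above prints raw XML. A minimal sketch of turning it into one path per line follows. It assumes each <name> and <value> element sits on a single line, as in typical generated Hadoop configuration files. The function name hadoop_dir_values is hypothetical. Some matching properties may hold URIs rather than local paths, so review the output before excluding anything.

```shell
#!/bin/sh
# Hypothetical helper: list the values of every property whose name
# contains "dir" in a Hadoop-style XML file, one entry per line.
# Assumes <name>...</name> and <value>...</value> each fit on one line.
hadoop_dir_values() {
    awk '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name ~ /dir/ {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            # Values may be comma-separated lists of directories.
            n = split(value, parts, ",")
            for (i = 1; i <= n; i++) {
                # Drop a leading file:// scheme so a plain path remains.
                sub(/^file:\/\//, "", parts[i])
                print name "\t" parts[i]
            }
            name = ""
        }
    ' "$1"
}
```

Calling hadoop_dir_values /etc/hadoop/conf/hdfs-site.xml prints the property name and each directory, separated by a tab.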

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/opt/cloudera

/var/lib/hadoop-hdfs

Runtime and Logging

/var/log/hadoop-hdfs

Scratch and Temp

/tmp/hadoop-hdfs

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

YARN:

Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > YARN > Configuration:

yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir

yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/opt/cloudera

/var/lib/hadoop-yarn

Runtime and Logging

/var/log/hadoop-yarn

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

MapReduce:

Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > MapReduce > Configuration, and this one in particular should be excluded:

mapreduce.jobhistory.recovery.store.leveldb.path

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/opt/cloudera

/var/lib/hadoop-mapreduce

Runtime and Logging

/var/log/hadoop-mapreduce

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

ZooKeeper:

Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:

# grep dataDir /etc/zookeeper/conf/zoo.cfg
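
To get just the path rather than the whole matching line, a small sketch like the following may help. The function name zk_data_dirs is hypothetical. It also reports dataLogDir, the standard ZooKeeper setting for a separate transaction log directory, which may be absent from your zoo.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print the dataDir (and dataLogDir, if set)
# values from a zoo.cfg-style key=value file, skipping comment lines.
zk_data_dirs() {
    awk -F= '
        /^[ \t]*#/ { next }
        {
            key = $1
            gsub(/[ \t]/, "", key)
            if (key == "dataDir" || key == "dataLogDir") {
                # Take everything after the first "=" and trim whitespace.
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
            }
        }
    ' "$1"
}
```

Running zk_data_dirs /etc/zookeeper/conf/zoo.cfg prints one directory per line, ready to add to the exclusion list.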

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/opt/cloudera

Runtime and Logging

/var/log/zookeeper

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
