Note: Cloudera does not support antivirus software of any kind.

This article contains recommendations for excluding HDP components and directories from AV scans and monitoring. Not every recommendation applies to every service, and some services have additional items to exclude that are unique to them. Those details are addressed in individual articles dedicated to the service in question.

The three primary locations you will want to exclude from antivirus scanning are:

  • Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
  • Log directories: These are write-heavy.
  • Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.

In general, consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/hadoop
/usr/hdp
/etc/hadoop
/etc/<component>
/var/lib/<component>

Runtime and Logging

/var/run/<component>
/var/log/<component>

Scratch and Temp

/var/tmp/<component>
/tmp/<component>

Note: <component> does not refer only to the service name; a given service may have multiple daemons, each with its own directories.

Example: ambari-agent and ambari-server.
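
As a rough illustration of the note above, the following sketch expands a list of daemon names into candidate paths under the per-component parent directories listed earlier, and prints only the ones that exist on the host. The function name candidate_exclusions and the EXCLUSION_ROOT variable are hypothetical, introduced here for illustration only. Treat the output as a starting point for review rather than a definitive exclusion list.

```shell
#!/bin/sh
# Sketch: print candidate exclusion paths for the given daemon names.
# EXCLUSION_ROOT is a hypothetical prefix, empty by default, that allows
# a dry run against a staging tree instead of the live filesystem.
candidate_exclusions() {
    root="${EXCLUSION_ROOT:-}"
    for component in "$@"; do
        # Parent directories taken from the general list in this article.
        for parent in /etc /var/lib /var/run /var/log /var/tmp /tmp; do
            path="$root$parent/$component"
            # Report only directories that actually exist on this host.
            if [ -d "$path" ]; then
                printf '%s\n' "$path"
            fi
        done
    done
}

# Example call (component names are supplied by the caller):
#   candidate_exclusions ambari-agent ambari-server
```

Passing each daemon name explicitly keeps the caller in control of what counts as a component, which matters because one service can own several directory names.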

Across HDP services there are also many user-configurable locations. Most of these can be found in Ambari properties with names like 'service.scratch.dir' and 'service.data.dir'; go to Ambari > Service > Configs > Advanced and search for any property containing "dir", all of which may be considered for exclusion. Instructions for specific services follow:

 

Ambari:

Note: Ambari has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of this database are set at installation; the database may be colocated with ambari-server or on a remote host. Consult your database administrators for the path where the database files are stored; Ambari does not keep this information anywhere in its configuration. If you need details about which database Ambari is using, search for JDBC in the ambari.properties file.

# grep 'jdbc' /etc/ambari-server/conf/ambari.properties
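
To help judge whether the database is colocated or remote, the sketch below pulls the host out of any JDBC URL it finds in a properties file. It assumes the URL has the shape jdbc:<vendor>://<host>[:port]/<db> and is written without escaped colons; URLs in other shapes (for example Oracle thin-driver URLs) will not match. The function name jdbc_hosts is hypothetical.

```shell
#!/bin/sh
# Sketch: print the host part of every JDBC URL of the form
#   jdbc:<vendor>://<host>[:port]/<db>
# that appears as a value in a key=value properties file.
# Lines where a '#' precedes the '=' are skipped as comments.
jdbc_hosts() {
    sed -n 's|^[^#]*=[[:space:]]*jdbc:[^:]*://\([^:/]*\).*|\1|p' "$1"
}

# Example call:
#   jdbc_hosts /etc/ambari-server/conf/ambari.properties
```

A result of localhost or the server's own hostname suggests the database is colocated with ambari-server. No output only means no URL of this shape was found, which is inconclusive on its own.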

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/usr/hdp
/usr/lib/ambari-agent
/usr/lib/ambari-server

/etc/hadoop
/etc/ambari-agent
/etc/ambari-server

/var/lib/ambari-agent
/var/lib/ambari-server

Runtime and Logging

/var/run/ambari-agent
/var/run/ambari-server

/var/log/ambari-agent
/var/log/ambari-server

HDFS:

Note: The directories HDFS uses are user-configurable. I recommend you exclude them, especially the data directories for the DataNode and the metadata directories for the NameNode and JournalNode. These details can be found in the 'hdfs-site.xml' file:

# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
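
The grep output above still has to be read by hand. As a sketch of one way to turn it into a plain list of local paths, the function below prints each property whose name contains "dir" next to each local path in its value. It assumes the <name> and <value> elements each sit on a single line, as in a typical generated hdfs-site.xml; it is not a general XML parser. The function name hdfs_dir_values is hypothetical.

```shell
#!/bin/sh
# Sketch: list "<property name><TAB><local path>" for every property in a
# Hadoop *-site.xml file whose name contains "dir".
# Assumes <name>...</name> and <value>...</value> are each on one line.
hdfs_dir_values() {
    awk '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name ~ /dir/ {
            value = $0
            sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
            # A value may hold several comma-separated directories.
            n = split(value, parts, ",")
            for (i = 1; i <= n; i++) {
                p = parts[i]
                sub(/^\[[A-Z_]+\]/, "", p)   # drop a storage-type tag such as [SSD]
                sub(/^file:\/\//, "", p)     # drop a file:// URI prefix
                # Keep absolute local paths only; other URIs are skipped.
                if (p ~ /^\//) print name "\t" p
            }
            name = ""
        }
    ' "$1"
}

# Example call:
#   hdfs_dir_values /etc/hadoop/conf/hdfs-site.xml
```

Values that are not local paths, such as a qjournal:// URI, are skipped on purpose, since they do not name a directory on the local host.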

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/usr/hdp

/etc/hadoop

/var/lib/hadoop-hdfs

Runtime and Logging

/var/run/hadoop

/var/log/hadoop

Scratch and Temp

/tmp/hadoop-hdfs

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

YARN:

Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced:

yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir

yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/usr/hdp

/etc/hadoop

/var/lib/hadoop-yarn

Runtime and Logging

/var/run/hadoop-yarn

/var/log/hadoop-yarn

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

MapReduce:

Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced and this one, in particular, should be excluded:

mapreduce.jobhistory.recovery.store.leveldb.path

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/usr/hdp

/etc/hadoop

/var/lib/hadoop-mapreduce

Runtime and Logging

/var/run/hadoop-mapreduce

/var/log/hadoop-mapreduce

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

ZooKeeper:

Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:

# grep dataDir /etc/zookeeper/conf/zoo.cfg
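
If only the path itself is needed, for example to paste into an exclusion list, a small sketch like the following prints just the value. It also reports dataLogDir, an optional ZooKeeper setting for a separate transaction log directory that this article does not mention; including it here is an assumption about what you may want to exclude. The function name zk_data_dirs is hypothetical.

```shell
#!/bin/sh
# Sketch: print the directories named by dataDir and, if present,
# dataLogDir in a zoo.cfg-style key=value file.
zk_data_dirs() {
    awk -F= '{
        key = $1; val = $2
        # Trim surrounding spaces and tabs from the key and the value.
        gsub(/^[ \t]+|[ \t]+$/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (key == "dataDir" || key == "dataLogDir") print val
    }' "$1"
}

# Example call:
#   zk_data_dirs /etc/zookeeper/conf/zoo.cfg
```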

Consider excluding the following directories and all of their subdirectories:

Installation, Configuration, and Libraries

/usr/hdp

/etc/hadoop

Runtime and Logging

/var/run/zookeeper

/var/log/zookeeper

Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

Comments
Cloudera Employee

McAfee also flags launch_container.sh as a vulnerability. This can be ignored, because launching containers is part of the YARN architecture.

The following steps outline how YARN runs a job:

1. A client or user submits a job request.

2. The ResourceManager checks the input splits and throws an error if they cannot be computed.

3. On successful computation, the ResourceManager calls the job submission procedure.

4. It then finds a NodeManager on which it can launch the ApplicationMaster.

5. The ApplicationMaster process checks the input splits and creates the mapper and reducer tasks (task IDs are assigned at this point).

6. The ApplicationMaster does the computation and, based on that, requests resources in the form of containers, each with the memory and CPU the ApplicationMaster requested.

7. On receiving the request, the NodeManager uses "launch_container.sh", along with a few other job-related inputs, to launch a container for task execution.

This is expected behavior, and the finding should be treated as a false positive.