Created on 06-01-2021 08:19 PM
Note: Cloudera does not support antivirus software of any kind.
This article contains recommendations for excluding HDP components and directories from antivirus (AV) scans and monitoring. Note that not every recommendation applies to every service, and some services have additional, service-specific items to exclude. Those details are addressed in individual articles dedicated to each service.
In general, consider excluding the following directories and all of their subdirectories from antivirus scanning:
/hadoop
/usr/hdp
/etc/hadoop
/etc/<component>
/var/lib/<component>
/var/run/<component>
/var/log/<component>
/var/tmp/<component>
/tmp/<component>
Note: <component> does not refer only to the service name; a given service may run multiple daemons, each with its own directories.
Example: Ambari runs both ambari-agent and ambari-server.
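As an illustration, the per-daemon directory set can be expanded from a list of daemon names (the names below are examples only; adjust them to the daemons actually installed on your hosts):

```shell
# Expand each daemon name into the standard HDP directory set, producing
# one candidate exclusion path per line for your AV tool's exclusion list.
# The daemon names here are examples only.
components="ambari-agent ambari-server"
for c in $components; do
  for base in /etc /var/lib /var/run /var/log /var/tmp /tmp; do
    echo "$base/$c"
  done
done
```

Not every generated path will exist on every host; that is harmless, since excluding a nonexistent path is a no-op for the AV scanner.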
Across HDP services there are also many user-configurable locations. Most of these appear in Ambari as properties with names like 'service.scratch.dir' and 'service.data.dir'; go to Ambari > Service > Configs > Advanced and search for any property whose name contains "dir". All such properties are candidates for exclusion. Instructions for specific services follow:
Note: Ambari has a special requirement in the form of a user-configurable database, which I recommend you exclude. The details of this database are set at installation; it may be colocated with ambari-server or reside on a remote host. Consult your database administrators for the path where the database files are stored; Ambari does not record this path anywhere in its configuration. To find out which database Ambari is using, search for JDBC entries in the ambari.properties file:
# grep 'jdbc' /etc/ambari-server/conf/ambari.properties
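The host portion of the JDBC URL identifies the machine holding the database files. As a sketch (the URL below is a made-up example of the kind of value the grep returns):

```shell
# Example only: a sample JDBC URL of the kind found in ambari.properties.
url="jdbc:postgresql://dbhost.example.com:5432/ambari"
# Pull out the host so you know which machine holds the database files
# (the on-disk path on that host must come from your DBA, not from Ambari).
host=$(echo "$url" | sed -E 's|jdbc:[^:]+://([^:/]+).*|\1|')
echo "$host"
```

The AV exclusions for the database's data directory then need to be applied on that host.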
Consider excluding the following directories and all of their subdirectories:
/usr/hdp
/usr/lib/ambari-agent
/usr/lib/ambari-server
/etc/hadoop
/etc/ambari-agent
/etc/ambari-server
/var/lib/ambari-agent
/var/lib/ambari-server
/var/run/ambari-agent
/var/run/ambari-server
/var/log/ambari-agent
/var/log/ambari-server
Note: The directories HDFS uses are user-configurable. I recommend you exclude them, especially the data directories for the DataNode and the metadata directories for the NameNode and JournalNode. These details can be found in the 'hdfs-site.xml' file:
# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
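As a sketch of what the grep surfaces, the sample file below stands in for /etc/hadoop/conf/hdfs-site.xml; the property names are the standard HDFS ones for NameNode metadata and DataNode block storage, and the values shown are typical defaults, not necessarily yours:

```shell
# Sample stand-in for /etc/hadoop/conf/hdfs-site.xml (values are typical
# HDP defaults, shown for illustration only).
cat > /tmp/hdfs-site-sample.xml <<'EOF'
<configuration>
  <property><name>dfs.namenode.name.dir</name><value>/hadoop/hdfs/namenode</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/hadoop/hdfs/data</value></property>
</configuration>
EOF
# Print each configured directory; these belong on the AV exclusion list.
sed -n 's|.*<value>\(.*\)</value>.*|\1|p' /tmp/hdfs-site-sample.xml
```

Run the same sed against your real hdfs-site.xml to list every configured directory value at once.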
Consider excluding the following directories and all of their subdirectories:
/usr/hdp
/etc/hadoop
/var/lib/hadoop-hdfs
/var/run/hadoop
/var/log/hadoop
/tmp/hadoop-hdfs
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced:
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir
yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path
Consider excluding the following directories and all of their subdirectories:
/usr/hdp
/etc/hadoop
/var/lib/hadoop-yarn
/var/run/hadoop-yarn
/var/log/hadoop-yarn
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
Note: Some MapReduce directories are user-configurable, and I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced; this one, in particular, should be excluded:
mapreduce.jobhistory.recovery.store.leveldb.path
Consider excluding the following directories and all of their subdirectories:
/usr/hdp
/etc/hadoop
/var/lib/hadoop-mapreduce
/var/run/hadoop-mapreduce
/var/log/hadoop-mapreduce
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:
# grep dataDir /etc/zookeeper/conf/zoo.cfg
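As a sketch of what that grep returns, the sample file below stands in for /etc/zookeeper/conf/zoo.cfg; the dataDir value shown is a typical HDP default, not necessarily yours:

```shell
# Sample stand-in for /etc/zookeeper/conf/zoo.cfg (illustrative values).
cat > /tmp/zoo-sample.cfg <<'EOF'
tickTime=2000
dataDir=/hadoop/zookeeper
clientPort=2181
EOF
# Extract the dataDir value; this directory should be on the exclusion list.
awk -F= '$1 == "dataDir" { print $2 }' /tmp/zoo-sample.cfg
```

The awk form is slightly more precise than a plain grep, since it returns only the value rather than the whole key=value line.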
Consider excluding the following directories and all of their subdirectories:
/usr/hdp
/etc/hadoop
/var/run/zookeeper
/var/log/zookeeper
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
Created on 06-25-2021 01:14 AM
McAfee also flags launch_container.sh as a vulnerability; this can be ignored, as launching containers is part of the YARN architecture.
Please go through the points below to understand more about YARN:
1. A client/user submits a job request.
2. The ResourceManager checks the input splits and throws an error if they cannot be computed.
3. On successful computation, the ResourceManager invokes the job-submission procedure.
4. It then finds a NodeManager on which to launch the ApplicationMaster.
5. The ApplicationMaster process checks the input splits and creates the mapper and reducer tasks (task IDs are assigned at this point).
6. The ApplicationMaster computes the resources required and requests them from the ResourceManager in the form of containers, each with the memory and CPU the ApplicationMaster specifies.
7. On receiving the request, the NodeManager uses "launch_container.sh", along with a few other job-related inputs, to launch a container for task execution.
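One quick sanity check is that genuine container-launch scripts always sit under YARN's usercache/appcache layout beneath the NodeManager local directories. A minimal sketch (the flagged path, user name, and application/container IDs below are hypothetical):

```shell
# Hypothetical flagged path; real ones sit under yarn.nodemanager.local-dirs.
flagged="/hadoop/yarn/local/usercache/hive/appcache/application_1600000000000_0001/container_1600000000000_0001_01_000002/launch_container.sh"
# YARN writes launch_container.sh under <local-dir>/usercache/<user>/appcache/
# <application-id>/<container-id>/, so a match here is expected behavior.
case "$flagged" in
  */usercache/*/appcache/application_*/container_*/launch_container.sh)
    echo "expected YARN container launch script" ;;
  *)
    echo "unexpected location - investigate" ;;
esac
```

A launch_container.sh found outside this layout would be worth investigating; one inside it is routine YARN activity.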
This is expected behavior, and the alert should be treated as a false positive.