Member since: 10-01-2018
Posts: 308
Kudos Received: 7
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1999 | 11-27-2024 12:50 PM |
| | 5818 | 09-28-2020 08:05 AM |
| | 4229 | 04-16-2020 09:20 AM |
| | 2512 | 04-16-2020 08:48 AM |
| | 6835 | 04-16-2020 08:10 AM |
05-11-2020
09:17 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding ZooKeeper components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:
# grep dataDir /etc/zookeeper/conf/zoo.cfg
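As a minimal sketch of turning that lookup into something scriptable, the function below prints only the configured paths. It also reads `dataLogDir`, the standard ZooKeeper setting for a separate transaction-log directory; that setting is not mentioned above and may not be set on your cluster. The function name `zk_dirs` is hypothetical.

```shell
#!/bin/sh
# Print the directories ZooKeeper is configured to write to.
# Usage: zk_dirs [path-to-zoo.cfg]   (defaults to the path used above)
zk_dirs() {
    cfg="${1:-/etc/zookeeper/conf/zoo.cfg}"
    # Match uncommented "dataDir=" and "dataLogDir=" lines and print
    # only the value that follows the "=".
    sed -n -E 's/^[[:space:]]*(dataDir|dataLogDir)[[:space:]]*=[[:space:]]*//p' "$cfg"
}
```

Running `zk_dirs` with no argument reads the default `zoo.cfg`; each printed line is a candidate for the exclusion list.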
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
Runtime and Logging
/var/run/zookeeper
/var/log/zookeeper
Note: HDFS, YARN, MapReduce and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020
09:17 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding MapReduce components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced, and this one in particular should be excluded:
mapreduce.jobhistory.recovery.store.leveldb.path
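If you prefer the command line to Ambari, the value can usually be read from the client configuration on the host. The sketch below assumes the property lives in `/etc/hadoop/conf/mapred-site.xml` (an assumption by analogy with the other Hadoop config files, not something stated above) and that the file uses the usual `<name>`/`<value>` layout. The function name `hadoop_prop` is hypothetical.

```shell
#!/bin/sh
# Print the <value> of one property from a Hadoop-style XML config file.
# Usage: hadoop_prop PROPERTY FILE
hadoop_prop() {
    prop="$1"
    file="$2"
    awk -v p="$prop" '
        # Remember that we have seen the property name we are looking for.
        index($0, "<name>" p "</name>") { found = 1 }
        # The first <value> element at or after that point belongs to it.
        found && match($0, /<value>[^<]*<\/value>/) {
            # Strip the 7-character "<value>" and 8-character "</value>".
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$file"
}
```

For example, `hadoop_prop mapreduce.jobhistory.recovery.store.leveldb.path /etc/hadoop/conf/mapred-site.xml` prints the path if the property is set there, and nothing if it is not.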
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-mapreduce
Runtime and Logging
/var/run/hadoop-mapreduce
/var/log/hadoop-mapreduce
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020
09:17 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding YARN components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced:
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir
yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path
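The same values can be collected on a host without opening Ambari. This is only a sketch: it assumes the properties are present in `/etc/hadoop/conf/yarn-site.xml` (a path not given above) and that the file uses the usual `<name>`/`<value>` layout. Values such as `yarn.nodemanager.local-dirs` can hold comma-separated lists, so the output is split to one path per line. The function name `yarn_dirs` is hypothetical.

```shell
#!/bin/sh
# Print every directory configured in the YARN properties listed above,
# one path per line, de-duplicated.
# Usage: yarn_dirs [path-to-yarn-site.xml]
yarn_dirs() {
    file="${1:-/etc/hadoop/conf/yarn-site.xml}"
    for prop in \
        yarn.nodemanager.local-dirs \
        yarn.nodemanager.log-dirs \
        yarn.nodemanager.recovery.dir \
        yarn.timeline-service.leveldb-state-store.path \
        yarn.timeline-service.leveldb-timeline-store.path
    do
        awk -v p="$prop" '
            index($0, "<name>" p "</name>") { found = 1 }
            found && match($0, /<value>[^<]*<\/value>/) {
                # Strip the surrounding <value> and </value> tags.
                print substr($0, RSTART + 7, RLENGTH - 15)
                exit
            }
        ' "$file"
    done | tr ',' '\n' | sed '/^$/d' | LC_ALL=C sort -u
}
```

Properties that are not set in the file are skipped, so an empty result means none of the five were found.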
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-yarn
Runtime and Logging
/var/run/hadoop-yarn
/var/log/hadoop-yarn
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020
09:16 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding HDFS components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: The directories in HDFS are user-configurable. I recommend you exclude these, especially the data directory for the DataNode and the meta directories for the NameNode and JournalNode. These details can be found in the “hdfs-site.xml” file:
# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
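The `grep -A1` output still needs to be read by eye. As a rough sketch, the function below pairs each property whose name contains "dir" with its value and prints `name=value` lines. It assumes the usual `<name>`/`<value>` layout of `hdfs-site.xml`; the function name `hdfs_dir_props` is hypothetical.

```shell
#!/bin/sh
# Print name=value for every property in hdfs-site.xml whose name
# contains "dir".
# Usage: hdfs_dir_props [path-to-hdfs-site.xml]
hdfs_dir_props() {
    file="${1:-/etc/hadoop/conf/hdfs-site.xml}"
    awk '
        # Remember the most recent property name (strip <name> and </name>).
        match($0, /<name>[^<]*<\/name>/) {
            name = substr($0, RSTART + 6, RLENGTH - 13)
        }
        # When its value appears, print the pair if the name mentions "dir".
        match($0, /<value>[^<]*<\/value>/) {
            if (name ~ /dir/)
                print name "=" substr($0, RSTART + 7, RLENGTH - 15)
            name = ""
        }
    ' "$file"
}
```

The values are printed exactly as configured, so any prefixes or comma-separated lists in them still need to be reduced to plain local paths before they go into an exclusion list.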
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-hdfs
Runtime and Logging
/var/run/hadoop
/var/log/hadoop
Scratch and Temp
/tmp/hadoop-hdfs
Note: HDFS, YARN, MapReduce and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to exclude the other components.
05-11-2020
09:16 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding Ambari components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.
Note: Ambari has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of this database are set on installation; the database may be colocated with ambari-server, or on a remote host. Consult with your database administrators for details on the path where the database information is stored; Ambari does not keep this information anywhere in its configuration. If you need details about which database Ambari is using, search for JDBC in the “ambari.properties” file.
# grep 'jdbc' /etc/ambari-server/conf/ambari.properties
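To see quickly whether the database is local or remote, you can pull the host out of any JDBC URL in that file. This is a sketch only: it assumes the file contains a URL of the form `jdbc:<type>://host:port/...`, which may not be the case for every installation (some setups record no URL at all, and URL styles without `//host` directly after the type are not matched). The function name `jdbc_hosts` is hypothetical.

```shell
#!/bin/sh
# Print the host part of every JDBC URL found in a properties file.
# Usage: jdbc_hosts [path-to-ambari.properties]
jdbc_hosts() {
    file="${1:-/etc/ambari-server/conf/ambari.properties}"
    # Drop commented lines, then keep only the host that follows
    # "jdbc:<type>://" and precedes the port, path, or options.
    grep -v '^[[:space:]]*#' "$file" |
        sed -n -E 's#.*jdbc:[a-z0-9]+://([^:/;]+).*#\1#p' |
        LC_ALL=C sort -u
}
```

If the printed host is not the ambari-server host itself, the database directories to exclude are on that other machine.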
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/usr/lib/ambari-agent
/usr/lib/ambari-server
/etc/hadoop
/etc/ambari-agent
/etc/ambari-server
/var/lib/ambari-agent
/var/lib/ambari-server
Runtime and Logging
/var/run/ambari-agent
/var/run/ambari-server
/var/log/ambari-agent
/var/log/ambari-server
04-30-2020
12:21 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains generic recommendations for excluding HDP components and directories from AV scans and monitoring. It is important to note that these recommendations do not apply to each service, and further, some services will have additional items to exclude which are unique to them. These details will be addressed in individual articles dedicated to the service in question.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/hadoop
/usr/hdp
/etc/hadoop
/etc/<component>
/var/lib/<component>
Runtime and Logging
/var/run/<component>
/var/log/<component>
Scratch and Temp
/var/tmp/<component>
/tmp/<component>
Note: The <component> does not only refer to the service name, as a given service may have multiple daemons with their own directories. Example: ambari-agent and ambari-server.
Across HDP services there are also many user-configurable locations. Most of these can be found in Ambari properties with names like "service.scratch.dir" and "service.data.dir"; go to Ambari > Service > Configs > Advanced and search for any property containing "dir", all of which may be considered for exclusion.
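As a rough sketch of expanding the `<component>` patterns on a given host, the function below checks each pattern for a list of component names and prints only the directories that actually exist. The function name `candidate_dirs` and its first argument (a root prefix, normally empty, useful for testing against a copy of the filesystem) are hypothetical.

```shell
#!/bin/sh
# Print the <component> directories from the lists above that exist.
# Usage: candidate_dirs ROOT_PREFIX component [component ...]
#   ROOT_PREFIX is normally "" (the real filesystem root).
candidate_dirs() {
    prefix="$1"
    shift
    for c in "$@"; do
        for base in /etc /var/lib /var/run /var/log /var/tmp /tmp; do
            d="$prefix$base/$c"
            # Only report directories that are really present on this host.
            if [ -d "$d" ]; then
                printf '%s\n' "$d"
            fi
        done
    done
}
```

For example, `candidate_dirs "" ambari-agent ambari-server` lists the existing directories for both Ambari daemons; the fixed entries such as /usr/hdp and /etc/hadoop still need to be added by hand.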
04-16-2020
09:20 AM
You are correct; the drivers are built for each platform. The HDP downloads page is here: https://www.cloudera.com/downloads/hdp.html It contains the JDBC41 driver for Hive.
04-16-2020
08:48 AM
You are correct; stopping a service is not the same as a service crashing. Alerts generally do not cover intentional administrator activity like starting and stopping of services. However, you do still have access to this information; starting and stopping of services are covered under Events, of the Audit type:
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_dg_events.html#cmug_topic_10
The AUDIT_EVENT type covers actions performed. This is also where you will track configuration changes.
Turning to the question of API use, here is the Cloudera Manager documentation's section on the API:
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cloudera_manager.html#concept_nsg_jq3_mz
Here is the Tutorial linked from that doc, which has a ton of examples, including starting and stopping of services:
https://archive.cloudera.com/cm6/6.3.0/generic/jar/cm_api/apidocs/tutorial.html
While the Alerts don't tell you when services are started and stopped, you can query Events through the API. We have a Knowledge Base Article on the subject:
https://my.cloudera.com/knowledge/Accessing-Critical-Events-Using-the-Cloudera-Manager-API-?id=72521
04-16-2020
08:10 AM
To my knowledge, what the cluster uses to check NTP is the following command:
# ntpdc -np
This is used in the health checks for the hosts, and is included in the docs under the Host Health Tests. For example in CDH 5.16.x:
https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cm_ht_host.html#concept_lxn_zxn_yk
You might also consider Chrony over NTP. From a previous discussion I had with a subject matter expert at Red Hat, these are the things chronyd can do better than ntpd:
- chronyd can work well in an environment where access to the time reference is intermittent, whereas ntpd needs regular polling of time reference to work well.
- chronyd can perform well even when the network is congested for longer periods of time.
- chronyd can usually synchronize the clock faster and with better accuracy.
- chronyd quickly adapts to sudden changes in the rate of the clock, for example, due to changes in the temperature of the crystal oscillator, whereas ntpd may need a long time to settle down again.
- In the default configuration, chronyd never steps the time after the clock has been synchronized at system start, in order not to upset other running programs. ntpd can be configured to never step the time too, but it has to use a different means of adjusting the clock, which has some disadvantages including negative effect on accuracy of the clock.
- chronyd can adjust the rate of the clock on a Linux system in a larger range, which allows it to operate even on machines with a broken or unstable clock. For example, on some virtual machines.
- chronyd is smaller, it uses less memory and it wakes up the CPU only when necessary, which is better for power saving.
I think the intermittent connection and virtual-machine specific details might apply to your use case.
Their relevant documentation:
Differences Between ntpd and chronyd
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/system_administrators_guide/index#sect-differences_between_ntpd_and_chronyd
Understanding chrony and Its Configuration
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/sect-understanding_chrony_and-its_configuration
Note that you will have to disable ntpd, because the cluster checks for ntpd first and chronyd second.
Regards,
Ryan Blough, COE
Cloudera Inc.
08-23-2019
12:26 PM
Very helpful! It may be implied by "Later as you have already deployed the cluster we need to reset the cluster," but I had to add an additional step of removing the empty baseurl repositories that had been configured on the hosts. If this step isn't done, the error continues to appear because the repo configuration is still left over from the failed attempt. This can be done two ways, at least for the RHEL 7 hosts we were on. Either remove the repo package:
# yum remove HDP-3.1-repo-1
# yum clean all
Or manually remove the repo file from `/etc/yum.repos.d/` and then run:
# yum clean all