Member since: 10-01-2018
Posts: 272
Kudos Received: 5
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3592 | 09-28-2020 08:05 AM
 | 3207 | 04-16-2020 09:20 AM
 | 1575 | 04-16-2020 08:48 AM
 | 4075 | 04-16-2020 08:10 AM
09-28-2020 08:05 AM
1 Kudo
This will depend to a large extent on what these other agents are, but in the general case there are two options:
1) Run a cron job that does three things: one, check whether the process is already running; two, check whether HDFS is running; three, start the process only if it is not already running and HDFS is up.
2) I believe Ubuntu 16 uses systemd. In systemd, start order can be controlled via dependencies in the unit file of the process. You will have to reference the documentation for your specific version, but there are two relevant pairs of settings: Wants= or Requires= determines which units run together, and Before= or After= determines the order in which they are run. The most thorough solution is to give each of these other agents a unit file that both Requires= and starts After= the HDFS service unit.
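For option 1, a minimal sketch of a cron helper; the agent name (my-agent) and paths are placeholders, and it assumes the hdfs client is configured on the host:

#!/bin/bash
# check-and-start-agent.sh -- hypothetical cron helper; names and paths are placeholders.
# Do nothing if the agent is already running.
pgrep -x my-agent >/dev/null && exit 0
# Do nothing unless HDFS is reachable and out of safe mode.
hdfs dfsadmin -safemode get 2>/dev/null | grep -q 'Safe mode is OFF' || exit 0
# Otherwise, start the agent in the background.
/opt/my-agent/bin/my-agent &

Installed in cron as, for example:

* * * * * /usr/local/bin/check-and-start-agent.sh

For option 2, a minimal unit-file sketch, again with placeholder names; note that Ambari-managed HDFS daemons are not always registered as systemd units, in which case the cron approach is the safer bet:

# /etc/systemd/system/my-agent.service (hypothetical)
[Unit]
Description=Example agent that must start after HDFS
# Requires= pulls the HDFS unit in; if it cannot start, this unit is not started.
Requires=hadoop-hdfs-datanode.service
# After= delays this unit until the HDFS unit has started.
After=hadoop-hdfs-datanode.service

[Service]
ExecStart=/opt/my-agent/bin/my-agent
Restart=on-failure

[Install]
WantedBy=multi-user.target

After writing the unit file, run systemctl daemon-reload and systemctl enable --now my-agent to activate it.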
09-28-2020 07:32 AM
1 Kudo
@matagyula I suggest we attempt to get more information out of fsck in the PROD environment. This has two parts:
1) Use the options that produce more detailed output about which blocks go where, and include snapshots:
$ hdfs fsck / -files -blocks -locations -includeSnapshots
This will break the results down into files, which blocks belong to which files, and where those blocks are located. Note: this will be a longer fsck and will induce a heavier load, so it is not recommended during peak load times.
2) Check the user who is running the fsck. We recommend running as the hdfs user, or another admin-level user.
Edit: hdfs fsck also ignores open files by default. Depending on your PROD cluster's usage patterns and data structure, it is possible for a very large number of blocks to be open at once. You can include an option to count these as well:
$ hdfs fsck / -openforwrite
I recommend this be done separately, before the heavier multi-option version above.
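As a quick, low-impact check before the heavier run above, the summary of a plain fsck already reports the validated block total; a sketch, assuming it is run as the hdfs or another admin-level user:

$ hdfs fsck / | grep -i 'total blocks'

Comparing that number between the two environments tells you whether the discrepancy is in the block count itself or in how the NameNode UI or alert reported it.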
09-22-2020 09:20 AM
@kvinod Can you provide the procedure and exact command you are using to restore the snapshot?
1) Are you restoring the snapshot over the top of the Y environment, or are you clearing it first? This kind of behavior often happens when a restoration does not overwrite existing content, merely adding to it.
2) Are the versions exactly the same between the two environments? It is sometimes necessary to modify the command and import a different version, due to subtle differences between them. If you could tell us your version information, that would also be useful.
For example, there is a default method from CDH5: https://my.cloudera.com/knowledge/Copying-HBase-Table-Between-Clusters--ExportSnapshot?id=72706
But there is a problem with it that appears in some CDH6 versions: https://my.cloudera.com/knowledge/TSB-2020-379-Data-loss-with-restore-snapshot?id=283633
Regards,
Ryan Blough, COE
Cloudera Inc.
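For reference, the CDH5 method in the first link boils down to the ExportSnapshot tool plus a shell-side restore; a minimal sketch with placeholder snapshot, table, and cluster names:

$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot my_table_snapshot \
    -copy-to hdfs://y-cluster-nn:8020/hbase \
    -mappers 16

Then, in the hbase shell on the destination cluster:

hbase> disable 'my_table'
hbase> restore_snapshot 'my_table_snapshot'
hbase> enable 'my_table'

Note that restore_snapshot replaces the table's current contents (the table must be disabled first), which is why question 1 matters: if the existing table is neither cleared nor restored over, old content can remain.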
09-22-2020 09:04 AM
1 Kudo
@matagyula That does appear to be a discrepancy. There are a few things we can check for this.
1) Did you get the block numbers from the NameNode UI in both cases? If the information came from an alert, it may be out of date, as old alerts are preserved.
2) In the PROD environment, are all of the DataNodes showing as online? You can get this information from the command line using the following command:
$ hdfs dfsadmin -report
This should also include a block count; but note that the dfsadmin report will include the replicas, and will identify incompletely replicated blocks as missing.
3) Is the replication factor the same in PROD as it is in the other environment? (A quick way to check is sketched below.)
The simplest explanation is that one or more DataNodes have been excluded from the count, but if the count came from an alert it may be inaccurate due to timing.
Regards,
Ryan Blough, COE
Cloudera Inc.
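For item 3, the default replication factor can be read from the client configuration on each cluster; a quick sketch, assuming the hdfs client is configured on the host:

$ hdfs getconf -confKey dfs.replication

Run it in both environments and compare. Keep in mind that individual files can carry a non-default replication factor, which the fsck summary will reflect in the average block replication.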
05-11-2020 09:17 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding ZooKeeper components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:
# grep dataDir /etc/zookeeper/conf/zoo.cfg
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
Runtime and Logging
/var/run/zookeeper
/var/log/zookeeper
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:17 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding MapReduce components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: Some directories MapReduce uses are user-configurable. I recommend you exclude them. These properties can be found in Ambari > MapReduce2 > Configs > Advanced, and this one in particular should be excluded:
mapreduce.jobhistory.recovery.store.leveldb.path
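If you prefer the command line, the same property can usually be read directly from mapred-site.xml on an HDP node; a sketch, assuming the default configuration path:

# grep -A1 "leveldb" /etc/hadoop/conf/mapred-site.xml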
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-mapreduce
Runtime and Logging
/var/run/hadoop-mapreduce
/var/log/hadoop-mapreduce
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:17 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding YARN components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced (or pulled from yarn-site.xml, as sketched after this list):
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir
yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path
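A command-line sketch for pulling these out of yarn-site.xml, assuming the default HDP configuration path:

# grep -A1 "dir" /etc/hadoop/conf/yarn-site.xml
# grep -A1 "leveldb" /etc/hadoop/conf/yarn-site.xml

The first grep covers the local, log, and recovery directories; the second covers the two timeline-service LevelDB paths.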
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-yarn
Runtime and Logging
/var/run/hadoop-yarn
/var/log/hadoop-yarn
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:16 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding HDFS components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: The directories HDFS uses are user-configurable. I recommend you exclude them, especially the data directory for the DataNode and the meta directories for the NameNode and JournalNode. These details can be found in the “hdfs-site.xml” file:
# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-hdfs
Runtime and Logging
/var/run/hadoop
/var/log/hadoop
Scratch and Temp
/tmp/hadoop-hdfs
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:16 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding Ambari components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.
Note: Ambari has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of the database are set at installation; it may be colocated with ambari-server or hosted on a remote machine. Consult your database administrators for details on the path where the database files are stored; Ambari does not keep this information anywhere in its configuration. If you need details about which database Ambari is using, search for JDBC in the “ambari.properties” file:
# grep 'jdbc' /etc/ambari-server/conf/ambari.properties
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/usr/lib/ambari-agent
/usr/lib/ambari-server
/etc/hadoop
/etc/ambari-agent
/etc/ambari-server
/var/lib/ambari-agent
/var/lib/ambari-server
Runtime and Logging
/var/run/ambari-agent
/var/run/ambari-server
/var/log/ambari-agent
/var/log/ambari-server
04-30-2020 12:21 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains generic recommendations for excluding HDP components and directories from AV scans and monitoring. It is important to note that these recommendations do not all apply to every service, and further, some services will have additional items to exclude which are unique to them. These details will be addressed in individual articles dedicated to the service in question.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/hadoop
/usr/hdp
/etc/hadoop
/etc/<component>
/var/lib/<component>
Runtime and Logging
/var/run/<component>
/var/log/<component>
Scratch and Temp
/var/tmp/<component>
/tmp/<component>
Note: The <component> does not refer only to the service name, as a given service may have multiple daemons with their own directories, for example ambari-agent and ambari-server. Across HDP services there are also many user-configurable locations. Most of these can be found in Ambari properties with names like "service.scratch.dir" and "service.data.dir"; go to Ambari > Service > Configs > Advanced and search for any property containing "dir", all of which may be considered for exclusion (a command-line sketch follows below).
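A command-line sketch for surveying these properties on a node, assuming the conventional /etc/<component>/conf layout (substitute a real component name such as hadoop or zookeeper):

# grep -A1 "dir" /etc/hadoop/conf/*-site.xml
# grep -rA1 "dir" /etc/<component>/conf/

Any filesystem paths appearing in the values are candidates for exclusion.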