About rblough

rblough · ‎09-28-2020

I'm glad you decided to try it out! From the screenshot, it looks like all the services are currently in the Stopped state. You should see a big button that says Actions; this will produce a drop down menu with different actions listed. What happens when you click on the Action button, and then select Start All Services?

rblough · ‎09-28-2020

This will depend on what these other agents are to a large extent, but in the general case there are two options: 1) Run a cron job: one, check to see if the process is already running; two, check to see if HDFS is running; three start the process if neither of the above are true. 2) I believe Ubuntu 16 uses systemd. In systemd start order can be controlled via dependencies in the unit file of the process. You will have to reference it specifically for your version, but I believe there are two relevant settings: wants or requires determines which processes run together; before and after determines the order in which they are run. The most thorough solution is to set these other agents' unit files as requires HDFS and after HDFS.

rblough · ‎09-28-2020

@matagyula I suggest we attempt to get more information out of fsck in the PROD environment. This has two parts: 1) Use the options to get more detailed output about which blocks go where, and include snapshots. $ hdfs fsck / -files -blocks -locations -includeSnapshots This will break the results down into files, which blocks belong to which files, and where those files are located. Note: this will be a longer fsck, and induce a heavier load. Not recommended during peak load times. 2) Check the user who is running the fsck. We recommend running as the hdfs user, or another admin-level user. Edit: hdfs fsck also ignores open files by default. Depending on your prod cluster's usage patterns and data structure, it is possible for a very large number of blocks of blocks to be open at once. You can include an option to include these in the count: $ hdfs fsck / -openforwrite I recommend this be done separately, before the heavier multi-option version above.

rblough · ‎09-22-2020

@kvinod Can you provide the procedure and exact command you are using to restore the snapshot? 1) Are you restoring the snapshot over the top of Y environment, or are you clearing it first? This kind of behavior often happens when a restoration does not overwrite existing content, merely adding to it. 2) Are the versions exactly the same between the two environments? It is sometimes necessary to modify the command and import a different version, due to subtle differences between them. If you could tell us your version information, that would also be useful. For example, there is a default method from CDH5: https://my.cloudera.com/knowledge/Copying-HBase-Table-Between-Clusters--ExportSnapshot?id=72706 But there is a problem with it that appears in some CDH6 versions: https://my.cloudera.com/knowledge/TSB-2020-379-Data-loss-with-restore-snapshot?id=283633 Regards, Ryan Blough, COE Cloudera Inc.

rblough · ‎09-22-2020

@matagyula That does appear to be a discrepancy. There are a few things we can check for this. 1) Did you get the block numbers from the NameNode UI in both cases? If the information came from an alert, it may be out of date as old alerts are preserved. 2) In the PROD environment, are all of the DataNodes showing as online? You can get this information from the commandline using the following command: $ hdfs dfsadmin -report This should also include a block count; but the dfsadmin report will include the replicas, and identify incompletely replicated blocks as missing. 3) Is the replication factor the same in PROD as it is in the other environment? The simplest explanation is that one or more DataNodes have been excluded from the count, but if the count came from an alert it may be inaccurate due to timing. Regards, Ryan Blough, COE Cloudera Inc.

rblough · ‎05-11-2020

Note: Cloudera does not support antivirus software of any kind. This article contains general recommendations for excluding ZooKeeper components and directories from antivirus scans and monitoring. The three primary locations you will want to exclude from antivirus are: Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes. Log directories: These are write-heavy. Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes. Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command: # grep dataDir /etc/zookeeper/conf/zoo.cfg Consider excluding the following directories and all of their subdirectories: Installation, Configuration, and Libraries /usr/hdp /etc/hadoop Runtime and Logging /var/run/zookeeper /var/log/zookeeper Note: HDFS, YARN, MapReduce and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

rblough · ‎05-11-2020

Note: Cloudera does not support antivirus software of any kind. This article contains general recommendations for excluding MapReduce components and directories from antivirus scans and monitoring. The three primary locations you will want to exclude from antivirus are: Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes. Log directories: These are write-heavy. Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes. Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced, and this one in particular should be excluded: mapreduce.jobhistory.recovery.store.leveldb.path Consider excluding the following directories and all of their subdirectories: Installation, Configuration, and Libraries /usr/hdp /etc/hadoop /var/lib/hadoop-mapreduce Runtime and Logging /var/run/hadoop-mapreduce /var/log/hadoop-mapreduce Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

rblough · ‎05-11-2020

Note: Cloudera does not support antivirus software of any kind. This article contains general recommendations for excluding YARN components and directories from antivirus scans and monitoring. The three primary locations you will want to exclude from antivirus are: Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes. Log directories: These are write-heavy. Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes. Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced: yarn.nodemanager.local-dirs yarn.nodemanager.log-dirs yarn.nodemanager.recovery.dir yarn.timeline-service.leveldb-state-store.path yarn.timeline-service.leveldb-timeline-store.path Consider excluding the following directories and all of their subdirectories: Installation, Configuration, and Libraries /usr/hdp /etc/hadoop /var/lib/hadoop-yarn Runtime and Logging /var/run/hadoop-yarn /var/log/hadoop-yarn Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.

rblough · ‎05-11-2020

Note: Cloudera does not support antivirus software of any kind. This article contains general recommendations for excluding HDFS components and directories from antivirus scans and monitoring. The three primary locations you will want to exclude from antivirus are: Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes. Log directories: These are write-heavy. Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes. Note: The directories in HDFS are user-configurable. I recommend you exclude these, especially the data directory for the DataNode and the meta directories for the NameNode and JournalNode. These details can be found in the “hdfs-site.xml” file: # grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml Consider excluding the following directories and all of their subdirectories: Installation, Configuration, and Libraries /usr/hdp /etc/hadoop /var/lib/hadoop-hdfs Runtime and Logging /var/run/hadoop /var/log/hadoop Scratch and Temp /tmp/hadoop-hdfs Note: HDFS, YARN, MapReduce and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to exclude the other components.

rblough · ‎05-11-2020

Note: Cloudera does not support antivirus software of any kind. This article contains general recommendations for excluding Ambari components and directories from antivirus scans and monitoring. The three primary locations you will want to exclude from antivirus are: Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes. Log directories: These are write-heavy. Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes. Note: Ambari has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of this database are set on installation; the database may be colocated with ambari-server, or on a remote host. Consult with your database administrators for details on the path where the database information is stored; Ambari does not keep this information anywhere in its configuration. If you need details about which database Ambari is using, search for JDBC in the “amber.properties” file. # grep 'jdbc' /etc/ambari-server/conf/ambari.properties Consider excluding the following directories and all of their subdirectories: Installation, Configuration, and Libraries /usr/hdp /usr/lib/ambari-agent /usr/lib/ambari-server /etc/hadoop /etc/ambari-agent /etc/ambari-server /var/lib/ambari-agent /var/lib/ambari-server Runtime and Logging /var/run/ambari-agent /var/run/ambari-server /var/log/ambari-agent /var/log/ambari-server

Online	Offline
Last Visited	‎01-14-2026 01:43 PM

Member Since	‎10-01-2018 03:45 PM
Last Visited	‎01-14-2026 01:43 PM
Posts	308
Kudos received	4

Cloudera Community

Re: Altering existing range partition without data...

Re: How can I run linux commands after cloudera HD...

Re: HiveJDBC41 driver for Hortonworks

Re: Cloudera Manager alerting and API

Re: It's time we have a talk about NTP. Why does i...

Re: Ambari Hortonworks does not work

Re: How can I run linux commands after cloudera HD...

Re: Cluster Block Count - which is the real number...

Re: HBase table counts mismatch after restoring HB...

Re: Cluster Block Count - which is the real number...

Excluding ZooKeeper from antivirus Scans

Excluding MapReduce from antivirus scans

Excluding YARN from antivirus scans

Excluding HDFS from antivirus scans

Excluding Ambari from antivirus scans