Member since: 10-01-2018
Posts: 16
Kudos Received: 3
Solutions: 4
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 503 | 09-28-2020 08:05 AM |
 | 375 | 04-16-2020 09:20 AM |
 | 220 | 04-16-2020 08:48 AM |
 | 523 | 04-16-2020 08:10 AM |
09-29-2020 07:17 AM
Now that you have made several attempts to start, are there any notifications in Ambari? What we should see is a series of operations trying to run, and alerts for any errors encountered. The alerts icon is shaped like a bell, and the operations icon is shaped like a gear. If there is nothing in either of those locations, then Ambari is not responding at all. The simplest option would be to restart the Sandbox from scratch. If you can get terminal access to the containers the cluster is running on, you can restart the ambari-server process on the Ambari host.
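If you do get shell access to the Ambari host, a quick check and restart looks roughly like this (a sketch, assuming the standard ambari-server service script is on the PATH):
# ambari-server status
# ambari-server restart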
09-28-2020 10:25 AM
I'm glad you decided to try it out! From the screenshot, it looks like all the services are currently in the Stopped state. You should see a big button that says Actions; this will produce a drop-down menu with different actions listed. What happens when you click on the Actions button and then select Start All Services?
09-28-2020 08:05 AM
1 Kudo
This will depend to a large extent on what these other agents are, but in the general case there are two options:
1) Run a cron job that does three things: check whether the process is already running; check whether HDFS is running; and start the process only if it is not already running and HDFS is up. A sketch follows below.
2) I believe Ubuntu 16 uses systemd. Under systemd, start order can be controlled via dependencies in the unit file of the process. You will have to reference the documentation for your specific version, but I believe there are two relevant pairs of settings: wants/requires determines which processes run together, and before/after determines the order in which they are run. The most thorough solution is to set these other agents' unit files to requires HDFS and after HDFS.
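As a rough sketch of the cron approach (the agent name and path here are hypothetical placeholders, and the HDFS check simply asks the NameNode for a report):
#!/bin/sh
# Exit quietly if the agent is already running.
pgrep -f my-agent >/dev/null && exit 0
# Exit if HDFS is not reachable yet; cron will retry on the next run.
hdfs dfsadmin -report >/dev/null 2>&1 || exit 1
# Otherwise, start the agent in the background.
/opt/my-agent/bin/agent &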
09-28-2020 07:32 AM
1 Kudo
@matagyula I suggest we attempt to get more information out of fsck in the PROD environment. This has two parts:
1) Use the options to get more detailed output about which blocks go where, and include snapshots:
$ hdfs fsck / -files -blocks -locations -includeSnapshots
This will break the results down into files, which blocks belong to which files, and where those blocks are located. Note: this will be a longer fsck and will induce a heavier load, so it is not recommended during peak load times.
2) Check which user is running the fsck. We recommend running it as the hdfs user, or another admin-level user.
Edit: hdfs fsck also ignores open files by default. Depending on your PROD cluster's usage patterns and data structure, it is possible for a very large number of blocks to be open at once. You can include an option to count these as well:
$ hdfs fsck / -openforwrite
I recommend running this separately, before the heavier multi-option version above.
09-22-2020 09:20 AM
@kvinod Can you provide the procedure and exact command you are using to restore the snapshot?
1) Are you restoring the snapshot over the top of the Y environment, or are you clearing it first? This kind of behavior often happens when a restoration does not overwrite the existing content, merely adding to it.
2) Are the versions exactly the same between the two environments? It is sometimes necessary to modify the command and import a different version, due to subtle differences between them. If you could tell us your version information, that would also be useful.
For example, there is a default method from CDH5: https://my.cloudera.com/knowledge/Copying-HBase-Table-Between-Clusters--ExportSnapshot?id=72706
But there is a problem with it that appears in some CDH6 versions: https://my.cloudera.com/knowledge/TSB-2020-379-Data-loss-with-restore-snapshot?id=283633
Regards, Ryan Blough, COE Cloudera Inc.
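For reference, a typical ExportSnapshot invocation looks roughly like this (the snapshot name, destination URI, and mapper count are placeholders, not values from your environment):
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://dest-cluster:8020/hbase -mappers 16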
09-22-2020 09:04 AM
1 Kudo
@matagyula That does appear to be a discrepancy. There are a few things we can check for this.
1) Did you get the block numbers from the NameNode UI in both cases? If the information came from an alert, it may be out of date, as old alerts are preserved.
2) In the PROD environment, are all of the DataNodes showing as online? You can get this information from the command line using the following command:
$ hdfs dfsadmin -report
This should also include a block count, but the dfsadmin report will include the replicas and identify incompletely replicated blocks as missing.
3) Is the replication factor the same in PROD as it is in the other environment?
The simplest explanation is that one or more DataNodes have been excluded from the count, but if the count came from an alert it may be inaccurate due to timing.
Regards, Ryan Blough, COE Cloudera Inc.
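For point 3, one quick way to check the default replication factor from the command line (assuming the client configuration on that host matches the cluster's):
$ hdfs getconf -confKey dfs.replication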
05-11-2020 09:17 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding ZooKeeper components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:
# grep dataDir /etc/zookeeper/conf/zoo.cfg
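On a typical HDP installation the output looks like the following, though your value may differ:
dataDir=/hadoop/zookeeper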
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
Runtime and Logging
/var/run/zookeeper
/var/log/zookeeper
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent, and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:17 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding MapReduce components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced, and this one in particular should be excluded:
mapreduce.jobhistory.recovery.store.leveldb.path
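If you would rather confirm the configured value on a host than in Ambari, a grep such as the following should surface it (assuming the standard HDP client configuration path):
# grep -A1 "leveldb" /etc/hadoop/conf/mapred-site.xml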
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-mapreduce
Runtime and Logging
/var/run/hadoop-mapreduce
/var/log/hadoop-mapreduce
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:17 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding YARN components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Ambari > YARN > Configs > Advanced:
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir
yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path
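To confirm the configured values on a host rather than in Ambari, a grep such as the following should surface them (assuming the standard HDP client configuration path):
# grep -A1 "dir" /etc/hadoop/conf/yarn-site.xml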
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-yarn
Runtime and Logging
/var/run/hadoop-yarn
/var/log/hadoop-yarn
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:16 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding HDFS components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: The directories HDFS uses are user-configurable. I recommend you exclude them, especially the data directories for the DataNode and the meta directories for the NameNode and JournalNode. These details can be found in the “hdfs-site.xml” file:
# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/etc/hadoop
/var/lib/hadoop-hdfs
Runtime and Logging
/var/run/hadoop
/var/log/hadoop
Scratch and Temp
/tmp/hadoop-hdfs
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent, and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
05-11-2020 09:16 AM
1 Kudo
Note: Cloudera does not support antivirus software of any kind.
This article contains general recommendations for excluding Ambari components and directories from antivirus scans and monitoring.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Note: Ambari has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of this database are set at installation; the database may be colocated with ambari-server, or on a remote host. Consult your database administrators for details on the path where the database information is stored; Ambari does not keep that path anywhere in its configuration. If you need details about which database Ambari is using, search for JDBC in the “ambari.properties” file:
# grep 'jdbc' /etc/ambari-server/conf/ambari.properties
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/usr/hdp
/usr/lib/ambari-agent
/usr/lib/ambari-server
/etc/hadoop
/etc/ambari-agent
/etc/ambari-server
/var/lib/ambari-agent
/var/lib/ambari-server
Runtime and Logging
/var/run/ambari-agent
/var/run/ambari-server
/var/log/ambari-agent
/var/log/ambari-server
04-30-2020 12:21 AM
Note: Cloudera does not support antivirus software of any kind.
This article contains generic recommendations for excluding HDP components and directories from antivirus scans and monitoring. It is important to note that these recommendations do not apply equally to every service; further, some services will have additional items to exclude which are unique to them. Those details are addressed in individual articles dedicated to the service in question.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the antivirus holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the antivirus holds up writes.
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/hadoop
/usr/hdp
/etc/hadoop
/etc/<component>
/var/lib/<component>
Runtime and Logging
/var/run/<component>
/var/log/<component>
Scratch and Temp
/var/tmp/<component>
/tmp/<component>
Note: <component> does not refer only to the service name, as a given service may have multiple daemons with their own directories, for example ambari-agent and ambari-server. Across HDP services there are also many user-configurable locations. Most of these can be found in Ambari properties with names like "service.scratch.dir" and "service.data.dir"; go to Ambari > Service > Configs > Advanced and search for any property containing "dir", all of which may be considered for exclusion.
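To hunt for those configurable paths directly on a host instead of through Ambari, a grep along these lines should surface most of them (assuming client configurations live under /etc/hadoop/conf):
# grep -A1 "dir" /etc/hadoop/conf/*-site.xml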
04-16-2020 09:20 AM
You are correct; the drivers are built for each platform. The HDP downloads page, which contains the JDBC41 driver for Hive, is here: https://www.cloudera.com/downloads/hdp.html
04-16-2020 08:48 AM
You are correct; stopping a service is not the same as a service crashing. Alerts generally do not cover intentional administrator activity like the starting and stopping of services. However, you do still have access to this information: starting and stopping of services are covered under Events, of the Audit type: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_dg_events.html#cmug_topic_10
The AUDIT_EVENT type covers actions performed. This is also where you will track configuration changes.
Turning to the question of API use, here is the Cloudera Manager documentation's section on the API: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cloudera_manager.html#concept_nsg_jq3_mz
Here is the tutorial linked from that doc, which has a ton of examples, including starting and stopping of services: https://archive.cloudera.com/cm6/6.3.0/generic/jar/cm_api/apidocs/tutorial.html
While the Alerts don't tell you when services are started and stopped, you can query Events through the API. We have a Knowledge Base article on the subject: https://my.cloudera.com/knowledge/Accessing-Critical-Events-Using-the-Cloudera-Manager-API-?id=72521
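As a quick illustration, pulling audit events from the API can look roughly like this (the host, port, credentials, and API version here are placeholders; adjust them for your deployment):
$ curl -u admin:admin 'http://cm-host:7180/api/v19/events?query=category==AUDIT_EVENT'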
04-16-2020 08:10 AM
To my knowledge, what the cluster uses to check NTP is the following command:
# ntpdc -np
This is used in the health checks for the hosts, and is included in the docs under the Host Health Tests. For example, in CDH 5.16.x: https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cm_ht_host.html#concept_lxn_zxn_yk
You might also consider Chrony over NTP. From a previous discussion I had with a subject matter expert at Red Hat, these are the things chronyd can do better than ntpd:
- chronyd can work well in an environment where access to the time reference is intermittent, whereas ntpd needs regular polling of the time reference to work well.
- chronyd can perform well even when the network is congested for longer periods of time.
- chronyd can usually synchronize the clock faster and with better accuracy.
- chronyd quickly adapts to sudden changes in the rate of the clock, for example, due to changes in the temperature of the crystal oscillator, whereas ntpd may need a long time to settle down again.
- In the default configuration, chronyd never steps the time after the clock has been synchronized at system start, in order not to upset other running programs. ntpd can be configured to never step the time too, but it has to use a different means of adjusting the clock, which has some disadvantages, including a negative effect on the accuracy of the clock.
- chronyd can adjust the rate of the clock on a Linux system over a larger range, which allows it to operate even on machines with a broken or unstable clock, for example, on some virtual machines.
- chronyd is smaller, uses less memory, and wakes up the CPU only when necessary, which is better for power saving.
I think the intermittent-connection and virtual-machine specific details might apply to your use case. The relevant documentation:
Differences Between ntpd and chronyd: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/system_administrators_guide/index#sect-differences_between_ntpd_and_chronyd
Understanding Chrony and Its Configuration: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/sect-understanding_chrony_and-its_configuration
Note that you will have to disable ntpd, because the cluster checks for ntpd first and chronyd second.
Regards, Ryan Blough, COE Cloudera Inc.
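On RHEL/CentOS 7, the switch from ntpd to chronyd is roughly the following (a sketch; verify the package and service names for your OS version):
# systemctl stop ntpd
# systemctl disable ntpd
# yum install -y chrony
# systemctl enable --now chronyd
# chronyc tracking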
08-23-2019 12:26 PM
Very helpful! It may be implied by "Later as you have already deployed the cluster we need to reset the cluster," but I had to add an additional step of removing the empty-baseurl repositories that had been configured on the hosts. If this step isn't done, the error continues to appear because the repo configuration is still left over from the failed attempt. This can be done in two ways:
# yum remove HDP-3.1-repo-1
# yum clean all
Or by manually removing the repo file from /etc/yum.repos.d/ and then running:
# yum clean all
At least, that was the case for the RHEL 7 hosts we were on.