Member since: 03-14-2016
Posts: 4721
Kudos Received: 1111
Solutions: 874
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2729 | 04-27-2020 03:48 AM |
| | 5287 | 04-26-2020 06:18 PM |
| | 4455 | 04-26-2020 06:05 PM |
| | 3577 | 04-13-2020 08:53 PM |
| | 5380 | 03-31-2020 02:10 AM |
04-11-2018
09:55 AM
1 Kudo
@Michael Bronson Yes, the AMS service check basically posts a dummy metric to the collector with the start and end times mentioned in the script (which are relative times), so a time difference between hosts can be a valid reason for failing service checks.

```python
get_metrics_parameters = {
    "metricNames": "AMBARI_METRICS.SmokeTest.FakeMetric",
    "appId": "amssmoketestfake",
    "hostname": params.hostname,
    "startTime": current_time - 60000,
    "endTime": current_time + 61000,
    "precision": "seconds",
    "grouped": "false",
}
```

See https://github.com/apache/ambari/blob/release-2.6.1/ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py#L83-L84 and https://github.com/apache/ambari/blob/release-2.6.1/ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py#L121-L123
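For intuition on why clock skew breaks this check: the queried window is only about two minutes wide around the caller's current time, so if the collector host's clock is off by more than roughly 60 seconds the fake metric falls outside it. A minimal sketch of the window arithmetic, mirroring the millisecond timestamps in the script (the helper name is mine, not Ambari's):

```python
import time

def smoke_test_window(now_ms=None):
    """Reproduce the relative query window used by the AMS service check:
    60 s before to 61 s after the caller's current time, in milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - 60000, now_ms + 61000

start, end = smoke_test_window(now_ms=0)
assert (start, end) == (-60000, 61000)   # window is 121 s wide
```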
04-11-2018
09:04 AM
@Michael Bronson Also, please share the output of the following command from the AMS collector host and a few of the cluster nodes, to verify whether the AMS service version is the same as the Ambari binary version:

```
# rpm -qa | grep ambari
```
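To compare that `rpm -qa | grep ambari` output across hosts mechanically, here is a hedged sketch; the package-name format it parses is an assumption based on typical Ambari RPM names like `ambari-agent-2.6.1.0-143.x86_64`:

```python
import re

def ambari_versions(rpm_lines):
    """Map package name -> version from `rpm -qa | grep ambari` output
    lines such as 'ambari-agent-2.6.1.0-143.x86_64' (format assumed)."""
    versions = {}
    for line in rpm_lines:
        m = re.match(r"(ambari-[a-z-]+?)-(\d[\d.]*-\d+)", line)
        if m:
            versions[m.group(1)] = m.group(2)
    return versions

host_a = ambari_versions(["ambari-agent-2.6.1.0-143.x86_64"])
host_b = ambari_versions(["ambari-agent-2.6.1.5-3.x86_64"])
assert host_a != host_b   # a mismatch like this is worth fixing first
```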
04-11-2018
09:01 AM
@Michael Bronson A few checks would be good to perform:

1. Verify whether Ambari is falsely showing the AMS services as running. Please check on the AMS collector host whether the AMS collector is actually running and listening on the correct address/port:

```
# netstat -tnlpa | grep 6188
# hostname -f
```

2. Please verify that the AMS process PIDs match the PIDs listed in the following files. Sometimes mismatched PIDs cause false info in the UI:

```
$ ps -ef | grep ^ams | grep ApplicationHistoryServer
$ cat /var/run/ambari-metrics-collector/ambari-metrics-collector.pid
$ ps -ef | grep ^ams | grep HMaster
$ cat /var/run/ambari-metrics-collector/hbase-ams-master.pid
```

3. Can you try restarting the AMS collector service once to see if you notice any errors in the AMS collector logs?
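The PID comparison in step 2 can be automated. A hedged, Linux-only sketch (the `/proc`-based check and the function name are mine, not part of Ambari; the pid-file paths are the defaults quoted above):

```python
from pathlib import Path

def pid_file_matches(pid_file, pattern):
    """Return True if the PID recorded in `pid_file` belongs to a live
    process whose command line contains `pattern` (reads Linux /proc)."""
    pid = Path(pid_file).read_text().strip()
    cmdline = Path(f"/proc/{pid}/cmdline")
    if not cmdline.exists():
        return False   # stale pid file: the recorded process is gone
    args = cmdline.read_bytes().replace(b"\0", b" ").decode(errors="replace")
    return pattern in args

# e.g. pid_file_matches("/var/run/ambari-metrics-collector/hbase-ams-master.pid",
#                       "HMaster")
```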
04-11-2018
08:51 AM
1 Kudo
@Michael Bronson Yes, NTPD is one of the major requirements for keeping all the cluster nodes in sync. This is especially important when running a Kerberized cluster, because the TGT has a direct link with the time (a mismatch usually surfaces as a "Clock Skew Error"). This is one of the prerequisites of cluster preparation: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.5/bk_ambari-installation-ppc/content/enable_ntp_on_the_cluster_and_on_the_browser_host.html The clocks of all the nodes in your cluster, and of the machine that runs the browser through which you access the Ambari Web interface, must be able to synchronize with each other.
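For intuition: Kerberos rejects tickets once clocks drift past the allowed skew, which defaults to 300 seconds. A hedged sketch of that threshold check (the helper name is mine):

```python
def within_clock_skew(epoch_seconds, max_skew_s=300):
    """True if the spread between the hosts' clocks (epoch seconds) is
    within the Kerberos default clockskew of 300 s; beyond that you get
    the 'Clock Skew Error' mentioned above."""
    return max(epoch_seconds) - min(epoch_seconds) <= max_skew_s

assert within_clock_skew([1000, 1100, 1290])   # spread 290 s: fine
assert not within_clock_skew([1000, 1400])     # spread 400 s: skew error
```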
04-11-2018
08:48 AM
@Amine Ichlibitiche Sometimes this can happen if the Oozie shared lib is not updated properly:

```
# oozie admin -oozie http://<oozie-server>:11000/oozie -sharelibupdate
```

I see something similar here, which might help: https://community.hortonworks.com/articles/68765/oozie-job-failed-with-an-error-hive-sitexml-permis.html Also, is this a Kerberized cluster? Can you please share the complete stack trace?
04-11-2018
06:34 AM
@Mudassar Hussain Can you please try this:

1. Instead of copying the "authorized_keys" file using SCP, try the following command from the Ambari server host. Please make sure that you are using the correct FQDN of node4:

```
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node4
```

2. Now on node4, check whether the FQDN is set correctly, whether it has the ambari.repo file, and whether its contents are fine:

```
# hostname -f
# cat /etc/hosts
# cat /etc/yum.repos.d/ambari.repo
# yum install ambari-agent -y
```

Now try from the Ambari UI again. If it still does not work, check whether the following command shows the correct Ambari server hostname in the "ambari-agent.ini" file:

```
# grep -A 1 '\[server\]' /etc/ambari-agent/conf/ambari-agent.ini
[server]
hostname=ambari1.example.com
```

If not, edit it and then try again from Ambari.
04-10-2018
06:24 AM
@Mohammad Shadab Please give us exact information about which version of Hadoop you are using and where you downloaded the Hadoop binaries from (Apache or Hortonworks repos). The Hadoop commands are standard commands, like "hdfs dfs -ls /", and should remain the same across OS binaries; however, if you are looking for the scripts used to start/stop/manage Hadoop on Windows/Unix, those might differ. If you are concerned about any specific command, please let us know.
04-09-2018
11:56 PM
@Mohammad Shadab The Windows operating system is not mentioned in the Tested & Certified Support Matrix here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_support-matrices/content/ch01.html So it is better to stay on the supported versions. If you want to use the Apache Hadoop community version, you should refer to https://wiki.apache.org/hadoop/Hadoop2OnWindows (although that link seems to be old and outdated). It would be better to use a Linux-based OS instead of Windows for running Hadoop components.
04-09-2018
11:27 PM
@Anurag Mishra Similarly, for the HBase processes you can run the following commands to find the uptime of the HBase Master and HBase RegionServer:

```
# ps -ef | grep `cat /var/run/hbase/hbase-hbase-master.pid` | awk 'NR==1{print $5 " - " $7}'
# ps -ef | grep `cat /var/run/hbase/hbase-hbase-regionserver.pid` | awk 'NR==1{print $5 " - " $7}'
```
04-09-2018
11:24 PM
1 Kudo
@Anurag Mishra The HDFS service is a combination of various components like NameNodes, DataNodes, ZKFC, JournalNodes, etc., so the "Service" uptime really does not make much sense: 90% of a service's components might be up while the other 10% are down. So I suggest checking the "Service Component" uptime instead of the "Service" uptime. The easiest option to capture the component uptime is the start-time field in the output of the following command. For example, finding the DataNode uptime:

```
# ps -ef | grep `cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid` | awk 'NR==1{print $5 " - " $7}'
Mar02 - 03:43:47
```

Similarly, you can run the following commands to find the NameNode, JournalNode uptime, etc.:

```
# ps -ef | grep `cat /var/run/hadoop/hdfs/hadoop-hdfs-journalnode.pid` | awk 'NR==1{print $5 " - " $7}'
# ps -ef | grep `cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid` | awk 'NR==1{print $5 " - " $7}'
```

Column 5 (STIME) shows when the process started, and column 7 shows the TIME field. Example:

```
# ps -ef | grep `cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid`
root 519 31846 0 23:23 pts/0 00:00:00 grep --color=auto 17262
hdfs 17262 1 0 Mar02 ? 07:55:29 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version=2.6.1.0-129 -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs..........
-Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
```
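The awk one-liners above just pick columns 5 and 7 out of the `ps -ef` output. The same extraction in Python, as a hedged sketch (the function name is mine, and the sample line is abridged from the example output):

```python
def ps_uptime_fields(ps_line):
    """Return (start_time, time) -- columns 5 and 7 of a `ps -ef` line,
    mirroring awk 'NR==1{print $5 " - " $7}'."""
    cols = ps_line.split()
    return cols[4], cols[6]

line = ("hdfs 17262 1 0 Mar02 ? 07:55:29 "
        "/usr/jdk64/jdk1.8.0_112/bin/java -Dproc_namenode")
assert ps_uptime_fields(line) == ("Mar02", "07:55:29")
```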