Created 06-05-2017 02:26 AM
Ambari installation failed. The log file shows: Installing package ambari-metrics-collector ('/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector') command failed
On my Ambari server I don't have this package at all. Anyway, I copied HDP.repo and HDP-UTILS.repo to all data nodes and then tried to install ambari-metrics-collector manually, but I got the following:
/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector
Error: Nothing to do
These errors repeatedly stop my installation. My setup is as follows:
1. I used this repo: http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.4.0.1/ambari.repo
2. I used AWS EC2 m4.large instances with 20 GB for the root directory and 100 GB of EBS (magnetic) storage. I ignored the metrics_collector_heap_size warnings: whenever I changed the values to the recommended size, a second, different recommendation came up. I edited these numbers as many as 7 times and it never ended.
3. Has anyone had the same issue? Does the storage setting matter here? What is a good combination of metrics_collector_heap_size, xmn_size, and master_heap_size?
4. The instance types differ, with different amounts of RAM, and these heap sizes depend on that. What should I follow? Isn't this HDP's job?
Thank you so much for your help.
Created 06-05-2017 02:33 AM
The following message shows that you already have the AMS binaries installed on the mentioned host.
/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector
Error: Nothing to do
The "ambari-metrics-collector" RPM is part of the Ambari repo (not the HDP repo), so please check:
# rpm -qa | grep ambari-metrics
# cat /etc/yum.repos.d/ambari.repo
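A small helper along the lines of the check above, as a sketch assuming bash on the cluster hosts. The package list (ambari-metrics-collector, ambari-metrics-monitor) is an assumption about AMS packaging, so adjust it for your Ambari version. The function only reads `rpm -qa` output and prints any listed package it cannot find.

```shell
#!/usr/bin/env bash
# Assumed AMS package names; adjust for your Ambari version.
AMS_PACKAGES="ambari-metrics-collector ambari-metrics-monitor"

# Reads `rpm -qa` output on stdin and prints each AMS package not found in it.
missing_ams_packages() {
  local installed pkg
  installed=$(cat)
  for pkg in $AMS_PACKAGES; do
    # rpm -qa prints name-version-release.arch, so require the package
    # name followed by "-" and a digit (the start of the version).
    if ! grep -Eq "^${pkg}-[0-9]" <<<"$installed"; then
      echo "$pkg"
    fi
  done
}

# Usage on a cluster host:
#   rpm -qa | missing_ams_packages
#   cat /etc/yum.repos.d/ambari.repo   # AMS RPMs come from the Ambari repo
```

On a host where this prints nothing, the packages are already present, which is consistent with yum answering "Nothing to do".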
The tuning guidance for "metrics_collector_heapsize" and "hbase_master_heapsize" is described at the following link:
https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning
Created 06-05-2017 02:50 AM
Thank you so much. Let me try it and get back to you in a couple of hours.
Created 06-05-2017 04:12 AM
I set up the following recommended sizes, which I found here in the community:
ams-env :: collector_heapsize = 2048
ams-hbase-env :: hbase_master_heapsize = 512
ams-hbase-env :: hbase_master_xmn = 102
ams-hbase-env :: hbase_regionserver_heapsize = 4096
ams-hbase-env :: regionserver_xmn_size = 512
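To see how these values add up on one host, here is a quick arithmetic sketch (not an official AMS sizing rule). It sums the collector, HBase master, and RegionServer heaps and expresses each xmn value as a percentage of its heap. It assumes all three JVMs run on the same host, which may not hold for your AMS mode; the 16077 MB host size in the example comes from the `free -m` output posted later in this thread.

```shell
#!/usr/bin/env bash
# Arithmetic sketch only, not an official AMS sizing rule.
# Arguments (all in MB): host RAM, collector heap, HBase master heap,
# HBase master xmn, RegionServer heap, RegionServer xmn.
ams_heap_summary() {
  local host_mb=$1 collector=$2 master=$3 master_xmn=$4 rs=$5 rs_xmn=$6
  local total=$((collector + master + rs))
  echo "total_heap_mb=$total"
  # Integer percentages (bash arithmetic truncates toward zero).
  echo "percent_of_host_ram=$((total * 100 / host_mb))"
  echo "master_xmn_percent=$((master_xmn * 100 / master))"
  # Skip the RegionServer ratio when no RegionServer heap is given.
  if [ "$rs" -gt 0 ]; then
    echo "regionserver_xmn_percent=$((rs_xmn * 100 / rs))"
  fi
}

# Example with the values posted above on a 16077 MB host:
#   ams_heap_summary 16077 2048 512 102 4096 512
# prints total_heap_mb=6656, percent_of_host_ram=41,
#        master_xmn_percent=19, regionserver_xmn_percent=12
```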
Created 06-05-2017 04:15 AM
I can only upload one of my error files, but most of them are the same: connection failed. It seems Ambari Metrics can never be started; only ZooKeeper is up. Please help.
Thank you very much.
Created 06-05-2017 04:44 AM
1. Are you running AMS in embedded mode or in external mode?
2. What errors do you see in the ambari-metrics-collector.log file when you try to start it?
3. Your attached screenshot shows critical alerts for NameNode UI, HBaseMaster, and NodeManager that might not be directly related to the AMS startup issue. However, to see what is going wrong, we can look at those components' logs as well.
4. Do you have sufficient memory on the host where you are running these processes?
# free -m
# lsof -p $PID
5. Are all the hosts configured with the correct FQDN? That is, the output of "hostname -f" should be resolvable from each cluster node.
# hostname -f
6. Also, can you please check whether the hostname and port in the critical alert shown in the screenshot are open, or whether a firewall is blocking access to the port?
# telnet ip-172-31-1-92 $PORT
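For check 4, a sketch that extracts the memory available to new processes from `free -m` output. It assumes one of two common layouts: older procps (as on CentOS 6) with a "-/+ buffers/cache:" line, or newer procps with an "available" column. It reads the output on stdin, so it can also be run against output pasted from another host.

```shell
#!/usr/bin/env bash
# Reads `free -m` output on stdin and prints the MB available to new processes.
# Assumes newer procps (an "available" column) or older procps
# (a "-/+ buffers/cache:" line, as on CentOS 6).
free_available_mb() {
  awk '
    # Header line: remember which Mem: field holds "available", if any.
    # Mem: lines carry an extra leading label, hence i + 1.
    $1 == "total" { for (i = 1; i <= NF; i++) if ($i == "available") col = i + 1 }
    $1 == "Mem:" && col { print $col; exit }
    # Older layout: last field is free memory counting buffers/cache as reclaimable.
    $1 == "-/+" { print $NF; exit }
  '
}

# Usage:
#   free -m | free_available_mb
```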
Created 06-05-2017 11:17 AM
You are right, Jay: the problem is that the port connection is refused. However, my /tmp permissions are 777, so what should I do from here?
[root@ip-172-31-12-243 /]# free -m
             total       used       free     shared    buffers     cached
Mem:         16077        390      15687          3         81         74
-/+ buffers/cache:        234      15843
Swap:            0          0          0
[root@ip-172-31-12-243 /]# hostname -f
ip-172-31-12-243.ca-central-1.compute.internal
telnet ip-172-31-1-92.ca-central-1.compute.internal 3000
Trying 52.60.67.105...
telnet: connect to address 52.60.67.105: Connection refused
[root@ip-172-31-12-243 /]# telnet ip-172-31-1-92.ca-central-1.compute.internal 16000
Trying 52.60.67.105...
telnet: connect to address 52.60.67.105: Connection refused
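Since telnet may not be installed on every node, here is a bash-only sketch of the same check, using bash's /dev/tcp redirection and coreutils `timeout`. It separates a refused connection ("closed": typically nothing is listening on that port, or the name did not resolve) from no answer at all ("timeout": more often a firewall or security group dropping packets). That distinction is a common rule of thumb, not a certainty.

```shell
#!/usr/bin/env bash
# Bash-only port probe; needs bash with /dev/tcp support and coreutils timeout.
probe_port() {
  local host=$1 port=$2 rc=0
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null || rc=$?
  case $rc in
    0)   echo "open" ;;
    124) echo "timeout" ;;  # no answer: often a firewall or security group dropping packets
    *)   echo "closed" ;;   # refused (often nothing listening) or the name did not resolve
  esac
}

# Usage (ports taken from the telnet tests above):
#   probe_port ip-172-31-1-92.ca-central-1.compute.internal 3000
#   probe_port ip-172-31-1-92.ca-central-1.compute.internal 16000
```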
Thank you so much for your help.
Created 06-05-2017 11:31 AM
605-hdp-error.txt is from one of the data nodes. There is no ambari-metrics-collector.log on the Ambari server.
This is the metrics log file. I only have 20 GB for the root directory /, with no EBS storage assigned here.
Created 06-05-2017 11:36 AM
It looks like your AMS HBase configurations are missing or incorrect. Are you running AMS in embedded mode?
Do you see the following files here?
# ls -lart /etc/ambari-metrics-collector/conf/
total 36
drwxr-xr-x. 3 root root 4096 Aug 18 2016 ..
-rw-r--r--. 1 ams hadoop 7868 Aug 18 2016 ams-site.xml
-rw-r--r--. 1 ams hadoop 1000 Aug 18 2016 ssl-server.xml
-rw-r--r--. 1 ams hadoop 6081 Sep 20 2016 hbase-site.xml
drwxr-xr-x. 2 ams hadoop 4096 Nov 23 2016 .
-rw-r--r--. 1 ams hadoop 1319 Apr 4 13:38 log4j.properties
-rw-r--r--. 1 ams hadoop 1283 Apr 4 13:38 ams-env.sh
If any of these files are missing, it is easiest to reinstall AMS.
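A sketch for checking that directory on each host: it prints any of the files from the listing above that are missing. The file list is copied from that listing and the default path is the one used above; both may differ on other AMS versions.

```shell
#!/usr/bin/env bash
# Files taken from the example listing above; other AMS versions may differ.
AMS_CONF_FILES="ams-site.xml hbase-site.xml ams-env.sh log4j.properties ssl-server.xml"

# Prints each expected AMS config file missing from the given directory.
# If the directory itself does not exist, every file is reported.
missing_ams_conf() {
  local dir=${1:-/etc/ambari-metrics-collector/conf} f
  for f in $AMS_CONF_FILES; do
    [ -f "$dir/$f" ] || echo "$f"
  done
}

# Usage:
#   missing_ams_conf                      # default path
#   missing_ams_conf /some/other/conf     # hypothetical alternate path
```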
Created 06-05-2017 01:42 PM
On both my Ambari server and my data node, no ambari-metrics-collector directory is found.
[root@ip-172-31-1-92 etc]# ls -lart /etc/ambari-metrics-collector
ls: cannot access /etc/ambari-metrics-collector: No such file or directory
[root@ip-172-31-12-243 etc]# ls -lart /etc/ambari-metrics-collector
ls: cannot access /etc/ambari-metrics-collector: No such file or directory
1. When I add a 100 GB or 800 GB volume, I still have the same connection problem.
2. This time I didn't add a new EBS volume; there is only 20 GB for the root /.
3. I have reinstalled more than 10 times, always with the same issue.
So do you think my security settings are the problem? Here is my security group info:
All TCP, HTTP on all ports, all UDP, and all traffic, with access from anywhere. This is wide-open security.
Do you think adding the Ambari server's id_rsa.pub to authorized_keys on all servers for passwordless SSH is enough?
From the Ambari server I can SSH without a password to all other servers, including itself. Do you think I need to set up passwordless SSH between all nodes?
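To test passwordless SSH from one node to a list of others, a sketch using OpenSSH's BatchMode option, which makes ssh fail instead of prompting for a password. The host names in the usage line are the ones from this thread, and the SSH_CMD variable is a hypothetical hook added only so the function can be tested without real hosts.

```shell
#!/usr/bin/env bash
# Checks passwordless SSH from this host to each host given as an argument.
# SSH_CMD is a hypothetical override for testing; it defaults to ssh.
check_passwordless_ssh() {
  local h failed=0
  for h in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$h" true \
         </dev/null >/dev/null 2>&1; then
      echo "ok: $h"
    else
      echo "FAILED: $h"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (run on each node whose outbound SSH you want to verify):
#   check_passwordless_ssh ip-172-31-1-92.ca-central-1.compute.internal \
#                          ip-172-31-12-243.ca-central-1.compute.internal
```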
Thank you for your help.