Member since: 06-28-2017
Posts: 279
Kudos Received: 43
Solutions: 24
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1983 | 12-24-2018 08:34 AM
 | 5360 | 12-24-2018 08:21 AM
 | 2216 | 08-23-2018 07:09 AM
 | 9703 | 08-21-2018 05:50 PM
 | 5140 | 08-20-2018 10:59 AM
02-14-2018
11:57 AM
Not sure what your plan is. If you decommission a data node while preventing the rebalancing from taking place, you risk data loss; at the very least it will leave some file blocks without any redundant copy. So either delete some data from your HDFS so that the rebalancing can succeed, or add capacity (e.g. with a new temporary node) to HDFS before decommissioning the data node.
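A minimal sketch of how you might verify this before acting (stock HDFS commands; nothing here is specific to your cluster):

```
# Per-datanode capacity and remaining space, to judge whether the other
# nodes can absorb the replicas of the node you want to decommission:
hdfs dfsadmin -report

# Top-level usage, to spot data that could be deleted to make room:
hdfs dfs -du -h /
```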
02-09-2018
06:10 PM
I tried a few things. After changing permissions on the HDFS trash and cleaning up the directories again as per https://community.hortonworks.com/questions/121137/ambari-metrics-collector-restarting-again-and-agai.html, I was able to start the Ambari Metrics Collector, and it looks like it is running continuously now.

Still, when I turn off maintenance mode, I get the alert back: Connection failed: [Errno 111] Connection refused to cgihdp4.localnet:6188. As far as I know, 6188 is the port of the timeline server. When checking this, the timeline server service is not even installed on cgihdp4, but it is up and running on cgihdp1. So I searched for the config of the timeline server, which in Ambari is under Advanced ams-site -> timeline.metrics.service.webapp.address, and the address there was, unsurprisingly, cgihdp4.localnet:6188. I changed it to cgihdp1.localnet:6188, restarted the Metrics Collector, and things are running smoothly now.

So basically just a silly config error, embarrassing, but many thanks @Jay Kumar SenSharma for supporting me on this issue.
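For reference, a quick way to double-check which host actually listens on the Metrics Collector port (hostnames as in the post; `ss -tlnp` works as an alternative on newer systems):

```
# Run on each candidate host; no output means nothing listens on 6188 there:
netstat -tlnp | grep 6188
```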
02-09-2018
02:58 PM
In Ambari, go to the host details; there you can click the button to the right of the 'DataNode HDFS' service line to decommission it. You should turn on maintenance mode beforehand to avoid alerts.
02-09-2018
02:54 PM
For the trash dir, also try executing the command without the / at the end.
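For illustration only, a sketch of the two variants, assuming the standard HDFS trash location and a placeholder `<username>`:

```
hdfs dfs -du -h /user/<username>/.Trash/   # with the trailing slash
hdfs dfs -du -h /user/<username>/.Trash    # without it; try this variant
```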
02-09-2018
02:48 PM
One question: have you performed an upgrade of HDFS? You may also want to check with: hdfs fsck / -includeSnapshots
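For reference, the suggested check as a runnable line (flags as in stock HDFS):

```
# Check filesystem health, including paths that only exist in snapshots;
# after an upgrade, snapshot-only blocks can otherwise show up as discrepancies:
hdfs fsck / -includeSnapshots
```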
02-09-2018
02:41 PM
The message could be caused by a process that is still, or already again, accessing the file. You can check whether this is the case with:

lsof | grep /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038

The first three columns of the output are: command, process id, user. If a process is locking the file, this should help you identify it.
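A minimal sketch of that check, with a hypothetical follow-up once a PID appears (`<PID>` is a placeholder):

```
# Find any process still holding the block-pool directory open;
# the first three columns of lsof output are COMMAND, PID and USER:
lsof | grep /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038

# Then inspect the offending process:
ps -fp <PID>
```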
02-09-2018
02:22 PM
1 Kudo
If you want to see the usage within DFS, this should give you the disk usage:

hdfs dfs -du -h /

To see the size of the trash dir, use this command (without a path it operates on your HDFS home dir, where the trash lives):

hdfs dfs -du -h

To add a new disk (in the normal mode), you typically decommission the DataNode service on the worker node, add the disk, and recommission it; HDFS will then try to replicate the blocks from that node to the other nodes to avoid data loss. I'm not sure whether an already full HDFS will cause errors here. Can you try to (temporarily) add nodes? That adds HDFS capacity, so decommissioning one node should then be fine, giving you a way to increase the local disk capacity. I'm not sure whether the rebalancing needs to be triggered manually; I believe it starts automatically (causing additional load on the nodes while it runs).
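A hedged sketch of the checks around decommissioning (the balancer threshold is illustrative; whether you need to run it manually depends on your setup):

```
hdfs dfsadmin -report         # per-datanode capacity and remaining space, to judge if decommissioning is safe
hdfs balancer -threshold 10   # trigger rebalancing manually if it does not start on its own
```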
02-09-2018
12:45 PM
It allows you to run brokers of different versions in one cluster, so you avoid downtime of the cluster during the upgrade. Before you upgrade any broker, you set inter.broker.protocol.version to the existing version on all brokers. Then you upgrade broker by broker; the newer brokers will still use the 'old' protocol to communicate with the other brokers, which keeps the cluster functional while only some brokers are updated. Once all brokers are upgraded, you change inter.broker.protocol.version to the new version and restart them one by one. More details here: https://kafka.apache.org/documentation/#upgrade
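A minimal sketch of the relevant server.properties entry; the version numbers below are illustrative assumptions, not your actual versions:

```
# config/server.properties on every broker, set BEFORE upgrading any of them
# (0.11.0 stands in for whatever version the cluster currently runs):
inter.broker.protocol.version=0.11.0

# After ALL brokers run the new binaries, bump it and roll-restart one by one:
# inter.broker.protocol.version=1.0
```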
02-04-2018
01:39 PM
1 Kudo
@Jay Kumar SenSharma: Thanks for your answer. I simply wasn't aware that the process would change directory permissions; the only reason I used root to start it was to make sure that any issue I experienced wasn't due to missing permissions. In the meantime the service has stopped itself:
[root@cgihdp4 ~]# ambari-metrics-collector status
AMS is not running.
[root@cgihdp4 ~]# su - ams
[ams@cgihdp4 ~]$ ambari-metrics-collector status
AMS is not running.
[ams@cgihdp4 ~]$ ambari-metrics-collector start
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
Sun Feb 4 14:31:19 CET 2018 Starting HBase.
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
master is running as process 23182. Continuing
master running as process 23182. Stop it first.
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
Verifying ambari-metrics-collector process status...
Sun Feb 4 14:31:21 CET 2018 Collector successfully started.
Sun Feb 4 14:31:21 CET 2018 Initializing Ambari Metrics data model
...
[ams@cgihdp4 ~]$ ambari-metrics-collector status
AMS is running as process 22414.
I guess the permission denied is caused by what you just pointed out, so I will change this again. But I am confused about 'master is running as process 23182': that is the HBase Master, running as user 'ams'; does it indicate an issue now? Otherwise nothing has changed: still no process listening on port 6188.
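If it helps, a hedged sketch of the permissions fix implied above; the ams:hadoop owner/group is an assumption based on a default AMS install:

```
# Hand the log dir back to the ams user so the startup 'tee' stops failing
# (owner and group are assumptions; check your install's defaults first):
chown -R ams:hadoop /var/log/ambari-metrics-collector
```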