Member since 06-28-2017

279 Posts
43 Kudos Received
24 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2564 | 12-24-2018 08:34 AM |
|  | 6357 | 12-24-2018 08:21 AM |
|  | 2949 | 08-23-2018 07:09 AM |
|  | 11949 | 08-21-2018 05:50 PM |
|  | 6183 | 08-20-2018 10:59 AM |

02-14-2018 11:57 AM
Not sure what your plan is. If you decommission a data node while preventing the rebalancing from taking place, that could lead to data loss; it will certainly leave some file blocks without redundant storage. So either delete some data on your HDFS to allow the rebalancing to succeed, or add some capacity (e.g. a new temporary node) to HDFS before decommissioning the data node.
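Before deciding, it can help to check how much capacity and how many live nodes are actually available; a quick way to do that (a sketch, not part of the original reply) is:

```
# Summarises configured capacity, DFS used/remaining and per-DataNode status
hdfs dfsadmin -report
```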
						
					
02-09-2018 06:10 PM
I tried some things. After changing permissions on the HDFS trash and cleaning up the dirs again as per https://community.hortonworks.com/questions/121137/ambari-metrics-collector-restarting-again-and-agai.html I have been able to start the Ambari Metrics Collector, and it looks like it is running continuously now. Still, when I turn off maintenance mode, I get the alert back:

Connection failed: [Errno 111] Connection refused to cgihdp4.localnet:6188

As far as I know, 6188 is the port of the timeline server. When checking this, the timeline server service is not even installed on cgihdp4, but it is up and running on cgihdp1. So I searched for the config of the timeline server, which in Ambari sits under Advanced ams-site -> timeline.metrics.service.webapp.address, and the address there was, unsurprisingly, cgihdp4.localnet:6188. I changed it to cgihdp1.localnet:6188, restarted the Metrics Collector, and things are running smoothly.

So basically just a stupid config error, embarrassing, but many thanks @Jay Kumar SenSharma for supporting me on this issue.
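For reference, the corrected setting (value taken from the reply above; in practice it is edited through the Ambari UI under Advanced ams-site) boils down to:

```
# Advanced ams-site: point the collector at the host that actually runs the timeline server
timeline.metrics.service.webapp.address=cgihdp1.localnet:6188
```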
						
					
02-09-2018 02:58 PM
In Ambari, go to the host details; there you can click on the button to the right of the 'DataNode HDFS' service line (see screenshot-decomission.png). You should turn on maintenance mode beforehand to avoid alerts.
						
					
02-09-2018 02:54 PM
For the trash dir, also try executing the command without the trailing /.
						
					
02-09-2018 02:48 PM
One question: have you performed an upgrade of HDFS? You may also want to check with:

hdfs fsck / -includeSnapshots
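If more detail is needed, fsck can also print per-file block information; an illustrative extension of the command above (not part of the original reply):

```
# List files, their blocks and the DataNodes holding each replica, including snapshots
hdfs fsck / -includeSnapshots -files -blocks -locations
```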
						
					
02-09-2018 02:41 PM
The message could be caused by a process that is still (or already) accessing the file. Try to check if this is the case with:

lsof | grep /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038

The first three columns are:
- command
- process id
- user

If there is a process locking the file, this should help you identify it.
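To narrow the output down to just those three columns, something like this works (a sketch assuming awk is available on the node; not part of the original reply):

```
# Print only command, PID and user of processes holding the block pool directory open
lsof | grep /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038 | awk '{print $1, $2, $3}'
```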
						
					
02-09-2018 02:22 PM
1 Kudo
If you want to see the usage within DFS, this should provide you with the disk usage:

hdfs dfs -du -h /

To see the size of the trash dir, use this command:

hdfs dfs -du -h

To add a new disk (in the normal case), you typically decommission the DataNode service on the worker node, add the disk and recommission it, but HDFS will try to replicate the blocks from that node to the other nodes to avoid data loss. I'm not sure if an already full HDFS will cause errors here. Can you try to (temporarily) add nodes? This will add HDFS capacity, so that decommissioning one node should be fine, giving you a way to increase the local disk capacity.

I'm not sure if the rebalancing needs to be triggered manually; I believe it will start automatically (causing additional load on the nodes during that time).
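Should the rebalancing need to be kicked off by hand after all, a minimal sketch would be the following (the 10% threshold is just an example value, not from the original reply):

```
# Move blocks until every DataNode is within 10% of the cluster-average utilisation
hdfs balancer -threshold 10
```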
						
					
02-09-2018 12:45 PM
It allows you to run brokers of different versions in one cluster, to avoid downtime of the cluster during the upgrade. Before you upgrade any broker, you set inter.broker.protocol.version to the existing version on all brokers. Then you upgrade broker by broker; the newer brokers will still use the 'old' protocol to communicate with the other brokers. This keeps the cluster functional while only some brokers are upgraded. Once all brokers are upgraded, you change inter.broker.protocol.version to the new version and restart them one by one. More details here: https://kafka.apache.org/documentation/#upgrade
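As a sketch of what that looks like in each broker's server.properties (the version numbers are placeholders, not taken from the reply):

```
# During the rolling upgrade: new binaries, but still speaking the old protocol
inter.broker.protocol.version=0.11.0

# After every broker runs the new binaries: bump the version and do one more rolling restart
inter.broker.protocol.version=1.0
```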
						
					
02-04-2018 01:39 PM
1 Kudo
@Jay Kumar SenSharma : Thanks for your answer, I simply wasn't aware that the process would change directory permissions; the only reason I used root to start it was to make sure that any issue I experienced wasn't due to missing permissions. In the meantime the service has stopped itself:

[root@cgihdp4 ~]# ambari-metrics-collector status
AMS is not running.
[root@cgihdp4 ~]# su - ams
[ams@cgihdp4 ~]$ ambari-metrics-collector status
AMS is not running.
[ams@cgihdp4 ~]$ ambari-metrics-collector start
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
Sun Feb  4 14:31:19 CET 2018 Starting HBase.
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
master is running as process 23182. Continuing
master running as process 23182. Stop it first.
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
Verifying ambari-metrics-collector process status...
Sun Feb  4 14:31:21 CET 2018 Collector successfully started.
Sun Feb  4 14:31:21 CET 2018 Initializing Ambari Metrics data model
...
[ams@cgihdp4 ~]$ ambari-metrics-collector status
AMS is running as process 22414.

I guess the permission denied is caused by what you just pointed out, so I will change this again, but I am confused about 'master is running as process 23182', which is the HBase Master running as user 'ams'; does that indicate an issue now? Otherwise nothing has changed, still no process listening on port 6188.
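A minimal sketch of the ownership fix being discussed (the paths and the hadoop group are assumptions, not taken from the thread):

```
# Restore ownership of the AMS log and pid directories after a root-started run changed them
chown -R ams:hadoop /var/log/ambari-metrics-collector /var/run/ambari-metrics-collector
```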
						
					