Member since 06-28-2017

279 Posts
43 Kudos Received
24 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2564 | 12-24-2018 08:34 AM |
|  | 6357 | 12-24-2018 08:21 AM |
|  | 2949 | 08-23-2018 07:09 AM |
|  | 11949 | 08-21-2018 05:50 PM |
|  | 6183 | 08-20-2018 10:59 AM |

02-14-2018 11:57 AM
Not sure what your plan is. If you decommission a data node while preventing the rebalancing from taking place, that could lead to data loss; it will certainly leave some file blocks without redundant storage. So either delete some data on your HDFS to allow the rebalancing to succeed, or add some capacity (e.g. a new temporary node) to HDFS before decommissioning the data node.
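Before deciding, it can help to check how much capacity and how many live nodes are actually available; a quick way to do that (a sketch, not part of the original reply) is:

```
# Summarises configured capacity, DFS used/remaining and per-DataNode status
hdfs dfsadmin -report
```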
						
					
02-09-2018 06:10 PM
I tried some things. After changing permissions on the HDFS trash and cleaning up the dirs again as per https://community.hortonworks.com/questions/121137/ambari-metrics-collector-restarting-again-and-agai.html I have been able to start the Ambari Metrics Collector, and it looks like it is running continuously now. Still, when I turn off maintenance mode, I get the alert back:

Connection failed: [Errno 111] Connection refused to cgihdp4.localnet:6188

As far as I know, 6188 is the port of the timeline server. When checking this, the timeline server service is not even installed on cgihdp4, but it is up and running on cgihdp1. So I searched for the config of the timeline server, which in Ambari sits under Advanced ams-site -> timeline.metrics.service.webapp.address, and the address there was, unsurprisingly, cgihdp4.localnet:6188. I changed it to cgihdp1.localnet:6188, restarted the Metrics Collector, and things are running smoothly.

So basically just a stupid config error, embarrassing, but many thanks @Jay Kumar SenSharma for supporting me on this issue.
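For reference, the corrected setting (value taken from the reply above; in practice it is edited through the Ambari UI under Advanced ams-site) boils down to:

```
# Advanced ams-site: point the collector at the host that actually runs the timeline server
timeline.metrics.service.webapp.address=cgihdp1.localnet:6188
```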
						
					
02-09-2018 02:58 PM
In Ambari, go to the host details; there you can click on the button to the right of the 'DataNode HDFS' service line (see screenshot-decomission.png). You should turn on maintenance mode beforehand to avoid alerts.
						
					
02-09-2018 02:54 PM
For the trash dir, also try executing the command without the trailing /.
						
					
02-09-2018 02:48 PM
One question: have you performed an upgrade of HDFS? You may also want to check with:

hdfs fsck / -includeSnapshots
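If more detail is needed, fsck can also print per-file block information; an illustrative extension of the command above (not part of the original reply):

```
# List files, their blocks and the DataNodes holding each replica, including snapshots
hdfs fsck / -includeSnapshots -files -blocks -locations
```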
						
					
02-09-2018 02:41 PM
The message could be caused by a process that is still (or already) accessing the file. Try to check if this is the case with:

lsof | grep /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038

The first three columns are:
- command
- process id
- user

If there is a process locking the file, this should help you identify it.
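To narrow the output down to just those three columns, something like this works (a sketch assuming awk is available on the node; not part of the original reply):

```
# Print only command, PID and user of processes holding the block pool directory open
lsof | grep /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038 | awk '{print $1, $2, $3}'
```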
						
					
02-09-2018 02:22 PM
1 Kudo
If you want to see the usage within DFS, this should provide you with the disk usage:

hdfs dfs -du -h /

To see the size of the trash dir, use this command:

hdfs dfs -du -h

To add a new disk (in the normal case), you typically decommission the DataNode service on the worker node, add the disk and recommission it, but HDFS will try to replicate the blocks from that node to the other nodes to avoid data loss. I'm not sure if an already full HDFS will cause errors here. Can you try to (temporarily) add nodes? This will add HDFS capacity, so that decommissioning one node should be fine, giving you a way to increase the local disk capacity.

I'm not sure if the rebalancing needs to be triggered manually; I believe it will start automatically (causing additional load on the nodes during that time).
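Should the rebalancing need to be kicked off by hand after all, a minimal sketch would be the following (the 10% threshold is just an example value, not from the original reply):

```
# Move blocks until every DataNode is within 10% of the cluster-average utilisation
hdfs balancer -threshold 10
```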
						
					
02-09-2018 12:45 PM
It allows you to run brokers of different versions in one cluster, to avoid downtime of the cluster during the upgrade. Before you upgrade any broker, you set inter.broker.protocol.version to the existing version on all brokers. Then you upgrade broker by broker; the newer brokers will still use the 'old' protocol to communicate with the other brokers. This keeps the cluster functional while only some brokers are upgraded. Once all brokers are upgraded, you change inter.broker.protocol.version to the new version and restart them one by one. More details here: https://kafka.apache.org/documentation/#upgrade
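As a sketch of what that looks like in each broker's server.properties (the version numbers are placeholders, not taken from the reply):

```
# During the rolling upgrade: new binaries, but still speaking the old protocol
inter.broker.protocol.version=0.11.0

# After every broker runs the new binaries: bump the version and do one more rolling restart
inter.broker.protocol.version=1.0
```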
						
					
02-04-2018 01:39 PM
1 Kudo
@Jay Kumar SenSharma : Thanks for your answer, I simply wasn't aware that the process would change directory permissions; the only reason I used root to start it was to make sure that any issue I experienced wasn't due to missing permissions. In the meantime the service has stopped itself:

[root@cgihdp4 ~]# ambari-metrics-collector status
AMS is not running.
[root@cgihdp4 ~]# su - ams
[ams@cgihdp4 ~]$ ambari-metrics-collector status
AMS is not running.
[ams@cgihdp4 ~]$ ambari-metrics-collector start
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
Sun Feb  4 14:31:19 CET 2018 Starting HBase.
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
master is running as process 23182. Continuing
master running as process 23182. Stop it first.
tee: /var/log/ambari-metrics-collector/ambari-metrics-collector-startup.out: Permission denied
Verifying ambari-metrics-collector process status...
Sun Feb  4 14:31:21 CET 2018 Collector successfully started.
Sun Feb  4 14:31:21 CET 2018 Initializing Ambari Metrics data model
...
[ams@cgihdp4 ~]$ ambari-metrics-collector status
AMS is running as process 22414.

I guess the permission denied is caused by what you just pointed out, so I will change this again, but I am confused about 'master is running as process 23182', which is the HBase Master running as user 'ams'; does that indicate an issue now? Otherwise nothing has changed, still no process listening on port 6188.
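A minimal sketch of the ownership fix being discussed (the paths and the hadoop group are assumptions, not taken from the thread):

```
# Restore ownership of the AMS log and pid directories after a root-started run changed them
chown -R ams:hadoop /var/log/ambari-metrics-collector /var/run/ambari-metrics-collector
```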
						
					