Member since 04-08-2016

29 Posts
2 Kudos Received
5 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3019 | 12-08-2017 02:19 PM |
|  | 14546 | 01-04-2017 02:58 PM |
|  | 10221 | 12-08-2016 07:14 AM |
|  | 7539 | 12-08-2016 07:12 AM |
|  | 5809 | 06-14-2016 07:38 AM |
			
    
	
		
		
05-21-2018 07:21 AM

Yes! There was a snapshot. Thank you!
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
05-18-2018 03:12 PM
1 Kudo

CDH 5.13.1, Red Hat 6.9

We wish to change the replication factor on one particular folder in HDFS from the default of 3 copies to 2.

After running this on one cluster:

$ hdfs dfs -setrep -R 2 /backups

and then doing a

$ hdfs dfs -du /

we saw that it freed the blocks very quickly, and the output of fsck shows no over-replicated blocks:

Status: HEALTHY
 Total size:    149514016589 B
 Total dirs:    27440
 Total files:    128746
 Total symlinks:        0
 Total blocks (validated):    126355 (avg. block size 1183285 B)
 Minimally replicated blocks:    126355 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.3367577
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1

However, on a bigger test system we ran the same command, and even a day later there is still no change.

$ hdfs fsck /

shows over-replicated blocks:

Status: HEALTHY
 Total size:    56614841380 B
 Total dirs:    7222
 Total files:    113731
 Total symlinks:        0
 Total blocks (validated):    110143 (avg. block size 514012 B)
 Minimally replicated blocks:    110143 (100.0 %)
 Over-replicated blocks:    37439 (33.991264 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.9921465
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        8
 Number of racks:        1

The number of over-replicated blocks has dropped slightly but seems stuck at 37439. I've manually restarted each DataNode service, and later restarted the entire cluster. Still stuck at 37439.

I found this comment from Harsh J:

|Then monitor the over-replicated blocks in Cloudera Manager via the below chart tsquery:
|
|SELECT excess_blocks WHERE roleType = NAMENODE
|
|This should show a spike and then begin a slow but steady drop back to zero over time, which you can monitor.

but when I run this query it reports "excess_blocks" is 0.

$ hdfs dfs -du /
22987202359  69376013863  /backups

still shows 3 copies.

How do we get this data space cleared? Rebalance did nothing.

Thanks.

Labels: HDFS
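A minimal follow-up check, assuming the same /backups path from the post and the standard `hdfs dfs -ls` output format: after `hdfs dfs -setrep -R 2 /backups`, the replication column (the second field of the listing) should read 2 for every file under the folder. Any file still listed at 3 never had its target replication changed, which is a separate condition from the DataNodes lagging on deleting the excess replicas.

```bash
# List every file under /backups whose recorded target replication is still 3.
# (Field 2 of 'hdfs dfs -ls' is the replication factor; directories show '-'.)
hdfs dfs -ls -R /backups | awk '$2 == "3"'
```

An empty result means the metadata change applied everywhere and only replica deletion is behind.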
 
			
    
	
		
		
12-08-2017 02:19 PM

This problem is HDFS-9530, which has a fix in CDH 5.9.0. Bouncing the DN instances cleared this issue manually until we upgrade.
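A rough monitoring sketch to go with the workaround above, assuming a CDH 5 NameNode web UI on its default port (the hostname is a placeholder and Python is available on the box): the NameNode's FSNamesystem JMX bean exposes an ExcessBlocks counter, so while bouncing DataNodes you can watch it fall toward zero without relying on the Cloudera Manager chart.

```bash
# Poll the NameNode's ExcessBlocks JMX metric once a minute until it reaches zero.
while true; do
  excess=$(curl -s 'http://namenode.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
    | python -c 'import json,sys; print(json.load(sys.stdin)["beans"][0]["ExcessBlocks"])')
  echo "$(date '+%F %T') ExcessBlocks=$excess"
  [ "$excess" -eq 0 ] && break
  sleep 60
done
```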
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-07-2017 09:53 AM

Seeing this issue on all data nodes. Example for one node:

hadoop has its own partition
bash 'du -h --max-depth=1' in the hadoop partition reports the 'dn' directory is consuming 207G
bash 'df -h' reports the hadoop partition size 296G, used 208G, avail 73G, use% 75%

Configured Capacity: 314825441690 (293.20 GB)  -- good
DFS Used: 221825508284 (206.59 GB)  -- good
Non DFS Used: 55394479116 (51.59 GB)  -- ??? bash says 1G used outside of the 'dn' directory in the partition
DFS Remaining: 37605454290 (35.02 GB)  -- ??? bash says 73G free
DFS Used%: 70.46%
DFS Remaining%: 11.94%

fsck reports healthy

Red Hat 6.9
5.8.2-1.cdh5.8.2.p0.3

dfs.datanode.du.reserved == 1.96 GiB

How do I troubleshoot this? Thanks.

hdfs dfsadmin -report
Configured Capacity: 1574127208450 (1.43 TB)
Present Capacity: 1277963063885 (1.16 TB)
DFS Remaining: 410632669242 (382.43 GB)
DFS Used: 867330394643 (807.76 GB)
DFS Used%: 67.87%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

hdfs fsck /
 Total size:    281353009325 B
 Total dirs:    5236
 Total files:   501295
 Total symlinks:                0 (Files currently being written: 37)
 Total blocks (validated):      501272 (avg. block size 561278 B)
 Minimally replicated blocks:   501272 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          5
 Number of racks:               1

Labels: HDFS
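A sanity check on the figures quoted above, as a sketch rather than new data: in HDFS releases of this vintage, the NameNode derives "Non DFS Used" from the other three numbers rather than measuring it, so an unexpectedly low DFS Remaining shows up as inflated Non DFS Used.

```bash
# Non DFS Used = Configured Capacity - DFS Used - DFS Remaining
# Plugging in the numbers reported for this DataNode:
echo $((314825441690 - 221825508284 - 37605454290))   # prints 55394479116, the reported Non DFS Used
```

Seen that way, the puzzle is less about Non DFS Used and more about why DFS Remaining (35.02 GB) is so far below the 73G that df reports free; the 1.96 GiB dfs.datanode.du.reserved only accounts for a small part of that gap.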
 
			
    
	
		
		
01-06-2017 09:35 AM

The root cause seems to be that there are two 'textarea' boxes for the parameter 'Java Configuration Options for NodeManager', and if these do not contain the same value, then the NodeManager will not start.

These are the two boxes:

NODEMANAGER Imported From: TaskTracker (1)
NODEMANAGER Imported From: TaskTracker Default Group

Shouldn't Cloudera Manager disallow this condition, or protect the user from it happening in the first place?

Thanks.

(What the JVM actually receives may well be an empty string for this parameter when the two boxes do not match; just a guess.)
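One hedged way to spot this kind of divergence without hunting through the UI, assuming a reachable Cloudera Manager API (the host, credentials, API version, and the yarn-NODEMANAGER-BASE group name below are all placeholders): each 'box' corresponds to a NodeManager role config group, so listing the groups and dumping each group's config lets you compare the java options values side by side.

```bash
CM='http://cm-host.example.com:7180/api/v13'
# List the YARN role config groups; the two "Imported From: TaskTracker" groups appear here.
curl -s -u admin:admin "$CM/clusters/Cluster%201/services/yarn/roleConfigGroups" | grep '"name"'
# Dump one group's full config and look at its NodeManager java options entry;
# repeat for each NODEMANAGER group found above and compare the values.
curl -s -u admin:admin \
  "$CM/clusters/Cluster%201/services/yarn/roleConfigGroups/yarn-NODEMANAGER-BASE/config?view=full" \
  | grep -i -B1 -A3 'java'
```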
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
01-04-2017 02:58 PM

Found two values for the search "java configuration options for nodemanager".

Copied / pasted to make them the same (we had added JMX parameters).

This seems to have fixed it. Needs verification.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
01-04-2017 02:40 PM

Thanks for your reply.

We have 5 nodes configured to run NodeManager. 1 works, but 4 fail.

If the "Java Configuration Options for NodeManager" were an empty string, then none should start, correct? It's not empty.

If you have other ideas, we would appreciate them.

Thanks.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-14-2016 09:57 AM

Thanks for the quick response.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)

# uname -a
Linux hostname 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Cloudera Manager version:
Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)

CDH Parcel version:
CDH 5 5.8.2-1.cdh5.8.2.p0.3 Distributed, Activated

# cat /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-hostname.log.out
http://pastebin.com/iu4hR03Q
==> we assume the SIGTERM is caused by the Cloudera Agent (perhaps giving up waiting on some indication that the NM is running properly)

# cat /var/log/cloudera-scm-agent/cloudera-scm-agent.out
http://pastebin.com/8StbBsj4
==> there are errors in here ('ValueError: dictionary update sequence element #25 has length 1; 2 is required' and 'MainThread agent ERROR Failed to activate')

There is no 'logs' directory within the process/*NODEMANAGER* directory (so no stderr to be found):

# find /var/run/cloudera-scm-agent/process | grep 'logs\|NODEMANAGER'
/var/run/cloudera-scm-agent/process/573-zookeeper-server/logs
/var/run/cloudera-scm-agent/process/573-zookeeper-server/logs/stderr.log
/var/run/cloudera-scm-agent/process/573-zookeeper-server/logs/stdout.log
/var/run/cloudera-scm-agent/process/585-hdfs-DATANODE/logs
/var/run/cloudera-scm-agent/process/585-hdfs-DATANODE/logs/stderr.log
/var/run/cloudera-scm-agent/process/585-hdfs-DATANODE/logs/stdout.log
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/cloudera-monitor.properties
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/cloudera-stack-monitor.properties
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/container-executor.cfg
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/core-site.xml
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/event-filter-rules.json
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/hadoop-metrics2.properties
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/hadoop-policy.xml
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/hdfs-site.xml
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/http-auth-signature-secret
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/log4j.properties
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/mapred-site.xml
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/redaction-rules.json
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/ssl-client.xml
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/ssl-server.xml
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/topology.map
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/topology.py
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/yarn.keytab
/var/run/cloudera-scm-agent/process/593-yarn-NODEMANAGER/yarn-site.xml
/var/run/cloudera-scm-agent/process/604-impala-IMPALAD/logs
/var/run/cloudera-scm-agent/process/604-impala-IMPALAD/logs/stderr.log
/var/run/cloudera-scm-agent/process/604-impala-IMPALAD/logs/stdout.log

So we are using Cloudera Manager. When restarting the cluster, YARN fails to start but all other services start OK.

When we drill into the YARN 'Instances' page we find:
JobHistory Server: running
ResourceManager (Active): running
ResourceManager (Standby): running
NodeManager: running
NodeManager: stopped
NodeManager: stopped
NodeManager: stopped
NodeManager: stopped

with Status showing 4 "Down" and 4 "Good Health".

If we select one of the stopped NodeManager instances and attempt to manually start it, the pastebin logs above are what we see. There is no log directory created, and thus no stderr. Cloudera Manager waits for it to start but eventually marks it failed.

We are planning to deploy HA to production, and this is our test run on the QA lab system. This failure is now blocking us from proceeding with our production HA deployment.

Frankly we don't even use YARN (or MapReduce). At this point we only use HDFS and Impala. YARN seems to be a dependency for Hive and Impala.

If we are not using YARN/MR and we can decommission these 4 failed NM instances, can the system run with a single HA pair of RMs and just one instance of NM? (It would at least make Cloudera Manager happy, with green status and no failures upon cluster restarts.)

Thanks.
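A small sketch of the check implied above, using the agent process directory layout already shown in the find output (the paths and glob pattern follow the post; the newest-directory heuristic is an assumption): pick the most recent NODEMANAGER process directory the agent created and see whether a logs/ directory with stderr.log ever appeared for that start attempt.

```bash
# Newest NODEMANAGER process directory created by the Cloudera Manager agent.
nm_dir=$(ls -dt /var/run/cloudera-scm-agent/process/*NODEMANAGER* 2>/dev/null | head -1)
echo "latest attempt: $nm_dir"
# If the role got far enough to launch, stderr.log and stdout.log live under logs/.
ls -l "$nm_dir/logs" 2>/dev/null || echo "no logs directory was created for this attempt"
```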
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-13-2016 06:49 PM

# uname -a
Linux hostname 2.6.32-642.6.2.el6.x86_64 #1 SMP Mon Oct 24 10:22:33 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Version: Cloudera Express 5.8.2 (#17 built by jenkins on 20160916-1426 git: d23c620f3a3bbd85d8511d6ebba49beaaab14b75)

CDH 5 5.8.2-1.cdh5.8.2.p0.3 Distributed, Activated

While reconfiguring for high availability, 4 of 5 NodeManagers now won't start. There is no stderr file.

http://pastebin.com/iu4hR03Q
http://pastebin.com/8StbBsj4

I've tried removing the roles, then re-adding the roles.

Deleted all files in:
/var/lib/hadoop-yarn/yarn-nm-recovery/
/var/yarn/

Confirmed owners matched the working node. No luck so far.

Labels: Apache YARN
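A minimal sketch of the ownership comparison described above, assuming SSH access and placeholder hostnames (good-node is the NodeManager host that starts, bad-node one that does not); the two directories are the ones named in the post.

```bash
# Compare owner, group and mode of the NodeManager recovery/local dirs between hosts.
for d in /var/lib/hadoop-yarn/yarn-nm-recovery /var/yarn; do
  echo "== $d =="
  ssh good-node "stat -c '%U:%G %a %n' $d"
  ssh bad-node  "stat -c '%U:%G %a %n' $d"
done
```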
 
			
    
	
		
		
12-08-2016 07:14 AM

Increasing the Catalog Server heap resolved this problem.

However, there should be a JIRA opened against the Impala daemon: if the Catalog Server misbehaves, the Impala daemon should not leave queries stuck 'in flight' forever while consuming one CPU at 100% (it consumes an entire CPU for every stuck query).
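A quick way to see the symptom described above on a live node, as a rough sketch (the pgrep pattern is an assumption about how impalad appears in the process table): take one snapshot of per-thread CPU usage inside the impalad process and look for threads pinned near 100%, one per stuck query.

```bash
# Show per-thread CPU usage for the local impalad process (first 20 lines of one batch snapshot).
top -b -n 1 -H -p "$(pgrep -f impalad | head -1)" | head -20
```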