Member since 12-01-2016
25 Posts
1 Kudos Received
1 Solution

My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| 2216 | 03-26-2017 03:54 AM | 
			
    
	
		
		

08-04-2017 12:38 PM

@Jay SenSharma Thanks for getting back on this. The Ambari Agent details are below:

]$ ambari-agent --version
2.2.1.0
]$ rpm -qa|grep ambari-agent
ambari-agent-2.2.1.0-161.x86_64

It does seem that the issue described in the Jira is relevant to the issue that occurred. So far this issue has occurred only once, but migrating looks like a good option to avoid it in future.

Also, I had indicated that NameNode CPU WIO was N/A; after a few hours I am able to see the metric on the Dashboard.
						
					

08-04-2017 10:58 AM

The issue started with an alert on the Hive Metastore service:

Metastore on dh01.int.belong.com.au failed (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py", line 183, in execute
    timeout=int(check_command_timeout) )
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
Fail: Execution of 'export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://dh01.int.belong.com.au:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;'' returned 5. Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000002c0000000, 977797120, 0) failed; error='Cannot allocate memory' (errno=12)
Unable to determine Hadoop version information.
'hadoop version' returned:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000002c0000000, 977797120, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 977797120 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ambari-qa/hs_err_pid4858.log
)

I tried launching Hive from the command prompt (sudo hive); this errored out with a Java Runtime Environment exception.

I then looked at memory utilization, which showed that swap had run out:

]$ free -m
             total       used       free     shared    buffers     cached
Mem:         64560      63952        607          0         77        565
-/+ buffers/cache:      63309       1251
Swap:         1023       1023          0
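
A minimal sketch of how the swap check above could be automated, assuming the older procps `free -m` layout shown in this post. The function names `parse_free` and `swap_exhausted` and the `min_free_mb` cutoff are hypothetical, chosen only for illustration.

```python
def parse_free(text):
    """Parse `free -m` output into {'Mem': {...}, 'Swap': {...}} (values in MB)."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Only the "Mem:" and "Swap:" rows are kept; the header and the
        # "-/+ buffers/cache:" row are skipped.
        if sep and label in ("Mem", "Swap"):
            numbers = [int(tok) for tok in rest.split()]
            rows[label] = {"total": numbers[0], "used": numbers[1], "free": numbers[2]}
    return rows


def swap_exhausted(rows, min_free_mb=1):
    """True when swap is configured and less than min_free_mb of it is free."""
    swap = rows.get("Swap")
    return swap is not None and swap["total"] > 0 and swap["free"] < min_free_mb


# Output copied from the post above.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         64560      63952        607          0         77        565
-/+ buffers/cache:      63309       1251
Swap:         1023       1023          0
"""

if __name__ == "__main__":
    rows = parse_free(SAMPLE)
    print(rows["Swap"], "exhausted:", swap_exhausted(rows))
```

Running a check like this periodically could flag the condition before the JVM fails to allocate memory; the cutoff would need tuning for a real host.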

I tried to restart the Hive Metastore service from Ambari, but the operation hung for over 30 minutes without printing anything to the stdout or stderr logs. At this point I involved the server administrator in the investigation, and it turned out that the following process had reserved up to 40 GB. That seemed strange (I am not sure what the normal memory utilization pattern for the Ambari Agent/Monitor is):

root      3424  3404 14  2016 ?        52-22:05:00 /usr/bin/python2 /usr/lib/python2.6/site-packages/ambari_agent/main.py start

I then tried to restart the Ambari Metrics service on the NameNode from Ambari; the operation timed out and the heartbeat from the node stopped, as can be seen in the image.

I was not able to restart the Ambari Metrics service on the NameNode from the Ambari console, as the option was disabled. I tried a rolling restart of all Ambari Monitor services, but the Monitor service on the NameNode did not start.

At this point we decided to do two things: add more swap space (the admin added 1 GB more), and then stop and start the Ambari services as follows:

# The stop operation did not succeed at the first attempt, so I had to kill the PID
sudo su - ams -c '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf stop'
sudo su - ams -c '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start'
# I looked at the agent status
sudo ambari-agent status
# The agent was not running, so I started it
sudo ambari-agent start

After the agent started, the monitor on this node came up and was reflected in Ambari. The only remaining issue is that NameNode CPU WIO shows N/A on the Ambari Dashboard. It would be helpful to know how to get this back.

I also intend to review the HiveServer2 and Metastore heap sizes, which currently stand at the values below. Could these settings cause swap to run out like this? It has not happened before.

HiveServer2 Heap Size = 20480 MB
Metastore Heap Size = 12288 MB

Environment Information:
Hadoop 2.7.1.2.4.0.0-169
hive-meta-store - 2.4.0.0-169
hive-server2 - 2.4.0.0-169
hive-webhcat - 2.4.0.0-169
Ambari 2.2.1.0
RAM: 64 GB

Helpful links:
https://community.hortonworks.com/questions/15862/how-can-i-start-my-ambari-heartbeat.html
https://cwiki.apache.org/confluence/display/AMBARI/Metrics
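
As a rough worked example of the memory question raised in this post, the configured heap ceilings can be added up and compared with the RAM that `free -m` reported. This is only a sketch: heap sizes are upper bounds rather than actual usage, other services on the host are not counted, and treating the "up to 40 GB" reported for the agent as resident memory is an assumption. The helper name `budget` is hypothetical.

```python
RAM_MB = 64560  # "total" from the free -m output in this post

# Heap ceilings quoted in this post.
HEAPS_MB = {"HiveServer2": 20480, "Hive Metastore": 12288}

# "up to 40 GB" reported for the ambari-agent process; assumed resident here.
AGENT_MB = 40 * 1024


def budget(ram_mb, heaps_mb, other_mb=0):
    """Return the memory that could be committed and what would be left over."""
    committed = sum(heaps_mb.values()) + other_mb
    return {"committed_mb": committed, "headroom_mb": ram_mb - committed}


if __name__ == "__main__":
    print("heaps only      :", budget(RAM_MB, HEAPS_MB))
    print("heaps plus agent:", budget(RAM_MB, HEAPS_MB, AGENT_MB))
```

Under these assumptions the two heaps alone fit in RAM, while adding the agent's reported footprint exceeds it, which would be consistent with swap filling up.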
						
					

Labels: Apache Ambari

07-21-2017 07:32 AM

@mqureshi Thanks for getting back. I have reduced the HiveServer2 heap size to 20 GB and am observing the behavior. I intend to reduce it stepwise to 12 GB over the coming days.
						
					

07-19-2017 07:48 AM

I am facing Hive errors intermittently. The logs indicate garbage collection issues:

hiveserver2:
@dh01 hive]$ cat hiveserver2.log | grep 'GC'
        at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
        at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
2017-07-17 14:00:22,815 INFO  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1913ms
GC pool 'PS Scavenge' had collection(s): count=1 time=1961ms
2017-07-17 14:14:28,531 INFO  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1452ms
GC pool 'PS Scavenge' had collection(s): count=1 time=1701ms
2017-07-17 15:04:32,309 INFO  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1838ms
GC pool 'PS Scavenge' had collection(s): count=1 time=2195ms
2017-07-17 16:08:45,121 INFO  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1568ms
GC pool 'PS Scavenge' had collection(s): count=1 time=1707ms

hivemetastore:
@dh01 hive]$ cat hivemetastore.log | grep -i "GC pool"
GC pool 'PS Scavenge' had collection(s): count=1 time=3521ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=11097ms
GC pool 'PS Scavenge' had collection(s): count=1 time=37ms
@dh01 hive]$ cat hivemetastore.log | grep -i "JvmPauseMonitor"
2017-07-19 04:26:50,008 INFO  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@4f85aca0]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 3050ms
2017-07-19 11:01:32,392 WARN  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@4f85aca0]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(191)) - Detected pause in JVM or host machine (eg GC): pause of approximately 10915ms
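
A small sketch of how the JvmPauseMonitor lines above could be pulled out of a log and filtered by pause length, instead of reading the grep output by eye. The names `parse_pauses` and `long_pauses` are hypothetical, and the 10000 ms cutoff is an assumption chosen to match the one WARN line in this excerpt.

```python
import re

# Matches lines such as:
# 2017-07-19 11:01:32,392 WARN  [...]: ... pause of approximately 10915ms
PAUSE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"(?P<level>INFO|WARN)\s+.*pause of approximately (?P<ms>\d+)ms"
)


def parse_pauses(lines):
    """Return (timestamp, level, pause_ms) for every JvmPauseMonitor line."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append((match.group("ts"), match.group("level"), int(match.group("ms"))))
    return pauses


def long_pauses(pauses, threshold_ms=10000):
    """Keep only pauses at or above threshold_ms."""
    return [p for p in pauses if p[2] >= threshold_ms]


# Two lines copied from the hivemetastore.log excerpt above.
SAMPLE = [
    "2017-07-19 04:26:50,008 INFO  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@4f85aca0]: "
    "util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host "
    "machine (eg GC): pause of approximately 3050ms",
    "2017-07-19 11:01:32,392 WARN  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@4f85aca0]: "
    "util.JvmPauseMonitor (JvmPauseMonitor.java:run(191)) - Detected pause in JVM or host "
    "machine (eg GC): pause of approximately 10915ms",
]

if __name__ == "__main__":
    for ts, level, ms in long_pauses(parse_pauses(SAMPLE)):
        print(ts, level, ms)
```

Comparing the timestamps of the longest pauses with the times of the Ambari alerts may show whether the two line up.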
  
HiveServer2 Heap Size = 24210 MB (had been set already)
Metastore Heap Size = 12288 MB (changed from 8 GB previously)
Client Heap Size = 2 GB (changed from 1 GB previously)

I read the article below and the links it provides, which were helpful:
https://community.hortonworks.com/articles/14170/namenode-garbage-collection-configuration-best-pra.html

However, after changing the heap sizes as indicated, there were still instances where the HiveServer2 or Metastore service would go into alert in Ambari for a few seconds and then come back healthy. In those instances none of the following logs contained any errors:

hive.out
hive.log
hive-server2.out
hive-server2.log
hivemetastore.log
hiveserver2.log

Am I missing something? Would it help to set the HiveServer2 and Metastore heap sizes to the same value, i.e. HiveServer2 Heap Size = 12288 MB?

Environment:
Hadoop 2.7.1.2.4.0.0-169
hive-meta-store - 2.4.0.0-169
hive-server2 - 2.4.0.0-169
hive-webhcat - 2.4.0.0-169
Ambari 2.2.1.0
						
					

Labels: Apache Hive

06-29-2017 11:21 PM

Hi @ssathish, I looked at the link you posted and decided to delete the file.

CAUTION:
For some reason, a few hours later there were inconsistencies in the cluster. One of the data nodes (D5), where the cleanup was done, showed corruption in the way containers were processed. Some jobs for which containers were launched on D5 ran to completion successfully, while other jobs failed with a "Vertex failed" error. We could not find any errors in the RM, DataNode, or NodeManager logs.

We had to remove D5 from the cluster and reinstall the NodeManager to set things right.
						
					

06-26-2017 02:29 AM

I have a disk running full on one of my data nodes:

[ayguha@dh03 hadoop]$ sudo du -h --max-depth=1
674G    ./hdfs
243G    ./yarn
916G    .
  [xx@dh03 local]$ sudo du -h --max-depth=1
1.4G    ./filecache
3.2G    ./usercache
68K     ./nmPrivate
242G    .

There are over 1k .tmp files accumulating in /data/hadoop/yarn/local:

[ayguha@dh03 local]$ ls -l *.tmp | wc -l
1055
./optimized-preview-record-buffer-2808068b-4d54-492e-a31a-385065d25a408826610818023522318.tmp
./preview-record-buffer-24a7477f-01f0-427e-a032-54866df48b197825057363055390034.tmp
./preview-record-buffer-b22020bb-6ec2-4f73-9d65-65dbba50136e527236496621902098.tmp
[ayguha@dh03 local]$ find ./*preview-record-buffer* -type f -mtime +90 | wc -l
973

Nearly 1k of these files are older than 3 months. Is it safe to delete them?

ENV:
Hadoop 2.7.1.2.4.0.0-169
HDP 2.4 
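
A hedged sketch of the `find ... -mtime +90` check above, written so that it only lists candidates and never deletes anything. The function name `stale_files` and its parameters are hypothetical; the age test mirrors `-mtime +90` by counting whole 24-hour periods and requiring more than 90 of them. Unlike `find`, it does not descend into matching directories.

```python
import fnmatch
import os
import time


def stale_files(directory, pattern="*preview-record-buffer*", older_than_days=90, now=None):
    """List regular files in `directory` matching `pattern` that are older
    than `older_than_days` whole days. Nothing is deleted."""
    now = time.time() if now is None else now
    hits = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        if not fnmatch.fnmatch(entry.name, pattern):
            continue
        mtime = entry.stat(follow_symlinks=False).st_mtime
        age_days = int((now - mtime) // 86400)  # whole days, fraction dropped
        if age_days > older_than_days:
            hits.append(entry.path)
    return sorted(hits)


if __name__ == "__main__":
    # Path taken from the post; adjust for the node being inspected.
    for path in stale_files("/data/hadoop/yarn/local"):
        print(path)
```

Reviewing such a list first, and checking whether any running container still holds the files open, seems prudent before removing anything.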
						
					

Labels: Apache YARN

05-29-2017 06:17 AM

@mqureshi
The cluster currently has only one active NameNode.
Is there a better way to find out which node is active?
I also tried the following, but it does not distinguish between them:

curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active

dh01 ~]$ curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active
[1] 16533
-bash: metrics/dfs/FSNamesystem/HAState=active: No such file or directory
[ayguha@dh01 ~]$ {
  "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE",
  "items" : [
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh01.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au"
      }
    },
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh02.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au"
      }
    }
  ]
}
  Also hdfs-site.xml does not have the property dfs.namenode.rpc-address.
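
The `[1] 16533` and `-bash: metrics/dfs/FSNamesystem/HAState=active: No such file or directory` lines in the transcript above are what bash prints when an unquoted `&` splits a command: everything before it runs in the background and the rest is treated as a separate command. That means the HAState predicate never reached Ambari, which would explain why both NameNodes were returned. The sketch below builds the same URL and a quoted curl command line; `build_url` and `curl_command` are hypothetical helper names, and whether the API then returns only the active NameNode is not confirmed here.

```python
import shlex

# Base URL taken from the post.
BASE = "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components"


def build_url(base, predicates):
    """Join query predicates onto the base URL with '&'."""
    return base + "?" + "&".join(predicates)


def curl_command(url, user="admin:admin"):
    """Return a curl command line with the URL quoted for the shell,
    so that '&' and '?' are passed to curl instead of being interpreted."""
    return "curl --user {} {}".format(shlex.quote(user), shlex.quote(url))


if __name__ == "__main__":
    url = build_url(BASE, [
        "HostRoles/component_name=NAMENODE",
        "metrics/dfs/FSNamesystem/HAState=active",
    ])
    print(curl_command(url))
```

The printed command wraps the URL in single quotes, so the full query string is sent as one argument.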
 
						
					

05-29-2017 05:36 AM

@mqureshi
Command (tried directly, without pushing it to the background):
sudo -u hdfs hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5

[ayguha@dh01 ~]$ sudo -u hdfs hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5
17/05/29 15:29:39 INFO balancer.Balancer: Using a threshold of 5.0
17/05/29 15:29:39 INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1, hdfs://belongcluster1:8020]
17/05/29 15:29:39 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/29 15:29:39 INFO balancer.Balancer: included nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: excluded nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/29 15:29:41 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:41 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:41 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...
May 29, 2017 3:29:42 PM  Balancing took 3.035 seconds

Error:
17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...

I also checked whether a balancer process is stuck. From the output, it does not look like anything is hanging from previous attempts:

dh01 ~]$ ps -ef | grep "balancer"
ayguha    4611  2551  0 15:34 pts/0    00:00:00 grep balancer
dh01 ~]$ hdfs dfs -ls /system/balancer.id
ls: `/system/balancer.id': No such file or directory
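
One detail in the output above that may be worth checking is the `namenodes = [...]` line, which lists `hdfs://belongcluster1` and `hdfs://belongcluster1:8020` as if they were two separate NameNodes. The sketch below extracts that list from the log line and groups the URIs by host so such duplicates stand out. The function names are hypothetical, and whether this duplication is what triggers "Another Balancer is running" is an assumption that this thread does not confirm.

```python
import re
from urllib.parse import urlparse


def namenode_uris(log_line):
    """Extract the URIs from a balancer 'namenodes = [...]' log line."""
    match = re.search(r"namenodes\s*=\s*\[(.*?)\]", log_line)
    if not match:
        return []
    return [uri.strip() for uri in match.group(1).split(",") if uri.strip()]


def duplicated_services(uris):
    """Group URIs by host name and return the groups with more than one entry."""
    groups = {}
    for uri in uris:
        groups.setdefault(urlparse(uri).hostname, []).append(uri)
    return {host: group for host, group in groups.items() if len(group) > 1}


# Line copied from the balancer output above.
LINE = ("17/05/29 15:29:39 INFO balancer.Balancer: namenodes  = "
        "[hdfs://belongcluster1, hdfs://belongcluster1:8020]")

if __name__ == "__main__":
    print(duplicated_services(namenode_uris(LINE)))
```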
 
						
					

05-29-2017 03:40 AM

@mqureshi
I found another thread with a similar issue:
https://community.hortonworks.com/questions/22105/hdfs-balancer-is-getting-failed-after-30-mins-in-a.html
It indicates that if HA is enabled, one needs to remove dfs.namenode.rpc-address.
I ran a check on the Ambari Server using configs.sh:

/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin -port 8080 get dh01.int.belong.com.au belong1 hdfs-site

The output does not contain the dfs.namenode.rpc-address property:

########## Performing 'GET' on (Site:hdfs-site, Tag:version1470359698835)
"properties" : {
"dfs.block.access.token.enable" : "true",
"dfs.blockreport.initialDelay" : "120",
"dfs.blocksize" : "134217728",
"dfs.client.block.write.replace-datanode-on-failure.enable" : "NEVER",
"dfs.client.failover.proxy.provider.belongcluster1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.client.read.shortcircuit" : "true",
"dfs.client.read.shortcircuit.streams.cache.size" : "4096",
"dfs.client.retry.policy.enabled" : "false",
"dfs.cluster.administrators" : " hdfs",
"dfs.content-summary.limit" : "5000",
"dfs.datanode.address" : "0.0.0.0:50010",
"dfs.datanode.balance.bandwidthPerSec" : "6250000",
"dfs.datanode.data.dir" : "/data/hadoop/hdfs/data",
"dfs.datanode.data.dir.perm" : "750",
"dfs.datanode.du.reserved" : "1073741824",
"dfs.datanode.failed.volumes.tolerated" : "0",
"dfs.datanode.http.address" : "0.0.0.0:50075",
"dfs.datanode.https.address" : "0.0.0.0:50475",
"dfs.datanode.ipc.address" : "0.0.0.0:8010",
"dfs.datanode.max.transfer.threads" : "16384",
"dfs.domain.socket.path" : "/var/lib/hadoop-hdfs/dn_socket",
"dfs.encrypt.data.transfer.cipher.suites" : "AES/CTR/NoPadding",
"dfs.encryption.key.provider.uri" : "",
"dfs.ha.automatic-failover.enabled" : "true",
"dfs.ha.fencing.methods" : "shell(/bin/true)",
"dfs.ha.namenodes.belongcluster1" : "nn1,nn2",
"dfs.heartbeat.interval" : "3",
"dfs.hosts.exclude" : "/etc/hadoop/conf/dfs.exclude",
"dfs.http.policy" : "HTTP_ONLY",
"dfs.https.port" : "50470",
"dfs.journalnode.edits.dir" : "/hadoop/hdfs/journal",
"dfs.journalnode.https-address" : "0.0.0.0:8481",
"dfs.namenode.accesstime.precision" : "0",
"dfs.namenode.acls.enabled" : "true",
"dfs.namenode.audit.log.async" : "true",
"dfs.namenode.avoid.read.stale.datanode" : "true",
"dfs.namenode.avoid.write.stale.datanode" : "true",
"dfs.namenode.checkpoint.dir" : "/tmp/hadoop/hdfs/namesecondary",
"dfs.namenode.checkpoint.edits.dir" : "${dfs.namenode.checkpoint.dir}",
"dfs.namenode.checkpoint.period" : "21600",
"dfs.namenode.checkpoint.txns" : "1000000",
"dfs.namenode.fslock.fair" : "false",
"dfs.namenode.handler.count" : "200",
"dfs.namenode.http-address" : "dh01.int.belong.com.au:50070",
"dfs.namenode.http-address.belongcluster1.nn1" : "dh01.int.belong.com.au:50070",
"dfs.namenode.http-address.belongcluster1.nn2" : "dh02.int.belong.com.au:50070",
"dfs.namenode.https-address" : "dh01.int.belong.com.au:50470",
"dfs.namenode.https-address.belongcluster1.nn1" : "dh01.int.belong.com.au:50470",
"dfs.namenode.https-address.belongcluster1.nn2" : "dh02.int.belong.com.au:50470",
"dfs.namenode.name.dir" : "/data/hadoop/hdfs/namenode",
"dfs.namenode.name.dir.restore" : "true",
"dfs.namenode.rpc-address.belongcluster1.nn1" : "dh01.int.belong.com.au:8020",
"dfs.namenode.rpc-address.belongcluster1.nn2" : "dh02.int.belong.com.au:8020",
"dfs.namenode.safemode.threshold-pct" : "0.99",
"dfs.namenode.shared.edits.dir" : "qjournal://dh03.int.belong.com.au:8485;dh02.int.belong.com.au:8485;dh01.int.belong.com.au:8485/belongcluster1",
"dfs.namenode.stale.datanode.interval" : "30000",
"dfs.namenode.startup.delay.block.deletion.sec" : "3600",
"dfs.namenode.write.stale.datanode.ratio" : "1.0f",
"dfs.nameservices" : "belongcluster1",
"dfs.permissions.enabled" : "true",
"dfs.permissions.superusergroup" : "hdfs",
"dfs.replication" : "3",
"dfs.replication.max" : "50",
"dfs.support.append" : "true",
"dfs.webhdfs.enabled" : "true",
"fs.permissions.umask-mode" : "022",
"nfs.exports.allowed.hosts" : "* rw",
"nfs.file.dump.dir" : "/tmp/.hdfs-nfs"
}
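
A short sketch of the check described above: given the properties as a dictionary, it separates the plain `dfs.namenode.rpc-address` key from the per-NameNode variants that carry a nameservice and NameNode ID suffix. The helper name `rpc_address_keys` is hypothetical, and the dictionary holds only a few entries copied from the listing.

```python
PREFIX = "dfs.namenode.rpc-address"


def rpc_address_keys(properties):
    """Return (plain_value, per_namenode) where plain_value is the value of the
    unsuffixed key (or None) and per_namenode maps suffixed keys to values."""
    plain = properties.get(PREFIX)
    per_namenode = {
        key: value for key, value in properties.items() if key.startswith(PREFIX + ".")
    }
    return plain, per_namenode


# A subset of the hdfs-site properties shown above.
PROPS = {
    "dfs.nameservices": "belongcluster1",
    "dfs.ha.namenodes.belongcluster1": "nn1,nn2",
    "dfs.namenode.rpc-address.belongcluster1.nn1": "dh01.int.belong.com.au:8020",
    "dfs.namenode.rpc-address.belongcluster1.nn2": "dh02.int.belong.com.au:8020",
}

if __name__ == "__main__":
    plain, per_namenode = rpc_address_keys(PROPS)
    print("plain key present:", plain is not None)
    for key in sorted(per_namenode):
        print(key, "=", per_namenode[key])
```

For this configuration the plain key is absent and only the nn1 and nn2 entries exist, which matches what the configs.sh output shows.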
  
Are you suggesting that I keep just one NameNode service address and point it to the primary NameNode host:port? Something like the below:

    <property>
      <name>dfs.namenode.rpc-address.belongcluster1</name>
      <value>dh01.int.belong.com.au:8020</value>
    </property>
 
						
					

05-29-2017 02:39 AM

@mqureshi
Regarding this article: https://community.hortonworks.com/articles/4595/balancer-not-working-in-hdfs-ha.html
My hdfs-site.xml has two entries. I am not sure whether I need to delete both or only the nn2 one.

    <property>
      <name>dfs.namenode.rpc-address.belongcluster1.nn1</name>
      <value>dh01.int.belong.com.au:8020</value>
    </property>
    
    <property>
      <name>dfs.namenode.rpc-address.belongcluster1.nn2</name>
      <value>dh02.int.belong.com.au:8020</value>
    </property>
 
						
					