Load not distributed in the cluster
Labels: Apache Hadoop, Apache YARN
Created ‎08-03-2016 12:28 PM
We have a 5-node cluster (2 masters and 3 slaves) and we are running MapReduce jobs. However, we always see that only 2 nodes get loaded and utilized, while the other node remains idle. What could be the reasons for this? All 3 slave nodes are in the same rack.
Created ‎08-04-2016 04:00 AM
Please try the following; it helped me.
1) Log in to the Ambari Web UI and go to HDFS -> Quick Links -> NameNode UI -> Datanodes.
Check the capacity of each DataNode, how much is used, and how much storage is left, to see whether the blocks are not distributed equally. The job's tasks run only on the DataNodes that hold the data, hence it is running on two nodes only. (A quick command to check per-DataNode usage is sketched at the end of this reply.)
2) While running the MR application, YARN always tries to achieve data locality during the job run.
3) Run the HDFS balancer on the cluster so that the data gets distributed across the DataNodes.
4) After the balancer has completed, check again how the jobs are running.
If the jobs still behave the same way, please update the thread.
If this is helpful, accepting the answer is appreciated.
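For example, to see per-DataNode capacity, used space, and remaining space from the command line (assuming shell access to a cluster node with the HDFS client configured; the hdfs service user may differ in your environment):

# Prints configured capacity, DFS used, and DFS remaining for every DataNode
sudo -u hdfs hdfs dfsadmin -report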
Created ‎08-03-2016 01:00 PM
Hi @ARUN
The main reason might be that the data blocks needed for the MapReduce job are located on those two nodes only.
Can you please check the block locations of the file you are processing and verify that the data is distributed across all 3 nodes? (A sample command is sketched below.)
It may also be that speculative execution (the case where, if your nodes are too busy running the tasks, the data can be temporarily moved to the third node and the task run there) is not happening.
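For example, you can list where the blocks of your input file are stored with hdfs fsck (the path below is a placeholder; replace it with the actual file you are processing):

# Lists every block of the file and the DataNodes holding its replicas
hdfs fsck /path/to/input/file -files -blocks -locations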
Created ‎08-04-2016 04:22 AM
In addition to the above answers:
1. Can you please check whether the NodeManager state is healthy on the 3rd node? Sometimes, because of a disk failure or reserved-disk settings, a NodeManager goes into the unhealthy state even though the NodeManager daemon is still running, and jobs will not get scheduled on the problematic node. (A command to check node states is sketched after this list.)
2. The most important thing is to run the HDFS balancer if the data distribution is uneven across the DataNodes.
3. Below is the command to run the HDFS balancer:
sudo -u hdfs hadoop balancer -threshold <threshold-value>
Note - the default threshold is 10; you can reduce it down to 1 depending on how closely you want to balance your cluster.
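For example, to check the state of all NodeManagers (run from any node with the YARN client configured):

# Lists all NodeManagers with their node state (RUNNING, UNHEALTHY, LOST, ...)
yarn node -list -all

And a sample balancer invocation with a tighter threshold of 5 percent (the hdfs service user may differ in your environment):

# hdfs balancer is the non-deprecated form of the hadoop balancer command above
sudo -u hdfs hdfs balancer -threshold 5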
Hope this information helps!
Created ‎08-04-2016 05:10 AM
Thanks @Kuldeep Kulkarni and @Shiva Nagesh
Created ‎09-27-2016 06:19 PM
Hi all,
Could you please share more information about the following scenario? I am trying to run exactly the command you mentioned, running it from the active master node:
16/09/26 17:42:53 INFO balancer.Balancer: namenodes = [hdfs://hadoop2, hdfs://linux.lab.domain.com:8020]
16/09/26 17:42:53 INFO balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]
16/09/26 17:42:53 INFO balancer.Balancer: included nodes = []
16/09/26 17:42:53 INFO balancer.Balancer: excluded nodes = []
16/09/26 17:42:53 INFO balancer.Balancer: source nodes = []
Time Stamp  Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
16/09/26 17:42:53 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
16/09/26 17:42:53 INFO block.BlockTokenSecretManager: Setting block keys
16/09/26 17:42:53 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
16/09/26 17:42:54 INFO block.BlockTokenSecretManager: Setting block keys
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1872)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1306)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1618)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:595)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
. Exiting ...
. Exiting ...
Sep 26, 2016 5:42:54 PM  Balancing took 1.314 seconds
It runs in less than 2 seconds... which suggests it is not really running properly, right?
How many seconds would it take on average if it really ran fine? Or is there any log to check for further information on whether some error is happening?
I also tried to run the same from the Ambari console (balance all nodes)... the same result is reached.
thanks and br
