slave VM removed from list of slaves and still being accessed by Yarn/Tez

Explorer

I removed vm4 from the list of slave VMs, and when I run the following command, it doesn't access vm4:

hdfs dfsadmin -report

The result is:

ubuntu@anmol-vm1-new:~$ hdfs dfsadmin -report
15/12/14 06:56:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 1268169326592 (1.15 TB)
Present Capacity: 1199270457337 (1.09 TB)
DFS Remaining: 1199213064192 (1.09 TB)
DFS Used: 57393145 (54.73 MB)
DFS Used%: 0.00%
Under replicated blocks: 27
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 10.0.1.191:50010 (anmol-vm2-new)
Hostname: anmol-vm2-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19005440 (18.13 MB)
Non DFS Used: 21501829120 (20.03 GB)
DFS Remaining: 401202274304 (373.65 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:12 UTC 2015

Name: 10.0.1.190:50010 (anmol-vm1-new)
Hostname: anmol-vm1-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19369984 (18.47 MB)
Non DFS Used: 25831350272 (24.06 GB)
DFS Remaining: 396872388608 (369.62 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.88%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:13 UTC 2015

Name: 10.0.1.192:50010 (anmol-vm3-new)
Hostname: anmol-vm3-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 19017721 (18.14 MB)
Non DFS Used: 21565689863 (20.08 GB)
DFS Remaining: 401138401280 (373.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.89%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Mon Dec 14 06:56:11 UTC 2015
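
For anyone who wants to check a report like this without reading it by eye, here is a minimal sketch that pulls the datanode host names out of the report text. It assumes the usual layout where each datanode entry has its own "Hostname:" line; the function name reported_hostnames is made up for this example and is not part of Hadoop.

```python
import re
from typing import List


def reported_hostnames(report: str) -> List[str]:
    """Return every host name found on a 'Hostname:' line of the report text."""
    # One match per datanode entry; MULTILINE anchors ^ and $ to each line.
    return re.findall(r"^Hostname:[ \t]*(\S+)[ \t]*$", report, flags=re.MULTILINE)
```

Feeding it the saved output of hdfs dfsadmin -report and testing whether "anmol-vm4-new" is in the returned list would confirm that HDFS no longer lists the node. It says nothing about what YARN still knows.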

However, at some point YARN still tries to access it. Here's the log I received:

yarn logs -applicationId application_1450050523156_0009

http://pastebin.com/UVHnkRRp

Service org.apache.tez.dag.app.rm.TaskScheduler failed in state STARTED; cause: java.lang.IllegalArgumentException: java.net.UnknownHostException: anmol-vm4-new
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
        at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
        at org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM.createAndGetOptimisticNMToken(NMTokenSecretManagerInRM.java:325)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2010)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)
Caused by: java.net.UnknownHostException: anmol-vm4-new
        ... 15 more

Any idea why it is trying to access vm4, which is not in the slaves list, and how that could be fixed?
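
Since the trace ends in an UnknownHostException for anmol-vm4-new, one thing worth checking is which host names actually resolve on the master. The sketch below is only an illustration of that check: unresolvable_hosts is a hypothetical helper, and the resolver argument exists so the function can be tried without touching real DNS.

```python
import socket
from typing import Callable, Iterable, List


def unresolvable_hosts(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the hosts that the resolver cannot turn into an address."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Calling unresolvable_hosts(["anmol-vm1-new", "anmol-vm2-new", "anmol-vm3-new", "anmol-vm4-new"]) on the master would show whether vm4 is the only name that fails to resolve. That would fit the exception, but it would not explain why YARN still refers to that node.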

UPDATE: I did the following, but I still get an error because it tries to access vm4:

1) Added the files exclude and mapred.exclude to the conf directory of yarnpp, each containing the private IP address of vm4.

2) Added this to mapred-site.xml:

<property>
  <name>mapred.hosts.exclude</name>
  <value>/home/hadoop/yarnpp/conf/mapred.exclude</value>
  <description>Names a file that contains the list of hosts that
      should be excluded by the jobtracker. If the value is empty, no
      hosts are excluded.</description>
</property>

3) Added this to hdfs-site.xml:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/yarnpp/conf/exclude</value>
  <final>true</final>
</property>

3.5) Added this to yarn-site.xml:

<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/home/hadoop/yarnpp/conf/exclude</value>
  <description>Path to file with nodes to exclude.</description>
</property>

4) Ran cp_host.sh to copy the conf directory to all the slaves.

5) Ran the reboot_everything script (which does stop-all.sh, formatting, and start-all.sh).

6) Ran this command:

 hadoop dfsadmin -refreshNodes

7) Ran this command on the master VM:

 yarn rmadmin -refreshNodes

And here's the new log: http://pastebin.com/cKPY9gmB
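
To rule out a simple mismatch between the site files and the exclude files, a check along these lines might help. It is a rough sketch, not a Hadoop tool: read_property, excluded_hosts and is_excluded are hypothetical names, and the exclude-file parsing assumes whitespace-separated entries with "#" starting a comment.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional, Set


def read_property(site_xml: Path, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in a *-site.xml file, if any."""
    root = ET.parse(site_xml).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def excluded_hosts(exclude_file: Path) -> Set[str]:
    """Return the entries of an exclude file (assumed format, see lead-in)."""
    entries = set()
    for line in exclude_file.read_text().splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # rest of the line is treated as a comment
            entries.add(token)
    return entries


def is_excluded(site_xml: Path, prop_name: str, host: str) -> bool:
    """True if prop_name points at an existing file that lists host."""
    path = read_property(site_xml, prop_name)
    if not path or not Path(path).is_file():
        return False
    return host in excluded_hosts(Path(path))
```

For example, is_excluded(Path("/home/hadoop/yarnpp/conf/yarn-site.xml"), "yarn.resourcemanager.nodes.exclude-path", vm4_ip) should return True on the master, where vm4_ip stands for the private IP written into the exclude file. The yarn-site.xml location is an assumption based on the conf directory mentioned above.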


Re: slave VM removed from list of slaves and still being accessed by Yarn/Tez

Explorer

@Neeraj Sabharwal And this is the error I get when running the gridmix-generate.sh job:

15/12/14 10:14:53 INFO ipc.Client: Retrying connect to server: anmol-vm3-new/10.0.1.192:50833. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Re: slave VM removed from list of slaves and still being accessed by Yarn/Tez

@Mona Jalal

It looks like the node removal did not happen correctly. This is a good thread: http://stackoverflow.com/questions/16774439/how-do...

This is another reason that I am a big fan of Ambari ;)

Re: slave VM removed from list of slaves and still being accessed by Yarn/Tez

Explorer

I already tried that! It didn't work, so I thought I'd ask here in case someone can tell me why.

Re: slave VM removed from list of slaves and still being accessed by Yarn/Tez

Mentor

@Mona Jalal are you still having problems with this? Can you provide your own solution or accept the best answer?
