None of the NodeManagers in my Hadoop cluster are connected to the ResourceManager.
These are the errors I can see from YARN:
Thread Thread[Timer-2,5,main] threw an Exception.
java.lang.IllegalArgumentException: Wrong FS: hdfs://nameservice1:8020/user/history/done_intermediate/hive/job_1557996286771_33621_conf.xml, expected: hdfs://nameservice1
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:222)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:114)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1262)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1262)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1418)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:499)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:351)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.copy(KilledHistoryService.java:210)
at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.access$300(KilledHistoryService.java:85)
at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler$1.run(KilledHistoryService.java:138)
at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler$1.run(KilledHistoryService.java:125)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService$FlagFileHandler.run(KilledHistoryService.java:125)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
host4 ERROR October 15, 2019 11:40 PM NodeManager
RECEIVED SIGNAL 15: SIGTERM
master3 ERROR October 15, 2019 11:40 PM JobHistoryServer
RECEIVED SIGNAL 15: SIGTERM
master1 ERROR October 15, 2019 11:40 PM ResourceManager
RECEIVED SIGNAL 15: SIGTERM
host3 ERROR October 15, 2019 11:40 PM NodeManager
RECEIVED SIGNAL 15: SIGTERM
master1 ERROR October 15, 2019 11:40 PM AbstractDelegationTokenSecretManager
ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
master1 ERROR October 15, 2019 11:40 PM AbstractDelegationTokenSecretManager
ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
master1 ERROR October 15, 2019 11:40 PM AbstractDelegationTokenSecretManager
ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
host2 ERROR October 15, 2019 11:40 PM NodeManager
RECEIVED SIGNAL 15: SIGTERM
Any help, please?
Created 11-12-2019 06:25 AM
Hi,
From the YARN logs we can see a lot of SIGTERM errors. Did you check the memory settings for the YARN job? Also, how did you find out that the NodeManagers are not connected to the ResourceManager? Could you share more information?
Please also share the NodeManager and ResourceManager logs so we can dig further into this issue.
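One more note on the stack trace you posted: the "Wrong FS" exception is thrown from FileSystem.checkPath when the scheme or authority of a path does not match the filesystem it is handed to. In your trace the path uses hdfs://nameservice1:8020 while the filesystem expects hdfs://nameservice1, so the explicit port on the nameservice looks like the mismatch. This may be unrelated to the NodeManager problem, but it could be worth checking whether a configuration value (for example the job history directories) spells the nameservice with :8020. Below is a minimal sketch of that comparison using only java.net.URI. The class and method names are made up for illustration and this is not Hadoop's actual code, which does more (such as handling default ports).

```java
import java.net.URI;

public class WrongFsCheck {

    // Hypothetical helper: a simplified stand-in for the scheme/authority
    // comparison. It is NOT Hadoop's real checkPath implementation.
    static boolean sameFileSystem(URI fsUri, URI pathUri) {
        String fsScheme = fsUri.getScheme();
        String pathScheme = pathUri.getScheme();
        String fsAuthority = fsUri.getAuthority();
        String pathAuthority = pathUri.getAuthority();
        if (fsScheme == null || pathScheme == null
                || fsAuthority == null || pathAuthority == null) {
            // Relative or authority-less URIs are out of scope for this sketch.
            return false;
        }
        return fsScheme.equalsIgnoreCase(pathScheme)
                && fsAuthority.equalsIgnoreCase(pathAuthority);
    }

    public static void main(String[] args) {
        // Both URIs are taken from the exception message in the trace.
        URI expected = URI.create("hdfs://nameservice1");
        URI fromTrace = URI.create(
                "hdfs://nameservice1:8020/user/history/done_intermediate/hive/"
                + "job_1557996286771_33621_conf.xml");

        // The authority is "nameservice1" in one and "nameservice1:8020"
        // in the other, so the two do not match.
        System.out.println("expected authority: " + expected.getAuthority());
        System.out.println("path authority:     " + fromTrace.getAuthority());
        System.out.println("same filesystem:    " + sameFileSystem(expected, fromTrace));
    }
}
```

If the same path is written without the port (hdfs://nameservice1/user/history/...), the two authorities in this sketch match.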
Thanks
AKR