
yarn logs + blk_xxxxxx_xxxxxx does not exist or is not under Construction


We have a Spark cluster with the following details (all machines are Linux Red Hat machines):

 

2 name-node machines
2 resource-manager machines
8 data-node machines (HDFS file system)

 

We are running a Spark Streaming application.

In the YARN logs we can see the following errors, for example:


yarn logs -applicationId application_xxxxxxxx -log_files ALL


---2019-11-08T10:12:20.040 ERROR [][][] [org.apache.spark.scheduler.LiveListenerBus] Listener EventLoggingListener threw an exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-484874736-172.2.45.23-8478399929292:blk_1081495827_7755233 does not exist or is not under Construction
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6721)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6789)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:931)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:979)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

 

We can see that `BP-484874736-172.2.45.23-8478399929292:blk_1081495827_7755233` does not exist or is not under construction.

 

But what could be the reasons that YARN complains about this?

Michael-Bronson

Expert Contributor

Hi Mike,

 

Can you do a quick check on the below?

 

`BP-484874736-172.2.45.23-8478399929292:blk_1081495827_7755233 does not exist or is not under Construction`

1. Are all DataNodes up and running fine within the cluster?

2. Check the NameNode UI and see whether any DataNode is NOT reporting blocks in the Datanodes tab, or whether any missing blocks are reported on the NN UI.

3. You can run fsck (unless the cluster is huge and loaded with data) and check whether the block exists and which nodes hold its replicas.

 

It might help to drill down into the issue.
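The three checks above can be sketched with standard HDFS CLI commands (a sketch, to be run as the hdfs superuser; the block ID is the one from this thread):

```shell
# 1. Confirm all DataNodes are live according to the NameNode
hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes'

# 2. Filesystem health summary -- missing/corrupt blocks show up here
hdfs fsck / | tail -n 20

# 3. Find the block and the DataNodes holding its replicas
#    (can be slow on a large, heavily loaded cluster)
hdfs fsck / -files -blocks -locations | grep 'blk_1081495827_7755233'
```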


Hi,

 

1. All DataNodes are up and running fine.

2. I do not see any corrupted or under-replicated blocks.

3. We ran fsck and HDFS is healthy.

 

Any other possibilities?

Michael-Bronson


 

 

We also did the following:

 

su hdfs

hadoop fsck / -files -blocks >/tmp/file

 

and we did not find the block blk_1081495827_7755233 in the file /tmp/file.

 

So what is the reason that the block was removed?

Michael-Bronson

Expert Contributor

1. Did the job fail due to the above reason? If "NO", then is the error occurring in the logs for other block pools (BP-XXX) as well?

2. Can you check using fsck which nodes hold copies of the block specified above?


Please send me the fsck CLI command that you want me to run.

Michael-Bronson

Expert Contributor

If you know the file name, then:

 

hdfs fsck /myfile.txt -files -blocks -locations

Else

 

hdfs fsck / -files -blocks -locations | grep <blkxxx>

 

 


 

We checked it with the following command:

 

hdfs fsck / -files -blocks -locations | grep blk_xxxxxx_xxxxxx

 

as:

 

su hdfs

hdfs fsck / -files -blocks -locations | grep blk_1081495827_7755233 

 

We did not get any results,

 

so I guess this means that blk_xxxxx_xxxx does not exist in the HDFS file system.

 

What next?
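One possible next step is to search the NameNode log for that block ID; it may show when the block was abandoned, finalized, or deleted. The log path below is an assumption and varies by distribution:

```shell
# Log directory is an assumption -- adjust for your distribution's HDFS log location
grep 'blk_1081495827' /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
```

Grepping without the generation-stamp suffix (_7755233) can also catch pipeline-recovery messages in which the NameNode bumped the block's generation stamp, which is one way a client can end up referencing a stamp that "does not exist or is not under Construction".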

Michael-Bronson

Expert Contributor

1. Did the job fail due to the above reason?

If "NO", then is the error displayed in the logs for all Spark jobs, or just for this one?
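A sketch of one way to check that (the application IDs below are placeholders, not real IDs from this cluster): pull the YARN aggregated logs for a few applications and grep for the error:

```shell
# Placeholder application IDs -- substitute real ones from `yarn application -list`
for app in application_xxxxxxxx_0001 application_xxxxxxxx_0002; do
  if yarn logs -applicationId "$app" 2>/dev/null | grep -q 'is not under Construction'; then
    echo "$app: error present"
  else
    echo "$app: error not found"
  fi
done
```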