Support Questions


distcp failing intermittently to copy the file from one HDFS and another HDFS

Contributor

Hi All,

I am running a distcp command that copies the audit logs from one HDFS folder to another HDFS folder for further processing.

 

The distcp command worked fine until two weeks ago and started failing last week. I checked the detailed MR logs and found that only particular files fail to copy; the other audit log folders/files (kafka, hive, nifi, and hbase) are copied successfully.

 

distcp command:

hadoop distcp -filters $filter_file_loc ranger/audit /data/audit_logs/staging
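If the immediate goal is to let the job finish while the problem files are investigated, distcp can be told to log and skip individual copy failures instead of aborting the whole job. A hedged sketch using the same paths and filter file as the command above (-i and -log are standard distcp options; the log directory name here is only illustrative):

```shell
# -i   : ignore (log and skip) per-file copy failures instead of failing the job
# -log : write per-file copy status to this HDFS directory (name is an assumption)
hadoop distcp -i -log /tmp/distcp_logs \
  -filters $filter_file_loc \
  ranger/audit /data/audit_logs/staging
```

Skipped files would still need to be copied later, once the underlying open-file issue is resolved.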

 

Distribution : Cloudera Data Platform version 7.1.7

Please find the detailed error message below.

 

java.io.IOException: File copy failed: hdfs://namenode/ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log --> hdfs://namenode/data/audit_logs/staging/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log

Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1024772623-10.107.146.29-1593441936031:blk_1183449574_109711397; getBlockSize()=64553182; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.107.145.208:9866,DS-b11e932b-0460-47b7-a281-3743ecf9c581,DISK]]} of /ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log
	at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:370)
	at org.apache.hadoop.hdfs.DFSInputStream.getLastBlockLength(DFSInputStream.java:279)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:260)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:203)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:187)
	at org.apache.hadoop.hdfs.DFSClient.openInternal(DFSClient.java:1056)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1019)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:338)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:334)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:351)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:954)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.getInputStream(RetriableFileCopyCommand.java:331)

@distcp


Expert Contributor

Hi,

I can see the error: "Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock"

This typically happens because the file is still being written or has not yet been closed. Please check whether the file is in use or being written to while the distcp runs.
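To confirm this, you can list the files under the source directory that are still open for write, using the standard fsck option (the path is the one from the error message):

```shell
# List files under /ranger/audit whose last block is still open for write
hdfs fsck /ranger/audit -files -blocks -openforwrite | grep -i OPENFORWRITE
```

Any file reported as OPENFORWRITE will fail with CannotObtainBlockLengthException when distcp tries to read it.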

Contributor

Hi @arunek95

Yes, the workaround from the community posts has been applied. As of now, we don't have a root cause for why so many files were left in OPENFORWRITE state on those two particular days in our cluster.

 

https://community.cloudera.com/t5/Support-Questions/Cannot-obtain-block-length-for-LocatedBlock/td-p...
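For anyone else hitting this thread, the workaround referenced above is to ask the NameNode to recover the lease on each stuck file so its last block length can be finalized, then re-run distcp. A sketch using the failing file from the error message (the retry count is an arbitrary choice):

```shell
# Force lease recovery so the file is closed and its block length becomes readable
hdfs debug recoverLease \
  -path /ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log \
  -retries 5
```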

 

Thanks