Created on 10-03-2021 05:26 AM - edited 10-03-2021 05:31 AM
Hi All,
I am running a distcp command that copies the Ranger audit logs HDFS folder to another HDFS folder for further processing.
The distcp command worked fine until about two weeks ago and started failing last week. I checked the detailed MR logs and found that only particular files fail to copy; the other audit log folders/files (kafka, hive, nifi and hbase) are copied successfully. Only some specific files fail during the copy.
distcp command:
hadoop distcp -filters $filter_file_loc ranger/audit /data/audit_logs/staging
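For context, the file passed to -filters contains one Java regular expression per line, and any source path matching a pattern is excluded from the copy. The actual contents of $filter_file_loc are not shown here; a hypothetical example that skips temporary files could look like:
.*\.tmp$
.*\._COPYING_$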
Distribution : Cloudera Data Platform version 7.1.7
Please find the detailed error messages below.
java.io.IOException: File copy failed: hdfs://namenode/ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log --> hdfs://namenode/data/audit_logs/staging/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log
Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-1024772623-10.107.146.29-1593441936031:blk_1183449574_109711397; getBlockSize()=64553182; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.107.145.208:9866,DS-b11e932b-0460-47b7-a281-3743ecf9c581,DISK]]} of /ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log
    at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:370)
    at org.apache.hadoop.hdfs.DFSInputStream.getLastBlockLength(DFSInputStream.java:279)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:260)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:203)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:187)
    at org.apache.hadoop.hdfs.DFSClient.openInternal(DFSClient.java:1056)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1019)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:338)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:334)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:351)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:954)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.getInputStream(RetriableFileCopyCommand.java:331)
Created 10-11-2021 04:32 AM
Hi,
I can see the error "Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock".
This usually happens because the file is still being written or has not yet been closed. Please check whether the file is in use or being written to while the distcp runs.
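One quick way to confirm this (a sketch only, with the directory taken from the stack trace above) is to list files that are still open for write under the source path:
hdfs fsck /ranger/audit/kafka/kafka/20210927 -files -blocks -openforwrite
Any file reported as OPENFORWRITE has an unfinalized last block, which is what makes distcp fail with CannotObtainBlockLengthException.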
Created 10-13-2021 05:47 AM
Hi @arunek95
Yes, the workaround has been applied by following the community posts. As of now, we don't have a root cause for why so many files were in OPENFORWRITE state on those particular two days in our cluster.
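For reference, the workaround those posts typically describe is to force-close each stuck file by recovering its lease, for example (path taken from the error above; the retry count is only illustrative):
hdfs debug recoverLease -path /ranger/audit/kafka/kafka/20210927/kafka_ranger_audit_svl.host.int.log -retries 5
Once the lease is recovered and the last block is finalized, distcp can read the file normally.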
Thanks