Created 03-22-2017 10:12 AM
I use GetKafka → UpdateAttribute → PutHDFS, and the Conflict Resolution Strategy on PutHDFS is set to append.
However, something goes wrong once the data in HDFS reaches about 20 MB.
The log indicates a problem with the data blocks.
I also added a MergeContent processor before PutHDFS, but the problem remains.
I found that when the error occurs, the data is sometimes lost rather than routed to the failure relationship.
Any help is much appreciated!! Thank you.
Created 03-22-2017 06:49 PM
@marson chu Can you post the log details? That would be helpful.
Created 03-23-2017 01:00 AM
2017-03-22 19:31:59,706 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@7bc3c59f checkpointed with 14 Records and 0 Swap Files in 7 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 0 millis), max Transaction ID 55
2017-03-22 19:32:03,351 INFO [Thread-51849] org.apache.hadoop.hdfs.DFSClient Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1343) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454) [hadoop-hdfs-2.7.3.jar:na]
2017-03-22 19:32:03,352 WARN [Thread-51849] org.apache.hadoop.hdfs.DFSClient Error Recovery for block BP-1541383466-192.168.78.84-1489658920621:blk_1073772950_293619 in pipeline DatanodeInfoWithStorage[192.168.78.84:50010,DS-c9f30077-6122-48c1-bd02-9226498edacd,DISK], DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK]: bad datanode DatanodeInfoWithStorage[192.168.78.84:50010,DS-c9f30077-6122-48c1-bd02-9226498edacd,DISK]
2017-03-22 19:32:03,390 INFO [Thread-51849] org.apache.hadoop.hdfs.DFSClient Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 192.168.78.85:50010
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:142) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1359) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454) [hadoop-hdfs-2.7.3.jar:na]
2017-03-22 19:32:03,390 WARN [Thread-51849] org.apache.hadoop.hdfs.DFSClient Error Recovery for block BP-1541383466-192.168.78.84-1489658920621:blk_1073772950_293619 in pipeline DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK], DatanodeInfoWithStorage[192.168.78.85:50010,DS-f33972da-8d93-4edd-9c14-6a956973b7a2,DISK]: bad datanode DatanodeInfoWithStorage[192.168.78.85:50010,DS-f33972da-8d93-4edd-9c14-6a956973b7a2,DISK]
2017-03-22 19:32:03,392 WARN [Thread-51849] org.apache.hadoop.hdfs.DFSClient DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK]], original=[DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:925) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:988) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1156) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454) ~[hadoop-hdfs-2.7.3.jar:na]
2017-03-22 19:32:03,437 ERROR [Timer-Driven Process Thread-5] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=f5aaac3a-015a-1000-4930-d89685499d91] Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=f5aaac3a-015a-1000-4930-d89685499d91]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /user/root/pstest/hdfs-1/322 for DFSClient_NONMAPREDUCE_1319066147_88 for client 192.168.78.87 because current leaseholder is trying to recreate file.
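For reference, the DataStreamer exception above names the HDFS client property 'dfs.client.block.write.replace-datanode-on-failure.policy'. A minimal sketch of the corresponding client-side hdfs-site.xml entries is shown below; only the .policy property comes from the error message itself, the two companion properties and all values are assumptions included to show where this behaviour is configured, not a confirmed fix.

<configuration>
  <!-- Policy quoted in the error: DEFAULT tries to replace a failed datanode
       in the write pipeline, which fails when no spare datanode is available. -->
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>DEFAULT</value>
  </property>
  <!-- Assumed companion settings (not mentioned in the log): whether replacement
       is attempted at all, and whether the client keeps writing with the
       remaining datanodes if replacement fails. -->
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
    <value>false</value>
  </property>
</configuration>

The pipeline in the log lists only three datanodes, so with replication 3 there appears to be no spare node for the client to swap in, which would be consistent with the "no more good datanodes being available to try" message.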