NiFi data from Kafka to HDFS

New Contributor

I use GetKafka -> UpdateAttribute -> PutHDFS, and the Conflict Resolution Strategy on PutHDFS is set to append.

However, something goes wrong once the file in HDFS reaches about 20 MB.

The log points to a problem with data blocks.

I also added a MergeContent processor before PutHDFS, but the problem persists.

I also found that data is sometimes lost when the error occurs, instead of being routed to the failure relationship.
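For reference, my understanding is that PutHDFS in append mode ultimately uses the HDFS client append API, which takes a single-writer lease on the target file. A minimal standalone sketch of that call (the path is the one from my flow; this is not NiFi's actual code, just the API it builds on):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path target = new Path("/user/root/pstest/hdfs-1/322");

        // append() takes the file's single-writer lease; until the lease is
        // released (or recovered by the NameNode), any other client -- or a
        // retry of a failed write -- that reopens the file for append fails.
        try (FSDataOutputStream out = fs.append(target)) {
            out.write("one record\n".getBytes("UTF-8"));
            out.hflush(); // push bytes to the datanode pipeline without closing the file
        }
        fs.close();
    }
}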

Any help is much appreciated!! Thank you.

2 REPLIES

Re: NiFi data from Kafka to HDFS

Rising Star

@marson chu Can you post the log details? That would be helpful.

Re: NiFi data from Kafka to HDFS

New Contributor

2017-03-22 19:31:59,706 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@7bc3c59f checkpointed with 14 Records and 0 Swap Files in 7 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 0 millis), max Transaction ID 55

2017-03-22 19:32:03,351 INFO [Thread-51849] org.apache.hadoop.hdfs.DFSClient Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1343) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454) [hadoop-hdfs-2.7.3.jar:na]

2017-03-22 19:32:03,352 WARN [Thread-51849] org.apache.hadoop.hdfs.DFSClient Error Recovery for block BP-1541383466-192.168.78.84-1489658920621:blk_1073772950_293619 in pipeline DatanodeInfoWithStorage[192.168.78.84:50010,DS-c9f30077-6122-48c1-bd02-9226498edacd,DISK], DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK]: bad datanode DatanodeInfoWithStorage[192.168.78.84:50010,DS-c9f30077-6122-48c1-bd02-9226498edacd,DISK]

2017-03-22 19:32:03,390 INFO [Thread-51849] org.apache.hadoop.hdfs.DFSClient Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 192.168.78.85:50010
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:142) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1359) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184) [hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454) [hadoop-hdfs-2.7.3.jar:na]

2017-03-22 19:32:03,390 WARN [Thread-51849] org.apache.hadoop.hdfs.DFSClient Error Recovery for block BP-1541383466-192.168.78.84-1489658920621:blk_1073772950_293619 in pipeline DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK], DatanodeInfoWithStorage[192.168.78.85:50010,DS-f33972da-8d93-4edd-9c14-6a956973b7a2,DISK]: bad datanode DatanodeInfoWithStorage[192.168.78.85:50010,DS-f33972da-8d93-4edd-9c14-6a956973b7a2,DISK]

2017-03-22 19:32:03,392 WARN [Thread-51849] org.apache.hadoop.hdfs.DFSClient DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK]], original=[DatanodeInfoWithStorage[192.168.78.87:50010,DS-410c9e77-803d-43ad-ae83-b38d50842f96,DISK], DatanodeInfoWithStorage[192.168.78.86:50010,DS-5f1f2258-1e87-4f52-ac46-a0fece7c24bb,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:925) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:988) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1156) ~[hadoop-hdfs-2.7.3.jar:na]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454) ~[hadoop-hdfs-2.7.3.jar:na]

2017-03-22 19:32:03,437 ERROR [Timer-Driven Process Thread-5] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=f5aaac3a-015a-1000-4930-d89685499d91] Failed to write to HDFS due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutHDFS[id=f5aaac3a-015a-1000-4930-d89685499d91]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /user/root/pstest/hdfs-1/322 for DFSClient_NONMAPREDUCE_1319066147_88 for client 192.168.78.87 because current leaseholder is trying to recreate file.
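Reading the log, there seem to be two distinct failures here. First, the append pipeline loses a datanode and the DEFAULT replacement policy aborts the write because no spare datanode is available (with only as many datanodes as the replication factor, there is never a spare to swap in). Second, the retry then hits AlreadyBeingCreatedException because the previous writer's lease on the file has not been released yet. The log itself names the relevant client setting. A sketch of how a client could relax the replacement policy (these are standard HDFS client properties; for PutHDFS they would normally go into the hdfs-site.xml listed under Hadoop Configuration Resources, the Java form below is only illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ReplacePolicySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Leave datanode replacement on, but let a write continue on the
        // surviving pipeline when no replacement node can be found, instead
        // of failing with "Failed to replace a bad datanode".
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", true);

        // Any FileSystem created from this Configuration uses the relaxed policy.
        try (FileSystem fs = FileSystem.newInstance(conf)) {
            System.out.println("connected to " + fs.getUri());
        }
    }
}

Note that best-effort trades durability for availability: the block may stay under-replicated until the NameNode re-replicates it. The more robust fix for this kind of flow is usually to avoid append altogether: let MergeContent build batches up to a target size and have PutHDFS write each batch as a new file, so no long-lived lease exists to collide with.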