
NiFi: PutHDFS processor writing 0 bytes

New Contributor

Hi guys,

I am new to NiFi. I have two instances: one running HDF and the other running HDP.
I want to ingest data into HDFS, but HDFS is not available on my HDF instance, so I created another VM running HDP to provide HDFS.
I researched data ingestion into HDFS and performed the following steps:

1- Downloaded core-site.xml and hdfs-site.xml from HDP and uploaded them to the HDF instance.
2- Replaced "sandbox-hdp.hortonworks.com" with the IP of the HDP host, "http://192.168.12.11", in the files above (a quick reachability check is sketched after this list).
3- Added the following property to hdfs-site.xml:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
4- Opened the NiFi UI and added two processors, "GetFile" and "PutHDFS", as shown in image 1-Error.
5- Configured the "PutHDFS" processor as shown in image 2-PutHDFS_Configuration.

6- My source file's location is shown in image 3-Source (Instance-1).

7- My destination folder has full permissions, as shown in image 4-Destination (Instance-2).
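
To sanity-check that the HDF VM can actually reach the HDP NameNode through those edited files, something like the following can be run from the HDF instance. The ports are the HDP 2.x defaults (8020 for NameNode RPC, 50070 for the NameNode web UI / WebHDFS); adjust them if your cluster uses different ones.

# Can the HDF VM reach the NameNode RPC port that fs.defaultFS points at?
nc -zv 192.168.12.11 8020

# List the HDFS root over WebHDFS as a quick end-to-end check
curl "http://192.168.12.11:50070/webhdfs/v1/?op=LISTSTATUS"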

The issue is that PutHDFS writes 0 bytes (it creates an empty file in HDFS) and throws an exception, as shown in image 1-Error.

Looking forward to your help.

Thanks,
Mohsin

Attachments:
72721-1-error.jpg
72722-2-puthdfs-configuration.jpg
72723-3-source-instance-1.jpg
72724-4-destination-instance-2.jpg
72725-5-emptyfile.jpg

2 REPLIES

Re: NiFi: PutHDFS processor writing 0 bytes

@Mohsin Aqee

Check communication from NiFi to the DataNodes. I believe the file is created because NiFi is able to talk to the NameNode, but it fails to push the content to a DataNode. You want to check that the firewall/network rules and the ports to the DataNodes are open. Also, I'm not clear on why you set dfs.client.use.datanode.hostname=true; you may want to test with it set to false as well.
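
For example, a minimal check from the NiFi host (50010 is the HDP 2.x default for the DataNode data-transfer port, dfs.datanode.address; substitute your DataNode's address and port):

# The HDFS client streams file content directly to the DataNodes,
# so this port must be reachable from the NiFi host
nc -zv <datanode-ip> 50010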

Finally, the full IOException will be printed in nifi-app.log. If you still have issues after checking all of this, please share the full error.
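
Something like this will pull the surrounding stack trace (the path assumes a default HDF install; adjust it to your NiFi log directory):

# Show the PutHDFS errors with enough context to capture the full stack trace
grep -B 2 -A 30 'PutHDFS' /var/log/nifi/nifi-app.log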

Note: if you add a comment to this post, make sure you tag my name using @.

HTH

Re: NiFi: PutHDFS processor writing 0 bytes

Explorer

@Mohsin Aqee

I am also facing a similar issue: the PutHDFS processor is writing empty files. My HDFS runs in a Kubernetes cluster, with the NameNode and DataNodes running in different pods.

I am able to connect to the NameNode using its external hostname, hdfs://<Kubernetes-ip>:9000, in core-site.xml.

The PutHDFS processor gives no error when dfs.client.use.datanode.hostname=true, but when it is false I get the IOException below:

Caused by: org.apache.hadoop.ipc.RemoteException: File /.test.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

I think this means the client cannot connect to the DataNodes' internal hostnames in the cluster. I therefore put the DataNodes' external address and port in hdfs-site.xml, but it still didn't work.
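
One way to confirm this (assuming a shell with the HDFS client configured against the cluster) is to check which addresses the NameNode advertises for the DataNodes, since those are what the HDFS client behind PutHDFS will try to connect to:

# Lists each DataNode with the hostname/IP the NameNode hands out to clients;
# if these are internal pod addresses, an external client cannot stream blocks to them
hdfs dfsadmin -report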

I also have a Knox gateway in my cluster. Do you know if I can write files over WebHDFS via Knox using NiFi?
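
For context, a WebHDFS write over Knox is usually a two-step PUT, sketched below with placeholders for the gateway host, topology name, and credentials; in NiFi the same calls could presumably be issued from an InvokeHTTP processor.

# Step 1: ask WebHDFS (through Knox) where to write; the reply is a 307 redirect
curl -iku <user>:<password> -X PUT \
  "https://<knox-host>:8443/gateway/<topology>/webhdfs/v1/tmp/test.txt?op=CREATE"

# Step 2: PUT the file content to the Location URL returned in step 1
curl -iku <user>:<password> -X PUT -T local.txt "<Location-URL-from-step-1>"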