
NIFI: PutHDFS processor writing 0 bytes

Hi Guys,

I am new to NiFi. I have two instances: one running HDF and another running HDP.
I want to ingest data into HDFS, but HDFS is not available on my HDF instance, so I created another VM with HDP to use HDFS.
I researched data ingestion into HDFS and performed the following steps:

1- Downloaded core-site.xml and hdfs-site.xml from HDP and uploaded them to the HDF instance.
2- Replaced "" with the IP of HDP, which is "", in the above files.
3- Added the following property to hdfs-site.xml:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
4- Opened the NiFi UI and added two processors, "GetFile" and "PutHDFS", as you can see in (Image: 1-Error).
5- Configured the "PutHDFS" processor as shown in (Image: 2-PutHDFS_Configuration).

6- My source file's location can be seen in (Image: 3-Source (Instance-1)).

7- My destination folder has full rights, as you can see in (Image: 4-Destination (Instance-2)).

The issue is that PutHDFS writes 0 bytes (it creates an empty file in HDFS) and throws an exception, as shown in (Image: 1-Error).
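As a quick sanity check on step 3, the edited hdfs-site.xml can be parsed to confirm the property actually landed; a minimal sketch (the sample XML below is illustrative, not the real file):

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the edited hdfs-site.xml, not the actual file.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>"""

def get_property(xml_text, name):
    """Return the value of a named Hadoop configuration property, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

print(get_property(SAMPLE_HDFS_SITE, "dfs.client.use.datanode.hostname"))  # true
```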

Looking for your help.

Re: NIFI: PutHDFS processor writing 0 bytes

@Mohsin Aqee

Check communication from NiFi to the DataNodes. I believe the file is created because NiFi can talk to the NameNode, but it fails to push content to the DataNodes. Check that the firewall/network rules and the ports to the DataNodes are open. Also, I'm not clear on why you set dfs.client.use.datanode.hostname=true; you may want to test with it set to false as well.
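To check reachability from the NiFi host, a simple TCP probe of the DataNode transfer port can help (50010 is the Hadoop 2.x / HDP default for dfs.datanode.address; Hadoop 3 uses 9866). A minimal sketch; the host name in the example is a placeholder:

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical DataNode host; adjust port to your distribution):
# tcp_reachable("datanode1.example.com", 50010)
```

Run it once per DataNode from the machine NiFi runs on; if any probe fails, the empty-file symptom is expected.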

Finally, the full IOException will be printed in nifi-app.log. If you still have issues after checking all this, please share the full error.

Note: if you add a comment to this post, make sure you tag my name using @.


Re: NIFI: PutHDFS processor writing 0 bytes


@Mohsin Aqee

I am also facing a similar issue: the PutHDFS processor is writing empty files. My HDFS runs in a Kubernetes cluster, with the NameNode and DataNodes running in different pods.

I am able to connect to the NameNode using its external hostname address, hdfs://<Kubernetes-ip>:9000, in core-site.xml.

The PutHDFS processor gives me no error if I set dfs.client.use.datanode.hostname=true, but if it is false I get the IOException below:

Caused by: org.apache.hadoop.ipc.RemoteException: File /.test.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
at org.apache.hadoop.ipc.RPC$
at org.apache.hadoop.ipc.Server$Handler$
at org.apache.hadoop.ipc.Server$Handler$
at Method)
at org.apache.hadoop.ipc.Server$

I think this means it is not able to connect to the internal hostnames in the cluster. Hence I gave the external address for the DataNode port in hdfs-site.xml, but it still didn't work.
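That reading matches what the flag does: the NameNode returns the block locations, and dfs.client.use.datanode.hostname only controls whether the client then dials the DataNode's advertised hostname or the IP the NameNode reported. A toy illustration (the DataNode record below is made up):

```python
def client_dial_address(datanode, use_datanode_hostname):
    """Illustrative: with dfs.client.use.datanode.hostname=true the client
    dials the DataNode's advertised hostname; with false it dials the IP
    the NameNode reported -- inside Kubernetes often an unroutable pod IP."""
    return datanode["hostname"] if use_datanode_hostname else datanode["ip"]

# Hypothetical record as a NameNode might report it from inside the cluster:
dn = {"hostname": "datanode-0.hdfs.svc.cluster.local", "ip": "10.244.1.17"}

print(client_dial_address(dn, True))   # datanode-0.hdfs.svc.cluster.local
print(client_dial_address(dn, False))  # 10.244.1.17
```

Either way, whatever address the client dials must be reachable from the NiFi host, or the write fails after the empty file has already been created.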

I also have a Knox gateway in my cluster. Do you know if I can write files via WebHDFS through Knox using NiFi?
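Knox can proxy WebHDFS, and from NiFi the requests could be driven with InvokeHTTP. A sketch of building the Knox-routed WebHDFS URL (the host, port 8443, and the "default" topology name are assumptions about the gateway configuration); note that a WebHDFS CREATE is two-step: the first PUT returns a 307 redirect to the location that accepts the file data:

```python
from urllib.parse import urlencode

def knox_webhdfs_url(knox_host, path, op, topology="default", port=8443,
                     **params):
    """Build a WebHDFS URL routed through a Knox gateway.

    Knox conventionally exposes WebHDFS under
    https://<knox-host>:<port>/gateway/<topology>/webhdfs/v1/...;
    the topology name and port depend on your Knox configuration.
    """
    query = urlencode({"op": op, **params})
    return (f"https://{knox_host}:{port}/gateway/{topology}"
            f"/webhdfs/v1{path}?{query}")

# Hypothetical gateway host; op=CREATE writes a file, overwrite is optional.
print(knox_webhdfs_url("knox.example.com", "/tmp/test.txt", "CREATE",
                       overwrite="true"))
# https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp/test.txt?op=CREATE&overwrite=true
```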