
How to use PutHDFS to ingest data into a 5-node HDP cluster via NiFi running on the Hortonworks Sandbox

I have NiFi running on the sandbox and use GetHTTP to ingest data from server X, but I want this data to land in my 5-node HDP 2.4 cluster (not the sandbox). I have configured the PutHDFS processor in NiFi with the hdfs-site.xml and core-site.xml of the HDP cluster rather than the sandbox's Hadoop XML files.

FYI: core-site.xml and hdfs-site.xml point to the nodes of my 5-node HDP cluster by their public IPs.

I am able to ingest the data from server X, but PutHDFS is unable to write it to the HDP cluster nodes.

I get an IOException like this:

File /directory_of_hdp_cluster/subdirectory could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.

I have searched for this error on Google, but the suggested solutions aren't working in my case. All my data nodes are running perfectly fine.

Screenshot attached.



Usually this means that the server where NiFi is running can't reach the servers where the data nodes are running. NiFi communicated with the namenode of your HDP cluster, which told it which data nodes to store the file on, and then NiFi couldn't communicate with those data nodes.

I would check network connectivity from the sandbox to each data node, and make sure every hostname and IP address referenced in core-site.xml and hdfs-site.xml is reachable from the sandbox.
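A minimal sketch of that connectivity check, to be run on the sandbox as the NiFi user. It assumes bash (for /dev/tcp) and the coreutils timeout command, and it assumes the data nodes listen on 50010, the stock Hadoop 2.x value of dfs.datanode.address; check hdfs-site.xml if your cluster overrides it. The function name check_port is made up for this example, and the data node hostnames are passed as arguments.

```shell
#!/usr/bin/env bash
# Sketch: test TCP reachability of each data node from the NiFi host.
# Assumption: 50010 is the data transfer port (Hadoop 2.x default);
# override with DATANODE_PORT if hdfs-site.xml says otherwise.
DATANODE_PORT="${DATANODE_PORT:-50010}"

# Return 0 if a TCP connection to host:port opens within 5 seconds.
check_port() {
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Data node hostnames or IPs are given as script arguments.
for host in "$@"; do
  if check_port "$host" "$DATANODE_PORT"; then
    echo "OK    ${host}:${DATANODE_PORT}"
  else
    echo "FAIL  ${host}:${DATANODE_PORT}"
  fi
done
```

Pass it the addresses the namenode actually hands out for the data nodes; any FAIL line points at a node NiFi cannot write to.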


Can you access those servers from the NiFi server? Log in as the same user that runs NiFi and try to curl the WebHDFS port. You can also install the HDFS command line client and see whether you can access the cluster with it.
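As a rough sketch, the curl check could look like the following. The helper names webhdfs_url and webhdfs_ls and the host namenode.example.com are hypothetical; 50070 is the stock Hadoop 2.x NameNode HTTP port and may differ on your cluster. LISTSTATUS is the WebHDFS operation for listing a directory.

```shell
#!/usr/bin/env bash
# Build a WebHDFS REST URL; WebHDFS paths live under /webhdfs/v1.
webhdfs_url() {
  local host="$1" port="$2" path="$3" op="$4"
  echo "http://${host}:${port}/webhdfs/v1${path}?op=${op}"
}

# List a directory over WebHDFS.
# -sS: no progress meter, but still print errors; -m 10: give up after 10 s.
webhdfs_ls() {
  curl -sS -m 10 "$(webhdfs_url "$1" "$2" "$3" LISTSTATUS)"
}

# Example call (hypothetical host, assumed default port):
# webhdfs_ls namenode.example.com 50070 /directory_of_hdp_cluster
```

A JSON listing means the namenode is reachable over HTTP; a timeout or "connection refused" suggests a network or firewall problem rather than an HDFS one.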

You have to use the same IP and port you would use from any other HDFS client.

Generally something like this: hdfs://server.stuff:8020

I think your HDFS may be set up incorrectly for that directory.

Recreate the directory from the command line and manually push a file to it from within your HDP cluster:

hdfs dfs -rmdir --ignore-fail-on-non-empty /directory_of_hdp_cluster/subdirectory

hdfs dfs -mkdir -p /directory_of_hdp_cluster/subdirectory

hdfs dfs -chmod -R 777 /directory_of_hdp_cluster/subdirectory

hdfs dfs -chown -R admin:hdfs /directory_of_hdp_cluster/subdirectory

echo "MyFileIsAwesome" > test.txt

hdfs dfs -put test.txt /directory_of_hdp_cluster/subdirectory

Check the permissions and make sure they look the same as those of other working directories in HDFS:

hdfs dfs -ls /