I am new to NiFi. I have two instances: one for HDF and another for HDP.
I want to ingest data into HDFS, but HDFS is not available on my HDF instance, so I created another VM running HDP in order to use its HDFS.
I researched data ingestion into HDFS and performed the following steps.
1- Downloaded core-site.xml and hdfs-site.xml from the HDP instance and uploaded them to the HDF instance.
2- Replaced "sandbox-hdp.hortonworks.com" with the IP of the HDP instance, "http://192.168.12.11", in the files above.
3- Added a property to hdfs-site.xml:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
4- Opened the NiFi UI and added two processors, "GetFile" and "PutHDFS", as you can see in (image: 1-Error).
5- Configured the "PutHDFS" processor as shown in (Image: 2-PutHDFS_Configuration).
6- My source file's location can be seen in (Image: 3-Source (Instance-1)).
7- My destination folder has full permissions, as you can see in (Image: 4-Destination (Instance-2)).
The issue is that PutHDFS writes 0 bytes (it creates an empty file in HDFS) and throws the exception shown in (image: 1-Error).
Looking for your help.
Check communication from NiFi to the DataNodes. I believe the file is created because NiFi is able to talk to the NameNode but fails to push content to the DataNodes. You want to check that the firewall rules/network rules and ports to the DataNodes are open. Also, I'm not clear on why you set dfs.client.use.datanode.hostname=true; you may want to test with it set to false as well.
Finally, the full IOException will be printed in nifi-app.log. If you still have issues after checking all of this, please share the full error.
Note: if you add a comment to this post make sure you tag my name using @
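The connectivity check suggested above can be sketched as a small TCP probe run from the NiFi host. The addresses below are assumptions taken from the question; on a default HDP setup the NameNode RPC port is typically 8020 and the DataNode data-transfer port 50010, but verify against your own hdfs-site.xml:

```python
import socket

def port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical addresses -- replace with your own NameNode/DataNode hosts and ports.
for host, port, role in [
    ("192.168.12.11", 8020, "NameNode RPC"),
    ("192.168.12.11", 50010, "DataNode data transfer"),
]:
    print(f"{role} {host}:{port} reachable: {port_open(host, port)}")
```

If the NameNode port is reachable but the DataNode port is not, that matches the symptom of an empty file being created with no content written.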
I am also facing a similar issue: the PutHDFS processor is writing empty files. My HDFS runs in a Kubernetes cluster, with the NameNode and DataNodes running in different pods.
I am able to connect to the NameNode using its external hostname, hdfs://<Kubernetes-ip>:9000, in core-site.xml.
The PutHDFS processor gives me no error when the property dfs.client.use.datanode.hostname=true is set, but when it is false I get the IOException below:
Caused by: org.apache.hadoop.ipc.RemoteException: File /.test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
I think this means the client is not able to connect to the internal hostnames inside the cluster. Hence I put the external address for the DataNode port in hdfs-site.xml, but it still didn't work.
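For reference, the hostname-related client settings usually involved in this kind of NAT/overlay-network situation are the two below. This is only a sketch of the relevant hdfs-site.xml fragment; whether it fits your pod networking (i.e. whether the DataNode hostnames advertised by the NameNode are resolvable from outside the cluster) is an assumption you would need to verify:

```xml
<!-- Client connects to DataNodes by hostname instead of the
     (cluster-internal) IP address returned by the NameNode -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
<!-- DataNodes likewise use hostnames when talking to each other -->
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>
```

With these set, the client machine (the NiFi host) still has to be able to resolve each DataNode hostname, e.g. via DNS or /etc/hosts entries pointing at externally reachable addresses.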
I also have a Knox gateway in my cluster. Do you know if I can write files via WebHDFS through Knox using NiFi?
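On the Knox question: WebHDFS writes are a two-step HTTP protocol (an initial CREATE request returns a 307 redirect, and the file content is then PUT to the redirect Location), and in NiFi that flow can be driven with the InvokeHTTP processor. A minimal sketch of the URL construction, assuming a Knox topology named "default" on port 8443 (both the topology name and the host below are assumptions about your gateway, not values from your setup):

```python
def knox_webhdfs_create_url(gateway_host, path, topology="default", port=8443):
    """Build the Knox-proxied WebHDFS CREATE URL for a two-step file write.

    Step 1: PUT to this URL with no body; Knox/WebHDFS answers with a 307
    redirect whose Location header points at the write endpoint.
    Step 2: PUT the actual file content to that Location.
    """
    path = path.lstrip("/")
    return (f"https://{gateway_host}:{port}/gateway/{topology}"
            f"/webhdfs/v1/{path}?op=CREATE&overwrite=true")

# Hypothetical gateway host -- replace with your Knox address.
print(knox_webhdfs_create_url("knox.example.com", "/tmp/test.txt"))
```

Because all traffic goes through the gateway, this route sidesteps the need for the NiFi host to reach the DataNodes directly, which is exactly what is failing above.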