Unable to copy files from NiFi to HDFS

Explorer

I used NiFi from the HDF sandbox (PC A) to copy files to HDFS on the HDP sandbox (PC B). NiFi can read the file list from HDFS without any error. The flow is shown below:

110091-1564116989634.png


However, NiFi cannot get files from or put files to HDFS. The errors are shown below:

GetHDFS processor error:

110072-1564117162241.png

PutHDFS processor error:

110082-1564117011391.png

Can anyone help me? Thank you very much.

1 ACCEPTED SOLUTION

Re: Unable to copy files from NiFi to HDFS

Mentor

@Figo C

The reason is that, by design, NiFi as a client communicates with the HDFS NameNode on port 8020, and the NameNode returns the location of the files using the DataNode address, which is a private address. Since both your HDP and HDF are sandboxes, I think you should switch both to Host-only Adapter. Your stack trace will contain a statement that the client can’t connect to the DataNode, and it will list the internal IP instead of 127.0.0.1. That causes the minReplication issue, etc.

Change the HDP and HDF sandbox VM network settings from NAT to Host-only Adapter.

Here are the steps:

1. Gracefully shut down the HDF sandbox.

2. Change the Sandbox VM network from NAT to Host-only Adapter. It will automatically pick your LAN or wireless adapter. Save the config.

3. Restart Sandbox VM

4. Log in to the Sandbox VM and use ifconfig command to get its IP address, in my case 192.168.0.45

5. Add the entry in /etc/hosts on my host machine, in my case: 192.168.0.45 sandbox.hortonworks.com

6. Check connectivity by telnet: telnet sandbox.hortonworks.com 8020

7. Restart NiFi (HDF)
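
The telnet check in step 6 can also be scripted. The following is a minimal sketch, not part of the original answer: the helper names `can_connect` and `check_endpoints` are hypothetical, and the sketch only tests whether a TCP connection opens, not whether HDFS itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name (e.g. via /etc/hosts)
        # and then attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_endpoints(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to whether it is reachable on the given host."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, `check_endpoints("sandbox.hortonworks.com", [8020, 50010])` would test the NameNode RPC port from step 6 together with 50010, the default DataNode data transfer port in Hadoop 2.x. If 8020 is reachable but 50010 is not, listing files may work while reading and writing fail.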

By default HDFS clients connect to DataNodes using the IP address provided by the NameNode. Depending on the network configuration this IP address may be unreachable by the clients. The fix is letting clients perform their own DNS resolution of the DataNode hostname. The following setting enables this behavior.

If the above still fails, make the change below in the hdfs-site.xml that NiFi is using: set dfs.client.use.datanode.hostname to true.

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>
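
To confirm that the copy of hdfs-site.xml NiFi actually loads contains this setting, a small check like the one below can help. This is a sketch under assumptions: the function names are hypothetical, and it assumes the file follows the usual Hadoop `<configuration>`/`<property>` layout with the property defined at most once.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each <property> holds a <name> and a <value> child element.
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def uses_datanode_hostname(xml_text) -> bool:
    """True if dfs.client.use.datanode.hostname is present and set to true."""
    value = get_hadoop_property(xml_text, "dfs.client.use.datanode.hostname")
    return value is not None and value.lower() == "true"
```

It can be run against the file contents, for example `uses_datanode_hostname(open(path, "rb").read())`, where `path` is whichever hdfs-site.xml is listed in the processor's Hadoop Configuration Resources property.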


Hope that helps

Re: Unable to copy files from NiFi to HDFS

Mentor

@Figo C

Can you check the running status and logs of the DataNode and NameNode and paste them here? Did you add these two files, core-site.xml and hdfs-site.xml, to your NiFi config?

Re: Unable to copy files from NiFi to HDFS

Explorer

Hi Geoffrey,

Thank you for your reply. Yes, I added these two files in NiFi, so the ListHDFS processor in NiFi is able to show the list of files in HDFS. But PutHDFS cannot put files in HDFS. I guess that the communication between NiFi and the NameNode is OK, but something is wrong between NiFi and the DataNode. I do not know which configurations would affect that.

Core-site.xml and hdfs-site.xml are attached.

hdfs-site.xml, core-site.xml

The statuses from Ambari are shown below:

110073-1564266231334.png

Do you know where to get logs of Namenode and Datanode?

Re: Unable to copy files from NiFi to HDFS

Explorer

log of datanode:

110068-1564280499561.png

log of namenode:

110069-1564280676756.png

Re: Unable to copy files from NiFi to HDFS

Explorer

Hi Geoffrey,

Thank you very much. The problem is solved. I found that port 50010 (the data transfer port of the HDFS DataNode) was not added to the HDP Docker container, while port 50070 (the port of the HDFS NameNode) was. So I could read the metadata of files in HDFS with the ListHDFS processor, but could not get files from or put files to HDFS with the other two processors.

I am using the VirtualBox HDP sandbox, so I kept the default NAT network but changed the IP addresses to the current localhost IP in the Port Forwarding settings.

Then I accessed the Docker host terminal:

root@sandbox-hdp.hortonworks.com 2200

I modified the sandbox proxy script generate-proxy-deploy-script.sh (in /sandbox/proxy), adding port 50010 as follows, and then ran it with ./generate-proxy-deploy-script.sh:

110070-1564360642248.png

This updates the proxy-deploy script.

Then run it (./proxy-deploy.sh) to generate a proxy container, and start the container (docker start sandbox-proxy).

Run NiFi:

110098-1564360793410.png

Files are loaded in HDFS at HDP and fetched from HDFS to NiFi local directory: