Support Questions


Unable to copy files from NiFi to HDFS

Explorer

I used NiFi on the HDF sandbox (PC A) to copy files to HDFS on the HDP sandbox (PC B). NiFi can read the file list from HDFS without any error. The flow is shown below:

[screenshot: NiFi flow]


However, NiFi cannot get files from or put files to HDFS; the errors are shown below:

GetHDFS processor error:

[screenshot: GetHDFS processor error]

PutHDFS processor error:

[screenshot: PutHDFS processor error]

Can anyone help me? Thank you very much.

1 ACCEPTED SOLUTION

Master Mentor

@Figo C

The reason is that, by design, NiFi as a client talks to the HDFS NameNode on port 8020, and the NameNode returns the location of the files on a DataNode, which is a private address. Since both your HDF and HDP instances are sandboxes, I think you should switch both to a host-only adapter. Your stack trace will show that the client can't connect to the DataNode, listing the internal IP instead of 127.0.0.1; that is what causes the minReplication error, etc.

Change the HDP and HDF sandbox VM network settings from NAT to Host-only Adapter.

Here are the steps:

1. Gracefully shut down the HDF sandbox.

2. Change the sandbox VM network from NAT to Host-only Adapter. It will automatically pick your LAN or wireless adapter; save the config.

3. Restart the sandbox VM.

4. Log in to the sandbox VM and use the ifconfig command to get its IP address; in my case, 192.168.0.45.

5. Add the entry to /etc/hosts on the host machine; in my case: 192.168.0.45 sandbox.hortonworks.com

6. Check connectivity with telnet: telnet sandbox.hortonworks.com 8020

7. Restart NiFi (HDF).
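
Putting steps 4-6 together on the command line (a sketch; the IP 192.168.0.45 is from my setup, substitute your own):

# on the sandbox VM: find its host-only IP
ifconfig

# on the host machine: map the sandbox hostname to that IP
echo "192.168.0.45 sandbox.hortonworks.com" | sudo tee -a /etc/hosts

# verify the NameNode RPC port is reachable
telnet sandbox.hortonworks.com 8020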

By default, HDFS clients connect to DataNodes using the IP address provided by the NameNode. Depending on the network configuration, this address may be unreachable from the client. The fix is to let clients perform their own DNS resolution of the DataNode hostname. The following setting enables this behavior.

If the above still fails, set dfs.client.use.datanode.hostname to true in the hdfs-site.xml that NiFi is using:

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>
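
(NiFi reads this file through the GetHDFS/PutHDFS processors' Hadoop Configuration Resources property, so make sure that property points at the modified copy and restart the processors for the change to take effect.)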


Hope that helps


9 REPLIES

Master Mentor

@Figo C

Can you check the running status/logs of the DataNode/NameNode and paste them here? Did you add these two files to your NiFi config: core-site.xml and hdfs-site.xml?

Explorer

Hi Geoffrey,

Thank you for your reply. Yes, I added those two files in NiFi, so the ListHDFS processor is able to show the list of files in HDFS. But PutHDFS cannot put files into HDFS. I guess the communication between NiFi and the NameNode is OK, but something is wrong between NiFi and the DataNode. I don't know which configuration would affect that.
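
A quick way to test that guess from the NiFi host, assuming the DataNode uses its default data-transfer port 50010 (the hostname below is my sandbox's; yours may differ):

telnet sandbox-hdp.hortonworks.com 50010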

Core-site.xml and hdfs-site.xml are attached.

hdfs-site.xml
core-site.xml

The statuses from Ambari are shown below:

[screenshot: service status in Ambari]

Do you know where to get the logs of the NameNode and DataNode?

Explorer

DataNode log:

[screenshot: DataNode log]

NameNode log:

[screenshot: NameNode log]

Explorer

Core-site.xml and hdfs-site.xml are attached.

hdfs-site.xml

core-site.xml


Explorer

Hi Geoffrey,

Thank you very much. The problem is solved. I found that port 50010 (the data-transfer port of the HDFS DataNode) was not forwarded to the HDP docker container, but port 50070 (the port to the HDFS NameNode) was. So I could read file metadata from HDFS with the ListHDFS processor, but could not get or put files with the other two processors.

I am using the VirtualBox HDP sandbox, so I kept the default NAT network but changed the IP addresses to the current localhost IP in the port-forwarding settings.

Then access the docker host terminal (ssh on port 2200):

ssh root@sandbox-hdp.hortonworks.com -p 2200

I modified the sandbox proxy script generate-proxy-deploy-script.sh (in /sandbox/proxy), adding port 50010 as shown below, and then ran it: ./generate-proxy-deploy-script.sh

[screenshot: generate-proxy-deploy-script.sh with port 50010 added]

This regenerates proxy-deploy.sh; then run it (./proxy-deploy.sh) to recreate the proxy container, and start the container (docker start sandbox-proxy).
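
Putting the container steps together (a sketch; the final telnet check is my own addition to verify the fix):

# on the sandbox docker host: regenerate the proxy script after adding port 50010
cd /sandbox/proxy
./generate-proxy-deploy-script.sh

# recreate and start the proxy container
./proxy-deploy.sh
docker start sandbox-proxy

# from the host machine: verify the DataNode port is now reachable
telnet sandbox-hdp.hortonworks.com 50010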

Run NiFi:

[screenshot: NiFi flow running]

Files are now loaded into HDFS on HDP and fetched from HDFS to the NiFi local directory.


Explorer

Hi,

I'm very new to the HDP sandbox. I have the same issue and am trying to solve it. I tried changing the network settings from NAT to Host-only, but it doesn't work; I get this while the VM is starting:

[screenshot: VM startup error]

Now I'm trying your solution, but what do you mean by "I still use the default NAT network but changed the IP addresses to the current localhost IP at the setting of Port forwarding"? How can I do that?

 

Thank you so much

Community Manager

@himock, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.



Regards,

Vidya Sargur,
Community Manager



New Contributor

it works!!!