How can we fetch files from HDFS to a local machine using NiFi running on an HDFS cluster?

Super Collaborator

I have NiFi running on a remote machine, which I access from my local machine using http://remoteurl:9090/nifi.

I can fetch files from HDFS using ListHDFS/FetchHDFS, but if I want to PutFile them onto my local machine, it will not work, because NiFi is running on the remote machine and has no access to my local folders. I think we could achieve this by running another NiFi instance locally on my machine and using Remote Process Groups, but our firewall is only open one way, from local to remote. I can call a remote process on the remote machine from my local NiFi, but I cannot call a remote process on the local machine from the remote NiFi.

Any ideas on how to solve this?

1 ACCEPTED SOLUTION

Master Guru

You should be able to have the NiFi on your local machine pull from the NiFi on the remote machine...

Remote machine would have ListHDFS -> FetchHDFS -> Output Port

Local machine would have a Remote Process Group pointing to the remote NiFi, and then a connection from the output port to whatever you want to do locally.

The remote NiFi will need Site-to-Site enabled by setting nifi.remote.input.socket.port, and that port will also need to be open through the firewall.
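Before wiring up the Remote Process Group, it can help to confirm from the local machine that both the remote NiFi's HTTP port and its Site-to-Site port actually get through the firewall. The sketch below is a plain TCP connect check using only Python's standard library; the script name and the example ports (9090 from the question, 10000 as a stand-in Site-to-Site port) are hypothetical placeholders, not values NiFi requires.

```python
import socket
import sys


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened from this machine."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses
        return False


if __name__ == "__main__":
    # Hypothetical usage: python check_nifi_ports.py remoteurl 9090 10000
    # (9090 is the HTTP port from the question; 10000 stands in for the S2S port)
    if len(sys.argv) >= 3:
        target = sys.argv[1]
        for p in map(int, sys.argv[2:]):
            state = "reachable" if port_reachable(target, p) else "NOT reachable"
            print(f"{target}:{p} is {state}")
    else:
        print("usage: check_nifi_ports.py HOST PORT [PORT ...]")
```

This only proves a TCP handshake succeeds; it does not verify that NiFi itself is listening on the other end or that Site-to-Site is configured.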

Super Mentor

@Saikrishna Tarapareddy

Just want to make sure I understand completely.

You can establish a connection from your local machine out to your remote NiFi; however, you cannot have your remote NiFi connect to your local machine. Correct?

In this case, you would install a NiFi instance on your local machine and add the Remote Process Group (RPG) to the canvas of that local NiFi instance. The NiFi instance running the RPG acts as the client in the connection between the two NiFi instances. On your remote NiFi instance, the dataflow that fetches files from HDFS needs to route those files to an output port located at the root canvas level. (Input and output ports let FlowFiles move into and out of a process group, one level at a time; at the root level, they let your flow exchange data with another NiFi instance.)

For this transfer to work, your local NiFi instance must be able to communicate with the http(s) port of your remote NiFi instance (the NCM's http(s) port if the remote side is a NiFi cluster). Your local instance must also be able to communicate with the configured Site-to-Site (S2S) port on your remote instance (on every node if the remote side is a NiFi cluster).

nifi.properties file

# Site to Site properties
nifi.remote.input.socket.host=<remote instance FQDN>
nifi.remote.input.socket.port=<S2S port number>
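On the remote side, a quick way to catch a missing or malformed entry is to check that both properties above are present in nifi.properties. This is a minimal sketch assuming a simple key=value layout; it deliberately ignores other Java .properties syntax (':' separators, line continuations), and the script name and file path in the usage comment are hypothetical.

```python
import sys
from typing import Dict, List

# The two Site-to-Site properties shown in the reply above
S2S_KEYS = ("nifi.remote.input.socket.host", "nifi.remote.input.socket.port")


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified key=value parser: skips blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '=' in this simplified sketch
            props[key.strip()] = value.strip()
    return props


def s2s_problems(props: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the S2S settings (empty list if none)."""
    problems = [f"{key} is not set" for key in S2S_KEYS if not props.get(key)]
    port = props.get("nifi.remote.input.socket.port", "")
    if port and not (port.isdigit() and 1 <= int(port) <= 65535):
        problems.append(f"nifi.remote.input.socket.port is not a valid TCP port: {port!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_s2s.py /path/to/conf/nifi.properties
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as fh:
            issues = s2s_problems(parse_properties(fh.read()))
        print("\n".join(issues) if issues else "Site-to-Site properties look set")
```

Remember that NiFi reads nifi.properties at startup, so a corrected value only takes effect after a restart.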

The dataflow on your remote NiFi would look something like this:

[Screenshot: dataflow on the remote NiFi]

The dataflow on your local NiFi would look something like this:

[Screenshot: dataflow on the local NiFi]

As you can see, in this setup the local NiFi establishes the connection to the remote NiFi and pulls the data from the output port "outLocal".
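To check from the local machine whether the remote NiFi is advertising a port such as "outLocal" before adding the RPG, you can query its REST API. This sketch assumes a NiFi 1.x instance without security, whose REST API exposes GET /nifi-api/site-to-site returning a "controller" object with an "outputPorts" list; older releases may expose this differently, and the script name and arguments in the usage comment are hypothetical.

```python
import json
import sys
import urllib.request
from typing import List


def fetch_site_to_site_details(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/nifi-api/site-to-site (assumed NiFi 1.x endpoint, unsecured instance)."""
    url = base_url.rstrip("/") + "/nifi-api/site-to-site"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def output_port_names(details: dict) -> List[str]:
    """Sorted names of the output ports the remote NiFi advertises for Site-to-Site."""
    controller = details.get("controller") or {}
    return sorted(port.get("name", "") for port in controller.get("outputPorts") or [])


if __name__ == "__main__":
    # Hypothetical usage: python list_s2s_ports.py http://remoteurl:9090 outLocal
    if len(sys.argv) == 3:
        names = output_port_names(fetch_site_to_site_details(sys.argv[1]))
        print("advertised output ports:", names)
        print("found" if sys.argv[2] in names else "missing (is the port on the root canvas?)")
```

If the port is missing from the list, the likely cause in this thread's setup is that it sits inside a process group rather than on the root canvas.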

Thanks, Matt

Super Collaborator

@mclark @Bryan Bende

I tried the same thing. I had the ListHDFS --> FetchHDFS --> Output Port flow inside a process group on the remote machine.

My local NiFi was not able to find that port, but when I removed it from the process group and copied it onto the main canvas, my local NiFi was able to find it.

Thank you both.