ListSftp works but FetchSftp doesn't work in Cluster mode
Labels: Apache NiFi
Created ‎08-09-2016 07:01 PM
Hello,

I deployed a 3-node cluster in AWS; one of the nodes is the NCM. The embedded ZooKeeper servers run on the two worker nodes. The data flow is ListSFTP -> FetchSFTP -> PutFile, with ListSFTP scheduled on the primary node.

The issue: ListSFTP works well, and the test files are queued in the connection ahead of FetchSFTP, but FetchSFTP fails with this error:

18:36:05 UTC ERROR 5cdfac90-2d07-443e-97b6-b06a1a883a22 172.31.48.155:8080
FetchSFTP[id=5cdfac90-2d07-443e-97b6-b06a1a883a22] FetchSFTP[id=5cdfac90-2d07-443e-97b6-b06a1a883a22] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from FetchSFTP[id=5cdfac90-2d07-443e-97b6-b06a1a883a22]: java.io.IOException: error; rolling back session: org.apache.nifi.processor.exception.ProcessException: IOException thrown from FetchSFTP[id=5cdfac90-2d07-443e-97b6-b06a1a883a22]: java.io.IOException: error

I tried GetSFTP -> PutFile with the same SFTP settings, and it works well. I was wondering whether the issue is related to ZooKeeper, or to the primary node communicating with the other worker node.

I didn't set the site-to-site properties in nifi.properties, and I didn't set up a distributed cache service.

How can I get more log details about this processor's IOException?

Thanks.
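One way to get more detail than the truncated bulletin in the UI is to raise the log level for the SFTP processors in NiFi's conf/logback.xml and then watch logs/nifi-app.log for the full stack trace. The logger names below assume the standard processor package used by the stock NiFi distribution:

```xml
<!-- Add inside the <configuration> element of conf/logback.xml.
     Logger names assume the stock org.apache.nifi.processors.standard package. -->
<logger name="org.apache.nifi.processors.standard.FetchSFTP" level="DEBUG"/>
<logger name="org.apache.nifi.processors.standard.ListSFTP" level="DEBUG"/>
```

After restarting NiFi (or waiting for logback's periodic config scan, if enabled), tail logs/nifi-app.log to see the full exception rather than the one-line bulletin message.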
Created ‎08-09-2016 08:57 PM
You certainly need to configure state-management.xml and fill in the "Connect String".
The Admin Guide in the NiFi docs (under the "Help" link within the UI) has the steps to stand up the embedded ZooKeeper.
Also, in case you didn't see my previous comment, it could be that you need this: on FetchSFTP, are you putting "${path}/${filename}" in the Remote File setting?
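For reference, a filled-in cluster provider might look like the sketch below. The hostnames and port are placeholders; list every node that runs an embedded ZooKeeper server:

```xml
<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <!-- Placeholder hostnames: point these at your two worker nodes -->
    <property name="Connect String">worker1.example.com:2181,worker2.example.com:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">30 seconds</property>
    <property name="Access Control">CreatorOnly</property>
    <property name="Username">nifi</property>
    <property name="Password">nifi</property>
</cluster-provider>
```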
Created ‎08-09-2016 08:52 PM
Is it because I didn't set up the State Provider? In conf/state-management.xml I have:

<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String"></property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">30 seconds</property>
    <property name="Access Control">CreatorOnly</property>
    <property name="Username">nifi</property>
    <property name="Password">nifi</property>
</cluster-provider>
Created ‎08-10-2016 01:29 PM
Hi @jsequeiros
These two settings solved the issue for me. The IOException came from reading ZooKeeper state.
One more question: in my test, ListSFTP (on the primary node) sends a listing of 4 files to FetchSFTP. However, I found that the primary node fetches all 4 files itself rather than the work being distributed evenly across the two workers.
Any idea how tasks are allocated from ListSFTP to FetchSFTP?
Created ‎08-10-2016 02:37 PM
In order to distribute your data among your cluster nodes, you need to use site-to-site.
Basically: the primary node gets the listing > sends it to a Remote Process Group (RPG) that points back at your own cluster > the RPG delivers the FlowFiles to an Input Port (name it something distinctive) > connect that Input Port to FetchSFTP.
*** In order to see your Input Port when you connect to the RPG, make sure the Input Port has already been created and configured.
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2/bk_UserGuide/content/site-to-site.html
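The site-to-site part comes down to a few lines in nifi.properties on every node. A sketch, assuming the property names used by the 0.x-era NiFi admin guide (the port is an example value; use any free port, and each node's own hostname):

```properties
# nifi.properties on each node (0.x-era property names; port value is an example)
nifi.remote.input.socket.host=worker1.example.com
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
```

Restart the nodes after editing, then add the RPG on the canvas using the cluster's URL.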
Created ‎08-10-2016 06:13 PM
Hi @jsequeiros
After setting the site-to-site properties on all nodes, FetchSFTP works as expected: both worker nodes take part in fetching files.
How do we distribute tasks evenly among the workers? I found that sometimes one worker takes all the files, and other times one takes 3 while the other takes 1 (I have 4 test files in total).
Thanks.
