Created 08-09-2016 07:01 PM
Hello,
I deployed a 3-node cluster in AWS. One of the nodes is the NCM.
The embedded ZooKeeper servers run on the two worker nodes.
The data flow is: ListSftp -> FetchSftp -> PutFile.
The ListSftp is scheduled on the primary node.
The issue is:
ListSftp works well; the test files are queued before entering FetchSftp.
The error from FetchSftp is:
18:36:05 UTC ERROR 172.31.48.155:8080
FetchSFTP[id=5cdfac90-2d07-443e-97b6-b06a1a883a22] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from FetchSFTP[id=5cdfac90-2d07-443e-97b6-b06a1a883a22]: java.io.IOException: error; rolling back session: org.apache.nifi.processor.exception.ProcessException: IOException thrown from FetchSFTP[id=5cdfac90-2d07-443e-97b6-b06a1a883a22]: java.io.IOException: error
I tried GetSftp -> PutFile with the same SFTP settings, and it works well.
I was wondering whether the issue is related to ZooKeeper, or to the primary node communicating with the other worker node.
I didn't set up the site-to-site properties in nifi.properties.
I also didn't set up a distributed cache service.
How can I get more log details about this processor's IOException?
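(For anyone hitting the same stack trace: one way to get more processor-level detail, assuming the stock conf/logback.xml that ships with NiFi, is to raise the log level for the FetchSFTP processor class; the extra output then appears in logs/nifi-app.log.)

```xml
<!-- conf/logback.xml: add a logger for the FetchSFTP processor class.
     The logger name matches the processor's Java class; DEBUG output
     will show up in logs/nifi-app.log. -->
<logger name="org.apache.nifi.processors.standard.FetchSFTP" level="DEBUG"/>
```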
Thanks.
Created 08-09-2016 08:52 PM
Is it because I didn't set up a State Provider?
In /conf/state-management.xml:
<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String"></property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">30 seconds</property>
    <property name="Access Control">CreatorOnly</property>
    <property name="Username">nifi</property>
    <property name="Password">nifi</property>
</cluster-provider>
Created 08-09-2016 08:57 PM
You certainly need to configure state-management.xml and fill in the "Connect String".
The Admin Guide in the NiFi docs (under the "Help" link within the UI) has the steps to stand up the embedded ZooKeeper.
Also, I'm not sure if you saw my previous comment; it could be that you need this:
On FetchSFTP, are you putting in:
"${path}/$(unknown)"
for the Remote Path setting?
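As a sketch, a filled-in cluster provider might look like the following (the hostnames are placeholders for your two worker nodes, and the ports assume ZooKeeper's default of 2181); nifi.properties should then reference it via nifi.state.management.provider.cluster:

```xml
<!-- conf/state-management.xml: "Connect String" lists the embedded
     ZooKeeper servers as comma-separated host:port pairs.
     Hostnames below are placeholders, not real values. -->
<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">worker1.example.com:2181,worker2.example.com:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">30 seconds</property>
    <property name="Access Control">CreatorOnly</property>
    <property name="Username">nifi</property>
    <property name="Password">nifi</property>
</cluster-provider>
```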
Created 08-10-2016 01:29 PM
Hi @jsequeiros
These two settings solved the issue for me.
The IOException was related to reading state from ZooKeeper.
One more question:
In my test, ListSftp (on the primary node) sends a list of 4 files to FetchSftp.
However, I found that the primary node fetches all 4 files itself, rather than the tasks being evenly distributed to the two workers.
Any idea about the task allocation from ListSftp to FetchSftp?
Created 08-10-2016 02:37 PM
In order to distribute your data among your cluster nodes, you need to use Site-to-Site.
Basically: the primary node gets a listing > sends the list to a Remote Process Group (RPG) > configure an Input Port (name it something distinctive) > connect that relationship to FetchSFTP.
*** In order to see your Input Port when you connect to the RPG, make sure you have configured the Input Port first.
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2/bk_UserGuide/content/site-to-site.html
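As a minimal sketch, the Site-to-Site properties in nifi.properties on each node might look like this (the hostname and port below are assumptions; each node would use its own hostname):

```properties
# nifi.properties (set on every node) -- enables socket-based Site-to-Site.
# Hostname and port are placeholders; pick a port open between your nodes.
nifi.remote.input.socket.host=worker1.example.com
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
```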
Created 08-10-2016 06:13 PM
Hi @jsequeiros
After setting up the site-to-site properties on all nodes, FetchSftp works as expected. Both worker nodes are involved in fetching files.
How do we distribute tasks evenly among the workers? I found that sometimes one worker took all the files, and sometimes one took 3 while the other took 1. I have 4 test files in total.
Thanks.