Member since: 07-30-2019
Posts: 3118
Kudos Received: 1558
Solutions: 907
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 138 | 12-13-2024 10:58 AM |
 | 293 | 12-05-2024 06:38 AM |
 | 235 | 11-22-2024 05:50 AM |
 | 212 | 11-19-2024 10:30 AM |
 | 193 | 11-14-2024 01:03 PM |
03-17-2017
03:03 PM
1 Kudo
@Mohammed El Moumni
Here is one possible dataflow design that can be used to make sure both FlowFiles in a pair end up on the same node after being distributed via the Remote Process Group (RPG). While it requires adding 5 additional processors to your flow, the overhead is relatively light since you are dealing with very small FlowFiles all the way up to the FetchFile processor. You are still only fetching the ~700 MB of content after cluster distribution. Thanks, Matt
03-17-2017
01:59 PM
@mayki wogno If you are running a secured NiFi cluster, make sure all of your nodes have been granted the "modify the data" access policy for those connections (or the containing process group if the connections are inheriting policies). As an authenticated and authorized user, when you make a request while logged in to one node, that request is replicated to the other nodes. So the purge of data is being done on your behalf by the node you are currently logged in to. Authorizing your nodes to modify the data should allow you to empty the queue successfully. Another option is to temporarily set the FlowFile Expiration on the connection to 1 sec so that NiFi purges the queue itself. Just don't forget to change it back to avoid data loss when you don't want purging to occur. Of course, as Bryan noted, you can always stop NiFi and delete everything in the FlowFile and content repositories to purge all data from your dataflow, but that may not always be desired. Thanks, Matt
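For reference, the same queue purge can also be requested programmatically. The sketch below is only an illustration, assuming a NiFi 1.x REST API at http://localhost:8080, an already-authenticated and authorized caller, and a placeholder <connection-id> for the connection whose queue you want to empty.

```python
# Minimal sketch: empty a connection's queue through the NiFi REST API.
# Assumes NiFi 1.x, an authorized caller, and a placeholder <connection-id>.
import time
import requests

base = "http://localhost:8080/nifi-api"
conn_id = "<connection-id>"

# Submit a drop request against the connection's queue.
drop = requests.post(f"{base}/flowfile-queues/{conn_id}/drop-requests").json()["dropRequest"]

# Poll until NiFi reports the drop request has finished, then remove the request.
while not drop["finished"]:
    time.sleep(1)
    drop = requests.get(
        f"{base}/flowfile-queues/{conn_id}/drop-requests/{drop['id']}"
    ).json()["dropRequest"]

requests.delete(f"{base}/flowfile-queues/{conn_id}/drop-requests/{drop['id']}")
print("FlowFiles dropped:", drop.get("dropped"))
```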
03-17-2017
01:23 PM
1 Kudo
@Joshua Adeleke With NiFi dataflows it is typical to see large numbers of open file handles and user processes because of the concurrent thread operation supported by its many components. In many cases you will find that the default ulimits for both open files and processes fall short of what most dataflows need. I recommend setting these values to 50000 out of the gate. Depending on the volume and complexity of your dataflow(s), you may find you need to set them even higher. The default of 1024 is almost always guaranteed to be an issue.

/etc/security/limits.conf

* hard nproc 50000
* soft nproc 50000
* hard nofile 50000
* soft nofile 50000

/etc/security/limits.d/90-nproc.conf

* hard nproc 50000
* soft nproc 50000

Thanks, Matt
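If you want to confirm the limits the running NiFi process actually picked up after making these changes, a quick check like the one below works; this is just a sketch, assuming a Linux host where `pgrep -f org.apache.nifi` matches the NiFi JVM.

```python
# Minimal sketch: print the open-file and process limits of the running NiFi JVM.
# Assumes Linux and that "pgrep -f org.apache.nifi" matches the NiFi process.
import subprocess

pid = subprocess.check_output(["pgrep", "-f", "org.apache.nifi"]).split()[0].decode()
with open(f"/proc/{pid}/limits") as limits:
    for line in limits:
        if "Max open files" in line or "Max processes" in line:
            print(line.rstrip())
```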
03-17-2017
01:13 PM
@mayki wogno What type of downstream processor is the queue you are trying to empty connected to? Some processors, such as MergeContent, take ownership of FlowFiles in the incoming queue while they are running. MergeContent assigns FlowFiles on its incoming queue(s) to bins, and you will not be able to clear the queue of any FlowFiles that are currently assigned to a bin. If you stop the processor downstream of your queue (the processor must show no running threads), can you then successfully empty the queue? Thanks, Matt
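If you prefer to script that check rather than watch the UI, the sketch below stops a processor and waits for its active thread count to reach zero before you attempt to empty the queue. It is only an illustration, assuming a recent NiFi 1.x REST API, an authorized caller, and a placeholder <processor-id>.

```python
# Minimal sketch: stop a downstream processor and wait for its threads to finish.
# Assumes NiFi 1.x at http://localhost:8080 and a placeholder <processor-id>.
import time
import requests

base = "http://localhost:8080/nifi-api"
proc_id = "<processor-id>"

# Stopping a component requires its current revision (optimistic locking).
proc = requests.get(f"{base}/processors/{proc_id}").json()
requests.put(
    f"{base}/processors/{proc_id}/run-status",
    json={"revision": proc["revision"], "state": "STOPPED"},
).raise_for_status()

# Wait until the processor reports no active threads; the queue can then be emptied.
while True:
    status = requests.get(f"{base}/processors/{proc_id}").json()["status"]
    if status["aggregateSnapshot"]["activeThreadCount"] == 0:
        break
    time.sleep(1)
```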
03-16-2017
03:50 PM
@mayki wogno How are you executing your provenance query? Are you selecting "Data Provenance" from the hamburger menu in the upper right of the UI, or are you selecting "Data provenance" from the context menu that appears when you right-click your ListHDFS processor? The former performs a global provenance query across all your dataflows unless you add a filter, while triggering a provenance query through a specific component's context menu adds a filter based on that component's UUID. Thanks, Matt
03-16-2017
02:06 PM
1 Kudo
@Thangarajan Pannerselvam
If your GetFTP processor is configured with "Delete Original" set to false, every time this processor runs it will pull all the files it finds, including those pulled in the last run of the GetFTP processor. The ListFTP processor, unlike GetFTP, maintains state. So if you replace your GetFTP with both ListFTP and FetchFTP processors, you will not see the same files pulled twice unless the timestamps on the files on the FTP server are updated. Thanks, Matt
03-15-2017
01:05 PM
@mayki wogno Same question as this thread:
https://community.hortonworks.com/questions/88962/nifi-processor-not-the-most-up-to-date.html
03-15-2017
12:43 PM
1 Kudo
@nyakkanti FlowFiles consist of FlowFile attributes and FlowFile content.

- FlowFile attributes are kept in heap during processing and persisted to the FlowFile repository.
- FlowFile content is kept in claims within the content repository.

A claim is moved to archive once there no longer exist any active FlowFiles anywhere in your dataflow pointing at it. Archiving is enabled by default but can be disabled in the nifi.properties file:

nifi.content.repository.archive.enabled=true

If you disable archiving, the claim is purged from NiFi's content repository rather than being archived. What is important to understand is how claims work. By default, per the nifi.properties file, a claim can contain up to 100 FlowFiles or 10 MB of data (whichever is reached first). So a claim will not be purged until every piece of content in that claim has completed processing. As long as just one piece of content in that claim is still referenced, the entire claim will still exist in the content repository. As far as FlowFile attributes are concerned, they are persisted in NiFi provenance based on the retention configured in the nifi.properties file. You can perform provenance searches within NiFi to return FlowFile history and look at the attributes of those FlowFiles at any point in their lineage. Thanks, Matt
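For reference, the claim thresholds mentioned above also come from nifi.properties; assuming default NiFi 1.x property names, the relevant entries look like this:

nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.archive.enabled=true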
03-15-2017
12:31 PM
@mayki wogno Are you issuing commands against the REST API, or are you making a change within the UI when this occurs? This sounds like multiple changes being made against the same component at the same time. Each component has a revision number so that two people can't change the exact same component at the same time. So when a second change is submitted using the same revision as the first (successful) request, you get these responses. Two ways this can occur:

1. Two authenticated users are making a change to the configuration of the same processor. User 1 hits apply and that change is applied. User 2 then hits apply and a conflict response is returned by the first node that receives the request.

2. Multiple REST API calls are being made against the same component without updating the revision number in the subsequent calls.

As far as the node going down: do you mean you lose the UI and have to refresh the browser, or does the cluster completely go down, forcing you to restart nodes to get them to rejoin the cluster? Thanks, Matt
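To illustrate the revision mechanics for REST API callers: every update must echo the revision returned by the most recent read of that component, and submitting a stale revision produces exactly this kind of conflict response. A rough sketch, assuming NiFi 1.x at http://localhost:8080 and a placeholder <processor-id>:

```python
# Minimal sketch: update a processor using its current revision (optimistic locking).
# Assumes NiFi 1.x and a placeholder <processor-id>; a stale revision yields 409 Conflict.
import requests

base = "http://localhost:8080/nifi-api"
proc_id = "<processor-id>"

# Read the component first to obtain its current revision.
proc = requests.get(f"{base}/processors/{proc_id}").json()

update = {
    "revision": proc["revision"],  # must match NiFi's latest revision for this component
    "component": {
        "id": proc_id,
        "config": {"schedulingPeriod": "5 sec"},  # example configuration change
    },
}
resp = requests.put(f"{base}/processors/{proc_id}", json=update)
print(resp.status_code)  # 200 on success; 409 if the revision was already consumed
```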
03-14-2017
06:25 PM
1 Kudo
@Raj B
Thank you... Sometimes the most important piece of information is in the fine details. The other giveaway that it was clustered was that both FlowFiles in that queue had the same position, "1". Two FlowFiles in the same queue on the same node cannot occupy the same position.