About MattWho

MattWho · ‎07-11-2018

@Mohammad Soori

Just to make sure I understand correctly: TailFile is producing only 20 output FlowFiles, but all 500 records are included within those 20 FlowFiles. Correct?

With a Run Schedule of 0 secs, the processor is scheduled to execute, and is then scheduled to execute again immediately after the previous execution completes. During each execution, it consumes all new lines seen since the last execution. There is no configuration option that will force this processor to output a separate FlowFile for each line read from the file being tailed.

You could, however, feed the output FlowFiles to a SplitText processor to split each FlowFile into a separate FlowFile per line.

Thank you, Matt

When an "Answer" addresses/solves your question, please select "Accept" beneath that answer. This encourages user participation in this forum.
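As a rough illustration of the SplitText suggestion above, here is a minimal Python sketch of the idea only; it is not NiFi code. It assumes each TailFile output FlowFile carries a batch of newline-terminated lines, and that the 500 records happen to be spread evenly over the 20 FlowFiles (an assumption made purely for the example). The name `split_per_line` is hypothetical, and dropping empty lines is a simplification of this sketch.

```python
def split_per_line(flowfile_content: str) -> list[str]:
    """Return one entry per non-empty line of a batched FlowFile's content."""
    return [line for line in flowfile_content.splitlines() if line]


# Hypothetical input: 20 batches of 25 lines each, 500 lines in total.
batches = [
    "\n".join(f"line {i}" for i in range(b * 25, (b + 1) * 25)) + "\n"
    for b in range(20)
]

# Splitting every batch per line yields one record per original line.
per_line = [line for batch in batches for line in split_per_line(batch)]

print(len(batches), len(per_line))  # 20 500
```

The batching only changes how the lines are grouped; no records are lost, and a per-line split downstream recovers one item per line.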

MattWho · ‎07-10-2018

@umang s Based on your flow design above, it looks like you are trying to route FlowFiles by comparing attributes between two different FlowFiles. That will not work. NiFi expects both ${temp_array} and ${category} to exist on the same FlowFile being evaluated by the RouteOnAttribute processor.

MattWho · ‎07-10-2018

@Benjamin Bouret

The ListHDFS processor does not retrieve the actual content of the files. It produces 0-byte FlowFiles that carry metadata about the target content. Any hash you produce on these FlowFiles will not match the hash produced on the original source FTP server.

If I am not following the above correctly, then I am not clear on exactly where you are performing this second hash. How do you plan to compare the two hashes? Manually?

NiFi has guaranteed delivery when it writes data to HDFS. If the transfer fails for any reason, the FlowFile is routed to failure.

The FetchFTP processor also has handling for failures in retrieving the content: [screenshot from the original post not preserved]

This check seems like a lot of overhead that should not be necessary.

Thank you, Matt

MattWho · ‎07-09-2018

@Derek Calderon

Short answer is no. The ExecuteSQL processor is written to write its output to the FlowFile's content.

There is an alternative solution. You have some processor currently feeding FlowFiles to your ExecuteSQL processor via a connection. My suggestion would be to feed that same connection to two different paths. The first connection feeds a "MergeContent" processor via a funnel, and the second feeds your "ExecuteSQL" processor. The ExecuteSQL processor performs the query and retrieves the data you are looking for, writing it to the content of the FlowFile. You then use a processor like "ExtractText" to extract that FlowFile's new content into FlowFile attributes. Next, you use a processor like "ModifyBytes" to remove all content from this FlowFile. Finally, you feed the output of that processor to the same funnel as the other path. The MergeContent processor can then merge these two FlowFiles using the "Correlation Attribute Name" property (assuming "filename" is unique, that could be used), with min/max entries set to 2 and "Attribute Strategy" set to "Keep All Unique Attributes". The result should be what you are looking for.

The flow would look something like the following: [screenshot from the original post not preserved]

Having multiple identical connections does not trigger NiFi to write the 200 MB of content twice to the content repository. A new FlowFile is created, but it points to the same content claim. New content is only generated when ExecuteSQL is run against one of the FlowFiles. So this flow does not produce any additional write load on the content repo other than when ExecuteSQL writes its output, which I am assuming is relatively small?

Thank you, Matt
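The merge step described above can be sketched in plain Python. This is only a simplified model of the idea, not NiFi's implementation: FlowFiles are modeled as dictionaries, `merge_by_correlation` and the `sql.result` attribute are hypothetical names, and attributes whose values conflict within a pair are dropped to mimic the "Keep All Unique Attributes" strategy.

```python
from collections import defaultdict


def merge_by_correlation(flowfiles, correlation_attr="filename"):
    """Merge pairs of FlowFile-like dicts that share a correlation attribute."""
    groups = defaultdict(list)
    for ff in flowfiles:
        groups[ff["attributes"][correlation_attr]].append(ff)

    merged = []
    for members in groups.values():
        if len(members) != 2:
            continue  # models min/max entries = 2: only complete pairs merge

        attrs, conflicts = {}, set()
        for ff in members:
            for name, value in ff["attributes"].items():
                if name in attrs and attrs[name] != value:
                    conflicts.add(name)  # same name, different values
                attrs.setdefault(name, value)
        for name in conflicts:
            del attrs[name]  # conflicting attributes are not kept

        content = b"".join(ff["content"] for ff in members)
        merged.append({"attributes": attrs, "content": content})
    return merged


# Path 1: the untouched original. Path 2: the query result moved into an
# attribute, with the content emptied.
original = {"attributes": {"filename": "data.bin"}, "content": b"original payload"}
sql_path = {"attributes": {"filename": "data.bin", "sql.result": "42"}, "content": b""}

result = merge_by_correlation([original, sql_path])
print(result[0]["attributes"], result[0]["content"])
```

Because the second path carries no content, the merged item keeps the original content unchanged while gaining the extra attribute.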

MattWho · ‎07-09-2018

@Henrik Olsen

NiFi's various timeout settings are very aggressive. They are better suited to a standalone NiFi instance running a fairly simple dataflow. In a NiFi cluster, the following timeouts should be increased:

nifi.cluster.node.connection.timeout=5 secs (increase to 30 secs)
nifi.cluster.node.read.timeout=5 secs (increase to 30 secs)
nifi.zookeeper.connect.timeout=3 secs (increase to 60 secs)
nifi.zookeeper.session.timeout=3 secs (increase to 60 secs)

A restart of NiFi will be needed after making these changes.

Another thing you could do when this condition is present is to use the browser developer tools to try to catch which action is timing out. Are you seeing a lot of full garbage collection occurring? If these stop-the-world events are long enough, they can also cause this.

Thank you, Matt
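For anyone who wants to script the change, here is a minimal Python sketch that rewrites the four properties listed above in the text of a nifi.properties file. The property names and target values come from the post; the function name `apply_timeouts` and the sample input are hypothetical, and the sketch assumes simple `key=value` lines. It only transforms text and does not touch a real file or restart NiFi.

```python
RECOMMENDED = {
    "nifi.cluster.node.connection.timeout": "30 secs",
    "nifi.cluster.node.read.timeout": "30 secs",
    "nifi.zookeeper.connect.timeout": "60 secs",
    "nifi.zookeeper.session.timeout": "60 secs",
}


def apply_timeouts(properties_text: str, overrides: dict = RECOMMENDED) -> str:
    """Return properties_text with the values of matching keys replaced."""
    out = []
    for line in properties_text.splitlines():
        key, sep, _ = line.partition("=")
        name = key.strip()
        if sep and name in overrides:
            out.append(f"{name}={overrides[name]}")
        else:
            out.append(line)  # comments and unrelated properties pass through
    return "\n".join(out) + "\n"


# Hypothetical excerpt of a nifi.properties file.
sample = (
    "# cluster node properties\n"
    "nifi.cluster.node.connection.timeout=5 secs\n"
    "nifi.cluster.node.read.timeout=5 secs\n"
    "nifi.zookeeper.connect.timeout=3 secs\n"
    "nifi.zookeeper.session.timeout=3 secs\n"
    "nifi.web.http.port=8080\n"
)

print(apply_timeouts(sample))
```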

MattWho · ‎07-06-2018

@Derek Calderon - Sorry to hear that. I did share this HCC link with a few devs I know, in case they have time to assist. - Thanks, Matt

MattWho · ‎07-06-2018

@Derek Calderon - If this did not solve the issue, you will need a developer with more custom processor development experience than me. I just know that adding additional jars/nars to NiFi's default lib directory can introduce class loader issues with regard to dependencies shared by multiple components. If the issue still exists after clearing the work directory and restarting, you are going to need a developer to suggest possible changes to your custom code. - Thanks, Matt

MattWho · ‎07-06-2018

@Derek Calderon

If you create a custom lib directory and place your new processor jar and its dependencies in there instead of in NiFi's default lib directory, do you experience the same issue(s)?

It is not recommended that users add any new jars or nars to NiFi's default lib directory. Adding custom lib directories to NiFi is easy and will also make upgrading easier. Simply add a new property in the nifi.properties file, for example:

nifi.nar.library.directory.custom-lib1=/nifi/custom-lib1

Then make sure the NiFi service user has proper ownership of and permissions for this directory and the custom jars/nars you place in there.

A restart of NiFi is needed any time you make a configuration change to nifi.properties or add a new jar/nar to any of the lib directories.

Before restarting NiFi, you should delete the existing NiFi "work" directory so it is recreated cleanly after this change.

Thank you, Matt

If you found this Answer addressed your original question, please take a moment to login and click "Accept" on the answer.

MattWho · ‎07-06-2018

@Gitanjali Bare

When the NiFi Site-to-Site (S2S) capability was added to NiFi, the input and output ports were designed to allow for the movement of FlowFiles between NiFi process groups. Input and output ports allow for the movement of FlowFiles to and from a parent process group. Once a FlowFile is at the "root" process group level, the effective parent would be another NiFi instance. S2S has been around a lot longer than NiFi's multi-tenant authorizations, so at the time of development every user who could authenticate into NiFi had access to everything on the NiFi canvas.

There are considerable design changes required to change this functionality in NiFi. The following Jira was opened as one suggested approach: https://issues.apache.org/jira/browse/NIFI-2933

But with any design change, NiFi must consider how that change will affect existing users during upgrade. The above type of change may leave existing users with invalid flows after upgrade, potentially requiring substantial re-work.

At this time, the only option for NiFi S2S is having your input/output ports at the root canvas level (remote ports).

Another limitation in multi-tenant NiFi installations is authorized access to these remote input/output ports. The dataflows built by authenticated users do not execute as those users. All flows are executed as the NiFi service user. This means that when NiFi A uses S2S to send/retrieve FlowFiles from NiFi B, the servers themselves are being authenticated and authorized in that connection. Once NiFi A has been authorized to see X number of remote ports on NiFi B, all users on NiFi A who add a Remote Process Group (RPG) pointing at NiFi B will be able to see and transfer FlowFiles to/from all those remote ports.

Thank you, Matt

If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer.

MattWho · ‎07-06-2018

@umang s

The following NiFi Expression Language statement will return "true" if a match is found:

${anyDelineatedValue("${temp_array}", ","):contains("${category}")}

Thanks, Matt
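To make the behavior of that expression concrete, here is a small Python sketch of the same check. It is only a model of the logic, not NiFi code: the function name `any_delineated_value_contains` and the sample values are hypothetical. It assumes the delimited string is split on the delimiter and the result is true when any piece contains the search text.

```python
def any_delineated_value_contains(delineated: str, delimiter: str, needle: str) -> bool:
    """Return True if any delimiter-separated value contains needle as a substring."""
    return any(needle in value for value in delineated.split(delimiter))


temp_array = "books,music,video"  # hypothetical attribute value

print(any_delineated_value_contains(temp_array, ",", "music"))  # True
print(any_delineated_value_contains(temp_array, ",", "games"))  # False
print(any_delineated_value_contains(temp_array, ",", "mus"))    # True
```

Note that `contains` is a substring test, so a partial value such as "mus" also matches; if only whole values should match, an equality comparison per value would be the stricter choice.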

Last Visited	‎02-03-2026 12:58 AM

Member Since	‎07-30-2019 10:41 AM
Posts	3,432
Kudos received	1628

Cloudera Community

Re: Setting TTL per key when writing to redis

Re: Best Practice for configuring registry flows

Re: Nifi 2.7.2 Start Problem

Re: Error importing NiFi workflow template from ve...

Re: nifi 2.6 registry security scan results

Re: Nifi TailFile Processor does not detect vast i...

Re: How to use (in) to compare variable with array...

Re: How to perform a reliable check of data integr...

Re: Is there any way to route the result of Execut...

Re: NiFi web UI timeouts

Re: Created a custom nifi processor, after placing...

Re: Created a custom nifi processor, after placing...

Re: Created a custom nifi processor, after placing...

Re: Hi, Why Input port should be created on top le...

Re: How to use (in) to compare variable with array...