Member since: 07-30-2019
Posts: 3146
Kudos Received: 1566
Solutions: 911
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 67 | 01-21-2025 06:16 AM |
|  | 121 | 01-16-2025 03:22 PM |
|  | 214 | 01-09-2025 11:14 AM |
|  | 1329 | 01-03-2025 05:59 AM |
|  | 526 | 12-13-2024 10:58 AM |
10-21-2016
01:22 PM
2 Kudos
@mayki wogno Every processor has a well-defined job to do. When that job is executed, there are one to many possible outcomes, and those outcomes are represented in the form of relationships. As a FlowFile traverses your dataflow it is routed to these various relationships. Eventually you reach the end of your dataflow, and the auto-terminate capability gives you the ability to tell NiFi "I am done with this FlowFile; you may get rid of it." A processor is never coded to delete FlowFiles by default. For example, let's assume the last processor in my dataflow is a PutSFTP. What this processor actually does when it receives a FlowFile is send a copy of that FlowFile's content to the SFTP destination. Upon confirmed delivery it routes the FlowFile to the "success" relationship. You may choose to send that FlowFile via that "success" relationship on for further processing in additional processors, or "auto-terminate" it within the PutSFTP, thus ending its life as a FlowFile. In NiFi it is the dataflow designer's job to determine when a FlowFile has reached the end of the dataflow, and that is exactly what "auto-terminate relationships" is used for. Thanks, Matt
10-20-2016
12:57 AM
1 Kudo
@Sanaz Janbakhsh Re-installing from scratch is not necessary. Shut down your NiFi instance, create the user on your system that you want to run NiFi as, and change ownership of all the files and directories used by NiFi to that user. This includes all 4 NiFi repositories (database, provenance, content, and FlowFile). The "NiFi user" must be able to read and write to the repos, NiFi logs, and state directories. If you are unsure where to find the directories your user needs access to, look in the various config files found in NiFi's conf directory. NiFi will be able to continue working on FlowFiles that were still active in the flow as long as ownership of those files was successfully changed. After that you can either start NiFi while logged in as that new user or set the "run.as=" property in the NiFi bootstrap.conf file. After starting NiFi as the user, tail the nifi-app.log and watch for any permission-denied errors. If you encounter any, adjust permissions on the reported file/dir and you should be good to go. There is no such thing as NiFi version 2.4.2.0. Are you running an Apache NiFi release (0.x or 1.x) or an HDF release (1.x or 2.0)? You can see your NiFi version by clicking on "about" in the upper right corner of the NiFi UI. Thanks, Matt
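As a rough sketch of those steps (the install and repository paths below are examples only; confirm the actual locations against the files in NiFi's conf directory):

```
# Stop NiFi, create the service user, and hand ownership of the install,
# repositories, logs, and state directories to it (example paths).
/opt/nifi/bin/nifi.sh stop
useradd -r nifi
chown -R nifi:nifi /opt/nifi \
  /opt/nifi/flowfile_repository \
  /opt/nifi/content_repository \
  /opt/nifi/provenance_repository \
  /opt/nifi/database_repository \
  /opt/nifi/logs /opt/nifi/state

# In conf/bootstrap.conf, tell the bootstrap to run NiFi as that user:
#   run.as=nifi

/opt/nifi/bin/nifi.sh start
tail -f /opt/nifi/logs/nifi-app.log   # watch for "Permission denied" errors
```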
10-18-2016
10:19 PM
@srinivas padala HBase has a MAX_ROW_LENGTH value of 32767: https://hbase.apache.org/apidocs/constant-values.html
10-17-2016
12:57 PM
1 Kudo
@milind pandit Is this a flow.xml.gz you copied from another instance of NiFi? All the sensitive properties inside the flow.xml.gz file are encrypted using the sensitive properties key defined in the nifi.properties file (if blank, NiFi uses an internal default). If you move your flow.xml.gz file to another NiFi, the sensitive properties key used must be the same or NiFi will fail to start because it cannot decrypt the sensitive properties in the file. Matt
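For reference, the property in question lives in conf/nifi.properties on each instance; a sketch (the actual value is whatever the original instance used when it wrote the flow.xml.gz):

```
# conf/nifi.properties – must hold the same value on the source and destination NiFi
nifi.sensitive.props.key=<same key as the instance that created flow.xml.gz>
```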
10-17-2016
12:53 PM
@Paul Yang Are you trying to have every node in your cluster execute the same SQL statements, or are you trying to evenly distribute all the generated SQL statements across your cluster so that every node runs different SQL statements?
10-17-2016
12:39 PM
@Josh Elser @srinivas padala The "Read/Write" stats on the processor have nothing to do with writing to your SQL end-point. This particular stat is all about reads from and writes to the NiFi content repository. It helps identify where in your flow you may have high disk I/O, in the form of either reads or the more expensive writes. From the screenshot above, I see that this processor pulled 35,655 FlowFiles off its inbound connections in the past 5 minutes. It read 20.87 MB of content from the content repository in that same timeframe. The processor then output 0 FlowFiles to any outbound connection (this indicates all FlowFiles were routed to an auto-terminated relationship). Assuming only the "success" relationship was auto-terminated, all data was sent successfully. If the "failure" relationship (which should not be auto-terminated here) is routed to another processor, the 0 "out" indicates that in the past 5 minutes 0 files failed. The Tasks stat shows the cumulative total CPU usage reported over the past 5 minutes; a high "Time" value indicates a CPU-intensive processor. Thanks, Matt
10-14-2016
12:29 PM
4 Kudos
@Harry You should be able to simply use two ReplaceText processors in series to create the XML structure you are looking for:
The first ReplaceText is configured to prepend text to the binary content, and the second appends text to the result of the first; both configurations are sketched below. *** Note: holding the shift key while hitting enter creates a new line in the property's text editor, which is how the multi-line replacement values are entered.
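The original configuration screenshots are not reproduced here; a rough sketch of the two ReplaceText configurations, assuming the Prepend/Append replacement strategies and illustrative tag values, would be:

```
# First ReplaceText – prepend the opening tag, followed by a new line (shift+enter)
Replacement Strategy : Prepend
Evaluation Mode      : Entire text
Replacement Value    : <flowfile-content>

# Second ReplaceText – append a new line (shift+enter) and then the closing tag
Replacement Strategy : Append
Evaluation Mode      : Entire text
Replacement Value    : </flowfile-content>
```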
An example of the output from the above would look like this:
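The original output screenshot is not included either; with the values sketched above, the resulting FlowFile content would look roughly like:

```
<flowfile-content>
...the original binary FlowFile content...
</flowfile-content>
```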
There you have the binary content between <flowfile-content> and </flowfile-content>. Now all you need to do is adjust the prepend and append values to the specific tags you need. Thanks, Matt
10-12-2016
06:04 PM
The MergeContent processor simply bins and merges the FlowFiles it sees on an incoming connection at run time. In your case you want each bin to have a minimum of 100 FlowFiles before merging, so you will need to specify that in the "Minimum Number of Entries" property. I never recommend setting any minimum value without also setting the "Max Bin Age" property. Let's say you only ever get 99 FlowFiles, or the amount of time it takes to get to 100 exceeds the useful age of the data being held; those FlowFiles will sit in a bin indefinitely, or for an excessive amount of time, unless that exit age has been set. Also keep in mind that if you have more than one connection feeding your MergeContent processor, on each run it looks at the FlowFiles on only one connection, moving in round-robin fashion from connection to connection. NiFi provides a "funnel" which allows you to merge FlowFiles from many connections into a single connection. Matt
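As a sketch, the relevant MergeContent properties might be set along these lines (the 100-entry minimum comes from your requirement; the other values are only examples):

```
Minimum Number of Entries : 100
Maximum Number of Entries : 1000      # example upper bound per bin
Max Bin Age               : 5 min     # merge whatever has accumulated once a bin is 5 minutes old
```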
10-12-2016
04:39 PM
@boyer It may be helpful to see your dataflow to completely understand what you are doing. When you say you "call 2 other apis at the same time", does that mean you are forking the success relationship from the HandleHttpRequest to two downstream processors? Then you are taking the successes from those processors and merging them back together into a single FlowFile before sending it to the HandleHttpResponse processor? Assuming the above is true, how do you have your MergeContent processor configured?
1. I would suggest you use "http.context.identifier" as the "Correlation Attribute Name" so that only FlowFiles originating from the same HandleHttpRequest are merged together.
2. I also suggest setting "Attribute Strategy" to "Keep All Unique Attributes" (with this strategy, any attribute on any FlowFile that gets bundled will be kept unless its value conflicts with the value from another FlowFile). This will be useful if your two intermediate processors set any unique attributes you want to keep on the resulting merged FlowFile.
You also want to make sure that your FlowFile makes it from the request to the response before the configured expiration in your "StandardHttpContextMap" controller service. Thanks, Matt
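A minimal sketch of the MergeContent configuration described above (property names from the standard MergeContent processor; the values are illustrative):

```
Merge Strategy             : Bin-Packing Algorithm
Correlation Attribute Name : http.context.identifier
Minimum Number of Entries  : 2        # one FlowFile from each of the two API branches
Attribute Strategy         : Keep All Unique Attributes
Max Bin Age                : 30 sec   # keep this well under the StandardHttpContextMap expiration
```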
10-12-2016
02:24 PM
1 Kudo
@Saikrishna Tarapareddy The FlowFile repo will never get close to 1.2 TB in size; that is a lot of wasted money on hardware. You should ask your vendor about splitting that RAID into multiple logical volumes so you can allocate a large portion of it to other things. Logical volumes are also a safe way to protect the RAID 1 where your OS lives: if some error condition results in a lot of logging, the application logs may eat up all your disk space and affect the OS. With logical volumes you can protect your root disk. If that is not possible, I would recommend changing your setup to a bunch of RAID 1 arrays. With the 16 x 600 GB hard drives you have allocated above, you could create 8 RAID 1 disk arrays:
- 1 for root + software install + database repo + logs (make sure you have some monitoring set up to watch disk usage on this RAID if logical volumes cannot be supported)
- 1 for FlowFile repo
- 3 for content repo
- 3 for provenance repo
Thanks, Matt
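Once the drives are carved up, the repositories get pointed at the separate mounts in conf/nifi.properties; a sketch (the mount points and extra directory names are examples, assuming NiFi's support for multiple named content and provenance repository directories):

```
# conf/nifi.properties – spread the repositories across the dedicated RAID 1 mounts
nifi.flowfile.repository.directory=/mnt/flowfile-repo/flowfile_repository

nifi.content.repository.directory.default=/mnt/content-repo1/content_repository
nifi.content.repository.directory.content2=/mnt/content-repo2/content_repository
nifi.content.repository.directory.content3=/mnt/content-repo3/content_repository

nifi.provenance.repository.directory.default=/mnt/prov-repo1/provenance_repository
nifi.provenance.repository.directory.prov2=/mnt/prov-repo2/provenance_repository
nifi.provenance.repository.directory.prov3=/mnt/prov-repo3/provenance_repository
```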