01-24-2019
06:48 PM
@Adam J

You can execute your REST API calls against any node in the NiFi cluster. It does not have to be the primary node.

Thank you, Matt
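For example, a status request like the one below works the same against any node (a sketch; the hostname, port, and bearer token are placeholders, not values from the original question):

```bash
# any node in the cluster can answer NiFi REST API calls
curl -H "Authorization: Bearer $TOKEN" \
  https://nifi-node2.example.com:8443/nifi-api/flow/cluster/summary
```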
01-23-2019
03:06 PM
1 Kudo
@john y

There is a core set of attributes that will exist on all FlowFiles:
1. entryDate
2. lineageStartDate
3. fileSize
4. uuid
5. filename
6. path

The first four cannot be changed by users. filename and path can have their values edited by users via something like the UpdateAttribute processor.

You can insert a LogAttribute processor anywhere in your flow to output the key/value attribute map of the FlowFiles that pass through it to nifi-app.log. Just keep in mind that leaving this processor in your flow can produce a lot of log output.

Thanks, Matt
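As an illustration, the two user-editable attributes can be rewritten with Expression Language in an UpdateAttribute processor (the property values below are assumptions, just to show the idea):

```
UpdateAttribute (dynamic properties)
  filename = ${uuid}.json        # rename the FlowFile using its immutable uuid
  path     = incoming/${path}    # prepend a directory to the existing path value
```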
01-16-2019
04:25 PM
1 Kudo
@Michael Vikulin

Your nifi.properties file is configured to look for an authorizer with the identifier "managed-authorizer":
nifi.security.user.authorizer=managed-authorizer
The shared authorizers.xml does not contain a "managed-authorizer". If you want to use the "file-provider", you need to update your nifi.properties file.

I also see that you are using the ldap-provider for logging in to your NiFi. It is configured with:
<property name="Identity Strategy">USE_USERNAME</property>
This means that whatever string the user enters in the username login box will be parsed by any configured identity mapping patterns in the nifi.properties file, and the resulting value string is then passed to the authorizer.

So even once you fix your authorizers.xml or nifi.properties file, you are likely going to send "admin" to your authorizer rather than the admin user's full DN.

Thanks, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
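A minimal sketch of how the identifier must line up between the two files (the property name and value come from the post; the authorizers.xml layout shown here mirrors the stock NiFi example and may differ from yours):

```
# nifi.properties
nifi.security.user.authorizer=managed-authorizer

<!-- authorizers.xml: an authorizer with a matching identifier must exist -->
<authorizer>
    <identifier>managed-authorizer</identifier>
    <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
    ...
</authorizer>
```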
01-16-2019
02:23 PM
@Jose Paul

A bin would be eligible for merge with only 1 FlowFile in it since you set minEntries to 1.

When the processor gets scheduled to execute (based on the configured run schedule and scheduling strategy), it will look at one of possibly many incoming connections and consider only the FlowFiles queued at that exact moment in time. It will then bin those FlowFiles based on configuration. So if multiple FlowFiles happen to exist in that connection with the same filename attribute value, they will be placed in the same bin. Once those FlowFiles have been placed in bins, the bins are evaluated to see whether they are eligible to be merged. In your case, since minEntries is 1, all bins with 1 or more FlowFiles would be merged.

If your run schedule is set to run as fast as possible (Timer Driven with a run schedule of 0 sec), it may be reading the inbound connection so fast that it only contains 1 or just a few FlowFiles per execution.

The other scenario is an inbound connection with over 500 queued FlowFiles at time of execution. If we assume there are more than 500 FlowFiles with unique values assigned to the filename attribute, each would end up being placed in a new bin (correlation attribute config). As soon as bin 500 has a FlowFile assigned to it and MergeContent tries to bin unique filename number 501, it has no available bins left, so it forces the merging of the oldest bin to free a bin.

Thank you, Matt
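A sketch of the MergeContent properties involved in this behavior (the values reflect the scenario described above; treat them as an assumption, not your actual config):

```
MergeContent
  Merge Strategy             = Bin-Packing Algorithm
  Correlation Attribute Name = filename
  Minimum Number of Entries  = 1      # any bin with at least 1 FlowFile is immediately eligible
  Maximum number of Bins     = 500    # binning unique filename number 501 forces the oldest bin to merge
```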
01-16-2019
01:55 PM
@Gillu Varghese

Have you considered upgrading to NiFi 1.8 to take advantage of the load distribution capability of connections? I am assuming your script is executing on each node in your cluster? If so, the script is essentially looking for 50 FlowFiles on each node, which would explain why it just sits there. I am not a Groovy script writer, so I am of little help there.

The only other option that comes to mind is incrementing a value in a DistributedMapCache server per node, then having a side flow that constantly checks the sum of those cache values until it equals 50. That flow sends a notification that all 50 files were written and resets the per-node cache values back to zero.

Flow 1: ... --> PutSFTP --> FetchDistributedMapCache (get current stored value for node) --> ReplaceText (replace content with retrieved value + 1) --> PutDistributedMapCache (write new value to cache)

Flow 2: GenerateFlowFile (primary node only) --> FetchDistributedMapCache (x3 to retrieve stored cache value for each node) --> RouteOnAttribute (add relationship for when sum of all cache values equals 50, terminate unmatched) --> PutEmail (notification)

Thanks, Matt
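A rough sketch of the increment and check steps (the attribute names and the use of the hostname() Expression Language function are illustrative assumptions, not part of the original flow):

```
# Flow 1: per-node counter
FetchDistributedMapCache
  Cache Entry Identifier       = ${hostname()}      # one counter per node
  Put Cache Value In Attribute = node.count

ReplaceText
  Replacement Strategy = Always Replace
  Replacement Value    = ${node.count:replaceEmpty(0):plus(1)}

# Flow 2: route when the three per-node counters sum to 50
RouteOnAttribute
  all.written = ${node1.count:plus(${node2.count}):plus(${node3.count}):equals(50)}
```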
01-14-2019
02:10 PM
1 Kudo
@Mr Anticipation

*** Community Forum Tip: Try to avoid starting a new answer in response to an existing answer. Instead, use comments to respond to existing answers. There is no guaranteed order to different answers, which can make a discussion hard to follow.

1. NiFi and NiFi Registry are two totally different pieces of software. Each of these services is likely running as a different service user. HDF service user defaults: the NiFi service defaults to service user "nifi"; the NiFi Registry service defaults to service user "nifiregistry".

2. The NiFi service is where you are building your dataflows on the canvas. The NiFi Registry service is used to store version-controlled dataflows from your NiFi.

3. Make sure that the directory you are trying to ingest files from is accessible by the nifi service user. I suggest accessing the server via the command line, becoming the nifi service user (sudo su - nifi), and then navigating to the target directory (cd /home/xxx/receive), as in the sketch below. Keep in mind that even though the "receive" directory may be set to 777, if the nifi service user can't access /home or /home/xxx, it will not be able to see /home/xxx/receive regardless of what permissions are set on that directory.

Thank you, Matt
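A quick way to run the check from point 3, entered interactively (the path comes from the post; substitute your actual ingest directory):

```bash
# become the nifi service user and walk down the path
sudo su - nifi
ls -ld /home /home/xxx /home/xxx/receive   # every level must be traversable (x) by the nifi user
ls -l /home/xxx/receive                    # the files themselves must be readable
```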
01-11-2019
03:20 PM
@Mr Anticipation

The ERROR says you have a permissions issue. The user who owns the NiFi Java process does not have permissions to navigate down the path /home/xxx/receive and/or does not have permissions to the files you want to ingest.

Ambari by default creates the "nifi" service user account, which is used to run NiFi. As such, that "nifi" user must have access to traverse that directory path and consume the target file(s).

The following command can be used to see what user owns the two NiFi processes:
# ps -ef|grep -i nifi

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
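Another way to check permissions along the whole path at once, assuming the namei utility is available on your system:

```bash
# show owner and mode bits for every component of the path
namei -mo /home/xxx/receive
```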
01-10-2019
03:20 PM
@Gillu Varghese

I would inspect your content repository to see if the referenced claim (StandardResourceClaim[id=XXX, container=default, section=490], offset=0, length=190]) still exists. Within the content_repository directory, I would look for the sub-folder "490". Then within that folder look for a file named XXX (assuming you replaced the actual claim number here with XXX).

It sounds like this file may have been deleted. Do you have some external process that may be accessing your content repository? Was the content repository maybe moved? Do you maybe have a multi-node NiFi cluster where every node is trying to share the same mounted content repo? Was NiFi restarted as a different user? That could result in some files in the repo being owned by different users, which may lead to permission issues accessing those files.

FlowFiles are what move from processor to processor. The FlowFile metadata (stored in the flowfile repository) includes information on the size and location of the physical content within one of the content repositories (default in this case). Here, the FlowFile has reached a processor where actually retrieving that content was needed, but the content could not be found.

Thank you, Matt
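A sketch of the check described above (XXX stands in for the real claim id from your log, and the repository path assumes the default location inside the NiFi install directory):

```bash
# section 490 maps to a sub-folder of the content repository
ls -l content_repository/490/       # list the claims in that section
ls -l content_repository/490/XXX    # does the referenced claim still exist, and who owns it?
```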
01-03-2019
10:39 PM
1 Kudo
@Adam J

The Remote Process Group (RPG) was not designed with any logic to make sure specific FlowFiles went to one node versus another. It was designed to simply build a delivery model based on load on the target NiFi cluster nodes. That delivery model will potentially change each time the latest cluster status is retrieved.

If you need to be very specific as to which node gets a specific FlowFile, your best bet is a direct-delivery dataflow design. The best option here is to have your SplitText processor send to a RouteOnContent processor that sends the split with URL 1/2 to one new connection and the FlowFile with URL 3/4 to another connection. Each of these connections would feed a different PostHTTP processor (this processor can be configured to send as FlowFile). One of them would be configured to send to a ListenHTTP processor on node 1 and the other configured to point at the same ListenHTTP processor on node 2.

You may want to think about this setup from an HA standpoint. If you lose either node 1 or 2, those FlowFiles will just stack up and not transfer until the node is back online, while at the same time the other URLs continue to transfer.

Something else you may want to look into is the new load-balanced connections capability introduced in NiFi 1.8: https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster

There is a "Partition by Attribute" option with this new feature which makes sure FlowFiles with matching attributes go to the same node. While you still can't specify a particular node, it does allow similar FlowFiles to be moved to the same node. If a node goes down you don't end up with an outage; FlowFiles with matching attributes will stay together and go to a different node that is still available.

Thanks, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
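A sketch of the direct-delivery pairing (hostnames and port are illustrative assumptions, not values from the original question):

```
# Sending side: one PostHTTP per target node
PostHTTP (to node 1)
  URL              = https://node1.example.com:9999/contentListener
  Send as FlowFile = true

PostHTTP (to node 2)
  URL              = https://node2.example.com:9999/contentListener
  Send as FlowFile = true

# Receiving side: a single ListenHTTP component runs on every node
ListenHTTP
  Listening Port = 9999
  Base Path      = contentListener
```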
01-02-2019
10:42 PM
@Nimrod Avni

The config.json generated as output when you stood up your NiFi CA (server) is there to simplify execution of the client mode so that you do not have to manually pass all the server info to the client. This was just a choice made by the development team: generate this file rather than expect the user to remember what they entered when they stood up the server. You can delete this file if you want to, as long as you have stored or can remember the pertinent information yourself for running the tls-toolkit client mode later.

As far as client mode goes, the generated config.json is also just there to provide you the pertinent information about the client keystore that was created. This is all information you should already know (unless you did not provide a password and the toolkit auto-generated one for you, in which case you would need to get it from the output config.json file).

Thanks, Matt
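For reference, a sketch of running the client mode by passing the CA details on the command line instead of relying on a saved config.json (the hostname and token are placeholders; check tls-toolkit.sh client --help on your version for the exact flag names):

```bash
# -c: hostname of the NiFi CA server, -t: token chosen when the CA server was stood up
./bin/tls-toolkit.sh client -c nifi-ca.example.com -t my-shared-token
```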