Member since: 07-30-2019
Posts: 3406
Kudos Received: 1623
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 311 | 12-17-2025 05:55 AM |
| | 372 | 12-15-2025 01:29 PM |
| | 351 | 12-15-2025 06:50 AM |
| | 341 | 12-05-2025 08:25 AM |
| | 591 | 12-03-2025 10:21 AM |
11-02-2021
08:23 AM
@AnnaBea Let me make sure I am clear on your ask here:
1. You have successfully split your source file into 3 parts (header line, body line(s), and footer line).
2. You have successfully modified all three split files as needed.
3. You are having issues re-assembling the three split files back into one file, in the order header, body, footer, using the MergeRecord processor?

With this particular dataflow design, the MergeRecord processor is likely not what you want to use. You probably want to use the MergeContent processor instead, with a "Merge Strategy" of "Defragment". But to get these three source FlowFiles merged in a specific order requires some additional work in your upstream flow. In order to use "Defragment", your three source FlowFiles would all need to have these FlowFile attributes:
- fragment.identifier: All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID for this attribute.
- fragment.index: A one-up number that indicates the ordering of the split FlowFiles created from a single parent FlowFile.
- fragment.count: The number of split FlowFiles generated from the parent FlowFile.

1. Add one UpdateAttribute processor before your RouteText and configure it to create the "fragment.identifier" attribute with a value of "${UUID()}" and another attribute "fragment.count" with a value of "3". Each FlowFile produced by RouteText should then have these two attributes set on it.
2. Then add one UpdateAttribute processor to each of the 3 flow paths to set the "fragment.index" attribute uniquely per dataflow path: value=1 for header, value=2 for body, and value=3 for footer.
3. Now MergeContent will have what it needs to bin these three FlowFiles by the UUID and merge them in the proper order (a rough sketch of how these attributes drive the reassembly is at the end of this reply).

There are often many ways to solve the same use case using NiFi components. Some design choices are better than others and use fewer resources to accomplish the end goal. While the above is one solution, there are others I am sure. Cloudera's professional services is a great resource that can help with use case designs.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.

Thank you, Matt
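To make the role of the three fragment attributes concrete, here is a small plain-Python illustration of the Defragment idea (this is not NiFi's actual MergeContent code, and the attribute values shown are only examples): bin by fragment.identifier, wait until fragment.count FlowFiles have arrived, then concatenate in fragment.index order.

```python
# Plain-Python illustration (not NiFi source code) of how a "Defragment"-style
# merge uses the three fragment attributes described above.
from collections import defaultdict

# Three FlowFiles as they might look after the two UpdateAttribute steps.
flowfiles = [
    {"attrs": {"fragment.identifier": "abc-123", "fragment.index": "2", "fragment.count": "3"},
     "content": "body line 1\nbody line 2\n"},
    {"attrs": {"fragment.identifier": "abc-123", "fragment.index": "1", "fragment.count": "3"},
     "content": "header line\n"},
    {"attrs": {"fragment.identifier": "abc-123", "fragment.index": "3", "fragment.count": "3"},
     "content": "footer line\n"},
]

# Bin by fragment.identifier so fragments from different parents never mix.
bins = defaultdict(list)
for ff in flowfiles:
    bins[ff["attrs"]["fragment.identifier"]].append(ff)

for identifier, fragments in bins.items():
    expected = int(fragments[0]["attrs"]["fragment.count"])
    if len(fragments) == expected:
        # Merge in fragment.index order: header, then body, then footer.
        ordered = sorted(fragments, key=lambda ff: int(ff["attrs"]["fragment.index"]))
        merged = "".join(ff["content"] for ff in ordered)
        print(f"merged FlowFile for {identifier}:\n{merged}")
```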
11-02-2021
06:54 AM
@Yemre The following response you see in the NiFi UI after supplying a username and password in the login window tells you that the issue happened during the user authentication process:

"Unable to validate the supplied credentials. Please contact the system administrator."

NiFi has not even tried to do any authorization yet, so your authorizers.xml setup has not come into the equation yet. Unfortunately, the error produced by the openldap client is rather generic and could mean any of the following is the issue:
1. Incorrect ldap/AD manager DN
2. Incorrect ldap/AD manager password
3. Incorrect username
4. Incorrect user password
5. Incorrect user search filter in the login-identity-providers.xml file

In your case it looks like number 5 may be your issue. The ldap-provider expects that the username typed in the login window is passed via the "User Search Filter" so that the entered user's credentials can be verified. I noticed you are using full DNs to login with, which is extremely rare. The more common approach here is to configure your ldap-provider with an "Identity Strategy" of "USE_USERNAME" instead of "USE_DN". This means that upon successful user authentication, it is the user string entered in the login window that is used to authorize your user instead of the user's full DN. This also means your initial admin string should match your username as you would type it at the login prompt. In order to pass the entered string at the login prompt to the ldap-provider, your "User Search Filter" would need to look something like this:

<property name="User Search Filter">(cn={0})</property>
or
<property name="User Search Filter">(sAMAccountName={0})</property>

You should inspect your user's ldap/AD entry to see which attribute contains the username that you type in the login prompt. The username entered at login is substituted in place of "{0}" in the User Search Filter. (A small standalone test script for checking these credentials outside of NiFi is sketched at the end of this reply.)

When you change the initial admin user string from the full DN to just the username, you will need to remove the old authorizations.xml (NOT the authorizers.xml) file that was originally built with the full DN by the file-access-policy-provider in your authorizers.xml. The authorizations.xml file is only seeded by the file-access-policy-provider if the file does not already exist. Once it exists, all future edits to the content of this file are handled via changes made from within the NiFi UI.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.

Thank you, Matt
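If it helps to rule out items 1 through 5 independently of NiFi, here is a minimal sketch using the third-party ldap3 Python library that performs roughly the same steps the ldap-provider does: bind with the manager credentials, search for the user with the filter, then bind as the user. The server URL, DNs, search base, and filter attribute below are placeholders you would replace with your own values.

```python
from ldap3 import Server, Connection, ALL

LDAP_URL    = "ldaps://ad.example.com:636"   # placeholder: your ldap/AD endpoint
MANAGER_DN  = "CN=nifi-svc,OU=Service Accounts,DC=example,DC=com"  # placeholder manager DN
MANAGER_PW  = "manager-password"
SEARCH_BASE = "DC=example,DC=com"            # roughly the "User Search Base" in the ldap-provider
USERNAME    = "jdoe"                         # what you would type in the NiFi login window
USER_PW     = "user-password"

server = Server(LDAP_URL, get_info=ALL)

# Step 1: bind with the manager credentials (checks items 1 and 2).
mgr = Connection(server, user=MANAGER_DN, password=MANAGER_PW, auto_bind=True)

# Step 2: search for the user the way the "User Search Filter" would (checks items 3 and 5).
mgr.search(SEARCH_BASE, f"(sAMAccountName={USERNAME})", attributes=["cn"])
if not mgr.entries:
    raise SystemExit("No ldap entry matched -- check the username or the search filter attribute")
user_dn = mgr.entries[0].entry_dn
print("Found user DN:", user_dn)

# Step 3: bind as the found user to verify the password (checks item 4).
Connection(server, user=user_dn, password=USER_PW, auto_bind=True)
print("User credentials verified")
```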
11-02-2021
05:48 AM
@Ankit13 My recommendation would be to only automate the enabling/disabling and starting/stopping of the NiFi processor component that is ingesting the data into your NiFi dataflow, and leave all downstream processors always running, so that any data ingested into your dataflow has every opportunity to be processed through to the end. When a "running" processor is scheduled to execute but has no FlowFiles queued in its inbound connection(s), it pauses instead of executing immediately over and over again, which prevents excessive CPU usage, so it is safe to leave these downstream components running all the time. (One way to script the start/stop of the ingest processor is sketched at the end of this reply.) Thank you, Matt
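As a rough sketch of how that start/stop could be automated, the NiFi REST API can be driven from a small script. The reply above does not prescribe this method; the URL, processor UUID, and lack of authentication below are assumptions (an unsecured NiFi on localhost), so adjust them for your environment and add a token or client certificate for a secured instance.

```python
# Sketch: toggle the run status of a single ingest processor via the NiFi REST API.
# Assumes an unsecured NiFi at localhost:8080 and a made-up processor UUID.
import requests

NIFI_API     = "http://localhost:8080/nifi-api"
PROCESSOR_ID = "016a1234-5678-1abc-9def-0123456789ab"   # hypothetical ingest processor UUID

def set_run_status(state: str) -> None:
    """state is typically RUNNING or STOPPED."""
    # The current revision must be echoed back or NiFi rejects the update.
    proc = requests.get(f"{NIFI_API}/processors/{PROCESSOR_ID}").json()
    body = {"revision": proc["revision"], "state": state}
    resp = requests.put(f"{NIFI_API}/processors/{PROCESSOR_ID}/run-status", json=body)
    resp.raise_for_status()

# Example: start ingest, then stop it later (e.g. from a cron job), while all
# downstream processors stay running and drain whatever was ingested.
set_run_status("RUNNING")
# ... some time later ...
set_run_status("STOPPED")
```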
10-27-2021
02:23 PM
@galvinpaul1718 I'd suggest verifying that your download was good. Then remove the nifi-registry work directory before restarting; the work directory is rebuilt from the contents of the nifi-registry lib directory. Also make sure you did not run out of disk space. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-27-2021
02:20 PM
@Apoo The EvaluateJsonPath processor's dynamic properties do not support NiFi Expression Language, so passing dynamic strings to these properties from FlowFile attributes is not possible. The dynamic properties only support NiFi parameters. You may want to raise an Apache NiFi Jira requesting NiFi EL support for these dynamic properties, or even contribute to the open source code if you so choose. Thank you, Matt
10-27-2021
02:05 PM
@AhmedAlghwinem You are correct that this typically means you are missing some authorization for the currently authenticated user. To help you with this issue, I would need to know a lot more about your NiFi-Registry setup configuration:
1. The nifi-registry.properties file would tell me which method of user authentication you have set up, any identity mappings you have set up, and which authorizer you are using.
2. The identity-providers.xml file tells me how the login provider (if one is specified in the above nifi-registry.properties file) is configured.
3. The authorizers.xml file tells me how the authorizer specified in the above nifi-registry.properties file is configured and which user-group-providers are being used.
4. Depending on the configuration used in authorizers.xml, you may have users.xml and authorizations.xml files generated as well, or you may be using an external authorizer like Ranger.
5. I would need to know the user string (case sensitive) displayed in the upper right corner of the NiFi-Registry UI after you login/authenticate into NiFi-Registry, so that it can be checked against the configured policies to see what your user is missing.

The policies used by NiFi-Registry are covered in the admin guide here: https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#access-policies
You will want to look at the "Special Privilege Policies", which include what would be needed by an admin user to create new buckets. Providing the above details in a Cloudera Support ticket, provided you have a support subscription with Cloudera, would allow support to quickly and easily assist you with this issue.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.

Thank you, Matt
10-20-2021
11:10 AM
@RB764 Your EvaluateJsonPath processor configuration is good. This processor evaluates the JsonPath expressions against the content of the inbound FlowFile, and with "Destination" set to "flowfile-attribute" it creates a new attribute for each dynamic property added to the processor, with the value that results from the JsonPath. Your issue here is that your inbound FlowFile has no content for the EvaluateJsonPath processor to run the JsonPath against. I see in your screenshot of the GenerateFlowFile processor that you added a new dynamic property "value" with a value of {"Country":"Austria","Capital":"Vienna"}. Dynamic properties become FlowFile attributes on the FlowFile produced, not content. If you want to specify content via the GenerateFlowFile processor, you need to use the "Custom Text" property to do so. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-20-2021
10:53 AM
@Apoo Not sure if this is the best solution, but you could use a combination of EvaluateJsonPath and ReplaceText to convert your sample source into your sample output.

EvaluateJsonPath processor: new dynamic property (you can use any property name) = $.data[*]

Based on your example, this would result in this output:
[{"timestamp_start":0,"timestamp_stop":0}]

We can then use ReplaceText to trim off the leading "[" and trailing "]":
Search Value = (^\[)|(\]$)

Then you have your desired output of:
{"timestamp_start":0,"timestamp_stop":0}

(A small standalone illustration of these two steps is sketched at the end of this reply.)

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.

Thank you, Matt
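Here is a quick illustration of the two steps outside of NiFi, using the sample values from the post. The first step is only an emulation of what EvaluateJsonPath with $.data[*] produces; the second step applies the exact ReplaceText regex shown above.

```python
import json
import re

source = '{"data": [{"timestamp_start": 0, "timestamp_stop": 0}]}'

# Step 1 (emulating EvaluateJsonPath with $.data[*]): the array content as a string.
intermediate = json.dumps(json.loads(source)["data"], separators=(",", ":"))
print(intermediate)   # [{"timestamp_start":0,"timestamp_stop":0}]

# Step 2 (ReplaceText with Search Value (^\[)|(\]$) and an empty replacement):
# trims the leading "[" and trailing "]".
result = re.sub(r"(^\[)|(\]$)", "", intermediate)
print(result)         # {"timestamp_start":0,"timestamp_stop":0}
```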
10-19-2021
02:04 PM
@AA24 The easiest way to accomplish this is to use the PutDistributedMapCache processor in one flow to write the attribute values you want to share to a cache server, and in your other flow use the FetchDistributedMapCache processor to retrieve those cached attributes and add them to the FlowFiles that need them. Another option is to use the MergeContent processor. In flow one, where it looks like you are extracting your session_id and job_id, you would use the ModifyBytes processor to zero out the content, leaving you with a FlowFile that only has attributes, and then use MergeContent to combine this FlowFile with the FlowFile in your second flow. In the MergeContent processor you would configure "Attribute Strategy" to use "Keep All Unique Attributes". If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-19-2021
01:55 PM
1 Kudo
@TRSS_Cloudera It is not clear to me how you have designed your dataflow to remove all files from the source SFTP server except the newest file. Assuming state was not an issue (since you said your flow works if you manually clear state), how do you have your flow built?

There is a GetSFTP processor that does not maintain state. So you could have your flow use ListSFTP and FetchSFTP to always get the newest "history" file and record that latest "history" file's last modified timestamp in something like a DistributedMapCache server. Then have GetSFTP run once a day using the "Cron driven" scheduling strategy to get all files (Delete Original = false) in that directory (this would get the latest history file also), fetch the currently stored last modified time from the map cache, and then via RouteOnAttribute route any FlowFiles whose last modified time is older than the stored timestamp to a processor that removes them from the source SFTP server.

While the above would work in an "ideal" world, you could run into issues when an interruption in the running dataflow causes multiple new files to get listed by the ListSFTP processor, because you would not know which one ended up having its last modified timestamp stored in the DistributedMapCache. But in such a case the worst outcome is a couple of files left lingering until the next run results in just one history file being listed, at which point things go back to expected behavior.

Otherwise, there are script-based processors you could use to build your own scripted handling here. To be honest, it seems like wasted IO to have NiFi consume these files just to auto-terminate them, when you could use an ExecuteStreamCommand processor to invoke a script that connects to your SFTP server and simply removes what you do not want, without needing to pull anything across the network or write file content into NiFi that you don't need. (A rough sketch of such a script is at the end of this reply.)

Hopefully this gives you some options to think about.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.

Thank you, Matt
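Since the reply above suggests a script invoked via ExecuteStreamCommand, here is a minimal sketch of what such a cleanup script could look like, using the third-party paramiko Python library. The host, credentials, and directory are placeholders, and "keep only the file with the newest modified time" is the assumed retention rule; it also assumes the directory contains only regular files.

```python
#!/usr/bin/env python3
# Hypothetical cleanup script for ExecuteStreamCommand: connect to the SFTP server
# and delete every file in a directory except the one with the newest modified time.
import paramiko

HOST, PORT = "sftp.example.com", 22        # placeholder SFTP server
USER, PASSWORD = "nifi", "secret"          # placeholder credentials
REMOTE_DIR = "/data/history"               # placeholder directory holding the history files

transport = paramiko.Transport((HOST, PORT))
transport.connect(username=USER, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)
try:
    entries = sftp.listdir_attr(REMOTE_DIR)
    if len(entries) > 1:
        # Keep the newest file (largest st_mtime); remove the rest.
        newest = max(entries, key=lambda e: e.st_mtime)
        for entry in entries:
            if entry.filename != newest.filename:
                sftp.remove(f"{REMOTE_DIR}/{entry.filename}")
                print(f"removed {entry.filename}")
finally:
    sftp.close()
    transport.close()
```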