Member since
07-30-2019
3471
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 173 | 06-03-2026 06:06 PM |
| | 466 | 05-06-2026 09:16 AM |
| | 849 | 05-04-2026 05:20 AM |
| | 507 | 05-01-2026 10:15 AM |
| | 635 | 03-23-2026 05:44 AM |
11-02-2021
06:54 AM
@Yemre The following response that you see in the NiFi UI after supplying a username and password tells you that the issue happened during the user authentication process:

"Unable to validate the supplied credentials. Please contact the system administrator."

NiFi has not even tried to do any authorization yet, so your authorizers.xml setup has not come into the equation.

Unfortunately, the error produced by the openldap client is rather generic, and any of the following could be the issue:

1. Incorrect ldap/AD manager DN
2. Incorrect ldap/AD manager password
3. Incorrect username
4. Incorrect user password
5. Incorrect user search filter in the login-identity-providers.xml file

In your case it looks like number 5 may be your issue. The ldap-provider expects that the username typed in the login window is passed via the "User Search Filter" so that the entered user's credentials can be verified.

I noticed you are using full DNs to log in, which is extremely rare. The more common approach is to configure your ldap-provider with an "Identity Strategy" of "USE_USERNAME" instead of "USE_DN". This means that upon successful user authentication, it is the user string entered in the login window that is used to authorize your user instead of the user's full DN. Your initial admin string should therefore match your username as you would type it at the login prompt.

In order to pass the string entered at the login prompt to the ldap-provider, your "User Search Filter" would need to look something like this:

<property name="User Search Filter">(cn={0})</property>

or

<property name="User Search Filter">(sAMAccountName={0})</property>

You should inspect your user's ldap/AD entry to see which attribute contains the username that you type at the login prompt. The username entered at login is substituted in place of "{0}" in the User Search Filter.

When you change the initial admin user string from the full DN to just the username, you need to remove the old authorizations.xml (NOT the authorizers.xml) file that was originally built with the full DN by the file-access-policy-provider in your authorizers.xml. The authorizations.xml file is only seeded by the file-access-policy-provider if the file does not already exist. Once it exists, all future edits to the content of this file are made from within the NiFi UI.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post.

Thank you,
Matt
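To make the "{0}" substitution in the reply above concrete, here is a minimal Python sketch of how a login name could be dropped into a "User Search Filter" template. It is only an illustration: the function names and the username "yemre" are hypothetical, and the RFC 4515 escaping step is an assumption about sensible practice, not a description of what NiFi does internally.

```python
def escape_ldap_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_search_filter(template: str, username: str) -> str:
    """Substitute the login name for the {0} placeholder in the filter template."""
    return template.replace("{0}", escape_ldap_filter_value(username))


# "yemre" is a hypothetical login name used only for illustration.
print(build_user_search_filter("(cn={0})", "yemre"))
print(build_user_search_filter("(sAMAccountName={0})", "yemre"))
```

If the resulting filter does not match exactly one entry under your user search base, authentication fails with the generic error quoted above.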
11-02-2021
05:48 AM
@Ankit13 My recommendation would be to automate only the enabling/disabling and starting/stopping of the NiFi processor component that ingests the data into your NiFi dataflow, and to leave all downstream processors always running. That way, any data that is ingested has every opportunity to be processed through your dataflow to the end.

When a "running" processor is scheduled to execute but has no FlowFiles queued in its inbound connection(s), it pauses instead of running immediately over and over again, which prevents excessive CPU usage. So it is safe to leave these downstream components running all the time.

Thank you,
Matt
10-27-2021
02:20 PM
@Apoo The EvaluateJsonPath processor's dynamic properties do not support NiFi Expression Language (EL), so it is not possible to pass dynamic strings to these dynamic properties from FlowFile attributes. The dynamic properties only support NiFi parameters.

You may want to raise an Apache NiFi Jira requesting that NiFi EL support be added to these dynamic properties, or even contribute to the open source code if you so choose.

Thank you,
Matt
10-20-2021
11:10 AM
@RB764 Your EvaluateJsonPath processor configuration is good. This processor evaluates the JsonPath expressions against the content of the inbound FlowFile. With "Destination" set to "flowfile-attribute", it then creates a new attribute for each dynamic property added to the processor, with the value that results from the JsonPath.

Your issue here is that your inbound FlowFile has no content for the EvaluateJsonPath processor to run the JsonPath against. I see in your screenshot of the GenerateFlowFile processor that you have added a new dynamic property "value" with a value of "{"Country":"Austria","Capital":"Vienna"}". Dynamic properties become FlowFile attributes on the FlowFile produced, not content.

If you want to specify content via the GenerateFlowFile processor, you need to use the "Custom Text" property to do so.

Thank you,
Matt
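The attribute-versus-content distinction in the reply above can be sketched with a toy model in Python. This is not NiFi code: the `FlowFile` class and `evaluate_top_level_key` helper are hypothetical stand-ins, and a real EvaluateJsonPath would handle an empty FlowFile through its own relationships rather than returning `None`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi FlowFile: attributes (metadata) plus content (payload)."""
    attributes: dict = field(default_factory=dict)
    content: bytes = b""


def evaluate_top_level_key(flowfile: FlowFile, key: str):
    """Mimic a JsonPath such as $.<key>, evaluated against the content only."""
    if not flowfile.content:
        return None  # nothing to parse: attributes are never consulted
    return json.loads(flowfile.content).get(key)


payload = '{"Country":"Austria","Capital":"Vienna"}'

# Dynamic property "value": the JSON ends up as an attribute, content stays empty.
from_dynamic_property = FlowFile(attributes={"value": payload})

# "Custom Text": the JSON becomes the FlowFile content.
from_custom_text = FlowFile(content=payload.encode("utf-8"))

print(evaluate_top_level_key(from_dynamic_property, "Country"))  # None
print(evaluate_top_level_key(from_custom_text, "Country"))       # Austria
```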
10-20-2021
10:53 AM
@Apoo Not sure if this is the best solution, but you could use a combination of EvaluateJsonPath and ReplaceText to convert your sample source into your sample output.

EvaluateJsonPath processor: add a new dynamic property (you can use any property name) with this value:

$.data[*]

Based on your example, this would result in this output:

[{"timestamp_start":0,"timestamp_stop":0}]

We can then use ReplaceText to trim off the leading "[" and trailing "]":

Search Value = (^\[)|(\]$)

Then you have your desired output of:

{"timestamp_start":0,"timestamp_stop":0}

Thank you,
Matt
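To check the ReplaceText search value from the reply above outside of NiFi, here is a small Python sketch that applies the same regular expression to the sample EvaluateJsonPath output. It assumes the Replacement Value is an empty string (the post does not state it) and relies on this simple pattern behaving the same in Python's `re` module as in Java regex.

```python
import json
import re

# Sample output of EvaluateJsonPath with $.data[*], taken from the post.
evaluate_json_path_output = '[{"timestamp_start":0,"timestamp_stop":0}]'

# Same Search Value as in the post; replacing with "" is an assumption.
trimmed = re.sub(r"(^\[)|(\]$)", "", evaluate_json_path_output)

print(trimmed)              # {"timestamp_start":0,"timestamp_stop":0}
print(json.loads(trimmed))  # confirms the result is still valid JSON
```

Note that this only yields valid JSON when the array holds a single element, as in the sample.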
10-19-2021
02:04 PM
@AA24 The easiest way to accomplish this is to use the PutDistributedMapCache processor in one flow to write the attribute values you want to share to a cache server, and in your other flow use the FetchDistributedMapCache processor to retrieve those cached values and add them to the FlowFiles that need them.

Another option is to use the MergeContent processor. In flow one, where it looks like you are extracting your session_id and job_id, you would use the ModifyBytes processor to zero out the content, leaving you with a FlowFile that only has attributes. You would then use MergeContent to combine this FlowFile with the FlowFile in your second flow. In the MergeContent processor you would set "Attribute Strategy" to "Keep All Unique Attributes".

Thank you,
Matt
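As a rough illustration of the MergeContent option above, the Python sketch below models what "Keep All Unique Attributes" does as I understand it: every attribute from every bundled FlowFile is kept unless two FlowFiles carry different values for the same name. The function name and the sample attribute values are hypothetical, and plain dicts stand in for FlowFile attributes.

```python
def keep_all_unique_attributes(flowfile_attributes):
    """Merge attribute dicts, dropping any attribute whose values conflict."""
    merged = {}
    conflicting = set()
    for attrs in flowfile_attributes:
        for name, value in attrs.items():
            if name in conflicting:
                continue
            if name in merged and merged[name] != value:
                # Two FlowFiles disagree on this attribute: drop it entirely.
                del merged[name]
                conflicting.add(name)
            else:
                merged[name] = value
    return merged


# Flow one: content zeroed out, only the extracted attributes remain.
flow_one = {"session_id": "abc123", "job_id": "42", "source": "flow-one"}
# Flow two: the FlowFile whose content you want to keep.
flow_two = {"record_type": "orders", "source": "flow-two"}

print(keep_all_unique_attributes([flow_one, flow_two]))
# session_id, job_id and record_type survive; "source" conflicts and is dropped
```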
10-19-2021
01:55 PM
1 Kudo
@TRSS_Cloudera It is not clear to me how you have designed your dataflow to remove all files from the source SFTP server except the newest file. Assuming state was not an issue (since you said your flow works if you manually clear state), how do you have your flow built?

There is a GetSFTP processor that does not maintain state. So you could build your flow like this:

1. Use ListSFTP and FetchSFTP to always get the newest "history" file, and record that latest "history" file's last modified timestamp in something like a DistributedMapCache server.
2. Have GetSFTP run once a day using the "Cron driven" scheduling strategy to get all files in that directory (Delete Original = false). This would also get the latest history file.
3. Fetch the currently stored last modified time from the map cache.
4. Use RouteOnAttribute to route any FlowFiles whose last modified time is older than the stored value.
5. Send those FlowFiles to a processor that removes the corresponding files from the source SFTP server.

The above would work in an "ideal" world. You would run into issues when there was an interruption in the running dataflow that caused multiple new files to get listed by the ListSFTP processor, because you would not know which one ended up having its last modified timestamp stored in the DistributedMapCache. But even then, the worst case is that a couple of files are left lingering until a later run lists just one history file, at which point things go back to the expected behavior.

Otherwise, there are script-based processors you could use to build your own scripted handling here.

To be honest, it seems like wasted I/O to have NiFi consume these files just to auto-terminate them. You could instead use an ExecuteStreamCommand processor to invoke a script that connects to your SFTP server and simply removes what you do not want, without pulling anything across the network or writing file content to NiFi that you don't need.

Hopefully this gives you some options to think about.
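The routing decision in the flow above comes down to a timestamp comparison. Here is a minimal Python sketch of that logic, assuming the cached value is the newest history file's last modified time as an epoch-style number; the file names, timestamps, and the `files_to_remove` helper are all hypothetical.

```python
def files_to_remove(listing, newest_history_mtime):
    """Return names of files strictly older than the cached newest timestamp.

    listing: iterable of (filename, last_modified) pairs from the daily pull.
    newest_history_mtime: value previously stored in the map cache.
    """
    return [name for name, mtime in listing if mtime < newest_history_mtime]


# Hypothetical directory listing: (filename, last modified time).
daily_listing = [
    ("history_1.csv", 100),
    ("history_2.csv", 200),
    ("history_3.csv", 300),  # newest file, its timestamp is what was cached
]
cached_newest = 300

print(files_to_remove(daily_listing, cached_newest))
# ['history_1.csv', 'history_2.csv']
```

Using a strict "older than" comparison keeps the newest file in place even though the daily pull retrieves it too.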
Thank you,
Matt
10-19-2021
01:18 PM
@AA24 NiFi was designed around always-on dataflows. As such, the NiFi processor components support the "Timer Driven" and "Cron Driven" Scheduling Strategy types.

That being said, the ability to tell a processor to "Run Once" exists within NiFi. You can do this manually from within the UI by right-clicking on the NiFi processor component and selecting "Run Once" from the pop-up context menu.

The next thing to keep in mind is that anything you can do via the UI, you can also do via a curl command. So it is possible to build a dataflow that triggers the "run once" API call against the processor you want to fetch from the appropriate DB. You cannot execute "run once" against a process group (PG), nor would I recommend doing so. You want to trigger only the processor responsible for ingesting your data and leave all the other processors running all the time, so they process whatever data they have queued at any time.

First you need to create your trigger flow. You could have a GetFile consume the trigger file and then use something like a RouteOnContent processor to send the FlowFile to either an InvokeHTTP configured to invoke run-once on your Oracle-configured processor or an InvokeHTTP configured to invoke run-once on your MySQL-configured processor.

Using your browser's developer tools is an easy way to capture the REST API calls that are made when you manually perform the action via the UI.

Thank you,
Matt
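For orientation, here is a hedged Python sketch that builds (but does not send) the kind of request the UI issues for "Run Once". The endpoint path, the `RUN_ONCE` state value, and the payload shape reflect my understanding of the NiFi REST API and should be confirmed with your browser's developer tools as described above. The host name, processor ID, and revision number are hypothetical.

```python
import json


def build_run_once_request(base_url, processor_id, revision_version, client_id=None):
    """Return (method, url, body) for a run-once call against one processor.

    Assumption: the processor run-status endpoint accepts state RUN_ONCE
    together with the processor's current revision.
    """
    url = f"{base_url.rstrip('/')}/nifi-api/processors/{processor_id}/run-status"
    revision = {"version": revision_version}
    if client_id is not None:
        revision["clientId"] = client_id
    body = {
        "revision": revision,
        "state": "RUN_ONCE",
        "disconnectedNodeAcknowledged": False,
    }
    return "PUT", url, json.dumps(body)


# Hypothetical host, processor ID, and revision, for illustration only.
method, url, body = build_run_once_request(
    "https://nifi.example.com:8443", "0a1b2c3d-0123-1000-abcd-ef0123456789", 3
)
print(method, url)
print(body)
```

In the trigger flow, the method, URL, and body would map onto the configuration of each InvokeHTTP processor. On a secured NiFi the call also needs authentication, which this sketch leaves out.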
10-19-2021
12:04 PM
@DayDream The ExecuteStreamCommand processor executes a system-level command and not something native to NiFi, so its impact on CPU is completely dependent on what the command being called is doing. You mention that the ExecuteStreamCommand is just executing a cp command and that the issue happens when you are dealing with a large file. The first thing I would look into is the disk I/O of the source and destination directories that the file is being copied from and to.

You also mention that the PutFile is writing out a large FlowFile to disk. This means that the processor is reading FlowFile content from the NiFi content_repository and then writing it to some target folder location. I would once again look at the disk I/O of both locations when this is happening. The CPU usage may be high simply because these threads are running a long time waiting on disk I/O.

NiFi uses CPU for its core-level functions, and you then configure an additional thread pool that is used by the NiFi components you add to the NiFi canvas. This resource pool is configured via NiFi UI --> Global Menu (upper right corner of UI) --> Controller Settings.

The "Event Driven" thread pool is experimental and deprecated, and is used by processors configured to use the event driven scheduling strategy. Stay away from this scheduling strategy.

The "Timer Driven" thread pool is used by controller services, reporting tasks, processors, etc. Processors use it when configured with the "Timer Driven" or "Cron driven" scheduling strategies. This pool is what the NiFi controller has available to hand out to all processors requesting time to execute. Setting it to an arbitrarily high value will simply lead to many NiFi components getting threads to execute but then spending excessive time in CPU wait, as time on the limited cores is sliced across all active threads.
The general rule of thumb here is to set the pool to 2 to 4 times the number of available cores on a single NiFi host/node. So for your 8 core server, you would want this between 16 and 32. This does not mean you can't set it higher, but you should only do so in small increments while monitoring CPU usage over an extended period of time. This setting is per node, so if you have 5 nodes you would have a thread pool of 16 to 32 on each NiFi host/node.

Another thing you may want to start looking at is the GC stats for your JVM. Is GC (young and old) running very often? Is it taking a long time to run? GC pauses are stop-the-world events, so the JVM is simply paused while they happen, which can also impact how long a thread is "running".

You can get some interesting details about your running NiFi using the built-in NiFi diagnostics tool:

<path to NiFi>/bin/nifi.sh diagnostics --verbose <path/filename where output should be written>

For a NiFi node to remain connected to the cluster, it must successfully send a heartbeat to the elected cluster coordinator (CC) in at least 1 out of every 8 scheduled heartbeat intervals. Let's say the heartbeat interval is configured in the nifi.properties file as 5 secs. The elected CC must then successfully process at least 1 heartbeat every 40 secs, or that node will get disconnected for lack of heartbeat. The node initiates a reconnection once a heartbeat is received after having been disconnected for this reason. Configuring a larger heartbeat interval helps avoid this disconnect/reconnect cycle by allowing more time before the heartbeat is considered lost. This gives the node more time if it is going through a long GC pause or the CPU is so saturated that it can't get a thread to create a heartbeat.
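The two pieces of arithmetic above can be written out as a quick Python sketch. The function names are hypothetical. The 2x to 4x multiplier and the 8-interval heartbeat window are the figures given in this reply, not values read from a NiFi installation.

```python
def timer_driven_pool_range(cores_per_node):
    """Rule-of-thumb range for the Timer Driven thread pool on ONE node."""
    return 2 * cores_per_node, 4 * cores_per_node


def heartbeat_disconnect_window_secs(heartbeat_interval_secs, intervals=8):
    """Seconds the coordinator waits for a heartbeat before disconnecting a node."""
    return heartbeat_interval_secs * intervals


print(timer_driven_pool_range(8))            # (16, 32)
print(heartbeat_disconnect_window_secs(5))   # 40
print(heartbeat_disconnect_window_secs(15))  # 120
```

Raising the heartbeat interval from 5 to 15 secs, for example, widens the window from 40 to 120 secs.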
I also recommend reading through this community article: https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999

Thank you,
Matt
10-19-2021
11:32 AM
@vikrant_kumar24 The ExecuteScript processor has been part of Apache NiFi for over 6 years. It has had many improvements and bug fixes over those years, just like many other well-used components. I'd be reluctant to call it "experimental" any longer, regardless of what the embedded Apache NiFi docs say.

The only thing to note here is that the ExecuteScript processor does not really execute a "Python" script engine. It executes "Jython" instead, which is a Java implementation of Python. Jython is not 100% compatible with Python, so you must test your script thoroughly.

Thanks,
Matt