Member since: 07-30-2019 | Posts: 3406 | Kudos Received: 1623 | Solutions: 1008
01-12-2022
06:11 AM
@LuisLeite First thing to note is the age of the NiFi version being used. HDF 3.1.2 was released back in 2018, and there have been many improvements and advancements within NiFi since then, including to the record-based processors and controller services.

1. Does the source Avro produced by the ExecuteSQL processor contain the Avro schema embedded in the content?
2. The exception you shared deals with writing the Avro schema, which is a function of the record writer, not of ConvertRecord or the record reader. I am assuming you are using the CSVRecordSetWriter in your use case. What do you have the "Schema Write Strategy" set to? Do you get the same exception if you set it to "Set 'avro.schema' Attribute" instead?

If the above does not help, sharing the exact configuration of your ConvertRecord processor and the controller services being utilized may help those in the community offer additional guidance. Of course a sample source file would also be helpful.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
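As a rough sketch (property names taken from more recent NiFi versions; the options available in HDF 3.1.2 may differ), the reader/writer settings in question look like this:

```
AvroReader (controller service)
  Schema Access Strategy : Use Embedded Avro Schema     # if ExecuteSQL embedded the schema

CSVRecordSetWriter (controller service)
  Schema Write Strategy  : Set 'avro.schema' Attribute  # writes the schema to a FlowFile
                                                        # attribute rather than the content
```

Since serializing the schema is the writer's job, a strategy that writes the schema is where an exception like the one you shared would surface.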
01-11-2022
08:37 AM
@Neil_1992 I strongly recommend not setting your NiFi heap to 200GB. Java reserves the Xms space and grows to the Xmx space as more space is requested. Java Garbage Collection (GC), which reclaims heap no longer in use, does not kick in until roughly 80% of the heap is used. That means GC in your case would kick in at around 160+ GB of used heap. GC execution is a stop-the-world activity, which means your NiFi will stop doing anything until GC completes. This can lead to long pauses resulting in node disconnections, timeouts in dataflows to external services, etc.

When it comes to the flow.xml.gz file, you are correct that it is uncompressed and loaded into heap memory. The flow.xml.gz contains everything you add via the NiFi UI to the canvas (processors, RPGs, input/output ports, funnels, labels, process groups, controller services, reporting tasks, connections, etc.), including all the configuration for each of those components. NiFi templates are also stored in the flow.xml.gz, uncompressed and loaded into heap as well. Once a NiFi template is created, it should be downloaded, stored outside of NiFi, and the local copy of the template inside NiFi deleted.

As for your specific flow.xml.gz, the closing tag "</property>" you see indicates that some component has a property, which typically consists of a "name" and a "value", with the huge null-laced strings in the value field. I'd scroll up to see which component this property belongs to and then check why that value was set. Maybe it was a copy/paste issue? Maybe it is part of some template that was created with this large string for some purpose? Nothing here says with any certainty that there is a bug.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
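For reference, the heap is set in conf/bootstrap.conf. The sizes below are purely illustrative, not a recommendation for your specific hardware:

```
# conf/bootstrap.conf
java.arg.2=-Xms8g     # initial (reserved) heap
java.arg.3=-Xmx16g    # maximum heap; GC typically kicks in around ~80% usage
```

A restart of NiFi is required for heap changes to take effect.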
01-10-2022
01:31 PM
@techNerd I think your scenario may need a bit more detail to understand what you are doing and what the flow is doing versus what you want it to do. ListFile only lists information about file(s) found in the target directory; it then generates one or more FlowFiles from the listing that was performed. A corresponding FetchFile processor would actually retrieve the content for each of the listed files.

From the sounds of your scenario, you have instituted a 20 sec delay somehow between the ListFile and FetchFile processors? Or you have configured the run schedule on the ListFile processor to "20 secs"? Setting the run schedule only tells the processor how often it should request a thread from the NiFi controller that can be used to execute the processor code. Once the processor gets its thread, it will execute: ListFile will list all files present in the target source directory based on the configured file and path filters, and for each file listed it will produce a FlowFile. The run schedule does not mean the processor executes for a full 20 seconds continuously checking the input directory for new files. The run schedule is also not impacted by how long a listing takes to complete; the processor will request a thread every 20 seconds (00:00:20, 00:00:40, 00:01:00, etc.).

The configured "Concurrent Tasks" setting controls whether the processor can execute multiple listings in parallel. Let's say the thread that started at 00:01:00 was still executing 20 seconds later. Since that thread is still using the default single concurrent task, ListFile would not be allowed to request another thread from the controller at that time. Since the run schedule is independent of the thread execution duration, there is no way to dynamically alter the schedule. There is also no way for a new file to get listed at the same time as a previous file (unless both were already present at the time of listing) within the same thread execution.
ListFile uses the configured "Listing Strategy" to control how it handles the listing of files. A "tracking" strategy prevents the ListFile processor from listing the same file twice by recording some information in a state provider or a cache. If "No Tracking" is configured, ListFile will list all found files every time it executes. ListFile does not remove the source file from the directory; removal of the source file is a function optionally handled by the corresponding FetchFile processor.

If this is not clear, share more details around your use case and flow design specifics so I can provide more direct feedback. Here is the documentation around processor scheduling (it works the same no matter which processor is being used): https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
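A typical ListFile/FetchFile pairing might be sketched as follows (the directory path is hypothetical; property names are from current NiFi documentation):

```
ListFile  (Run Schedule: 20 sec, Concurrent Tasks: 1)
  Input Directory  : /data/incoming          # hypothetical path
  Listing Strategy : Tracking Timestamps     # avoids listing the same file twice
      |  success (one FlowFile per listed file, no content yet)
      v
FetchFile
  File to Fetch       : ${absolute.path}/${filename}   # default value
  Completion Strategy : Delete File          # source removal happens here, not in ListFile
```

Each 20-second thread lists whatever is present at that moment; files arriving mid-interval are simply picked up by the next execution.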
01-10-2022
05:58 AM
@Griggsy The "Advanced" UI capability in the UpdateAttribute processor allows you to set up rules to accomplish exactly what you are describing in your use case. When you right-click on the UpdateAttribute processor you will see the "Advanced" button in the lower left corner. From the Advanced UI you can configure conditional rules:

1. Click "+" to the right of "Rules" to create your first rule.
2. Then click "+" to the right of "Conditions" to create your boolean NiFi Expression Language (NEL) statement(s). When using multiple statements, all statements must evaluate to true before the corresponding "Action(s)" are applied.
3. Then click "+" to the right of "Actions" to create one or more actions that are applied if the condition(s) for this rule are true.

The thing is that your example condition does not make sense to me. You are returning the value of a FlowFile attribute named "hostname" and checking if it contains the string "Mickey" in both NEL statements; however, for each you want to return a different output value. Since both would always return either "true" or "false" together, which output would be desired?

Here is an example where I check if the "Hostname" attribute contains "Mickey", and if that returns true I set the FlowFile attribute "hive_database" to "cartoon". I then have a second rule that checks the "Hostname" FlowFile attribute to see if it contains "Bond", and if that returns true I set the "hive_database" FlowFile attribute to "movie". The "conditions" and "actions" look similar for the "Bond" rule except "Mickey" is changed to "Bond" and the value set is "movie" instead of "cartoon".

So the above rules give you an if/then capability. Another feature of the Advanced UI is the "else" capability.
Outside the "Advanced" UI you can set a default value for the "hive_database" attribute: if, upon evaluating a FlowFile, none of the configured rules that set the "hive_database" attribute are applied, the UpdateAttribute processor will apply the value set outside the Advanced UI. If a rule does set the "hive_database" attribute, the value defined outside the Advanced UI is ignored/not set.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
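The rules described above might be sketched like this (attribute names are taken from the example; the "default_db" fallback value is hypothetical):

```
Rule "mickey"
  Condition : ${Hostname:contains('Mickey')}
  Action    : hive_database = cartoon

Rule "bond"
  Condition : ${Hostname:contains('Bond')}
  Action    : hive_database = movie

# Property set outside the Advanced UI (acts as the "else" value):
hive_database = default_db
```

If neither condition matches, the FlowFile leaves UpdateAttribute with hive_database set to the fallback value.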
01-10-2022
05:24 AM
@Elmer What you described here is the use case for the NiFi Parameters capability [1]. Parameters allow you to set configuration values within NiFi components that are unique per environment. Parameters can be used in both non-sensitive and sensitive (e.g., password) property fields. In each environment you set up a parameter context that uses the same parameter names, but with the values unique to that specific environment. Then, when you port over your flow definition or template, you configure the NiFi process group that contains your ported-over flow to use the cluster-specific parameter context. No need to go to each component and reconfigure them anymore.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt

[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Parameters
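As a sketch (the parameter and property names are hypothetical; `#{...}` is NiFi's parameter reference syntax):

```
# On a component inside the process group:
Database Connection URL : #{db.url}
Password                : #{db.password}    # sensitive parameter

# "dev" parameter context :  db.url = jdbc:postgresql://dev-host:5432/app
# "prod" parameter context:  db.url = jdbc:postgresql://prod-host:5432/app
```

The same flow definition works in both environments; only the parameter context assigned to the process group differs.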
12-14-2021
12:03 PM
@gvkarthik93 The HandleHttpRequest processor only supports user-based authentication via a mutual TLS handshake, and there is no authorization built in to the processor. So if the truststore configured in the configured SSLContextService contains the trustedCertEntries needed to trust a user/client certificate presented in the handshake, that user would be allowed to send traffic to this listener. The processor could then route the success relationship via a connection to a RouteOnAttribute processor that checks the DN set in the "http.subject.dn" FlowFile attribute (created by the HandleHttpRequest processor) to see if it matches a list of DNs. Based on that outcome, you can decide to either route the FlowFile to a HandleHttpResponse processor that responds with "not authorized", or route down an authorized path and respond accordingly.

"I want to check if the username and password is correct or not"

Check against what? NiFi has no local users with passwords for authentication, nor do any NiFi processors have any integration with an external user authenticator like LDAP. Even if you were to pass a username and password via a header on the request, there is no native processor that could take that username and password and verify it. Maybe you could use a scripting processor to validate the username and password against an external service like LDAP, but between HandleHttpRequest and that scripting processor, the user's name and password would be in plaintext within the FlowFile's attributes, which is not ideal. Also keep in mind that if you do not use TLS at all, anything you send over the connection is not encrypted and is in plain text.

Hope this helps answer your query, Matt
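The DN check might be sketched in RouteOnAttribute like this (the DN values are hypothetical; the dynamic property creates an "authorized" relationship):

```
RouteOnAttribute
  Routing Strategy : Route to Property name
  authorized (dynamic property) :
    ${http.subject.dn:equals('CN=client1, OU=eng')
        :or(${http.subject.dn:equals('CN=client2, OU=ops')})}

# matched FlowFiles   -> authorized path -> HandleHttpResponse (HTTP Status Code 200)
# unmatched FlowFiles -> HandleHttpResponse (HTTP Status Code 403)
```

Each HandleHttpResponse must use the same HTTP Context Map controller service as the HandleHttpRequest so the response is tied back to the original request.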
12-14-2021
11:32 AM
@Saraali I'd love to help, but I have never done any scripting to convert to Excel format. All I can tell you is that there is no native processor available in NiFi that does this conversion. I'd recommend raising a new feature request Jira in Apache for NiFi: https://issues.apache.org/jira/projects/NIFI/issues/ Perhaps others in the community are looking for the same capability and would be willing to contribute code? Thanks, Matt
12-13-2021
01:02 PM
@armensano Just to make sure we are clear on your ask here: you have a secured NiFi and NiFi-Registry installation that uses user/client certificates to authenticate your user via the mutual TLS handshake? If that is the case, you should not be seeing a NiFi/NiFi-Registry login window. Can you share screenshots? What do you see in the nifi-user.log when you attempt to access NiFi? Thanks, Matt
12-13-2021
12:14 PM
@sunny_de NiFi can only support a single "authorizer". The authorizers.xml file you attached shows that you have two configured:

1. managed-authorizer
2. file-provider <-- (legacy and deprecated in favor of the above)

Which authorizer is actually being used by NiFi's core is determined by the following property in the nifi.properties file:

nifi.security.user.authorizer=

So make sure the above is set to "managed-authorizer".

The managed-authorizer then references the file-access-policy-provider. This provider is responsible for generating the "authorizations.xml" file, which contains all the authorizations created for your users. It will initially seed this file with the minimum authorizations needed for the configured "Initial Admin Identity" string value (case sensitive). NOTE: The authorizations.xml file is only generated if it does not already exist. If this file already exists, changes made to the file-access-policy-provider will not be reflected in that file. The expectation is that once the initial admin and node policies have been seeded, all additional authorizations are granted from within the NiFi UI.

The file-access-policy-provider then references the file-user-group-provider. This provider is designed to generate the users.xml file if it does not exist and seed it with the initial user identities (the initial admin and the NiFi nodes if clustered).

Seeing as you're able to see the NiFi UI, that tells me your initial admin was successfully created and given at least the "view the user interface" policy. The initial policy setup will only grant policy permissions related to the canvas if a flow.xml.gz was already present when NiFi was secured or started for the first time. In this case, it appears that you did not have a flow.xml.gz and NiFi created one on first startup, which happens after the initial admin was created and authorized.
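The provider chain described above can be summarized as follows (identifiers as they appear in a default authorizers.xml):

```
# nifi.properties -- selects the authorizer NiFi's core uses
nifi.security.user.authorizer=managed-authorizer

# authorizers.xml provider chain:
#   managed-authorizer -> file-access-policy-provider -> file-user-group-provider
#                          (writes authorizations.xml)    (writes users.xml)
```

If authorizations.xml or users.xml were generated with the wrong identities, stopping NiFi and removing those two files forces them to be re-seeded from the provider configuration on the next start.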
You will, however, have admin policies assigned to your initial admin that allow you to set up any additional policies you need. I can see from the "Operate" panel on the left-hand side of the canvas that the access policies (key) icon is NOT greyed out. This means your authenticated user is authorized to set policies on the currently selected component (in this case the root process group). Click on this icon and grant yourself at a minimum the following policies:

1. view the component
2. modify the component
3. view the data
4. modify the data
5. view provenance

After applying the above, the icons across the top of the UI should no longer be greyed out, since you are now authorized to make modifications to the root process group, and you can drag and drop items to the canvas. When you add new components (processors, input/output ports, funnels, child process groups, etc.) to the canvas, they will inherit their policies from the parent process group, but you can always select any component and specifically set policies per component. In a multi-team NiFi, it is common to create a child process group for each team or person and authorize the desired members on each process group. This prevents user1 from modifying user2's process group and the components within it, and vice versa.

For more information on setting up users, groups, and policies, here is the link to the NiFi documentation on this topic: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#config-users-access-policies

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
12-13-2021
11:42 AM
@sam_001 It appears you are hitting a known bug in Apache NiFi 1.12: https://issues.apache.org/jira/browse/NIFI-7856 This provenance related bug was addressed in Apache NiFi 1.13. I recommend you upgrade. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt