Member since: 07-30-2019
Posts: 3406
Kudos Received: 1623
Solutions: 1008

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 312 | 12-17-2025 05:55 AM |
| | 373 | 12-15-2025 01:29 PM |
| | 351 | 12-15-2025 06:50 AM |
| | 341 | 12-05-2025 08:25 AM |
| | 594 | 12-03-2025 10:21 AM |
01-19-2022
06:01 AM
@OliverGong I would avoid dataflow designs that extract the entire contents of a FlowFile into FlowFile attribute(s) whenever possible. While FlowFile content only exists on disk (unless read into memory by a processor during processing), FlowFile attributes are held in NiFi's JVM heap memory at all times (per-connection swapping happens only when a specific connection reaches the swap threshold set in the nifi.properties file). FlowFiles with many attributes and/or large attribute values will consume considerable amounts of JVM heap, which can lead to JVM Out Of Memory (OOM) exceptions, long stop-the-world JVM Garbage Collection (GC) events, etc. When options exist that avoid adding large attributes, those should be used. Thanks, Matt
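As a point of reference, that swap threshold is a per-connection setting in nifi.properties. A quick way to check it on a node (the property name below is the standard one; the 20000 default is from a stock install, so verify against your version):

grep 'nifi.queue.swap.threshold' ./conf/nifi.properties
# A stock install prints:
# nifi.queue.swap.threshold=20000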
01-19-2022
05:45 AM
@Wisdomstar Thank you, I appreciate that, and I'm glad I could help. Matt
01-18-2022
08:34 AM
@Kilynn As I mentioned in my last response, once memory usage got too high, the OS-level OOM Killer was most likely killing the NiFi service to protect the OS. The NiFi bootstrap process would have detected that the main process died and started it again, assuming the OOM Killer did not kill the parent bootstrap process as well.
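If you want to confirm that, a minimal sketch for checking the system log (the /var/log/messages path is typical for RHEL/CentOS; Debian/Ubuntu systems log to /var/log/syslog instead):

# Look for OOM Killer activity around the time NiFi restarted
grep -i 'out of memory' /var/log/messages
grep -i 'killed process' /var/log/messages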
01-18-2022
08:23 AM
@oopslemon NiFi only encrypts and obscures values in properties that are specifically coded as sensitive (such as "password" properties). So there is no way at this time to encrypt all, or portions of, property values not coded as sensitive. Keep in mind it is not just what is visible in the UI; your unencrypted passwords will also be in plaintext within the NiFi flow.xml.gz file.

My recommendation is to use mutual TLS based authentication instead. You can create a clientAuth certificate to use in your rest-api calls. Then you need to make sure that your clientAuth certificate is authorized to perform the actions the rest-api call is making. This is not possible while using the single-user login mode, as it does not allow you to set up additional users and authorizations. The single-user authentication and authorization providers were added to protect users from unprotected access to their NiFis; they were not meant to be the desired choice when securing your NiFi. They are one step above the unsecured default setup that existed prior to NiFi 1.14. They protect you, but also come with the limitations of very basic functionality.

So step one is to switch to another method of authentication and authorization for your NiFi. TLS authentication is always enabled as soon as NiFi is configured for HTTPS. You can configure additional authentication methods like LDAP/AD: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication The authorizer configured in the authorizers.xml file allows you to establish policies that control user/client permissions: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization

Then you can configure your InvokeHTTP processor to simply use an SSLContextService configured with your clientAuth certificate keystore and a truststore. The password fields in this controller service would be encrypted. There is no more need to constantly get a new bearer token; all you need to worry about is getting a new client certificate before the old one expires, which is typically every 2 years, though that is configurable when you create it and get it signed.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
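As a minimal sketch of what a certificate-based rest-api call can look like once the clientAuth certificate is authorized (the file paths and the /flow/about endpoint here are illustrative placeholders; PEM-format client credentials are assumed):

# Authenticate to the NiFi rest-api with a client certificate instead of a bearer token
curl 'https://<hostname>:<port>/nifi-api/flow/about' \
  --cert /path/to/client-cert.pem \
  --key /path/to/client-key.pem \
  --cacert /path/to/ca-root.pem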
01-18-2022
08:08 AM
@Wisdomstar You should be able to use the JoltTransformJSON NiFi processor to accomplish this. There is a thread here with a Jolt specification example for doing what you are trying to do: https://stackoverflow.com/questions/54696540/jolt-transformation-lowercase-all-keys Add the Jolt specification to the "Jolt Specification" property in the JoltTransformJSON NiFi processor. Set "Pretty Print" to true if you still want nicely formatted multi-line JSON to be produced. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
01-14-2022
11:09 AM
@sandip87 It would be difficult to point out the specific misconfiguration without seeing your nifi.properties, login-identity-providers.xml, and authorizers.xml files. Sharing the complete stack trace thrown during NiFi service startup would be helpful as well; there should be more than what you shared logged in the nifi-app.log. Thanks, Matt
01-14-2022
10:49 AM
@samarsimha Did you make any recent changes, such as to hostnames or ports, before you restarted your NiFi nodes? You could try stopping your NiFi nodes, removing the NiFi local state directory on all nodes, and then restarting NiFi again. You can check the state-management.xml configuration file to see where each node is keeping local state. The default for Apache NiFi is "./state/local". If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
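A minimal sketch of that procedure on one node, assuming the default ./state/local location and a standard tarball install (adjust paths to match your environment):

# Run from the NiFi install directory on each node
./bin/nifi.sh stop
rm -rf ./state/local   # clears only local state, not the content/flowfile repositories
./bin/nifi.sh start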
01-14-2022
10:37 AM
1 Kudo
@gm There are a few changes to your command you will need to make. All configuration changes need to include the current revision. You can get the current revision by first executing the following:

curl 'https://<hostname>:<port>/nifi-api/processors/5964ef54-017e-1000-0000-0000219f4de1' \
  -H 'Authorization: Bearer <BT>' \
  --compressed \
  --insecure

Then to make a change you can use:

curl 'https://<hostname>:<port>/nifi-api/processors/5964ef54-017e-1000-0000-0000219f4de1' \
  -X 'PUT' \
  -H 'Authorization: Bearer <BT>' \
  -H 'Content-Type: application/json' \
  -d '{"component":{"id":"5964ef54-017e-1000-0000-0000219f4de1","config":{"properties":{"bootstrap.servers":"${MIL_KAFKA_KERB_BROKERS}"}}},"revision":{"version":<revision number>}}' \
  -i \
  -k

Things to take careful note of:
1. The user-friendly property names shown in processors in the NiFi UI may not always match the actual property name being modified. The above is a perfect example, since the consume and publish Kafka processors display "Kafka Brokers"; however, the actual Kafka property being set is "bootstrap.servers".
2. It might be safer to use --data-raw instead of just -d, since the content may have = and @ in it.
3. Start the payload with '{" rather than '" only.
4. Be careful when copying from a text editor, as the ' and " characters may get altered by the editor.
5. All changes require a correct revision number. The first command above returns the current revision for the component; use that revision number as shown when you PUT the change.

Making use of the "Developer tools" provided within your browser will make it easier to troubleshoot NiFi rest-api requests. Simply open developer tools, make a change to a property, click "Apply" on the component, and observe the call made in the Network tab. In most developer tools you can right-click on the call and select "Copy as cURL", then paste that copied command into your editor of choice for review. Keep in mind that what you copy will include some additional unnecessary headers.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
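If the jq utility happens to be available on your system (an assumption; it is not part of NiFi), a small sketch like this can pull the revision number out of that first call automatically:

# Fetch the processor entity and extract just the current revision version
REVISION=$(curl -s -k -H 'Authorization: Bearer <BT>' \
  'https://<hostname>:<port>/nifi-api/processors/5964ef54-017e-1000-0000-0000219f4de1' \
  | jq '.revision.version')
echo "Current revision: ${REVISION}"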
01-14-2022
08:09 AM
@Kilynn I see that you have set some very high values (330 GB) for your NiFi heap space. This is really not recommended. I can also see from what you have shared that this particular NiFi instance has been up for ~2.5 hours, and within that timeframe it has been in a stop-the-world state due to Java Garbage Collection (GC) for ~20 minutes. This means that your NiFi is spending ~13% of its uptime doing nothing but stop-the-world garbage collection.

Is there a reason you set your heap this high? Setting a large heap simply because your server has the memory available is not a good reason. Java GC kicks in when heap usage is around 80%. When an object in heap is no longer being used, it is not cleaned out of heap and its space reclaimed at that time; space is only reclaimed via GC. The larger the heap, the longer the GC events are going to take. Seeing as your current heap usage is ~53%, I am guessing that with each GC event a considerable amount of heap space is being released. Long GC pauses (stop-the-world) can result in node disconnection because NiFi node heartbeats are not being sent.

When NiFi seems to be restarting with no indication at all in the nifi-app.log, you may want to take a look at your system's /var/log/messages file for any indication of the Out Of Memory (OOM) Killer being executed. With your NiFi instance being the largest consumer of memory on the host, it would be the top pick by the OS OOM Killer when available OS memory gets too low.

In most cases you should not need more than 16 GB of heap to run most dataflows. If you are building dataflows that utilize a lot of heap, I'd recommend taking a close look at your dataflow designs to see if there are better design choices. One thing that can lead to large heap usage is creating very large FlowFile attributes (for example, extracting a large amount of a FlowFile's content into a FlowFile attribute); FlowFile attributes all live inside the JVM heap. Some other dataflow design elements that can lead to high heap usage include:
- Merging a very large number of FlowFiles in a single merge processor. It is better to use multiple merge processors in series, with the first merging 10,000 to 20,000 FlowFiles and the second merging those into even larger files.
- Splitting a very large file in a single split processor, resulting in a lot of FlowFiles produced in a single transaction. It is better to use multiple split processors in series, or even to look at ways of processing the data without needing to split the file into multiple files (look at using the available "record" based processors).
- Reading the entire content of a large FlowFile into memory to perform an action on it, like an ExtractText processor configured for entire-text instead of line-by-line mode.

I also 100% agree that you should be looking into your "thread" allocation choices in your nifi.properties file; many of them seem unnecessarily high for a 7 node cluster. Understand that each node in your cluster executes independently of the others, so the settings applied pertain to each node individually. Following the guidance in the NiFi Administration Guide (found in the NiFi UI under Help, or on the Apache NiFi site) is strongly advised.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
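For reference, heap size is set via the JVM arguments in conf/bootstrap.conf. A hedged sketch of a more conventional sizing in line with the guidance above (the 16g figure is an illustration, not a universal recommendation, and the java.arg numbering may differ in your file):

# conf/bootstrap.conf
java.arg.2=-Xms16g   # initial heap
java.arg.3=-Xmx16g   # maximum heap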
01-14-2022
07:24 AM
@LejlaKM Sharing your dataflow design and processor component configurations may help get you more and better responses to your query. Things you will want to look at before and while you run this dataflow (a basic monitoring sketch follows at the end of this post):
1. NiFi heap usage and general memory usage on the host.
2. Disk I/O and network I/O.
3. NiFi host CPU utilization. (If your flow consumes 100% of the CPU(s) during execution, this can lead to what you are observing. Does UI functionality return once the copy is complete?)
4. Your dataflow design implementation, including components used, configurations, concurrent tasks, etc.

While most use cases can be accomplished through dataflow implementations within NiFi, not all use cases are a good fit for NiFi. In this case your description points at copying a large table from one Oracle DB to another. You made no mention of any filtering, modifying, enriching, etc. being done to the table data along the way, which is where NiFi would fit in. If your use case is a straightforward copy from A to B, then NiFi may not be the best fit for this specific use case, as it will introduce unnecessary overhead to the process. NiFi ingests content and writes it to a content repository, and creates FlowFiles with attributes/metadata about the ingested data stored in a FlowFile repository. Then it has to read that content again as it writes it back out to a destination. For simple copy operations where no intermediate manipulation or routing of the DB contents needs to be done, a tool that directly streams from DB A to DB B would likely be much faster.

If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
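As a hedged sketch of the host-side monitoring mentioned in the list above (standard Linux utilities are assumed; iostat ships with the sysstat package):

# Watch CPU and memory usage of the NiFi JVM
top
# Watch disk I/O statistics, refreshing every 5 seconds
iostat -x 5
# Quick check of overall memory headroom
free -h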