Member since: 07-30-2019
Posts: 3427
Kudos Received: 1632
Solutions: 1011
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 85 | 01-27-2026 12:46 PM |
| | 490 | 01-13-2026 11:14 AM |
| | 1027 | 01-09-2026 06:58 AM |
| | 916 | 12-17-2025 05:55 AM |
| | 977 | 12-15-2025 01:29 PM |
08-31-2017
12:24 PM
1 Kudo
@Kiem Nguyen In a NiFi cluster, NiFi must ensure consistency across all nodes: you can't have each node in the cluster running a different version/state of the flow.xml.gz file. In a cluster, NiFi replicates a request (such as stopping a processor) to all nodes. While a node is disconnected, that replication cannot occur, so to protect the integrity of the cluster the NiFi canvas is essentially read-only. Your two options are: 1. Reconnect the disconnected node and then stop your dataflow(s). 2. Drop the disconnected node from your cluster via the "Cluster" UI found in the hamburger menu in the upper right corner of the UI. This will make your cluster a 2-of-2 cluster and will return the UI to full functionality. You will then need to restart the dropped node to get it to try to rejoin the cluster once it is fixed. Thanks, Matt
08-30-2017
12:55 PM
@sally sally What user owns the running NiFi process? You need to make sure that user can read those keystore files. I suggest becoming that user on the NiFi server and making sure you can change directories to the location of the cacerts.jks file. Also, as that user, run the keytool list command: keytool -v -list -keystore <path to keystore>/cacerts.jks The keystore file must contain a "PrivateKeyEntry" for your user. (The issuer will be the user itself if the certificate is self-signed, or a CA if it was signed by a CA.) The truststore file must contain a "trustedCertEntry" for your user (self-signed) or for the CA that signed your user's certificate. Also make sure that these files exist in the same location on every node if you are running a NiFi cluster. Thanks, Matt
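A sketch of the checks described above; the service user name and keystore path are assumptions for illustration, so adjust them for your environment:

```shell
# Hypothetical service user and keystore path -- adjust to your install
NIFI_USER=nifi
KEYSTORE=/opt/nifi/conf/cacerts.jks

# Can the NiFi service user actually read the keystore file?
sudo -u "$NIFI_USER" test -r "$KEYSTORE" && echo "readable" || echo "NOT readable"

# List the entries; look for a "PrivateKeyEntry" in the keystore
# and a "trustedCertEntry" in the truststore
keytool -v -list -keystore "$KEYSTORE"
```

Running the same two commands on every node of a cluster confirms both permissions and that the files exist in the same location everywhere.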
08-29-2017
07:12 PM
2 Kudos
@dhieru singh The NiFi Expression Language is used to evaluate and operate against the attributes of a FlowFile, Variable Registry key/value pairs, NiFi JVM pre-defined properties, or pre-defined system environment variables. What you are trying to operate against is the content of a FlowFile. Processors like RouteText and RouteOnContent, as mentioned by @kdoran, are the correct processors to use in this scenario. These processors expect you to create custom new properties that use Java regular expressions, rather than the NiFi Expression Language, to parse the content of a FlowFile. Using your example and a RouteOnContent processor, you might add a new property as follows: containsCheck = .*?([Cc][Hh][Ee][Cc][Kk]).*? This Java regular expression looks in the content for zero or more characters, followed by "check" (case insensitive), followed by zero or more characters. The RouteOnContent processor will then have two relationships: "containsCheck" (the user-added property above) and "unmatched" (default; always exists). Any FlowFile whose content does not contain "check" (case insensitive) will be routed to unmatched. You can choose to auto-terminate this relationship if you just want to throw those unmatched FlowFiles away. Thanks, Matt
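RouteOnContent evaluates Java regular expressions, but the character-class trick above can be sanity-checked with any regex tool; a quick approximation with grep -E:

```shell
# The bracketed character classes match "check" in any case combination,
# mirroring the RouteOnContent property above
pattern='[Cc][Hh][Ee][Cc][Kk]'

printf 'Please CHECK the logs\n' | grep -qE "$pattern" && echo "routed to containsCheck"
printf 'nothing of interest\n'   | grep -qE "$pattern" || echo "routed to unmatched"
```

In modern Java regex you could also write the same thing as `(?i)check` with the case-insensitive flag instead of spelling out the classes.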
08-29-2017
05:48 PM
@Wesley Bohannon Glad you came up with a solution. Sorry I did not get back to you sooner. Vacation got in the way. 🙂
08-29-2017
05:46 PM
2 Kudos
@sally sally Many processors in NiFi support using the NiFi "SSL Context Service". Look at the InvokeHTTP processor as an example: you can create as many SSL Context Services as you want, and each can be configured to use its own keystore and truststore files. There are plenty of resources online for taking your PEM file and loading it into a PKCS12 (.p12 or .pfx) keystore. For the truststore, I would suggest using the JKS truststore already in use by your secured NiFi. Thanks, Matt
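One common route from PEM to PKCS12 is `openssl pkcs12 -export`; a self-contained sketch (the file names, alias, and throwaway password here are assumptions, and the key/cert are generated only so the example runs end to end):

```shell
# Generate a throwaway self-signed key + certificate purely for demonstration
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo" \
  -keyout key.pem -out cert.pem

# Bundle the PEM private key and certificate into a PKCS12 keystore
# that an SSL Context Service can point at (Keystore Type: PKCS12)
openssl pkcs12 -export -in cert.pem -inkey key.pem \
  -out keystore.p12 -name demo -passout pass:changeit
```

In the SSL Context Service you would then set the keystore path to keystore.p12, the type to PKCS12, and the password accordingly.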
08-29-2017
05:32 PM
@Wesley Bohannon I set up a similar dataflow that is working as expected. The only difference is that you made your fragment.index values 0-3 and I made mine 1-4. Is the FlowFile attribute "table_name" set on all four FlowFiles? Is the value associated with the "table_name" attribute exactly the same on all four FlowFiles? Below is my test flow that worked: as you can see, one four-FlowFile merge was successful and a second is waiting for its fourth file before being merged. Thanks, Matt
08-16-2017
04:18 PM
1 Kudo
@nesrine salmene The database repository consists of two H2 databases: nifi-user-keys.h2.db and nifi-flow-audit.h2.db. When NiFi is running you will see two additional lock files that correspond to these databases. The nifi-user-keys.h2.db is only used when NiFi has been secured, and it contains information about who has logged in to NiFi. The same information is also output to the nifi-user.log, which you can parse to audit who has logged in to a particular NiFi instance. The nifi-flow-audit.h2.db is used by NiFi to keep track of all configuration changes made within the NiFi UI. The information contained in this DB is viewable via the "Flow Configuration History" embedded UI, found under the hamburger menu in the upper right corner of NiFi's UI. You can also use NiFi's REST API to query the Flow Configuration History. Thanks, Matt
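The Flow Configuration History can be queried over REST; a hedged curl sketch, assuming an unsecured NiFi on the default local host and port (a secured instance would additionally need a token or client certificate):

```shell
# Fetch the 10 most recent flow configuration changes from the history endpoint
# (host/port assumed; offset/count control paging through the audit records)
curl -s "http://localhost:8080/nifi-api/flow/history?offset=0&count=10"
```

The JSON response contains the same action records (who changed what, and when) that the "Flow Configuration History" UI displays.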
08-16-2017
01:46 PM
@Wesley Bohannon Is this a NiFi standalone or a NiFi cluster? If a cluster, are the FlowFiles produced by each of your SelectHiveQL processors being produced on the same node? The MergeContent processor will not merge FlowFiles from different cluster nodes. Assuming that all FlowFiles are on the same NiFi instance, the only ways I could reproduce your scenario were:
1. Each FlowFile had a different value assigned to the "table_name" FlowFile attribute and Merge Strategy was set to "Bin-Packing Algorithm". This caused each FlowFile to be placed in its own bin, and at the end of the 5-minute max bin age each bin of one was merged. If the intent is always to merge one FlowFile from each incoming connection, what is the purpose of setting a "Correlation Attribute Name"?
2. Setting Maximum number of bins to 1 and having the 4 source FlowFiles become queued at different times.
The "Defragment" Merge Strategy bins FlowFiles based on matching values in the "fragment.identifier" FlowFile attribute; it then merges the FlowFiles using the "fragment.index" and "fragment.count" attributes. Since you have also specified a correlation attribute, the MergeContent processor will instead use the value associated with that attribute, rather than "fragment.identifier", to bin your files. If I have unique values for "table_name" on each FlowFile, then each FlowFile ends up in a different bin and is routed to failure right away (if bins is set to 1) or after the 5-minute max bin age, since not all fragments were present. The other possibility is that "fragment.count" and "fragment.index" are set to 1 on every FlowFile. I would stop your MergeContent processor and allow 1 FlowFile to queue in each connection feeding it, then use the "list queue" capability to inspect the attributes on each queued FlowFile. What values are associated with each FlowFile for the following attributes: fragment.identifier, fragment.count, fragment.index, table_name? Thank you, Matt
08-16-2017
01:08 PM
2 Kudos
@Pierre Leroy Splitting such a large file may result in Out Of Memory (OOM) errors in NiFi. NiFi must create every split FlowFile before committing those splits to the "splits" relationship, and during that process NiFi holds the FlowFile attributes (metadata) for all the FlowFiles being produced in heap memory. What your image above shows is that you issued a stop on the processor. This means you have stopped the processor's scheduler from triggering again; the processor will still allow any existing running threads to complete. The small number "2" in the upper right corner indicates the number of threads still active on this processor. If you have run out of memory, for example, this process will probably never complete; a restart of NiFi will kill off those threads. When splitting very large files, it is common practice to use multiple SplitText processors in series with one another. The first SplitText is configured to split the incoming files into large chunks (say, every 10,000 to 20,000 lines). The second SplitText processor then splits those chunks into the final desired size. This greatly reduces the heap memory footprint. Thanks, Matt
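The two-stage idea can be illustrated outside NiFi with coreutils `split` (an analogy only; the line counts are arbitrary):

```shell
# A 100,000-line input file standing in for the large FlowFile
seq 1 100000 > big.txt

# Stage 1: large chunks of 20,000 lines each (5 files),
# like the first SplitText in the series
split -l 20000 big.txt chunk_

# Stage 2: split each chunk into the final 1,000-line pieces (100 files total),
# like the second SplitText
for f in chunk_*; do split -l 1000 "$f" "${f}_part_"; done

ls chunk_*_part_* | wc -l
```

At no point does either stage need to track all 100 final pieces at once, which is exactly why the serial arrangement keeps NiFi's heap footprint small.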
08-15-2017
06:14 PM
1 Kudo
@Hadoop User It is unlikely you will see the same performance out of Hadoop between reads and writes. The Hadoop architecture is designed to favor many readers and few writers. Increasing the number of concurrent tasks may help performance, since you will then have multiple files being written concurrently. 1-2 KB files are very small and do not make optimal use of your Hadoop architecture. Commonly, NiFi is used to merge bundles of files together into a size more optimal for storage in Hadoop; I believe 64 MB is the default HDFS block size. You can remove some of the per-file overhead by merging files together into larger files using the MergeContent processor before writing to Hadoop. Thanks, Matt
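What MergeContent does for these tiny files can be mimicked on the command line (the file names and counts here are purely illustrative):

```shell
# 100 tiny files, one record each -- standing in for the 1-2 KB FlowFiles
mkdir -p small
for i in $(seq 1 100); do echo "record $i" > "small/f$i.txt"; done

# Bundle them into a single larger file, as MergeContent would
# before a single PutHDFS write
cat small/f*.txt > merged.txt
wc -l < merged.txt
```

One write of the merged file replaces 100 separate writes, which is the overhead reduction described above.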