Member since 07-30-2019
2281 Posts
1246 Kudos Received
643 Solutions
07-01-2022
08:57 AM
1 Kudo
@Brenigan Are you running your dataflow on a standalone NiFi install or a NiFi cluster install? If a multi-node NiFi cluster, are all 200 FlowFiles on the same NiFi node? Does your partition_number start at 0? Do you see your FlowFiles getting routed to the "overtook" relationship after 10 minutes? Assuming all the following:
1. All FlowFiles are on the same NiFi node
2. partition_number starts at "0" and increments consistently by "1"
3. All FlowFiles have the same filename
4. The "wait" relationship is routed via a connection back to the EnforceOrder processor
You should be seeing:
1. All FlowFiles routed to the "wait" relationship until a FlowFile with attribute "partition_number" equal to "0" is processed, which results in that FlowFile routing to success.
2. Other FlowFiles meeting the above 4 criteria will continue to loop through "wait" until the "partition_number" attribute with value "1" is seen and routed to success, and so on.
3. If a FlowFile in the incremental order is missing, all FlowFiles with a partition_number higher than the next expected integer will continue to route to the "wait" relationship.
4. After the configured "wait timeout", any FlowFile that has been waiting that long will be routed to the "overtook" relationship.
You can right click on a connection holding the FlowFiles and list the queue. From there you can select the "view details" icon to the far left to examine a FlowFile's current attributes. You should see a new attribute "EnforceOrder.expectedOrder" that contains the next expected integer value that the group this FlowFile belongs to is waiting for. You will also find your "partition_number", which holds the current integer for this FlowFile. If you have your FlowFiles distributed across multiple nodes in a NiFi cluster, you will need to get all FlowFiles with the same group identifier moved to the same NiFi node in order to enforce order (you can not enforce order across different nodes in a NiFi cluster). You can accomplish this by editing the connection feeding your EnforceOrder processor and, under settings, selecting a "Load Balancing Strategy" of "Partition by Attribute" using the "filename" attribute that you are using as your group identifier in the EnforceOrder processor. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
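For reference, a minimal sketch of how the EnforceOrder processor properties might look for the scenario above; the ${filename} group identifier and the timeout values are assumptions based on this thread, so adjust them to your flow:
Group Identifier: ${filename}
Order Attribute: partition_number
Initial Order: 0
Wait Timeout: 10 mins
Inactive Timeout: 30 mins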
06-30-2022
12:34 PM
1 Kudo
@pandav You can not offload a NiFi node that is down. Can you clarify what you mean by "down"? Was the NiFi service not running on the nodes you attempted to offload? The offload option from the cluster UI sends a request to the disconnected (not down) node to offload its queued FlowFiles to nodes still connected to the cluster. If your nodes are down, you'll need to start the service on those nodes again. On startup (assuming no issues), these nodes will rejoin your cluster. If you plan to decommission a node later, you can use the NiFi cluster UI to manually disconnect the node and then offload that node's FlowFiles. Once the FlowFiles have been successfully offloaded, the node can be deleted from the cluster using the NiFi cluster UI. Note: restarting a node that has been dropped/deleted from the cluster will trigger that node to start heartbeating to the cluster and thus reconnect, unless you edit the configuration of the node so it does not use the same zookeeper znode as the current cluster (the nifi.zookeeper.root.node property in the nifi.properties file). https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#basic-cluster-setup As far as your nodes going down on a configuration change, you'll want to inspect the NiFi logs for any exceptions or timeouts that may have occurred. Network issues, long Garbage Collection (GC) pauses, and resource congestion/exhaustion can lead to nodes not responding to or not receiving the replicated change request. As a result a node can get disconnected. In scenarios like this, if you are using the latest Apache NiFi release, those nodes should automatically reconnect. Upon reconnect, if the node's flow does not match the cluster flow, the node will automatically take the cluster's flow and join. In older releases, a flow mismatch between the connecting node and the cluster flow would require manual intervention (copying the flow.xml.gz from a node still in the cluster to the node not connecting). If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
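If you do need to bring a previously deleted node back up without it rejoining the cluster, a hedged example of the relevant nifi.properties change on that node (the znode path below is illustrative; the cluster's remaining nodes keep their original root node):
nifi.zookeeper.root.node=/nifi-standalone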
06-30-2022
07:14 AM
@Gogineni Good to see that you are now getting a 401 instead of a 400. So this becomes more of an issue with what your endpoint's rest-api expects in the GET method. I am not sure what is in the content of your FlowFile, but are you sure you want to send it in your GET request at this point in your dataflow? I am sure it was used earlier to fetch your access token, but it is probably not needed now. So try changing "Send Message Body" to "false". Also I am not sure how long the token you obtained is valid. Have you tried performing the same request via curl from the command line? If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
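If it helps, a hedged sketch of testing the same GET outside of NiFi with curl; the URL and the Authorization header value are placeholders for whatever your endpoint actually expects:
curl -v -H "Authorization: Bearer <access-token>" "https://your-endpoint.example.com/api/resource"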
06-30-2022
06:04 AM
@araujo Once "nifi.content.repository.archive.enabled" is set to false, content claims that no longer have any FlowFiles referencing them will no longer get moved to "archive" sub-directories. They instead simply get removed. The logic built around the backpressure checks whether there are still archived content claims and, if none, allows content to continue to be written until the disk is full. If the archive claimant count is not zero, backpressure kicks in until that count goes to zero through archive removal. This backpressure mechanism is in place so that the archive clean-up process can catch up. The fact that NiFi will allow content repo writes until the disk is full is why it is important that users do not co-locate any of the other NiFi repositories on the same disk as the content repository. If disk filling became an issue, the processors that write new content would not just stop executing. They would start throwing exceptions in the logs about insufficient disk space. @Drozu The original image shared of your canvas shows a MergeContent processor in a "running" state, but does not indicate an active thread at the time of image capture. An active thread shows as a small number in the upper right corner of the processor. The processor image also shows that this MergeContent processor executed 2,339 tasks in just the last 5 minutes. Execution does not mean an output FlowFile will always be produced. If none of the bins are eligible for merge, then nothing is going to be output. When the processor is in a state of "not working", do all of that processor's stats go to 0, including "Tasks/Time"? Does it also at that same time indicate a number in its upper right corner? That would indicate that the processor has an active thread that has been in execution for over 5 minutes. In a scenario like this, it is best to get a series of ~4 NiFi thread dumps to see why this thread is running so long and what it is waiting on. If the stats go to zeros and you do not see an active thread number on the processor, this indicates the processor has not gotten a thread in the last 5 minutes from the Timer Driven thread pool. Then you need to look at thread pool usage per node. Is the complete thread pool in use by other components? Thanks, Matt
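A hedged example of capturing that series of thread dumps using the dump command that ships with NiFi; the install path, output location, and interval below are illustrative:
for i in 1 2 3 4; do /opt/nifi/bin/nifi.sh dump /tmp/nifi-thread-dump-$i.txt; sleep 30; done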
06-30-2022
05:27 AM
@Gogineni The dynamic property you added to the InvokeHTTP processor is not using valid NiFi Expression Language (NEL). The name of your custom property, "Authorization", is the header name, and the evaluated NEL becomes the value of that header. What you have is: $.Authorization However, the valid NEL to use to return the value from the NiFi FlowFile attribute "Authorization" created by the earlier UpdateAttribute processor would be: ${Authorization} If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-29-2022
01:06 PM
@rafy Have you looked at the following processors:
1. GenerateTableFetch
2. QueryDatabaseTable
3. QueryDatabaseTableRecord
Thanks, Matt
06-29-2022
12:49 PM
@Drozu Switching off the content repository archiving would not result in an automatic clean-up of your archived content claims. Make sure that all the "archive" sub-directories in the numbered directories within the content-repository are empty. After disabling archive, was there any change in disk utilization on your 3 nodes? Did the content repository disk fill to 100%? There are many things that go into evaluating the performance of your NiFi and its dataflows. Anytime you add new components via the NiFi canvas, the dynamics can change.
- How many components are running? (If all 50 timer driven threads are currently in use by other components, other components will just be waiting for an available thread.)
- How often is JVM garbage collection (GC) happening?
- How many timer driven threads are in use at the time the processors seem to stop?
- How are the queued FlowFiles to the MergeContent distributed across your 3 nodes?
- How many concurrent tasks on the MergeContent?
- What do the Cluster UI stats show for per node thread count, GC stats, cpu load average, etc.?
- Any other WARN or ERROR log output going on in the nifi-app.log (maybe related to OOM or open file limits, for example)?
Looks like you are using your MergeContent processor to merge two FlowFiles together that have the same filename attribute value. Does one inbound connection contain 1 FlowFile and the other contain the other FlowFile in the pair? The MergeContent is not going to parse through the queued FlowFiles looking for a match. How are you handling "failure" with the MergeContent? It round robins each connection, so in one execution it reads from connection A and bins those FlowFiles, and on the next execution it reads from connection B. Try adding a funnel before your MergeContent, redirecting your two source success connections into that funnel, and dragging a single connection from the funnel to the MergeContent. Thank you, Matt
06-29-2022
12:17 PM
@ajignacio You should carefully read all the migration guidance leading up to 1.16, starting with: Migrating from 1.x.x to 1.10.0. Take special note of:
1. Any nars that may have been removed; make sure your dataflows are not using any processors from those removed nars.
2. Any reported changes to specific components you may use in your dataflows.
3. Check that your dataflow does not have any processors with an inbound connection scheduled to execute on "Primary Node" only (small P in the upper left corner of the processor).
4. Take note of the migration step involving the sensitive props key. If you had not set one previously, you may want to use the NiFi toolkit to create a new user-defined one and re-encrypt the sensitive property values in the flow.xml.gz using that new sensitive props key (see the sketch after this post).
5. Make sure you upgrade to a Java version supported by NiFi 1.16 before migration.
NOTE: While Apache NiFi has limits on the maximum size of the release, forcing deprecation of older nars, Cloudera's CFM distributions of Apache NiFi do not, and include almost all Apache nars in addition to Cloudera specific nars. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
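As a rough sketch only of re-keying the sensitive properties with the toolkit's encrypt-config utility; the paths and key value below are illustrative, and you should double-check the flags against the NiFi Toolkit Guide for your exact versions:
./bin/encrypt-config.sh -n /path/to/nifi.properties -b /path/to/bootstrap.conf -f /path/to/flow.xml.gz -s 'myNewSensitivePropsKey' -x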
06-29-2022
11:12 AM
2 Kudos
@rafy Each node in a NiFi cluster has its own copy of the dataflow and executes independently of the other nodes. Some NiFi components are capable of writing cluster state to zookeeper to avoid data duplication across nodes. Those NiFi ingest type components that support this should be configured to execute on the primary node only. In a NiFi cluster, one node will be elected as the primary node (which node is elected can change at any time). So if a primary node change happens, the same component on a different node will then get scheduled and will retrieve the cluster state to avoid ingesting the same data again. Often in these types of components, the one that records state does not retrieve the content. It simply generates the metadata/attributes necessary to later get the content, with the expectation that in your flow design you distribute those FlowFiles across all nodes before content retrieval. For example:
- ListSFTP (primary node execution) --> success connection (with round robin LB configuration) --> FetchSFTP (all node execution)
The ListSFTP creates a 0 byte FlowFile for each source file that will be fetched. The FetchSFTP processor uses that metadata/attributes to get the actual source content and add it to the FlowFile. Another example for your query might be:
- GenerateTableFetch (primary node execution) --> LB connection --> ExecuteSQL
The goal with these dataflows is to avoid having one node ingest all the content (added network and disk I/O) only to then add more network and disk I/O to spread that content across all nodes. So instead we simply get details about the data to be fetched so that work can be distributed across all nodes, and each node gets only specific portions of the source data. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-29-2022
06:45 AM
@shuhaib3 The nifi.properties file is not the correct file to pass to the "-p" option for the NiFi Toolkit cli.sh. The "-p" option expects you to pass a properties file you build with specific properties in it. For example:
baseUrl=https://<target node hostname>:<target node port>
keystore=/path/to/keystore.jks
keystoreType=JKS
keystorePasswd=changeme
keyPasswd=changeme
truststore=/path/to/truststore.jks
truststoreType=JKS
truststorePasswd=changeme
proxiedEntity=nifiadmin
The nifi.properties file will not include these exact property names and includes other properties not used by cli.sh. The following exception: ERROR: Error executing command 'current-user' : PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target indicates a trust chain issue between the client (cli.sh) and the server (target NiFi). This means that the truststore is missing one or more TrustedCertEntry entries for the PrivateKeyEntry presented from the keystore in the mutual TLS handshake. Essentially the client initiates a connection to the server. The server responds with its serverAuth certificate along with a list of trusted authorities (TrustedCertEntry entries) from the server's truststore. Every certificate, private (PrivateKeyEntry) or public (TrustedCertEntry), has an owner (the certificate's distinguished name (DN)) and an issuer (the distinguished name (DN) of the signer of that certificate). The client looks at the issuer of the server's certificate and checks its truststore for a certificate owner with that same DN. If found, it checks the issuer of that certificate to see if issuer and owner have the same DN (self signed). If not the same, it looks again for a certificate with an owner matching that issuer DN. This continues until it finds the root signing certificate (the root certificate will have the same issuer and owner). This complete chain of certificate authorities is known as the trust chain. If the complete trust chain is missing, you get the above exception. The same can happen in the other direction. Assuming the above is successful, the client then returns its clientAuth certificate (keystore) to the server so the server can verify who the client is. The server (NiFi node) will verify trust in the same way using the truststore on the server side. So the complete trust chain for that client certificate must also exist on the server side. If the complete trust chain exists here as well, the mutual TLS handshake can be successful. You can manually inspect the contents of your client and server side keystore and truststore files using the java keytool command: <path to java>/keytool -v -list -keystore <keystore or truststore> If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
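If helpful, another way to see exactly which certificate chain a NiFi node presents during the TLS handshake is openssl run from the client host; the hostname and port below are placeholders:
openssl s_client -connect <target node hostname>:<target node port> -showcerts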
06-29-2022
06:20 AM
@Tryfan You mention this file comes in daily. You also mention that this file arrives through a load-balancer, so you don't know which node will receive it. This means you can't configure your source processor for "primary node" only execution as you have done in your shared sample flow with the ListFile. As Primary Node only, the elected primary node will be the only node that executes that processor. So if the source file lands on any other node, it would not get listed. You could handle this flow in the following manner:
GetFile --> (7 success relationships) PostHTTP or InvokeHTTP (7 of these, one configured for each node in your cluster)
ListenHTTP --> UpdateAttribute --> PutFile
So in this flow, no matter which node receives your source file, the GetFile will consume it. It will then get cloned 6 times (7 copies then exist), with one copy of the FlowFile getting routed to each of the 7 unique PostHTTP processors. Each of these targets the ListenHTTP processor listening on one node in your cluster. The ListenHTTP processors will receive the 7 copies of the original source file (one copy per node). Then use the UpdateAttribute to set your username and location info before the PutFile, which places each copy in the desired location on that node. If you add or remove nodes from your cluster, you would need to modify this flow accordingly, which is a major downside to such a design. Thus the best solution is still one where the source file is placed somewhere all nodes can retrieve it from, so it scales automatically. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-28-2022
11:12 AM
@VinceSailor Check your nifi.properties file for an identity mapping pattern that contains a Java regex that matches on your DNs. If one does match, the corresponding value is returned and passed to the authorizer. So it might be possible your authorizer is only getting:
nif1-adm.mydomain.com
instead of:
CN=nif1-adm.mydomain.com, OU=NIFI
thus resulting in your untrusted proxy exception. That untrusted proxy error should include the exact identity string the authorizer was passed. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
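For illustration only, a hedged example of the kind of nifi.properties identity mapping that would produce that trimmed value; the regex is an assumption based on the DN above, and your actual pattern may differ:
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.value.dn=$1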
06-21-2022
01:39 PM
1 Kudo
@Drozu That was the exact exception I wanted you to find. NiFi writes the content for each of your FlowFiles into content claims in the content repository. These content claims can contain the content for 1 to many FlowFiles. Once a content claim has zero FlowFiles referencing it anymore, it gets moved to an "archive" sub-directory. A NiFi background thread then deletes those claims based on the archive retention settings in the nifi.properties file. https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository To help prevent NiFi from filling the disk to 100% because of old archived claims, NiFi will stop all processors from being able to write to the content repository. So when you see this log message, NiFi is applying backpressure and blocking processors that write new content (like MergeContent) until archive cleanup frees space or no archived claims exist anymore. As far as stopping and starting the processor goes, I think that action simply gave enough time for archive cleanup to get below the threshold. Then new content written by this processor, along with other flows, pushed the threshold above the backpressure limit again. The below property controls this upper limit: nifi.content.repository.archive.backpressure.percentage If not set, it defaults to 2% higher than what is set in this property: nifi.content.repository.archive.max.usage.percentage The above property defaults to "50%", which you stated is your current disk usage. So I think you are sitting right at that threshold and keep blocking and unblocking. Things you should do:
1. Make sure you do not leave FlowFiles lingering in your dataflow queues. Since a content claim contains the content for 1 to many FlowFiles, all it takes is one small queued FlowFile to hold up the archival and removal of a much larger claim. <-- most important thing to do here.
2. Change the setting for "nifi.content.repository.archive.backpressure.percentage" to something other than the default (for example: 75%). But keep in mind that if you are not properly handling your FlowFiles in your dataflow and are allowing them to accumulate in dead end flows or connections to stopped processors, you are just pushing your issue down the road. You will hit 75% and potentially have the same issue.
If you don't care about archiving of content, turn it off: nifi.content.repository.archive.enabled=false There were also some bugs in 1.13 related to the proper archive claimant count that were addressed in later releases. https://issues.apache.org/jira/browse/NIFI-9993 https://issues.apache.org/jira/browse/NIFI-10023 So I also recommend upgrading to 1.16.2 or newer. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
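For reference, the archive settings discussed above all live in nifi.properties; the values below are illustrative only (the 75% backpressure value is the example override suggested above), so confirm the defaults for your release in the admin guide:
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.backpressure.percentage=75%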
06-21-2022
08:37 AM
@arshahin NiFi is a pure java application. What version of Java is your NiFi using? (NiFi does not include a Java JDK, so it must be installed on the host. NiFi will also not start if Java is not installed, so your host has java installed somewhere.) NiFi supports Java JDK 8, Java JDK 11 (in newer releases), and Java JDK 17 (in the latest 1.16 releases). So perhaps your issue is related to the Java version you are using in conjunction with the NiFi version? Looking closely at your stack trace, you can see the snappy library issue is related to the PutHiveStreaming processor: at org.apache.nifi.processors.hive.PutHiveStreaming https://issues.apache.org/jira/browse/NIFI-9282 https://issues.apache.org/jira/browse/NIFI-9527 Try upgrading to the latest Apache NiFi 1.16. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-21-2022
08:23 AM
@Drozu
- How does disk usage for your NiFi repositories look?
- Any ERROR or WARN logging related to writing content?
- What do you have set for your "Max Timer Driven Thread Count" in controller settings (how many cores does the host where your NiFi is running have)?
- Is this a standalone NiFi or a NiFi multi-node cluster (what is the distribution of the FlowFiles on the inbound connections)?
- How has the MergeContent processor been configured?
Thanks, Matt
06-21-2022
08:04 AM
@ThongPham Sounds like you may be making a lot of unnecessary rest-api calls that could impact your NiFi's overall performance. Have you looked at using the SiteToSiteBulletinReportingTask? This reporting task will send a FlowFile to a remote input port when bulletin(s) are produced. That remote input port could then be built into a dataflow that makes notifications via PutEmail. So instead of constantly calling the rest-api to see if something happened in the last 5 minutes, the flow will simply send something out only when it happens. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-17-2022
05:56 AM
1 Kudo
@ThongPham There is no such thing as a permanent Bearer token. How long a Bearer token stays valid is set in the provider that issues that bearer token, in your case the ldap-provider. Also keep in mind that a bearer token is issued by a specific node in the NiFi cluster and can not be used to authenticate with every node in the NiFi cluster. Since a secured NiFi will always attempt mutual TLS authentication first, I suggest you instead generate and use a client certificate to interact with the NiFi API. Mutual TLS based authentication does not use bearer tokens, and the authentication will be successful until that client certificate expires, which is configurable when generating the certificate. Generally speaking, certificates are often valid for 12 months or more. Since there is no bearer token, a client certificate can be used with any node in the cluster. Your other option is to build a flow within your NiFi to get a new bearer token automatically and store that token in, for example, a DistributedMapCache. Then in your other flow you fetch that bearer token before calling the rest-api endpoint. A failure should loop back to the FetchDistributedMapCache, just in case you have a scenario where the bearer token expires between the fetch and the call. Out of curiosity, what rest-api endpoint are you calling every 20 seconds and why? If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
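As a rough, hedged sketch of the token-based approach from the command line; the hostname, credentials, and the status endpoint are placeholders, and the token must be presented back to the same node that issued it:
TOKEN=$(curl -k -X POST -d 'username=myuser&password=mypass' 'https://nifi-node1:8443/nifi-api/access/token')
curl -k -H "Authorization: Bearer $TOKEN" 'https://nifi-node1:8443/nifi-api/flow/status'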
06-16-2022
12:53 PM
@yagoaparecidoti Authentication is one piece of being able to access NiFi's UI. While the file-user-group-provider and file-access-policy-provider facilitate the automatic creation of the initial admin user identity and the setting of the policies that admin needs, it is the responsibility of that admin to add additional users to NiFi and add those users to authorization policies. The initial admin user would accomplish this directly from within the NiFi UI:
- Global menu in upper right corner --> Users (add additional user identities here for which you want to set up authorizations)
- Global menu in upper right corner --> Policies (add the users you have added to select NiFi controller policies)
- From canvas --> Operate panel on left side --> key icon to add policies for specific components added to the canvas
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#config-users-access-policies Thank you, Matt
06-15-2022
11:39 AM
2 Kudos
@Techie123 When you say "Cooperates Linux Machine", are you referring to a http://www.colinux.org/ installation? If so, this does not sound like an ideal place to try and run a NiFi instance or NiFi cluster, as it appears to isolate that Linux to a single core on the host hardware. NiFi can be a resource intensive application, especially when set up as a cluster. When NiFi is started, you can tail the nifi-app.log to observe the complete startup process. NiFi has completed its startup once you see the following lines: 2022-06-15 13:52:52,727 INFO org.apache.nifi.web.server.JettyServer: NiFi has started. The UI is available at the following URLs:
2022-06-15 13:52:52,727 INFO org.apache.nifi.web.server.JettyServer: https://nifi-node1:8443/nifi
2022-06-15 13:52:52,730 INFO org.apache.nifi.BootstrapListener: Successfully initiated communication with Bootstrap
At this time the NiFi URL should be reachable. If not, I would make sure a firewall or some issue with your "Cooperates Linux Machine" setup is not blocking access to the port on which the NiFi is bound. The latter error you shared: o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors implies that your NiFi cluster has been set up to start secured over HTTPS. Secured NiFi nodes must have a configured keystore and truststore. The above error is telling you that the node's truststore does not contain the TrustedCertEntry(s) needed to trust the client certificate presented by another node (in this case the elected cluster coordinator node). The NiFi configured keystore must meet these requirements:
1. Contain only ONE PrivateKeyEntry (this private key is typically unique per node).
2. That private key must support both ClientAuth and ServerAuth EKUs.
3. That private key must contain a SAN entry for the NiFi host on which it is being used. That SAN must match the hostname of the host.
The NiFi configured truststore must contain the complete trust chain for the nodes' private keys. You can use the keytool command to inspect the contents of either the keystore or truststore and verify the above requirements are satisfied: keytool -v -list -keystore <keystore or truststore> A certificate (private "PrivateKeyEntry" or public "TrustedCertEntry") will have an owner and an issuer. The issuer is the authority for that key. A self-signed certificate will have the same owner and issuer. The truststore needs to have a TrustedCertEntry for each public certificate in the trust chain. For example: Assume you have a PrivateKey in your keystore with: owner: cn=nifi-node1, ou=nifi
Issuer: cn=intermediate-ca, ou=trust
Your truststore would then need to have a TrustedCertEntry for the public key for:
owner: cn=intermediate-ca, ou=trust
issuer: cn=root-ca, ou=trust
and:
owner: cn=root-ca, ou=trust
issuer: cn=root-ca, ou=trust
You know you have the complete trust chain once you reach the public cert where owner and issuer have the same value. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
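If you need to generate a node keystore that satisfies the requirements above, a hedged keytool sketch; the alias, DN, hostname, and validity are illustrative, many deployments instead use the NiFi Toolkit tls-toolkit or an internal CA, and the resulting public certificate chain still has to be imported into every node's truststore:
keytool -genkeypair -alias nifi-node1 -keyalg RSA -keysize 2048 -validity 365 -keystore keystore.jks -dname "CN=nifi-node1, OU=NIFI" -ext SAN=dns:nifi-node1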
06-15-2022
06:27 AM
@Vapper Maybe I am not 100% clear on your question here. When you say new "flow file", I assume you are talking about the flow.xml.gz file? In NiFi, a "FlowFile" is what you see being queued on connections between processor components and consists of FlowFile content and FlowFile metadata/attributes. In a NiFi cluster, all nodes must be running the exact same flow.xml.gz. If the dataflows loaded from the flow.xml.gz do not match, the node will not be allowed to join the cluster. In the latest releases of NiFi, nodes joining a cluster will inherit the cluster dataflow if the local dataflow does not match. Once a cluster is established, a cluster dataflow is elected. That becomes the cluster's dataflow. Joining a node to that established cluster requires that the joining node is using the same flow.xml.gz as the existing cluster; if not, it inherits the cluster-elected dataflow over the existing flow on the joining node. NiFi will not assume that a joining node's flow.xml.gz is "newer" or "desired" over the elected cluster dataflow and replace the elected cluster dataflow with that new dataflow. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-14-2022
12:03 PM
@yagoaparecidoti NiFi will treat the identity strings "user.bind" and "cn=user.bind,ou=USERS,ou=CLOUDERA,dc=lab,dc=local" as two different users. The identity string being passed to the NiFi configured authorizer after successful authentication, in your current configuration, is "user.bind". However, it appears you have configured your initial admin in the authorizers.xml configuration file as "cn=user.bind,ou=USERS,ou=CLOUDERA,dc=lab,dc=local", which resulted in admin policies being initially set up in the authorizations.xml and users.xml files with this string. Now within the login-identity-providers.xml file you have your ldap-provider configured, which is handling your authentication. One of the configurable properties in that ldap-provider can be configured two ways: <property name="Identity Strategy">USE_USERNAME</property> <property name="Identity Strategy">USE_DN</property> The USE_USERNAME setting will pass whatever string was entered in the username login window to the authorizer if authentication was successful. The USE_DN setting will pass the user's DN (post any matching identity mapping pattern modification) to the authorizer. So you are either using the USE_USERNAME option, or you have an identity mapping pattern configured in your nifi.properties file that is matching on the full DN returned by USE_DN and trimming just the "user.bind" from that DN before it is passed to the authorizer. Example:
nifi.security.identity.mapping.pattern.dn=^cn=(.*?),ou=(.*?),ou=(.*?),dc=lab,dc=(.*?)$
nifi.security.identity.mapping.value.dn=$1
nifi.security.identity.mapping.transform.dn=LOWER
Above PATTERN would match "cn=user.bind,ou=USERS,ou=CLOUDERA,dc=lab,dc=local"
and only capture group one ($1), "user.bind", would be returned as the VALUE, in all lowercase because of the LOWER TRANSFORM. https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#identity-mapping-properties One other important thing to keep in mind here: the file-access-policy-provider and file-user-group-provider in the authorizers.xml file will ONLY build the authorizations.xml and users.xml files if they do NOT already exist. So if you edit the configured initial admin string, what is already configured in those files will not get modified, and that configuration change will have no effect. If you decide to change your initial admin identity string, remove the existing users.xml and authorizations.xml files before restarting your NiFi; then on restart, new users.xml and authorizations.xml files will be created with your change. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-14-2022
06:11 AM
@Tryfan I think the concept of sending a file to one node is what needs to change here. By sending to a single node in the NiFi cluster, you create a single point of failure. What happens if that one node in your 7 node cluster goes down? You end up with none of the nodes getting that file and an outage to your dataflow. A better design is to place this file somewhere that all nodes can pull it from. Maybe it is a file system commonly mounted on all 7 nodes (GetFile processor)? Maybe an external SFTP server (GetSFTP processor)? etc... Then you construct a dataflow where all nodes retrieve the file independently as needed. Thanks, Matt
06-14-2022
05:57 AM
1 Kudo
@Techie123 The ExecuteStreamCommand processor is working as designed: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.16.2/org.apache.nifi.processors.standard.ExecuteStreamCommand/index.html It executes an external command on the contents of a flow file, and creates a new flow file with the results of the command. You could route both the "original" and "output stream" relationships via the same outbound connection to a MergeContent processor, which can merge the content from both source FlowFiles into a single FlowFile. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.16.2/org.apache.nifi.processors.standard.MergeContent/index.html If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-10-2022
12:43 PM
1 Kudo
@IslamGamal Keep in mind that all the FlowFile attributes for a FlowFile are held in NiFi's JVM heap memory. Creating large attributes on your FlowFiles can quickly eat up a lot of heap memory and affect JVM performance. Thanks, Matt
06-10-2022
10:49 AM
@Abhishek27Apple Since you are not seeing anything in the NiFi log files...
1. Have you tried using a different web browser like Firefox?
2. Have you tried opening your browser's developer tools and inspecting the actual rest-api call that was made when you attempt the various actions that fail from within the NiFi UI?
3. Are you going through a proxy or load balancer (is it configured to use sticky sessions)?
4. Which browser and version are you using?
5. Have you tried clearing your browser cache?
6. Does the same behavior exist using an incognito window in your browser?
7. What java version is your NiFi using?
Thank you, Matt
06-10-2022
08:28 AM
@Mridul_garg Sharing the complete stack trace(s) from the nifi-app.log may be helpful here. When you say you changed the open file limits, what did you change it to? What does the output from "ulimit -a" show? Make sure you run this command as the same user that owns your NiFi process. Thanks, Matt
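If the limit does need to be raised, a hedged example of where it is commonly set on Linux; the "nifi" user name and the 50000 value are assumptions for illustration, and the change only takes effect for new sessions of that user:
# /etc/security/limits.conf
nifi soft nofile 50000
nifi hard nofile 50000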
06-10-2022
08:08 AM
@Elsaa A couple of things I would check first:
1. Make sure you do not have two success relationship connections stacked on top of each other between the "UpdateAttribute" processor and the "CalculateRecordStats" processor. The processor shows 4 in and 12 out, which makes me think 4 went to three different success connections. You can double click on a connection line to add a bend point, which would allow you to click and drag that bend point to see if there is another connection under it.
2. If the above is not the issue, take a look at the provenance data for your 8 generated FlowFiles to see at what point in your dataflow the clones happened.
If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
06-10-2022
07:51 AM
@Elsaa Is it a standardized filename structure with a standardized date format? Can you share examples? Thanks, Matt
06-10-2022
07:49 AM
@Abhishek27Apple Assuming you have NiFi configured to run securely (HTTPS enabled), the nifi-user.log should be generated. I'd suggest inspecting the logback.xml to make sure there are no mistakes in the appender or loggers set up for the nifi-user.log. Thanks, Matt
06-10-2022
07:44 AM
@Abhishek27Apple Something that strikes me as odd in the authorizers.xml configuration file you shared: I don't see the managed-authorizer. That provider would look like this and comes after the file-user-group-provider and the file-access-policy-provider: <authorizer>
<identifier>managed-authorizer</identifier>
<class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
<property name="Access Policy Provider">file-access-policy-provider</property>
</authorizer>
The nifi.properties file you shared is configured to use this authorizer. However, I would have expected NiFi to fail to start if this authorizer was really missing. Matt