Member since 07-30-2019
Posts: 3406
Kudos Received: 1622
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 311 | 12-17-2025 05:55 AM |
|  | 372 | 12-15-2025 01:29 PM |
|  | 349 | 12-15-2025 06:50 AM |
|  | 339 | 12-05-2025 08:25 AM |
|  | 587 | 12-03-2025 10:21 AM |
10-04-2021
12:42 PM
@dansteu Due to size constraints in Apache, the Apache NiFi distribution does not ship with every component, in order to keep the download under the maximum allowed size. The community removes from the default distribution those components that are less commonly used or that have been deprecated in favor of newer components that do the same job better. ReportLineageToAtlas is one of those less commonly used components that was removed to reduce size, but it can easily be downloaded from Maven and added to your NiFi 1.14.0 install: https://mvnrepository.com/artifact/org.apache.nifi/nifi-atlas-nar/1.14.0 Look for the "Files" line and download the "nar" file (58 MB for the Apache NiFi 1.14.0 release). Place this file in the NiFi 1.14.0 lib directory with the rest of the included nar files and restart your NiFi instance. The nar will be unpacked into the NiFi work directory on startup and then be available for use through the UI. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-04-2021
12:03 PM
@CodeLa You can accomplish this via the ReplaceText processor using a multi-line approach to your Java regular expression (regex). Search Value: Replacement Value: The downside to this approach is that you need to configure this processor with an Evaluation Mode of "Entire text" and make sure the configured buffer size is large enough to fit the entire text. This in turn means higher heap memory utilization while this processor is executing against your FlowFile. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
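The actual Search and Replacement values did not survive in this post, so here is a hypothetical illustration (using Python's re module rather than NiFi itself) of why a multi-line regex forces the "Entire text" mode: the (?s) flag lets "." match newlines, so a match can span lines and the whole text must be buffered in memory.

```python
import re

# Hypothetical multi-line pattern in the same spirit as the post; the
# BEGIN/END markers and input text are invented for illustration only.
# (?s) makes "." match newline characters, so the pattern can span lines.
text = "header\nBEGIN\nline one\nline two\nEND\nfooter"
result = re.sub(r"(?s)BEGIN.*?END", "BLOCK", text)
print(result)  # header\nBLOCK\nfooter
```

The lazy quantifier `.*?` keeps the match from running past the first END, mirroring how a careful Java regex in ReplaceText avoids over-matching.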
10-04-2021
11:18 AM
@Phanikondeti Thanks for sharing the log output. Your NiFi is not up; it failed to start because it was unable to bind to the IP and port shown in the logs. That IP address would correlate to what you have set in either of the following properties in the nifi.properties file: (if unsecured) nifi.web.http.host= (if secured) nifi.web.https.host= You could use the ifconfig command to see if your server has a network interface with that IP assigned to it. If it were a port issue, I'd expect you to see a log message about the port already being in use, or about trying to use a port number below 1024 while launching NiFi as a non-privileged user. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
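As an aside, the failed-bind condition can be reproduced outside NiFi. Below is a small Python sketch (the 192.0.2.x address is a documentation/TEST-NET address used as a made-up example) showing that binding to an IP no local interface owns raises an error, which is the same class of failure that stops NiFi's embedded Jetty server from starting:

```python
import socket

def can_bind(host, port=0):
    """Try to bind a TCP socket to host:port; port 0 lets the OS pick one."""
    s = socket.socket()
    try:
        s.bind((host, port))
        return True
    except OSError:
        # Same failure mode NiFi hits when nifi.web.http(s).host names an
        # address that no local network interface owns.
        return False
    finally:
        s.close()

print(can_bind("127.0.0.1"))    # True: loopback always exists
print(can_bind("192.0.2.123"))  # False on most hosts: not a local address
```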
10-04-2021
11:12 AM
@Phanikondeti When NiFi starts, the bootstrap process launches a child process that may take a little time to fully start, depending on the size of the flowfile_repository and the size of the flow.xml.gz being loaded. You will want to search the nifi-app.log for the following lines: 2021-10-04 18:05:57,212 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2021-10-04 18:05:57,212 INFO [main] org.apache.nifi.web.server.JettyServer https://<nifi-hostname or IP>:<nifi port>/nifi
2021-10-04 18:05:57,216 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
Until you see these lines, NiFi is still coming up and the UI will not yet be reachable. If you do see these lines, you will want to make sure that the host where you have launched your browser can reach the hostname/IP logged in the above message. You should also check which network interface your NiFi bound to on startup if you have multiple interfaces available. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
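To illustrate that log check, here is a minimal Python sketch; the hostname, port, and log lines are hypothetical examples patterned on the messages above, not output from any real instance:

```python
import re

# Hypothetical nifi-app.log excerpt; hostname/port are invented examples.
log = [
    "2021-10-04 18:05:57,212 INFO [main] org.apache.nifi.web.server.JettyServer "
    "NiFi has started. The UI is available at the following URLs:",
    "2021-10-04 18:05:57,212 INFO [main] org.apache.nifi.web.server.JettyServer "
    "https://nifi01.example.com:8443/nifi",
]

def ui_url_if_started(lines):
    """Return the logged UI URL once the startup marker appears, else None."""
    if not any("NiFi has started" in ln for ln in lines):
        return None  # NiFi is still coming up; the UI is not reachable yet
    for ln in lines:
        m = re.search(r"(https?://\S+/nifi)", ln)
        if m:
            return m.group(1)
    return None

print(ui_url_if_started(log))  # https://nifi01.example.com:8443/nifi
```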
10-04-2021
10:04 AM
@JelenaS Making a bucket public only controls whether an unauthenticated (anonymous) user can import flows from that bucket onto a NiFi instance. So ONLY make it public if you want anonymous users to be able to use the version-controlled flows in that bucket. Users must still be authenticated and authorized in order to commit new flows to a public bucket. As far as the global policies you set up for your "CN=<domainname>.net, OU=NiFi", that looks correct (you don't need "write" on buckets), but only if that string exactly matches what comes from the certificates used on your secured NiFi instance(s) after any identity mapping configured on the NiFi-Registry server has been applied. So check your nifi-registry.properties file for any configured Identity Mapping Properties: https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#identity-mapping-properties For example: nifi.registry.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.registry.security.identity.mapping.value.dn=$1
nifi.registry.security.identity.mapping.transform.dn=NONE With the above and "CN=<domainname>.net, OU=NiFi", the string checked for authorization in NiFi-Registry would be only "<domainname>.net", and that is therefore the string that would need to be authorized instead of the full DN. When you are authenticated into your NiFi instance as your nifi_admin user, what exact string is displayed in the upper right corner of the NiFi UI? Is it "nifi_admin" or "CN=nifi_admin, OU=NiFi"? Whatever is displayed there is the exact user string that gets proxied to NiFi-Registry. Also keep in mind that user/client strings are case sensitive in both NiFi and NiFi-Registry. Mapping transforms can be used to convert strings to all uppercase (UPPER) or all lowercase (LOWER). Hope this helps, Matt
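To see what that example mapping does, here is a quick Python check using the same pattern; the DN below is a hypothetical stand-in for "CN=<domainname>.net, OU=NiFi":

```python
import re

# The mapping.pattern.dn regex from the example properties above.
pattern = r"^CN=(.*?), OU=(.*?)$"
dn = "CN=example.net, OU=NiFi"  # hypothetical client DN for illustration

m = re.match(pattern, dn)
# mapping.value.dn=$1 keeps only the first capture group; if the pattern
# does not match, NiFi-Registry uses the full DN unchanged.
mapped_identity = m.group(1) if m else dn
print(mapped_identity)  # example.net
```

So with this mapping in place, "example.net" (not the full DN) is the string that must appear in NiFi-Registry's authorization policies.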
10-04-2021
09:48 AM
@Theoo Nice job on your path to solving the authorization issues, but you left out a few pieces. When NiFi cluster nodes or a standalone NiFi instance communicate with a secured NiFi-Registry, that communication MUST be authenticated and authorized on the NiFi-Registry side. The connection between NiFi and NiFi-Registry only supports authentication via a mutual TLS handshake (the client is identified by the certificate NiFi presents to NiFi-Registry). Both NiFi and NiFi-Registry have identity mapping properties that can be added to the nifi.properties/nifi-registry.properties files and are used to manipulate the DN that comes from the client certificate. For example, a NiFi host certificate with a DN of "CN=nifi-node01, OU=NIFI" could be manipulated so the client string is only "nifi-node01". To both NiFi and NiFi-Registry, users and NiFi nodes/instances are just clients; there is no distinction between the two. What matters is what each client is uniquely authorized to do within each service. Whatever the client string happens to be, the NiFi nodes/instance must be authorized for the following global policies in NiFi-Registry:
- "Can proxy user requests" (/proxy) with "Read, Write, and Delete" - This allows the NiFi nodes/instance to proxy requests made by the user authenticated in NiFi to perform some authorized action against NiFi-Registry (start version control, commit a new version of a version-controlled Process Group (PG), etc.), since the NiFi user does not actually authenticate into NiFi-Registry from NiFi. It does mean that the NiFi user string must exist as a user in NiFi-Registry and be authorized for the action they are trying to perform.
- "Can Manage Buckets" (/buckets) with "Read" - This policy is needed by the NiFi nodes/instance so that the NiFi background thread that occasionally communicates with NiFi-Registry can check whether a newer version of a version-controlled PG is available, and so NiFi can display a list of available buckets. This request is not made on behalf of the user authenticated into NiFi.
----
When it comes to the NiFi user, the policies needed in NiFi-Registry vary based on what you want that user to be able to do through NiFi or directly via the NiFi-Registry UI. In order for a user who is currently authenticated and authorized into NiFi to interact with NiFi-Registry, that user string would need to be authorized in NiFi-Registry as follows:
- A NiFi-Registry admin user would need to create a bucket and authorize the NiFi user on that bucket so it can be used by that user.
- "Read" on the bucket allows the user to import an existing version-controlled flow from NiFi-Registry into the NiFi UI.
- "Write" on the bucket allows the user to start version control or change the version of a versioned PG in NiFi.
- "Delete" on the bucket allows a user who can authenticate into NiFi-Registry to delete flows within that bucket.
--------
As far as authentication of users into NiFi and/or NiFi-Registry, you can create certificates for each of your users, but the most commonly used method is LDAP/AD based authentication. You can add users in NiFi-Registry's authorizer so that those user strings can be associated with authorization policies without those users ever authenticating directly into the NiFi-Registry UI. They simply need to exist for the proxied requests that come from NiFi on that user's behalf. Hope this exposes all that is needed in this thread. Thanks, Matt
09-30-2021
01:39 PM
1 Kudo
@VagnerBelfort From your example, it appears you are looking to modify only the first line of your input file, and your modification seems pretty simple. In that case, one possible solution is simply to use the ReplaceText processor to modify that first line to match your new structure. Here is a ReplaceText processor configuration I used to accomplish your desired output: Search Value (Java regular expression): ^"(.*?)":"(.*?)":(.*?)\{ The above contains 3 capture groups to capture the unique parts of your input we want to reuse. Replacement Value: {
"hash":"$1:$2:$3", Make note of the added line return. All characters are literals except $1, $2, and $3, which get replaced with the string from each of the three capture groups of the Java regex. Replacement Strategy: Regex Replace Evaluation Mode: Line-by-Line Line-by-Line Evaluation Mode: First-Line If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
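The same search and replacement can be sanity-checked outside NiFi with Python's re module (Java's $1..$3 back-references become \1..\3 in Python replacement syntax); the input line below is hypothetical, invented only to match the structure described:

```python
import re

# Search Value from the ReplaceText configuration above.
search = r'^"(.*?)":"(.*?)":(.*?)\{'
# Replacement Value, with the deliberate line return after "{".
replacement = '{\n"hash":"\\1:\\2:\\3",'

# Hypothetical first line shaped like the structure being rewritten.
first_line = '"abc123":"sha256":2021{'
print(re.sub(search, replacement, first_line))
# {
# "hash":"abc123:sha256:2021",
```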
09-30-2021
01:17 PM
@Ven5 The individual components are responsible for handling any kind of timeout related to the execution of the component's code. NiFi connections hold the FlowFiles that the downstream processor component will consume. Typically the FlowFile being executed upon by the downstream component is not removed from the inbound connection until execution by that downstream component is complete and has resulted in the creation of its outbound FlowFile (which may be the same as the ingested FlowFile, depending on what that processor does). This is done to protect against data loss in the event the NiFi service crashes or shuts down while a component is still executing. On NiFi service startup, the FlowFile would get loaded back into the same upstream connection, allowing the downstream processor to start over executing on the same FlowFile(s). From your description, it sounds like these custom (not part of the Apache NiFi or Cloudera distributions) components are becoming hung while executing against a FlowFile from the upstream connection? When your flow is in this hung state, does the MarkLogic processor show a small number in the upper right corner of the processor indicating it has an active thread? Or are you saying that the MarkLogic processor is not actively executing a thread and is not getting a thread to work on the next queued FlowFile? Did you execute "nifi.sh dump <dump-filename>" to verify there are no MarkLogic processor class threads executing? Is it possible that your entire "Max Timer Driven Thread" pool is at times consumed by other components on your canvas, preventing this processor from getting a thread to run? If this turns out not to be a thread starvation issue, you may need to reach out to the author of these custom components for suggestions. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-30-2021
06:10 AM
2 Kudos
@DSan You run into some unique challenges with your specific use case, since the properties where you want to enable NiFi Expression Language (NEL) expect Java regular expressions. The reserved characters that would tell NiFi a NEL statement is being used are special characters in Java regular expressions. You may want to raise your change/improvement request as a Jira in the Apache NiFi Jira project: https://issues.apache.org/jira/browse/NIFI There may be others in the community who like this idea and have suggestions for working around the challenge I shared above. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
09-30-2021
05:48 AM
@TRSS_Cloudera Your use case is not completely clear to me. Each node in a NiFi cluster executes its own copy of the dataflow against its own set of FlowFiles (FlowFiles are what the NiFi components execute upon). NiFi components can be processors, controller services, reporting tasks, input/output ports, RPGs, etc. Each node maintains its own set of repositories. Two of those repositories (flowfile_repository and content_repository) hold the parts that make up a FlowFile. In a NiFi cluster, one node will always be elected as the Cluster Coordinator and one as the Primary Node (sometimes the same node is elected for both roles), and which node holds either role can change at any time. Your GenerateFlowFile processor configured to execute on "Primary Node" only will produce FlowFile(s) only on the currently elected primary node. From your description, you did not cover how your dataflow writes the files to the server on which you will then run an ExecuteStreamCommand Python script. What is the best way to handle producing files that can be accessed by all nodes? Answer: Since each node operates on its own FlowFiles, one node will not have access to FlowFiles on the other nodes. A clearer use case as to why you would want every node processing the same FlowFile might be helpful here. Is there a way to specify the node a process will run on? (using "run on primary" is not working as the primary node cycles over the process) Answer: Only processors that are responsible for creating the FlowFile should ever be scheduled to execute on the "Primary Node". Any processor that accepts an inbound connection should always be executing on all nodes. So if Node A is the current Primary Node and a FlowFile is produced by a processor configured for primary node only, the FlowFile would still be processed downstream in the dataflow even if a primary node change happens.
If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt