Member since
07-30-2019
2440
Posts
1284
Kudos Received
689
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
221 | 01-17-2023 12:55 PM | |
83 | 01-12-2023 01:30 PM | |
124 | 01-12-2023 12:52 PM | |
109 | 12-20-2022 12:06 PM | |
334 | 12-16-2022 08:53 AM |
01-23-2023
12:25 PM
@steven-matison That 3 part series on ExecuteScript was written by a very talented different Matt in this community @mburgess.
... View more
01-23-2023
12:21 PM
@myuintelli2021 Logback is not something written by or for Apache NiFi. NiFi's default logback has changed very little for nifi-app.log, nifi-user.log, and nifi-bootstrap.log. All of which by default use the "RollingFileAppender". In the latest releases you may see that some additional appenders are now present; however, it is not uncommon for users to create additional appenders themselves in the logback.xml so that specific loggers can output to user defined log files rather than the defaults. Logback supports numerous rolling policies: https://logback.qos.ch/manual/appenders.html These have existed along as NiFi has been using Logback. Not sure if maybe the issue is specific to the version of Windows or JDK you are using? If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-17-2023
02:08 PM
@myuintelli2021 I just installed Apache NiFi 1.19.1 on my MAC and only modified the <maxFileSize> from default 100MB to 2 MB. Log rotation is working as expected. My current logback configuration for nifi-app.log: <configuration scan="true" scanPeriod="30 seconds">
<shutdownHook class="ch.qos.logback.core.hook.DelayingShutdownHook" />
<contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator">
<resetJUL>true</resetJUL>
</contextListener>
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!--
For daily rollover, use 'app_%d.log'.
For hourly rollover, use 'app_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
<maxFileSize>2MB</maxFileSize>
<!-- keep 30 log files worth of history -->
<maxHistory>30</maxHistory>
</rollingPolicy>
<immediateFlush>true</immediateFlush>
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
</encoder>
</appender> Log directory showing logs rotated based on size (adds .<num> before .log suffix) reaching max: -rw-r--r-- 1 nifiadmin nifi 452B Jan 17 16:39 nifi-user.log
-rw-r--r-- 1 nifiadmin nifi 2.0M Jan 17 16:53 nifi-app_2023-01-17_16.0.log
-rw-r--r-- 1 nifiadmin nifi 2.0M Jan 17 16:58 nifi-app_2023-01-17_16.1.log
drwxr-xr-x 13 nifiadmin nifi 416B Jan 17 16:58 .
-rw-r--r-- 1 nifiadmin nifi 96K Jan 17 16:58 nifi-request.log
-rw-r--r-- 1 nifiadmin nifi 178K Jan 17 16:58 nifi-app.log I don't have access to a Windows environment to test there as it appears that is where you are testing, but it also works as expected in a linux environment I also have. I recommend opening a community question if you are looking for assistance with your setup. A community article comments is not he correct place to troubleshoot configurations or environmental issues. I encourage you to include details when you raise that question to include at least your complete OS version, complete java (JDK) version being used, and complete NiFi version. Thank you, Matt
... View more
01-17-2023
01:23 PM
@srilakshmi Logging does not happen at the process group level. Processors logging is based on the processor class. So there is nothing in the log output produced by a processor within a process group that is going to tell you in which process group that particular processor belongs. That being said, you may be able to prefix every processor's name within the same Process group with some string that identifies the process group. This processor name would generally be included in the the log output produced by the processor. Then you may be able to use logback filters (have not tried this myself) to filter log output based on these unique strings. https://logback.qos.ch/manual/filters.html NiFi bulletins (bulletins are log output to the NiFi UI and have a rolling 5 minute life in the UI) however do include details about the parent Process Group in which the component generating the bulletin resides. You could build a dataflow in yoru NiFi to handle bulletin notification through the use of the SiteToSiteBulletinReportingTask which is used to send bulletin to a destination remote import port on a target NiFi. A dataflow on the target NiFi could be built to parse the received bulletin records by the bulletinGroupName json path property so that all records from same PG are kept together. These 'like' records could then be written out to local filesystem based on the PG name, remote system, used to send email notifications, etc... Example of what a Bulletin sent using the SiteToSiteBulletinReportingTask looks like: {
"objectId" : "541dbd22-aa4b-4a1a-ad58-5d9a0b730e42",
"platform" : "nifi",
"bulletinId" : 2200,
"bulletinCategory" : "Log Message",
"bulletinGroupId" : "7e7ad459-0185-1000-ffff-ffff9e0b1503",
"bulletinGroupName" : "PG2-Bulletin",
"bulletinGroupPath" : "NiFi Flow / Matt's PG / PG2-Bulletin",
"bulletinLevel" : "DEBUG",
"bulletinMessage" : "UpdateAttribute[id=8c5b3806-9c3a-155b-ba15-260075ce9a6f] Updated attributes for StandardFlowFileRecord[uuid=1b0cb23a-75d8-4493-ba82-c6ea5c7d1ce3,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1672661850924-5, container=default, section=5], offset=969194, length=1024],offset=0,name=bulletin-${nextInt()).txt,size=1024]; transferring to 'success'",
"bulletinNodeId" : "e75bf99f-095c-4672-be53-bb5510b3eb5c",
"bulletinSourceId" : "8c5b3806-9c3a-155b-ba15-260075ce9a6f",
"bulletinSourceName" : "PG1-UpdateAttribute",
"bulletinSourceType" : "PROCESSOR",
"bulletinTimestamp" : "2023-01-04T20:38:27.776Z"
} In the above produced bulletin json you see the BulletinGroupName and the BulletinMessage (the actual log output). If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-17-2023
12:55 PM
@hkh The only downside to this dynamic approach is that the passwords are in plaintext as attributes on the FlowFile. This means that these passwords could be read by users who are authorized to access your NiFi through listing of FlowFiles on a queued connection or through running a provenance query on a processor in that flow and inspecting the returned results. I have no alternative solution to offer, but wanted you to be aware of downside of adding sensitive values to FlowFile attributes. Thanks, Matt
... View more
01-17-2023
05:34 AM
@davehkd No, do not set root node to a path. The root node should be set to same value and "/nifi" is fine. This is the root node that all NiFi instance will use. Lets say you were using the recommended external Zookeeper rather then the internal Zookeeper. You may choose to use that one Zookeeper cluster to support multiple independent NiFi clusters. To prevent ZK from thinking all nodes are part of same cluster, each NiFi cluster would use a different root node value. So for second NiFi cluster you might use "/nifi2". This value has nothing to do with an install path. Thanks, Matt
... View more
01-12-2023
01:30 PM
@SachinMehndirat There is NO replication of data from the four NiFi repositories across all NiFi nodes in a NiFi cluster. Each NiFi node in the cluster is only aware of and only excutes against the FlowFile on that specific node. As such, NiFi nodes can not share a common set of repositories. Each node must have their own repositories and it is important to protect those repositories from data loss (flowfile_repository and content_repository being most important). - flowfile_repository - contain metadata/attributes about FlowFiles actively processing thorugh your NiFi dataflow(s). This includes metadata on location of content of queued FlowFiles. - content_repository - contains content claims that can hold the content for 1 too many FlowFiles actively being processed or temporarily archived post termination at end of dataflow(s) - provenance_repository - contains historical lineage information about FlowFile currently or previously processed through your NiFi dataflows. - database_repository - contains flow configuration history which is a record of changes made via NiFi UI (adding, modifying, deleting, stopping, starting, etc...). Also contain info about users currently authenticated in to the NiFi node. Processors that record cluster wide state would use zookeeper to store and retrieve that stored state needed by all nodes. Processors that use local state will write that state to NiFi locally configured state directory. So in addition to protect the repositories mentioned above from dataloss, you'll also want to make sure local state (unique to each node in the NiFi cluster) directory is also protected. The embedded documentation in NiFi for each component has a section "State management:" that will tell you if that component use local and/or cluster state. You may find some of the info found in the following articles useful: https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999 https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418 https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-12-2023
01:08 PM
1 Kudo
@AndreyN Each processor by default uses Timer Driven Scheduling strategy and a run Schedule of 0 secs. This means that each processor is constantly requesting threads from the Max Timer Driven Thread pool and checking for work (work being any FlowFiles on inbound connections or in case of an ingest processor, connecting to that ingest point whether local dir or remote service to check for data). While generally these checks for work take micro seconds or longer depending on processor, NiFi does have a global setting for yielding the processors when the previous run resulted in no FlowFiles processed/produced. To prevent excessive latency this back duration by default is very short (10 ms). To adjust this setting, you can change the following property in the nifi.properties file: nifi.bored.yield.duration found in Core Properties section of admin guide. Keep in mind that this setting impacts all processors. So higher you set the more latency that could exist before new work is found after a run that resulted in no work. You can also selectively adjust the run schedule on select processors. 0 sec run schedule means to run as fast as possible. So as soon as one thread completes, request another (unless thread results in no work wait bored duration before requesting next thread). So if you have flows that are always very light and latency is not a concern, you could set those processors to only get scheduled to execute every 1 sec or longer. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-12-2023
12:52 PM
1 Kudo
@Absolute_Z You can use the NiFi rest-api to set the parameter context on a NiFi Process Group (PG) back to "No Parameter Context". Example: curl 'http://<nifi hostname>:<nifi port>/nifi-api/process-groups/<PG UUID>' -X 'PUT' -H 'Content-Type: application/json' --data-raw '{"revision":{"clientId":"a723bd7b-0185-1000-efb1-be534ab7a455","version":1},"disconnectedNodeAcknowledged":false,"component":{"id":"<PG UUID>","name":"<PG name>","comments":"","parameterContext":{"id":null},"flowfileConcurrency":"UNBOUNDED","flowfileOutboundPolicy":"STREAM_WHEN_AVAILABLE","defaultFlowFileExpiration":"0 sec","defaultBackPressureObjectThreshold":"10000","defaultBackPressureDataSizeThreshold":"1 GB"}}' You'll notice in the data passed I have "parameterContext":{"id":null} which will clear all set parameter contexts on the specified Process Group. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-09-2023
06:08 AM
@davehkd You are definitely having issue with your embedded zookeeper. The shared exception indicates NIFi is not able to communicate with it. The KeeperLoss exception thrown by the ZK client in NiFi most commonly is seen when the ZK does not have quorum because not enough nodes are part of ZK cluster or not all nodes are able to talk to one another. A New NiFi with no flows to load should start very quickly (never 12+ hours). Yours is unable to connect to the zookeeper that is included with and started as part of the NiFi process startup. 1. I'd start with making sure all of your nodes can resolve the zookeeper hostnames (nificlient1, nificlient2, and nificlient3) to proper reachable IP addresses. If not make sure you add these to your local hosts file on each server so this is possible. 2. I'd check to see if zookeeper server is running on each host and listening on the configured ports. Make sure no other processes are using those ports thus blocking zookeeper from being able to bind to them. 3. Make sure you don't have any firewalls blocking connections to the zk ports. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-05-2023
12:09 PM
@RodolfoE Try using the absolute path to the certificate pem file rather than just the filename. If you are executing your curl command from within the directory where your pem file resides, try using "./<pem filename>". Otherwise curl may try looking up yoru pem filename as if it were an alias/nickname it expects find in some NSS database. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-05-2023
11:51 AM
@SachinMehndirat The exception comes from java and not NiFi's core code base. The exception is possibly being thrown as a result of a failed mutual TLS handshake. Without the complete stack trace and preceding log output it is difficult to give a detailed response here, so below is based on assumptions from what little has been shared. However, when you see "chain" issues (keeping in mind that output shared is incomplete), that points at missing TrustedCertEntry(s) in the truststore used one side or the other of that TLS handshake. Both the client and server sides of a TLS handshake present their clientAuth or serverAuth certificate info. These PrivateKeyEntry certificates will have an owner and issuer (signing Certificate Authority (CA) for the owner). In order for the receiving sides to trust the certificate presented from the other side, it must trust that CA. This means that within its truststore, it must have a TrustedCertEntry for the CA that signed the PrivateKey presented in the TLS handshake. BUT... it does not end there. The CA may be an intermediate CA meaning that the certificate for that CA was signed by yet another CA. In that case the truststore would also need the TrustedCertEntry for the next CA in that chain of authority. A complete trust "chain" would consist of all CAs from signer of PrivateKey to the root CA (root CA public key will have same Distinguished Name (DN) for both owner and signer of the TrustedCertEntry). So on one side or the other you do not have the complete trustchain in the truststore. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-05-2023
11:33 AM
@davehkd Those log lines look like expected output in the nifi-app.log during startup process. Niether is an ERROR. - Are you using the embedded zookeeper or an external ZK (strongly recommended)? The first indicates that ZK has not elected a cluster coordinator yet. This can happen if ZK does not finished coming up yet or does not yet have quorum. ZK required an odd number of hosts (3, 5, etc) to achieve quorum and without quorum will not function. 3 is the recommended number of ZK hosts to support NiFi. Once ZK is up and established quorum, I'd expect that WARn log message to go away. The second info message simply means that this node was unaware of an elected cluster coordinator and has requested to be elected to that role. ZK responded that it had already elected some other node as the cluster coordinator. This node should receive the elected cluster coordinator from ZK and you should then start seeing in the logs your nodes sending heartbeat messages to the elected cluster coordinator (even the elected cluster coordinator when send a heartbeat to itself.). Only the cluster coordinator will log receiving and processing x number of received heartbeats. My guess here is that you may not have given it enough time to full launch. When you start NiFi via "../bin/nifi.sh start", it executes the bootstrap process, the bootstrap process then kicks off the main child process for NiFi. That process you'll see through the nifi-app.log output as it progresses. NiFi is fully up once you see the log line that states NiFi Ui is available at the following URLs. Now that the NiFi node is fully up it attempts to communicate with ZK and establish itself as part of a cluster. Especially with embedded ZK in use, this can be delayed until all nodes are up so that ZK has quorum. So first node to come up may log more lines like above then last node to finish startup. NiFi handles election based on configuration of these two properties in the nifi.properties file: nifi.cluster.flow.election.max.wait.time (default is 5 mins) nifi.cluster.flow.election.max.candidates (No default, but should be set to number of NiFi instances in cluster) So basically, NiFi nodes will wait up to 5 minutes or until the configured number of candidates have connected with ZK before flow election happens and NiFi finishes coming up. Accessing the UI before this happens would result in flow election still in progress. Make sure that the "../conf/<nifi config files>" are all configured same across all nodes with exception of node specific properties like hostnames, keystores, truststores, etc. Hope that after some additional time, your NiFi cluster did finally come up for you. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-04-2023
01:23 PM
1 Kudo
@sarithe NiFi component processors are part of pluggable nar in NiFi. They are separate from the core NiFi code. Processors are designed to log their output base on their component processor class. It then becomes the responsibility of the logback to route those log messages to the appropriate appender. There is nothing in the log output produced by a component processor that will inherently identify which parent process group it resides within. But if you were to use a consistent processor naming structure in each of your Process Groups (PG), you may be able to setup some creative filtering in logback based on that naming structure. Bulletins however do include details about the parent Process Group in which the component generating the bulletin resides. You could build a dataflow in yoru NiFi to handle bulletin notification through the use of the SiteToSiteBulletinReportingTask which is used to send bulletin to a destination remote import port on a target NiFi. A dataflow on the target NiFi could be built to parse the received bulletin records by the bulletinGroupName json path property so that all records from same PG are kept together. These 'like' records could then be written out to local filesystem, remote system, used to send email notifications, etc... Example of what a Bulletin sent using the SiteToSiteBulletinReportingTask looks like: {
"objectId" : "541dbd22-aa4b-4a1a-ad58-5d9a0b730e42",
"platform" : "nifi",
"bulletinId" : 2200,
"bulletinCategory" : "Log Message",
"bulletinGroupId" : "7e7ad459-0185-1000-ffff-ffff9e0b1503",
"bulletinGroupName" : "PG2-Bulletin",
"bulletinGroupPath" : "NiFi Flow / Matt's PG / PG2-Bulletin",
"bulletinLevel" : "DEBUG",
"bulletinMessage" : "UpdateAttribute[id=8c5b3806-9c3a-155b-ba15-260075ce9a6f] Updated attributes for StandardFlowFileRecord[uuid=1b0cb23a-75d8-4493-ba82-c6ea5c7d1ce3,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1672661850924-5, container=default, section=5], offset=969194, length=1024],offset=0,name=bulletin-${nextInt()).txt,size=1024]; transferring to 'success'",
"bulletinNodeId" : "e75bf99f-095c-4672-be53-bb5510b3eb5c",
"bulletinSourceId" : "8c5b3806-9c3a-155b-ba15-260075ce9a6f",
"bulletinSourceName" : "PG1-UpdateAttribute",
"bulletinSourceType" : "PROCESSOR",
"bulletinTimestamp" : "2023-01-04T20:38:27.776Z"
} If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-03-2023
07:46 AM
@doubtguy Is there a consistency in the timing of when it stops tailing the file (after 15 minutes of starting processor for example)? Interesting that it starts tailing again when you cd to the mounted directory. This seems characteristic of a SMB/CIFS mount that has gone idle. If you set up a process to touch that file or cd to that mounted directory every 2 mins or so, does the issue go away? If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
01-03-2023
07:13 AM
@davehkd The exception you have shared points at the following property being set to false in the nifi.properties file: nifi.cluster.protocol.is.secure=false NiFi nodes communicate with one another over HTTP when this is set to false. When set to true NiFi nodes with communicate with one another over HTTPS. Since you have this set to false, it is complaining that you do not have a your NiFi configured to with an HTTP port in the following property in the nifi.properties file: nifi.web.http.port Out of the box, Apache NiFi is configured to start securely over https as a standalone NiFi instance using the Single-User authentication and single-user-authorizer providers: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#single_user_identity_provider The intent of this provider is to provide a means for easily and quickly starting up a secure NiFi to become familiar with or evaluate NiFi. It gives that single user full access to everything and provides no mechanism for setting up and authorizing any additional users. When switching to a NiFi cluster, You'll need to setup proper authentication and authorization providers that support secure NiFi clusters. In a secured NiFi cluster setup, the NiFi nodes will need to authenticate via their certificates over a mutual TLS handshake (unless set to be unsecure as you have setup which I strongly do not recommend). This in turn means that the NiFi cluster nodes will need to have authorizations setup for proxy, data access, and controller access which the single-user-authorizer does not support. Additionally the single user identity-provider by default on NiFi startup creates a random user name and password which is going to be unique per node. This will not work in a cluster setup since actions performed on node 1 will be replicated to nodes 2 - x nodes as the authenticated user of node 1. However, nodes 2 - x will not know anything about that user and thus fail authorization. The single user authentication provider provides a mechanism for you to set a specific username and password which you could make the same on all instance of NiFi. ./bin/nifi.sh set-single-user-credentials <username> <password> My suggestion to you is to first setup a standalone NiFi securely using yoru own configuration for user authentication and user authorization: For user authentication, follow this section of the admin guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication The most commonly used method of user authentication used is the ldap-provider: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#ldap_login_identity_provider For NiFi authorizations, follow this section of the NiFi admin guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization The most basic managed setup utilizes all of the following authorization providers in below specific order in the authorizers.xml file: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileusergroupprovider https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#fileaccesspolicyprovider https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#standardmanagedauthorizer These are actual in template format in the default authorizers.xml included with NiFi. They are likely commented out. Once you have a a secured standalone NiFi instance working, then I would move on to setting up your NiFi cluster. You'll need to add your NiFi cluster nodes to the authorizers file-user-group-provider and file-access-policy-provider as part of that process which would require you to remove the users.xml and authorizations.xml files generated by those providers so they get recreated to support your initial cluster needed authorizations. These files are only generated by those providers if the do NOT already exist. Config changes in the providers will not trigger new or modified files. I know there is a lot to take in here, but this will set you up in the best possible way for success. If you found that this response helped with your query, please take a moment to login and select "Accept as Solution" below each response the helped you. Matt
... View more
01-03-2023
06:03 AM
@Mostafa12345678 I'd recommend providing more detail in your query here to include things like the version of NiFi being used, the NiFi processor component being used and its configuration, and the full stack trace error from the nifi.app.log. Thanks, Matt
... View more
12-21-2022
12:25 PM
@samrathal 1. What is the purpose of the SplitJson in your dataflow? 2. If you have 1 FlowFile with 1000 records in it, why use SplitJson to split that in to 1000 FlowFiles having 1 record each? Why not just merge the larger FlowFiles with multiple records in it? Or am i missing part of the use case here? --- Can you share a template of flow definition of yoru dataflow? 1. It is not clear to me how you get " X-Total-Count" and how you are adding this FlowFile attribute to every FlowFile. 2. You have configured the "Release Signal Identifier" with a boolean NiFi Expression Language (NEL) that using your example will return "false" until "fragment.count" FlowFile attribute value equals the FlowFile attribute " X-Total-Count" value. 2a. I assume you are writing "X-Total-Count" to every FlowFile coming out of the SplitJson? How are incrementing the "fragment.count" across all FlowFile in the complete 5600 record batch. Each FlowFile that splits into 1000 FlowFiles via splitJson will have fragment.count set to 1 - 1000. So fragment.count would never reach 5600 unless you are handling this count somewhere else in your dataflow. 2b. If a FlowFile where value from "fragment.count" actually equals value from " X-Total-Count" attribute, your "Release Signal Identifier" will resolve to "true". The ""Release Signal Identifier" value (true or false) in your configuration is looked up in the configured "distributed map cache server. So where in your dataflow to you write the release signal to the distributed map cache? (usually handled by a notify processor) I am in no way implying that what you are trying to accomplish can't be done. However, coming up with an end-to-end workable solution requires knowing all the steps in the use case along the way. I would recommend going through the example Wait/Notify linked in my original response to get a better understanding of how wait and notify processors work together. Then maybe you can makes some changes to your existing dataflow implementation. With more use case details (detailed process steps) I could suggest further changes if needed. I really hope this helps you get some traction on your use case here. If you have a contract with Cloudera, you can reach out to your account owner who could help arrange for professional services that can work with your to solution your use cases in to workable NiFi dataflows. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-21-2022
06:33 AM
@samrathal The "Wait" processor works in conjunction with the "Notify" processor in NiFi. See below example use case: https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/ And simply waiting until you have received all 1000 record record batches will not ensure a downstream MergeContent or MergeRecord processor will merge them all together. 1. Is this a one time execution flow? 2. if not, how do you differentiate between different complete batches (when does new one merge bundle end and another begin?)? 3. Are all 1000 records from each rest-api call going into a single NiFi FlowFile or 1 FlowFile per record? 4. Is there some correlation identifier as a rest of rest-api call that identifies all 1000 Record batch pulls as part of same complete bundle? The details of yoru use case would make it easier for the community to provide suggestions. Assuming You have some Correlation Attribute and you know that max number of records would never exceed some upper limit, you may be able to simply use a well configured MergeRecord processor using min records set higher then you would ever expect, a correlation attribute, and a max bin age (forced bin to merge after x amount of time even if min has not been satisfied) to accomplish the merging of all your records. But keep in mind the answers to questions asked play a role in whether this is possible or needs some additional consideration put in place. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-21-2022
06:18 AM
@anton123 I am still not completely clear on your use, but correct me if below is not accurate: 1. You fetch a single large file. 2. That file is unpacked in to many smaller files. 3. Each of these smaller files are converted in to SQL and inserted via the putSQL processor. 4. You then have unrelated downstream processing you don't want to start until all files produced by the unpackContent processor have been successfully processed by the putSQL processor. Correct? If so, the following exampe use case for the NiFi Wait and Notify processor is probably what you are looking to implement for this use case: https://pierrevillard.com/2018/06/27/nifi-workflow-monitoring-wait-notify-pattern-with-split-and-merge/ If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-21-2022
06:00 AM
@zIfo The TLS exception "unable to find valid certification path to requested target" is telling you that there is a lack of trust in the handshake. This means that the complete trustchain needed to establish trust is missing from the truststore. This is not an issue with the NiFi InvokeHTTP processor. From command line you could try using openssl to get the public certificates for the trusts chain from the target URL. (note that not all endpoints will return complete trust chain. openssl s_client -connect <FQDN>:<port> -showcerts The server hello in response to this command will have one too many public certs. each cert will have format of below example: -----BEGIN CERTIFICATE-----
MIIFYjCCBEqgAwIBAgIQd70NbNs2+RrqIQ/E8FjTDTANBgkqhkiG9w0BAQsFADBX
MQswCQYDVQQGEwJCRTEZMBcGA1UEChMQR2xvYmFsU2lnbiBudi1zYTEQMA4GA1UE
CxMHUm9vdCBDQTEbMBkGA1UEAxMSR2xvYmFsU2lnbiBSb290IENBMB4XDTIwMDYx
OTAwMDA0MloXDTI4MDEyODAwMDA0MlowRzELMAkGA1UEBhMCVVMxIjAgBgNVBAoT
GUdvb2dsZSBUcnVzdCBTZXJ2aWNlcyBMTEMxFDASBgNVBAMTC0dUUyBSb290IFIx
MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAthECix7joXebO9y/lD63
ladAPKH9gvl9MgaCcfb2jH/76Nu8ai6Xl6OMS/kr9rH5zoQdsfnFl97vufKj6bwS
iV6nqlKr+CMny6SxnGPb15l+8Ape62im9MZaRw1NEDPjTrETo8gYbEvs/AmQ351k
KSUjB6G00j0uYODP0gmHu81I8E3CwnqIiru6z1kZ1q+PsAewnjHxgsHA3y6mbWwZ
DrXYfiYaRQM9sHmklCitD38m5agI/pboPGiUU+6DOogrFZYJsuB6jC511pzrp1Zk
j5ZPaK49l8KEj8C8QMALXL32h7M1bKwYUH+E4EzNktMg6TO8UpmvMrUpsyUqtEj5
cuHKZPfmghCN6J3Cioj6OGaK/GP5Afl4/Xtcd/p2h/rs37EOeZVXtL0m79YB0esW
CruOC7XFxYpVq9Os6pFLKcwZpDIlTirxZUTQAs6qzkm06p98g7BAe+dDq6dso499
iYH6TKX/1Y7DzkvgtdizjkXPdsDtQCv9Uw+wp9U7DbGKogPeMa3Md+pvez7W35Ei
Eua++tgy/BBjFFFy3l3WFpO9KWgz7zpm7AeKJt8T11dleCfeXkkUAKIAf5qoIbap
sZWwpbkNFhHax2xIPEDgfg1azVY80ZcFuctL7TlLnMQ/0lUTbiSw1nH69MG6zO0b
9f6BQdgAmD06yK56mDcYBZUCAwEAAaOCATgwggE0MA4GA1UdDwEB/wQEAwIBhjAP
BgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBTkrysmcRorSCeFL1JmLO/wiRNxPjAf
BgNVHSMEGDAWgBRge2YaRQ2XyolQL30EzTSo//z9SzBgBggrBgEFBQcBAQRUMFIw
JQYIKwYBBQUHMAGGGWh0dHA6Ly9vY3NwLnBraS5nb29nL2dzcjEwKQYIKwYBBQUH
MAKGHWh0dHA6Ly9wa2kuZ29vZy9nc3IxL2dzcjEuY3J0MDIGA1UdHwQrMCkwJ6Al
oCOGIWh0dHA6Ly9jcmwucGtpLmdvb2cvZ3NyMS9nc3IxLmNybDA7BgNVHSAENDAy
MAgGBmeBDAECATAIBgZngQwBAgIwDQYLKwYBBAHWeQIFAwIwDQYLKwYBBAHWeQIF
AwMwDQYJKoZIhvcNAQELBQADggEBADSkHrEoo9C0dhemMXoh6dFSPsjbdBZBiLg9
NR3t5P+T4Vxfq7vqfM/b5A3Ri1fyJm9bvhdGaJQ3b2t6yMAYN/olUazsaL+yyEn9
WprKASOshIArAoyZl+tJaox118fessmXn1hIVw41oeQa1v1vg4Fv74zPl6/AhSrw
9U5pCZEt4Wi4wStz6dTZ/CLANx8LZh1J7QJVj2fhMtfTJr9w4z30Z209fOU0iOMy
+qduBmpvvYuR7hZL6Dupszfnw0Skfths18dG9ZKb59UhvmaSGZRVbNQpsg3BZlvi
d0lIKO2d1xozclOzgjXPYovJJIultzkMu34qQb9Sz/yilrbCgj8=
-----END CERTIFICATE----- You can copy each (including the begin and end certificate lines) and place it in different <name>.pem files which you can then import each <name>.pem in to your existing truststore. A complete trust chain consists of all the public cert from signer of hosts cert to the self signed root CA public cert. If that signer cert is self-signed (meaning owner and signer have same DN), then it is considered the root CA. If they are not the same, then another public cert exists in the chain. A complete trust chain means you have all the public certs from the one that signed the target FQDN all he way to the root CA (owner and issuer the same DN). If the output of the openssl does not contain all the public certs in the trust chain, you'll need to get the missing public certs from the source. That source could be the company hosting the server or it could be a public certificate authority (Digicert for example). You would need to go to to those sources to obtain the CA certs (often published on their website (example: https://www.digicert.com/kb/digicert-root-certificates.htm). Another option that may work for root CAs and some intermediate CAs is using java's cacerts file bundle with every java release which contains the public certs for many public authorities. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-20-2022
12:06 PM
@AlexLasecki The issue here is unrelated to the copy and paste action taken. There is a bug in the code where the jsonPath cache is not cleared when the property value is changed after it has been initially set. So the same issue happens even if you do not copy and paste a splitJson processor configured with json path property value. All you need to do is change the json path value after after already having a value set. Original json path property value that is cached still gets used. The following bug jira has been created and work is already in progress to address the issue. https://issues.apache.org/jira/browse/NIFI-10998 For now as a workaround, you'll need to create a new SplitJson processor anytime you want to change the json path property value. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-19-2022
11:19 AM
@AlexLasecki I can reproduce in 1.19.1 as well. Let me look more in to this issue. I'll respond again once I determine issue here. Thanks, Matt
... View more
12-19-2022
10:20 AM
@anton123 It may be helpful if you described your use case in detail. Looking at your dataflow above, I am not clear on what you are trying to accomplish. 1. Your updateAttribute processor is changing the filename on every NiFi FlowFile that passes through it to "bu_service_template". Why are you doing this? 2. It makes no sense to me that you are looping the "original" relationship in a connection back on the MergeContent processor. All FlowFiles that go into a merged FlowFile get sent to this relationship. This means this loop would just grow and grow in size. Each Merged FlowFile that gets sent to the "merged" relationship connection would just get larger and larger. 3. Your MergeContent processor configuration is not ideal and in worst case scenario where it actually tries to merge the max configure number of Entries is likely to cause your NiFi to run out of memory since the FlowFile attributes/metadata for every FlowFile allocated to a merge bin is held in heap memory. 4. When trying to merge that many FlowFile, it is important to handle this in a series of mergeContent processors (one after another). Configure first to produce merged FlowFiles of maybe 10,000. Then another that merges yet another 10,000 and finally one last mergeContent that merges 10. The final merged FlowFiles would be 1 billion. 5. I see you are trying to use a correlation attribute in your MergeContent with attribute name "bu_service_template". Where in your dataflow is this attribute getting added to the inbound FlowFiles? 6. Keep in ind that MergeContent will execute as fats as possible and you have min entries set to 1. So it is very possible that at time of execution is sees only one new unbinned FlowFile in inbound connection and adds that to bin. Well now that bin has satisfied the min and thus would be merged with only a single FlowFile in it. So 1 FlowFile goes to merge relationship and 1 FlowFile that made up the merged FlowFile goes to "original" relationship to get merged again. A better configuration would be to set min to 10000 and max to 10000. Then you can also set a "max bin age". The max bin age is used to force a merge even if mins have not been satisfied after x configured amount of time. 3. I am not sure the role you are trying to accomplish with the wait processor after the MergeContent. I hope some of this configuration guidance helps you with yoru use case. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-16-2022
11:07 AM
@PradNiFi1236 NiFi is designed to be data agnostic. So content that NiFi ingested is preserved in binary format wrapped in a NIFi FlowFile. It then becomes the responsibility of an individual processor that needs to operate against the content to understand the content type. The mergeContent processor does not care about the content type. This processor numerous merge strategies: - Binary concatenation simply writes the binary data from one FlowFile to the end of the binary data from another. There is no specific handling based on the content type. So combining two PDF files in this manor is going to leave you with unreadable data which explains why even with the larger content type of the merged FlowFile, you still only see the first PDF in the merged binary. - Tar and zip combines multiple pieces of content in to a single tar file. You can then later untar or unzip this to extract the multiple separate pieces of content it contains. So would preserve both independent PDF files. - FlowFile stream is unique to NiFi and merges multiple NiFi FLowFiles (A FlowFile consist of content and FlowFile metadata. This strategy is only used to preserve that NiFi metadata with the content for future access by another NiFi. - Avro expects the content being merged is already of Avro format. This will properly merge Avro type data in to single new Avro content. So the question here is first, how would you accomplish the merging of two PDF outside of NiFi. Then investigate how to accomplish the same within NiFi, if possible. TAR and ZIP will work to get you one output file; however, if your use case is to produce 1 new PDF form 2 original PDFs, mergeContent is not going to do that. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-16-2022
08:53 AM
@Mosoa Before upgrading Apache NiFi you should read through the migration guide to include all releases between your current release and the release you are upgrading to. https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance The Hive nar was removed as of Apache NiFi 1.17.0, so I am guessing your previous version was at least 1.16 or older. As far as downloading and adding additional nars to NiFi, it is very easy. 1. Go to https://search.maven.org/ in your browser. 2. Search for Apache "NiFi Hive nar" 3. A list of artifacts will be shown, you'll need to click on all those you need, but I would start with "nifi-hive-nar" and "nifi-hive-services-api.nar" by clicking on the version number below each. 4. From the nar specific page you will see a "Downloads" option in the upper right corner of the page: 5. When you click on it three option appear. Select "nar". 6. Place the downloaded nar files in to the "lib" directory of your NiFi 1.19.1 installation. You'll notice that this directory already contains nar for other component classes already included with the base download. 7. Make sure ownership and permissions on these new nar files match other nars in the "lib" directory. 8. Start your NiFi 1.19.1 Now you will see the Hive components available in your NiFi UI: If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-13-2022
07:07 AM
@sathish3389 Routing based on a sensitive value is an unusual use case. I'd love to hear more about this use case. Ultimately the RouteOnAttribute processor expects a boolean NiFi Expression Language Statement. So you want to have a sensitive parameter value that is evaluated against something else (another attribute on the inbound FlowFile) and if true route to a new relationship. Is what you are comparing this sensitive parameter value against also sensitive? If so, how are you protecting it as Attributes on FlowFiles are not sensitive and stored in plaintext. The ability to use Sensitive Parameters in dynamic properties (non password specific component properties) was added via https://issues.apache.org/jira/browse/NIFI-9957 in Apache NiFi 1.17.0. While this change created the foundation for such dynamic Property support for sensitive parameters, individual components need to be updated to utilize this new capability. As you can imagine with well over 300+ components available to NiFi, this is a huge undertaking. So what i see in the apache community are changes based on specific use case requests. I'd recommend creating an Apache NiFi Jira detailing your use case and working with the Apache Community to adopt that use case change to the RouteOnAttribute processor to support dynamic property support for Sensitive parameters. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-13-2022
06:35 AM
@MaarufB You must have a lot of logging enabled that you expect multiple 10MB app.log files per minute. Was NiFi ever rolling files? Check your NiFi app.log for any Out of Memory (OOM) exceptions. Does not matter what class is throwing the OOM(s), once the NiFi process is having memory issues, it impacts everything within that service. If this is the case, you'll need to make changes to your dataflow(s) or increase the NiFi heap memory. Secondly, check to make sure you have sufficient file handles for your NiFi process user. For example; - If your NiFi service is owned by the "nifi" user, make sure the open file limit is set to a very large value for this user (999999). A restart of the NiFi service before the change to file handles will be applied. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-12-2022
06:01 AM
@Onkar_Gagre The Max Timer Driven Thread pool setting is applied to each node individually. NiFi nodes configured as a cluster ate expected to be running on same hardware configuration. The guidance of 2 to 4 times the number of cores as starting point is based on the cores of a single node in the cluster and not based off the cumulative cores across all NiFi cluster nodes. You can only reduce wait time as you reduce load on the CPU. In most cases, threads given out to most NiFi processors execute for only milliseconds. But some processors operating against the content can take several seconds or much longer depending on function of the processor and/or size of the content. When the CPU is saturated these threads will take even longer to complete as the CPU is giving time to each active thread. Knowing the only 8 threads at a time per node can actually execute concurrently, a thread only gets a short duration of time before giving some to another. The pauses in between are the CPU wait time as thread queued up wait for their turns to execute. So reducing the max Timer Driven Thread count (requires restart to reduction to be applied) would reduce maximum threads sent to CPU concurrently which would reduce CPU wait time. Of course the means less concurrency in NiFi. Sometimes you can reduce CPU through different flow designs, which is a much bigger discussion than can be handle efficiently via the community forums. Other times, your dataflow simply needs more CPU to handle the volumes and rates you are looking to achieve. CPU and Disk I/O are the biggest causes of slowed data processing. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
... View more
12-09-2022
02:00 PM
@F_Amini @Green_ Is absolutely correct here. You should be careful when increasing concurrent tasks as just blindly increasing it everywhere can have the opposite effect on throughput. I recommend stopping setting the concurrent tasks back to 1 or maybe 2 on all the processor where you have adjusted away from the default of 1 concurrent task. Then take a look at the processor further downstream in your dataflow where it has a red input connection but black (no backpressure) outbound connections. This processor s @Green_ mentioned is the processor causing all your upstream backlog. Your'll want to monitor your CPU usage as you make small incremental adjustments to this processors concurrent tasks until you see the upstream backlog start to come down. If while monitoring CPU, you see it spike pretty consistently at 100% usage across all your cores, then your dataflow has pretty much reached the max throughput it can handle for yoru specific dataflow design. At this point you need to look at other options like setting up a NiFi cluster where this work load can be spread across multiple servers or designing your datafow differently with different processors to accomplish same use case that may have a lesser impact on CPU (not always a possibility). Thanks, Matt
... View more