Member since: 07-30-2019
Posts: 2908
Kudos Received: 1443
Solutions: 845
10-21-2022
12:56 PM
@rangareddyy What is important to understand is that NiFi component processors are not executed by the user authenticated into NiFi (assuming a secured NiFi), but rather by the NiFi service user. So let's say your NiFi service is owned by a "nifiservice" Linux account. Whatever umask is configured for that user will be applied to directories and files created by that user. Now, if your script is using sudo, it is changing the user that executes your script, resulting in ownership and permissions different from the "nifiservice" user. Subsequent component processors will still execute as the "nifiservice" user and then not have access to those files and directories.

You'll need to take this into account as you build your scripts. Make sure your scripts adjust permissions on the directory tree and files as needed so your "nifiservice" user (or all users) can access the files needed downstream in your dataflows. So in your case it sounds like the script executed by the ExecuteScript processor is creating a sh file that either is not owned by the "nifiservice" user or does not have execute permission set on it. The ExecuteStreamCommand processor will attempt to execute that sh command on disk as the "nifiservice" user only.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
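As a minimal sketch of the idea, here is what a script run from ExecuteScript might do before handing off to ExecuteStreamCommand. The directory, file name, and script content are all hypothetical; the point is setting the umask and explicitly granting read/execute so the "nifiservice" user can use the file downstream:

```python
import os
import stat
import tempfile

# Hypothetical output location; in a real flow this would be the
# directory your downstream ExecuteStreamCommand reads from.
OUTPUT_DIR = os.path.join(tempfile.gettempdir(), "nifi_generated")
SCRIPT_PATH = os.path.join(OUTPUT_DIR, "generated.sh")

# Relax the umask so files created below are group/world readable.
os.umask(0o022)

os.makedirs(OUTPUT_DIR, exist_ok=True)
with open(SCRIPT_PATH, "w") as f:
    f.write("#!/bin/bash\necho hello\n")

# Explicitly set 0755 so any user (including "nifiservice") can read
# and execute the script later.
os.chmod(SCRIPT_PATH, 0o755)
```

If the script must run under sudo for other reasons, an explicit `chown` back to the service account would be needed as well.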
10-21-2022
12:41 PM
1 Kudo
@Jagapriyan Since this is a daily job, I may suggest you tackle this differently. You know your source files are written between 8am and 9am each day, so I would configure your ListSFTP to run on a cron schedule so it runs every second from 9am-10am to make sure all files are listed. Then, knowing that your files may number 90+ (max unknown), I would configure your "Minimum Number of Entries" to some value you know the count will never reach, and make sure "Maximum Number of Entries" is set to a value higher than that. Then configure the "Max Bin Age" to some duration, say 30 minutes. This allows MergeContent to continue allocating FlowFiles to a bin for 30 minutes, at which time the bin is forced to merge even if the minimum value has not been reached. Doing this makes sure you get only one FlowFile out per bin per node. That single FlowFile can then be used to trigger your PutEmail processor used for notification. Additionally, the merged FlowFile will have a "merge.count" attribute added that you can use in your email body to report the number of FlowFiles that were ingested.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
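As a rough configuration sketch of the above (all values illustrative; pick thresholds that fit your actual daily counts):

```
ListSFTP:
  Scheduling Strategy        : CRON driven
  Run Schedule               : * * 9 * * ?     (every second during the 9am hour)

MergeContent:
  Correlation Attribute Name : <your state attribute, if binning per state>
  Minimum Number of Entries  : 10000           (higher than any expected daily count)
  Maximum Number of Entries  : 20000           (must exceed the minimum)
  Max Bin Age                : 30 min          (forces the merge after 30 minutes)
```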
10-21-2022
12:28 PM
@Fredi A screenshot of the configuration of your UpdateAttribute processor including main configuration and configuration in the "Advanced" UI would be very helpful in understanding your setup and issue. Thanks, Matt
10-21-2022
12:23 PM
1 Kudo
@DGaboleiro I am not the assignee on jira https://issues.apache.org/jira/browse/NIFI-8043, but that Matt is an awesome guy: @mburgess. Thanks, Matt
10-21-2022
12:21 PM
@RRosa I am not clear on what you mean by "migrating the flow files". A NiFi FlowFile is the object that traverses the connections between NiFi component processors on the NiFi canvas. Are you talking about migrating your actively queued FlowFiles from NiFi cluster 1 (Apache NiFi 1.12.1) to NiFi cluster 2 (Apache NiFi 1.17.0)? Or are you talking about migrating the flow.xml.gz file (which contains everything you have configured on the canvas of your NiFi) from the old cluster to the new?

General guidance for upgrading Apache NiFi can be found in the admin guide here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#upgrading_nifi

The only thing I see NOT covered in that guidance is the preservation of component state. Within a cluster, component state may be stored, depending on the component, either in a local state directory on each node (each node holds only state for that node) or in cluster state (written to ZooKeeper and shared across all nodes). Now, if you are installing the new version of NiFi on the same hosts where the old NiFi nodes were running, simply preserve the state configuration, and the new nodes, when started with a copy of the flow.xml.gz, will continue to read and use the same state. The same goes for new nodes using the same external ZooKeeper that the previous nodes used (stop the old hosts before starting the new ones).

While the documentation recommends that you process out all queued FlowFiles from cluster 1 before starting cluster 2, that is not required. If the new nodes point at the same content, flowfile, and provenance repositories as the previous nodes, that data will get loaded back in on startup and processing will continue where it left off. Remember that each node's repositories are unique to that node (meaning you can't combine them, and they don't all contain the same content).

Another thing to review is the release notes: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes You'll want to review all the release notes between 1.12 and 1.17. Apache NiFi is known to deprecate and remove some components (processors, controller services, reporting tasks, etc.) from time to time, so you'll want to check whether any components you use in your current dataflows have been removed. Additionally, some components may have changed, typically resulting in additional properties being added. When you start the newer version of NiFi, it will load your existing flow.xml.gz (1.17 will actually generate a flow.json.gz file from your flow.xml.gz) and upgrade all your components to use the newer 1.17 versions of the component classes. So you'll want to review your flow after the upgrade to make sure none of the components that were previously valid have become invalid because a new property exists that must be configured. NOTE: 1.17 will start using the flow.json.gz once upgraded, as the flow.xml.gz format is deprecated.

If you found this response assisted you with your query, please take a moment to login and click on "Accept as Solution" below this response.

Thank you,
Matt
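As a small sketch of the preservation step when the new install lives on the same host, something like the following could copy the flow definition and local state across. The directory layout assumed here is the default NiFi layout; if your nifi.properties points the state or flow locations elsewhere, adjust accordingly:

```python
import os
import shutil

def migrate_nifi_artifacts(old_home: str, new_home: str) -> None:
    """Copy the flow definition and local component state from an old
    NiFi install directory to a new one (same host, default layout)."""
    # Carry the existing flow over; NiFi 1.17 generates flow.json.gz
    # from this flow.xml.gz on first startup.
    shutil.copy2(os.path.join(old_home, "conf", "flow.xml.gz"),
                 os.path.join(new_home, "conf"))

    # Preserve local component state so stateful processors on this
    # node resume where they left off.
    old_state = os.path.join(old_home, "state", "local")
    if os.path.isdir(old_state):
        shutil.copytree(old_state,
                        os.path.join(new_home, "state", "local"),
                        dirs_exist_ok=True)
```

Cluster state in ZooKeeper needs no copying as long as the new nodes point at the same external ZooKeeper.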
10-19-2022
08:25 AM
@orekxl @biblio_gr The following community article will help you understand what really happens when a user chooses to click "terminate" on a stopped NiFi processor with active threads: https://community.cloudera.com/t5/Community-Articles/Understanding-NiFi-s-quot-Terminate-quot-option-on-running/ta-p/355433 If you found this assisted you with your query, please take a moment to login and click "Accept as Solution" below this response. Thank you, Matt
10-19-2022
08:20 AM
2 Kudos
The intent of this article is to cover exactly what happens when a user clicks the "terminate" button on a processor component that has an actively running task.

Before we can discuss the "terminate" option, we need to understand a few basics about the NiFi application and a bit of history:

1. NiFi is a Java application, and the execution of any component (processors, controller services, reporting tasks, funnels, input/output ports, etc.) happens within that single Java Virtual Machine (JVM) process. NiFi does not create a child process for the execution of each component.
2. Since NiFi operates within a single JVM, it is not possible to "kill" a thread for an individual component without killing the entire JVM.
3. NiFi consists of well over 400 unique components, and many of them do not execute native NiFi code. Many use client libraries not managed or controlled by NiFi. Others can be configured to execute commands external to NiFi (ExecuteStreamCommand, ExecuteProcess, ExecuteScript, etc.). Processors that invoke something external to NiFi's code base will result in a child process being created with its own pid. Keep in mind that processors of this type do not limit what is being invoked externally, so they take a generic approach to handling those child processes: the JVM invokes the external command and waits for it to report completion.
4. Historically, NiFi did not offer a terminate option, since killing a thread in the NiFi JVM is not possible. So when a component misbehaved (usually due to an issue external to NiFi code, such as the network, a hung client library, or a hung external command), that NiFi component processor would get stuck, with the JVM thread waiting on that client library or external process to return. As such, the processor's concurrent-task JVM thread is blocked. While you could stop the processor, that would not help users get past the hung or long-running thread. The processor transitions to a "stopping" state, where it remains until the library call or task it is waiting on completes. Until that happens, users cannot modify the configuration or restart the component. This meant that for truly hung threads, the component was blocked until the NiFi JVM was restarted.
5. As a result of the inconvenience/impact a hung thread causes, NiFi introduced the "terminate" option on a "stopped" component with an active thread.

What actually happens when a user clicks "terminate":

1. "Terminate" is only possible after a processor has been asked to stop and that stopped processor still has an associated JVM thread running.
2. Since we know that killing a JVM thread is not possible without killing the entire JVM process (NiFi), the "terminate" option takes a different approach. When a processor executes, it typically does so in response to an inbound queued FlowFile as the trigger. That means the inbound FlowFile is tied to the JVM thread that is executing. When the thread completes, that FlowFile (or a modified, cloned, or new FlowFile, depending on the processor's function) is moved to the appropriate outbound relationship of the processor.
3. So what the "terminate" function really does is release the FlowFile associated with that running JVM thread back to the inbound connection, make a request to the client library or external command to abort/exit, and then isolate that thread so that if it does actually complete post-terminate, all returns are just sent to null.
4. When "terminate" has been selected, the UI renders the processor's active threads differently to indicate that the processor has JVM threads that have been terminated but are still active. NOTE: The number within the parentheses denotes the current number of terminated threads still active.
5. If the client or external command responds to the request to exit, the active "terminated" thread will disappear. If not, it will continue to exist until the thread finally completes or the entire NiFi JVM is restarted. NOTE: A terminated thread has little impact on resources, since a hung thread isn't consuming CPU. A long-running CPU-intensive thread, however, may have an impact.
6. Now that this "terminated" JVM thread has been isolated and any FlowFile(s) tied to it have been released to the originating connection, users can modify the processor configuration and start the component processor again. When started again, the processor will execute again on the FlowFile(s) that once belonged to the terminated thread. So no data loss is incurred as a result of using "terminate".

The "terminate" capability allows users to move on without needing to restart their NiFi JVM, reducing downtime and impact to other dataflows running on the NiFi canvas. If you have a processor that constantly has hung-thread issues or very long-running threads, it is time to start looking at your source FlowFile(s), processor configuration, external command, or the external service the processor may be waiting on as possible sources of the issue.

Reference: Apache NiFi Terminate documentation
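The mechanics above can be illustrated with a small plain-Python sketch (this is not NiFi code, just an analogy): a running thread cannot be killed, but the work item tied to it can be released back to its queue and the thread's eventual result discarded, which is essentially what "terminate" does with a FlowFile:

```python
import queue
import threading
import time

work = queue.Queue()
work.put("flowfile-1")          # a queued "FlowFile" on the inbound connection

terminated = threading.Event()
results = []                    # stands in for the outbound relationship

def process(item):
    time.sleep(0.2)             # stands in for a hung client-library call
    if terminated.is_set():
        return                  # isolated thread: its output goes nowhere
    results.append(item)        # normal path: transfer downstream

item = work.get()               # thread takes the FlowFile as its trigger
worker = threading.Thread(target=process, args=(item,), daemon=True)
worker.start()

# User clicks "terminate": release the FlowFile back to the inbound
# connection and isolate the thread; no data is lost.
terminated.set()
work.put(item)

worker.join()
```

After the "terminate", the item is back on the queue ready for a restarted processor, and whatever the isolated thread eventually returns is discarded.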
10-03-2022
05:54 AM
1 Kudo
@leandrolinof I see no reason why using UpdateAttribute to establish the needed path and filename values for the FetchSFTP processor would not work. FetchSFTP has no dependency on ListSFTP. ListSFTP just serves as a mechanism for obtaining a list of files from a target SFTP server and recording state; it simply creates a FlowFile with the needed attributes set for each file found on the target. So if you have another method built that can set those attributes, then you are good to go.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt
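As a rough sketch of that substitution (attribute names and values here are illustrative; the key point is that FetchSFTP just evaluates Expression Language against whatever attributes arrive on the FlowFile):

```
UpdateAttribute (sets the attributes FetchSFTP will read):
  path     = /incoming/data           (illustrative remote directory)
  filename = report.csv               (illustrative remote file name)

FetchSFTP:
  Remote File = ${path}/${filename}
```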
09-30-2022
02:44 PM
@Jagapriyan Your described flow above does not mention the MergeContent processor, which is what would be needed to merge multiple FlowFiles with matching attribute values into one output FlowFile. Please share your MergeContent processor configuration. Additionally, the ListSFTP processor does not download the content of the files from the remote server; it is only used to list the files on the remote server and set attributes on the FlowFile that are then used by the FetchSFTP processor to actually download the content. How do you know when you have all the files for a given state? Is this a continuous feed of files? Is this a daily job? While file count differs per state, is the count per state consistent? What are the highest and lowest counts? Thanks, Matt
09-30-2022
02:35 PM
@Kushisabishii What are you seeing in the nifi-user.log when you make this import attempt? You may be getting the 403 because the user is not authorized properly to perform the import call. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt