Member since: 07-30-2019
Posts: 2243
Kudos Received: 1230
Solutions: 634
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 62 | 05-16-2022 08:31 AM
 | 104 | 03-09-2022 11:30 AM
 | 176 | 03-09-2022 09:18 AM
 | 99 | 03-07-2022 09:06 AM
 | 374 | 01-26-2022 06:49 AM
11-23-2021
05:14 AM
@emmanuel Yes, Apache NiFi supports PKCS12 keystores; however, JKS is the more commonly used format, and I was suggesting you test with JKS to rule out an issue there. Something else you may want to do is enable debug logging for TLS in NiFi-Registry. You can accomplish this by adding an additional line to the bootstrap.conf file: java.arg.<unique num>=-Djavax.net.debug=ssl,handshake Maybe you are having a cipher compatibility issue? Maybe the connection is trying to use TLS 1.3 and that needs to be disabled so TLS 1.2 is negotiated? Java's default cacerts file is not used by the NiFi core during mutual TLS negotiation; only the truststore configured in the nifi.properties and nifi-registry.properties files is used. Hope this helps in your investigation, Matt
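For reference, a sketch of that bootstrap.conf addition (the arg index 20 is arbitrary; pick any number not already used by another java.arg line in your bootstrap.conf):

# conf/bootstrap.conf -- enable JVM TLS debug output in the NiFi-Registry logs
java.arg.20=-Djavax.net.debug=ssl,handshake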
11-22-2021
11:01 AM
@HiSunny The configured "run schedule" on a NiFi processor controls how often the processor is scheduled to execute. The "concurrent tasks" setting controls the parallel execution of a processor. When a processor is scheduled to execute, it requests a thread from NiFi's Max Timer Driven thread pool that is used to execute the processor code. If that thread is still active upon the next scheduled execution and not all concurrent tasks are in use yet, the processor can request another thread to execute in parallel. When it comes to variables, if a processor property supports NiFi Expression Language (NEL) with evaluation against FlowFile attributes, then each FlowFile can provide unique input to the execution via its attributes. You can see if a property supports Expression Language by hovering your cursor over the "?" icon next to each property name. You'll want to make sure the scope of the NEL support includes FlowFile attributes. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
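As a small illustration (the "target.dir" attribute name is made up for this example), a property that supports FlowFile-attribute-scoped NEL, such as the PutFile "Directory" property, could be set as follows, so each FlowFile's own attribute decides where it is written:

Directory = /data/output/${target.dir}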
11-22-2021
09:19 AM
@emmanuel Does the truststore on each NiFi host contain a separate TrustedCertEntry for each of the three CA certificates in your trust chain? Your openssl command output does not list your root or intermediate CAs: Acceptable client certificate CA names
/C=FR/O=SAFRAN/OU=SAFRAN SA/OU=0002 562082909/CN=nifi-node3
/C=FR/O=SAFRAN/OU=SAFRAN SA/OU=0002 562082909/CN=niif-node2
/C=FR/O=SAFRAN/OU=SAFRAN SA/OU=0002 562082909/CN=nifi-node1
/C=FR/OU=SAFRAN SA/OU=0002 562082909/O=SAFRAN/CN=Safran Nifi Admin
/C=FR/OU=SAFRAN SA/OU=0002 562082909/O=SAFRAN/CN=localhost
Have you tried converting your PKCS12 keystore to a JKS keystore? One way to do that with keytool is sketched below. Matt
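A sketch of the conversion using the keytool utility shipped with Java (file names here are placeholders for your own keystore files):

keytool -importkeystore \
  -srckeystore keystore.p12 -srcstoretype PKCS12 \
  -destkeystore keystore.jks -deststoretype JKS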
11-22-2021
09:04 AM
@Venkikancharla You are absolutely correct that CFM 2.1.2 is also impacted by this. While CFM 2.1.2 is based off Apache NiFi 1.13.2, it includes many bug fixes that eventually went into Apache NiFi 1.14. One of those changes happened to be NIFI-8723. A full list of changes made on top of Apache NiFi 1.13.2 in CFM 2.1.2 can be found here: https://docs.cloudera.com/cfm/2.1.2/release-notes/topics/cfm-fixed-issues.html CFM 2.1.3 will include https://issues.apache.org/jira/browse/NIFI-8938 Thank you, Matt
11-22-2021
08:54 AM
@Yemre If you have a support contract with Cloudera, I'd recommend opening a support case to assist with your issue here. Possible causes: 1. An unsuccessful mutual TLS handshake with NiFi-Registry from the NiFi hosts, resulting in the NiFi node connection being only a one-way TLS connection and the node being treated as an "anonymous" user. Anonymous users cannot proxy user requests and cannot see anything except public buckets. --- Caused by a missing complete trust chain on one or both sides of the connection. The truststore in NiFi-Registry must contain the complete trust chain for the NiFi hosts' keystore PrivateKeyEntry. --- Caused by the PrivateKeyEntry not meeting minimum requirements (missing SAN with the NiFi hostname, missing EKU of clientAuth, and/or using wildcards are the most common). 2. NiFi-Registry is configured with an identity mapping pattern in the nifi-registry.properties file that is matching on the DN from the NiFi node's client certificate presented in the mutual TLS handshake. The identity mapping value and transform are then applied, which alters the actual client string that must then be authorized for the Proxy and bucket policies. Hope this helps you, Matt
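For reference, a sketch of what such an identity mapping configuration can look like in nifi-registry.properties (the pattern and value shown are examples only; check whether your file already contains mapping properties like these):

nifi.registry.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.registry.security.identity.mapping.value.dn=$1
nifi.registry.security.identity.mapping.transform.dn=NONE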
11-22-2021
05:12 AM
@Ankit13 I would still use Cron scheduling on the PutFile processor, but rather than just having it run once at, say, hour 7, I'd schedule it to run every second starting at hour 7. That way it starts putting files at hour 7 and continues to put files all the way until 07:59:59. Then it stops executing until the next day. http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html Hope this helps, Matt
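A Quartz cron expression for "every second during hour 7" (assuming the standard Quartz field order of seconds, minutes, hours, day-of-month, month, day-of-week) would look like:

* * 7 * * ?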
11-18-2021
06:45 AM
@prova Based on the timestamp shared, the source is RFC3164 syslog messages, in which the timestamp does not include a year. The SyslogReader supports both RFC3164 and RFC5424 syslog messages, but uses a generic syslog schema applied against the source data: {
"type" : "record",
"name" : "nifiRecord",
"namespace" : "org.apache.nifi",
"fields" : [ {
"name" : "priority",
"type" : [ "null", "string" ]
}, {
"name" : "severity",
"type" : [ "null", "string" ]
}, {
"name" : "facility",
"type" : [ "null", "string" ]
}, {
"name" : "version",
"type" : [ "null", "string" ]
}, {
"name" : "timestamp",
"type" : [ "null", "string" ]
}, {
"name" : "hostname",
"type" : [ "null", "string" ]
}, {
"name" : "body",
"type" : [ "null", "string" ]
} ]
} You can see that timestamp is treated as a string. When it comes to the reformatting the customer is looking for, where is NiFi expected to extract the year from, since it is not in the syslog message? Because the schema treats the timestamp as a string, it can't be treated like a timestamp type within the syslog record for reformatting. This is possible with RFC5424 formatted source syslog messages. This is not to say that you could not manipulate this date string via some downstream processor, but you would still need to figure out where you are going to get the year from. NiFi can't assume that an RFC3164 formatted syslog message was produced in the same year that NiFi is parsing it. This becomes hard to handle even via some downstream processor at the end of a year, where the NiFi servers may already be in 2022, for example, but the received RFC3164 syslog messages were produced in 2021. RFC3164 was made obsolete when RFC5424 was introduced. RFC3164 syslog messages are produced by older systems and the options here are limited. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
11-16-2021
06:25 AM
@sandip87 This statement is not clear to me: "We had combined them to form one certificate and then follow steps in the NiFi documentation." It sounds to me like your trust chain has an intermediate CA and a root CA in it. That means your truststore must have two TrustedCertEntries in it: one for the intermediate CA and the other for the root CA. It sounds like you only have the root CA in your truststore. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
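A sketch of importing both CA public certificates as separate TrustedCertEntries with keytool (file names and aliases here are placeholders for your own CA cert files):

keytool -importcert -trustcacerts -alias intermediate-ca -file intermediate-ca.pem -keystore truststore.jks
keytool -importcert -trustcacerts -alias root-ca -file root-ca.pem -keystore truststore.jks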
11-16-2021
06:20 AM
@Yemre Authorizing your user is not enough. The NiFi nodes themselves need to be able to successfully authenticate via a mutual TLS handshake with the target NiFi-Registry. Those nodes then need to be authorized to read all buckets and given read/write on proxy user requests. When a user authenticates into NiFi, that user entity is authorized to perform actions based on the authorizations in NiFi. When NiFi then talks to NiFi-Registry, the NiFi node is proxying the request to NiFi-Registry on behalf of the user authenticated into NiFi. Also, background threads in NiFi, just like the NiFi processors added to the canvas, are not executing as the user authenticated into NiFi. So in the background NiFi connects to NiFi-Registry to check on currently version controlled process groups to see if newer versions exist. While you are granting your NiFi nodes the ability to read all buckets, the NiFi users should be given read and write authorizations to the specific buckets that each user is going to use to version control their Process Groups. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
11-16-2021
06:13 AM
@Yemre The ability to dynamically fetch secrets/passwords from an external source is not something that exists currently. Doing so would require modification of every component class that uses sensitive properties. There is some progress in this area, however: https://issues.apache.org/jira/browse/NIFI-5481 This new feature handles pulling secrets from an external vault, but it is a NiFi core level feature and does not extend down to the individual flow component level. I recommend raising an Apache NiFi Jira with your specific request. https://issues.apache.org/jira/projects/NIFI/ If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
11-16-2021
05:56 AM
@emmanuel Can you share the verbose output for your NiFi keystore: keytool -v -list -keystore <nifi keystore> Does that output align with the certificate requirements for NiFi? https://docs.cloudera.com/cfm/2.1.2/cfm-security/topics/cfm-security-tls-certificate-requirements-recommendations.html What version of NiFi are you running? What version of Java is your NiFi using? If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
11-08-2021
06:04 AM
@sandip87 The following properties within the nifi.properties file will tell you where your NiFi's keystore and truststore files are located: 1. nifi.security.keystore 2. nifi.security.truststore You can use the Java keytool command to see the verbose details of the contents of these two keystores: <path to java JDK>/bin/keytool -v -list -keystore <keystore/truststore> Once you inspect these to make sure the contents are good, you need to make sure you can successfully authenticate your user into your NiFi. By default, once NiFi is secured, the only method to authenticate a user/client is via a mutual TLS handshake, which means your user needs to have a certificate loaded in the browser. Optionally, you can add additional user authentication methods if you want. https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user_authentication
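As a rough sketch, the relevant block in nifi.properties looks something like this (the paths and store types shown are examples only):

nifi.security.keystore=/opt/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=********
nifi.security.keyPasswd=********
nifi.security.truststore=/opt/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=********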
11-02-2021
09:55 AM
@Sridhara47 A status code 302 means there was a redirect in the response. I'd suggest inspecting the NiFi app.log for this exception and taking a look at the full stack trace, if one exists, to see what the issue may be. Hope this helps, Matt
11-02-2021
09:38 AM
@sandip87 Without the detailed output of your NiFi keystore and truststore and of the client certificate you are using to authenticate yourself to NiFi, it would be difficult to say exactly where your issue is. I am leaning towards an issue with your company-issued certificates, because you stated that this same NiFi worked fine when using certificates and keystores generated by NiFi's TLS toolkit. For NiFi's keystore (configured in the nifi.properties file), make sure the following are correct: 1. The keystore contains 1 and only 1 PrivateKeyEntry. 2. Make sure the PrivateKeyEntry ExtendedKeyUsage (EKU) contains clientAuth and serverAuth. 3. Make sure that the PrivateKeyEntry contains a SAN entry that matches the hostname of the host where NiFi is running. For the user certificate loaded in the browser being used to authenticate with this NiFi: 1. Verify the certificate issuer. Is it an intermediate CA or the root CA? 2. Verify that NiFi's truststore.jks (configured in the nifi.properties file) contains a TrustedCertEntry for the complete trust chain that goes with your certificate and the certificate found in the keystore.jks. A complete trust chain means that the truststore has the public keys for the issuer of each of the above certificates, and if that issuer is an intermediate CA, you also have the public certificate for the CA that signed that intermediate CA in the truststore. You'll know you have reached the root CA when the TrustedCertEntry has the same DN for both owner and issuer. Your browser must also contain the complete trust chain for the certificates issued to your NiFi nodes. Once all the above is verified, clear your browser cache and site cookies. If you still have the same issue and you are using the Chrome browser, try typing "thisisunsafe" (which tells Chrome to skip certificate verification on the certificate presented by the NiFi instance) while the NiFi Chrome tab is in focus. If this works and allows you to proceed, this again points at a trust issue between your corporately issued certificate and your browser. Go back and verify the structure/content of the NiFi keystore again. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
11-02-2021
08:23 AM
@AnnaBea Let me make sure I am clear on your ask here: 1. You have successfully split your source file into 3 parts (header line, body line(s), and footer line). 2. You have successfully modified all three split files as needed. 3. You are having issues re-assembling the three split files back into one file, in the order of header, body, footer, using the MergeRecord processor? With this particular dataflow design, the MergeRecord processor is not likely what you want to use. You probably want to be using the MergeContent processor instead, with a "Merge Strategy" of "Defragment". But to get these three source FlowFiles merged in a specific order requires some additional work in your upstream flow. In order to use "Defragment", your three source FlowFiles would all need to have these FlowFile attributes: fragment.identifier (all split FlowFiles produced from the same parent FlowFile have the same randomly generated UUID for this attribute), fragment.index (a one-up number that indicates the ordering of the split FlowFiles created from a single parent FlowFile), and fragment.count (the number of split FlowFiles generated from the parent FlowFile). 1. Add one UpdateAttribute processor before your RouteText and configure it to create the "fragment.identifier" attribute with a value of "${UUID()}" and another attribute "fragment.count" with a value of "3". Each FlowFile produced by RouteText should then have these two attributes set on it. 2. Then add one UpdateAttribute processor to each of the 3 flow paths to set the "fragment.index" attribute uniquely per dataflow path: value=1 for header, value=2 for body, and value=3 for footer (see the sketch below). 3. Now the MergeContent processor will have what it needs to bin these three files by the UUID and merge them in the proper order. There are often many ways to solve the same use case using NiFi components. Some design choices are better than others and use fewer resources to accomplish the end goal. While the above is one solution, there are others, I am sure. Cloudera's professional services is a great resource that can help with use case designs. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
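A rough sketch of the attribute setup described above (the names on the left are the dynamic properties added to each UpdateAttribute processor):

UpdateAttribute (before RouteText):
  fragment.identifier = ${UUID()}
  fragment.count      = 3

UpdateAttribute (header path):  fragment.index = 1
UpdateAttribute (body path):    fragment.index = 2
UpdateAttribute (footer path):  fragment.index = 3

MergeContent:
  Merge Strategy = Defragment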
11-02-2021
06:54 AM
@Yemre The following response you see in the NiFi UI after supplying a username and password tells you that the issue happened during the user authentication process: "Unable to validate the supplied credentials. Please contact the system administrator." NiFi has not even tried to do any authorization yet, so your authorizers.xml setup has not come into the equation yet. Unfortunately, the error produced by the openldap client is rather generic, and any of the following could be the issue: 1. Incorrect ldap/AD manager DN 2. Incorrect ldap/AD manager password 3. Incorrect username 4. Incorrect user password 5. Incorrect user search filter in the login-identity-providers.xml file In your case it looks like number 5 may be your issue: the ldap-provider expects that the username typed in the login window is passed via the "User Search Filter" so that the entered user's credentials can be verified. I noticed you are using full DNs to login with, which is extremely rare. The more common approach here is to configure your ldap-provider with an "Identity Strategy" of "USE_USERNAME" instead of "USE_DN". This means that upon successful user authentication, it is the user string entered in the login window that is used to authorize your user instead of the user's full DN. This also means your initial admin string should match your username as you would type it at the login prompt. In order to pass the entered string at the login prompt to the ldap-provider, your "User Search Filter" would need to look something like this: <property name="User Search Filter">(cn={0})</property>
or
<property name="User Search Filter">(sAMAccountName={0})</property> You should inspect your user's ldap/AD entry to see which attribute contains the username that you type at the login prompt. The user-entered username at login is substituted in place of "{0}" in the User Search Filter. When you change the initial admin user string from the full DN to just the username, you will need to remove the old authorizations.xml (NOT the authorizers.xml) file that was originally built with the full DN by the file-access-policy-provider in your authorizers.xml. The authorizations.xml file is only seeded by the file-access-policy-provider if the file does not already exist; once it exists, all future edits to the content of this file are handled via changes made from within the NiFi UI. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
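A partial sketch of the relevant ldap-provider properties in login-identity-providers.xml under this approach (the search base value is an example placeholder for your own directory):

<property name="Identity Strategy">USE_USERNAME</property>
<property name="User Search Base">ou=users,dc=example,dc=com</property>
<property name="User Search Filter">(sAMAccountName={0})</property>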
11-02-2021
05:48 AM
@Ankit13 My recommendation would be to only automate the enabling/disabling and starting/stopping of the NiFi processor component that is ingesting the data into your NiFi dataflow, and leave all downstream processors always running, so that any data ingested into your dataflow has every opportunity to be processed through to the end. When a "running" processor is scheduled to execute but has no FlowFiles queued in its inbound connection(s), it pauses instead of immediately running over and over again, to prevent excessive CPU usage, so it is safe to leave these downstream components running all the time. Thank you, Matt
10-27-2021
02:23 PM
@galvinpaul1718 I'd suggest verifying your download was good. Then remove the nifi-registry work directory before restarting. The work directory is rebuilt from the contents of the nifi-registry lib directory. Make sure you did not run out of disk space. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
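As a rough sketch, assuming a default layout where the work directory sits directly under the nifi-registry install directory:

cd /path/to/nifi-registry
./bin/nifi-registry.sh stop
rm -rf ./work
./bin/nifi-registry.sh start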
10-27-2021
02:20 PM
@Apoo The EvaluateJsonPath processor's dynamic properties do not support NiFi Expression Language, so being able to pass dynamic strings to these dynamic properties from FlowFile attributes is not possible. The dynamic properties only support NiFi parameters. You may want to raise an Apache NiFi Jira requesting that NiFi EL support be added to these dynamic properties, or even contribute to the open source code if you so choose. Thank you, Matt
10-27-2021
02:05 PM
@AhmedAlghwinem You are correct that this typically means you are missing some authorization for the currently authenticated user. To help you with this issue, I would need to know a lot more about your NiFi-Registry setup configuration: 1. The nifi-registry.properties file would tell me which method of user authentication you have set up, any identity mappings you have set up, and which authorizer you are using. 2. The identity-providers.xml file tells me how the login provider specified in the above nifi-registry.properties file (if one is used) is configured. 3. The authorizers.xml file tells me how the authorizer specified in the above nifi-registry.properties file is configured and which user-group-providers are being used. 4. Depending on the configurations used in authorizers.xml, you may have users.xml and authorizations.xml files generated as well, or you may be using an external authorizer like Ranger. 5. I would need to know your user string (case sensitive) that is displayed in the upper right corner of the NiFi-Registry UI after you login/authenticate into nifi-registry, so that it can be checked against the configured policies to see what your user is missing. The policies used by NiFi-Registry are covered in the admin guide here: https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#access-policies You will want to look at the "Special Privilege Policies", which include what would be needed by an admin user to create new buckets. Providing the above details in a Cloudera Support ticket, provided you have a support subscription with Cloudera, would allow support to quickly and easily assist you with this issue. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-20-2021
11:10 AM
@RB764 Your EvaluateJsonPath processor configuration is good. This processor evaluates the JSON path expressions against the content of the inbound FlowFile, and then, with "Destination" set to "flowfile-attribute", it creates a new attribute for each dynamic property added to the processor, with the value that results from the JsonPath. Your issue here is that your inbound FlowFile has no content for the EvaluateJsonPath processor to run the JSON path against. I see that in your screenshot of the GenerateFlowFile processor you have added a new dynamic property "value" with a value of {"Country":"Austria","Capital":"Vienna"}. Dynamic properties become FlowFile attributes on the FlowFile produced, not content. If you want to specify specific content via the GenerateFlowFile processor, you need to use the "Custom Text" property to do so (see the sketch below). If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
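A rough sketch of the GenerateFlowFile configuration that would produce that JSON as FlowFile content rather than as an attribute:

GenerateFlowFile:
  Custom Text = {"Country":"Austria","Capital":"Vienna"}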
10-20-2021
10:53 AM
@Apoo Not sure if this is the best solution, but you could use a combination of EvaluateJsonPath and ReplaceText to convert your sample source into your sample output. EvaluateJsonPath processor: new dynamic property (you can use any property name) = $.data[*] This would result in the following output based on your example: [{"timestamp_start":0,"timestamp_stop":0}] We can then use ReplaceText to trim off the leading "[" and trailing "]": Search Value = (^\[)|(\]$) Then you have your desired output of: {"timestamp_start":0,"timestamp_stop":0} If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
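A rough sketch of the two processor configurations described above (the "data" property name is arbitrary; the EvaluateJsonPath Destination of "flowfile-content" and the empty ReplaceText Replacement Value are assumptions about how the rest of the flow is wired up):

EvaluateJsonPath:
  Destination = flowfile-content
  data        = $.data[*]

ReplaceText:
  Replacement Strategy = Regex Replace
  Evaluation Mode      = Entire text
  Search Value         = (^\[)|(\]$)
  Replacement Value    = (empty string)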
10-19-2021
02:04 PM
@AA24 The easiest way to accomplish this is to use the PutDistributedMapCache processor in one flow to write the attribute values you want to share to a cache server, and in your other flow use the FetchDistributedMapCache processor to retrieve those cached attributes and add them to the FlowFiles that need them. Another option is to use the MergeContent processor. In flow one, where it looks like you are extracting your session_id and job_id, you would use the ModifyBytes processor to zero out the content, leaving you with a FlowFile that only has attributes, and then use MergeContent to combine this FlowFile with the FlowFile in your second flow. In the MergeContent processor you would configure the "Attribute Strategy" to "Keep All Unique Attributes". If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-19-2021
01:55 PM
1 Kudo
@TRSS_Cloudera It is not clear to me how you have designed your dataflow to remove all files from the source SFTP server except the newest file. Assuming state was not an issue (since you said your flow works if you manually clear state), how do you have your flow built? There exists a GetSFTP processor that does not maintain state. So you could have your flow use ListSFTP and FetchSFTP to always get the newest "history" file and record that latest "history" file's last modified timestamp in something like a DistributedMapCache server. Then have your GetSFTP run once a day using the "Cron driven" scheduling strategy to get all files (Delete Original = false) in that directory (which would include the latest history file), fetch the currently stored last modified time from the map cache, and then via a RouteOnAttribute send any FlowFiles whose last modified time is older than the stored value to a processor that removes them from the source SFTP server. The above would work in an "ideal" world. You would run into issues when there was an interruption in the running dataflow causing multiple new files to get listed by the ListSFTP processor, because you would not know which one ended up having its last modified timestamp stored in the DistributedMapCache. But in such a case the worst outcome is a couple of files left lingering until the next run results in just one history file being listed, at which point things go back to expected behavior. Otherwise, there are script-based processors you could use to build your own scripted handling here. To be honest, it seems like wasted I/O to have NiFi consume these files into NiFi just to auto-terminate them, when you could use an ExecuteStreamCommand processor to invoke a script that connects to your SFTP server and simply removes what you do not want, without needing to pull anything across the network or write file content to NiFi that you don't need. Hopefully this gives you some options to think about. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
10-19-2021
01:18 PM
@AA24 NiFi was designed as an always-on type of dataflow. As such, the NiFi processor components support "Timer Driven" and "Cron Driven" scheduling strategy types. That being said, the ability to tell a processor to "Run Once" exists within NiFi. You can do this manually from within the UI by right clicking on the NiFi processor component and selecting "Run once" from the pop-up context menu. The next thing to keep in mind is that anything you can do via the UI, you can also do via a curl command. So it is possible to build a dataflow that triggers the "run once" API call against the processor you want to fetch from the appropriate DB. You cannot execute "run once" against a PG, nor would I recommend doing so. You want to only trigger the processor responsible for ingesting your data and leave all the other processors running all the time, so they process whatever data they have queued at any time. First you need to create your trigger flow: you could have a GetFile consume the trigger file and use maybe a RouteOnContent processor to send the FlowFile to either an InvokeHTTP configured to invoke run-once on your Oracle-configured processor or an InvokeHTTP configured to invoke run-once on your MySQL-configured processor. Using your browser's developer tools is an easy way to capture the rest-api calls that are made when you manually perform the action via the UI. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
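A very rough sketch of what such a rest-api call can look like (the host, processor UUID, revision version, and token are placeholders; capture the exact endpoint and request body with your browser's developer tools as described above, since the body must carry the processor's current revision):

curl -X PUT "https://nifi-host:8443/nifi-api/processors/<processor-uuid>/run-status" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"revision":{"version":0},"state":"RUN_ONCE","disconnectedNodeAcknowledged":false}'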
10-19-2021
12:04 PM
@DayDream The ExecuteStreamCommand processor executes a system level command and not something native within NiFi, so its impact on CPU is completely dependent on what the command being called is doing. You mention that the ExecuteStreamCommand is just executing a cp command and that the issue happens when you are dealing with a large file. The first thing I would be looking into is the disk I/O of the source and destination directory locations the file is being copied from and to. You also mention that the PutFile is writing out a large FlowFile to disk. This means that the processor is reading FlowFile content from the NiFi content_repository and then writing it to some target folder location. I would once again look at the disk I/O of both locations when this is happening. The CPU usage may be high simply because these threads are running a long time waiting on disk I/O. NiFi uses CPU for its core level functions, and then you configure an additional thread pool that is used by the NiFi components you add to the NiFi canvas. This resource pool is configured via NiFi UI --> Global Menu (upper right corner of UI) --> Controller Settings. The "Event Driven" thread pool is experimental and deprecated and is used by processors configured to use the event driven scheduling strategy. Stay away from this scheduling strategy. The "Timer Driven" thread pool is used by controller services, reporting tasks, processors, etc. Processors use it when configured with the "Timer Driven" or "Cron driven" scheduling strategies. This pool is what is available for the NiFi controller to hand out to all processors requesting time to execute. Setting this value arbitrarily high will simply lead to many NiFi components getting threads to execute but then spending excessive time in CPU wait, as the time on the limited cores is time sliced across all active threads. The general rule of thumb here is to set the pool to 2 to 4 times the number of available cores on a single NiFi host/node. So for your 8 core server, you would want this between 16 and 32. This does not mean you can't set it higher, but you should only do so in small increments while monitoring CPU usage over an extended period of time. If you have 5 nodes, this setting is per node, so you would have a thread pool of 16 to 32 on each NiFi host/node. Another thing you may want to start looking at is the GC stats for your JVM. Is GC (young and old) running very often? Is it taking a long time to run? All GC is a stop-the-world event, so the JVM is simply paused while this is going on, which can also impact how long a thread is "running". You can get some interesting details about your running NiFi using the built-in NiFi diagnostics tool: <path to NiFi>/bin/nifi.sh diagnostics --verbose <path/filename where output should be written> For a NiFi node to remain connected to its cluster, it must be successful at sending a heartbeat to the elected cluster coordinator at least 1 out of every 8 scheduled heartbeat intervals. Let's say the heartbeat interval is configured in the nifi.properties file for 5 secs; then the elected CC must successfully process at least 1 heartbeat every 40 secs, or that node would get disconnected for lack of heartbeat. The node would initiate a reconnection once a heartbeat is received after having been disconnected for the above reason. Configuring a larger heartbeat interval will help avoid this disconnect/reconnect by allowing more time before a heartbeat is considered lost. This would allow more time if the node is going through a long GC pause or the CPU is so saturated it can't get a thread to create a heartbeat. I also recommend reading through this community article: https://community.cloudera.com/t5/Community-Articles/HDF-CFM-NIFI-Best-practices-for-setting-up-a-high/ta-p/244999 If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
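For reference, the heartbeat interval is set in nifi.properties; a sketch of raising it (the 15 sec value is only an example, and the same setting should be applied on every node):

# nifi.properties
nifi.cluster.protocol.heartbeat.interval=15 sec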
10-19-2021
11:32 AM
@vikrant_kumar24 The ExecuteScript processor has been around for over 6 years as part of Apache NiFi. It has had many improvements and bug fixes over those years, just like many other well-used components. I'd be reluctant to call it "experimental" any longer, regardless of what the embedded Apache NiFi docs say. The only thing to note here is that the ExecuteScript processor does not really execute the "Python" script engine. It executes "Jython" instead, which is a Java implementation of Python. Jython is not 100% compatible with Python, so you must test your script thoroughly. Thanks, Matt
10-12-2021
07:24 AM
1 Kudo
@CodeLa Giving a detailed response on such a use case would be very difficult in the community. This would take considerable effort and time and would require you to provide a lot more detail, including sample source json files, schemas, etc. Cloudera offers professional services for its customers to help them with their use case solutions. If you have a support contract with Cloudera, please reach out to your account owner about this service. At a very high level, I would suggest you take a look at the PutDatabaseRecord processor, and perhaps configure it to use one of the json readers: JsonPathReader or JsonTreeReader. The processor would also need a DBCPConnectionPool for connecting to your MySQL DB. If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
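A minimal sketch of what such a PutDatabaseRecord configuration can look like (the table name and statement type are examples only; the Record Reader points at a configured JsonTreeReader controller service and the connection pool at a DBCPConnectionPool set up for your MySQL DB):

PutDatabaseRecord:
  Record Reader                        = JsonTreeReader
  Statement Type                       = INSERT
  Database Connection Pooling Service  = DBCPConnectionPool
  Table Name                           = my_target_table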
10-08-2021
11:30 AM
1 Kudo
@Ankit13 The PutFile processor is going to execute based on its configured run schedule (Timer Driven execution) or cron schedule (Cron Driven execution). If you are adding an attribute to the FlowFile that you are using to evaluate a boolean true or false, the best approach is to add a RouteOnAttribute processor between your FetchFile and PutFile processors to redirect those FlowFiles where your condition does not resolve to "true". In this way you selectively decide which FlowFiles to pass to the PutFile to be acted upon. As far as the RouteOnAttribute goes, it will have an unmatched relationship, and you add dynamic properties which become new relationships that can be associated with different connections. You can use NiFi Expression Language (NEL) [1] to construct a boolean statement to evaluate your routing condition. [1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html If you found this response assisted with your query, please take a moment to login and click on "Accept as Solution" below this post. Thank you, Matt
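A small illustration (the "write.allowed" attribute name is made up; use whatever attribute your flow actually sets): adding a dynamic property like this to RouteOnAttribute creates a matching relationship you can connect to the PutFile, while non-matching FlowFiles go to the unmatched relationship:

RouteOnAttribute:
  send.to.putfile = ${write.allowed:equals('true')}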
10-08-2021
11:06 AM
@Ankit13 How do you know no more files will be put in the source directory after the NiFi flow processing starts? To me it sounds like the PutFile should execute at the default 0 secs run schedule (as fast as it can run) and you should instead control this dataflow at the beginning, where you consume the data. For example: in a 24 hour window, data is being written to the source directory between 00:00:00 and 16:00:00, and you want to write that data to the target directory starting at 17:00. So you instead set up a cron on a ListFile processor to list the files at 17:00 and 17:01, and then have a FetchFile and PutFile running all the time so they immediately consume all the content for the listed files and write them to the target directory. Then your ListFile does not execute again until the same time the next day, or whatever your cron is (see the cron sketch below). This way the files are all listed at the same time, and the PutFile can execute for as long as needed to write all those files to the target directory. Hope this helps, Matt
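A Quartz cron expression for the ListFile schedule described above (fire at 17:00:00 and 17:01:00 every day, assuming the standard Quartz field order of seconds, minutes, hours, day-of-month, month, day-of-week):

0 0,1 17 * * ?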