Member since: 07-30-2019
Posts: 2909
Kudos Received: 1443
Solutions: 846
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 54 | 04-23-2024 05:56 AM |
| | 27 | 04-22-2024 06:13 AM |
| | 150 | 04-17-2024 11:30 AM |
| | 119 | 04-16-2024 05:36 AM |
| | 77 | 04-15-2024 05:31 AM |
09-19-2022
05:57 AM
@Sanchari NiFi FlowFiles reside in the connections between NiFi processors. When a processor gets a thread to execute, it takes the highest priority FlowFile from an inbound connection queue and executes the processor code using that FlowFile's metadata/attributes and content (if the processor needs the content). The FlowFile is not transferred to the processor's outbound connection(s) until execution is complete.

When NiFi is shut down gracefully (meaning a user has initiated a shutdown), NiFi stops scheduling future component executions. NiFi then gives currently executing threads a grace period to complete. At the end of that grace period, any still-running threads are killed along with the JVM. Since FlowFiles do not transfer to an outbound connection until code execution has completed, any FlowFile owned by a thread at the time the thread was killed remains on the inbound connection. When NiFi is started again and the dataflows are started, processing of that file starts over when the processor executes again against the highest priority FlowFile in the connection.

That being said, NiFi favors data duplication over data loss every time. There is a small window in which a processor executes and part of that execution is, say, writing a file to a remote server. The remote system may have acknowledged completion of the transfer, but the NiFi JVM is killed before NiFi internally records that acknowledgment from the target server. The FlowFile would then be processed again, potentially resulting in data duplication on the target server. These are rare race conditions, but they are possible.

A restart is nothing more than a standard shutdown followed by a start, so the same shutdown behavior described above applies when a restart is performed.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.
Thank you, Matt
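The length of the grace period mentioned above is configurable in bootstrap.conf. A minimal sketch (the value shown is the common default; check your own install):

```properties
# conf/bootstrap.conf -- seconds NiFi waits for running component threads
# to finish before the JVM is killed during a graceful shutdown/restart
graceful.shutdown.seconds=20
```

Raising this value gives long-running processor executions more time to complete and transfer their FlowFiles before shutdown, narrowing the replay window described above.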
09-16-2022
12:49 PM
@EuGras You have a FlowFile queued somewhere within your dataflow with UUID 6ce9e262-b20b-4372-a3b9-43c2c00e8caa. The connection is trying to read the content for that FlowFile from a content claim in the content_repository in order to load balance data across the nodes in the cluster: id=1663256223724-231072817, container=default, section=49 <path to>/content_repository/49/1663256223724-231072817. The FlowFile metadata/attributes record that this content should be 2203 bytes in length; however, this file is only 1130 bytes in size. So it appears the disk issue you had resulted in data corruption.

You could use NiFi data provenance to locate this FlowFile by UUID or filename (04190e1f-fdca-4352-a796-6b6c9ce41baa) to determine which connection contains it. On that connection you could disable the load-balance connection configuration, then add a RouteOnAttribute processor to filter out this one bad FlowFile and auto-terminate it once it is routed away from the other FlowFiles that may be queued in that same connection. This is not to say you may not have other corruption caused by your disk issues beyond this one FlowFile.

If you do not care about the data on the nodes that had the disk issues, another option is to shut down that node and purge the contents of the flowfile_repository and content_repository. This will effectively delete all FlowFiles queued in connections on that one node. Then restart the NiFi node; it will construct new content and FlowFile repositories on startup.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
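A sketch of what the RouteOnAttribute filter described above could look like (the property name "corrupt" is just an illustrative choice; the UUID is the one from the error):

```
Routing Strategy : Route to Property name
corrupt          : ${uuid:equals('6ce9e262-b20b-4372-a3b9-43c2c00e8caa')}
```

FlowFiles matching the expression route to the "corrupt" relationship, which you would auto-terminate; everything else routes to "unmatched", which you would connect back into the normal flow.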
09-16-2022
12:32 PM
@double_z NiFi does not allow users to create locally managed user accounts (meaning creating a username and password directly in NiFi). NOTE: The latest versions of NiFi provide a single-user login provider just so that NiFi can be launched securely by default. This single user has complete access to do everything, and you cannot create additional users. While this provides some security, it is not a proper recommended deployment method.

The good news here is that it sounds like you have successfully set up an ldap-provider in your login-identity-providers.xml. This provider supplies a method by which a user can be authenticated via LDAP. Authentication via LDAP does not control authorization, which is handled within NiFi via the authorizers.xml. When you log in to NiFi, the resulting user identity string (case sensitive) is evaluated against any identity mapping patterns you may have configured in your nifi.properties file. After that, the resulting identity string (if a pattern match was found) or the unmodified identity string from the login provider is passed to the NiFi authorization process.

While NiFi loads the providers in the authorizers.xml from the top down, it is easier for a user to read it from the bottom up:
- You are using the "managed-authorizer", which calls the "file-access-policy-provider".
- The "file-access-policy-provider" is responsible for the authorizations.xml file, seeding it with the initial set of authorization policies needed by your NiFi nodes (multi-node NiFi cluster) and an initial admin user. You need an initial admin so that the admin user can set additional authorizations from within the NiFi UI.
- In your "file-access-policy-provider" you have told the provider to create the authorization policies for the user identity string "freeipa". So the "file-access-policy-provider" must first check whether that user is known to this NiFi. For that it is configured to use the "composite-configurable-user-group-provider".
- The "composite-configurable-user-group-provider" is in turn configured to get users and associated groups from the "file-user-group-provider" and the "ldap-user-group-provider". You can NOT have two user-group-providers return the exact same user identity string.
- Your "ldap-user-group-provider" has been configured to sync user and group identity strings from your LDAP. One of the users being returned is "freeipa".
- Your "file-user-group-provider" has been configured to create a local user identity with the same user identity string, "freeipa". So now you have two user-group-providers returning the same user identity string; NiFi has no idea which one is correct to use and throws the exception you see about two providers providing the same user identity.
- What you have shared above also shows the "file-user-group-provider" twice. You can't have the same provider defined twice in this file.

Steps to move forward:
1. The file-user-group-provider and the file-access-policy-provider will only create the users.xml and authorizations.xml files if they do NOT already exist. So if these two files exist, delete them (authorizers.xml and authorizations.xml are two different files; make sure you delete the correct one).
2. Make sure your ldap-provider is configured to USE_USERNAME and not USE_DN, if not already set this way.
3. Unset the "Initial User Identity 1" in the file-user-group-provider. We don't want this provider creating the "freeipa" user in the users.xml, since your ldap-user-group-provider will be providing this user identity.
4. Leave the initial admin "freeipa" set in the file-access-policy-provider.
5. Start your NiFi; it will create a new users.xml and authorizations.xml during startup. At the login UI, provide your LDAP "freeipa" username and password. Once in the UI, your "freeipa" user will have all the authorization policies needed to act as an admin.

This does not mean this user has all authorizations, but it does have the ability to grant additional authorizations to itself or other users. NiFi global menu (upper right corner) --> Users will allow you to see all users and groups, along with their associations to one another, synced from LDAP. It will also show any local user identities you may define (locally defined identities will show an edit and delete icon next to them). Local user and group identities are only used to set authorizations; they cannot be used to authenticate into NiFi.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
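A minimal sketch of how these providers chain together in authorizers.xml. This is not your exact file: the LDAP connection/search properties are omitted, and paths are the common defaults, so treat it as an illustration of the provider references only:

```xml
<authorizers>
  <userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
    <property name="Users File">./conf/users.xml</property>
    <!-- "Initial User Identity 1" intentionally left unset:
         the ldap-user-group-provider supplies the "freeipa" identity -->
  </userGroupProvider>
  <userGroupProvider>
    <identifier>ldap-user-group-provider</identifier>
    <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
    <!-- LDAP connection and user/group search properties omitted -->
  </userGroupProvider>
  <userGroupProvider>
    <identifier>composite-configurable-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.CompositeConfigurableUserGroupProvider</class>
    <property name="Configurable User Group Provider">file-user-group-provider</property>
    <property name="User Group Provider 1">ldap-user-group-provider</property>
  </userGroupProvider>
  <accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">composite-configurable-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">freeipa</property>
  </accessPolicyProvider>
  <authorizer>
    <identifier>managed-authorizer</identifier>
    <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
    <property name="Access Policy Provider">file-access-policy-provider</property>
  </authorizer>
</authorizers>
```

Read bottom up, this mirrors the chain described above: managed-authorizer --> file-access-policy-provider --> composite-configurable-user-group-provider --> the two user-group-providers.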
09-16-2022
11:56 AM
@noekmc I was not clear that when you accessed the NiFi web address you were skipping the login window completely. This means that your browser provided an alternative method of client/user authentication.

When you access the NiFi web address, NiFi will always negotiate a mutual TLS handshake. This is necessary because it is how NiFi nodes authenticate with one another. If no other methods of client authentication have been configured, the mutual TLS handshake "REQUIRES" a client certificate. When other methods of authentication are configured in NiFi, the mutual TLS handshake will "WANT" a client certificate. If no client certificate is presented, then NiFi moves on to the next configured authentication method, which would be Spnego. Spnego-based authentication is enabled when the nifi.kerberos.spnego.principal and nifi.kerberos.spnego.keytab.location properties have been configured in the nifi.properties file. Make sure these two properties are clear to disable the Spnego auth challenge to your browser. If the Spnego auth challenge is not successful, NiFi moves on to the next auth method, such as a configured login provider like the ldap-provider you have set up.

The first step is figuring out which method (TLS client certificate or Spnego) is authenticating your user. Typically a browser will prompt you when either of these methods is invoked the first time. If you ack instead of cancel, the browser will remember that choice going forward. For TLS client auth to work, your browser must have a client certificate loaded into it that your NiFi's truststore file is capable of trusting. For Spnego to work, Spnego must be configured in your browser.

Step one:
- Open an incognito browser tab (it will not have any retained cookies that would automatically use a certificate or Spnego) and go to the NiFi UI address. Does it redirect you immediately to the login UI? If so, you now know one of these other methods is being used in your normal browser session.
- Clear the two Spnego properties if configured in the nifi.properties file (if they are already blank, then we know a TLS client certificate is what is being used).
- Clear browser cache and cookies. Access the NiFi UI address; when prompted by the browser for a certificate, cancel and you should get redirected to the login window. There is no configuration change that can be made in NiFi to stop a browser from prompting. However, your decision to cancel and continue to the URL without providing your certificate should be cached by your browser so it does not ask you each time afterwards.
- Try a different browser. While your certificate may be loaded in one browser, it may not be loaded in another. The same goes for Spnego; it may not be enabled in all browsers on your client.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
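A sketch of the two nifi.properties entries mentioned above, left blank so that no Spnego challenge is issued to the browser:

```properties
# nifi.properties -- both blank disables the Spnego auth challenge
nifi.kerberos.spnego.principal=
nifi.kerberos.spnego.keytab.location=
```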
09-15-2022
12:45 PM
@rafy The ERROR shared seems unrelated to the action of emptying a connection queue. Are you trying to delete a connection or empty a connection queue? Can you share screenshots of the actions being performed? Can you collect the full stack trace produced in the nifi-app.log when you perform the action? Thank you, Matt
09-15-2022
12:36 PM
@noekmc The UI you are seeing is telling you that your LDAP user credentials have been successfully authenticated; however, your user identity is not authorized within NiFi to "view the UI". NiFi Access Policies The ldap-provider configured in the login-identity-providers.xml handles the authentication process. The configuration within the authorizers.xml handles the authorizing of those authenticated user identities. You can tail the nifi-user.log while you log in to see the user identity that results from your successful authentication. You will then also see the "not authorized" log output naming the missing access policy. The following section of the Apache documentation can help with setting up authorization for the first time: multi-tenant-authorization If you were to share the log lines from your nifi-user.log specific to your login attempt, along with the contents of your authorizers.xml file, it may be easier to provide guidance on your setup. The multi-tenant authorization setup in the authorizers.xml has many configuration options and providers to choose from. The most basic setup would use the managed-authorizer with the file-access-policy-provider and file-user-group-provider. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
09-15-2022
12:23 PM
@leandrolinof Understand that NiFi wraps all ingested content into a NiFi FlowFile. A NiFi FlowFile consists of FlowFile metadata/attributes (key/value pairs written to the flowfile_repository; NiFi processors can be used to add attributes and modify existing ones) and FlowFile content (bytes written to a claim within the NiFi content_repository). Since NiFi is data format agnostic, the FlowFile model allows NiFi to ingest any data format. This does not mean that NiFi can read/edit all data formats; it is the responsibility of the individual processor to be designed/coded to operate against specific data types.

While NiFi does not have a purpose-built processor specifically for reading PDFs, it does ship with scripting processors that can execute custom script code to extract content from a PDF into FlowFile attributes. Reference examples:
https://gist.github.com/mattyb149/48e72a26d0f62f330e30
https://stackoverflow.com/questions/55169492/nifi-extract-from-pdf-to-text

I assume, since that directory contains multiple PDFs, each of those PDFs has a unique name? Once you have ingested your PDFs, the example Groovy script above may allow you to create attributes on your FlowFiles which you can later use via NiFi Expression Language statements to dynamically set configuration properties on the PutEmail processor, uniquely for each FlowFile it executes upon. When you look at a processor's documentation, each configuration property shows whether it supports NiFi Expression Language and to what extent:
Supports Expression Language: true (will be evaluated using flow file attributes)
This example states that NiFi Expression Language is supported and that you can use FlowFile attributes, which come from each FlowFile the processor is executing against.

If you can extract the needed text from each PDF into additional attributes on each FlowFile (one FlowFile created per ingested PDF), including the buyer name, then you can use those to dynamically define properties per FlowFile in the PutEmail processor.

If your desire is to consume every PDF only once, you should be using the ListFile --> FetchFile processors. The ListFile processor can be configured to maintain state so that the same files are not listed more than once. (ListFile must be configured for "primary node" execution only in a multi-node NiFi cluster setup to avoid data duplication.)

The message body of an email is expected to be ASCII text. The "Content Type" property of the PutEmail processor is expected to be set to the content's mime type (for a PDF file the mime.type would be "application/pdf"); however, I suspect the PutEmail processor may have issues with that mime.type. If that is the case, you'll need to send your PDFs as attachments to the email only.

I am not sure what the "buyer's ID" gets you. Is that buyer's ID in the PDF filename? How does the buyer's ID get you the buyer's email address to which you will send the PDF via PutEmail?

To answer your question about whether this is possible: I believe so. It is about collecting/creating the needed metadata/attributes from one or more sources and adding them to the FlowFile carrying the PDF content needed to successfully send your email. I hope the above gets you moving along that path. NiFi offers so many components that there is often more than one way to solve a use case. The first step is laying out, step by step, how you would solve this use case outside NiFi manually, and then translating each of those steps into an automated NiFi dataflow. Then you can narrow down to the specific step in that use case where further help is needed.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.
Thank you, Matt
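As a sketch of what the per-FlowFile PutEmail configuration could look like once the attributes exist (the attribute names buyer.email and buyer.name are hypothetical; they would be whatever names your extraction script writes):

```
To          : ${buyer.email}
Subject     : Invoice for ${buyer.name}
Attach File : true
Message     : Please find your PDF (${filename}) attached.
```

Because these properties support NiFi Expression Language, each FlowFile passing through the one PutEmail processor can produce a differently addressed email.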
09-14-2022
02:23 PM
@PratikParekh The PutEmail processor supports the addition of dynamic configuration properties as outlined in the linked documentation: https://javaee.github.io/javamail/docs/api/com/sun/mail/smtp/package-summary.html It would be helpful to collect the full stack trace written to the nifi-app.log rather than sharing the partial trace shown in the processor bulletin on the NiFi UI. From that stack trace output you may be able to determine what dynamic properties need to be added. If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
09-14-2022
02:10 PM
@leandrolinof I am having trouble understanding your query clearly. The PutEmail processor ONLY supports taking the content of the source FlowFile and adding it as the email message or as an attachment to the email. The PutEmail processor is not involved at all with how the content was added or modified in the source FlowFile that comes from the upstream ExecuteSQL processor. So what exactly are you trying to pass through the PutEmail processor? The BASE64_ENCODE output you are retrieving from Oracle? You could write that to a FlowFile attribute. The "Message" configuration property of the PutEmail processor, like many others, supports NiFi Expression Language (NEL). That means you could read anything you have placed in the FlowFile's attributes and insert it into the email message body. If you can provide more detail, it may be helpful. It sounds like you want to send an email with a PDF file attachment, but it also sounds like you haven't ingested the PDF into NiFi yet. Thank you, Matt
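To illustrate what the Oracle BASE64_ENCODE output represents, here is a standalone Python sketch of the Base64 round trip (the sample bytes are made up; in a NiFi flow the decode step could be handled by NiFi's Base64EncodeContent processor in decode mode rather than a script):

```python
import base64

# Hypothetical PDF bytes, standing in for the file stored in Oracle.
pdf_bytes = b"%PDF-1.4 example content"

# What BASE64_ENCODE-style output looks like: plain ASCII text that is
# safe to carry in a query result or a FlowFile attribute.
encoded = base64.b64encode(pdf_bytes).decode("ascii")

# Decoding recovers the original binary content exactly.
decoded = base64.b64decode(encoded)

print(decoded == pdf_bytes)  # True
```

The point is that the Base64 string is only a text encoding of the binary file; to attach a real PDF with PutEmail, the FlowFile content must hold the decoded bytes, not the Base64 text.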
09-14-2022
01:55 PM
@quimic Welcome to NiFi. The PutFile processor does not produce new content, so I am not clear on "generating 2 files instead of one (one file with ~7000 lines, the other with ~3000 lines)". What this sounds like is that you have two different NiFi FlowFiles (one with 7000 lines of content and another with 3000 lines of content) being passed to your PutFile processor. The PutFile processor is then just writing those two files to the configured directory path. I am guessing that you wanted your MergeContent processor to produce one FlowFile with 10,000 lines? If so, that requires a change to the configuration of your MergeContent processor. Are you always merging 10,000 FlowFiles into 1 FlowFile? Please share some more details along with the configuration of your MergeContent processor.

With NiFi you need to understand that each processor operates independently of the processor before or after it. So the processor before your MergeContent will be moving FlowFiles into its outbound connection, feeding the MergeContent, at the same time MergeContent is executing. This means that not all 10,000 source FlowFiles may be in that connection when MergeContent starts allocating FlowFiles to a bin. If a bin meets the minimum merge criteria at the completion of bin execution, it gets merged; MergeContent then executes again and picks up the new FlowFiles added to that connection since the last execution. Properties such as the Minimum/Maximum Number of Entries, Minimum/Maximum Group Size, and Max Bin Age control when FlowFiles allocated to a bin get merged.

You may also find these articles helpful when working with MergeContent processors:
https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-usage-and/ta-p/248166
https://community.cloudera.com/t5/Community-Articles/How-to-address-JVM-OutOfMemory-errors-in-NiFi/ta-p/244431

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped. Thank you, Matt
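To see why a merge can fire before all 10,000 FlowFiles have arrived, here is a simplified standalone sketch of the bin-completion decision. This is an illustration, not NiFi's actual code; the parameter defaults are invented for the example:

```python
def bin_is_ready(entry_count, bin_age_seconds,
                 min_entries=10000, max_bin_age_seconds=300):
    """Simplified sketch of MergeContent bin completion: a bin merges once
    it holds the minimum number of entries, or once it has been waiting
    longer than the configured Max Bin Age."""
    return entry_count >= min_entries or bin_age_seconds > max_bin_age_seconds

# Only 7000 FlowFiles had reached the connection when the bin aged out,
# so the bin merged early; the remaining 3000 arrived later and were
# merged into a second output FlowFile.
print(bin_is_ready(7000, 301))   # merged by age
print(bin_is_ready(3000, 301))   # merged by age
print(bin_is_ready(10000, 0))    # merged by entry count
print(bin_is_ready(7000, 10))    # still binning, waits for more FlowFiles
```

Setting a Minimum Number of Entries of 10,000 with a generous Max Bin Age is one way to force a single 10,000-line output, at the cost of waiting for all source FlowFiles to arrive.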