Member since: 07-30-2019
Posts: 3467
Kudos Received: 1641
Solutions: 1016
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 131 | 05-04-2026 05:20 AM |
|  | 437 | 03-23-2026 05:44 AM |
|  | 328 | 02-18-2026 09:59 AM |
|  | 576 | 01-27-2026 12:46 PM |
|  | 1009 | 01-20-2026 05:42 AM |
07-09-2020
05:42 AM
@Ben37 The GetHDFSFileInfo processor still only produced one single output FlowFile after making the change? I'd expect to see a separate FlowFile for each sub-directory found as well as each file found within those directories. This is where the RouteOnAttribute processor I mentioned would be used to drop any FlowFiles specific to just a "directory" and not a specific "file". Only the FlowFiles specific to "files" would then be sent on to your FetchHDFS for content ingestion.

Matt
07-09-2020
05:24 AM
@JohnA The latest versions of NiFi-Registry introduced public buckets. When accessing the NiFi-Registry's https URL without presenting either a client certificate or responding to a spnego auth challenge (provided spnego properties are configured), the user will have access to the public buckets as the anonymous user. If there are no public buckets, the user will simply see the NiFi-Registry UI with nothing displayed they can use. To access the non-public buckets and any other capability requiring authorization, the user/client needs to authenticate. If a login provider is set up, a login option will be available in the upper right corner which a user can click to "login".

When it comes to NiFi interacting with NiFi-Registry, the mutual TLS handshake must be successful; otherwise, the connection will happen as anonymous since no user authentication occurred. You should not be authorizing the anonymous user, but instead fixing the mutual TLS handshake between your NiFi and NiFi-Registry. The keystore and truststore configured in the nifi.properties and nifi-registry.properties files are what facilitate that handshake. Since NiFi acts as the client in this handshake, the PrivateKeyEntry in its keystore must support the clientAuth EKU, and its truststore must contain the complete trust chain for the server certificate being presented by NiFi-Registry. The truststore used by NiFi-Registry must contain the complete trust chain for the certificate being presented by the client (NiFi). The NiFi certificate's authenticated string must then be authorized to read all buckets and to proxy within NiFi-Registry. Additionally, any user initiating version control actions from within NiFi must also be authorized within NiFi-Registry for any buckets they need access to.

Hope this helps,
Matt
07-08-2020
06:09 AM
@sgk The error you are seeing has nothing to do with authorization at this point. It is thrown during authentication of your user, so your focus should be on your ldap-provider configuration, since it handles the authentication of your user. "The Supplied Username or Password are not valid" indicates that the LDAP search returned no results or the password used was wrong.

Observations:

1. Are you using LDAP or Active Directory (AD)? I see you have set "User Search Filter" to "sAMAccountName={0}". sAMAccountName is more commonly seen in AD than in LDAP. Did you try the ldapsearch command from a terminal window on your NiFi server to make sure you can return a listing for your user using this search filter?

ldapsearch -x -H ldap://<ldap-hostname/IP>:<ldap-port> -D "<Manager DN>" -w "<Manager password>" -b "<user search base>" "sAMAccountName=<username>"

2. Not that this affects successful authentication, but I see you have set "Identity Strategy" to "USE_DN", which uses the user's full DN from LDAP to identify that user during authorization actions following successful authentication. If you set this to "USE_USERNAME", the user string typed at login will be used instead.

3. This also has nothing to do with authentication, but I see you are using "CN=localhost, OU=NiFi" as your "node identity 1" value. Using localhost in your node certificates is not advisable; this should be set to a unique value. Also keep in mind that the keystore used by NiFi must meet the following minimum requirements:
- Contain only 1 "PrivateKeyEntry"
- The "PrivateKeyEntry" must support both clientAuth and serverAuth ExtendedKeyUsage (EKU)
- The "PrivateKeyEntry" must contain at least 1 SubjectAlternativeName (SAN) that matches the hostname of the server on which the certificate is being used

Hope this information helps you progress with your authentication and then authorization setup in NiFi.

Matt
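As a quick sanity check for those keystore requirements, here is a sketch using OpenSSL (1.1.1+ for the -addext flag) that generates a throwaway certificate carrying both EKUs and a SAN, then prints it for inspection. The hostname and file names are illustrative; for an actual JKS keystore you would inspect the real entry with "keytool -list -v -keystore keystore.jks" instead:

```shell
# Sketch only: create a throwaway self-signed cert with the EKUs and
# SAN NiFi requires, then print its text form so the extensions can be
# verified by eye.
make_and_check_cert() {
  dir="$1"; host="$2"   # illustrative parameters
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=$host" \
    -addext "subjectAltName=DNS:$host" \
    -addext "extendedKeyUsage=clientAuth,serverAuth" \
    -keyout "$dir/key.pem" -out "$dir/cert.pem" 2>/dev/null
  # Look for "TLS Web Client Authentication, TLS Web Server
  # Authentication" and the DNS SAN in this output:
  openssl x509 -in "$dir/cert.pem" -noout -text
}
```

A certificate missing either EKU or a matching SAN is the usual culprit behind handshake and identity failures like this one.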
07-08-2020
05:32 AM
1 Kudo
@Ben37 The GetHDFSFileInfo processor will produce one output FlowFile containing a complete listing of all files/directories found, based upon the configured "Full path" and "Recurse Subdirectories" properties, when the "Group Results" property is set to "All". Since you want a single FlowFile for each object listed from HDFS, set the "Group Results" property to "None". You should then see a separate FlowFile produced for each object found. Then, in your FetchHDFS processor, set the "HDFS Filename" property to "${hdfs.path}/${hdfs.objectName}".

You may also find that you need to insert a RouteOnAttribute processor between your GetHDFSFileInfo and FetchHDFS processors to route out any FlowFiles produced by GetHDFSFileInfo that are for directory objects only (not a file). You simply add a dynamic property to route any FlowFile that has the attribute "hdfs.type" set to "file" on to the FetchHDFS processor, and send all other FlowFiles to the unmatched relationship, which you can auto-terminate.

Other things to consider:

1. Keep in mind that the GetHDFSFileInfo processor does not maintain any state, so every time it executes it will list all files/directories from the target regardless of whether they were listed before. The ListHDFS processor, by contrast, uses state.

2. If you are running your dataflow in a NiFi multi-node cluster, every node in your cluster will perform the same listing (which may not be what you want). If you only want the target files/directories listed by one node, configure the GetHDFSFileInfo processor to execute on "Primary node" only (from the processor's "Scheduling" tab). You can then use load-balancing configuration on the connection out of the GetHDFSFileInfo processor to redistribute the produced FlowFiles across all nodes in your cluster before they are processed by the FetchHDFS processor.

Hope this helps,
Matt
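As a sketch, the properties described above could look like the following (the dynamic property name "file" is just an illustrative choice; the hdfs.* attributes are the ones GetHDFSFileInfo writes):

```
# RouteOnAttribute -- add one dynamic property; FlowFiles matching it
# follow the "file" relationship to FetchHDFS, everything else goes to
# "unmatched" (auto-terminate):
file = ${hdfs.type:equals('file')}

# FetchHDFS:
HDFS Filename = ${hdfs.path}/${hdfs.objectName}
```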
07-07-2020
05:31 AM
1 Kudo
@venkii What version of NiFi are you running? There is a set of bugs identified here that are likely related to the issue you described:

https://issues.apache.org/jira/browse/NIFI-5948
https://issues.apache.org/jira/browse/NIFI-6020
https://issues.apache.org/jira/browse/NIFI-6027

The good news is that these have all since been resolved. I would recommend upgrading your NiFi to version 1.10 or newer, HDF 3.4.1.1 or newer, or CFM 1.0.1 or newer.

As a workaround, you could shut down your NiFi, search the users.xml and authorizations.xml files for the uuid "938eb61e-bbc4-383a-8475-aee80541b5a5", and remove all references to it. You would then need to make the exact same changes to the users.xml and authorizations.xml files on every other node in your NiFi cluster, or copy the corrected files from one node to the others.

Hope this helps,
Matt
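A minimal sketch of that search step (the conf path is illustrative; NiFi should be stopped and both files backed up before editing anything):

```shell
# List every line that references the orphaned uuid so the matching
# <user>/<policy> entries can then be removed by hand.
find_orphaned_uuid() {
  uuid="$1"; conf="$2"   # e.g. /opt/nifi/conf (illustrative path)
  grep -n "$uuid" "$conf/users.xml" "$conf/authorizations.xml"
}

# Usage sketch:
# find_orphaned_uuid 938eb61e-bbc4-383a-8475-aee80541b5a5 /opt/nifi/conf
```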
04-30-2020
08:28 AM
1 Kudo
@Logann NiFi does not offer local user creation for authentication. There is no way to create local users and assign them passwords for the purpose of user authentication. User authentication requires one of the following:

1. User certificates (always requested by NiFi during the TLS handshake)
2. Spnego auth (a spnego auth challenge is sent to the browser if spnego properties are configured in nifi.properties; this request is only sent if 1 did not result in a client certificate in the response from the client)
3. A configured login provider (uses the login provider configured in login-identity-providers.xml and referenced in the nifi.properties file; only used if neither 1 nor 2 provided client/user authentication already)
4. NiFi will also support other OpenID Connect supported authentication providers.

Hope this helps,
Matt
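For option 3, here is a hedged sketch of what an ldap-provider entry in login-identity-providers.xml might look like; the hostnames, DNs, and password are illustrative placeholders, and the property names follow the NiFi Admin Guide:

```
<provider>
    <identifier>ldap-provider</identifier>
    <class>org.apache.nifi.ldap.LdapProvider</class>
    <property name="Authentication Strategy">SIMPLE</property>
    <property name="Manager DN">cn=admin,dc=example,dc=org</property>
    <property name="Manager Password">change-me</property>
    <property name="Url">ldap://ldap.example.org:389</property>
    <property name="User Search Base">ou=users,dc=example,dc=org</property>
    <property name="User Search Filter">uid={0}</property>
    <property name="Identity Strategy">USE_USERNAME</property>
    <property name="Authentication Expiration">12 hours</property>
</provider>
```

The provider is then enabled by referencing its identifier from nifi.properties via nifi.security.user.login.identity.provider=ldap-provider.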
04-30-2020
05:29 AM
2 Kudos
@abhinav_joshi This is something that affects the Apache NiFi releases only. The removal of some nars is documented in the Apache NiFi migration guidance wiki here: https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance

"Starting in 1.10, the following nars were removed from the default convenience binary. These include kite-nar, kafka-0-8-nar, flume-nar, media-nar, druid-controller-service-api-nar, druid-nar, other-graph-services-nar. You can still get them from the various artifact repositories and use them in your flows but we cannot bundle them due to space limitations by default."

For Apache NiFi users, I recommend creating a custom NiFi lib directory by adding the following property to the nifi.properties file:

nifi.nar.library.directory.<unique-string>=/<path to>/custom-lib1

for example:

nifi.nar.library.directory.NiFi-1-10-nars=/nars/nifi-1-10-lib

You can add as many custom lib directories as you want. Place any nars noted as removed in the migration guidance documentation into one of these custom lib paths before upgrading, and add the above property to your nifi.properties file as part of your upgrade process. If additional nar bundles are deprecated later, you can create another custom lib dir or just add those nars to your existing custom lib directory before upgrading. This allows you to avoid the downtime you encountered.

The Cloudera releases of HDF and CFM are currently built with all the nars by default, as they are not affected by space limitations.

Hope this helps,
Matt
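That preparation step could be sketched as follows (all paths and the property suffix are illustrative choices, not fixed names):

```shell
# Copy the removed nars into a custom lib directory and register that
# directory in nifi.properties before starting the upgraded NiFi.
stage_custom_nars() {
  nar_src="$1"; custom_lib="$2"; nifi_props="$3"
  mkdir -p "$custom_lib"
  cp "$nar_src"/*.nar "$custom_lib"/
  echo "nifi.nar.library.directory.custom-nars=$custom_lib" >> "$nifi_props"
}

# Usage sketch:
# stage_custom_nars /tmp/removed-nars /nars/nifi-1-10-lib /opt/nifi/conf/nifi.properties
```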
04-15-2020
07:54 AM
2 Kudos
@memad If your GetFile processor is consuming files before they have finished writing, there are a few changes that may help:

1. How are files being written into the directory? The default "File Filter" will ignore files that start with a ".". If it is possible to change how files are written to the directory, a file filter will solve your issue. For example, write new files to the directory as ".<filename>" and, upon successful write, rename the file to remove the dot (many file transfer tools deliver files this way). You can of course set up any file filter that works for you.

2. Assuming the process that writes files to the directory keeps updating the timestamp on the file, you can use the "Minimum File Age" property to prevent GetFile from consuming a file until its last modified timestamp has not changed for the configured amount of time. This works in most cases, except when there are long pauses in the write process that exceed the configured Minimum File Age.

Hope this helps,
Matt
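The dot-prefix pattern from point 1 can be sketched like this (the function name and paths are illustrative):

```shell
# Deliver a file into a GetFile watch directory without it ever being
# visible half-written: copy it under a dotted name (which GetFile's
# default File Filter ignores), then rename once the copy completes.
safe_deliver() {
  src="$1"; dest_dir="$2"
  name=$(basename "$src")
  cp "$src" "$dest_dir/.$name"             # hidden while writing
  mv "$dest_dir/.$name" "$dest_dir/$name"  # rename exposes the finished file
}
```

The rename is a metadata-only operation on the same filesystem, so GetFile can never pick up a partial file.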
03-30-2020
01:20 PM
1 Kudo
@venkii You need to log in to your secured NiFi-Registry and make sure all your NiFi nodes have been authorized for both of the following "Special Privileges":

1. "Read" for "Can Manage Buckets"
2. "Can proxy user requests"

Click the wrench icon in the upper right corner to manage your users in NiFi-Registry. Then find your NiFi nodes in the list of users and click the "manage user" pencil icon on the far right side.

Hope this helps,
Matt
03-27-2020
01:39 PM
1 Kudo
@Petr_Simik No matter which processor you are looking at, the stats presented tell you the same information:

In <-- How many FlowFiles were processed from one or more inbound connections over the last rolling 5-minute window. With this processor, you have configured the "wait mode" to leave the FlowFile on the inbound connection, so the processor looks at the same FlowFile over and over again until the configured expiration time has elapsed.

Read/Write <-- How much FlowFile content was read from or written to the NiFi content repository (helps identify processors that may be disk I/O heavy).

Out <-- How many FlowFiles have been released to an outbound connection over the last rolling 5-minute window. Here you see a number that reflects only those FlowFiles that expired and were sent to your outbound expired connection.

Tasks/Time <-- How many threads this processor completed over the last rolling 5 minutes and the total cumulative time those threads consumed from the CPU (helps identify processors that consume lots of CPU time).

So the stats you are seeing are not surprising. While this processor works for your use case, it has the overhead of connecting to a distributed map cache on every execution against an inbound FlowFile. If your intent is only to delay a FlowFile for 1 second before it proceeds down the flow path, a better solution may be an UpdateAttribute processor that creates an attribute holding the current time, followed by a RouteOnAttribute processor that checks whether that recorded time plus 1000 ms is less than the current time, looping the check until it matches.

Hope this helps,
Matt
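A sketch of that UpdateAttribute/RouteOnAttribute alternative using NiFi Expression Language (the attribute and property names here are illustrative choices):

```
# UpdateAttribute -- record the release time when the FlowFile arrives:
release.at = ${now():toNumber():plus(1000)}

# RouteOnAttribute -- dynamic property; route "delayed" onward, and loop
# the "unmatched" relationship back to this processor until it matches:
delayed = ${now():toNumber():gt(${release.at})}
```

This keeps the delay entirely in FlowFile attributes, with no controller service or cache lookups involved.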