Member since: 07-30-2019 · Posts: 3400 · Kudos Received: 1621 · Solutions: 1002
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 188 | 12-03-2025 10:21 AM |
|  | 508 | 11-05-2025 11:01 AM |
|  | 381 | 11-05-2025 08:01 AM |
|  | 649 | 11-04-2025 10:16 AM |
|  | 770 | 10-20-2025 06:29 AM |
04-06-2017
12:30 PM
@Ahmad Debbas The GetHDFS processor is deprecated in favor of the ListHDFS and FetchHDFS processors. GetHDFS does not retain state, so as you noted it starts over from the beginning when an error occurs. ListHDFS does maintain state, so even through NiFi restarts or processor restarts the listing picks up where it left off. The zero-byte FlowFiles it produces are then passed to a FetchHDFS processor, which actually retrieves the content and inserts it into the existing FlowFile. Another advantage of the list/fetch design model is the ability to distribute those listed zero-byte FlowFiles across a NiFi cluster before fetching the content. This improves performance by reducing the resource strain GetHDFS places on a single NiFi node. Thanks, Matt
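The value of stateful listing can be sketched conceptually. This is a rough Python analogy only, not NiFi's actual implementation; the function and state keys are invented for illustration (NiFi persists ListHDFS state via its state manager):

```python
# Conceptual analogy: a stateful lister remembers the latest
# modification time it has emitted, so a restart resumes rather
# than re-listing everything (the core ListHDFS behavior).

def list_new_files(files, state):
    """Return names of files modified after the last recorded
    timestamp, then advance the stored timestamp (the 'state')."""
    last_seen = state.get("last_mtime", 0)
    new = [(name, mtime) for name, mtime in files if mtime > last_seen]
    if new:
        state["last_mtime"] = max(mtime for _, mtime in new)
    return [name for name, _ in new]

state = {}  # in NiFi this survives restarts; here it is just a dict
batch1 = [("a.txt", 100), ("b.txt", 200)]
print(list_new_files(batch1, state))  # both files are new

# A later listing with the same state does not re-emit old files:
batch2 = [("a.txt", 100), ("b.txt", 200), ("c.txt", 300)]
print(list_new_files(batch2, state))  # only c.txt is new
```

A stateless GetHDFS, by contrast, would behave like calling this function with an empty `state` every time: every file re-appears after every restart.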
04-04-2017
01:33 PM
4 Kudos
@Pushkara Ravindra The intent of the Site-To-Site (S2S) protocol is to allow the exchange of NiFi FlowFiles between NiFi instances. A NiFi FlowFile consists of two parts:

1. FlowFile content: the original content in whatever format (NiFi is data agnostic and has no data format dependency)
2. FlowFile attributes: a collection of key/value pairs (some are assigned by NiFi by default, while others are added via processors)

Sending FlowFiles between NiFi instances allows the originating NiFi to share the attributes it knows about a FlowFile's content with the target NiFi instance. The FlowFile attributes are loaded into the FlowFile repository of the target NiFi automatically. In addition, S2S provides automatic, smart load-balancing of FlowFiles to a target NiFi cluster, and allows the target NiFi cluster to scale up or down without the client needing to change anything.

How it all works: the source/client NiFi instance/cluster adds a Remote Process Group (RPG) to its canvas and configures it to point at the URL of any target/destination NiFi instance or cluster node. The communication at this point is over the HTTP protocol. Once a connection is established, the destination NiFi sends S2S details back to the source NiFi (including the URLs of the nodes if the destination is a cluster, and the current load of each node). The RPG continuously updates this information and stores a local copy in case it cannot get an update at some point. Input and output ports send or receive FlowFiles from the parent process group in which they were added, so when input or output ports are added at the root canvas level of a dataflow they become "remote" input and output ports capable of sending or receiving data from another NiFi. Whether you set the S2S protocol to HTTP or RAW, all of the above is true. What differs is what happens next (the actual FlowFile transfer).
When using the RAW transport (socket-based transfer), the "nifi.remote.input.host" and "nifi.remote.input.socket.port" values configured on each of the target NiFi instances are used by the NiFi client as the destination for sending FlowFiles. When using the HTTP transport, the "nifi.remote.input.host" and the "nifi.web.http.port" or "nifi.web.https.port" values are used instead. The advantage of RAW is that there is a dedicated port for all S2S transfers, so under high load its effect on the NiFi HTTP interface is minimal. The advantage of HTTP is that you do not need to open an additional S2S port, since the same HTTP/HTTPS port is used to transfer FlowFiles. Thanks, Matt
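As a rough illustration, the relevant nifi.properties entries on each destination node might look like the following. The property names are the ones discussed above; the host and port values are examples only:

```properties
# RAW transport: a dedicated socket is opened for S2S FlowFile transfer
nifi.remote.input.host=nifi-node1.example.com
nifi.remote.input.socket.port=10443

# HTTP transport: S2S reuses the normal web port instead
nifi.web.http.port=8080
# or, when the instance is secured:
# nifi.web.https.port=9443
```

The client's RPG learns these values from the destination during the initial HTTP handshake, so they only need to be set on the receiving side.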
03-30-2017
12:07 PM
1 Kudo
@Praveen Singh You could install sshpass, which would allow you to use a password in the ssh connection, but I strongly recommend against this approach. It requires you to put your password in plaintext in your NiFi processor configuration, which exposes it to anyone who has access to view that component. Thanks, Matt
03-29-2017
12:57 PM
@Bram Klinkenberg The "Roles" noted above are only valid for use in the older Apache NiFi 0.x baseline; they were part of the authorized-users.xml file used in that baseline. The Apache NiFi 1.x baseline added support for multi-tenancy and granular access control via access policies. It is an entirely new authorization method and uses different files; there is no notion of roles in NiFi 1.x. The authorizers.xml file allows you to specify a legacy authorized-users.xml file in place of configuring an "Initial Admin Identity" simply to make it easy for users of NiFi 0.x to port their existing users over to NiFi 1.x. Matt
03-29-2017
12:41 PM
1 Kudo
@Bram Klinkenberg The users.xml and authorizations.xml files are generated for you the first time NiFi is started after being secured. Initially they are populated using the configuration from the authorizers.xml file. In that file you specified an "Initial Admin Identity" (assuming you used CN=admin). As a result, a user (CN=admin) was added to the users.xml file, and the relevant "admin" access policies were assigned to that user in the authorizations.xml file. At this point your user (CN=admin) should be able to access the NiFi UI.

The admin can then use the NiFi UI to add additional users and authorize them for various access policies. Keep in mind that adding "Users" within NiFi has nothing to do with user authentication; the users you add here are for authorization to NiFi resources only. User authentication must occur first and can be accomplished using user-issued certificates loaded in the browser, Kerberos, or LDAP.

Access policies exist at two levels: global access policies and component-level access policies (on processors, process groups, and other things on the canvas). Some component-level access policies are only available to specific components; if the currently selected component does not support a policy, it will be greyed out in the list. More detail on the various access policies can be found in the admin guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#config-users-access-policies Thank you, Matt
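For reference, a freshly generated users.xml entry for the initial admin looks roughly like this. This is an illustrative sketch only: the identifier shown is a made-up UUID, and the exact structure can vary between NiFi 1.x versions:

```xml
<!-- users.xml sketch: the Initial Admin Identity recorded as a user.
     NiFi generates a random UUID identifier for each user. -->
<tenants>
  <groups/>
  <users>
    <user identifier="11111111-2222-3333-4444-555555555555"
          identity="CN=admin"/>
  </users>
</tenants>
```

The authorizations.xml file then references that same identifier when it grants the admin-level policies, which is why editing these files by hand is error-prone and the UI is the recommended way to manage users and policies.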
03-28-2017
06:43 PM
@Emmanouil Petsanis Was there some error or condition that occurred prior to this issue (out of disk space, repo corruption, etc.)? Do you run into the same issue if you switch to a newer release version?
The latest HDF release is HDF 2.1.2 http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.2/index.html Thanks, Matt
03-28-2017
06:33 PM
@Bram Klinkenberg Glad to hear it is resolved. If this answer provided what you needed to resolve your issue, please accept the answer. Thank you,
Matt
03-28-2017
04:23 PM
@Anishkumar Valsalam No problem, everyone starts somewhere. Keep in mind that in a cluster every node runs the same dataflow. Node 1 has no idea what node 2 is doing and vice versa, so by default all 3 nodes in your cluster run the above dataflow, and each may perform slightly differently. When looking at the UI of any one node in your NiFi cluster, the stats shown are the cumulative stats for all nodes in your cluster. You should not assume the numbers will always divide evenly between your connected nodes.

When you make a request in the UI, that request must be replicated to all nodes in your cluster. So imagine a request to start or stop your GenerateFlowFile processor: that processor may be started or stopped at slightly different moments in time on each node. Considering the rate at which it produces your small test files, I would not expect the numbers to be the same. In addition, other very small differences can affect each node differently: what other processes, services, or OS-level tasks happen to run on one node and not another, which node is the cluster coordinator (it does extra work), and so on. While in the big picture the impact on overall performance is negligible, with this simple flow you can see some differences.

You can right-click on a processor and select "Status History" to open a graph that shows various stats per node. The different stats are in a pull-down menu in the upper right corner of the Status History window. The blue line shows cumulative values (the same as what is shown on the processor), and there is a different colored line for each node.

Some suggestions for using this forum: 1. Try to keep one question per post; you tend to get better responses that way. (This question is related, so you are good there.) 2. If an answer gives you what you were looking for, accept that answer so it benefits others using this forum. Thank you,
Matt
03-28-2017
03:27 PM
4 Kudos
@Nikhil Chaudhary Encryption of values in the NiFi variable registry is not available yet; it is a future goal in Apache NiFi. There is an existing Apache Jira that tracks adding this capability: https://issues.apache.org/jira/browse/NIFI-2653 Thanks, Matt
03-28-2017
03:24 PM
@Anishkumar Valsalam