Member since: 07-30-2019
Posts: 3118
Kudos Received: 1558
Solutions: 907

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 139 | 12-13-2024 10:58 AM |
| | 293 | 12-05-2024 06:38 AM |
| | 235 | 11-22-2024 05:50 AM |
| | 212 | 11-19-2024 10:30 AM |
| | 193 | 11-14-2024 01:03 PM |
08-30-2016
08:54 PM
@Saikrishna Tarapareddy
Just want to make sure I understand completely.
You can establish a connection from your local machine out to your remote NiFi; however, your remote NiFi cannot connect to your local machine. Correct?
In this case you would install a NiFi instance on your local machine, and the Remote Process Group (RPG) would be added to the canvas of that local NiFi instance. The NiFi instance running the RPG acts as the client in the connection between NiFi instances. On your remote NiFi instance, the dataflow that is fetching files from your HDFS would need to route those files to an output port located at the root canvas level. (Output and input ports allow FlowFiles to transfer up one level in the dataflow, so at the root level they allow you to interface with another NiFi.)
For this transfer to work, your local instance of NiFi needs to be able to communicate with the HTTP(S) port of your remote NiFi instance (the NCM's HTTP(S) port if the remote is a NiFi cluster). Your local instance also needs to be able to communicate with the configured Site-To-Site (S2S) port on your remote instance (with the S2S port on every node, if the remote is a NiFi cluster). The S2S port is configured in the nifi.properties file:

# Site to Site properties
nifi.remote.input.socket.host=<remote instance FQDN>
nifi.remote.input.socket.port=<S2S port number>

The dataflow on your remote NiFi would look something like this: [screenshot]
The dataflow on your local NiFi would look something like this: [screenshot]

As you can see, in this setup the local NiFi establishes the connection to the remote NiFi and pulls the data from the output port "outLocal".
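For example, a quick way to check from your local machine that both the remote HTTP(S) port and the S2S port are reachable (a minimal sketch in Python; the hostname and port numbers are placeholders you would swap for your own values):

```python
import socket

# Placeholder values -- substitute your remote NiFi's FQDN and the two ports in question.
REMOTE_HOST = "remote-nifi.example.com"
PORTS = {"HTTP(S)": 8080, "Site-To-Site": 10000}

for name, port in PORTS.items():
    try:
        # A successful TCP connection means the port is reachable from this machine.
        with socket.create_connection((REMOTE_HOST, port), timeout=5):
            print(f"{name} port {port}: reachable")
    except OSError as err:
        print(f"{name} port {port}: NOT reachable ({err})")
```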
Thanks,
Matt
08-29-2016
09:01 PM
@Saikrishna Tarapareddy Your regex above says the CSV file content must start with Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood
So it should not route to "Header" unless the CSV starts with that; whatever is found later in the CSV file should not matter. I tried this and it seems to work as expected. If I removed the '^', then all files matched. Your processor is also loading 1 MB worth of the CSV content for evaluation; however, the string you are searching for is far fewer bytes. If you only want to match against the first line, reduce the size of the buffer from '1 MB' to maybe '60 B'. When I changed the buffer to '60 B' and removed the '^' from the regex above, only the files with the matching header were routed to "Header".
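To illustrate what the '^' anchor changes, outside of NiFi (a minimal sketch; the sample file contents are made up):

```python
import re

anchored = re.compile(r"^Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood")
unanchored = re.compile(r"Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood")

# One file that starts with the header, one where the header only appears later.
starts_with_header = "Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood\nTag1,2016-08-29,42,Good,OK,100"
header_buried_later = "some,other,columns\nTagname,Timestamp,Value,Quality,QualityDetail,PercentGood"

print(bool(anchored.search(starts_with_header)))    # True  -> routed to "Header"
print(bool(anchored.search(header_buried_later)))   # False -> unmatched
print(bool(unanchored.search(header_buried_later))) # True  -> everything matches without '^'
```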
Thanks, Matt
08-29-2016
06:47 PM
2 Kudos
@Saikrishna Tarapareddy The MergeContent processor is not designed to look at the content of the NiFi FlowFiles it is merging. What you will want to do first is use a RouteOnContent processor to route only those FlowFiles whose content contains the header you want to merge on. The 'unmatched' FlowFiles could then be routed elsewhere or auto-terminated.
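Conceptually, the routing step does something like this (a minimal sketch in plain Python, not NiFi code; the input directory name is made up and the header value is taken from earlier in this thread):

```python
from pathlib import Path

EXPECTED_HEADER = "Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood"

matched, unmatched = [], []
for path in Path("incoming").glob("*.csv"):   # hypothetical input directory
    with path.open() as f:
        first_line = f.readline().rstrip("\n")
    (matched if first_line == EXPECTED_HEADER else unmatched).append(path)

# 'matched' files would continue on to MergeContent; 'unmatched' are routed elsewhere or dropped.
print(f"to merge: {len(matched)}, unmatched: {len(unmatched)}")
```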
Thanks, Matt
08-26-2016
12:00 PM
3 Kudos
@kishore sanchina NiFi only supports user-controlled access when it is configured to run securely over HTTPS. The HTTPS configuration of NiFi requires that a keystore and truststore be created/provided. If you don't have a corporately provided PKI infrastructure that can supply you with TLS certificates for this purpose, you can create your own. The following HCC article will walk you through manually creating your own: https://community.hortonworks.com/articles/17293/how-to-create-user-generated-keys-for-securing-nif.html

Once your NiFi is set up securely, you will need to enable user access to the UI. There are two parts to successful access:

1. User authentication <-- This can be accomplished via TLS certificates, LDAP, or Kerberos. Setting up NiFi to use one of these login identity providers is covered here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#user-authentication

2. User authorization <-- This is accomplished in NiFi via the authorized-users.xml file. The process is documented here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#controlling-levels-of-access

You will need to manually populate the authorized-users.xml file with your first "Admin" role user. That Admin user will then be able to approve access for other users who have passed the authentication phase and submitted a UI-based authorization request.
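As one illustration of the certificate-based path, once NiFi is running over HTTPS you could check access from a small script like this (a minimal sketch using the Python requests library; the host, port, and file paths are placeholders for your own secured instance):

```python
import requests

# Placeholders: your secured NiFi URL, your client certificate/key, and the CA that signed NiFi's server cert.
NIFI_URL = "https://nifi.example.com:9443/nifi"
CLIENT_CERT = ("/path/to/admin-cert.pem", "/path/to/admin-key.pem")
CA_BUNDLE = "/path/to/ca-cert.pem"

# A successful response means the TLS handshake and certificate-based authentication worked;
# what the user may then do is governed by the authorization configuration (authorized-users.xml).
resp = requests.get(NIFI_URL, cert=CLIENT_CERT, verify=CA_BUNDLE, timeout=10)
print(resp.status_code)
```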
Thanks,
Matt
08-25-2016
08:41 PM
1 Kudo
@INDRANIL ROY
NiFi does not distribute the processing of a single file across multiple nodes in a NiFi cluster. Each node works on its own set of files. The nodes themselves are not even aware other nodes exist; they work on the files they have and report their health and status back to the NiFi Cluster Manager (NCM).

1. What format is this file in?
2. What kind of processing are you trying to do against this file's content?
3. Can the file be split into numerous smaller files? (Depending on the file content, NiFi may be able to do the splitting.)

As an example: a common dataflow involves processing very large log files. The large log file is processed by the SplitText processor to produce many smaller files. These smaller files are then distributed across a cluster of NiFi nodes where the remainder of the processing is performed. There are a variety of pre-existing "split" type processors.
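Outside of NiFi, the splitting idea looks roughly like this (a minimal sketch; the file name and lines-per-split value are made up, and inside NiFi the SplitText processor does this for you):

```python
from itertools import islice

LINES_PER_SPLIT = 100_000  # roughly analogous to SplitText's line-count setting

def split_file(source: str, prefix: str = "split") -> int:
    """Split a large text file into smaller files of LINES_PER_SPLIT lines each."""
    count = 0
    with open(source) as src:
        while True:
            chunk = list(islice(src, LINES_PER_SPLIT))
            if not chunk:
                break
            with open(f"{prefix}_{count:05d}.log", "w") as out:
                out.writelines(chunk)
            count += 1
    return count

# Hypothetical usage: split_file("huge.log") -> split_00000.log, split_00001.log, ...
```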
Thanks,
Matt
08-25-2016
02:55 PM
4 Kudos
@kishore sanchina The simplest answer to your question is to use the ListFile processor to produce a listing of the files from your local filesystem, feed that to a FetchFile processor that will pick up the content, and then pass the files to a PutHDFS processor to send them to your HDFS. The ListFile processor maintains state based on the lastModified time of the files to ensure files are not listed more than once. If you right-click on any of these NiFi processors you can select "usage" from the displayed context menu to get more details on the configuration of each of them.
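The list/fetch pattern, including the lastModified-based state that keeps files from being picked up twice, looks roughly like this outside of NiFi (a minimal sketch; the directory path is made up and the HDFS hand-off is only a placeholder function):

```python
from pathlib import Path

SOURCE_DIR = Path("/data/incoming")   # hypothetical local directory
last_listed = 0.0                     # persisted state: newest modification time already handled

def put_hdfs(path: Path) -> None:
    """Placeholder for the PutHDFS step."""
    print(f"would send {path} to HDFS")

def list_and_fetch() -> None:
    global last_listed
    # "ListFile": only list files modified since the saved state.
    new_files = [p for p in SOURCE_DIR.iterdir()
                 if p.is_file() and p.stat().st_mtime > last_listed]
    for path in sorted(new_files, key=lambda p: p.stat().st_mtime):
        # "FetchFile" + "PutHDFS": pick up the content and hand it off.
        put_hdfs(path)
        last_listed = max(last_listed, path.stat().st_mtime)
```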
Thanks,
Matt
08-25-2016
02:00 PM
@INDRANIL ROY Given the massive size of your file, ListSFTP/FetchSFTP may not be the best approach. Let me ask a few questions:

1. Are you picking up numerous files of this multi-TB size, or are we talking about a single file?
2. Are you trying to send the same TB file to every node in your cluster, or is each node going to receive a completely different file?
3. Is the directory where these files are originally consumed from a local disk or a network-mounted disk?
08-24-2016
03:43 PM
1 Kudo
Just to clarify how S2S works when communicating with a target NiFi cluster: the NCM never receives any data, so it cannot act as the load-balancer. When the source NiFi communicates with the NCM, the NCM returns a list of all currently connected nodes and their S2S ports, along with the current load on each node, to the source NiFi. It is then the job of the source NiFi's RPG to use that information to do a smart, load-balanced delivery of data to those nodes.
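Conceptually, the RPG's choice of destination node looks something like this (a minimal sketch; the node list and load numbers are made up, and NiFi's actual weighting logic is internal to the RPG):

```python
import random

# Hypothetical snapshot from the NCM's heartbeat data: node S2S address -> current load.
node_load = {"node1:10000": 200, "node2:10000": 50, "node3:10000": 100}

def pick_node(loads: dict) -> str:
    """Pick a node with probability roughly inverse to its current load."""
    nodes = list(loads)
    weights = [1.0 / (loads[n] + 1) for n in nodes]
    return random.choices(nodes, weights=weights)[0]

# Less-loaded nodes (node2 here) are picked more often over many FlowFiles.
print(pick_node(node_load))
```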
08-24-2016
03:04 PM
Anything you can do via the browser can be done by making calls to the NiFi API. You could either set up an external process to run a couple of curl commands to start and then stop the GetTwitter processor in your flow, or you could use a couple of InvokeHTTP processors in your dataflow (configured using the cron scheduling strategy) to start and stop the GetTwitter processor on a given schedule.
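For illustration, a start/stop call against the processor could look like this (a minimal sketch with the Python requests library; the base URL and processor id are placeholders, and the exact endpoint and payload depend on the NiFi version you are running, so check the REST API docs for yours):

```python
import requests

NIFI_API = "http://localhost:8080/nifi-api"     # placeholder base URL
PROCESSOR_ID = "your-gettwitter-processor-id"   # placeholder component id

def set_run_status(state: str) -> None:
    """Set the processor to "RUNNING" or "STOPPED" via the REST API (newer NiFi releases)."""
    # The current revision has to be read first and echoed back with the change.
    current = requests.get(f"{NIFI_API}/processors/{PROCESSOR_ID}").json()
    payload = {"revision": current["revision"], "state": state}
    requests.put(f"{NIFI_API}/processors/{PROCESSOR_ID}/run-status", json=payload).raise_for_status()

# set_run_status("RUNNING")   # start on one schedule
# set_run_status("STOPPED")   # stop on another
```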
Matt
08-24-2016
02:14 PM
1 Kudo
@INDRANIL ROY What you describe is a very common dataflow design. I have a couple of clarifications. RPGs (Remote Process Groups) do not send to other RPGs; an RPG sends data to and pulls data from input and output ports located on other NiFi instances. I suspect your standalone instance has the RPG and it is sending FlowFiles to input port(s) on the destination NiFi cluster. In this particular case the load-balancing of data is being handled by the RPG. For network efficiency data is distributed in batches, so with light dataflows you may not see the exact same number of FlowFiles going to each node. The load-balancing also has logic built into it so that nodes in the target cluster with less workload get more FlowFiles.

Although the URL provided to the RPG is the URL of the target NiFi cluster's NCM, the FlowFiles are not sent to the NCM, but rather directly to the connected nodes in the target cluster. Every node in a NiFi cluster operates independently of the others, working only on the FlowFiles it possesses. Nodes do not communicate with one another; they simply report their health and status back to the NCM. It is information from those health and status heartbeats that is sent back to the source RPG and used by that RPG to do the smart data delivery.

In order to distribute the fetching of the source data, the source directory would need to be reachable by all nodes in the target NiFi cluster. In the case of ListFile/FetchFile, the directory would need to be mounted identically on all systems. Another option would be to switch to a ListSFTP/FetchSFTP setup. In this setup you would not even need your standalone NiFi install. You could simply add a ListSFTP processor to your cluster (configured to run "on primary node"), then take the success from that listing and feed it to an RPG that points back at the cluster's NCM URL. An input port would be used to receive the now load-balanced FlowFiles. Feed the success from that input port to the FetchSFTP processor and now you have all nodes in your cluster retrieving the actual content.

So as you can see from the above, the ListSFTP would run on only one node (the primary node), producing FlowFiles with no content. The RPG would smartly distribute those FlowFiles across all connected nodes, where the FetchSFTP on each node would retrieve the actual content. The same flow could be built with ListFile and FetchFile as well; just mount the same source directory on every node and follow the same model.
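To make the pattern concrete outside of NiFi (a minimal sketch; node names and paths are made up, and the simple round-robin hand-out below stands in for the RPG's smarter load-balanced delivery):

```python
from itertools import cycle

# Hypothetical cluster nodes, plus a listing produced on the "primary" node only.
nodes = ["node1", "node2", "node3"]
listing = [f"/remote/dir/file_{i:03d}.dat" for i in range(10)]   # attributes only, no content yet

# The RPG-like step: distribute the zero-content listing across the nodes.
assignments = {node: [] for node in nodes}
for node, path in zip(cycle(nodes), listing):
    assignments[node].append(path)

# Each node then does its own "FetchSFTP", pulling content only for the paths it was handed.
for node, paths in assignments.items():
    print(f"{node} fetches {len(paths)} files: {paths}")
```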
Matt