Member since 07-30-2019 · 3136 Posts · 1565 Kudos Received · 910 Solutions
08-15-2016
05:15 PM
@Yogesh Sharma You are seeing duplicate data because the Run Schedule on your InvokeHTTP processor is set to 1 sec, and the data you are pulling is not updated that often.
You can build duplicate detection into your flow (it even works across a NiFi cluster). In order to do this you will need the following things set up:
1. DistributedMapCacheServer (Add this controller service to the "Cluster Manager" if clustered. If standalone it still needs to be added. This is configured with a listening port.)
2. DistributedMapCacheClientService (Add this controller service to the "Node" if clustered. If standalone it still needs to be added. This is configured with the FQDN of the NCM running the above cache server.)
3. Start the above controller services.
4. Add a HashContent and a DetectDuplicate processor to your flow between your InvokeHTTP processor and the SplitJson processor.
I have attached a modified version of your template.
eqdataus-detectduplicates.xml
If you still see duplicates, adjust the configured Age Off Duration in the DetectDuplicate processor.
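As a rough sketch, the key settings for the services and processors above might look like the following (the port, hostname, and duration values here are illustrative assumptions, not taken from the attached template):

```properties
# DistributedMapCacheServer (controller service)
Port: 4557                              # any unused listening port

# DistributedMapCacheClientService (controller service)
Server Hostname: ncm.example.com        # FQDN of the NCM running the cache server
Server Port: 4557                       # must match the server's listening port

# DetectDuplicate (processor)
Cache Entry Identifier: ${hash.value}   # attribute written by HashContent
Age Off Duration: 1 min                 # how long a hash is remembered
```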
Thanks, Matt
08-15-2016
12:40 PM
2 Kudos
@Obaid Salikeen Not sure what "issues" you had when you tried to add a new node to your existing cluster. The components (processors, connections, etc.) of an existing cluster can be running when you add additional nodes to it. The new nodes will inherit the flow and templates from the NCM, as well as the current running state of those components, when they join.
But, in order for a node to successfully join a cluster the following must be true:
1. The new node either has no flow.xml.gz file and templates directory, or its flow.xml.gz file and templates match what is currently on the NCM. (Remove the flow.xml.gz file and templates dir from the new node and restart the node.) The nifi-app.log will indicate if a difference was found.
2. The nifi.sensitive.props.key= in the nifi.properties file must have the same value as on the NCM.
3. The NCM must be able to resolve the URL to the new node. If nifi.web.http(s).host= was left blank on your new node, Java on that node may be reporting the hostname as localhost. Make sure valid resolvable hostnames are supplied for nifi.web.http.host=, nifi.cluster.node.address=, and nifi.cluster.node.unicast.manager.address=.
4. The NCM and node security protocols must match: nifi.cluster.protocol.is.secure= in the nifi.properties file.
5. Firewalls must be open between the NCM and node on both the HTTP(S) port and the node and NCM protocol ports.
6. The new node must have all the same available Java classes. If custom processors exist in your flow, make sure the new node also has those custom nar/jar files included in its lib dir.
Thanks, Matt
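For item 1 above, a minimal sketch of clearing a new node's local flow before it joins (the NIFI_HOME path here is an assumption; point it at your actual NiFi install, and stop the node first):

```shell
# Hypothetical install location; adjust to your environment.
NIFI_HOME=/tmp/nifi-demo-node
# Simulate a node that already has a local flow and templates.
mkdir -p "$NIFI_HOME/conf/templates"
touch "$NIFI_HOME/conf/flow.xml.gz"
# Remove them so the node inherits the flow and templates
# from the NCM when it starts up and joins the cluster.
rm -f "$NIFI_HOME/conf/flow.xml.gz"
rm -rf "$NIFI_HOME/conf/templates"
```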
08-05-2016
09:20 PM
1 Kudo
@Saikrishna Tarapareddy
Were all 74 files in the input queue before the MergeContent was run?
The MergeContent processor, just like the other processors, works on a run schedule. My guess is that the last file was not in the queue at the moment the MergeContent processor ran, so you only saw 13 get bundled instead of 14. With a minimum of 4 entries, it will read what is on the queue and bin it. You likely ended up with 3 bins of 20 and 1 bin of 13 because, at the moment it looked at the queue, 73 FlowFiles is all it saw.
You can confirm this by stopping the MergeContent and allowing all 74 files to queue before starting it. The behavior should then be as you expect.
Sounds like it is not important to have exactly 20 per merged file. Perhaps you can set a max bin age so that files don't get stuck.
Something else you can do is adjust the run schedule so the MergeContent does not run as often. The default is "0 sec", which means run as fast as possible. Try changing that to somewhere between 1 and 10 sec to give the files a chance to queue. If you are picking up all 74 files at the same time, we are likely talking about milliseconds here causing that last file to get missed. Thanks, Matt
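Putting the suggestions above together, a MergeContent configuration along these lines might behave as expected (the Max Bin Age and Run Schedule values are illustrative assumptions, not from the original flow):

```properties
# MergeContent (processor) -- Properties tab
Minimum Number of Entries: 4
Maximum Number of Entries: 20
Max Bin Age: 5 min        # flush a partial bin so straggler files never get stuck

# MergeContent -- Scheduling tab
Run Schedule: 5 sec       # give FlowFiles a chance to queue between runs
```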
08-05-2016
02:48 PM
The attached images do not really show us your complete configuration. Can you generate a template of your flow through the NiFi UI and share that? You create a template by highlighting/selecting all components you want to include in your template and then click on the "create template" icon in the upper center of the UI. After the template has been created you can export it out of your NiFi from the template management UI icon (upper right corner of UI). Then attach that exported xml template here.
08-05-2016
01:50 PM
With a NiFi cluster, every node in that cluster runs the exact same dataflow. Some data-ingest type processors are not ideally suited for this, as they may pull the same data in to each cluster node. In cases like this it is better to set the scheduling strategy on these processors to "On Primary Node" so that the processor only runs on one node (the primary node).
You can then use dataflow design strategies like RPGs (NiFi Site-to-Site) to redistribute the received data across all your NiFi cluster nodes for processing.
08-05-2016
11:57 AM
1 Kudo
@Yogesh Sharma Is your NiFi a cluster or a standalone instance? If it is a cluster, that could explain why you are seeing duplicates, since the same GetTwitter processor would be running on every node. Matt
08-03-2016
10:55 AM
5 Kudos
@Ankit Jain
When a NiFi instance is designated as a node, it starts sending out heartbeat messages after it is started. Those heartbeat messages contain important connection information for the node. Part of that message is the hostname of each connecting node. If left blank, Java will try to determine the hostname, and in many cases the hostname ends up being "localhost". This may explain why the same configs worked when all instances were on the same machine.
Make sure that all of the following properties have been set on every one of your nodes:
# Site to Site properties
nifi.remote.input.socket.host= <-- Set to the FQDN for the Node; must be resolvable by all other instances.
nifi.remote.input.socket.port= <-- Set to unused port on Node.
# web properties #
nifi.web.http.host= <-- set to resolvable FQDN for Node
nifi.web.http.port= <-- Set to unused port on Node
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address= <-- set to resolvable FQDN for Node
nifi.cluster.node.protocol.port= <-- Set to unused port on Node
nifi.cluster.node.protocol.threads=2
# if multicast is not used, nifi.cluster.node.unicast.xxx must have same values as nifi.cluster.manager.xxx #
nifi.cluster.node.unicast.manager.address= <-- Set to the resolvable FQDN of your NCM
nifi.cluster.node.unicast.manager.protocol.port= <-- must be set to Manager protocol port assigned on your NCM.
Your NCM will need to be configured the same way as above for the Site-to-Site properties and web properties, but instead of the "cluster node properties", you will need to fill out the "cluster manager properties":
# cluster manager properties (only configure for cluster manager) #
nifi.cluster.is.manager=true
nifi.cluster.manager.address= <-- Set to resolvable FQDN for NCM
nifi.cluster.manager.protocol.port= <-- Set to unused port on NCM
The most likely cause of your issue is not having the host/address fields populated or trying to use a port that is already in use on the server.
If setting the above does not resolve your issue, try setting DEBUG for the cluster logging in the logback.xml on one of your nodes and the NCM to get more details: <logger name="org.apache.nifi.cluster" level="DEBUG"/>
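A quick way to check whether a candidate port is already in use on a host before assigning it in nifi.properties (port 8081 is an arbitrary example, and the availability of `ss` on your system is an assumption):

```shell
PORT=8081
# List listening TCP sockets and look for the candidate port.
if ss -ltn 2>/dev/null | grep -q ":$PORT "; then
  echo "port $PORT is already in use"
else
  echo "port $PORT appears free"
fi
```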
08-02-2016
05:48 PM
3 Kudos
@Obaid Salikeen Try using \\n (double backslash) or pressing Shift + Enter in the expression language editor box to create new lines in your replacement string, as shown by Joe Witt above.
Thanks, Matt
07-22-2016
12:26 PM
@Manikandan Durairaj
Simon is completely correct above; however, I want to add a little to his statement about saving the entire flow.xml.gz (standalone or NiFi cluster node) file or flow.tar (NiFi cluster NCM) file.
When you generate templates in NiFi, those dataflows are scrubbed of all encrypted values (passwords). When importing those templates into another NiFi, the user will need to repopulate all of the processor and controller service passwords manually.
Saving off the flow.xml.gz or flow.tar file will capture the entire flow exactly as it is, encrypted sensitive passwords and all. NiFi will not start if it cannot decrypt these encrypted sensitive properties contained in the flow.xml. When sensitive properties (passwords) are added they are encrypted using these settings from your nifi.properties file:
# security properties #
nifi.sensitive.props.key=
nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL
nifi.sensitive.props.provider=BC
In order to drop your entire flow.xml.gz or flow.tar onto another clean NiFi, these values must all match exactly.
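A minimal sketch of verifying that the sensitive-property settings match between two instances. On a real system you would compare each instance's conf/nifi.properties; here two sample files are fabricated purely to illustrate the check (the paths and key value are assumptions):

```shell
# Fabricate two sample nifi.properties files for the demonstration.
mkdir -p /tmp/nifi-props-demo
printf '%s\n' \
  'nifi.sensitive.props.key=mySecretKey' \
  'nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL' \
  'nifi.sensitive.props.provider=BC' > /tmp/nifi-props-demo/node1.properties
cp /tmp/nifi-props-demo/node1.properties /tmp/nifi-props-demo/node2.properties
# The three sensitive-property lines must be byte-for-byte identical everywhere.
grep '^nifi.sensitive.props' /tmp/nifi-props-demo/node1.properties > /tmp/nifi-props-demo/a.txt
grep '^nifi.sensitive.props' /tmp/nifi-props-demo/node2.properties > /tmp/nifi-props-demo/b.txt
diff /tmp/nifi-props-demo/a.txt /tmp/nifi-props-demo/b.txt && echo "sensitive props match"
```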
Thanks, Matt
07-18-2016
10:39 PM
1 Kudo
@gkeys What are the permissions on both the file(s) you are trying to pick up with the GetFile processor and on the directory the file(s) live in?
-rwxrwxrwx 1 nifi dataflow 24B Jul 18 18:20 testfile
drwxr-xr-- 3 root dataflow 102B Jul 18 18:20 testdata
With the above example permissions, I reproduce exactly what you are seeing. If "Keep Source File" is set to true, NiFi creates a new FlowFile with the content of the file. If "Keep Source File" is set to false, the GetFile processor yields because it does not have the necessary permissions to delete the file from the directory. This is because the write bit is required on the source directory for the user who is trying to delete the file(s). In my example NiFi is running as user nifi, who can read the files in the root-owned testdata directory because the directory's group ownership is dataflow (just like my nifi user) and the dir has r-x group permissions. If I change that dir's permissions to rwx, then my nifi user will also be able to delete the testfile.
Thanks,
Matt
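The directory write-bit behavior described above can be demonstrated with a small sketch (the /tmp paths are arbitrary; run as a regular non-root user, since root bypasses these permission checks):

```shell
# Create a directory and a file inside it.
mkdir -p /tmp/getfile-demo/testdata
echo "hello" > /tmp/getfile-demo/testdata/testfile
chmod 555 /tmp/getfile-demo/testdata   # r-x: no write bit on the directory
# Deleting a file requires the write bit on its parent directory,
# so this rm fails for a non-root user even though the file is readable.
rm /tmp/getfile-demo/testdata/testfile 2>/dev/null \
  && echo "deleted" \
  || echo "delete failed: directory write bit is required to remove files"
chmod 755 /tmp/getfile-demo/testdata   # restore the write bit; now deletion works
rm -f /tmp/getfile-demo/testdata/testfile
```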