Created 01-30-2019 06:40 PM
I will send requests to ListenHTTP (running on each node) using PostHTTP (running on the primary node only). For this, I am getting the node addresses from the NiFi REST API (/controller/cluster), and I prepare a URL like this:
http://nodeAddress:port/contentListener
Can I successfully send requests to the ListenHTTP processors on each node with PostHTTP using this URL?
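For concreteness, the URL preparation is just string templating over each node record from the cluster response (a minimal Python sketch only; the field names follow the /controller/cluster JSON, and 9999 stands in for my ListenHTTP port):

```python
# Hypothetical node record, shaped like one entry of the
# /nifi-api/controller/cluster response.
node = {"address": "node1Address", "apiPort": 9999, "status": "CONNECTED"}

# Build the ListenHTTP URL for that node ("contentListener" is the
# default ListenHTTP base path).
url = "http://{}:{}/contentListener".format(node["address"], node["apiPort"])
print(url)  # http://node1Address:9999/contentListener
```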
Created 01-30-2019 08:20 PM
Please elaborate on your use case here.
What do you mean by sending "request"?
-
Are you trying to use PostHTTP to send the same NiFi FlowFile to every node in your cluster?
If so, then using PostHTTP to send to ListenHTTP is the correct path. However, I would not have PostHTTP configured to run on the primary node only. Only dataflow source processors should ever be configured for primary node only operation. Having processors in the body of a dataflow configured for primary node only can result in data stuck in connection queues if the elected primary node should change.
-
Thanks,
Matt
-
If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
Created 01-30-2019 09:00 PM
*** Community Forum Tip: Try to avoid starting a new answer in response to an existing answer. Instead, use comments to respond to existing answers. There is no guaranteed order to different answers, which can make a discussion hard to follow.
-
Even if you use PostHTTP to ListenHTTP and a particular FlowFile is destined for a specific node and that node goes down, that data will remain sitting in that connection queue unless you route the "failure" relationship to another PostHTTP with a different destination. And if PostHTTP is running primary node only, again you will have data stuck on the old primary node if the primary node changes. Just because a primary node change occurs does not mean the node is necessarily down.
-
The resulting behavior seems the same to me here. If a node goes down, all data on that node is stuck there until it is recovered. This includes FlowFiles on the primary node feeding a PostHTTP (if the primary node goes down or changes) and FlowFiles redistributed via a connection using "Partition by Attribute" if one of the nodes receiving a partition of data goes down.
-
Unless you are regularly polling the nifi-api/controller/cluster endpoint to see which nodes are actually connected and adjusting which nodes get data based on who is still connected. But then you could end up with a node not getting some data, or another node getting duplicate data.
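To be clear about what that polling would involve, here is a rough, untested Python sketch (assuming an unsecured cluster; the host and port are placeholders):

```python
import time
import requests

# Placeholder URL; a secured cluster would need a token or certificate.
NIFI_API = "http://nifi-node:9999/nifi-api"

while True:
    nodes = requests.get(NIFI_API + "/controller/cluster").json()["cluster"]["nodes"]
    connected = [n["address"] for n in nodes if n["status"] == "CONNECTED"]
    # Deciding where a disconnected node's share of the data should go
    # is exactly where missing or duplicated data creeps in.
    print("connected nodes:", connected)
    time.sleep(30)
```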
-
I would be interested in what your proposed dataflow design looks like. That may clear up some of my questions.
-
Thanks,
Matt
Created 01-30-2019 09:10 PM
@Matt Clarke
I will look for answers to these questions. Then I'll re-edit this question.
Thank you so much.
Created 01-30-2019 09:03 PM
@Matt Clarke
Yes, I know that some data can get stuck in connection queues. But I have to use PostHTTP (primary node) and ListenHTTP to distribute data across the cluster, because I want to send the same FlowFile to the same node. "Partition by Attribute" may be used for this, and it is great for this scenario. But if any node goes down, some data will be waiting in the unavailable queue. I don't want that, because I would lose data while it waits in the unavailable queue.
Round robin is also available, but it doesn't send the same data to the same node. That is why I will use PostHTTP and ListenHTTP to distribute data across the cluster.
Thank you
Created 01-30-2019 09:28 PM
You'd have to look at the source, but it's been my experience that ${hostname()} gives you the same value as if you ran the Unix command 'hostname'. Whether NiFi is listening there just depends on how you have configured other things.
I imagine that the node name you get back from the API is the nifi.web.http[s].host NiFi property.
But it sounds like you are trying to do something that would be better served by just running a completely independent flow on each node.
Created 01-30-2019 09:34 PM
@David Miller
"But it sounds like you are trying to do something that would be better served by just running a completely independent flow on each node."
--
Yeah, that's exactly what I want to do. So I'm reviewing all possible scenarios.
Thank you.
Created 01-30-2019 10:37 PM
The NiFi Expression Language function ${hostname()} does not return what is configured in the nifi.properties file for nifi.web.http[s].host. It uses Java to query the system for its hostname, so an improperly configured system may return localhost as the hostname. But you are correct that it only returns the hostname of the NiFi node on which it was executed. I believe that is why he was querying the rest-api/controller/cluster endpoint to get JSON which includes all nodes in the cluster and their current connection status.
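As a rough analogy only (this is not NiFi's actual implementation, just an illustration of what the function returns):

```python
import socket

# Comparable to ${hostname()}: the hostname the OS itself reports,
# not the value of nifi.web.http[s].host. A misconfigured system
# (e.g. a bad /etc/hosts) may report "localhost" here.
print(socket.gethostname())
```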
Created 01-31-2019 06:30 PM
Yes, I am getting the node address and its status from "rest-api/controller/cluster" to create the HTTP endpoint. I will send the FlowFile to the other active nodes after I prepare the URL. For example:
Result of "rest-api/controller/cluster";
{ "cluster": { "nodes": [ { "nodeId": "node1Id" , "address": "node1Address", "apiPort": 9999, "status": "CONNECTED", "heartbeat": "value", "connectionRequested": "value", "roles": [], "activeThreadCount": 0, "queued": "value", "events": [{ "timestamp": "value", "category": "value", "message": "value" }], "nodeStartTime": "value" }, { "nodeId": "node2Id" , "address": "node2Address", "apiPort": 9999, "status": "CONNECTED", "heartbeat": "value", "connectionRequested": "value", "roles": ["PRIMARY"], "activeThreadCount": 0, "queued": "value", "events": [{ "timestamp": "value", "category": "value", "message": "value" }], "nodeStartTime": "value" } ], "generated": "value" } }<br>
I get the node1Address and node2Address values from this JSON (if their status is CONNECTED). Then I create a URL like this:
http://node1Address:9999/contentListener
Then I will send a POST request to ListenHTTP (running on each node) using PostHTTP with this URL. ListenHTTP on each node will listen on this path and receive the data.
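Expressed outside of NiFi, the whole loop I have in mind looks roughly like this (a sketch only; I'm assuming anonymous API access and that 9999 is also the ListenHTTP listening port, as in my example above):

```python
import requests

# Any reachable cluster node can answer the cluster query.
API = "http://node1Address:9999/nifi-api"

# 1. Discover the cluster members and keep only the CONNECTED ones.
nodes = requests.get(API + "/controller/cluster").json()["cluster"]["nodes"]
active = [n for n in nodes if n["status"] == "CONNECTED"]

# 2. POST the payload to ListenHTTP on each active node
#    ("contentListener" is the default ListenHTTP base path).
for n in active:
    url = "http://{}:9999/contentListener".format(n["address"])
    resp = requests.post(url, data=b"flowfile content")
    resp.raise_for_status()
```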
Created 01-31-2019 07:31 PM
Hey Matt, yeah, that's what I said 🙂
Adam, I think your approach is wrong. If you are trying to get one FlowFile to appear on each node, just have each node fetch the FlowFile itself. If you want to send FlowFiles between nodes, use site-to-site (S2S) or load-balanced connections. You are reinventing the wheel here.