Created 05-31-2018 08:12 PM
I query Elasticsearch with "QueryElasticsearchHttp" and use the results to populate flowfile attributes. However, if the query returns no results, the flowfile does not go to the failure queue. I need the results, but they are not always ready to be queried yet. I want the empty case to fail so that I can retry at a later time using ControlRate. Instead, whenever QueryElasticsearchHttp fails to find the document, the flowfile just disappears: nothing arrives in the failure queue and nothing arrives in the success queue.
I CANNOT use FetchElasticsearchHttp because it has no "Target" property that can be set to flowfile attributes (I need to keep the content of the flowfile that is being queried).
I am really stuck now, because my results in Elasticsearch are populated by a different NiFi flow, so they may not be ready to be queried yet. It's a timing issue. I want the query to be tried again later.
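To make the behavior I am after concrete, here is a minimal Python sketch, outside NiFi, of treating an empty result as something to retry later. All names in it (`query_with_retry`, `run_query`, `max_attempts`, `delay_seconds`) are hypothetical and are not NiFi or Elasticsearch APIs; `run_query` stands in for whatever actually runs the search.

```python
import time
from typing import Callable, Dict, List, Optional


def query_with_retry(
    run_query: Callable[[], List[Dict]],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[Dict]]:
    """Call run_query until it returns at least one hit.

    An empty result is treated as "not ready yet", not as success.
    Returns the hits, or None if every attempt came back empty.
    """
    for attempt in range(1, max_attempts + 1):
        hits = run_query()
        if hits:
            return hits
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts:
            sleep(delay_seconds)
    return None
```

In the flow, the `None` case is what I would want routed to failure, with the original flowfile content still attached.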
Created 05-31-2018 08:20 PM
You are running into NIFI-3576; the fix will be included in the next release of NiFi (1.7.0). As a workaround, you could try a MonitorActivity processor after QueryESHttp. If its Threshold Duration exceeds the Run Schedule of QueryESHttp (plus any time it would take to complete the query), it would issue a flow file that you could route back to QueryESHttp to try again. In this case I believe you'd need to "prime" QueryESHttp with an initial flow file.
Created 06-01-2018 03:05 PM
I'm not sure I understand the MonitorActivity part. I would still need the flowfile content attached, so I can't use other flowfiles or split flowfiles. I think I am literally stuck now, waiting for 1.7. FetchElastic has no "Target" property, and QueryElastic just seems to drop the flowfile when no hits are returned. I have no way of keeping the binary flowfile content while I query and update its metadata. QueryElastic doesn't pass the flowfile along with empty attributes, so I don't get to keep the flowfile. Perhaps I need a way to pull out x files and confirm that x files were recorded; if fewer than x were recorded, redo that batch of files.
Created 06-01-2018 06:28 PM
MonitorActivity is kind of an "inverse" processor: it does work when nothing has happened. So in your case, MonitorActivity downstream from QueryES would actually generate a flow file when none has been generated by QueryES. In a sense this emulates the behavior of NIFI-3576 by emitting a flow file when there are no query results (after X time has passed, not when the query completes). Your original question was about empty results; I don't think this would apply when you get Y results but expect Z.
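As a rough illustration of that "inverse" idea, here is a toy Python model. It is not NiFi's implementation: the class `InactivityMonitor` and its methods are made-up names, and firing only once per quiet period is an assumption of this sketch.

```python
import time
from typing import Callable


class InactivityMonitor:
    """Signal when no activity has been seen for threshold_seconds."""

    def __init__(self, threshold_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold_seconds
        self.clock = clock
        self.last_activity = clock()
        self.inactive_reported = False

    def record_activity(self) -> None:
        # Something arrived (e.g. a query result): reset the timer.
        self.last_activity = self.clock()
        self.inactive_reported = False

    def poll(self) -> bool:
        """Return True once per quiet period, when the threshold is reached."""
        if self.inactive_reported:
            return False
        if self.clock() - self.last_activity >= self.threshold:
            self.inactive_reported = True
            return True
        return False
```

A `True` from `poll()` plays the role of the flow file you would route back to QueryESHttp. Note that it depends only on elapsed time, not on whether a particular query has finished.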
Created 06-04-2018 03:30 PM
I agree with you, but one problem is that I am ingesting and querying millions of files, so I doubt MonitorActivity will be able to keep track. Also, I don't find the "run-schedule" option of any ES/HBase processor to be accurate. E.g., changing the run-schedule for HBase doesn't actually slow down the ingest much at all, probably due to batch sizes.