Problem with timing Elasticsearch queries in NiFi

Contributor

I have a situation where I query Elasticsearch using "QueryElasticsearchHttp" to match a query and then use the results of the query to populate flowfile attributes. However, if the query returns no results, the flowfile doesn't go to the "failure" queue. I need the results, but they are inconsistent: the documents may not be ready to be queried yet. I want the flowfile to fail so that I can retry it at a later time using ControlRate. However, whenever QueryElasticsearchHttp fails to find the document, the flowfile just disappears: nothing goes to the failure queue and nothing goes to the success queue.

I CANNOT use FetchElasticsearchHttp because it has no "Target: flowfile-attributes" property (I need to keep the flowfile content that is being queried).

I am really stuck now, because my documents in Elasticsearch are populated by a different NiFi flow, so they may not be ready to be queried yet. That's a timing issue. I want it to try querying again later.
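A minimal sketch of the behavior being asked for, written as plain Python rather than as a NiFi flow: query, and if nothing comes back, wait and try again, routing the flowfile to "failure" only after the attempts run out. The flowfile is modeled as content bytes plus an attribute dict so that the content is visibly carried through untouched. `query_with_retry`, `query_fn`, `max_attempts`, `delay_seconds`, and the `retry.attempts` attribute are all hypothetical names for illustration, not NiFi APIs.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a NiFi flowfile: opaque content plus attributes.
FlowFile = Tuple[bytes, Dict[str, str]]


def query_with_retry(
    flow_file: FlowFile,
    query_fn: Callable[[], List[Dict[str, str]]],
    max_attempts: int = 5,
    delay_seconds: float = 0.0,
) -> Tuple[str, FlowFile]:
    """Call query_fn until it returns at least one hit or attempts run out.

    Returns ("success", flowfile with the first hit merged into its attributes)
    or ("failure", the original flowfile), so an empty result is routed somewhere
    instead of being dropped. The content bytes are never modified.
    """
    content, attributes = flow_file
    for attempt in range(1, max_attempts + 1):
        hits = query_fn()
        if hits:
            merged = dict(attributes)
            merged.update(hits[0])  # copy the first hit's fields into attributes
            merged["retry.attempts"] = str(attempt)
            return "success", (content, merged)
        if attempt < max_attempts:
            # Give the other flow time to index the document before retrying.
            time.sleep(delay_seconds)
    return "failure", (content, dict(attributes))


if __name__ == "__main__":
    # Simulate a document that only becomes searchable on the third query.
    responses = iter([[], [], [{"es.id": "42"}]])
    relationship, (body, attrs) = query_with_retry(
        (b"\x00binary", {"filename": "a.bin"}), lambda: next(responses)
    )
    print(relationship, attrs)
```

In an actual flow the delay between attempts would come from something like the ControlRate loop mentioned above rather than a `sleep` call.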

4 REPLIES

Master Guru (accepted solution)

You are running into NIFI-3576; the fix will be included in the next release of NiFi (1.7.0). As a workaround, you could try a MonitorActivity processor after QueryESHttp. If its Threshold Duration exceeds the Run Schedule of QueryESHttp (plus any time it would take to complete the query), it will issue a flow file that you can route back to QueryESHttp to try again. In this case I believe you'd need to "prime" QueryESHttp with an initial flow file.
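The timing condition in this workaround boils down to a single comparison. This is a sketch assuming all durations are in seconds; `threshold_is_safe` and its parameter names are hypothetical, and only Threshold Duration and Run Schedule are the NiFi settings named in the answer.

```python
def threshold_is_safe(threshold_s: float, run_schedule_s: float, max_query_s: float) -> bool:
    """True if MonitorActivity's Threshold Duration is longer than the gap
    QueryESHttp can normally leave between results: one Run Schedule
    interval plus the slowest expected query."""
    return threshold_s > run_schedule_s + max_query_s
```

For example, with a 60-second Run Schedule and queries that take up to 5 seconds, a 90-second threshold passes the check, while 30 or 65 seconds does not: in those cases MonitorActivity could fire even though QueryESHttp simply hasn't had its next turn yet.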

Contributor

I'm not sure I understand the MonitorActivity part. I would still need the flowfile content attached, so I can't use other flowfiles or split flowfiles. I think I am literally stuck waiting for 1.7. FetchElasticsearchHttp has no "Target" property, and QueryElasticsearchHttp just seems to make the flowfile disappear when no hits are returned. I have no way of keeping the binary flowfile content while I query and update its metadata. QueryElasticsearchHttp doesn't pass the flowfile along with empty attributes, so I don't get to keep the flowfile. Perhaps I need a way to pull out x files, confirm that x files were recorded, and if fewer than x were recorded, redo that batch of files.
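The batch idea in the last sentence could look roughly like this: split the file ids into batches, ask how many of each batch have been recorded, and re-queue any batch that comes up short. This is a sketch under stated assumptions: `process_batches` is a hypothetical helper, and `count_recorded` stands in for whatever lookup would report how many of a batch's ids are already in Elasticsearch; neither is a NiFi or Elasticsearch API. There is no wait between rounds here; in a real flow that delay would have to come from the flow itself.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence


def process_batches(
    file_ids: Sequence[str],
    batch_size: int,
    count_recorded: Callable[[List[str]], int],
    max_rounds: int = 10,
) -> List[List[str]]:
    """Split file_ids into batches and re-check any batch whose recorded
    count is below its size. Returns the batches still incomplete after
    max_rounds, so they can be routed to a failure path instead of lost."""
    queue: Deque[List[str]] = deque(
        list(file_ids[i : i + batch_size]) for i in range(0, len(file_ids), batch_size)
    )
    for _ in range(max_rounds):
        if not queue:
            break
        pending: Deque[List[str]] = deque()
        while queue:
            batch = queue.popleft()
            if count_recorded(batch) < len(batch):
                pending.append(batch)  # fewer than x recorded: redo this batch
        queue = pending
    return list(queue)
```

Redoing whole batches matches the description above; comparing ids instead of counts would let only the missing files be retried.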

Master Guru

MonitorActivity is kind of an "inverse" processor: it does work when nothing has happened. In your case, a MonitorActivity downstream from QueryES would generate a flow file when none has been generated by QueryES. In a sense this emulates the behavior of NIFI-3576 by emitting a flow file when there are no query results (after X time has passed, not after the query is complete). Your original question was about empty results; I don't think this would apply when you get Y results but expect Z.
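A simplified model of that "inverse" behavior, given the times at which QueryES emitted flow files: a flow file is produced only when a quiet gap exceeds the threshold. `inactivity_alerts` is a hypothetical helper, and it assumes one emission per quiet period at the exact moment the threshold is crossed; the real processor checks on its own schedule and has further options, so this is only an approximation.

```python
from typing import List, Sequence


def inactivity_alerts(
    arrivals: Sequence[float], threshold: float, end: float, start: float = 0.0
) -> List[float]:
    """Times at which a simplified MonitorActivity would emit a flow file:
    once per gap longer than threshold, at last_seen + threshold."""
    alerts: List[float] = []
    last_seen = start
    # Treat `end` as a final checkpoint so a trailing quiet period is detected.
    for t in sorted(arrivals) + [end]:
        if t - last_seen > threshold:
            alerts.append(last_seen + threshold)
        last_seen = t
    return alerts
```

With arrivals at 10, 20 and 100 seconds, a 30-second threshold, and an end time of 200, the model emits at 50 and 130. With arrivals at 10 and 20 and an end time of 40 it emits nothing: any results at all keep it quiet, which is why it cannot tell Y results apart from an expected Z.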

Contributor

I agree with you, but one problem is that I am ingesting and querying millions of files, so I doubt MonitorActivity will be able to keep track. Also, I don't find the Run Schedule setting of any ES/HBase processor to be accurate. For example, changing the Run Schedule for HBase doesn't actually slow down the ingest much at all, probably due to batch sizes.
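One possible reading of the Run Schedule observation, offered as an assumption rather than documented NiFi behavior: Run Schedule sets how often a timer-driven processor is triggered, and if each trigger can handle a whole batch, the schedule only throttles throughput once it is longer than the time a trigger takes. `max_throughput` is a hypothetical back-of-the-envelope helper; it ignores concurrent tasks, back pressure, and anything else NiFi does between triggers.

```python
def max_throughput(batch_size: int, run_schedule_s: float, trigger_time_s: float) -> float:
    """Rough upper bound on flowfiles per second for a timer-driven processor
    that handles up to batch_size flowfiles per trigger. A trigger cannot
    start more often than the Run Schedule or finish faster than trigger_time_s."""
    interval = max(run_schedule_s, trigger_time_s)
    if interval <= 0:
        raise ValueError("run schedule or trigger time must be positive")
    return batch_size / interval
```

Under this model, with a batch of 1000 and 0.5 seconds per trigger, a Run Schedule of 0 or 0.25 seconds both give 2000 flowfiles per second, and only raising it to 1 second halves that to 1000. Small schedule changes would therefore barely affect ingest while the batch size stays large.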