Using InvokeHTTP, GenerateFlowFile, and ReplaceText from NiFi to Query the Elasticsearch Search API

Explorer

Hello all, I have been struggling with querying the Elasticsearch Search API using Apache NiFi processors for days now, and I think I am close but need help. My goal is to be able to send a query such as:

'{"query" : { "match_all" : {}}}'

This simply returns all documents in an index in JSON format. The curl version is:

curl -XGET http://localhost:9200/tweet_library/_search?size=10000 -d '{"query" : { "match_all" : {} }}'

As you can see, it uses the -d option to send the JSON. Right now I have a GenerateFlowFile processor with default values except File Size, which is 0 B. I assume I need this from reading other threads, but in complete honesty I do not know what this processor does or why I need it (I just know I didn't get a response without it).

I originally connected a ReplaceText processor to GenerateFlowFile, configured with default values except that the Replacement Value was my query. This was then connected to an InvokeHTTP processor whose Remote URL is http://localhost:9200/tweet_library/_search?size=10000, with everything else left at the default. Finally, the response from InvokeHTTP is routed to a PutFile processor. This setup did not work: I received around 84,000 files (there are only 10,000 documents in the index), and each file contained many copies of the same document.

Since I didn't really understand what ReplaceText was doing, I removed it and replaced it with an UpdateAttribute processor, where I added two properties:

mime.type = application/json
query = size=10000 -d '{"query" : { "match_all" : {} }}'

Then I connected this to my InvokeHTTP processor and changed the Remote URL to http://localhost:9200/tweet_library/_search?${query}. This gave me multiple files with one document in each, and every document was the same. All I am trying to do is get all the documents from one Elasticsearch index that match the query I put in (in this case, all documents) and write the results to one JSON file. Where am I going wrong, and what do I need to change? Any help is appreciated.

Accepted solution

Master Guru

In curl, -d puts the data into the request body. The InvokeHTTP processor does not send the flow file content for GET requests, only for PUT or POST. Your curl command sends that body with a GET request, which the Elasticsearch Search API accepts, so this approach probably won't work as configured.
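A minimal Python sketch (standard library only) of what curl's -d does: the JSON is serialized into the request body, and nothing about the query appears in the URL. It uses POST rather than GET because Elasticsearch documents both methods for _search and POST is one that InvokeHTTP will attach flow file content to; in NiFi terms that would roughly mean setting InvokeHTTP's HTTP Method to POST with the JSON as the flow file content, though that is an untested assumption here. The helper name build_body_request is hypothetical, and the request is only built, never sent.

```python
import json
import urllib.request


def build_body_request(base_url, index, query, size=10000):
    """Build (but do not send) a search request that carries the query in the
    request body, like curl's -d. Hypothetical helper for illustration."""
    url = f"{base_url}/{index}/_search?size={size}"
    body = json.dumps(query).encode("utf-8")  # these bytes become the request body
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: POST chosen so a body is reliably sent
    )


req = build_body_request("http://localhost:9200", "tweet_library", {"query": {"match_all": {}}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it requires a running Elasticsearch: urllib.request.urlopen(req)
```

The URL stays the same as in the question; only the method and the body carry the query.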

What you may be looking for is the URL Search API. I commented on that in another thread, but will post here too. With this method, you put your query in the URL itself. Note that the query parameters look a bit different because they are HTTP query parameters rather than JSON. In your example you are matching all documents, which is the default when no query is given, so http://localhost:9200/tweet_library/_search?size=10000 should be all you need for that case. To explicitly match all documents, you can use the q parameter:

http://localhost:9200/tweet_library/_search?size=10000&q=*:*
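Since the whole query lives in the query string with this approach (Elasticsearch's documentation calls it URI Search), the main job is building the URL correctly. Below is a small Python sketch using urllib.parse.urlencode; the function name build_uri_search_url is hypothetical. Passing safe="*:" keeps *:* readable; without it the value would be percent-encoded as %2A%3A%2A, which the server decodes back to the same characters.

```python
from urllib.parse import urlencode


def build_uri_search_url(base_url, index, size=10000, q=None):
    """Return a _search URL whose query is expressed as HTTP query parameters.
    Hypothetical helper for illustration."""
    params = {"size": size}
    if q is not None:
        params["q"] = q  # Lucene query string syntax, e.g. "*:*" to match all
    # safe="*:" leaves * and : unescaped so the URL stays human-readable
    return f"{base_url}/{index}/_search?{urlencode(params, safe='*:')}"


print(build_uri_search_url("http://localhost:9200", "tweet_library"))
# → http://localhost:9200/tweet_library/_search?size=10000
print(build_uri_search_url("http://localhost:9200", "tweet_library", q="*:*"))
# → http://localhost:9200/tweet_library/_search?size=10000&q=*:*
```

The resulting string is what would go into InvokeHTTP's Remote URL, with the HTTP Method left at GET and no request body needed.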

There are quite a few query options available with the URL Search API, please see the Elasticsearch documentation for more information.
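On the "one JSON file" goal: a single search request returns a single JSON response body, with the matching documents under hits.hits and each document's original JSON under _source. The sketch below assumes that response shape and uses a hypothetical helper, extract_sources, to pull the documents out into one JSON array; the sample response is made up and heavily trimmed, not real output from this index. Note also that Elasticsearch caps from + size at the index setting index.max_result_window, which defaults to 10000, so size=10000 is already at that default limit.

```python
import json


def extract_sources(response_text):
    """Return the _source of every hit in an Elasticsearch search response."""
    response = json.loads(response_text)
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Made-up, trimmed response for illustration only
sample = """{
  "took": 3,
  "hits": {
    "hits": [
      {"_index": "tweet_library", "_id": "1", "_source": {"text": "first tweet"}},
      {"_index": "tweet_library", "_id": "2", "_source": {"text": "second tweet"}}
    ]
  }
}"""

docs = extract_sources(sample)
print(json.dumps(docs))  # one JSON array, suitable for writing to a single file
```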

Explorer

Ah, I understand now, thanks. Is it weird that I was getting a response at all with the other methods I was attempting? Not important, just curious.