Support Questions

trz · 06-30-2016

Hello all, I have been struggling for a few days now with the task of querying the Elasticsearch Search API using Apache NiFi processors. I think I am close but need help. My goal is to be able to send a query such as:

'{"query" : { "match_all" : {}}}'

This just returns all documents in an index in JSON format. The curl version is: curl -XGET http://localhost:9200/tweet_library/_search?size=10000 -d '{"query" : { "match_all" : {} }}' so, as you can see, it uses the -d option to send the JSON.

Right now I have a GenerateFlowFile processor with default values except for the file size, which is 0 B. From reading other threads I assume I need it, but in complete honesty I do not know what this processor does or why I need it (I just know I didn't get a response without it). I originally had a ReplaceText processor connected to GenerateFlowFile, configured with default values except that the replacement value was my query. This was connected to an InvokeHTTP processor whose Remote URL is http://localhost:9200/tweet_library/_search?size=10000, with everything else left at the default. Finally, the response from InvokeHTTP is routed to a PutFile processor.

This setup did not work: I received around 84,000 files (there are only 10,000 documents in the index), and each file contained many copies of the same document. I didn't really know what the ReplaceText processor was doing, so I took it out and replaced it with an UpdateAttribute processor. There I added two properties:

mime.type = application/json and query = ?size=10000size=10000 -d '{"query" : { "match_all" : {} }}'.

I then connected this to my InvokeHTTP processor and changed my Remote URL to http://localhost:9200/tweet_library/_search?${query}. This gave me multiple files with one document in each, and every document was the same. All I am trying to do is get all the documents from one index in Elasticsearch that match the query I put in (in this case, all documents) and output the results to one JSON file. Where am I going wrong, and what do I need to change? Any help is appreciated.

mburgess · 06-30-2016

In curl, the -d option puts the data into the request body. The InvokeHTTP processor sends the contents of the flow file only for PUT and POST requests, not for GET. Your curl command relies on a GET request that carries a body, so this approach probably won't work with InvokeHTTP.
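To make the request-body point concrete, here is a minimal Python sketch (not NiFi code) that builds the same request as the curl command without sending it. The URL and query come from the question; the helper name build_search_request is made up for illustration. It shows that Python's standard urllib, like many HTTP clients, treats a request with a body as a POST unless told otherwise.

```python
import json
import urllib.request

# Values taken from the question; nothing is sent over the network here.
URL = "http://localhost:9200/tweet_library/_search?size=10000"
QUERY = {"query": {"match_all": {}}}


def build_search_request(method=None):
    """Build (but do not send) a request whose body is the JSON query.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(QUERY).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,  # plays the role of curl's -d
        headers={"Content-Type": "application/json"},
        method=method,  # None lets urllib pick the method itself
    )


if __name__ == "__main__":
    default_req = build_search_request()
    # urllib defaults to POST as soon as a body is attached.
    print(default_req.get_method())

    # A GET with a body has to be requested explicitly.
    get_req = build_search_request("GET")
    print(get_req.get_method(), get_req.data)
```

The body only travels with the request when the client is willing to send one for that method, which is the limitation described above for GET in InvokeHTTP.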

What you may be looking for is the URI Search API. I commented on that in another thread, but will post here too. Using this method, you can put your query in the URL itself. Note that the query looks a bit different because it is expressed as HTTP query parameters rather than JSON. In your example you are matching all documents (which is the default, I believe), so http://localhost:9200/tweet_library/_search?size=10000 should be all you need for that case. To explicitly match all documents, you can use the q parameter:

http://localhost:9200/tweet_library/_search?size=10000&q=*:*

There are quite a few query options available with the URI Search API; please see the Elasticsearch documentation for more information.
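If you want to generate these URLs rather than type them by hand, something like the following Python sketch would do it. It is only an illustration under assumptions: build_uri_search is a made-up helper, the base URL is the one from the question, and the field name text in the usage example is hypothetical. Only the q and size parameters shown above are used.

```python
from urllib.parse import urlencode

# Index endpoint from the question.
BASE_URL = "http://localhost:9200/tweet_library/_search"


def build_uri_search(q="*:*", size=10000):
    """Return a search URL that carries the query as HTTP query parameters.

    Hypothetical helper for illustration only.
    """
    params = {"size": size, "q": q}
    # safe="*:" keeps '*' and ':' unescaped so the URL stays readable;
    # other reserved characters are still percent-encoded.
    return f"{BASE_URL}?{urlencode(params, safe='*:')}"


if __name__ == "__main__":
    # Match every document, as in the URL above.
    print(build_uri_search())
    # Query a single field; "text" is an assumed field name.
    print(build_uri_search(q="text:nifi", size=5))
```

The resulting string can be pasted into the Remote URL property of InvokeHTTP, with the HTTP method left as GET and no request body needed.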

trz · 06-30-2016

Ah, I understand now, thanks. Is it weird that I was getting a response at all using the other methods I was attempting? Not important, just curious.
