Member since: 06-27-2016
Posts: 9
Kudos Received: 1
Solutions: 0
11-08-2017
06:15 PM
I am trying to use two different processors to query an Amazon Aurora cluster. I have created an Amazon Aurora database cluster running MySQL with three instances: the main instance that backs the cluster and two read replicas for load balancing. However, the cluster does not seem to be balancing the reads at all, and I believe this is because of the connection pool NiFi uses. The issue is that Aurora scales based on new incoming database connections, but with NiFi's connection pool the connections made to the database are never new: they are simply returned to the pool after a query or update executes and then reused. I know the Apache Commons DBCP connection pool has a "maxConnLifetime" setting that defaults to infinite. I think I want to set it to something like 10 seconds so that I keep making new connections to the database instead of reusing them. Does anyone have ideas on how to go about doing this, or any other way around this issue so I can make use of the scaling Aurora provides? Any help is appreciated.
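For concreteness, if I could configure the pool like a plain Commons DBCP 2 BasicDataSource, the settings I have in mind would look roughly like this (property names are from the DBCP 2 documentation; the values are guesses, and whether NiFi's DBCPConnectionPool controller service exposes these is exactly what I am asking):

```properties
# Sketch of the DBCP 2 settings I would like to apply (guessed values).
# maxConnLifetimeMillis defaults to -1, i.e. infinite lifetime.
maxConnLifetimeMillis=10000
# Evict idle connections quickly too, so the pool keeps opening fresh
# connections that Aurora's endpoint can balance across the replicas:
timeBetweenEvictionRunsMillis=1000
minEvictableIdleTimeMillis=5000
```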
Labels:
Apache NiFi
07-28-2016
12:46 AM
Thank you, sir! I will give it a shot.
07-27-2016
11:40 PM
I have written a custom NiFi processor for use with Elasticsearch and I have tested it thoroughly outside of the DataFlow environment, where it works perfectly. When I drop it into the flow, however, it throws an error message that is of no help, so I was wondering if there is a way to debug the code from within the flow or to produce a better error message. Any advice/help is appreciated, thanks.
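In case it matters for debugging: I have access to NiFi's conf/logback.xml, and my guess is that adding a logger for my processor's package (the package name below is a placeholder for mine) would surface the real stack trace in logs/nifi-app.log, but I have not confirmed this:

```xml
<!-- Sketch: added to conf/logback.xml. "com.example.processors" is a
     placeholder for my custom processor's actual package. -->
<logger name="com.example.processors" level="DEBUG"/>
```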
Labels:
Apache NiFi
06-30-2016
08:05 PM
Ah, I understand now, thanks. Is it weird that I was getting a response at all using the other methods I was attempting? Not important, just curious.
06-30-2016
07:46 PM
1 Kudo
Hello all, I have been struggling with the task of querying the Elasticsearch Search API using Apache NiFi processors for days now and think I am close, but I need help. My goal is to be able to send a query such as:

'{"query" : { "match_all" : {} }}'

which just returns all documents in an index in JSON format. The curl version of this is:

curl -XGET http://localhost:9200/tweet_library/_search?size=10000 -d '{"query" : { "match_all" : {} }}'

so as you can see it uses the -d option to send the JSON body.

Right now I have a GenerateFlowFile processor with default values except for the file size, which is 0 B. I gather from other threads that I need it, but in complete honesty I do not know what this processor does or why I need it (I just know I didn't get a response without it). I originally had a ReplaceText processor connected to it, configured with default values except that the Replacement Value was my query. This was then connected to an InvokeHTTP processor whose Remote URL is http://localhost:9200/tweet_library/_search?size=10000, with everything else the default value. Finally, the response from InvokeHTTP is routed to a PutFile processor. This setup did not work: I received somewhere around 84,000 files (there are only 10,000 documents in the index) and each file contained a ton of copies of the same document.

Since I didn't really know what the ReplaceText processor was doing, I took it out and replaced it with an UpdateAttribute processor. Here I added two properties: mime.type = application/json and query = size=10000 -d '{"query" : { "match_all" : {} }}'. I then connected this to my InvokeHTTP processor and changed the Remote URL to http://localhost:9200/tweet_library/_search?${query}. This gave me multiple files with one document in each, and every document was the same.

All I am trying to do is get all the documents from one index in Elasticsearch that match the query I put in (in this case, all documents) and output the results to one JSON file. Where am I going wrong, and what do I need to change? Please, any help is appreciated.
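To spell out exactly what I am trying to reproduce in NiFi, here is the request my curl command makes, written out as a small Python sketch (assumption: a local Elasticsearch at localhost:9200, as in my curl example). The point is that the query JSON goes in the request body, the way -d sends it, not in the URL or an attribute:

```python
import json
import urllib.request

# The same match-all query my curl -d flag sends as the request body.
query = {"query": {"match_all": {}}}

# Build (but do not send) the equivalent HTTP request: the JSON body
# is attached as data, with size=10000 as a URL query parameter.
req = urllib.request.Request(
    "http://localhost:9200/tweet_library/_search?size=10000",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would then return one JSON response
# containing all matching documents.
```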
Labels:
Apache NiFi
06-29-2016
06:25 PM
Thanks for all your help so far, Matt. I had to use the first option, as I am going to have more complicated queries than getting all documents, so I'm hoping I can just replace the match-all query in my ReplaceText processor with whatever I need to query. The match-all query seems to be working, though I am getting multiple copies of documents; is this because of the GenerateFlowFile processor? I looked up its documentation and I'm not quite sure what it is doing. I do not want it to run more than once. Edit: It looks like I just got 84,000 files that are all the same documents, and there are only 10,000 total in the index. Do I need an UpdateAttribute processor?
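My guess is that the 84,000 files come from GenerateFlowFile running on its default 0-second run schedule, so it keeps emitting new flow files and re-triggering the whole chain. If that is right, the fix I am imagining is in the processor's Scheduling tab (value below is just a guess at a long interval):

```
# GenerateFlowFile > Configure > Scheduling (guessing at the fix):
Run Schedule: 3600 sec   # fire once per interval instead of continuously
```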
06-29-2016
03:56 PM
@Matt Burgess Is there an example of how to use the InvokeHTTP processor to do this? I am attempting it but having issues with the Attributes to Send property. The way I currently have it set up, an InvokeHTTP processor is simply connected to a PutFile processor on the original and response relationships, with the rest (failure, no retry, and retry) auto-terminating. My InvokeHTTP properties are all the default values except:

HTTP Method: GET
Remote URL: http://localhost:9200

Now, I am just trying to get all the documents from a particular index called "tweet_library", and the query to do that using a curl command is:

http://localhost:9200/[your_index_name]/_search
{
  "size": [your value],        // default 10
  "from": [your start index],  // default 0
  "query": {
    "match_all": {}
  }
}

So I thought the right idea would be to just place

tweet_library/_search{"size": 10000 "query":{"match_all": {}}}

(I left off "from" because I want the default value) inside Attributes to Send, but when I try this I get an error saying that Attributes to Send validated against

tweet_library/_search{
  "size": 10000
  "query": {
    "match_all": {}
  }
}

is invalid because it is "Not a valid Java Regular Expression." Could you please point me in the direction of a solution or possibly provide some help? Thank you. I also know that this curl command gives me what I want:

curl -XGET http://localhost:9200/tweet_library/_search?size=10000 -d '{"query" : { "match_all" : {} }}'

so I tried putting tweet_library/_search?size=10000 -d '{"query" : { "match_all" : {}}}' in my Attributes to Send and got the same warning/error.
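Rereading the validation error, it says the value must be a valid Java regular expression, so my guess is that Attributes to Send wants a regex matching the names of flow-file attributes to forward as HTTP headers, not the query itself. If that guess is right, a value that would at least validate looks more like this (attribute names here are the ones from my UpdateAttribute attempt):

```
mime\.type|query
```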
06-27-2016
07:27 PM
I am trying to create an Apache NiFi flow that allows me to read in all my data from Elasticsearch and store it to a file. I have everything connected correctly, but the issue I am having is that the FetchElasticsearch processor requires a document identifier (as it should), while I want to get every single item in the index it is searching, not just, say, the document with ID 1. I know the processor property supports expression language, so I tried simply using a regex expression that should match all characters:

${'*'}

but I got a warning when I did this, because the processor actually looks for the literal document id *, which of course does not exist. Below are screenshots that hopefully help with understanding my issue. I am searching localhost:9300/tweet_library/tweet/(regex expression), so I want all of the documents in tweet_library. Any help is appreciated, thanks.
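To spell out the difference in raw HTTP terms, FetchElasticsearch issues a single-document lookup, while what I actually want is a search over the whole index (10,000 below is just my index's document count; this is a sketch of the requests, not something I have working in NiFi):

```
# What FetchElasticsearch does (one document by id):
GET /tweet_library/tweet/1

# What I actually want (every document matching a query):
GET /tweet_library/_search?size=10000
{"query": {"match_all": {}}}
```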
Labels:
Apache NiFi