Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to update per query execution time in table?

avatar
Contributor

Hello,

I have Designed the bellow data flow

1 - -> 2 - ConvertAvroToJSON -> 3 - splitJSon -> 4 - EvaluateJsonPath -> 5 - UpdateAttribute -> 6 ExecuteSQL -> 7 - replaceText -> 8- putSQL.

1 - 2 - 3 : query your table and split by row. You will have one flowfile by row. Use $ for the JsonPathExpression in the split processor.

4 - Retrieve the query_id and the query_execute in two attributes of each flowfile (values of your properties are $.query_id and $.query_execute)

5 - Create a new attribute query_startTime with the value ${now():format('HH:mm:ss')

6 - Execute the query ${query_execute}

7 - Write your insert query : INSERT INTO queryExecutionInfo VALUES (${query_id}, ${query_startTime},{${now():format('HH:mm:ss'));

8 - Execute your insert

problem:

In 5th step query_startTime for all query is same

ExecuteSQL processor is required some time to execute the query.
after the first query execute the value are inserted into queryExecutionInfo table sucessfully.
but when second query execution completed the query_startTime same as first query_startTime and end time is different.
I have to insert second query_startTime as first query_endTime and vice versa.

Can someone point me at an example to get me going

1 ACCEPTED SOLUTION

avatar
Master Guru

Step 3 (SplitJson) is kind of a "barrier" processor, it will send out all flow files after they have been split (in order to add fragment.count to each flow file). Then steps 4&5 must be executing so quickly that each flow file gets the same query_startTime value. It sounds like you'd like to set query_startTime just as ExecuteSQL is about to execute the statement. Unfortunately I don't believe this is possible with ExecuteSQL (although please feel free to write a Jira to add this improvement). If you are comfortable with a scripting solution such as Groovy, check out my blog post on how to do SQL using Groovy using the ExecuteScript processor in NiFi. You could modify that to add the query_startTime attribute just before calling the sql.rows() statement in the script

View solution in original post

4 REPLIES 4

avatar
Master Guru

Step 3 (SplitJson) is kind of a "barrier" processor, it will send out all flow files after they have been split (in order to add fragment.count to each flow file). Then steps 4&5 must be executing so quickly that each flow file gets the same query_startTime value. It sounds like you'd like to set query_startTime just as ExecuteSQL is about to execute the statement. Unfortunately I don't believe this is possible with ExecuteSQL (although please feel free to write a Jira to add this improvement). If you are comfortable with a scripting solution such as Groovy, check out my blog post on how to do SQL using Groovy using the ExecuteScript processor in NiFi. You could modify that to add the query_startTime attribute just before calling the sql.rows() statement in the script

avatar
Contributor

thanks Matt for the reply,

can I used the attribute to store the query_startTime= query_endTime after the first query execution completed.

can I insert the attribute query_startTime and query_endTime after the second query execution and vice versa.

Please kindly guide me for the same.

avatar
Contributor

Hi Matt,

I read your blog, I understand your groovy script.

but in my situation which processor I use,what is sequence of processor when i used the Executescript processor.

what is the changes in the script that provided in blog.

avatar
Master Guru

Since you're using a script, you could feasibly replace steps 4-7 above, but my blog post (plus adding your attribute) really only covers 5&6