
How to update per query execution time in table?

Explorer

Hello,

I have designed the below data flow:

1 -> 2 - ConvertAvroToJSON -> 3 - SplitJson -> 4 - EvaluateJsonPath -> 5 - UpdateAttribute -> 6 - ExecuteSQL -> 7 - ReplaceText -> 8 - PutSQL

1 - 2 - 3: Query your table and split the result by row. You will have one flowfile per row. Use $ for the JsonPath expression in the split processor.

4 - Retrieve the query_id and the query_execute into two attributes of each flowfile (the values of your properties are $.query_id and $.query_execute)

5 - Create a new attribute query_startTime with the value ${now():format('HH:mm:ss')}

6 - Execute the query ${query_execute}

7 - Write your insert query: INSERT INTO queryExecutionInfo VALUES (${query_id}, ${query_startTime}, ${now():format('HH:mm:ss')});

8 - Execute your insert
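The timing pattern the steps above implement can be sketched outside NiFi. This is a hedged stand-in, not the flow itself: Python with sqlite3 replaces ExecuteSQL/PutSQL, and the dictionaries play the role of flowfile attributes produced by steps 1-4; the table and column names follow the INSERT in step 7.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queryExecutionInfo "
    "(query_id INTEGER, query_startTime TEXT, query_endTime TEXT)"
)

# Stand-ins for the per-row flowfiles from steps 1-3, with the two
# attributes extracted in step 4.
rows = [
    {"query_id": 1, "query_execute": "SELECT 1"},
    {"query_id": 2, "query_execute": "SELECT 2"},
]

for row in rows:
    start = datetime.now().strftime("%H:%M:%S")   # step 5: query_startTime
    conn.execute(row["query_execute"]).fetchall()  # step 6: run ${query_execute}
    end = datetime.now().strftime("%H:%M:%S")      # step 7: now() at insert time
    # steps 7-8: build and execute the timing insert
    conn.execute(
        "INSERT INTO queryExecutionInfo VALUES (?, ?, ?)",
        (row["query_id"], start, end),
    )
```

Note that here the start timestamp is taken per query, just before execution, which is the behavior the flow is trying to achieve.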

Problem:

In step 5, query_startTime is the same for every query.

The ExecuteSQL processor needs some time to execute each query.
After the first query executes, the values are inserted into the queryExecutionInfo table successfully,
but when the second query completes, its query_startTime is the same as the first query's query_startTime, even though the end times differ.
I need to insert the first query's query_endTime as the second query's query_startTime, and so on.
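The symptom can be reproduced without NiFi: when the start timestamp is stamped on every row up front (as happens when UpdateAttribute runs over all the split flowfiles back-to-back) instead of just before each execution, every row carries the same second-resolution value. A minimal sketch, assuming the HH:mm:ss format used in the flow and a stand-in sleep for query time:

```python
import time
from datetime import datetime

queries = ["q1", "q2", "q3"]

# What the flow does today: step 5 stamps all flowfiles in quick
# succession, before step 6 runs any of the queries.
stamped_up_front = [datetime.now().strftime("%H:%M:%S") for _ in queries]

# What is wanted: stamp each query immediately before it runs.
stamped_per_run = []
for q in queries:
    stamped_per_run.append(datetime.now().strftime("%H:%M:%S"))
    time.sleep(1.1)  # stand-in for a query that takes over a second
```

The up-front timestamps are effectively identical at second resolution, while the per-run timestamps differ because each is taken after the previous query finished.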

Can someone point me at an example to get me going?

1 ACCEPTED SOLUTION

Super Guru

Step 3 (SplitJson) is kind of a "barrier" processor: it sends out all flow files only after they have been split (in order to add fragment.count to each flow file). Then steps 4 & 5 must be executing so quickly that each flow file gets the same query_startTime value. It sounds like you'd like to set query_startTime just as ExecuteSQL is about to execute the statement. Unfortunately, I don't believe this is possible with ExecuteSQL (although please feel free to write a Jira to add this improvement). If you are comfortable with a scripting solution such as Groovy, check out my blog post on how to do SQL with Groovy using the ExecuteScript processor in NiFi. You could modify that to add the query_startTime attribute just before calling the sql.rows() statement in the script.
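The key idea in this answer, taking the timestamp immediately before the statement runs rather than in an upstream processor, can be sketched as follows. The blog's script is Groovy; this is a hedged Python analogue (not the blog's actual code), with the stamp taken right before the equivalent of the sql.rows() call:

```python
import sqlite3
from datetime import datetime

def execute_with_timing(conn, attributes):
    """Run the flowfile's query, stamping query_startTime just before
    execution (the point ExecuteSQL itself cannot customize)."""
    attributes["query_startTime"] = datetime.now().strftime("%H:%M:%S")
    # Analogue of the Groovy sql.rows() call in the blog post
    result = conn.execute(attributes["query_execute"]).fetchall()
    attributes["query_endTime"] = datetime.now().strftime("%H:%M:%S")
    return result

# Stand-in for one flowfile's attributes (names follow the flow above)
conn = sqlite3.connect(":memory:")
attrs = {"query_id": 1, "query_execute": "SELECT 42"}
rows = execute_with_timing(conn, attrs)
```

Because the stamp happens inside the same script invocation as the execution, each flowfile gets its own start time rather than sharing the one set at split time.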


4 REPLIES


Explorer

Thanks Matt for the reply.

Can I use an attribute to store the first query's end time as the next query_startTime after the first query execution completes?

Can I then insert the attributes query_startTime and query_endTime after the second query execution, and so on?

Please kindly guide me on this.

Explorer

Hi Matt,

I read your blog and I understand your Groovy script.

But in my situation, which processors should I use, and in what sequence, when I use the ExecuteScript processor?

What changes are needed to the script provided in the blog?

Super Guru

Since you're using a script, you could feasibly replace steps 4-7 above, but my blog post (plus adding your attribute) really only covers steps 5 & 6.
