Created 09-01-2017 10:02 AM
I want to run a sql query with joins incrementally i.e, to get the fresh result based on some max value columns as in query database table processor.
how can i use executesql processor to do this ? executesql processor does not store state. how to provide max value columns to achieve this ?
Created 09-01-2017 03:02 PM
For every QueryDatabase Processor result Flow Files having attributes with the max value columns that you specified in the Processor.
Example:-
To understand better, i have included all the configurations of processors in screenshots.
1. In my case i am having START_DATE_TIME,END_DATE_TIME are the Maximum-value Columns.
2. So the results flow file from QueryDatabase Processor will have attributes maxvalue.end_date_time,maxvalue.start_date_time.
3. After storing the results of QueryDatabase Processor in HDFS, etc..., Connect the success relationship to trigger ExecuteSql Processor.
4. You can call those attributes in your ExecuteSql join statement to run incrementally.
executesql-statement.png
Created 09-04-2017 05:08 AM
Thank you. I am not getting the 3rd step, why to store querydatabase table result in HDFS etc.. in order to executesql. I just care about the result of executesql. I dont want to store anything before that. Hope I am clear enough.
Created on 09-04-2017 05:09 PM - edited 08-17-2019 05:38 PM
@seshachalam malisetti got it, you don't have to store results. if you are caring about only maxvalue columns,the QueryDatabase processor success relation flow file should have the max value column attributes associated to them.
in your case use QueryDatabase to fetch the results incrementally and use ExecuteSql to run join incrementally by using maxvalue.*** attributes... i.e step1 and step4.
Created 09-04-2017 05:18 PM
Hi @yash.
This works but we will be fetching data twice, Once from querydatabase table and from executesql processor.
I wish there is a straight forward way to do this.
Thanks
Created on 09-05-2017 04:58 PM - edited 08-17-2019 05:38 PM
but we can limit fetching the data twice by running query in execute sql processor with all the max value columns in it.
1.in my case i am having START_DATE_TIME,END_DATE_TIME are the Maximum-value Columns, so i have used them inside my execute sql processor sql statement.select max(START_DATE_TIME) as MAX_START_DATE_TIME,max(END_DATE_TIME) as MAX_END_DATE_TIME from test
2. Result flowfile from executesql will be in avro format use ConvertAvrotoJson processor with default properties.
3. Extract the attributes from json flowfile by using EvaluateJsonPath processor and change Destination property to flowfile-attribute, it will results the extracted attributes values will be part of flow file attributes.
using the properties mentioned in the screenshot below:-
4. Use those max value attributes associated with the flowfile to run your join incrementally.
Flow Screenshot:-