Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

How fetch a sql join result incrementally ?

New Contributor

I want to run a sql query with joins incrementally i.e, to get the fresh result based on some max value columns as in query database table processor.

how can i use executesql processor to do this ? executesql processor does not store state. how to provide max value columns to achieve this ?

5 REPLIES 5

Super Guru

Hi @seshachalam malisetti,

For every QueryDatabase Processor result Flow Files having attributes with the max value columns that you specified in the Processor.

Example:-

To understand better, i have included all the configurations of processors in screenshots.

1. In my case i am having START_DATE_TIME,END_DATE_TIME are the Maximum-value Columns.

querydatabase-properties.png

2. So the results flow file from QueryDatabase Processor will have attributes maxvalue.end_date_time,maxvalue.start_date_time.

flowfile-attributes.png

3. After storing the results of QueryDatabase Processor in HDFS, etc..., Connect the success relationship to trigger ExecuteSql Processor.

4. You can call those attributes in your ExecuteSql join statement to run incrementally.
executesql-statement.png

New Contributor

Thank you. I am not getting the 3rd step, why to store querydatabase table result in HDFS etc.. in order to executesql. I just care about the result of executesql. I dont want to store anything before that. Hope I am clear enough.

Super Guru

@seshachalam malisetti got it, you don't have to store results. if you are caring about only maxvalue columns,the QueryDatabase processor success relation flow file should have the max value column attributes associated to them.

in your case use QueryDatabase to fetch the results incrementally and use ExecuteSql to run join incrementally by using maxvalue.*** attributes... i.e step1 and step4.

38512-query-execute.png

New Contributor

Hi @yash.

This works but we will be fetching data twice, Once from querydatabase table and from executesql processor.

I wish there is a straight forward way to do this.

Thanks

Super Guru
@seshachalam malisetti, i'm not sure about the straight forward way to do this,

but we can limit fetching the data twice by running query in execute sql processor with all the max value columns in it.

1.in my case i am having START_DATE_TIME,END_DATE_TIME are the Maximum-value Columns, so i have used them inside my execute sql processor sql statement.
select max(START_DATE_TIME) as  MAX_START_DATE_TIME,max(END_DATE_TIME) as  MAX_END_DATE_TIME from test

38523-max-value-sql-stmt.png

2. Result flowfile from executesql will be in avro format use ConvertAvrotoJson processor with default properties.

3. Extract the attributes from json flowfile by using EvaluateJsonPath processor and change Destination property to flowfile-attribute, it will results the extracted attributes values will be part of flow file attributes.

using the properties mentioned in the screenshot below:-

38524-evaluatejsonpath.png

4. Use those max value attributes associated with the flowfile to run your join incrementally.

38526-flowfile-attributes.png

Flow Screenshot:-

38527-flow.png


maxvalue-sql-stmt.png
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.