Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Apache NiFi QueryRecord Get record based on max value between two columns

avatar
Contributor

Hi community,

hope someone can help me on below.

 

My Flow File content is something like this:

 

 

[ {
"version" : "5",
"shipment_number" : "2022065597",
"situation_code" : "MLV",
"justification_code" : "CFM",
"situation_date" : "202206071253",
"new_appointment_date" : "202301081253",
}, {
"version" : "5",
"shipment_number" : "2022065597",
"situation_code" : "MLV",
"justification_code" : "CFM",
"situation_date" : "202206071252",
"new_appointment_date" : "",
} ]

 

 

I need to add a Query Record processor in order to grab only one JSON record for each Flow File.

The condition is: Get a single result with highest value of situation_date or new_appointment_date.

 

Just to explain better, the logic behind should be the following:

Grab the max of situation_date, the max of new_appointment_date and compare each other, then take the highest of the comparison and get entire JSON record.

 

In MYSQL or T-SQL it could be easier, but if i understand well Query Record is based on Apache Calcite which is not so easy for me.

 

Does anyone can point mee on the right direction?

 

Appreciate

 

1 ACCEPTED SOLUTION

avatar

Hi ,

Not sure if there is better way, but in my case I was able to get the result using two QueryRecord Processors:

1-   Query everything with new field  (lets call max_Date) to assign the max between "situation_date" and "new_appointment_date" on each record, in this case the query will be like this:

 

SELECT *, case when situation_date> new_appointment_date then situation_date else new_appointment_date end maxDate
FROM FLOWFILE
            

 

2- Next QueryRecord will basically capture the max based on the max_date above from each flowfile records, as follows:

 

 

    SELECT *
            FROM FLOWFILE
            WHERE max_date = (
                SELECT MAX(max_date) from FLOWFILE
				
            )

 

 

Not sure if this can be done in more efficient way, If anyone can think of better way please advise, otherwise if this works for you please accept solution.

Thanks

View solution in original post

2 REPLIES 2

avatar

Hi ,

Not sure if there is better way, but in my case I was able to get the result using two QueryRecord Processors:

1-   Query everything with new field  (lets call max_Date) to assign the max between "situation_date" and "new_appointment_date" on each record, in this case the query will be like this:

 

SELECT *, case when situation_date> new_appointment_date then situation_date else new_appointment_date end maxDate
FROM FLOWFILE
            

 

2- Next QueryRecord will basically capture the max based on the max_date above from each flowfile records, as follows:

 

 

    SELECT *
            FROM FLOWFILE
            WHERE max_date = (
                SELECT MAX(max_date) from FLOWFILE
				
            )

 

 

Not sure if this can be done in more efficient way, If anyone can think of better way please advise, otherwise if this works for you please accept solution.

Thanks

avatar
Contributor

@SAMSALthank you for the solution provided.

Testing the solution provided i was aware the sometimes the fields vaues(string) are different.

Sometime are in the form 20220807091252 which includes also hours and minutes, sometime 202209080.

So i need to have a conversion process before in order to convert into datetime.

 

Am i right? Do you came across this?

Many thanks