Support Questions

Find answers, ask questions, and share your expertise

How can I sort record in parquet file?

avatar
Explorer

Hello,

I'm triyng to sort record in parquet file like below in order of timestamp.

{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.440995\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message AAA\"}
{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.429579\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message BBB\"}
{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.441709\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message CCC\"}
{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.428501\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message DDD\"}
{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:48.440624\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message EEE\"}

I confugured the following query in QueryRecord processor and it passed validation in the processor.

SELECT * from flowfile ORDER BY JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") ASC

But when I run it, it failed with the following error.
It seems to fail to find timestamp field.

errorerror

Could someone please point out what is wrong with my query?

Thanks.

1 ACCEPTED SOLUTION

avatar
Expert Contributor

JSON_EXTRACT function in the QueryRecord processor may not be interpreting Src_obj__event_metadata as a JSON object. Instead, it likely sees Src_obj__event_metadata as a plain string, so it cannot directly access the "$.timestamp" field. 

We may need to use EvaluateJsonPath processor first 

to extract timestamp from Src_obj__event_metadata into a new attribute:

Destination: flowfile-content

Return Type: json

JSON Path Expression: Use the following configuration in the Properties tab:

Property   Value

timestamp   $.Src_obj__event_metadata.timestamp

Once we extracted timestamp as a separate column, then we could call it directly in QueryRecord processor:

SELECT *
FROM flowfile
ORDER BY timestamp ASC

View solution in original post

10 REPLIES 10

avatar
Community Manager

@tono425, Thank you for your participation in the Cloudera Community. I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. 



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: