Created 11-06-2024 09:03 PM
Hello,
I'm triyng to sort record in parquet file like below in order of timestamp.
{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.440995\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message AAA\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.429579\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message BBB\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.441709\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message CCC\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.428501\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message DDD\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:48.440624\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message EEE\"} |
I confugured the following query in QueryRecord processor and it passed validation in the processor.
SELECT * from flowfile ORDER BY JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") ASC |
But when I run it, it failed with the following error.
It seems to fail to find timestamp field.
Could someone please point out what is wrong with my query?
Thanks.
Created 11-06-2024 09:32 PM
Welcome to our community! To help you get the best possible answer, I have tagged our NiFi experts @MattWho @ckumar @SAMSAL who may be able to assist you further.
Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,Created 11-06-2024 09:56 PM
The field Src_obj__event_metadata is a JSON string, so to access fields within it, you might need to parse it into a JSON object first. Some systems may require you to explicitly parse JSON strings before extracting fields.
Please try:
SELECT *
FROM flowfile
ORDER BY CAST(JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") AS TIMESTAMP) ASC
Created 11-06-2024 10:22 PM
@ywu
Thank you for the advice.
I tried with the query you suggested but the situation has not changed.
Thanks.
Created 11-06-2024 10:27 PM
If CAST and JSON_PARSE functions are not supported in the Nifi processor you're using, we may try extracting the timestamp value as a string and sorting alphabetically
SELECT *
FROM flowfile
ORDER BY JSON_EXTRACT_SCALAR(Src_obj__event_metadata, "$.timestamp") ASC
Created 11-06-2024 11:02 PM
@ywu
Thank you for your prompto reply.
I tried with that query but unfortunately the same result.
Both of 2 queries you provided passed the validation in the processor.
Thanks.
Created 11-07-2024 10:06 PM
JSON_EXTRACT function in the QueryRecord processor may not be interpreting Src_obj__event_metadata as a JSON object. Instead, it likely sees Src_obj__event_metadata as a plain string, so it cannot directly access the "$.timestamp" field.
We may need to use EvaluateJsonPath processor first
to extract timestamp from Src_obj__event_metadata into a new attribute:
• Destination: flowfile-content
•Return Type: json
•JSON Path Expression: Use the following configuration in the Properties tab:
Property Value
timestamp $.Src_obj__event_metadata.timestamp
Once we extracted timestamp as a separate column, then we could call it directly in QueryRecord processor:
SELECT *
FROM flowfile
ORDER BY timestamp ASC
Created 11-11-2024 06:57 AM
Hello @ywu ,
Thank you for your advice.
Yes, as you noted I don't think JSON_EXTRACT is working as expected in QueryRecord processor.
I converted my parquet file to json format and tried with EvaluateJsonPath processor as you advised.
When I set $.Src_obj__event_metadata.timestamp as timestamp, EvaluateJsonPath processor terminated by unmatched relationship.
When I changed timestamp value to .Src_obj__event_metadata.timestamp, only [] was genarated as output.
[]
When I changed the value to .Src_obj__event_metadata, the following output was genarated.
["{\"timestamp\":\"2024-11-01T00:23:58.440995\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:58.429579\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:08.441709\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:08.428501\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:48.440624\",\"severity\":\"Info\"}"]
From these results, it seems we need another consideration to correctly specify timestamp field.
Do you have any idea or insights?
Regards,