Created 11-06-2024 09:03 PM
Hello,
I'm triyng to sort record in parquet file like below in order of timestamp.
{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.440995\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message AAA\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.429579\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message BBB\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.441709\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message CCC\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.428501\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message DDD\"} {"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:48.440624\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message EEE\"} |
I confugured the following query in QueryRecord processor and it passed validation in the processor.
SELECT * from flowfile ORDER BY JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") ASC |
But when I run it, it failed with the following error.
It seems to fail to find timestamp field.
Could someone please point out what is wrong with my query?
Thanks.
Created 11-07-2024 10:06 PM
JSON_EXTRACT function in the QueryRecord processor may not be interpreting Src_obj__event_metadata as a JSON object. Instead, it likely sees Src_obj__event_metadata as a plain string, so it cannot directly access the "$.timestamp" field.
We may need to use EvaluateJsonPath processor first
to extract timestamp from Src_obj__event_metadata into a new attribute:
• Destination: flowfile-content
•Return Type: json
•JSON Path Expression: Use the following configuration in the Properties tab:
Property Value
timestamp $.Src_obj__event_metadata.timestamp
Once we extracted timestamp as a separate column, then we could call it directly in QueryRecord processor:
SELECT *
FROM flowfile
ORDER BY timestamp ASC
Created 11-06-2024 09:32 PM
Welcome to our community! To help you get the best possible answer, I have tagged our NiFi experts @MattWho @ckumar @SAMSAL who may be able to assist you further.
Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,Created 11-06-2024 09:56 PM
The field Src_obj__event_metadata is a JSON string, so to access fields within it, you might need to parse it into a JSON object first. Some systems may require you to explicitly parse JSON strings before extracting fields.
Please try:
SELECT *
FROM flowfile
ORDER BY CAST(JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") AS TIMESTAMP) ASC
Created 11-06-2024 10:22 PM
@ywu
Thank you for the advice.
I tried with the query you suggested but the situation has not changed.
Thanks.
Created 11-06-2024 10:27 PM
If CAST and JSON_PARSE functions are not supported in the Nifi processor you're using, we may try extracting the timestamp value as a string and sorting alphabetically
SELECT *
FROM flowfile
ORDER BY JSON_EXTRACT_SCALAR(Src_obj__event_metadata, "$.timestamp") ASC
Created 11-06-2024 11:02 PM
@ywu
Thank you for your prompto reply.
I tried with that query but unfortunately the same result.
Both of 2 queries you provided passed the validation in the processor.
Thanks.
Created 11-07-2024 10:06 PM
JSON_EXTRACT function in the QueryRecord processor may not be interpreting Src_obj__event_metadata as a JSON object. Instead, it likely sees Src_obj__event_metadata as a plain string, so it cannot directly access the "$.timestamp" field.
We may need to use EvaluateJsonPath processor first
to extract timestamp from Src_obj__event_metadata into a new attribute:
• Destination: flowfile-content
•Return Type: json
•JSON Path Expression: Use the following configuration in the Properties tab:
Property Value
timestamp $.Src_obj__event_metadata.timestamp
Once we extracted timestamp as a separate column, then we could call it directly in QueryRecord processor:
SELECT *
FROM flowfile
ORDER BY timestamp ASC
Created 11-11-2024 06:57 AM
Hello @ywu ,
Thank you for your advice.
Yes, as you noted I don't think JSON_EXTRACT is working as expected in QueryRecord processor.
I converted my parquet file to json format and tried with EvaluateJsonPath processor as you advised.
When I set $.Src_obj__event_metadata.timestamp as timestamp, EvaluateJsonPath processor terminated by unmatched relationship.
When I changed timestamp value to .Src_obj__event_metadata.timestamp, only [] was genarated as output.
[]
When I changed the value to .Src_obj__event_metadata, the following output was genarated.
["{\"timestamp\":\"2024-11-01T00:23:58.440995\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:58.429579\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:08.441709\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:08.428501\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:48.440624\",\"severity\":\"Info\"}"]
From these results, it seems we need another consideration to correctly specify timestamp field.
Do you have any idea or insights?
Regards,
Created 12-01-2024 10:26 PM
@tono425, I noticed that @ywu's response has helped resolve your issue. If it did, please mark the relevant reply as a solution, as it will help others find the answer more easily in the future.
Regards,
Vidya Sargur,Created 12-04-2024 10:30 PM
Hello,
After trials and errors, I succeeded to sort records by the following steps.
1. Convert flow file to json by ConvertRecord processor
Convert flow file from parquet format to json format so that we can modify it in next step.
2. Format records by ExecuteStreamCommand processor
As double quotation marks outside the {} were preventing us from processing records as json, I removed them and escape characters by sed then sorted records by jq from script.
Sample script
#!/bin/bash
/usr/bin/sed -e 's/\"{/{/g' -e 's/}\"/}/g' -e 's/\\"/"/g' $1 | /usr/bin/jq '. | sort_by(.Src_obj__event_metadata.timestamp)'
3. Convert flow file to parquet by ConvertRecord processor
Convert flow file from json format to parquet format.
@ywu
Your advice was very helpful in resolving the issue.
Thanks a lot.