<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How can I sort record in parquet file? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397101#M249694</link>
    <description>&lt;P class="p1"&gt;The field &lt;SPAN class="s1"&gt;Src_obj__event_metadata&lt;/SPAN&gt; is a JSON string, so to access fields within it, you might need to parse it into a JSON object first. Some systems may require you to explicitly parse JSON strings before extracting fields.&lt;/P&gt;&lt;P class="p1"&gt;Please try:&lt;/P&gt;&lt;P class="p1"&gt;SELECT *&lt;BR /&gt;FROM flowfile&lt;BR /&gt;ORDER BY CAST(JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") AS TIMESTAMP) ASC&lt;/P&gt;</description>
    <pubDate>Thu, 07 Nov 2024 05:56:13 GMT</pubDate>
    <dc:creator>ywu</dc:creator>
    <dc:date>2024-11-07T05:56:13Z</dc:date>
    <item>
      <title>How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397097#M249692</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm trying to sort the records in a Parquet file like the one below by timestamp.&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="100%"&gt;{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.440995\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message AAA\"}&lt;BR /&gt;{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:58.429579\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message BBB\"}&lt;BR /&gt;{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.441709\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message CCC\"}&lt;BR /&gt;{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:08.428501\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message DDD\"}&lt;BR /&gt;{"Src_obj__event_metadata":"{\"timestamp\":\"2024-11-01T00:23:48.440624\",\"severity\":\"Info\"}","Src_obj__user_data":"{\"message\":\"Message EEE\"}&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;I configured the following query in the QueryRecord processor, and it passed the processor's validation.&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="100%"&gt;SELECT * from flowfile ORDER BY JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") ASC&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;But when I ran it, it failed with the following error.&lt;BR /&gt;It seems to fail to find the timestamp field.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="error" style="width: 355px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42506i7823333E0283ECB2/image-size/large?v=v2&amp;amp;px=999" role="button" title="GetImage.png" alt="error" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;error&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Could someone please point out what is wrong with my query?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2024 05:03:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397097#M249692</guid>
      <dc:creator>tono425</dc:creator>
      <dc:date>2024-11-07T05:03:13Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397099#M249693</link>
      <description>&lt;P&gt;Welcome to our community! To help you get the best possible answer, I have tagged our NiFi experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35454"&gt;@MattWho&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/42173"&gt;@ckumar&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2024 05:32:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397099#M249693</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2024-11-07T05:32:57Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397101#M249694</link>
      <description>&lt;P class="p1"&gt;The field &lt;SPAN class="s1"&gt;Src_obj__event_metadata&lt;/SPAN&gt; is a JSON string, so to access fields within it, you might need to parse it into a JSON object first. Some systems may require you to explicitly parse JSON strings before extracting fields.&lt;/P&gt;&lt;P class="p1"&gt;Please try:&lt;/P&gt;&lt;P class="p1"&gt;SELECT *&lt;BR /&gt;FROM flowfile&lt;BR /&gt;ORDER BY CAST(JSON_EXTRACT(Src_obj__event_metadata, "$.timestamp") AS TIMESTAMP) ASC&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2024 05:56:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397101#M249694</guid>
      <dc:creator>ywu</dc:creator>
      <dc:date>2024-11-07T05:56:13Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397108#M249695</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/83812"&gt;@ywu&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Thank you for the advice.&lt;BR /&gt;I tried with the query you suggested but the situation has not changed.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="スクリーンショット 2024-11-07 151704.png" style="width: 492px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42508i2842676D42C7B696/image-size/large?v=v2&amp;amp;px=999" role="button" title="スクリーンショット 2024-11-07 151704.png" alt="スクリーンショット 2024-11-07 151704.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2024 06:22:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397108#M249695</guid>
      <dc:creator>tono425</dc:creator>
      <dc:date>2024-11-07T06:22:21Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397109#M249696</link>
      <description>&lt;P class="p1"&gt;If the &lt;SPAN class="s1"&gt;CAST&lt;/SPAN&gt; and &lt;SPAN class="s1"&gt;JSON_PARSE&lt;/SPAN&gt; functions are not supported in the NiFi processor you're using, we could try&amp;nbsp;extracting the &lt;SPAN class="s1"&gt;timestamp&lt;/SPAN&gt; value as a string and sorting alphabetically. Since the timestamps in your sample all share the same fixed-width ISO 8601 format, alphabetical order should match chronological order:&lt;/P&gt;&lt;P class="p1"&gt;SELECT&lt;SPAN class="s1"&gt; *&lt;/SPAN&gt;&lt;/P&gt;&lt;P class="p2"&gt;&lt;SPAN class="s2"&gt;FROM&lt;/SPAN&gt; flowfile&lt;/P&gt;&lt;P class="p2"&gt;&lt;SPAN class="s2"&gt;ORDER&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;BY&lt;/SPAN&gt; JSON_EXTRACT_SCALAR(Src_obj__event_metadata, "$.timestamp") &lt;SPAN class="s2"&gt;ASC&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2024 06:27:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397109#M249696</guid>
      <dc:creator>ywu</dc:creator>
      <dc:date>2024-11-07T06:27:29Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397114#M249698</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/83812"&gt;@ywu&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Thank you for your prompt reply.&lt;BR /&gt;I tried that query, but unfortunately I got the same result.&lt;BR /&gt;Both of the queries you provided passed validation in the processor.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="スクリーンショット 2024-11-07 154049.png" style="width: 522px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42511i6EEB467DB336B3DD/image-size/large?v=v2&amp;amp;px=999" role="button" title="スクリーンショット 2024-11-07 154049.png" alt="スクリーンショット 2024-11-07 154049.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2024 07:02:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397114#M249698</guid>
      <dc:creator>tono425</dc:creator>
      <dc:date>2024-11-07T07:02:20Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397205#M249766</link>
      <description>&lt;P class="p1"&gt;The &lt;SPAN class="s1"&gt;JSON_EXTRACT&lt;/SPAN&gt; function in the &lt;SPAN class="s1"&gt;QueryRecord&lt;/SPAN&gt; processor may not be interpreting &lt;SPAN class="s1"&gt;Src_obj__event_metadata&lt;/SPAN&gt; as a JSON object. Instead, it likely sees &lt;SPAN class="s1"&gt;Src_obj__event_metadata&lt;/SPAN&gt; as a plain string, so it cannot directly access the &lt;SPAN class="s1"&gt;"$.timestamp"&lt;/SPAN&gt; field.&lt;/P&gt;&lt;P class="p1"&gt;We may need to use the&amp;nbsp;&lt;STRONG&gt;EvaluateJsonPath&lt;/STRONG&gt; processor first to extract &lt;SPAN class="s1"&gt;timestamp&lt;/SPAN&gt; from &lt;SPAN class="s1"&gt;Src_obj__event_metadata&lt;/SPAN&gt; into a new attribute:&lt;/P&gt;&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;• &lt;STRONG&gt;Destination&lt;/STRONG&gt;: &lt;/SPAN&gt;flowfile-content&lt;/P&gt;&lt;P class="p2"&gt;• &lt;STRONG&gt;Return Type&lt;/STRONG&gt;: &lt;SPAN class="s2"&gt;json&lt;/SPAN&gt;&lt;/P&gt;&lt;P class="p2"&gt;• &lt;STRONG&gt;JSON Path Expression&lt;/STRONG&gt;: Use the following configuration in the &lt;STRONG&gt;Properties&lt;/STRONG&gt; tab:&lt;/P&gt;&lt;P class="p1"&gt;&lt;STRONG&gt;Property&lt;/STRONG&gt;&amp;nbsp; &amp;nbsp;&lt;STRONG&gt;Value&lt;/STRONG&gt;&lt;/P&gt;&lt;P class="p2"&gt;timestamp&amp;nbsp; &amp;nbsp;$.Src_obj__event_metadata.timestamp&lt;/P&gt;&lt;P class="p2"&gt;Once we have&amp;nbsp;extracted &lt;SPAN class="s1"&gt;timestamp&lt;/SPAN&gt; as a separate column, we can reference it directly in the&amp;nbsp;QueryRecord processor:&lt;/P&gt;&lt;P class="p2"&gt;SELECT *&lt;BR /&gt;FROM flowfile&lt;BR /&gt;ORDER BY timestamp ASC&lt;/P&gt;</description>
      <pubDate>Fri, 08 Nov 2024 06:06:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397205#M249766</guid>
      <dc:creator>ywu</dc:creator>
      <dc:date>2024-11-08T06:06:45Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397336#M249817</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/83812"&gt;@ywu&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thank you for your advice.&lt;BR /&gt;Yes, as you noted, I don't think JSON_EXTRACT is working as expected in the QueryRecord processor.&lt;/P&gt;&lt;P&gt;I converted my Parquet file to JSON format and tried the EvaluateJsonPath processor as you advised.&lt;/P&gt;&lt;P&gt;When I set $.Src_obj__event_metadata.timestamp as the value of timestamp, the EvaluateJsonPath processor terminated with the unmatched relationship.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="スクリーンショット 2024-11-11 175858.png" style="width: 753px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42568iE0B935B208485809/image-size/large?v=v2&amp;amp;px=999" role="button" title="スクリーンショット 2024-11-11 175858.png" alt="スクリーンショット 2024-11-11 175858.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;When I changed the timestamp value to .Src_obj__event_metadata.timestamp, only [] was generated as output.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[]&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;When I changed the value to .Src_obj__event_metadata, the following output was generated.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;["{\"timestamp\":\"2024-11-01T00:23:58.440995\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:58.429579\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:08.441709\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:08.428501\",\"severity\":\"Info\"}","{\"timestamp\":\"2024-11-01T00:23:48.440624\",\"severity\":\"Info\"}"]&lt;/LI-CODE&gt;&lt;P&gt;From these results, it seems each Src_obj__event_metadata value is still a JSON string rather than an object, so we need a different approach to specify the timestamp field correctly.&lt;BR /&gt;Do you have any ideas or insights?&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;</description>
      <pubDate>Mon, 11 Nov 2024 14:57:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/397336#M249817</guid>
      <dc:creator>tono425</dc:creator>
      <dc:date>2024-11-11T14:57:53Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/398312#M250158</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/119395"&gt;@tono425&lt;/a&gt;, I noticed that&amp;nbsp;&lt;A style="background-color: #ffffff;" href="#" target="_blank" rel="noopener"&gt;@ywu&lt;/A&gt;'s response has helped resolve your issue. If it did, please mark the relevant reply as a solution, as it will help others find the answer more easily in the future.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Dec 2024 06:26:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/398312#M250158</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2024-12-02T06:26:39Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/398430#M250186</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;After some trial and error, I succeeded in sorting the records with the following steps.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;1. Convert the flow file to JSON with the ConvertRecord processor&lt;/STRONG&gt;&lt;BR /&gt;Convert the flow file from Parquet format to JSON format so that we can modify it in the next step.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;2. Format the records with the ExecuteStreamCommand processor&lt;/STRONG&gt;&lt;BR /&gt;Because the double quotation marks around the {} were preventing the records from being processed as JSON, I removed them and the escape characters with sed, then sorted the records with jq, all from one script.&lt;/P&gt;&lt;P&gt;Sample script&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;#!/bin/bash

# Strip the quotes wrapping each embedded {...} object and unescape the inner
# quotes with sed, then sort the resulting JSON array by timestamp with jq.
/usr/bin/sed -e 's/\"{/{/g' -e 's/}\"/}/g' -e 's/\\"/"/g' "$1" | /usr/bin/jq 'sort_by(.Src_obj__event_metadata.timestamp)'&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;3. Convert the flow file to Parquet with the ConvertRecord processor&lt;/STRONG&gt;&lt;BR /&gt;Convert the flow file from JSON format back to Parquet format.&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/83812"&gt;@ywu&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Your advice was very helpful in resolving the issue.&lt;BR /&gt;Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 06:30:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/398430#M250186</guid>
      <dc:creator>tono425</dc:creator>
      <dc:date>2024-12-05T06:30:02Z</dc:date>
    </item>
    <item>
      <title>Re: How can I sort record in parquet file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/398435#M250187</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/119395"&gt;@tono425&lt;/a&gt;,&amp;nbsp;Thank you for your participation in the Cloudera Community. I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 11:08:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-can-I-sort-record-in-parquet-file/m-p/398435#M250187</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2024-12-05T11:08:14Z</dc:date>
    </item>
  </channel>
</rss>

