<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to convert multiple related rows in CSV into nested JSON using Nifi? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/334000#M231586</link>
    <description>&lt;P&gt;Thanks for the solution! It solves my use case perfectly.&lt;BR /&gt;&lt;BR /&gt;Just one question though. I expect the input JSON to the Jolt processor to be somewhat large (larger than the available RAM). In that case, is it absolutely necessary to split the input into multiple files?&lt;/P&gt;</description>
    <pubDate>Mon, 17 Jan 2022 13:09:33 GMT</pubDate>
    <dc:creator>Tomson</dc:creator>
    <dc:date>2022-01-17T13:09:33Z</dc:date>
    <item>
      <title>How to convert multiple related rows in CSV into nested JSON using Nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/333936#M231567</link>
      <description>&lt;P&gt;I have the following CSV:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="16.666666666666668%"&gt;mobile&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;username&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;bill_id&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;bill_date&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;product_id&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;product_price&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.666666666666668%"&gt;9876543210&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;John&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;AB10&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;2020-12-23&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;X34&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;500&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.666666666666668%"&gt;9876543210&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;John&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;AB10&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;2020-12-23&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;X35&lt;/TD&gt;&lt;TD width="16.666666666666668%"&gt;230&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;9876543210&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;AB22&lt;/TD&gt;&lt;TD&gt;2020-11-14&lt;/TD&gt;&lt;TD&gt;X89&lt;/TD&gt;&lt;TD&gt;700&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;9999999999&lt;/TD&gt;&lt;TD&gt;Alice&lt;/TD&gt;&lt;TD&gt;AC54&lt;/TD&gt;&lt;TD&gt;2019-12-20&lt;/TD&gt;&lt;TD&gt;X10&lt;/TD&gt;&lt;TD&gt;109&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And I want to convert them to the following structure:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
    "users": [
    {
        "mobile": "9876543210",
        "username": "John",
        "bills": [
        {
            "bill_id": "AB10",
            "bill_date": "2020-12-23",
            "products": [{
                "product_id": "X34",
                "product_price": "500",
            },
            {
                "product_id": "X35",
                "product_price": "230",
            }]
        },
        {
            "bill_id": "AB22",
            "bill_date": "2020-11-14",
            "products": [
            {
                "product_id": "X89",
                "product_price": "700",
            }]
        }]
    },
    {
        "mobile": "9999999999",
        "username": "Alice",
        "bills": [
        {
            "bill_id": "AC54",
            "bill_date": "2019-12-20",
            "products": [
            {
                "product_id": "X10",
                "product_price": "109",
            }]
        }]
    }]
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wasn't able to figure out how to group the related properties (e.g. product properties with the same bill_id) into an array. How should I go about doing this?&lt;/P&gt;</description>
      <pubDate>Sun, 16 Jan 2022 19:01:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/333936#M231567</guid>
      <dc:creator>Tomson</dc:creator>
      <dc:date>2022-01-16T19:01:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert multiple related rows in CSV into nested JSON using Nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/333964#M231573</link>
      <description>&lt;P&gt;Thank you for your question.&lt;BR /&gt;We can try following steps to make it work as expected.&amp;nbsp;&lt;BR /&gt;1. Convert csv to json ( you may use &lt;STRONG&gt;ConvertRecord&lt;/STRONG&gt;&amp;nbsp;), to get JSON as below&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[{
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB10",
	"bill_date": "2020-12-23",
	"product_id": "X34",
	"product_price": "500"
}, {
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB10",
	"bill_date": "2020-12-23",
	"product_id": "X35",
	"product_price": "230"
}, {
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB22",
	"bill_date": "2020-11-14",
	"product_id": "X89",
	"product_price": "700"
}, {
	"mobile": "9999999999",
	"username": "Alice",
	"bill_id": "AC54",
	"bill_date": "2019-12-20",
	"product_id": "X10",
	"product_price": "109"
}]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. We will then use &lt;STRONG&gt;JoltTransformJSON&lt;/STRONG&gt; to make the shift stuff, the specification of such JOLT is listed as:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[ {
  "operation" : "shift",
  "spec" : {
    "*" : {
      "*" : "@(1,username)@(1,mobile)@(1,bill_id)@(1,bill_date)@(1,product_id).&amp;amp;"
    }
  }
}, {
  "operation" : "shift",
  "spec" : {
    "*" : {
      "$" : "users.[#2].username",
      "*" : {
        "$" : "users.[#3].mobile",
        "*" : {
          "*" : {
            "$" : "users.[#5].bills.[#3].bill_date",
            "$1" : "users.[#5].bills.[#3].bill_id",
            "*" : {
              "product_*" : "users.[#6].bills.[#4].products.[#2].&amp;amp;"
            }
          }
        }
      }
    }
  }
} ]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;NOTE:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;- For the JOLT spec, we gonna group the raw data into organic fields, which means we will get the data which are grouped by "username + mobile + bill_id +bill_date + product_id " fields.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;- After that, we get the phase-1 shifted data that are ordered by unique keys.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;- For the phase-2 shift, we gonna list out the data as predefined logic which is inferred from your attached sample output.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3. For details on JOLT spec, you may refer to the following link:&amp;nbsp;&lt;BR /&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;A href="https://jolt-demo.appspot.com/" target="_blank" rel="noopener"&gt;https://jolt-demo.appspot.com/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/EM&gt;Hope this helps.&lt;BR /&gt;Thanks&amp;nbsp;&lt;BR /&gt;Oliver Gong&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jan 2022 05:50:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/333964#M231573</guid>
      <dc:creator>OliverGong</dc:creator>
      <dc:date>2022-01-17T05:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert multiple related rows in CSV into nested JSON using Nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/333973#M231574</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Attached the corresponding screenshots to cover the detail on implementation.&lt;BR /&gt;0. The whole data flow is like:&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="OliverGong_0-1642403823058.png" style="width: 1267px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33188i11AFCEA23EEEC977/image-size/medium?v=v2&amp;amp;px=400" role="button" title="OliverGong_0-1642403823058.png" alt="OliverGong_0-1642403823058.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;1.GenerateFlowFile:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="OliverGong_1-1642403058287.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33181iE1793C298227946F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="OliverGong_1-1642403058287.png" alt="OliverGong_1-1642403058287.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;2. ConvertRecord:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="OliverGong_2-1642403099646.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33182i73BFFA82961F5AE6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="OliverGong_2-1642403099646.png" alt="OliverGong_2-1642403099646.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2.1 CSV Reader(LG CSVReader):&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="OliverGong_3-1642403167599.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33183i00D6E82DF6DE5C83/image-size/medium?v=v2&amp;amp;px=400" role="button" title="OliverGong_3-1642403167599.png" alt="OliverGong_3-1642403167599.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2.2&amp;nbsp;&lt;SPAN&gt;JsonRecordSetWriter (LG JsonRecordSetWriter):&lt;BR /&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="OliverGong_4-1642403271422.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33184i095A75FC867D5AC4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="OliverGong_4-1642403271422.png" alt="OliverGong_4-1642403271422.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3.&amp;nbsp;&lt;SPAN&gt;JoltTransformJSON&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="OliverGong_5-1642403330677.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33185i40889FF4D98BB012/image-size/medium?v=v2&amp;amp;px=400" role="button" title="OliverGong_5-1642403330677.png" alt="OliverGong_5-1642403330677.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;3.1&amp;nbsp;&lt;SPAN&gt;JoltTransformJSON -&amp;nbsp;&lt;/SPAN&gt;Advanced Mode:&lt;/STRONG&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="OliverGong_0-1642402987008.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33180i8889860ADD697F45/image-size/medium?v=v2&amp;amp;px=400" role="button" title="OliverGong_0-1642402987008.png" alt="OliverGong_0-1642402987008.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Should you have any questions, just feel free to contact me.&lt;BR /&gt;Thanks,&lt;BR /&gt;Oliver Gong&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jan 2022 07:19:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/333973#M231574</guid>
      <dc:creator>OliverGong</dc:creator>
      <dc:date>2022-01-17T07:19:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert multiple related rows in CSV into nested JSON using Nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/334000#M231586</link>
      <description>&lt;P&gt;Thanks for the solution! It solves my use case perfectly.&lt;BR /&gt;&lt;BR /&gt;Just one question though. I expect the input JSON to the Jolt processor to be somewhat large (larger than the available RAM). In that case, is it absolutely necessary to split the input into multiple files?&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jan 2022 13:09:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/334000#M231586</guid>
      <dc:creator>Tomson</dc:creator>
      <dc:date>2022-01-17T13:09:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to convert multiple related rows in CSV into nested JSON using Nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/334022#M231592</link>
      <description>&lt;P&gt;Glad to hear that works for you!&lt;BR /&gt;Though we can split the raw csv or json content into a smaller size just in case of OOM issue when doing the shifting stuff on JoltTransformJSON, it may cause other issue like:&lt;BR /&gt;- result from JoltTransformJSON is not complete. That means, some part of the same user data ( with different bill_date)&amp;nbsp; may be wrapped in the other flow files. In such case, we will need to merge them back as a whole.&lt;BR /&gt;&lt;BR /&gt;- &lt;EM&gt;&lt;STRONG&gt;It would be better if we can keep raw csv data in some DB table.&lt;/STRONG&gt;&lt;/EM&gt; Then drag out the data rows with a specified limit&lt;STRONG&gt;&lt;EM&gt;--&amp;gt; we can use "split pages" way to query on such table,&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;BR /&gt;- We can&amp;nbsp;then easily fetch a logical &lt;EM&gt;&lt;STRONG&gt;completed data&lt;/STRONG&gt; &lt;/EM&gt;to do the rest shift things. With such completed result, we don't need to worry about the data is not an&lt;STRONG&gt;&lt;EM&gt; info-incomplete one&lt;/EM&gt;&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jan 2022 14:46:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-convert-multiple-related-rows-in-CSV-into-nested-JSON/m-p/334022#M231592</guid>
      <dc:creator>OliverGong</dc:creator>
      <dc:date>2022-01-17T14:46:14Z</dc:date>
    </item>
  </channel>
</rss>

