Support Questions

Find answers, ask questions, and share your expertise

How to convert multiple related rows in CSV into nested JSON using Nifi?

avatar
New Contributor

I have the following CSV:

 

mobileusernamebill_idbill_dateproduct_idproduct_price
9876543210JohnAB102020-12-23X34500
9876543210JohnAB102020-12-23X35230
9876543210JohnAB222020-11-14X89700
9999999999AliceAC542019-12-20X10109

 

And I want to convert them to the following structure:

 

{
    "users": [
    {
        "mobile": "9876543210",
        "username": "John",
        "bills": [
        {
            "bill_id": "AB10",
            "bill_date": "2020-12-23",
            "products": [{
                "product_id": "X34",
                "product_price": "500",
            },
            {
                "product_id": "X35",
                "product_price": "230",
            }]
        },
        {
            "bill_id": "AB22",
            "bill_date": "2020-11-14",
            "products": [
            {
                "product_id": "X89",
                "product_price": "700",
            }]
        }]
    },
    {
        "mobile": "9999999999",
        "username": "Alice",
        "bills": [
        {
            "bill_id": "AC54",
            "bill_date": "2019-12-20",
            "products": [
            {
                "product_id": "X10",
                "product_price": "109",
            }]
        }]
    }]
}

 

 

I wasn't able to figure out how to group the related properties (e.g. product properties with the same bill_id) into an array. How should I go about doing this?

1 ACCEPTED SOLUTION

avatar
Contributor

Thank you for your question.
We can try following steps to make it work as expected. 
1. Convert csv to json ( you may use ConvertRecord ), to get JSON as below

 

[{
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB10",
	"bill_date": "2020-12-23",
	"product_id": "X34",
	"product_price": "500"
}, {
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB10",
	"bill_date": "2020-12-23",
	"product_id": "X35",
	"product_price": "230"
}, {
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB22",
	"bill_date": "2020-11-14",
	"product_id": "X89",
	"product_price": "700"
}, {
	"mobile": "9999999999",
	"username": "Alice",
	"bill_id": "AC54",
	"bill_date": "2019-12-20",
	"product_id": "X10",
	"product_price": "109"
}]

 

2. We will then use JoltTransformJSON to make the shift stuff, the specification of such JOLT is listed as:

 

[ {
  "operation" : "shift",
  "spec" : {
    "*" : {
      "*" : "@(1,username)@(1,mobile)@(1,bill_id)@(1,bill_date)@(1,product_id).&"
    }
  }
}, {
  "operation" : "shift",
  "spec" : {
    "*" : {
      "$" : "users.[#2].username",
      "*" : {
        "$" : "users.[#3].mobile",
        "*" : {
          "*" : {
            "$" : "users.[#5].bills.[#3].bill_date",
            "$1" : "users.[#5].bills.[#3].bill_id",
            "*" : {
              "product_*" : "users.[#6].bills.[#4].products.[#2].&"
            }
          }
        }
      }
    }
  }
} ]

 

NOTE:
- For the JOLT spec, we gonna group the raw data into organic fields, which means we will get the data which are grouped by "username + mobile + bill_id +bill_date + product_id " fields.
- After that, we get the phase-1 shifted data that are ordered by unique keys.
- For the phase-2 shift, we gonna list out the data as predefined logic which is inferred from your attached sample output.


3. For details on JOLT spec, you may refer to the following link: 
https://jolt-demo.appspot.com/

Hope this helps.
Thanks 
Oliver Gong

View solution in original post

4 REPLIES 4

avatar
Contributor

Thank you for your question.
We can try following steps to make it work as expected. 
1. Convert csv to json ( you may use ConvertRecord ), to get JSON as below

 

[{
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB10",
	"bill_date": "2020-12-23",
	"product_id": "X34",
	"product_price": "500"
}, {
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB10",
	"bill_date": "2020-12-23",
	"product_id": "X35",
	"product_price": "230"
}, {
	"mobile": "9876543210",
	"username": "John",
	"bill_id": "AB22",
	"bill_date": "2020-11-14",
	"product_id": "X89",
	"product_price": "700"
}, {
	"mobile": "9999999999",
	"username": "Alice",
	"bill_id": "AC54",
	"bill_date": "2019-12-20",
	"product_id": "X10",
	"product_price": "109"
}]

 

2. We will then use JoltTransformJSON to make the shift stuff, the specification of such JOLT is listed as:

 

[ {
  "operation" : "shift",
  "spec" : {
    "*" : {
      "*" : "@(1,username)@(1,mobile)@(1,bill_id)@(1,bill_date)@(1,product_id).&"
    }
  }
}, {
  "operation" : "shift",
  "spec" : {
    "*" : {
      "$" : "users.[#2].username",
      "*" : {
        "$" : "users.[#3].mobile",
        "*" : {
          "*" : {
            "$" : "users.[#5].bills.[#3].bill_date",
            "$1" : "users.[#5].bills.[#3].bill_id",
            "*" : {
              "product_*" : "users.[#6].bills.[#4].products.[#2].&"
            }
          }
        }
      }
    }
  }
} ]

 

NOTE:
- For the JOLT spec, we gonna group the raw data into organic fields, which means we will get the data which are grouped by "username + mobile + bill_id +bill_date + product_id " fields.
- After that, we get the phase-1 shifted data that are ordered by unique keys.
- For the phase-2 shift, we gonna list out the data as predefined logic which is inferred from your attached sample output.


3. For details on JOLT spec, you may refer to the following link: 
https://jolt-demo.appspot.com/

Hope this helps.
Thanks 
Oliver Gong

avatar
Contributor

Attached the corresponding screenshots to cover the detail on implementation.
0. The whole data flow is like:

OliverGong_0-1642403823058.png

 



1.GenerateFlowFile:

OliverGong_1-1642403058287.png


2. ConvertRecord:

OliverGong_2-1642403099646.png

2.1 CSV Reader(LG CSVReader):

OliverGong_3-1642403167599.png

2.2 JsonRecordSetWriter (LG JsonRecordSetWriter):

OliverGong_4-1642403271422.png

3. JoltTransformJSON

OliverGong_5-1642403330677.png

 3.1 JoltTransformJSON - Advanced Mode:
OliverGong_0-1642402987008.png


Should you have any questions, just feel free to contact me.
Thanks,
Oliver Gong

avatar
New Contributor

Thanks for the solution! It solves my use case perfectly.

Just one question though. I expect the input JSON to the Jolt processor to be somewhat large (larger than the available RAM). In that case, is it absolutely necessary to split the input into multiple files?

avatar
Contributor

Glad to hear that works for you!
Though we can split the raw csv or json content into a smaller size just in case of OOM issue when doing the shifting stuff on JoltTransformJSON, it may cause other issue like:
- result from JoltTransformJSON is not complete. That means, some part of the same user data ( with different bill_date)  may be wrapped in the other flow files. In such case, we will need to merge them back as a whole.

- It would be better if we can keep raw csv data in some DB table. Then drag out the data rows with a specified limit--> we can use "split pages" way to query on such table, 
- We can then easily fetch a logical completed data to do the rest shift things. With such completed result, we don't need to worry about the data is not an info-incomplete one.