Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

NiFi: trying to create record set from one field in and embedded array in JSON

avatar
Contributor

I have this JSON:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby **bleep**",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      {
        "category": "fiction",
        "author": "J.R.R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ]
}

want a list/record set of categories. CSV, plain text, doesn't matter, but let's say CSV:

category
"refereces"
"fiction"
"fiction"

I have tried many things, too many to repeat them all here. But basically, I have a GenerateFlowFile where the JSON is hard-coded, then a QueryReport where the reader is a JsonPathReader where I have properties for all fields in the JSON:

store     $.store
book      $.store.book[*]
category  $.store.book[*].category
etc.

Just to see what's being returned, I currently have the writer set to an all-defaults JsonRecordSetWriter.

With this in mind, in the QueryRecord, select * returns the JSON unaltered. select store returns the JSON unaltered. select book returns "no column named 'book'". I can use an EvaluateJsonPath with $.store.book[*].category as the property value, and it returns this:

["references", "fiction", "fiction"]

If I switch over to an all-defaults CSVRecordSetWriter and do select store, I get this:

store
MapRecord[{book=[Ljava.lang.Object;@24bda2f0}]

I know there are other ways to configure EvaluateJsonPath so it does parse the data correctly, but in doing so, it creates a FlowFile for each record. I don't want that; I want a single recordset in one FlowFile because this is just a proof of concept. With the real data I'm looking at tens of thousands of records.

I also know I could take this to Groovy and get it done. I'd like to avoid that and only use to the bare minimum of native NiFi processors.

I've also tried some things with a ForkRecord, but as I said, I've kind of lost the bubble on everything I've tried. I believe this is possible but running out of energy and ideas and think I've exhausted the wisdom of the web. Is it really this difficult?

Let me know what I'm doing wrong.

1 ACCEPTED SOLUTION

avatar

Hi @noncitizen ,

I think using  JsonPathReader is not the right choice for this requirement. This service according to the documentation will always evaluate against the root element:

"...If the root of the FlowFile's JSON is a JSON Array, each JSON Object found in that array will be treated as a separate Record, not as a single record made up of an array. If the root of the FlowFile's JSON is a JSON Object, it will be evaluated as a single Record..."

ref: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services...

So since your root element "store" is an object it will always return a single record , and if it happens that one of the fields is an array , it will be returned as a single record as an array representation:

["references", "fiction", "fiction"]

It seems the JsonPathReader is more suited when your root element is an array. If you want to make it work so that it returns multiple records you probably need to do jolt transformation to dump all book array items into root array. Keep in mind if you want just the category or any other certain field you have to define an Avro schema with the expected fields otherwise all none specified fields will be returned with blank values.

What you need is JsonTreeReader service instead where its configured to use Nested Field as Starting Field Strategy, then specify "book" as the Starting Field Name, as follows:

SAMSAL_0-1706609795720.png

The QueryRecord then simply can be configured as follows:

SAMSAL_1-1706609912610.png

which will give you desired output in CSV format:

category
reference
fiction
fiction

If that helps please accept solution.

Thanks

 

View solution in original post

4 REPLIES 4

avatar

Hi @noncitizen ,

I think using  JsonPathReader is not the right choice for this requirement. This service according to the documentation will always evaluate against the root element:

"...If the root of the FlowFile's JSON is a JSON Array, each JSON Object found in that array will be treated as a separate Record, not as a single record made up of an array. If the root of the FlowFile's JSON is a JSON Object, it will be evaluated as a single Record..."

ref: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services...

So since your root element "store" is an object it will always return a single record , and if it happens that one of the fields is an array , it will be returned as a single record as an array representation:

["references", "fiction", "fiction"]

It seems the JsonPathReader is more suited when your root element is an array. If you want to make it work so that it returns multiple records you probably need to do jolt transformation to dump all book array items into root array. Keep in mind if you want just the category or any other certain field you have to define an Avro schema with the expected fields otherwise all none specified fields will be returned with blank values.

What you need is JsonTreeReader service instead where its configured to use Nested Field as Starting Field Strategy, then specify "book" as the Starting Field Name, as follows:

SAMSAL_0-1706609795720.png

The QueryRecord then simply can be configured as follows:

SAMSAL_1-1706609912610.png

which will give you desired output in CSV format:

category
reference
fiction
fiction

If that helps please accept solution.

Thanks

 

avatar
Contributor

Wow, I was excited when I saw this as it looked like the kind of simple elegance I was looking for, and I wondered why I hadn't noticed the Starting Field Strategy property, because in trying to work this out, I had previously turned to JsonTreeReader. But in implementing it, I see why. We're on version 11.9.0, and the JsonTreeReader is version 1.14.0.i, meaning, I don't have those capabilities.

Because moving to a more recent version is not possible, I will go down the JOLT pathway and see what I can work out. Even though I couldn't test it out, I will accept your solution because I believe if I had the latest and greatest, it would be the one. Plus, your description of the JSON hierarchy in play here was helpful.

avatar
Contributor

UPDATE: I'm working on an enclave, so this initial test was at jolt-demo.appspot.com, but moving it over to NiFi, I had to add one addition level in the JOLT transformation. What appears below is now correct.

That was quick. This JOLT transformation . . .

 

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": {
          "*": {
            "*":
            {
              "@": ""
            }
          }
        }
      }
    }
  }
]

 

. . . transform the JSON to this . . .

 

[ {
  "category" : "reference",
  "author" : "Nigel Rees",
  "title" : "Sayings of the Century",
  "price" : 8.95
}, {
  "category" : "fiction",
  "author" : "Herman Melville",
  "title" : "Moby **bleep**",
  "isbn" : "0-553-21311-3",
  "price" : 8.99
}, {
  "category" : "fiction",
  "author" : "J.R.R. Tolkien",
  "title" : "The Lord of the Rings",
  "isbn" : "0-395-19395-8",
  "price" : 22.99
} ]

 

And with an all-defaults JSONTreeReader and CSVRecordSetWriter, "select category" returns exactly what I need. I was thinking about JOLT, but haven't done much with it, and was fearful of the complexity. So thanks again, @SAMSAL for pushing me in the right direction.

 

avatar

My Pleasure . I'm glad you were able to get it with the help of Jolt. I agree jolt is a little intimidating initially, but with practice , trial and error you grow to love it 🙂 . A simpler way to represent your spec:

[
  {
    "operation": "shift",
    "spec": {
      "store": {
        "book": {
          "*": "[]"
        }
      }
    }
  }
]

Feel free to post any jolt question or challenges in the future.