Extracting List(Array) element using JOLT

avatar
Expert Contributor

Hello,

Can you help me retrieve the values stored in the "customfield_10161" field of the JSON input? I expect to always receive the values under the name "DEFECT_ROOT_CAUSE", whether they are stored as a list or as a single value.

How can I write a JOLT spec that handles both cases? If I get a list, extract its last element; if it is not a list, keep the value as it is.

Extracting List: (works fine when input is list)

saquibsk_0-1713731749160.png

When the input is not a list:

The DEFECT_ROOT_CAUSE column is missing. I expect to get the value in the DEFECT_ROOT_CAUSE column whether the input is a list or not.

saquibsk_1-1713731973678.png

1 ACCEPTED SOLUTION

avatar

Based on the new input and expected output you provided, I understand that you always want the last element of both DEFECT_ROOT_CAUSE and SPRINT_LIST, each of which may or may not be an array of values. I would take a different, much simpler approach to the Jolt spec, as follows:

[
  // First spec: convert both fields into lists, whatever they were originally
  {
    "operation": "cardinality",
    "spec": {
      "*": {
        "SPRINT_LIST": "MANY",
        "DEFECT_ROOT_CAUSE": "MANY"
      }
    }
  },
  // Second spec: from each list generated above, return the last element
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": {
        "SPRINT_LIST": "=lastElement",
        "DEFECT_ROOT_CAUSE": "=lastElement"
      }
    }
  }
]
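
For readers who want to sanity-check the idea outside of Jolt, here is a minimal Python sketch of the same two steps: wrap each field in a list if it is not one already, then keep the last item. It is only an illustration of the logic, not how Jolt or NiFi evaluates the spec. The record keys `rec1` and `rec2`, the sample values, and the choice to return `None` for an empty list are assumptions made for the example, and the input is assumed to be an object whose values are the records (mirroring the top-level "*" in the spec).

```python
from typing import Any

# Field names taken from the spec above.
FIELDS = ("SPRINT_LIST", "DEFECT_ROOT_CAUSE")


def last_value(value: Any) -> Any:
    """Wrap a non-list value in a list, then return the last item."""
    as_list = value if isinstance(value, list) else [value]
    # Assumption of this sketch: an empty list yields None.
    return as_list[-1] if as_list else None


def normalize(records: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Apply last_value to the chosen fields of every record; other keys are untouched."""
    result = {}
    for key, record in records.items():
        updated = dict(record)  # shallow copy so the input is not modified
        for field in FIELDS:
            if field in updated:
                updated[field] = last_value(updated[field])
        result[key] = updated
    return result


# Hypothetical sample data: one field arrives as a list, the other as a single value.
sample = {
    "rec1": {"DEFECT_ROOT_CAUSE": ["Backend", "Frontend"], "SPRINT_LIST": "Sprint 1"},
    "rec2": {"DEFECT_ROOT_CAUSE": "Backend", "SPRINT_LIST": ["Sprint 1", "Sprint 2"]},
}
print(normalize(sample))
```

The point of normalizing first is that the second step never needs to know which shape the field originally had.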

15 REPLIES 15

avatar

Hi @saquibsk ,

The spec below should put you on the right path, even if it doesn't completely solve your issue:

[
  // Assign an isArray flag field when there is an array (index starts at 0)
  {
    "operation": "shift",
    "spec": {
      "id": "INTEGRATION_ID",
      "*": {
        "*_*": {
          "@": "&(1,1)",
          "0": {
            "#true": "isArray"
          }
        }
      }
    }
  },
  // If the input is not an array, assign the default isArray=false
  {
    "operation": "default",
    "spec": {
      "isArray": "false"
    }
  },
  // Depending on whether isArray is true or false,
  // transpose the values into the DEFECT_ROOT_CAUSE_List array
  {
    "operation": "shift",
    "spec": {
      "customfield": null,
      "*": "&",
      "isArray": {
        "true": {
          "@(2,customfield)": {
            "*": {
              "value": {
                "@": "DEFECT_ROOT_CAUSE_List[]"
              }
            }
          }
        },
        "false": {
          "@(2,customfield)": "DEFECT_ROOT_CAUSE_List[]"
        }
      }
    }
  },
  // Finally, get the last element of the array
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "DEFECT_ROOT_CAUSE": "=lastElement(@(1,DEFECT_ROOT_CAUSE_List))"
    }
  }
]
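
As a rough cross-check of the branching idea, here is a small Python sketch of the same logic: if the custom field is a list, collect each item's "value"; otherwise treat the field itself as the single value; then take the last one. It is not equivalent to the Jolt spec (for instance, it does not keep the intermediate list in the output). The function name `defect_root_cause` and the `prefix` parameter are made up for the example, the list-shaped input follows the sample posted later in this thread, and the non-list input is assumed to be a plain value.

```python
from typing import Any


def defect_root_cause(issue: dict[str, Any], prefix: str = "customfield_") -> dict[str, Any]:
    """Return INTEGRATION_ID plus the last root-cause value, for list or single input."""
    out: dict[str, Any] = {"INTEGRATION_ID": issue.get("id")}
    for key, value in issue.get("fields", {}).items():
        if not key.startswith(prefix):
            continue
        if isinstance(value, list):
            # List case: each item is assumed to be an object with a "value" key.
            values = [item["value"] for item in value]
        else:
            # Non-list case: assumed to be a plain value, so wrap it.
            values = [value]
        if values:
            out["DEFECT_ROOT_CAUSE"] = values[-1]
    return out


list_input = {
    "id": "33414",
    "fields": {"customfield_10161": [{"value": "Backend"}, {"value": "Frontend"}]},
}
single_input = {"id": "33414", "fields": {"customfield_10161": "Backend"}}

print(defect_root_cause(list_input))
print(defect_root_cause(single_input))
```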

If that helps, please accept the solution.

Thanks

avatar
Expert Contributor

Hi @SAMSAL,

Thank you for the reply.

There are still no columns showing in the non-list scenario.

saquibsk_0-1713762938776.png

avatar

My Jolt spec expects a key such as customfield_10616. If you look at the first shift, I'm using the expression *_* to make it more dynamic, but the key must contain at least one underscore. If that doesn't suit your data, you can change it to whatever works, for example "customfield", "customfield_*", "*", etc.

avatar
Expert Contributor

This customfield_ key will not change because it is static. I'm handling this aspect in NiFi right now; however, I run into trouble when more fields start arriving as lists and I have to change the NiFi flow.

avatar
Expert Contributor

This is how I am currently handling the records. All I'm doing is flagging the rows and sending them to the appropriate process. However, if a new field or an existing one begins to capture list data in the future, I would need to modify the flow and add more conditions. I just need a single field: if it captures a list, it should show the last record; otherwise it should show the single record.

saquibsk_0-1713869512819.png

avatar

I think the issue here is the stated problem versus the actual problem. Since you started posting about this issue, I have noticed the JSON input has changed three times. Remember, as someone said, "Asking the right question is halfway to finding the right solution".

Allow me to give you some pointers:

1- Please specify the different JSON inputs for all possible scenarios and the expected output(s).

2- If you have a compound problem, divide and conquer. Break the big problem into smaller ones, each isolated to its own input, output and assertions.

3- Simplify if you can. For example, if you have a complex JSON input and you are only facing an issue with a particular field or nested object, you don't have to post the whole JSON. Instead, isolate where the issue is and provide that part only, or create a new input that mimics the same structure.

4- If you have code or formatted data, please use the code block "</>" from the menu for better visibility and readability. Don't post code or data as a screenshot.

5- Use screenshots where suitable, such as showing a NiFi flow, processor configuration, etc.

6- Refer to the community guidelines for more info: https://community.cloudera.com/t5/custom/page/page-id/Community_Guidelines

Hope that helps.

avatar
Expert Contributor

Hi @SAMSAL ,

My requirement is very simple. 

I expect a single value in the output, DEFECT_ROOT_CAUSE, regardless of whether my input is a list or a single record.

(The primary source is JIRA; a bug ticket has been generated there, indicating that the issue may be in the front end or back end. However, occasionally a reporter will tag the back end before the front end, leading to the creation of a list.)

saquibsk_0-1713890872587.png

> My input is almost 200 lines; however, I am posting only the required column. I have posted two images: one for list records and one for a single record, both with the same column (only the record values change, that's it).

avatar

Based on my Jolt spec above:

1- customfield_* with a multiple-record list:

SAMSAL_0-1713912768426.png

2- customfield_* with a single-record list:

SAMSAL_1-1713912843333.png

3- customfield_* with no list:

SAMSAL_2-1713912974290.png

The asterisk in customfield_* means anything, as long as the key consists of customfield, an underscore (_), and whatever comes after.
avatar
Expert Contributor

Hi @SAMSAL ,

Am I missing something in the JOLT spec?

I'm not getting the expected output: the DEFECT_ROOT_CAUSE column is missing from the output.

===========================================
--Input
===========================================
{
  "id": "33414",
  "fields": {
    "customfield_10161": [
      {
        "value": "Backend"
      },
      {
        "value": "Frontend"
      }
    ]
  }
}

===========================================
--JOLT Spec
===========================================

[
  // Assign isArray flag field when there is an array (index starts at 0)
  {
    "operation": "shift",
    "spec": {
      "id": "INTEGRATION_ID",
      "*": {
        "*_*": {
          "@": "&(1,1)",
          "0": {
            "#true": "isArray"
          }
        }
      }
    }
  },
  // If the input is not an array, assign default isArray=false
  {
    "operation": "default",
    "spec": {
      "isArray": "false"
    }
  },
  // Depending on whether isArray is true or false,
  // transpose values into the DEFECT_ROOT_CAUSE_LIST array
  {
    "operation": "shift",
    "spec": {
      "customfield": null,
      "*": "&",
      "isArray": {
        "true": {
          "@(2,customfield)": {
            "*": {
              "value": {
                "@": "DEFECT_ROOT_CAUSE_LIST[]"
              }
            }
          }
        },
        "false": {
          "@(2,customfield)": "DEFECT_ROOT_CAUSE_LIST[]"
        }
      }
    }
  },
  // Finally, get the last element of the array
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "DEFECT_ROOT_CAUSE": "=lastElement(@(1,DIRECT_ROOT_CAUSE_LIST))"
    }
  }
]

===========================================
--Output
===========================================

{
  "INTEGRATION_ID" : "33414",
  "DEFECT_ROOT_CAUSE_LIST" : [ "Backend", "Frontend" ]
}