Support Questions
Find answers, ask questions, and share your expertise

How to search and remove null values in flow file while loading to elastic search in nifi

Explorer

Hi

How to search and remove NULL value containing fields in incoming flow files in nifi . I was used replace test processor but it was removing only particular field. Here the Incoming flow file is JSON file format . please provide your thoughts in theis regard

1 ACCEPTED SOLUTION

Accepted Solutions

Super Guru
@Suresh Dendukuri

One way of doing is to use Replace text processor to search for "NULL" in json record key/value and replace with empty string(blank), by using this way we can replace all NULL key/value pairs from flowfile content.

Method1:-

Input:-

[ { "color": "black", "category": "hue", "type": "NULL", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": "NULL", "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": NULL, "hex": "#FF0" } }]

ReplaceText Configs:-

68572-replacetext.png

Search Value

("type": "NULL",|"color": "NULL",|"rgba": NULL,) //search for all possible NULL key/value pairs(| is used as OR)

Replacement Value

//replace the with empty

Character Set

UTF-8

Maximum Buffer Size

1 MB //needs to change the size if your flowfile is more than 1 MB

Replacement Strategy

Regex Replace

Output From Replace Text Processor:-

[ { "color": "black", "category": "hue",  "code": { "rgba": [255,255,255,1], "hex": "#000" } }, {  "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": {  "hex": "#FF0" } }]

All NULL values are replaced from the Flowfile Content.

(OR)

Method2:-

This way we are using ConvertRecord processor with JsonTreeReader as Record Reader and JsonSetWriter as Record Writer with the below property as

Suppress Null Values

Always Suppress //if the record having null value then we are not writing that into Output Flowfile.

Flow:-

68573-flow.png

GenerateFlowFile:-
This processor is used to generate sample records
Custom text property as

[ { "color": "black", "category": "hue", "type": "NULL", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": "NULL", "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": NULL, "hex": "#FF0" } }]

add new property

schema.name

sch

ReplaceText:-

As we are going to use ConvertRecord processor and your input file having "NULL",NULL as values ,but these are not treated as null in Avro Schema Registry. so we are going to preparing json records which are going to understand by ConvertRecord Processor.

---

Search Value

(NULL|"NULL") //search for NULL (or) "NULL" in the flowfile content 

Replacement Value

null //replace with null

Maximum Buffer Size

1 MB

Replacement Strategy

Regex Replace

Evaluation Mode

Entire text

Output:-

[ { "color": "black", "category": "hue", "type": null, "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": null, "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": null, "hex": "#FF0" } }]

All the "NULL",NULL and replaced with null in the flowfile content.

ConvertRecord Processor:-

Record Reader/Writer controller services are configured and enable the controller service and add AvroSchemaRegistry as

we need to configure the avro schema with can match your input json record.

JsontreeReader Configs:-

68575-jsontreereader.png

ArrayAvroSchemaRegistry Configs:-

sch

{
  "type" : "record",
  "name" : "schema",
  "namespace" : "avroschema",
  "fields" : [ {
    "name" : "color",
    "type" : ["null","string"]
  }, {
    "name" : "category",
    "type" : "string"
  }, {
    "name" : "type",
    "type" : ["null","string"]
  }, {
    "name" : "code",
    "type" : {
      "type" : "record",
      "name" : "code",
      "fields" : [ {
        "name" : "rgba",
        "type" : { 
          "type" : "array",
          "items" : ["null","int"]
        } 
     }, {
        "name" : "hex",
        "type" : ["null","string"]
      }]
    }
  } ]
}

JsonSetWriter Configs:-

68574-jsonsetwriter.png

Change the Supress Null Values to Always Suppress(we are not writing the null values for the json key's in the record)

Output:-

[{"color":"black","category":"hue","code":{"rgba":[255,255,255,1],"hex":"#000"}},{"category":"value","code":{"rgba":[0,0,0,1],"hex":"#FFF"}},{"color":"red","category":"hue","type":"primary","code":{"rgba":[],"hex":"#FF0"}}]

all the null values in the json records are not written in the output flowfile but the only issues is with rgba array even we are having null values for the array still jsonsetwriter writer an empty array for rgba

ReplaceText:-

Now we are replacing the empty rgba empty array in this processor

Search Value

"rgba":[], //search for this value

Replacement Value

//same as method1 replacetext configs

Maximum Buffer Size

1 MB

Replacement Strategy

Literal Replace

Evaluation Mode

Entire text

Output:-

[{"color":"black","category":"hue","code":{"rgba":[255,255,255,1],"hex":"#000"}},{"category":"value","code":{"rgba":[0,0,0,1],"hex":"#FFF"}},{"color":"red","category":"hue","type":"primary","code":{"hex":"#FF0"}}]

I have attached my template below, save/upload the template and change as per your needs.

remove-nulls-187756.xml

-
If the Answer Addressed Your Question, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

View solution in original post

5 REPLIES 5

Explorer

Hi

I have following JSON flow file and trying to remove null fields in the file and store into elastic search. here we need to remove Null containing fields from any type integer or character

[ { "color": "black", "category": "hue", "type": "NULL", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": "NULL", "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": NULL, "hex": "#FF0" } }]

Flow used ==> Getfile ==>replacetest==>putelasticsearchrecords

Explorer

Hi

I have following JSON flow file and trying to remove null fields in the file and store into elastic search. here we need to remove Null containing fields from any type integer or character

[ { "color": "black", "category": "hue", "type": "NULL", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": "NULL", "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": NULL, "hex": "#FF0" } }]

Flow used ==> Getfile ==>replacetest==>putelasticsearchrecords

Desired Output

[ { "color": "black", "category": "hue", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "hex": "#FF0" } }]

Super Guru
@Suresh Dendukuri

One way of doing is to use Replace text processor to search for "NULL" in json record key/value and replace with empty string(blank), by using this way we can replace all NULL key/value pairs from flowfile content.

Method1:-

Input:-

[ { "color": "black", "category": "hue", "type": "NULL", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": "NULL", "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": NULL, "hex": "#FF0" } }]

ReplaceText Configs:-

68572-replacetext.png

Search Value

("type": "NULL",|"color": "NULL",|"rgba": NULL,) //search for all possible NULL key/value pairs(| is used as OR)

Replacement Value

//replace the with empty

Character Set

UTF-8

Maximum Buffer Size

1 MB //needs to change the size if your flowfile is more than 1 MB

Replacement Strategy

Regex Replace

Output From Replace Text Processor:-

[ { "color": "black", "category": "hue",  "code": { "rgba": [255,255,255,1], "hex": "#000" } }, {  "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": {  "hex": "#FF0" } }]

All NULL values are replaced from the Flowfile Content.

(OR)

Method2:-

This way we are using ConvertRecord processor with JsonTreeReader as Record Reader and JsonSetWriter as Record Writer with the below property as

Suppress Null Values

Always Suppress //if the record having null value then we are not writing that into Output Flowfile.

Flow:-

68573-flow.png

GenerateFlowFile:-
This processor is used to generate sample records
Custom text property as

[ { "color": "black", "category": "hue", "type": "NULL", "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": "NULL", "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": NULL, "hex": "#FF0" } }]

add new property

schema.name

sch

ReplaceText:-

As we are going to use ConvertRecord processor and your input file having "NULL",NULL as values ,but these are not treated as null in Avro Schema Registry. so we are going to preparing json records which are going to understand by ConvertRecord Processor.

---

Search Value

(NULL|"NULL") //search for NULL (or) "NULL" in the flowfile content 

Replacement Value

null //replace with null

Maximum Buffer Size

1 MB

Replacement Strategy

Regex Replace

Evaluation Mode

Entire text

Output:-

[ { "color": "black", "category": "hue", "type": null, "code": { "rgba": [255,255,255,1], "hex": "#000" } }, { "color": null, "category": "value", "code": { "rgba": [0,0,0,1], "hex": "#FFF" } }, { "color": "red", "category": "hue", "type": "primary", "code": { "rgba": null, "hex": "#FF0" } }]

All the "NULL",NULL and replaced with null in the flowfile content.

ConvertRecord Processor:-

Record Reader/Writer controller services are configured and enable the controller service and add AvroSchemaRegistry as

we need to configure the avro schema with can match your input json record.

JsontreeReader Configs:-

68575-jsontreereader.png

ArrayAvroSchemaRegistry Configs:-

sch

{
  "type" : "record",
  "name" : "schema",
  "namespace" : "avroschema",
  "fields" : [ {
    "name" : "color",
    "type" : ["null","string"]
  }, {
    "name" : "category",
    "type" : "string"
  }, {
    "name" : "type",
    "type" : ["null","string"]
  }, {
    "name" : "code",
    "type" : {
      "type" : "record",
      "name" : "code",
      "fields" : [ {
        "name" : "rgba",
        "type" : { 
          "type" : "array",
          "items" : ["null","int"]
        } 
     }, {
        "name" : "hex",
        "type" : ["null","string"]
      }]
    }
  } ]
}

JsonSetWriter Configs:-

68574-jsonsetwriter.png

Change the Supress Null Values to Always Suppress(we are not writing the null values for the json key's in the record)

Output:-

[{"color":"black","category":"hue","code":{"rgba":[255,255,255,1],"hex":"#000"}},{"category":"value","code":{"rgba":[0,0,0,1],"hex":"#FFF"}},{"color":"red","category":"hue","type":"primary","code":{"rgba":[],"hex":"#FF0"}}]

all the null values in the json records are not written in the output flowfile but the only issues is with rgba array even we are having null values for the array still jsonsetwriter writer an empty array for rgba

ReplaceText:-

Now we are replacing the empty rgba empty array in this processor

Search Value

"rgba":[], //search for this value

Replacement Value

//same as method1 replacetext configs

Maximum Buffer Size

1 MB

Replacement Strategy

Literal Replace

Evaluation Mode

Entire text

Output:-

[{"color":"black","category":"hue","code":{"rgba":[255,255,255,1],"hex":"#000"}},{"category":"value","code":{"rgba":[0,0,0,1],"hex":"#FFF"}},{"color":"red","category":"hue","type":"primary","code":{"hex":"#FF0"}}]

I have attached my template below, save/upload the template and change as per your needs.

remove-nulls-187756.xml

-
If the Answer Addressed Your Question, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

View solution in original post

Explorer

@shu , Thanks you very much for your detail explanation and options to solve the problem . Appreciate your effort . Thanks for your time and help

New Contributor

I have used jolt transformer to transform json to json.

convertAvroToJSON => JoltTransformerJSON => ....

Here is sample specification

[ { "operation": "default", "spec": { "*": "XXXXX" } }, { "operation": "shift", "spec": { "*": { "XXXXX": null, "*": { "@1": "&2" } } } } ]