
NIFI extract text from JSON


I'm using the NiFi ExtractText processor and I'm trying to come up with a regular expression to extract a header and its value from a JSON string. Sample JSON data:

{"_id": "5b42fe8f7f663540330b3bdc","index": 0,"guid": "60358c95-e50c-4f5e-ad48-00d9e1f9a849","isActive": true,"balance": "$2,483.56","picture": "http://placehold.it/32x32","age": 32,"eyeColor": "green"}

I want to extract only the eyeColor header with its value (green) and the _id header with its value (5b42fe8f7f663540330b3bdc). What would be a valid regular expression to get these values?

Thanks!


Re: NIFI extract text from JSON

@Suhas Reddy

You can use either the ExtractText processor or the EvaluateJsonPath processor to extract the values and keep them as flowfile attributes.

Extract Text Configs:

80434-extract-text.png

Add new dynamic properties to the ExtractText processor (property name, then regular expression):

eyecolor

"eyeColor":\s"(.*?)"

id

"_id":\s"(.*?)",
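The two expressions can be checked offline before putting them into the processor. The following is a minimal Python sketch, not NiFi code: NiFi evaluates ExtractText expressions with Java's regex engine, and the sketch assumes Python's `re` module behaves the same way for these simple patterns. The patterns use lazy `(.*?)` captures and `\s*` as a slightly more defensive variant, and the `extract_attributes` helper is hypothetical.

```python
import re

SAMPLE = (
    '{"_id": "5b42fe8f7f663540330b3bdc","index": 0,'
    '"guid": "60358c95-e50c-4f5e-ad48-00d9e1f9a849","isActive": true,'
    '"balance": "$2,483.56","picture": "http://placehold.it/32x32",'
    '"age": 32,"eyeColor": "green"}'
)

# Dynamic property name -> regular expression.
# Capture group 1 holds the value that should end up in the attribute.
PATTERNS = {
    "eyecolor": r'"eyeColor":\s*"(.*?)"',
    "id": r'"_id":\s*"(.*?)"',
}


def extract_attributes(text, patterns):
    """Hypothetical helper: return {property name: first capture group}."""
    attrs = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            attrs[name] = match.group(1)
    return attrs


attributes = extract_attributes(SAMPLE, PATTERNS)
print(attributes)
```

The lazy quantifier matters when the field is not the last one in the message: a greedy `(.*)"` would keep matching up to the final double quote in the content.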

EvaluateJsonPath Configs:

80435-evaljson.png

Set the Destination property to flowfile-attribute and add new dynamic properties (property name, then JsonPath expression):

eyecolor

$.eyeColor

id

$._id
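Each of these JsonPath expressions selects one top-level field of the root object. As a rough illustration of what ends up in the attributes, here is a small Python sketch that uses the standard `json` module instead of a JsonPath library; the `json_paths` mapping is a hypothetical stand-in for the two dynamic properties.

```python
import json

SAMPLE = (
    '{"_id": "5b42fe8f7f663540330b3bdc","index": 0,'
    '"guid": "60358c95-e50c-4f5e-ad48-00d9e1f9a849","isActive": true,'
    '"balance": "$2,483.56","picture": "http://placehold.it/32x32",'
    '"age": 32,"eyeColor": "green"}'
)

record = json.loads(SAMPLE)

# "$.eyeColor" and "$._id" each address a top-level key of the root object,
# so a plain dictionary lookup is enough to mimic them here.
json_paths = {"eyecolor": "eyeColor", "id": "_id"}
attributes = {name: record[key] for name, key in json_paths.items()}

print(attributes)
```

Unlike the regular expressions, this approach does not depend on whitespace or on the order of the fields in the message.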

Both of these configurations produce the same output flowfile, with the eyecolor and id attributes attached to it.

80436-output.png

If you want to keep both values in a single attribute, use the UpdateAttribute processor and create a new attribute, for example one named attr with a value such as ${id}:${eyecolor}:

80437-updateattr.png

The output flowfile will have an attr attribute containing both the id and eyecolor values, with : as the separator.

80438-output-ff-attributes.png

Re: NIFI extract text from JSON


@Suhas Reddy

Use the ReplaceText processor to change the contents of the flowfile.

79433-replacetext.png

Search Value

("_id":.*?,).*("eyeColor":.*")

Replacement Value

$1$2

Character Set

UTF-8

Maximum Buffer Size

1 MB //change the value according to flowfile size

Replacement Strategy

Regex Replace

Evaluation Mode

Entire text

Input:

{"_id": "5b42fe8f7f663540330b3bdc","index": 0,"guid": "60358c95-e50c-4f5e-ad48-00d9e1f9a849","isActive": true,"balance": "$2,483.56","picture": "http://placehold.it/32x32","age": 32,"eyeColor": "green"}

Output:

{"_id": "5b42fe8f7f663540330b3bdc","eyeColor": "green"}
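The search and replacement values above can be tried outside NiFi first. This is a minimal Python sketch under the assumption that Python's `re.sub` treats this pattern the same way as the Java regex engine used by NiFi; note that Java writes back-references as `$1$2`, while Python writes them as `\1\2`.

```python
import re

SAMPLE = (
    '{"_id": "5b42fe8f7f663540330b3bdc","index": 0,'
    '"guid": "60358c95-e50c-4f5e-ad48-00d9e1f9a849","isActive": true,'
    '"balance": "$2,483.56","picture": "http://placehold.it/32x32",'
    '"age": 32,"eyeColor": "green"}'
)

# Group 1: the "_id" pair including its trailing comma (lazy match).
# Group 2: the "eyeColor" pair up to the last double quote.
SEARCH = r'("_id":.*?,).*("eyeColor":.*")'

# Everything between the two groups is dropped; the braces outside the
# match are left untouched.
result = re.sub(SEARCH, r"\1\2", SAMPLE)
print(result)
```

The pattern relies on `_id` appearing before `eyeColor` and on `eyeColor` being the last field, as in the sample message.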

If your required output flowfile is

{"eyeColor": "green","_id": "5b42fe8f7f663540330b3bdc"}

then change the replace text configs to

Search Value

("_id":.*?),.*("eyeColor":.*")

Replacement Value

$2,$1
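The reordered variant can be checked the same way. In this Python sketch the comma sits outside group 1 and is re-inserted between the swapped groups; the result is also parsed with `json.loads` to confirm it is still valid JSON. As before, Python's `\2,\1` stands in for the `$2,$1` replacement syntax, and equivalence with NiFi's Java regex engine is an assumption.

```python
import json
import re

SAMPLE = (
    '{"_id": "5b42fe8f7f663540330b3bdc","index": 0,'
    '"guid": "60358c95-e50c-4f5e-ad48-00d9e1f9a849","isActive": true,'
    '"balance": "$2,483.56","picture": "http://placehold.it/32x32",'
    '"age": 32,"eyeColor": "green"}'
)

# Group 1 no longer contains the trailing comma, so the replacement
# supplies the comma between the two swapped groups.
SEARCH = r'("_id":.*?),.*("eyeColor":.*")'

reordered = re.sub(SEARCH, r"\2,\1", SAMPLE)
parsed = json.loads(reordered)  # raises ValueError if the result is not valid JSON

print(reordered)
print(list(parsed))  # key order after the swap
```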

Alternatively, if you know the schema of your JSON messages, you can achieve the same result with a record-oriented processor (ConvertRecord or UpdateRecord). Read the incoming data with a JsonTreeReader controller service and configure only the required fields (eyeColor, _id) in the JsonRecordSetWriter, so that the output flowfile contains only those fields. This approach also works with an array of JSON messages.
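The effect of a writer that only knows about two fields can be pictured as a projection over every record in the flowfile. The following Python sketch only illustrates that idea and is not how NiFi implements it; `project_records`, `KEEP_FIELDS`, and the two input records are made up for the example.

```python
import json

# Hypothetical stand-in for the fields configured on the record writer.
KEEP_FIELDS = ("eyeColor", "_id")


def project_records(json_text, keep_fields):
    """Keep only the listed fields of each record in a JSON object or array."""
    records = json.loads(json_text)
    if isinstance(records, dict):
        # A single JSON object is treated as a one-record array.
        records = [records]
    return [{field: rec.get(field) for field in keep_fields} for rec in records]


# Made-up input: an array of two JSON messages.
incoming = json.dumps([
    {"_id": "a1", "index": 0, "age": 32, "eyeColor": "green"},
    {"_id": "b2", "index": 1, "age": 41, "eyeColor": "brown"},
])

projected = project_records(incoming, KEEP_FIELDS)
print(json.dumps(projected))
```

Because the projection runs per record, the same configuration handles one message or many, which is the main advantage over the regex-based approaches.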


If this answer helped resolve your issue, click the Accept button below to accept it.